The Best Features Fast
Feature engineering is one of the most critical tasks in data science. The features you create often determine the success or failure of your machine learning projects. However, when performed manually, feature engineering is time-consuming and laborious. Highly complex data preparation activities are repeated over and over for each new feature generated. Feature engineering also requires careful validation to ensure you have not introduced errors into the process.
DataRobot’s Feature Discovery accelerates feature engineering by automating expert data science best practices. It uses the relationships across your data sources and within complex data schemas to intelligently generate the right features for your models and significantly improve their overall performance.
Visual and Intuitive
Feature Discovery unlocks the art of advanced feature engineering for data scientists, data engineers, and business analysts. Using DataRobot's visual relationship editor, you can select all of the datasets you want to use in your project, then declare relationships between them with just a few clicks. DataRobot even suggests joins for you when you don't know the relationships in advance. Feature Discovery makes it easy for anyone to define, in minutes, the complex data schemas on which automated feature engineering is performed.
Built-In Awareness of Time
DataRobot Feature Discovery is fully time-aware. If your datasets are temporal in nature, you can set derivation windows to control how much history is used when calculating new features. For example, you can tell DataRobot to consider only 30 days of history when predicting airline delays by flight number. Feature Discovery also has built-in guardrails against common leakage problems; for example, it ensures that future data is excluded when generating new features.
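To make the idea of a derivation window concrete, here is a minimal sketch in plain Python. It is not DataRobot's API or implementation; the function name, field names, and sample records are all hypothetical and exist only for illustration. For each prediction row, it summarizes delay history for the same flight number within the 30 days before the prediction date, and it excludes any record dated on or after that date so that future data cannot leak into the feature.

```python
from datetime import date, timedelta
from statistics import mean


def windowed_delay_features(prediction_rows, history_rows, window_days=30):
    """Summarize history per flight within
    [prediction_date - window_days, prediction_date).

    Hypothetical helper for illustration; not part of any DataRobot API.
    """
    window = timedelta(days=window_days)
    features = []
    for row in prediction_rows:
        cutoff = row["prediction_date"]
        # Strict upper bound: records on or after the cutoff are excluded,
        # which keeps future information out of the derived feature.
        delays = [
            h["delay_minutes"]
            for h in history_rows
            if h["flight_number"] == row["flight_number"]
            and cutoff - window <= h["flight_date"] < cutoff
        ]
        features.append({
            **row,
            f"delay_count_{window_days}d": len(delays),
            f"delay_mean_{window_days}d": mean(delays) if delays else None,
        })
    return features


# Hypothetical sample data.
predictions = [
    {"flight_number": "AA10", "prediction_date": date(2024, 3, 1)},
    {"flight_number": "UA20", "prediction_date": date(2024, 3, 1)},
]
history = [
    {"flight_number": "AA10", "flight_date": date(2024, 1, 15), "delay_minutes": 50},  # older than 30 days
    {"flight_number": "AA10", "flight_date": date(2024, 2, 10), "delay_minutes": 10},  # inside the window
    {"flight_number": "AA10", "flight_date": date(2024, 2, 25), "delay_minutes": 30},  # inside the window
    {"flight_number": "AA10", "flight_date": date(2024, 3, 5), "delay_minutes": 99},   # future record
    {"flight_number": "UA20", "flight_date": date(2024, 3, 1), "delay_minutes": 40},   # same day as prediction
]

features = windowed_delay_features(predictions, history, window_days=30)
```

In this sketch, only the two February records contribute to the AA10 features. The January record falls outside the window, and the March records are treated as future data relative to the prediction date. Changing `window_days` changes how much history each feature sees.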
Practical, Explainable, and Traceable
Like every automated capability in the DataRobot AI Platform, Feature Discovery is transparent. You can visualize and explore every generated feature to understand its predictive potential. Full lineage is available for every feature created, for traceability and auditing purposes. You can access detailed logs to see exactly which features were explored, discarded, and generated, and you can download the full training dataset, including all of the newly derived features, for further analysis and use in other applications.
We used the DataRobot Feature Discovery tool on our high-frequency physiological data, which helped us find some new features that ended up being high impact throughout all of our subsequent analysis.