
Feature Discovery

More accuracy from powerful features automatically discovered in complex data schemas and multiple related data sources.

Feature Discovery is automated feature engineering taken to a new level. Working from complex data schemas and datasets held in different source systems, DataRobot automatically discovers, tests, and creates hundreds of valuable new features for your machine learning models, dramatically improving their accuracy and deepening your overall understanding of the problem at hand.

Next-Generation Automated Feature Engineering

The Best Features Fast

Feature engineering is one of the most critical tasks in data science. The features you create often determine the success or failure of your machine learning projects. However, when performed manually, feature engineering is time-consuming and laborious. Highly complex data preparation activities are repeated over and over for each new feature generated. Feature engineering also requires careful validation to ensure you have not introduced errors into the process.


DataRobot’s Feature Discovery accelerates feature engineering by automating expert data science best practices. It uses the relationships across your data sources and within complex data schemas to intelligently generate the right features for your models and significantly improve their overall performance.
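To make the idea of generating features from a relationship concrete, here is a minimal sketch in plain Python. It is not DataRobot's implementation or API: the function `aggregate_related`, the tables, and the column names are all hypothetical. It simply joins a secondary table to a primary table on a shared key and derives a few aggregate features per primary row.

```python
from collections import defaultdict


def aggregate_related(primary_rows, secondary_rows, key, value):
    """Derive count/sum/max features for each primary row from a related table.

    Hypothetical helper for illustration only; not a DataRobot function.
    """
    # Group the secondary table's values by the join key.
    grouped = defaultdict(list)
    for row in secondary_rows:
        grouped[row[key]].append(row[value])

    enriched = []
    for row in primary_rows:
        values = grouped.get(row[key], [])
        enriched.append({
            **row,
            f"{value}_count": len(values),
            f"{value}_sum": sum(values),
            f"{value}_max": max(values) if values else None,
        })
    return enriched


# Hypothetical data: one row per customer, many transactions per customer.
customers = [{"customer_id": 1}, {"customer_id": 2}]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
]

features = aggregate_related(customers, transactions, "customer_id", "amount")
for row in features:
    print(row)
```

Each new column is a candidate feature. An automated system would generate many such candidates across every declared relationship and keep only those that prove useful.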

Visual and Intuitive

Feature Discovery unlocks the art of advanced feature engineering for data scientists, data engineers, and business analysts. Using DataRobot's visual relationship editor, you can select all of the datasets you want to use in your project, then declare relationships between them with just a few clicks. DataRobot even suggests joins for you when you don't know the relationships in advance. With Feature Discovery, anyone can define a very complex data schema for automated feature engineering in minutes.

Built-In Awareness of Time

DataRobot Feature Discovery is fully time-aware. If your datasets are temporal in nature, you can set derivation windows to control how much history is used when calculating new features. For example, you can tell DataRobot to consider only 30 days of history when predicting airline delays by flight number. Feature Discovery also has built-in guardrails against common leakage problems, such as ensuring that future data is excluded when generating new features.
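The following sketch illustrates what a derivation window with a leakage guardrail means in practice. It is an assumption-laden illustration in plain Python, not DataRobot code: the function `windowed_mean_delay`, the field names, and the sample flights are hypothetical. The feature uses only rows inside the 30-day window that ends strictly before the prediction point.

```python
from datetime import datetime, timedelta


def windowed_mean_delay(history, flight_number, prediction_time, window_days=30):
    """Mean delay for one flight over the window ending at prediction_time.

    Only rows with window_start <= timestamp < prediction_time are used, so
    nothing at or after the prediction point can leak into the feature.
    Hypothetical helper for illustration only.
    """
    window_start = prediction_time - timedelta(days=window_days)
    delays = [
        row["delay_minutes"]
        for row in history
        if row["flight_number"] == flight_number
        and window_start <= row["timestamp"] < prediction_time
    ]
    return sum(delays) / len(delays) if delays else None


# Hypothetical flight history.
history = [
    {"flight_number": "AA100", "timestamp": datetime(2024, 1, 1), "delay_minutes": 50},   # older than 30 days
    {"flight_number": "AA100", "timestamp": datetime(2024, 2, 20), "delay_minutes": 10},
    {"flight_number": "AA100", "timestamp": datetime(2024, 3, 1), "delay_minutes": 20},
    {"flight_number": "AA100", "timestamp": datetime(2024, 3, 10), "delay_minutes": 90},  # after the prediction point
    {"flight_number": "UA200", "timestamp": datetime(2024, 3, 2), "delay_minutes": 5},
]

feature = windowed_mean_delay(history, "AA100", datetime(2024, 3, 5))
print(feature)  # 15.0
```

The 50-minute delay falls outside the window and the 90-minute delay lies in the future, so neither affects the result. Widening or narrowing `window_days` changes how much history the feature reflects.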

Practical, Explainable, and Traceable

Like every automated capability in the DataRobot AI Platform, Feature Discovery is transparent. You can visualize and explore every generated feature to understand its predictive potential. Full lineage is available for every feature, for traceability and auditing purposes. Detailed logs show exactly which features were explored, discarded, and generated, and you can download the full training dataset, including all newly derived features, for further analysis and use in other applications.

Feature Discovery in Action
  • We used the DataRobot Feature Discovery tool on our high-frequency physiological data, which helped us find some new features that ended up being high impact throughout all of our subsequent analysis.
    Dr. Austin Chou, PhD
    Lead Data Scientist, AI for Good Collaboration, UCSF Brain and Spinal Injury Center, Zuckerberg San Francisco General Hospital and Trauma Center
