As you think about what automated machine learning platforms should provide in order for you to take advantage of the latest developments in AI, it’s useful to look at some of the typical steps that data scientists themselves go through when they build machine learning models.
This article walks through the main steps of the modeling process and highlights the automation built into the DataRobot platform that gets you up and running with ML. (Note: this article is adapted from a session at DataRobot’s AI Experience Worldwide Conference.)
1. Format data inputs
Data preprocessing
After you assemble a machine learning dataset from various data sources, additional processing is still required before it can be used for model building. DataRobot automates this processing with blueprints that are generated dynamically for each project you create.
Blueprints in DataRobot
DataRobot can handle many data types, including text, images, and geospatial data. Blueprints automatically identify and process whichever types of variables you’ve included in your dataset.
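To make the idea of type-aware preprocessing concrete, here is a minimal sketch in plain Python. It is not DataRobot's blueprint logic: the function names, the "more than three words means text" heuristic, and the sample columns are all assumptions made for illustration.

```python
from statistics import median

MISSING = (None, "")

def infer_type(values):
    """Guess a column type from its non-missing values (illustrative heuristic)."""
    present = [v for v in values if v not in MISSING]
    try:
        for v in present:
            float(v)
        return "numeric"
    except (TypeError, ValueError):
        pass
    # Assumption: long free-form strings are text, short labels are categorical.
    if any(len(str(v).split()) > 3 for v in present):
        return "text"
    return "categorical"

def preprocess_numeric(values):
    """Fill missing entries with the median of the observed values."""
    present = [float(v) for v in values if v not in MISSING]
    fill = median(present)
    return [float(v) if v not in MISSING else fill for v in values]

def preprocess_categorical(values):
    """One-hot encode, mapping missing entries to an explicit 'missing' level."""
    cleaned = [v if v not in MISSING else "missing" for v in values]
    levels = sorted(set(cleaned))
    return [[1 if v == level else 0 for level in levels] for v in cleaned]

# Hypothetical columns, made up for this example.
columns = {
    "age": [34, None, 51, 29],
    "plan": ["basic", "pro", None, "basic"],
    "notes": ["called twice about billing issue", "",
              "asked for a refund last week", "no contact"],
}
types = {name: infer_type(vals) for name, vals in columns.items()}
```

Each detected type is routed to its own processing step, which is the general pattern a preprocessing pipeline follows; a real platform would use far more robust detection than this.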
Feature selection
Features are the columns in your dataset that are used to detect patterns related to your target outcome. After you upload a dataset, DataRobot’s automation will create feature lists that help you select the best features for model building.
Feature Selection
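One simple way to think about building a feature list is to score each column against the target and keep the strongest ones. The sketch below uses absolute Pearson correlation as a generic stand-in; it does not claim to be the scoring DataRobot uses, and the column names, data, and 0.5 cutoff are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def rank_features(features, target):
    """Sort feature names by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical churn-style data: target is 1 if the customer left.
features = {
    "tenure_months": [1, 3, 6, 12, 24, 36],
    "support_calls": [5, 4, 4, 2, 1, 0],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
target = [1, 1, 1, 0, 0, 0]

ranking = rank_features(features, target)
# Keep only features above an arbitrary cutoff chosen for this example.
feature_list = [name for name, score in ranking if score > 0.5]
```

The constant column scores zero and drops out of the list, which shows why uninformative features are usually excluded before modeling.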
Feature engineering
For models to be successful, they need input features that carry potentially useful signals. The process of creating features to improve your models is known as feature engineering. In addition to the feature engineering that’s automated within blueprints, DataRobot also generates additional features from date variables, such as “day of the week” and “day of the month.”
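As a rough illustration of date-derived features, the sketch below expands a single date into several numeric columns using Python's standard library. The function name and the exact set of derived fields are assumptions, not a description of what DataRobot produces.

```python
from datetime import date

def date_features(d):
    """Derive simple calendar features from a date."""
    return {
        "day_of_week": d.weekday(),  # Monday = 0 ... Sunday = 6
        "day_of_month": d.day,
        "month": d.month,
        "year": d.year,
    }

# 11 May 2021 was a Tuesday, so day_of_week is 1.
row = date_features(date(2021, 5, 11))
```

Features like these let a model pick up weekly or monthly patterns that are invisible in a raw timestamp.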
2. Ensure that reliable patterns are found
Data partitioning & model validation
To ensure that models are learning reliable patterns in the data, they need to be built or “trained” on historical examples that are not the same as the examples they are tested or “validated” on. As part of its automated guardrails, DataRobot separates (or “partitions”) the rows in your uploaded dataset to prevent models from simply memorizing the examples they’re trained on.
Data partitioning
Once you have separate partitions within your dataset for building and validating, you can then measure how well a model is capturing patterns in the data and compare performance across different models.
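The partitioning idea can be sketched in a few lines: shuffle the row indices once, then set aside a fraction for validation so no row is used for both training and testing. The 20% validation fraction, the fixed seed, and the function name are assumptions for this example; DataRobot's own partitioning scheme is not described here.

```python
import random

def partition_rows(n_rows, validation_frac=0.2, seed=0):
    """Split row indices into disjoint training and validation sets."""
    indices = list(range(n_rows))
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_validation = int(n_rows * validation_frac)
    validation = indices[:n_validation]
    training = indices[n_validation:]
    return training, validation

training, validation = partition_rows(100)
```

Because the two index sets never overlap, a score computed on the validation rows reflects how the model handles examples it did not see during training.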
3. Select and evaluate model options
Ranked models & evaluation metrics
After you start a DataRobot project in Autopilot mode, the automation runs a data science competition on your dataset to produce a leaderboard of models ranked by your preferred evaluation metric.
Leaderboard of Metrics in DataRobot
When building models, you don’t usually know ahead of time which machine learning algorithms will work the best, so you want to try out a variety of different approaches and let the best options bubble up to the top.
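The leaderboard concept can be illustrated with a toy competition: fit several candidate models on the training rows, score each on the validation rows, and sort by the chosen metric. The three candidates, the RMSE metric, and the data below are assumptions picked to keep the sketch short; they are not the models Autopilot runs.

```python
from math import sqrt
from statistics import mean, median

def fit_mean(xs, ys):
    """Baseline that always predicts the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_median(xs, ys):
    """Baseline that always predicts the training median."""
    m = median(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least squares with a single input feature."""
    mean_x, mean_y = mean(xs), mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def rmse(model, xs, ys):
    """Root mean squared error of a model on the given rows."""
    return sqrt(mean((model(x) - y) ** 2 for x, y in zip(xs, ys)))

# Made-up data with a roughly linear trend.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
valid_x, valid_y = [7, 8], [14.1, 15.9]

candidates = {
    "mean baseline": fit_mean,
    "median baseline": fit_median,
    "linear regression": fit_line,
}
leaderboard = sorted(
    ((name, rmse(fit(train_x, train_y), valid_x, valid_y))
     for name, fit in candidates.items()),
    key=lambda entry: entry[1],  # lower RMSE ranks higher
)
```

On this data the linear model rises to the top because it is the only candidate that can follow the trend, which mirrors how the best approach is discovered by comparison rather than guessed in advance.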
Model tuning
Every model on the Leaderboard is also “tuned” automatically, ensuring that the best settings are used when searching for patterns between input features and the target.
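Tuning can be pictured as trying several values of a setting and keeping the one that scores best on validation data. The sketch below runs a small grid search over k for a one-dimensional k-nearest-neighbors regressor. The model, the grid, and the data are assumptions for illustration only, and this is not a description of how DataRobot tunes its models.

```python
from math import sqrt

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_rmse(train, valid, k):
    """RMSE on the validation rows for a given k."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in valid]
    return sqrt(sum(errors) / len(errors))

# Made-up (x, y) pairs following a curved trend.
train = [(1, 1.0), (2, 4.1), (3, 8.9), (4, 16.2),
         (5, 24.8), (6, 36.1), (7, 49.0), (8, 63.9)]
valid = [(2.5, 6.3), (4.5, 20.2), (6.5, 42.3)]

grid = [1, 2, 4, 8]  # candidate settings to try
scores = {k: validation_rmse(train, valid, k) for k in grid}
best_k = min(scores, key=scores.get)
```

Here k=1 is too jumpy and k=8 averages away the trend, so a middle value wins; automated tuning performs this kind of search so you don't have to pick settings by hand.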
Whether you want to evaluate how accurately a model is capturing patterns related to the target, or you just want to understand what the discovered relationships look like, the interpretation tools provide all you need to gather insights from your data and explain them to your colleagues.
Prediction Explanations in DataRobot
5. Document the process
Compliance documentation & downloadable assets
All of the charts and visualizations in DataRobot can be downloaded, along with the data that was used to create them.
Export of Charts and Visualizations
With DataRobot’s auto-generated compliance documentation, you can also download a full writeup of everything that went into the model building process. From there you just have to fill in the details specific to you and your company, such as how a model will be used in your business operations and who the various stakeholders are.
Model Compliance Documentation
Next Steps
Even with the level of automation that DataRobot provides, it’s still up to you to use your analytical skills to investigate what a model has learned and interpret the insights that were discovered.
If you don’t feel like you have those skills, or you would like to refresh or increase your ML/AI analytical skills, check out the DataRobot University library of educational resources.
Also, if you’re not sure how to identify good problems to solve with AI in the first place, browse through some of the common use cases for your industry at DataRobot Pathfinder and see how they might relate to you.
To find the latest discussions, and to ask your own questions, look in the DataRobot Community.