Putting “Automation” in Automated Machine Learning

September 25, 2020
· 4 min read

This post was originally part of the DataRobot Community. Visit now to browse discussions and ask questions about the DataRobot AI Platform, data science, and more.

As you think about what automated machine learning platforms should provide in order for you to take advantage of the latest developments in AI, it’s useful to look at some of the typical steps that data scientists themselves go through when they build machine learning models. 

This article will walk you through examples of everything involved in the modeling process, along with highlights of the automation built into the DataRobot platform that get you up and running with ML. (Note: this article is adapted from a session at DataRobot’s AI Experience Worldwide Conference.)

1. Format data inputs 

Data preprocessing

After assembling a machine learning dataset from various data sources, additional processing is still required before it can be used for model building. DataRobot automates this processing with blueprints that are generated dynamically for each project you create. 

Blueprints in DataRobot

DataRobot can handle all kinds of data types, including text, images, and geospatial data. The blueprints automatically identify and process whatever variable types you've included in your dataset. 

Feature selection

Features are the columns in your dataset that are used to detect patterns related to your target outcome. After you upload a dataset, DataRobot’s automation will create feature lists that help you select the best features for model building. 

Feature Selection
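As a rough illustration of the idea behind automated feature selection (this is not DataRobot's internal method), the sketch below scores each feature against the target and keeps only the most informative ones, using scikit-learn:

```python
# Illustrative sketch of automated feature selection with scikit-learn.
# Each feature is scored against the target; only the top k are kept.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Rank features by a univariate F-score and keep the 10 strongest.
selector = SelectKBest(score_func=f_classif, k=10)
X_reduced = selector.fit_transform(X, y)

best = X.columns[selector.get_support()]
print(list(best))  # the shortlisted feature names
```

A feature list in DataRobot plays a similar role: it narrows the modeling process down to a subset of columns believed to carry signal.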

Feature engineering

In order for models to be successful, they need input features that have potentially useful signals. The process of creating features to improve your models is known as feature engineering. In addition to the feature engineering that’s automated within blueprints, DataRobot also generates additional features for date variables, such as “day of the week,” “day of the month,” etc.
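To make the date-variable example concrete, here is a small pandas sketch (illustrative only; DataRobot generates comparable features automatically) that derives calendar features from a date column:

```python
# Illustrative sketch: deriving calendar features from a date column,
# similar to the "day of the week" / "day of the month" features
# described above.
import pandas as pd

df = pd.DataFrame({"order_date": pd.to_datetime(
    ["2020-09-25", "2020-10-31", "2020-12-24"])})

# Derived features a model can use to pick up weekly or monthly cycles.
df["day_of_week"] = df["order_date"].dt.dayofweek   # Monday = 0
df["day_of_month"] = df["order_date"].dt.day
df["month"] = df["order_date"].dt.month
```

Features like these let a model capture seasonality that a raw timestamp would hide.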

2. Ensure that reliable patterns are found

Data partitioning & model validation

To ensure that models are learning reliable patterns in the data, they need to be built or “trained” on historical examples that are not the same as the examples they are tested or “validated” on. As part of its automated guardrails, DataRobot separates (or “partitions”) the rows in your uploaded dataset to prevent models from simply memorizing the examples they’re trained on. 

Data partitioning

Once you have separate partitions within your dataset for building and validating, you can then measure how well a model is capturing patterns in the data and compare performance across different models. 
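The core idea can be sketched in a few lines of scikit-learn (illustrative only; DataRobot automates an analogous split with additional guardrails): fit on one partition, score on rows the model has never seen.

```python
# Illustrative sketch of train/validation partitioning: the model is
# trained on 80% of the rows and evaluated on the held-out 20%.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"validation accuracy: {model.score(X_valid, y_valid):.3f}")
```

Scoring on the validation partition is what prevents a memorizing model from looking deceptively accurate.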

3. Select and evaluate model options  

Ranked models & evaluation metrics

After you start a DataRobot project in Autopilot mode, the automation runs a data science competition on your dataset to produce a leaderboard of models ranked by your preferred evaluation metric. 

Leaderboard of Metrics in DataRobot

When building models, you don’t usually know ahead of time which machine learning algorithms will work the best, so you want to try out a variety of different approaches and let the best options bubble up to the top. 
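A minimal sketch of that "let the best options bubble up" idea, using scikit-learn rather than DataRobot's Autopilot, is to cross-validate several model families and rank them by a chosen metric:

```python
# Illustrative "leaderboard" sketch: cross-validate several candidate
# model families and rank them by mean AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score every candidate with 5-fold cross-validation, then sort best-first.
leaderboard = sorted(
    ((cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean(), name)
     for name, m in candidates.items()),
    reverse=True)

for auc, name in leaderboard:
    print(f"{name}: {auc:.3f}")
```

DataRobot's Leaderboard does this across many more blueprints, but the ranking principle is the same.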

Model tuning

Every model on the Leaderboard is also “tuned” automatically, ensuring that the best settings are used when searching for patterns between input features and the target. 

Model Tuning in DataRobot
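To show what "tuning" means in practice, here is a hedged sketch of hyperparameter search with scikit-learn's grid search (DataRobot tunes each blueprint automatically; this only illustrates the underlying concept):

```python
# Illustrative sketch of hyperparameter tuning: try a small grid of
# settings and keep whichever scores best under cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3)
grid.fit(X, y)

print(grid.best_params_)  # the best settings found
```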

4. Interpret and understand what was learned

Model visualizations & interpretation tools

DataRobot doesn’t automatically interpret your models for you, but it does provide the interpretation tools and visualizations you’ll need to investigate and understand your modeling results.

ROC Curve in DataRobot

Whether you want to evaluate how accurately a model is capturing patterns related to the target, or you just want to understand what the discovered relationships look like, the interpretation tools provide all you need to gather insights from your data and explain them to your colleagues. 

Prediction Explanations in DataRobot
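As one concrete example of such a tool, the ROC curve shown above can be computed with scikit-learn (an illustrative sketch; DataRobot renders this chart for every classification model):

```python
# Illustrative sketch: computing the data behind a ROC curve.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = model.predict_proba(X_va)[:, 1]

# False-positive vs. true-positive rates across thresholds, plus the AUC.
fpr, tpr, thresholds = roc_curve(y_va, scores)
print(f"AUC: {roc_auc_score(y_va, scores):.3f}")
```

Plotting `tpr` against `fpr` gives the curve; the closer the AUC is to 1.0, the better the model separates the two classes.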

5. Document the process

Compliance documentation & downloadable assets

All of the charts and visualizations in DataRobot can be downloaded, along with the data that was used to create them. 

Export of Charts and Visualizations

With DataRobot’s auto-generated compliance documentation, you can also download a full writeup of everything that went into the model building process. From there you just have to fill in the details specific to you and your company, such as how a model will be used in your business operations and who the various stakeholders are.

Model Compliance Documentation

Next Steps

Even with the level of automation that DataRobot provides, it’s still up to you to use your analytical skills to investigate what a model has learned and interpret the insights that were discovered. 

If you don’t yet have those skills, or you would like to refresh or expand your ML/AI analytical skills, check out the DataRobot University library of educational resources.

Also, if you’re not sure how to identify good problems to solve with AI in the first place, browse some of the common use cases for your industry at DataRobot Pathfinder and see how they might relate to you.

To find the latest discussions, and to ask your own questions, visit the DataRobot Community.

About the author
Linda Haviland

Community Manager