Putting “Automation” in Automated Machine Learning

September 25, 2020
· 4 min read

This post was originally part of the DataRobot Community. Visit now to browse discussions and ask questions about the DataRobot AI Platform, data science, and more.

As you think about what automated machine learning platforms should provide in order for you to take advantage of the latest developments in AI, it’s useful to look at some of the typical steps that data scientists themselves go through when they build machine learning models. 

This article will walk you through examples of everything involved in the modeling process, along with highlights of the automation built into the DataRobot platform that get you up and running with ML. (Note: this article is adapted from a session at DataRobot’s AI Experience Worldwide Conference.)

1. Format data inputs 

Data preprocessing

After assembling a machine learning dataset from various data sources, additional processing is still required before it can be used for model building. DataRobot automates this processing with blueprints that are generated dynamically for each project you create. 

Blueprints in DataRobot

DataRobot can handle all kinds of data types, including text, images, and geospatial data. The blueprints automatically identify and process whatever variable types you've included in your dataset. 

Feature selection

Features are the columns in your dataset that are used to detect patterns related to your target outcome. After you upload a dataset, DataRobot’s automation will create feature lists that help you select the best features for model building. 

Feature Selection
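As a rough illustration of the idea behind automated feature selection (this is not DataRobot's internal method), the sketch below scores each feature against the target and keeps only the most informative ones, using scikit-learn:

```python
# Illustrative sketch of automated feature selection with scikit-learn.
# Each feature is scored against the target; only the top k are kept.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Rank features by a univariate F-score and keep the 10 strongest.
selector = SelectKBest(score_func=f_classif, k=10)
X_reduced = selector.fit_transform(X, y)

best = X.columns[selector.get_support()]
print(list(best))  # the shortlisted feature names
```

A feature list in DataRobot plays a similar role: it narrows the modeling process down to a subset of columns believed to carry signal.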

Feature engineering

In order for models to be successful, they need input features that have potentially useful signals. The process of creating features to improve your models is known as feature engineering. In addition to the feature engineering that’s automated within blueprints, DataRobot also generates additional features for date variables, such as “day of the week,” “day of the month,” etc.
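To make the date-variable example concrete, here is a small pandas sketch (illustrative only; DataRobot generates comparable features automatically) that derives calendar features from a date column:

```python
# Illustrative sketch: deriving calendar features from a date column,
# similar to the "day of the week" / "day of the month" features
# described above.
import pandas as pd

df = pd.DataFrame({"order_date": pd.to_datetime(
    ["2020-09-25", "2020-10-31", "2020-12-24"])})

# Derived features a model can use to pick up weekly or monthly cycles.
df["day_of_week"] = df["order_date"].dt.dayofweek   # Monday = 0
df["day_of_month"] = df["order_date"].dt.day
df["month"] = df["order_date"].dt.month
```

Features like these let a model capture seasonality that a raw timestamp would hide.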

2. Ensure that reliable patterns are found

Data partitioning & model validation

To ensure that models are learning reliable patterns in the data, they need to be built or “trained” on historical examples that are not the same as the examples they are tested or “validated” on. As part of its automated guardrails, DataRobot separates (or “partitions”) the rows in your uploaded dataset to prevent models from simply memorizing the examples they’re trained on. 

Data partitioning

Once you have separate partitions within your dataset for building and validating, you can then measure how well a model is capturing patterns in the data and compare performance across different models. 
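The core idea can be sketched in a few lines of scikit-learn (illustrative only; DataRobot automates an analogous split with additional guardrails): fit on one partition, score on rows the model has never seen.

```python
# Illustrative sketch of train/validation partitioning: the model is
# trained on 80% of the rows and evaluated on the held-out 20%.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"validation accuracy: {model.score(X_valid, y_valid):.3f}")
```

Scoring on the validation partition is what prevents a memorizing model from looking deceptively accurate.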

3. Select and evaluate model options  

Ranked models & evaluation metrics

After you start a DataRobot project in Autopilot mode, the automation runs a data science competition on your dataset to produce a leaderboard of models ranked by your preferred evaluation metric. 

Leaderboard of Metrics in DataRobot

When building models, you don’t usually know ahead of time which machine learning algorithms will work the best, so you want to try out a variety of different approaches and let the best options bubble up to the top. 
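A minimal sketch of that "let the best options bubble up" idea, using scikit-learn rather than DataRobot's Autopilot, is to cross-validate several model families and rank them by a chosen metric:

```python
# Illustrative "leaderboard" sketch: cross-validate several candidate
# model families and rank them by mean AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score every candidate with 5-fold cross-validation, then sort best-first.
leaderboard = sorted(
    ((cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean(), name)
     for name, m in candidates.items()),
    reverse=True)

for auc, name in leaderboard:
    print(f"{name}: {auc:.3f}")
```

DataRobot's Leaderboard does this across many more blueprints, but the ranking principle is the same.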

Model tuning

Every model on the Leaderboard is also “tuned” automatically, ensuring that the best settings are used when searching for patterns between input features and the target. 

Model Tuning in DataRobot
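To show what "tuning" means in practice, here is a hedged sketch of hyperparameter search with scikit-learn's grid search (DataRobot tunes each blueprint automatically; this only illustrates the underlying concept):

```python
# Illustrative sketch of hyperparameter tuning: try a small grid of
# settings and keep whichever scores best under cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3)
grid.fit(X, y)

print(grid.best_params_)  # the best settings found
```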

4. Interpret and understand what was learned

Model visualizations & interpretation tools

DataRobot doesn’t automatically interpret your models for you, but it does provide the interpretation tools and visualizations you’ll need to investigate and understand your modeling results.

ROC Curve in DataRobot

Whether you want to evaluate how accurately a model is capturing patterns related to the target, or you just want to understand what the discovered relationships look like, the interpretation tools provide all you need to gather insights from your data and explain them to your colleagues. 

Prediction Explanations in DataRobot
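As one concrete example of such a tool, the ROC curve shown above can be computed with scikit-learn (an illustrative sketch; DataRobot renders this chart for every classification model):

```python
# Illustrative sketch: computing the data behind a ROC curve.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = model.predict_proba(X_va)[:, 1]

# False-positive vs. true-positive rates across thresholds, plus the AUC.
fpr, tpr, thresholds = roc_curve(y_va, scores)
print(f"AUC: {roc_auc_score(y_va, scores):.3f}")
```

Plotting `tpr` against `fpr` gives the curve; the closer the AUC is to 1.0, the better the model separates the two classes.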

5. Document the process

Compliance documentation & downloadable assets

All of the charts and visualizations in DataRobot can be downloaded, along with the data that was used to create them. 

Export of Charts and Visualizations

With DataRobot’s auto-generated compliance documentation, you can also download a full writeup of everything that went into the model building process. From there you just have to fill in the details specific to you and your company, such as how a model will be used in your business operations and who the various stakeholders are.

Model Compliance Documentation

Next Steps

Even with the level of automation that DataRobot provides, it’s still up to you to use your analytical skills to investigate what a model has learned and interpret the insights that were discovered. 

If you don’t yet have those skills, or you would like to refresh or expand your ML/AI analytical skills, check out the DataRobot University library of educational resources.

Also, if you’re not sure how to identify good problems to solve with AI in the first place, browse some of the common use cases for your industry at DataRobot Pathfinder and see how they might relate to you.

To find the latest discussions, and to ask your own questions, visit the DataRobot Community.

About the author
Linda Haviland

Community Manager