Building AI with AutoML and Composable ML
As they strive to improve models, data scientists continually try new approaches to refine their predictions. To help data scientists experiment faster, DataRobot has added Composable ML to automated machine learning. This allows data science teams to incorporate any machine learning algorithm or feature engineering method and seamlessly combine them with hundreds of built-in methods. After adding the preferred code, teams can take advantage of the existing DataRobot capabilities, such as metrics, explainability, visualizations, deployment, monitoring, collaboration, and governance.
Composable ML is currently available through a private beta program, and I want to share what we see successful users doing. A common pattern is that they adopt a three-step model development process:
- Prepare the initial dataset and leverage automation, including automated feature engineering, to get an accurate baseline model in minutes.
- Iterate based on results from step one to further improve results. Iteration is often based on specific domain knowledge or the desire to try a new cutting-edge approach. Composable ML then lets you add new types of feature engineering or build entirely new models.
- Get buy-in from stakeholders and deploy the best model into production. The new, customized model can take advantage of the existing DataRobot code base for explainability and deployment.
So let’s dig in!
Step 1: Use Automation to Perform the Initial Feature Engineering and Modeling
In data science, the best results come through experimentation. To help data scientists experiment effectively and quickly, DataRobot offers a suite of built-in automation capabilities that try out various machine learning algorithms and feature engineering methods, perform feature selection, and quickly surface what works best.
Run Automated Feature Discovery
Automated Feature Discovery helps to build and discover important features in complex schemas. Register your datasets in AI Catalog, specify the relationships between those datasets, and Automated Feature Discovery will automatically generate features based on one-to-many relationships, perform feature selection, and make the engineered set of features available for modeling and deployment. It can even search for interactions and aggregate transaction data without introducing target leakage.
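To make the idea of leakage-safe aggregation concrete, here is a minimal sketch in plain Python. It is not DataRobot's implementation or API: the table layout, the column names (`customer_id`, `prediction_date`, `amount`), the sample rows, and the function `aggregate_before_cutoff` are all assumptions made up for illustration. The sketch aggregates a one-to-many transactions table for each primary row, using only transactions dated strictly before that row's prediction date.

```python
from datetime import date

# Hypothetical primary table: one row per customer, with the date at which
# a prediction is made.
primary = [
    {"customer_id": 1, "prediction_date": date(2021, 3, 1)},
    {"customer_id": 2, "prediction_date": date(2021, 3, 1)},
]

# Hypothetical secondary table: many transactions per customer.
transactions = [
    {"customer_id": 1, "date": date(2021, 1, 15), "amount": 20.0},
    {"customer_id": 1, "date": date(2021, 2, 10), "amount": 30.0},
    {"customer_id": 1, "date": date(2021, 3, 5), "amount": 500.0},  # after cutoff
    {"customer_id": 2, "date": date(2021, 2, 20), "amount": 10.0},
]


def aggregate_before_cutoff(primary_rows, transaction_rows):
    """Add count/sum/mean features built only from pre-cutoff transactions."""
    features = []
    for row in primary_rows:
        amounts = [
            t["amount"]
            for t in transaction_rows
            if t["customer_id"] == row["customer_id"]
            and t["date"] < row["prediction_date"]  # drop anything from the future
        ]
        total = sum(amounts)
        features.append(
            {
                **row,
                "txn_count": len(amounts),
                "txn_sum": total,
                "txn_mean": total / len(amounts) if amounts else 0.0,
            }
        )
    return features


engineered = aggregate_before_cutoff(primary, transactions)
for row in engineered:
    print(row)
```

The date filter is what matters here: the 500.0 transaction happens after the prediction date for customer 1, so it is left out of that customer's features. Including it would let information from the future leak into training.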
Autopilot runs on top of features engineered by Automated Feature Discovery or on raw data, and quickly tries out various machine learning algorithms (“blueprints”) to see what works best. Blueprints generated by Autopilot include not only modeling but also preprocessing steps to automatically incorporate signals from numeric, categorical, text, time, image, and geospatial features. As has been demonstrated repeatedly with our customers, the accuracy of the top models generated by Autopilot tends to be on par with best-in-class custom models.
Step 2: Iterate to Get Better Results
When Automated Feature Discovery and Autopilot finish, you will have a list of features and models they generated. Data scientists can now either select one of the models to deploy or try to get better results by tweaking the feature list, data, or modeling algorithms.
Leverage Built-in Insights to Debug Models
DataRobot offers over 30 built-in insights that can help you understand and debug models. These tools help to detect partial target leakage, identify missing features, or confirm that a model follows business rules. You can then use what you learn to manually add or remove columns that go into modeling. DataRobot insights include a suite of model-agnostic visualizations, so you can get insights for any modeling algorithm.
New: Train a Custom Modeling Algorithm and Compare Results
Advanced data scientists can also leverage Composable ML to build and train custom blueprints. Perhaps you want to try out alternative preprocessing steps or different modeling steps? With Composable ML, you can either customize blueprints generated by Autopilot or build your own blueprints from scratch using built-in and custom steps. You can train the blueprints on top of features generated by Automated Feature Discovery, thus saving time spent on feature engineering.
In addition to using built-in tasks, you can use R or Python to define custom tasks, and you can even combine Python and R in the same blueprint. You can install any required dependency and define your own Docker container if required.
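As a rough illustration of what a custom preprocessing step might look like, here is a minimal Python sketch. It assumes a generic fit/transform interface and is not the actual DataRobot custom task contract; the class name `MedianImputer`, its methods, and the sample data are hypothetical. The step learns per-column medians from training data and then uses them to fill missing values in new data.

```python
from statistics import median


class MedianImputer:
    """Hypothetical custom preprocessing step with a fit/transform interface."""

    def fit(self, rows):
        # Learn one median per column, ignoring missing values (None).
        n_cols = len(rows[0])
        self.medians_ = []
        for j in range(n_cols):
            observed = [r[j] for r in rows if r[j] is not None]
            self.medians_.append(median(observed) if observed else 0.0)
        return self

    def transform(self, rows):
        # Replace missing values with the medians learned during fit.
        return [
            [self.medians_[j] if v is None else v for j, v in enumerate(r)]
            for r in rows
        ]


# Made-up training and scoring data with missing values.
train = [[1.0, 10.0], [3.0, None], [None, 30.0], [5.0, 20.0]]
new_data = [[None, None], [2.0, 15.0]]

imputer = MedianImputer().fit(train)
print(imputer.transform(new_data))
```

Keeping what is learned (fit) separate from how it is applied (transform) is the property that lets a step like this be reused at prediction time with exactly the parameters it learned during training.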
Select the Best Model
With DataRobot, you will end up with multiple different models (built by Autopilot or using your own algorithms), and you’ll need to select the best one based on accuracy, speed, or explainability requirements. DataRobot offers a few capabilities that help to make the decision:
- Leaderboard helps to track modeling iterations and compare accuracy. It ensures that the models are compared on exactly the same validation or holdout data.
- Model comparison provides a visual way to compare models, using Profit Curve, ROC, and Lift Charts.
- Speed vs. Accuracy helps to identify the most accurate model that is fast enough for your use case.
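The selection logic behind a speed-versus-accuracy trade-off can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how DataRobot computes its Leaderboard or charts. The `Candidate` class, the `pick_best` function, the model names, and all of the numbers are made up, and the sketch assumes every score was measured on the same holdout data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    holdout_auc: float  # higher is better; assumed scored on the same holdout set
    latency_ms: float   # hypothetical time to score a fixed batch of rows


# Made-up candidates standing in for models on a leaderboard.
candidates = [
    Candidate("model_a_boosted_trees", holdout_auc=0.91, latency_ms=40.0),
    Candidate("model_b_ensemble", holdout_auc=0.93, latency_ms=220.0),
    Candidate("model_c_linear", holdout_auc=0.86, latency_ms=5.0),
]


def pick_best(models: List[Candidate], max_latency_ms: float) -> Optional[Candidate]:
    """Return the most accurate model that meets the latency budget, if any."""
    eligible = [m for m in models if m.latency_ms <= max_latency_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda m: m.holdout_auc)


for budget in (10.0, 50.0, 500.0):
    best = pick_best(candidates, budget)
    print(budget, best.name if best else "no model is fast enough")
```

With these made-up numbers, a tight budget selects the linear model, a moderate budget selects the boosted trees, and only a generous budget makes the slower ensemble worth choosing.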
Step 3: Get the Best Models into Production
Once the best model has been identified, whether it was generated using automated machine learning or Composable ML, DataRobot makes it easier to get buy-in from stakeholders and bring the model into production.
Use Insights and Compliance Documentation to Get Buy-in from Stakeholders
DataRobot Insights make it easy to explain to business stakeholders how models work. A suite of model-agnostic techniques allows the interpretation of any model, even when you use sophisticated machine learning techniques or ensembles.
Automated Compliance Documentation streamlines the creation of model documentation for model validators and regulators. In regulated industries such as banking and insurance, models commonly must be documented in detail before they can be approved and used. With Automated Compliance Documentation, that documentation can be generated in just a few clicks for any model. You can even use your own templates if required.
Deploy, Monitor, and Manage Models with MLOps
DataRobot MLOps offers state-of-the-art capabilities for model deployment and monitoring. You can deploy your new model to your MLOps production environment of choice in just a couple of clicks. You get highly specialized, centralized model monitoring, lifecycle management, and retraining, all on our world-class, enterprise-grade MLOps platform.
All right, that’s it. That’s what our most successful users are doing. I hope it resonates and is insightful. If you’re already a DataRobot user, do not hesitate to reach out and sign up for the private beta program for Composable ML! If you are new to DataRobot, check out the AI Experience session recording or contact us to see DataRobot in action.