The Value of Blended Models in DataRobot
When working with other people, it’s clear we can accomplish more when working together as a team than separately as individuals. Good managers inspire us to work together. But did you also know this applies when using DataRobot? Blending models together actually provides even better models! In this blog post, I want to illustrate this concept very simply.
Let’s start with the widely used California Housing dataset. The goal here is to predict a house’s value based on features such as its age, total number of rooms, and location. The dataset is available for download from Kaggle; you will need a Kaggle login.
Load this into DataRobot, but don’t hit Start. Instead, let’s set DataRobot to Manual mode for this exercise, then hit Start.
If you follow the DataRobot prompts, you should end up in the Repository; there you’ll see the Decision Tree Regressor model.
Go ahead and select that model and click Run Task to build it; this will be the first ‘member’ of our team. Now let’s add a few more team members. These will all be slightly different versions of the original model, created by modifying the random seed. To do this, go to Evaluate > Advanced Tuning and scroll down to the random_state section.
Type a value of 1, click Update Parameter, and then click Begin Tuning. In the queue, you will see the new model start to build. Do this a few more times; for example, I built six more models using this approach, and varied the random_state from 1 to 7.
Your Leaderboard should start filling up and show multiple Decision Tree Regressor models, one for each random_state value.
Now it’s time to bring the team together. Select all the models (check the box to the left of each model’s name) and then under Menu choose Average Blend.
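To make the idea of an average blend concrete outside the DataRobot UI, here is a minimal Python sketch using only the standard library. It is not DataRobot's implementation: the "models" are hypothetical stand-ins whose predictions are the true value plus independent random noise, and the names rmse and average_blend are invented for this illustration.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def average_blend(prediction_sets):
    """Row-wise mean of the predictions from several models."""
    return [sum(row) / len(row) for row in zip(*prediction_sets)]


# Hypothetical data (assumption): 200 "true" house values, plus seven
# stand-in models that each predict the truth plus independent noise.
rng = random.Random(0)
actual = [rng.uniform(100_000, 500_000) for _ in range(200)]

prediction_sets = []
for seed in range(1, 8):  # seeds 1 to 7, mirroring the random_state values
    noise = random.Random(seed)
    prediction_sets.append([a + noise.gauss(0, 40_000) for a in actual])

individual_rmse = [rmse(actual, preds) for preds in prediction_sets]
blend_rmse = rmse(actual, average_blend(prediction_sets))

print("individual RMSE:", [round(r) for r in individual_rmse])
print("blended RMSE:   ", round(blend_rmse))
```

The RMSE of an averaged prediction can never exceed the mean of the individual RMSEs. How far below it falls depends on how different the models' errors are: the independent noise used here is a best case, and real models trained on the same data will have more correlated errors and a smaller gain.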
You might look at the results and notice that the AVG Blender model is in the middle of the pack. So why is that? Remember that the Validation score is calculated on just one slice of the data, so it can be noisy. To get a more robust score, we should run Cross Validation. So, for the AVG Blender model, click Run (in the Cross Validation column). This will take a little longer, as DataRobot now builds models across all of the cross-validation folds.
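The difference between a single validation score and a cross-validation score can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not DataRobot's partitioning logic: the data is synthetic, five folds is an arbitrary choice, the helper names are hypothetical, and the "model" is deliberately trivial (it always predicts the mean of its training targets).

```python
import math
import random
import statistics


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(targets, folds):
    """Return one RMSE per fold, refitting the trivial model each time."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [t for i, t in enumerate(targets) if i not in held]
        prediction = statistics.fmean(train)  # "fit" on the other folds
        errors = [(targets[i] - prediction) ** 2 for i in held_out]
        scores.append(math.sqrt(statistics.fmean(errors)))
    return scores


# Hypothetical target values (assumption): synthetic house prices.
rng = random.Random(42)
targets = [rng.gauss(250_000, 80_000) for _ in range(1_000)]

folds = k_fold_indices(len(targets), k=5)
fold_scores = cross_validate(targets, folds)

single_fold_score = fold_scores[0]                      # one slice only
cross_validation_score = statistics.fmean(fold_scores)  # mean over all slices

print("per-fold RMSE:", [round(s) for s in fold_scores])
print("single fold:  ", round(single_fold_score))
print("CV mean:      ", round(cross_validation_score))
```

The per-fold scores vary from slice to slice, which is why a ranking based on a single slice can change once the scores are averaged over every fold.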
After some time, you’ll see Cross Validation values.
And, Boom! Guess what comes out on top? Yes, our Average blender is now a bit better. This is a very simple illustration of teamwork through blending in DataRobot. The lesson here is that by blending models, we can improve accuracy. In fact, many algorithms in data science, such as random forest, are themselves built using ensembling techniques. DataRobot makes it very easy to create these blends. A DataRobot Customer-Facing Data Scientist (CFDS) can explain how DataRobot automatically builds these blended models during Autopilot, and present the various strategies you can use to help improve your models by taking this concept into account.
Find out how to create a blended model in DataRobot by visiting the related article on our public documentation portal.