The Value of Blended Models in DataRobot

February 18, 2020
by Rajiv Shah

This post was originally part of the DataRobot Community.

When working with other people, it’s clear that we can accomplish more as a team than as separate individuals. Good managers inspire us to work together. But did you know the same idea applies when using DataRobot? Blending models together can produce a model that is better than any of its members. In this blog post, I want to illustrate this concept very simply.

Let’s start with the widely used California Housing dataset. The goal is to predict a house’s value from features such as its age, total rooms, and location. The dataset is available on Kaggle (a Kaggle login is required to download it).

Load this into DataRobot, but don’t hit Start yet. First set DataRobot to Manual mode for this exercise, then hit Start.


If you follow the DataRobot prompts, you should end up in the Repository; there you’ll see the Decision Tree Regressor model.


Go ahead and select that model and click Run Task to build it; this will be the first ‘member’ of our team. Now let’s add a few more team members. These will all be slightly different versions of the original model, created by modifying the random seed. To do this, go to Evaluate > Advanced Tuning and scroll down to the random_state section.


Type a value of 1, click Update Parameter, and then click Begin Tuning. In the queue, you will see the new model start to build. Repeat this a few more times with a different value each time; for example, I built six more models using this approach, and varied the random_state from 1 to 7.
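To see why changing only the seed yields slightly different models, here is a minimal sketch in plain Python. It is not DataRobot's Decision Tree Regressor: `fit_seeded_stump` is a hypothetical toy one-split model in which the seed decides which split points get tried, and the training data is made up for illustration.

```python
import random

def fit_seeded_stump(xs, ys, seed):
    """Fit a toy one-split regressor; the seed decides which split points are tried."""
    rng = random.Random(seed)
    # Skip the smallest x so both sides of every split are non-empty.
    split_points = sorted(set(xs))[1:]
    candidates = rng.sample(split_points, k=3)
    best = None
    for t in candidates:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)

def predict(stump, x):
    """Predict with a fitted stump: left mean below the threshold, right mean otherwise."""
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean

# Made-up training data, for illustration only.
xs = list(range(20))
ys = [2.0 * x for x in xs]

# One model per seed, mirroring random_state values 1 through 7.
team = {seed: fit_seeded_stump(xs, ys, seed) for seed in range(1, 8)}
for seed, stump in team.items():
    print(seed, stump)
```

Refitting with the same seed reproduces the same model, while a different seed may pick a different split. That variation between otherwise identical models is what the blend later averages over.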

Your Leaderboard should start filling up with multiple models.

Now it’s time to bring the team together. Select all the models (check the box to the left of each model’s name) and then under Menu choose Average Blend.
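The arithmetic behind an averaging blend is simple: for each row, take the mean of the members' predictions. The sketch below illustrates that idea only; it is not DataRobot's implementation, `average_blend` is a hypothetical helper, and the prediction values are made up.

```python
from statistics import fmean

def average_blend(member_predictions):
    """Row-wise mean across models; input is one list of predictions per model."""
    return [fmean(row) for row in zip(*member_predictions)]

# Made-up predictions from three models for the same three houses.
preds = [
    [100.0, 200.0, 300.0],  # model A
    [110.0, 190.0, 330.0],  # model B
    [90.0, 210.0, 300.0],   # model C
]

blend = average_blend(preds)
print(blend)  # [100.0, 200.0, 310.0]
```

Where the members disagree, the blend lands between them, so individual errors that point in opposite directions partly cancel.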


You might look at the results and notice that the AVG Blender model is in the middle of the pack. So why is that? Well, remember that the Validation score is computed on just one partition of the data. To get a more robust score, we should run Cross Validation. So, for the AVG Blender model, click Run (in the Cross Validation column). This will take a little longer, because models now have to be built and scored on every cross-validation partition rather than just one.
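The difference between a single validation score and a cross-validation score can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not DataRobot's partitioning logic: the fold count of 5, the helper names, the toy "model" (which predicts the training mean), and the data are all made up for the example.

```python
import math

def k_fold_indices(n_rows, k):
    """Yield (train_idx, valid_idx) pairs; every row is validated exactly once."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, valid in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, valid

def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def cross_val_rmse(ys, k=5):
    """Score a toy model (predict the training mean) on each of k folds."""
    scores = []
    for train, valid in k_fold_indices(len(ys), k):
        mean_pred = sum(ys[j] for j in train) / len(train)
        scores.append(rmse([ys[j] for j in valid], [mean_pred] * len(valid)))
    return scores

# Made-up target values, for illustration only.
ys = [12.0, 15.0, 11.0, 30.0, 14.0, 13.0, 16.0, 12.0, 29.0, 15.0]

fold_scores = cross_val_rmse(ys, k=5)
cv_score = sum(fold_scores) / len(fold_scores)
print("per-fold RMSE:", [round(s, 2) for s in fold_scores])
print("cross-validation RMSE:", round(cv_score, 2))
```

Each per-fold score plays the role of a single validation score and can swing depending on which rows happen to land in that fold. Averaging over all folds smooths this out, which is why model rankings can change once cross-validation is run.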


After some time, you’ll see Cross Validation values.

And, Boom! Guess what comes out on top? Yes, on the Cross Validation score our AVG Blender is now a bit better than the individual models. This is a very simple illustration of teamwork through blending in DataRobot. The lesson here is that by blending models, we can improve accuracy. In fact, many widely used algorithms in data science, such as random forest, are themselves built using ensembling techniques. DataRobot makes it very easy to create these blends. A DataRobot Customer-Facing Data Scientist (CFDS) can explain how DataRobot automatically builds these blended models during Autopilot, and present the various strategies you can use to help improve your models by taking this concept into account.
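One way to see why averaging tends to help, assuming the error metric is a squared error (the metric is an assumption here, and the notation is introduced only for this sketch): let $\hat{y}_{m,i}$ be model $m$'s prediction for row $i$, $y_i$ the actual value, and $M$ the number of blended models. Because squaring is convex, the squared error of the averaged prediction is at most the average of the members' squared errors on every row, and therefore also on average over all rows:

```latex
\bar{y}_i = \frac{1}{M}\sum_{m=1}^{M}\hat{y}_{m,i},
\qquad
\left(\bar{y}_i - y_i\right)^2 \;\le\; \frac{1}{M}\sum_{m=1}^{M}\left(\hat{y}_{m,i} - y_i\right)^2
\quad\Longrightarrow\quad
\mathrm{MSE}_{\text{blend}} \;\le\; \frac{1}{M}\sum_{m=1}^{M}\mathrm{MSE}_m
```

This only guarantees that the blend is no worse than the average member. Beating every individual member, as in the walkthrough, is common when the members' errors differ, but it is not guaranteed.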

Related

Find out how to create a blended model in DataRobot by visiting the related article on our public documentation portal.

About the author
Rajiv Shah

Data Scientist, DataRobot

Rajiv Shah is a data scientist at DataRobot, where he works with customers to make and implement predictions. Previously, Rajiv was part of data science teams at Caterpillar and State Farm. He enjoys data science and spends time mentoring data scientists, speaking at events, and having fun with blog posts. He has a PhD from the University of Illinois at Urbana-Champaign.
