Best Practices for Imbalanced Data and Partitioning

April 20, 2020

This post was originally part of the DataRobot Community.

In this two-part learning session we discuss best practices around data partitioning and working with imbalanced datasets.

Five-fold cross-validation is often treated as a silver bullet for partitioning your validation data, but it comes with caveats you need to understand to build robust models. In part 1 of this learning session, we walk through those pitfalls and outline strategies for handling them.
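One pitfall worth seeing concretely: with a rare target class, naive k-fold splitting can leave some folds with few or no positive examples, so the validation scores for those folds are meaningless. A stratified split avoids this by preserving the class ratio in every fold. The sketch below is a minimal, hand-rolled illustration of the idea (not DataRobot's implementation); production code would typically use a library routine such as scikit-learn's `StratifiedKFold`.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Assign each row index to one of k folds while preserving
    the class ratio of `labels` in every fold (a minimal sketch)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)            # shuffle within each class
        for j, i in enumerate(idxs):
            folds[j % k].append(i)   # deal indices round-robin
    return folds

# A 90/10 imbalanced binary target.
labels = [0] * 90 + [1] * 10
folds = stratified_kfold(labels, k=5)
for fold in folds:
    positives = sum(labels[i] for i in fold)
    print(len(fold), positives)  # every fold: 20 rows, 2 positives
```

Each of the five folds keeps the 90/10 ratio, so every validation score is computed against a representative slice of the data.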

Binary target variables are very common in data science use cases, and many of them are severely imbalanced. When you're building models for infrequent events, such as predicting fraud or identifying product failures, it's important to account for that imbalance in your data. In part 2 of this learning session, we discuss strategies for working with imbalanced datasets and share some rules of thumb for these types of use cases.
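One common strategy is to reweight the loss so that errors on the rare class count for more. A widely used heuristic (the same one behind scikit-learn's `class_weight="balanced"` option) sets each class weight to `n_samples / (n_classes * n_class_samples)`. The snippet below is a hedged, stdlib-only sketch of that calculation, not a specific DataRobot recommendation:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Compute per-class weights w_c = n / (k * n_c), where n is the
    total sample count, k the number of classes, and n_c the count of
    class c. Rare classes receive proportionally larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# A fraud-style target: 5% positive rate.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)  # minority class 1 gets weight 10.0, majority ~0.53
```

Passing weights like these into a model's loss function makes a misclassified minority example cost as much, in aggregate, as the majority class, which is often a better starting point than naive downsampling.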

Hosts

  • Matt Marzillo (DataRobot, Customer Facing Data Scientist)
  • Mitch Carmen (DataRobot, Customer Facing Data Scientist)
  • Jack Jablonski (DataRobot, AI Success Manager)

Now what?

After watching this two-part learning session, you should check out these resources for more information.

  • DataRobot Platform Documentation
  • Pathfinder: explore our marketplace of AI use cases
About the author
Linda Haviland

Community Manager
