Data Cheats: How Target Leakage Affects Models

September 24, 2020
by Linda Haviland

This post was originally part of the DataRobot Community, where you can browse discussions and ask questions about DataRobot, AI Cloud, data science, and more.

Target leakage, also known as data leakage, is one of the most challenging problems when building machine learning models. It occurs when the training data contains information about the target that will not be available at prediction time. Without proper checks and guardrails, you may not realize you have target leakage until you deploy a model and notice that it performs worse in production than it did during development.

In this session, we cover conceptual definitions of target leakage and the ways it can arise before model building, in particular during the data engineering and project setup phases. Then we demonstrate how DataRobot’s Data Quality Assessment performs Target Leakage Detection to help ensure that projects follow data science best practices and that the resulting models are robust to real-world data. Finally, we provide a handy checklist to help you evaluate your projects for target leakage.
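To make the idea concrete, here is a minimal sketch of one simple pre-modeling check: flag any feature that, on its own, tracks the target almost perfectly. This is only an illustrative heuristic and is not a description of how DataRobot’s Target Leakage Detection works. The loan data, the column names (`income`, `days_past_due`), the helper functions, and the 0.9 threshold are all hypothetical choices made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def flag_suspect_features(columns, target, threshold=0.9):
    """Return the names of features whose |correlation| with the target
    meets or exceeds the threshold (a hypothetical cutoff, not a standard)."""
    return sorted(
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    )


# Synthetic loan data (hypothetical): 1 = defaulted, 0 = repaid.
rng = random.Random(0)
defaulted = [rng.randint(0, 1) for _ in range(200)]
columns = {
    # Known at application time and only weakly related to the outcome.
    "income": [rng.gauss(50 - 5 * d, 10) for d in defaulted],
    # Only recorded after a default has happened, so it leaks the target.
    "days_past_due": [d * rng.randint(30, 60) for d in defaulted],
}

suspects = flag_suspect_features(columns, defaulted)
print(suspects)
```

A flagged feature is a prompt for a closer look, not proof of leakage: the real question is whether its value would actually be known at the moment the model has to make a prediction.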

Hosts

  • Yuriy Guts (DataRobot, Engineer)
  • Alex Shoop (DataRobot, Engineer)
  • Rajiv Shah (DataRobot, Data Scientist)
  • Jack Jablonski (DataRobot, AI Success Manager)

About the author
Linda Haviland

Community Manager
