Data Cheats: How Target Leakage Affects Models
Target leakage, a form of data leakage, is one of the most challenging problems in building machine learning models. It occurs when information that would not be available at prediction time makes its way into the training data. Without proper checks and guardrails, you may not realize you have target leakage until you deploy a model and notice that its performance in production is worse than it was during development.
During this session, we cover conceptual definitions of target leakage and the ways it can arise before model building, particularly during the data engineering and project setup phases. Then we demonstrate how DataRobot’s Data Quality Assessment performs Target Leakage Detection to ensure that projects follow data science best practices and that the resulting models are robust to real-world data. Finally, we provide a handy checklist to help you evaluate your own projects for target leakage.
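To make the concept concrete, here is a minimal, self-contained sketch (not from the session materials; the feature names and probabilities are invented for illustration). A synthetic feature, `antibiotics`, is recorded only after the target event (`infection`) is known, so it leaks the answer: a model using it looks nearly perfect in development, but that feature would not exist at prediction time.

```python
import random

random.seed(0)

# Synthetic records: the target is whether a patient has an infection.
# "antibiotics" is recorded AFTER diagnosis, so it is derived from the
# target and leaks it into the training data.
n = 1000
data = []
for _ in range(n):
    infection = random.random() < 0.3                            # target
    fever = random.random() < (0.8 if infection else 0.1)        # legitimate feature
    antibiotics = infection and random.random() < 0.95           # leaky feature
    data.append((fever, antibiotics, infection))

def accuracy(predict):
    """Fraction of records where the rule matches the target."""
    return sum(predict(fever, antibiotics) == infection
               for fever, antibiotics, infection in data) / n

# A "model" that uses the leaky feature looks nearly perfect in backtests...
leaky_acc = accuracy(lambda fever, antibiotics: antibiotics)

# ...but at prediction time the prescription hasn't happened yet, so only
# legitimate features like fever are actually available.
honest_acc = accuracy(lambda fever, antibiotics: fever)

print(f"with leaky feature:    {leaky_acc:.2f}")
print(f"without leaky feature: {honest_acc:.2f}")
```

The gap between the two scores is the hallmark of leakage: development metrics that are too good to be true, followed by a drop once the model only sees data available at prediction time.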
- Yuriy Guts (DataRobot, Engineer)
- Alex Shoop (DataRobot, Engineer)
- Rajiv Shah (DataRobot, Data Scientist)
- Jack Jablonski (DataRobot, AI Success Manager)
- Target Leakage (free course)