This post was originally part of the DataRobot Community. Visit now to browse discussions and ask questions about DataRobot, AI Platform, data science, and more.
Target leakage, also known as data leakage, is one of the most challenging problems when building machine learning models. It occurs when a model is trained on information that would not be available at the time a prediction is made. Without proper checks and guardrails, you may not realize you have target leakage until you deploy a model and notice that its performance in production is worse than it was during development.
In this session, we cover conceptual definitions of target leakage and the ways it can arise before model building, in particular during the data engineering and project setup phases. Then we demonstrate how DataRobot’s Data Quality Assessment performs Target Leakage Detection to help ensure that projects follow data science best practices and that the resulting models are robust to real-world data. Finally, we provide a handy checklist to help you evaluate your projects for target leakage.
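To make the idea of a leakage check concrete, here is a minimal sketch in plain Python. It is not DataRobot's detection method. It assumes one simple heuristic: a feature that is almost perfectly correlated with the target deserves a closer look. The function names, the 0.9 threshold, and the loan-default data are all hypothetical and chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def flag_possible_leakage(features, target, threshold=0.9):
    """Return the names of features whose absolute correlation
    with the target meets or exceeds the (assumed) threshold."""
    return sorted(
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    )


# Hypothetical loan data: 1 means the loan defaulted.
target = [0, 1, 0, 1, 1, 0, 0, 1]
features = {
    # Known at application time, so legitimate to train on.
    "income_k": [52, 48, 61, 39, 45, 58, 40, 55],
    # Only recorded after a default happens, so it leaks the target.
    "collections_calls": [0, 4, 0, 5, 3, 0, 0, 4],
}

suspects = flag_possible_leakage(features, target)
print(suspects)
```

A flag from a check like this is only a prompt for review. A high correlation does not prove leakage, and a leaky feature can also relate to the target in a nonlinear way that this heuristic would miss, so the decisive question remains whether the value would actually be known at prediction time.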