Target-Based Insights in DataRobot
This post was originally part of the DataRobot Community.
This article explains feature importance, target leakage detection, and the Feature Association Matrix. These are all calculated after the target has been selected and the Start button is pressed, as shown in this Target-Based Insights video.
Feature Importance is a column highlighted on the Data page that shows the relationship between each feature and the target, as shown in Figure 1. Feature importance is analogous to a correlation and is calculated using an algorithm called Alternating Conditional Expectations (ACE).
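To give a feel for why ACE is described as "analogous to a correlation," here is a rough sketch of the alternating idea for one feature and one target. It is not DataRobot's implementation: the function names, the quantile-bin smoother, the bin count, and the iteration count are all assumptions chosen for illustration, and the data is synthetic.

```python
import numpy as np

def binned_conditional_mean(values, given, n_bins=10):
    """Crude estimate of E[values | given]: average `values` inside quantile bins of `given`."""
    interior = np.quantile(given, np.linspace(0, 1, n_bins + 1))[1:-1]
    edges = np.unique(interior)  # drop duplicate edges (e.g. for a binary `given`)
    idx = np.searchsorted(edges, given, side="right")
    n_groups = len(edges) + 1
    sums = np.bincount(idx, weights=values, minlength=n_groups)
    counts = np.bincount(idx, minlength=n_groups)
    means = sums / np.maximum(counts, 1)
    return means[idx]

def ace_correlation(x, y, n_bins=10, n_iter=20):
    """Alternate between transforming x and y, then correlate the two transforms."""
    theta = (y - y.mean()) / y.std()                   # start from the standardized target
    for _ in range(n_iter):
        phi = binned_conditional_mean(theta, x, n_bins)    # phi(x)   = E[theta(y) | x]
        theta = binned_conditional_mean(phi, y, n_bins)    # theta(y) = E[phi(x) | y]
        theta = (theta - theta.mean()) / theta.std()       # renormalize theta
    phi = binned_conditional_mean(theta, x, n_bins)
    return np.corrcoef(phi, theta)[0, 1]

# Synthetic example: a U-shaped relationship that a plain linear correlation misses.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
y = x ** 2 + rng.normal(0, 0.05, 2000)
pearson = np.corrcoef(x, y)[0, 1]
ace = ace_correlation(x, y)
print(f"Pearson: {pearson:.2f}, ACE-style score: {ace:.2f}")
```

The point of the example is that a score built from transformed variables can pick up non-linear relationships, which is one reason such a measure is useful for ranking features against a target.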
DataRobot shows the relationship between the target and a feature using an orange line, as shown in Figure 2. For this numerical feature, DataRobot shows that when the number of inpatients is between 4 and 6, there is about an 80% likelihood of readmission.
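A line like this can be thought of as the average target value within each range of the feature. The sketch below illustrates that idea on made-up numbers; the function name, the bin edges, and the data are hypothetical and are not taken from the readmissions dataset or from DataRobot.

```python
def target_rate_by_bin(values, targets, edges):
    """Average target per bin, where bin i covers [edges[i], edges[i + 1])."""
    n_bins = len(edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for value, target in zip(values, targets):
        for i in range(n_bins):
            if edges[i] <= value < edges[i + 1]:
                totals[i] += 1
                hits[i] += target
                break
    # Empty bins have no rate to report.
    return [hits[i] / totals[i] if totals[i] else None for i in range(n_bins)]

# Toy data: a numeric feature and a 0/1 target (1 = readmitted).
values = [0, 1, 1, 4, 5, 5, 6, 6]
targets = [0, 0, 1, 1, 1, 1, 0, 1]
rates = target_rate_by_bin(values, targets, edges=[0, 2, 4, 7])
print(rates)
```

Plotting these per-bin rates against the bin ranges gives the kind of curve the chart draws over the feature's histogram.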
Feature importance is also available for text features, as shown in Figure 3. The size of each word reflects how frequently it appears, while the color reflects the strength of its relationship to the target. In this example, words in red are associated with a much higher likelihood of readmittance than words in blue.
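The two quantities behind a word cloud like this, how often a word appears and how it relates to the target, can be approximated very simply. The article does not say how DataRobot scores each word, so the sketch below uses a stand-in: the average target among rows that contain the word. The function name and the sample texts are hypothetical.

```python
from collections import defaultdict

def word_target_stats(texts, targets):
    """Map each word to (rows containing it, average target among those rows)."""
    counts = defaultdict(int)
    target_sums = defaultdict(float)
    for text, target in zip(texts, targets):
        # Use a set so a word is counted once per row, however often it repeats.
        for word in set(text.lower().split()):
            counts[word] += 1
            target_sums[word] += target
    return {word: (counts[word], target_sums[word] / counts[word]) for word in counts}

# Toy data: free-text notes and a 0/1 target (1 = readmitted).
texts = ["chest pain and fever", "fever", "routine checkup", "chest pain"]
targets = [1, 1, 0, 1]
stats = word_target_stats(texts, targets)
print(stats["fever"], stats["routine"])
```

In a word cloud, the first number would drive the size of the word and the second would drive its color.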
If you see a red or yellow indicator next to a feature, DataRobot has flagged that feature as potential target leakage, as shown in Figure 4. DataRobot may then remove the feature from the feature list, so that you see an informative features list with the target leakage removed. This automatic removal is one of DataRobot's guardrails: it not only identifies target leakage but also acts on it.
Feature Association Matrix
The Feature Association Matrix shows the relationships among numerical and categorical features, as shown in Figure 5. Color indicates the strength of each association, and the different colors represent clusters, or groups, of features that DataRobot has detected as being somewhat associated with each other. You can sort the matrix and run the analysis on different feature lists.
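As a rough illustration of what a pairwise association matrix contains, the sketch below computes Cramér's V for every pair of categorical features. The choice of Cramér's V, the function names, and the toy columns are assumptions made for this example; the article does not state which metric the matrix uses, and numerical features would need to be binned before a measure like this could be applied to them.

```python
import numpy as np

def cramers_v(a, b):
    """Cramér's V between two categorical columns: 0 = no association, 1 = perfect."""
    a_codes = np.unique(np.asarray(a), return_inverse=True)[1]
    b_codes = np.unique(np.asarray(b), return_inverse=True)[1]
    n_rows, n_cols = a_codes.max() + 1, b_codes.max() + 1
    table = np.zeros((n_rows, n_cols))
    np.add.at(table, (a_codes, b_codes), 1)            # contingency table of counts
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(n_rows, n_cols) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0

def association_matrix(features):
    """Symmetric matrix of pairwise Cramér's V for a dict of name -> column."""
    names = list(features)
    matrix = np.eye(len(names))
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            v = cramers_v(features[names[i]], features[names[j]])
            matrix[i, j] = matrix[j, i] = v
    return names, matrix

# Toy columns: "ward" mirrors "admission", while "gender" is unrelated to both.
features = {
    "admission": ["er", "er", "elective", "elective"],
    "ward": ["a", "a", "b", "b"],
    "gender": ["f", "m", "f", "m"],
}
names, matrix = association_matrix(features)
print(names)
print(matrix)
```

Grouping features whose pairwise scores are high is what produces the clusters that the matrix displays in different colors.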
The Feature Associations tab also allows you to look at the relationship between any two features as shown in Figure 6.
For more information about the insights that DataRobot generates automatically, visit our public documentation portal and navigate to the Analyze Data section.