Target-Based Insights in DataRobot

February 18, 2020

This post was originally part of the DataRobot Community. Visit now to browse discussions and ask questions about the DataRobot AI Platform, data science, and more.

This article explains feature importance, target leakage detection, and the Feature Association Matrix. These are all calculated after the target has been selected and the Start button is pressed, as shown in this Target-Based Insights video.

Feature Importance

Feature Importance is a column on the Data page that shows the relationship between each feature and the target, as shown in Figure 1. The score is analogous to a correlation and is calculated using an algorithm called Alternating Conditional Expectations (ACE).

Figure 1. Feature Importance
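
To give a feel for a correlation-like score that also captures nonlinear relationships, here is a minimal Python sketch. It is not DataRobot's implementation and it is not the Alternating Conditional Expectations algorithm itself. It swaps in a simpler stand-in, the correlation ratio: bin the feature, take the mean of the target in each bin, and report the share of target variance those bin means explain. The function name `binned_importance`, the bin count, and the sample data are all made up for illustration.

```python
from statistics import mean


def binned_importance(feature, target, n_bins=5):
    """Share of target variance explained by per-bin target means (0 to 1).

    Illustrative stand-in only; not the ACE algorithm.
    """
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / n_bins or 1.0  # constant feature: fall back to one bin
    bins = {}
    for x, y in zip(feature, target):
        idx = min(int((x - lo) / width), n_bins - 1)  # clamp the max value
        bins.setdefault(idx, []).append(y)

    overall = mean(target)
    total_var = sum((y - overall) ** 2 for y in target)
    if total_var == 0:
        return 0.0  # a constant target cannot be explained by anything
    between = sum(len(ys) * (mean(ys) - overall) ** 2 for ys in bins.values())
    return between / total_var


# Hypothetical data: the outcome is concentrated in the middle of the range.
visits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
readmitted = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
score = binned_importance(visits, readmitted)
constant_score = binned_importance([3] * 10, readmitted)
```

A score near 1 means the bin means track the target closely, and a score near 0 means the feature says little about it. A feature that never varies scores 0.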

DataRobot shows the relationship between the target and a feature using an orange line, as shown in Figure 2. For this numerical feature, when the number of inpatients is between 4 and 6, the likelihood of readmission is about 80%.

Figure 2. Relationship between the number of inpatients and the likelihood of readmission

Feature importance is also available for text features, as shown in Figure 3. The size of each word reflects how frequently it appears, while the color reflects the strength of its relationship to the target. In this example, words in red are associated with a much higher likelihood of readmission than words in blue.

Figure 3. Word Cloud for Text Features
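
The two quantities a word cloud like this encodes, frequency (size) and relationship to the target (color), can be sketched in a few lines of Python. This is a rough illustration under simple assumptions, not how DataRobot computes its word cloud: words are split on whitespace, each word is counted once per row, and the "color" value is the target rate among rows containing the word minus the overall target rate. The function name `word_cloud_stats`, the `min_count` cutoff, and the sample notes are hypothetical.

```python
from collections import Counter


def word_cloud_stats(texts, targets, min_count=2):
    """Return {word: (row_frequency, target_rate_minus_overall_rate)}."""
    overall = sum(targets) / len(targets)
    freq, positives = Counter(), Counter()
    for text, y in zip(texts, targets):
        for word in set(text.lower().split()):  # count each word once per row
            freq[word] += 1
            positives[word] += y
    return {
        word: (n, positives[word] / n - overall)
        for word, n in freq.items()
        if n >= min_count  # drop rare words, whose rates are unreliable
    }


# Hypothetical free-text notes with a binary target (1 = readmitted).
notes = [
    "chronic heart failure",
    "heart failure",
    "routine checkup",
    "routine followup checkup",
]
readmitted = [1, 1, 0, 0]
stats = word_cloud_stats(notes, readmitted)
```

In this toy data, "heart" would be drawn in the "red" direction (positive difference) and "routine" in the "blue" direction (negative difference), both at the same size because each appears in two rows.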

Leakage Detection

If you see a red or yellow indicator next to a feature, DataRobot has identified that feature as target leakage, as shown in Figure 4. DataRobot may then remove the feature from the feature list, so that you get a list of informative features with target leakage removed. This automatic removal is one of DataRobot's guardrails: it not only identifies target leakage but also acts on it.

Figure 4. Leakage Detection
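
The general pattern behind a guardrail like this (score each feature, flag suspiciously strong ones, then build a cleaned feature list) can be sketched as follows. Everything specific here is an assumption made for illustration: the threshold values, the rule that only the highest tier is dropped, the function names, and the feature names are hypothetical and are not taken from DataRobot.

```python
def flag_leakage(importances, high=0.975, moderate=0.85):
    """Label each feature 'high', 'moderate', or 'ok' by its importance score.

    The default thresholds are placeholders, not documented DataRobot values.
    """
    flags = {}
    for name, score in importances.items():
        if score >= high:
            flags[name] = "high"
        elif score >= moderate:
            flags[name] = "moderate"
        else:
            flags[name] = "ok"
    return flags


def leakage_removed(importances, **thresholds):
    """Feature list with 'high' features dropped (an assumed removal rule)."""
    flags = flag_leakage(importances, **thresholds)
    return [name for name, flag in flags.items() if flag != "high"]


# Hypothetical importance scores on a 0-to-1 scale.
scores = {
    "num_inpatient": 0.42,
    "discharge_code": 0.90,
    "readmit_flag_copy": 0.99,
}
flags = flag_leakage(scores)
clean_list = leakage_removed(scores)
```

A near-perfect score is the warning sign: a feature that predicts the target almost exactly often encodes information that would not be available at prediction time.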

Feature Association Matrix

The Feature Association Matrix shows the relationships between numerical and categorical features, as shown in Figure 5. The colors indicate the strength of each association, and the different colors represent clusters, or groups, of features that DataRobot has detected as being associated with one another. You can sort the matrix and run the analysis on different feature lists.

Figure 5. Feature Association Matrix
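
As a small illustration of what a pairwise association matrix contains, the sketch below computes Cramér's V, a common association measure for categorical features that ranges from 0 (no association) to 1 (one feature determines the other). Whether this matches the metric behind Figure 5 is not stated in this article, so treat the choice as an assumption. The function names and sample columns are hypothetical, and numerical features would need to be binned into categories before being passed in.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))  # missing pairs count as 0
    k = min(len(x_counts), len(y_counts)) - 1
    if k == 0:
        return 0.0  # a single-valued feature carries no association
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def association_matrix(columns):
    """Nested dict of pairwise Cramér's V for a {name: values} mapping."""
    names = list(columns)
    return {a: {b: cramers_v(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns: 'a' and 'b' move together, 'c' is unrelated to both.
columns = {
    "a": ["x", "x", "y", "y"],
    "b": ["p", "p", "q", "q"],
    "c": ["p", "q", "p", "q"],
}
matrix = association_matrix(columns)
```

Here `matrix["a"]["b"]` is 1.0 and `matrix["a"]["c"]` is 0.0, which is the kind of contrast the matrix makes visible at a glance.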

The Feature Associations tab also allows you to look at the relationship between any two features as shown in Figure 6.

Figure 6. Pairwise Relationships

More Information

For more information about the insights that DataRobot generates automatically, visit our public documentation portal and navigate to the Analyze Data section.

About the author
Linda Haviland

Community Manager
