This article explains feature importance, target leakage detection, and the Feature Association Matrix. These are all calculated after the target has been selected and the Start button is pressed, as shown in this Target-Based Insights video.
Feature Importance
Feature Importance is a column on the Data page that shows the strength of the relationship between each feature and the target, as shown in Figure 1. The importance score is analogous to a correlation and is calculated using an algorithm called Alternating Conditional Expectations (ACE).
Figure 1. Feature Importance
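To make the idea of Alternating Conditional Expectations more concrete, here is a minimal sketch for a single numeric feature and a numeric target. It is not DataRobot's implementation: the equal-width binning, the bin-mean smoother, and the names ace_score, n_bins, and n_iter are all assumptions chosen for illustration. The sketch alternates between estimating a transformed feature as the conditional mean of the transformed target and vice versa, then reports the squared correlation between the two transforms.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def _bin(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant column: everything in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def _cond_mean(values, groups):
    """Replace each value with the mean of its group (a crude smoother)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [sums[g] / counts[g] for g in groups]


def _standardize(values):
    """Center to mean 0 and scale to unit variance (all zeros if constant)."""
    mu, sd = fmean(values), pstdev(values)
    if sd < 1e-12:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def ace_score(x, y, n_bins=5, n_iter=20):
    """Hypothetical helper: squared correlation of ACE-style transforms."""
    xb, yb = _bin(x, n_bins), _bin(y, n_bins)
    theta = _standardize(y)  # initial transform of the target
    for _ in range(n_iter):
        phi = _cond_mean(theta, xb)                 # phi(x) = E[theta(y) | x]
        theta = _standardize(_cond_mean(phi, yb))   # theta(y) = E[phi(x) | y], rescaled
    phi = _cond_mean(theta, xb)
    sd_phi = pstdev(phi)
    if sd_phi < 1e-12:
        return 0.0
    # theta has mean 0 and unit variance, phi has mean 0
    corr = fmean([t * p for t, p in zip(theta, phi)]) / sd_phi
    return corr ** 2


if __name__ == "__main__":
    x = list(range(-10, 11))
    y = [v * v for v in x]  # strong but non-linear relationship
    print(ace_score(x, y))
```

The example data uses y = x squared, where the ordinary linear correlation is zero by symmetry; a score based on transformed variables can still pick up the relationship, which is why a method of this kind is useful for ranking features.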
DataRobot plots the relationship between the target and a feature as an orange line, as shown in Figure 2. For this numerical feature, DataRobot shows that when the number of inpatients is between 4 and 6, there is about an 80% likelihood of readmission.
Figure 2. Relationship between the number of inpatients and the likelihood of readmission
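A line like the one in Figure 2 can be approximated by grouping rows into ranges of the feature and computing the share of positive targets in each range. The sketch below assumes a binary 0/1 target; the function name target_rate_by_bin, the bin edges, and the sample values are hypothetical and are not taken from the readmission dataset.

```python
from bisect import bisect_right


def target_rate_by_bin(values, target, edges):
    """Return (row_count, positive_rate) for each bin defined by sorted edges.

    Bin 0 holds values below edges[0], bin i holds values in
    [edges[i-1], edges[i]), and the last bin holds values >= edges[-1].
    The rate is None for an empty bin.
    """
    n_bins = len(edges) + 1
    counts = [0] * n_bins
    positives = [0] * n_bins
    for v, t in zip(values, target):
        i = bisect_right(edges, v)
        counts[i] += 1
        positives[i] += t
    return [(c, p / c if c else None) for c, p in zip(counts, positives)]


if __name__ == "__main__":
    # Made-up data: a numeric feature and a 0/1 target.
    feature = [0, 1, 2, 4, 5, 6, 7, 9]
    outcome = [0, 0, 1, 1, 1, 1, 0, 0]
    for count, rate in target_rate_by_bin(feature, outcome, edges=[4, 7]):
        print(count, rate)
```

Plotting the rate for each bin against the bin midpoint gives a rough version of the feature-versus-target line.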
Feature importance is also available for text features, as shown in Figure 3. The size of each word reflects how frequently it appears, while its color reflects the strength of its relationship to the target. In this example, words in red are associated with a much higher likelihood of readmission than words in blue.
Figure 3. Word Cloud for Text Features
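The two quantities a word cloud encodes, frequency and relationship to the target, can be sketched with simple counts. This is only a rough proxy under stated assumptions: a binary 0/1 target, whitespace tokenization, and a "lift" defined as the target rate among rows containing the word minus the overall target rate. The source does not say how DataRobot scores words, and word_stats and the sample notes are hypothetical.

```python
from collections import Counter


def word_stats(texts, target):
    """Map each word to (rows_containing_word, target_rate_minus_overall_rate)."""
    overall = sum(target) / len(target)
    doc_count, positives = Counter(), Counter()
    for text, t in zip(texts, target):
        for word in set(text.lower().split()):  # count each word once per row
            doc_count[word] += 1
            positives[word] += t
    return {
        w: (doc_count[w], positives[w] / doc_count[w] - overall)
        for w in doc_count
    }


if __name__ == "__main__":
    # Made-up notes with a 0/1 outcome.
    notes = ["chest pain", "chest infection", "routine checkup", "routine pain"]
    outcome = [1, 1, 0, 0]
    for word, (count, lift) in sorted(word_stats(notes, outcome).items()):
        print(word, count, round(lift, 2))
```

In a word cloud, the count would drive the font size, and the sign and magnitude of the lift would drive the color (positive toward red, negative toward blue).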
Leakage Detection
If you see a red or yellow indicator next to a feature, DataRobot has flagged that feature as potential target leakage, as shown in Figure 4. DataRobot may then remove the feature from the feature list used for modeling, so that you work with an Informative Features list with target leakage removed. This automatic removal is one of the guardrails DataRobot provides: it not only identifies target leakage but also acts on it.
Figure 4. Leakage Detection
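One simple way to think about this guardrail is as a threshold on a per-feature importance score: a feature that predicts the target almost perfectly on its own is suspicious. The sketch below is an illustration only. The thresholds 0.85 and 0.975, the labels "moderate" and "high", the rule that only "high" features are dropped, and the feature names are all assumptions, not values confirmed by this article.

```python
def flag_leakage(scores, moderate=0.85, high=0.975):
    """Label each feature 'high', 'moderate', or 'ok' from a 0..1 importance score.

    The threshold values are illustrative assumptions.
    """
    flags = {}
    for name, score in scores.items():
        if score >= high:
            flags[name] = "high"
        elif score >= moderate:
            flags[name] = "moderate"
        else:
            flags[name] = "ok"
    return flags


def leakage_removed(features, flags):
    """Build a feature list that drops features flagged as 'high' risk."""
    return [f for f in features if flags.get(f) != "high"]


if __name__ == "__main__":
    # Made-up importance scores for hypothetical features.
    scores = {"discharge_code": 0.99, "num_inpatient": 0.40, "lab_result": 0.90}
    flags = flag_leakage(scores)
    print(flags)
    print(leakage_removed(list(scores), flags))
```

Keeping the flagging step separate from the removal step mirrors the two actions described above: identifying the leakage and then acting on it.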
Feature Association Matrix
The Feature Association Matrix shows the pairwise associations between numerical and categorical features, as shown in Figure 5. The intensity of each color indicates the strength of the association, while the different colors represent clusters: groups of features that DataRobot has detected as being associated with each other. You can sort the matrix and run the analysis on different feature lists.
Figure 5. Feature Association Matrix
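A matrix of this kind can be sketched with any symmetric association measure. The example below uses Cramér's V on categorical columns as a stand-in; the article does not state which metric DataRobot uses, and numerical features would first need to be binned. The function names cramers_v and association_matrix and the sample columns are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(a, b):
    """Cramér's V between two equal-length categorical columns (0 = none, 1 = perfect)."""
    n = len(a)
    row, col, cell = Counter(a), Counter(b), Counter(zip(a, b))
    k = min(len(row), len(col)) - 1
    if n == 0 or k <= 0:  # empty data or a constant column
        return 0.0
    chi2 = 0.0
    for r, n_r in row.items():
        for c, n_c in col.items():
            expected = n_r * n_c / n
            chi2 += (cell[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def association_matrix(columns):
    """Nested dict of pairwise Cramér's V for a {name: values} mapping."""
    names = list(columns)
    return {p: {q: cramers_v(columns[p], columns[q]) for q in names} for p in names}


if __name__ == "__main__":
    # Made-up categorical columns.
    data = {
        "a": ["x", "x", "y", "y"],
        "b": ["p", "p", "q", "q"],  # perfectly associated with "a"
        "c": ["p", "q", "p", "q"],  # independent of "a"
    }
    for name, row in association_matrix(data).items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

Clusters such as those shown by the colors in the matrix could then be found by grouping features whose pairwise scores exceed some cutoff, though that step is not shown here.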
The Feature Associations tab also allows you to look at the relationship between any two features as shown in Figure 6.
Figure 6. Pairwise Relationships
More Information
For more information about the insights that DataRobot generates automatically, please visit our public documentation portal and navigate to the Analyze Data section.