Target-Based Insights in DataRobot
This article explains feature importance, target leakage detection, and the Feature Association Matrix. DataRobot calculates all of these after you select the target and press the Start button, as shown in this Target-Based Insights video.
Feature Importance is a column on the Data page that shows the strength of the relationship between each feature and the target, as shown in Figure 1. Feature importance is analogous to a correlation and is calculated using an algorithm called Alternating Conditional Expectations.
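To make the idea of a single-feature score concrete, here is a minimal sketch in Python. It is not Alternating Conditional Expectations and not DataRobot's implementation; it swaps in a much simpler measure, the absolute Gini (2 x AUC - 1) of one numeric feature against a binary target. The function names auc and univariate_strength are hypothetical and exist only for this illustration.

```python
def auc(scores, labels):
    """Chance that a random positive row has a higher score than a random
    negative row; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("the target must contain both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def univariate_strength(feature, target):
    """Absolute Gini of one feature against a binary target, in [0, 1].

    Illustrative stand-in only: unlike ACE, it captures monotonic
    relationships and nothing else.
    """
    return abs(2.0 * auc(feature, target) - 1.0)
```

A score near 1 means the feature alone almost separates the two classes, and a score near 0 means it carries little signal on its own.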
DataRobot plots the relationship between the target and a feature as an orange line, as shown in Figure 2. For this numerical feature, DataRobot shows that when the number of inpatients is between 4 and 6, there is about an 80% likelihood of readmission.
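The orange line can be thought of as the average target value within each bin of the feature. The sketch below shows that calculation in plain Python under stated assumptions: the bin edges, the function name target_rate_by_bin, and the ten sample rows are all made up for illustration and are not taken from the dataset in the figure.

```python
def target_rate_by_bin(values, target, edges):
    """Average target per bin. Bin i covers [edges[i], edges[i + 1]);
    the last bin also includes its right edge. Empty bins yield None."""
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for v, y in zip(values, target):
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[i + 1]):
                sums[i] += y
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical sample: inpatient counts and a 0/1 readmission flag.
inpatients = [0, 1, 1, 2, 3, 4, 5, 5, 6, 6]
readmitted = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
rates = target_rate_by_bin(inpatients, readmitted, edges=[0, 2, 4, 6])
```

In this made-up sample, four of the five rows with 4 to 6 inpatients were readmitted, so the last bin has a rate of 0.8.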
Feature importance is also available for text features, as shown in Figure 3. The size of each word reflects how frequently it occurs, while its color reflects the strength of its relationship to the target. In this example, words in red are associated with a much higher likelihood of readmission than words in blue.
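The two quantities behind such a word cloud, how often a word appears and how it relates to the target, can be sketched as follows. This is a simplified illustration: it uses the raw average target among rows containing each word, which is an assumption and may differ from how DataRobot scores words. The function name word_stats and the sample notes are hypothetical.

```python
from collections import defaultdict


def word_stats(documents, target):
    """Return {word: (row_count, mean_target)} using lowercase
    whitespace tokens, counting each word once per row."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for text, y in zip(documents, target):
        for word in set(text.lower().split()):
            counts[word] += 1
            sums[word] += y
    return {w: (counts[w], sums[w] / counts[w]) for w in counts}


# Hypothetical diagnosis notes with a 0/1 readmission flag.
notes = ["chest pain", "chest infection", "routine checkup", "pain management"]
readmitted = [1, 1, 0, 0]
stats = word_stats(notes, readmitted)
```

In a word cloud, the first number of each pair would drive the word size and the second would drive the color.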
A red or yellow indicator next to a feature means that DataRobot has flagged it as potential target leakage, as shown in Figure 4. DataRobot may then remove the feature from the feature list, so that you see a list of informative features with the target leakage removed. This automatic removal is one of DataRobot's guardrails: it not only identifies target leakage but also acts on it.
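The flag-then-remove logic described above can be sketched as a simple threshold rule on a per-feature importance score. Everything here is an assumption made for illustration: the function names, the idea that scores lie between 0 and 1, and especially the two threshold values, which are placeholders and are not stated in this article.

```python
def leakage_flag(importance, moderate=0.85, high=0.975):
    """Classify a score in [0, 1] as 'none', 'moderate', or 'high' risk.
    The default thresholds are illustrative placeholders."""
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be between 0 and 1")
    if importance >= high:
        return "high"
    if importance >= moderate:
        return "moderate"
    return "none"


def informative_without_leakage(scores, moderate=0.85, high=0.975):
    """Keep every feature except those flagged as high risk,
    preserving the original order of the scores mapping."""
    return [
        name
        for name, score in scores.items()
        if leakage_flag(score, moderate, high) != "high"
    ]
```

In this sketch, a moderate flag only warns, while a high flag drops the feature from the resulting list. A feature that predicts the target almost perfectly on its own is suspicious precisely because real predictors rarely do.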
Feature Association Matrix
The Feature Association Matrix shows the associations among numerical and categorical features, as shown in Figure 5. The intensity of a color indicates the strength of the association, while the different colors represent clusters, or groups, of features that DataRobot has detected as being associated with each other. You can sort the matrix and run the analysis on different feature lists.
The Feature Associations tab also allows you to look at the relationship between any two features as shown in Figure 6.
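For intuition about what a pairwise association score looks like, the sketch below computes Cramér's V, a common measure of association between two categorical columns, and arranges the scores in a matrix. The article does not say which metric DataRobot uses, so treat the choice of Cramér's V as an assumption. The function names are hypothetical, and numerical features would first need to be binned, which is not shown.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two equal-length categorical sequences:
    0 means no association, 1 means a perfect association."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    row, col, joint = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    k = min(len(row), len(col)) - 1
    if k == 0:  # a constant column cannot be associated with anything
        return 0.0
    chi2 = 0.0
    for a, row_total in row.items():
        for b, col_total in col.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def association_matrix(columns):
    """Nested dict of pairwise Cramér's V for {name: values} columns."""
    names = list(columns)
    return {a: {b: cramers_v(columns[a], columns[b]) for b in names} for a in names}
```

Looking up a single pair in the nested dictionary corresponds to inspecting the relationship between any two features.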
For more information about the insights that DataRobot generates automatically, please visit our public documentation portal and navigate to the Analyze Data section.