Feature Impact: How to Customize the Sample Size in DataRobot
This post was originally part of the DataRobot Community.
This article explains how to customize the sample size when calculating Feature Impact scores.
Feature Impact is calculated using permutation importance, which has proven to be an excellent measure of which features affect a model the most. However, permutation importance is computationally expensive, which is why DataRobot runs the calculation on only a subset of the data.
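To make the idea concrete, here is a minimal, generic sketch of permutation importance computed on a random subset of rows. It is not DataRobot's implementation: the function name `permutation_importance`, the `sample_size` parameter, the mean squared error metric, and the toy model are all assumptions chosen for illustration.

```python
import random


def permutation_importance(predict, rows, targets, sample_size, seed=0):
    """Return, per feature, the increase in mean squared error after shuffling it.

    Hypothetical helper for illustration only; not a DataRobot API.
    """
    rng = random.Random(seed)

    # Score on a random subset of the rows to keep the cost down.
    n = min(sample_size, len(rows))
    chosen = rng.sample(range(len(rows)), n)
    X = [list(rows[i]) for i in chosen]
    y = [targets[i] for i in chosen]

    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, y)) / n

    baseline = mse(X)
    scores = []
    for j in range(len(X[0])):
        # Shuffle one column while leaving all other columns untouched.
        column = [r[j] for r in X]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
        # A larger error increase means the model relies more on feature j.
        scores.append(mse(shuffled) - baseline)
    return scores


# Toy model (an assumption): the prediction depends only on the first feature.
def toy_model(row):
    return 3.0 * row[0]


data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(1000)]
targets = [3.0 * a for a, _ in rows]

scores = permutation_importance(toy_model, rows, targets, sample_size=200)
print(scores)
```

In this toy setup the first feature gets a positive score and the second gets zero, because shuffling a feature the model ignores cannot change its predictions. The cost grows with both the number of features and the number of sampled rows, which is why the sample size matters.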
To reduce the risk of a suboptimal Feature Impact calculation, you can choose the sample size used in the calculations. Currently, the maximum sample size you can choose is 100,000 rows.
Defining Custom Sample Size
To choose your own custom sample size for Feature Impact calculations, navigate to the Feature Impact tab for the model of interest (Figure 1).
In the Feature Impact tab, click the + icon (to the right of the Sample Size field).
Now you can choose the appropriate sample size. There are four different ways to do that:
- Specify the percent of rows to use.
- Specify the absolute number of rows to use.
- Choose from the three predefined “Snap To” options (quick, half, max).
- Use the sample size slider.
All of these options are shown in Figure 3.
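The percent and absolute-row options describe the same quantity in two ways. The sketch below shows how one might convert either into a row count while respecting the 100,000-row maximum mentioned earlier. The helper `resolve_sample_size` and its rounding and clamping rules are assumptions for illustration; the article does not say how DataRobot handles these cases internally.

```python
MAX_SAMPLE_ROWS = 100_000  # maximum sample size stated in this article


def resolve_sample_size(total_rows, percent=None, rows=None):
    """Turn a percent or an absolute row count into a usable sample size.

    Hypothetical helper; the rounding and clamping behavior is an assumption.
    """
    if (percent is None) == (rows is None):
        raise ValueError("Specify exactly one of percent or rows.")

    if percent is not None:
        if not 0 < percent <= 100:
            raise ValueError("percent must be in the range (0, 100].")
        requested = int(total_rows * percent / 100)
    else:
        if rows <= 0:
            raise ValueError("rows must be positive.")
        requested = rows

    # Never exceed the dataset size or the documented maximum; use at least 1 row.
    return max(1, min(requested, total_rows, MAX_SAMPLE_ROWS))


print(resolve_sample_size(50_000, percent=50))     # half of a 50,000-row dataset
print(resolve_sample_size(1_000_000, percent=50))  # capped at the maximum
print(resolve_sample_size(2_000, rows=5_000))      # capped at the dataset size
```

Under these assumptions, a percentage that would exceed 100,000 rows on a large dataset is capped at the maximum rather than rejected.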
When you have finished, click Set Sample and Enable Feature Impact to start the calculations.
Recalculating Feature Impact
Feature Impact can be recalculated with a new sample size as many times as needed. To do this, go to the Feature Impact tab of the model and click Adjust sample size as shown in Figure 4.
This will display a window with the same options (as shown in Figure 3). Specify a new sample size and start the calculations again.
See the in-app Platform Documentation for Feature Impact.