Deriving the Most Value Out of the DataRobot What-If App
This article is a walkthrough of the DataRobot What-If app. Here we describe how to access it, what it does, and how to derive the most value out of using it.
To find the What-If application, click Applications in the top menu. You’ll see the Applications Gallery, which lists all the applications available to your organization (see Figure 1).
The What-If application helps business users create multiple prediction scenarios, compare them side by side, and select the option that provides the best outcome.
Some of the uses and advantages of this application include:
- Making base predictions, then changing one or more inputs to simulate a new scenario and seeing how those changes affect the target.
- Comparing likely scenarios and their prediction explanations side-by-side in the comparison view.
- Using domain expertise to test and validate models. Domain experts can use this application to answer questions like, “Do changes in input values have the expected effect on the prediction score?”
- Serving as a simple proof-of-concept decision-support tool for business users that work with any DataRobot machine learning model.
For this article, we are going to use a dataset about homes in Ames, Iowa, with sale price as the target variable. This dataset can be downloaded from the What-If demo page shown in Figure 2. Once you have downloaded it, you can use the data to create a project as depicted in Figure 3.
Once we have filled out the information on the Launch Application tab and launched the application, DataRobot opens the Current Applications page shown in Figure 4. On this page you can view all the applications that you’ve already deployed. Clicking the hamburger button next to an application’s name shows its details along with actions such as rotating your authentication token, deactivating the application, sharing the application, or deleting it.
Figure 5 shows the What-If application interface once DataRobot is done deploying it and you have clicked the Open button. DataRobot selects the 10 most important features in your dataset and displays them for you to experiment with. You can enter different values for those features and observe how the model’s predictions change.
DataRobot displays only the top 10 features so that you can see the effect of changing the most influential features without searching through a long list of other features that may have little to no effect on your model’s predictions. That said, you can still include features of your choice by adding them via the Manage Variables tab (Figure 6).
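The idea of showing only the most influential features can be sketched in a few lines of Python. This is a minimal illustration, not DataRobot's implementation or API: the function name top_features is hypothetical, and the importance scores are invented for the example (only Neighborhood and 2ndFlrSF are feature names taken from this article).

```python
def top_features(importances, n=10):
    """Return the names of the n features with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Hypothetical importance scores, invented for illustration only.
importances = {
    "OverallQual": 0.95,
    "GrLivArea": 0.80,
    "Neighborhood": 0.62,
    "2ndFlrSF": 0.30,
    "MiscVal": 0.01,
}

# Keep only the three most influential features in this toy example.
print(top_features(importances, n=3))
```

Features that fall below the cutoff (MiscVal here) are not lost; in the application they are the ones you would add back manually through the Manage Variables tab.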
The next step is to spend some time in the application experimenting with different values of the listed features as depicted in Figure 7 and then click Update prediction. DataRobot returns the SalePrice given the values you entered. It also shows the associated prediction explanations (Figure 8). You can take this information and add it to the Comparison tab (Figure 9), which makes it possible to compare model results given different scenarios.
In Figure 9, we show results of experimenting with three different values for the feature Neighborhood: ClearCr, Edwards, and Gilbert while keeping all other features the same across three scenarios. We can see from that display how our SalePrice and prediction explanations changed given the different values for Neighborhood.
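The comparison described above (change one feature, hold everything else fixed) can be sketched as follows. This is an illustration under stated assumptions, not the application's code: toy_predict is a made-up stand-in for a deployed model, its coefficients are invented, and compare_scenarios is a hypothetical helper name.

```python
def toy_predict(home):
    """Stand-in for a deployed model: a made-up additive price formula."""
    # Invented neighborhood adjustments, for illustration only.
    neighborhood_effect = {"ClearCr": 15000, "Edwards": -20000, "Gilbert": 2000}
    price = 100000
    price += 50 * home["GrLivArea"]
    price += neighborhood_effect[home["Neighborhood"]]
    return price


def compare_scenarios(base, feature, values, predict):
    """Vary one feature across `values` while keeping all other inputs fixed."""
    results = {}
    for value in values:
        scenario = dict(base)  # copy so the base scenario is left untouched
        scenario[feature] = value
        results[value] = predict(scenario)
    return results


base = {"GrLivArea": 1500, "Neighborhood": "ClearCr"}
results = compare_scenarios(
    base, "Neighborhood", ["ClearCr", "Edwards", "Gilbert"], toy_predict
)
for name, price in results.items():
    print(f"{name}: {price}")
```

Because only Neighborhood differs between the three scenarios, any difference in the predicted price can be attributed to that one input, which is the same reasoning the Comparison tab supports visually.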
Now, let’s explore what the three scenarios in Figure 9 suggest:
- For Edwards, the Neighborhood feature has a negative prediction explanation. In other words, being in the Edwards neighborhood drives the predicted sale price down.
- For ClearCr, the Neighborhood feature has a positive prediction explanation, meaning that being in ClearCr pushes the predicted sale price up.
- For Gilbert, the Neighborhood feature doesn’t seem to have as strong an impact on the predicted score as the other features listed in Figure 9 under Scenario 3. The display does, however, show that 2ndFlrSF (second floor square footage) has a negative prediction explanation. That’s because we didn’t enter a value for that feature. Does this mean that homes in Gilbert are nicer than homes in the other two neighborhoods? We can’t conclude that, because we didn’t provide a value for the 2ndFlrSF variable.
What’s next after these results? You can click the share icon on the top right of Figure 9. This will allow you to copy, link and share these results with anybody in your organization so they can see what you’ve done.