# Prediction Intervals via Conformal Inference

Horizontal

This AI Accelerator demonstrates various ways for generating prediction intervals for any DataRobot model. The methods presented here are rooted in the area of conformal inference (also known as conformal prediction).

These types of approaches have become increasingly popular for uncertainty quantification because they do not require strict distributional assumptions to be met in order to achieve proper coverage (i.e., they only require that the testing data is exchangeable with the training data). While conformal inference can be applied across a wide array of prediction problems, the focus in this notebook will be prediction interval generation on regression targets. This notebook is formatted as follows:

1. Importing libraries
2. Notebook parameters and helper functions
4. Modeling building and making predictions
5. Method 1: Absolute conformal
6. Method 2: Signed conformal
7. Method 3: Locally-weighted conformal
8. Method 4: Conformalized quantile regression
9. Comparing methods
10. Conclusion

Note: the particulars for each method have been simplified (e.g., authors use an “adjusted” quantile level rather than the traditional quantile calculation implemented here). For a full treatment of each approach and specific algorithm details, see the cited reference papers below.

### 1. Importing libraries

Read about different options for connecting to DataRobot from the client. Load the remaining libraries below in the usual way.

``````In [1]:

# Establish connection
import datarobot as dr

print(f"DataRobot version: {dr.__version__}")
dr.Client()``````

`DataRobot version: 3.2.0`

``````Out [1]:

<datarobot.rest.RESTClientObject at 0x10c74f970>
``````
``````In [2]:

# Imports
import datarobot as dr
from datarobot.models.modeljob import wait_for_async_model_creation
import numpy as np
import pandas as pd``````

### 2. Notebook parameters and helper functions

Below, you’ll have two parameters for this notebook:

1. COVERAGE_LEVEL: fraction of prediction intervals that should contain the target
2. TEST_DATA_FRACTION: fraction of data to holdout out and use as a testing dataset to evaluate each method

In addition, a couple functions are provided to make the following analysis easier. One of which computes the two metrics that will be used for comparison:

1. Coverage: fraction of computed prediction intervals that contain the target across all rows
2. Average Width: the average width of the prediction intervals across all rows

A desirable method should achieve the proper coverage at the smallest width possible.

``````In [3]:

# Coverage level
COVERAGE_LEVEL = 0.9

# Fraction of data to use to evaluate each method
TEST_DATA_FRACTION = 0.2``````
``````In [4]:

# Courtesy of https://github.com/yromano/cqr/blob/master/cqr/helper.py

def compute_coverage(
y_test: np.array,
y_lower: np.array,
y_upper: np.array,
significance: float,
name: str = "",
) -> (float, float):
"""
Computes coverage and average width

Parameters
----------
y_test: true labels (n)
y_lower: estimated lower bound for the labels (n)
y_upper: estimated upper bound for the labels (n)
significance: desired significance level
name: optional output string (e.g. the method name)

Returns
-------
coverage : average coverage
avg_width : average width

"""
# Compute coverage
in_the_range = np.sum((y_test >= y_lower) & (y_test <= y_upper))
coverage = in_the_range / len(y_test) * 100
print(
"%s: Percentage in the range (expecting %.2f): %f"
% (name, 100 - significance * 100, coverage)
)

# Compute average width
avg_width = np.mean(abs(y_upper - y_lower))
print("%s: Average width: %f" % (name, avg_width))

return coverage, avg_width

def compute_training_predictions(model: dr.Model) -> pd.DataFrame:
"""
Computes (or gathers) the out-of-sample training predictions from a model

Parameters
----------
model: DataRobot model

Returns
-------
DataFrame of training predictions

"""

# Get project to unlock holdout
project = dr.Project.get(model.project_id)
project.unlock_holdout()

# Request or gather predictions
try:
training_predict_job = model.request_training_predictions(
dr.enums.DATA_SUBSET.ALL
)
training_predictions = training_predict_job.get_result_when_complete()

except dr.errors.ClientError:
training_predictions = [
tp
for tp in dr.TrainingPredictions.list(project.id)
if tp.model_id == model.id and tp.data_subset == "all"
][0]

return training_predictions.get_all_as_dataframe()

def quantile_rearrangement(
test_preds: pd.DataFrame,
quantile_low: float,
quantile_high: float,
) -> pd.DataFrame:
"""
Produces monotonic quantiles
Based on: https://github.com/yromano/cqr/blob/master/cqr/torch_models.py#L66-#L94

Parameters
----------
test_preds: dataframe of quantile predictions to rearrange, sorted from lowest quantile to highest
quantile_low: desired low quantile in the range (0,1)
quantile_high: desired high quantile in the range (0,1)

Returns
-------
Dataframe of rearranged quantile predictions

References
----------
.. [1]  Chernozhukov, Victor, Iván Fernández‐Val, and Alfred Galichon.
"Quantile and probability curves without crossing."
Econometrica 78.3 (2010): 1093-1125.
"""

# Based on the code in the referenced function, "all_quantiles" is defined as the following:
# See https://github.com/yromano/cqr/blob/master/cqr/helper.py#L423
all_quantiles = np.linspace(0.01, 0.99, 99)

# This part remains remains the same
scaling = all_quantiles[-1] - all_quantiles[0]
low_val = (quantile_low - all_quantiles[0]) / scaling
high_val = (quantile_high - all_quantiles[0]) / scaling

# Get new values
q_fixed = np.quantile(
test_preds.values, (low_val, high_val), method="linear", axis=1
)

return pd.DataFrame(q_fixed.T, columns=test_preds.columns)``````

The dataset you’ll use comes from this DataRobot blog post. Each row represents a player in the National Basketball Association (NBA) and the columns signify different NBA statistics from various repositories, fantasy basketball news sources, and betting information. The target, `game_score`, is a single statistic that attempts to quantify player performance and productivity.

Additionally, you’ll partition the data into a training and testing sets. The training set will be used for modeling building / evaluation while the testing set will be used to compare each method.

``````In [5]:

"https://s3.amazonaws.com/datarobot_public_datasets/DR_Demo_NBA_2017-2018.csv"
)

Out [5]:

``````In [6]:

# Distribution of target
target_column = "game_score"
df[target_column].hist()``````
``````Out [6]:

<Axes: >``````
``````In [7]:
# Split data
df_train = df.sample(frac=1 - TEST_DATA_FRACTION, replace=False, random_state=10)
df_test = df.loc[~df.index.isin(df_train.index)]
df_train = df_train.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)
print(df_train.shape)
print(df_test.shape)``````

`(7999, 52) `

`(2000, 52)`

### 4. Modeling building and making predictions

To create a DataRobot project and start building models, you can use the convenient `Project.start` function, which chains together project creation, file upload, and target selection. Once models are finished training, you’ll retrieve the one DataRobot recommends for deployment and request predictions for both the training and testing sets. Note that the predictions made on the training dataset are not in-sample, but rather out-of-sample (i.e., also referred to as stacked predictions). These out-of-sample training predictions are a key component to each prediction interval method discussed in this notebook.

``````In [8]:

# Starting main project
project = dr.Project.start(
sourcedata=df_train,
project_name="Conformal Inference AIA - NBA",
target=target_column,
worker_count=-1,
)

# Wait
project.wait_for_autopilot(check_interval=120)``````
``````In progress: 8, queued: 0 (waited: 0s)
In progress: 8, queued: 0 (waited: 1s)
In progress: 8, queued: 0 (waited: 1s)
In progress: 8, queued: 0 (waited: 2s)
In progress: 8, queued: 0 (waited: 4s)
In progress: 8, queued: 0 (waited: 6s)
In progress: 8, queued: 0 (waited: 10s)
In progress: 8, queued: 0 (waited: 17s)
In progress: 8, queued: 0 (waited: 30s)
In progress: 7, queued: 0 (waited: 56s)
In progress: 1, queued: 0 (waited: 108s)
In progress: 4, queued: 0 (waited: 211s)
In progress: 1, queued: 0 (waited: 332s)
In progress: 0, queued: 0 (waited: 453s)
In progress: 0, queued: 0 (waited: 574s)``````
``````In [9]:

# Get recommended model
best_model = dr.ModelRecommendation.get(project.id).get_model()
best_model``````
``````Out [9]:

Model('RandomForest Regressor')
``````
``````In [10]:

# Compute training predictions (necessary for each method)
training_preds = compute_training_predictions(model=best_model)

Out [10]:

``````In [11]:

# Request predictions on testing data
pred_dataset = project.upload_dataset(sourcedata=df_test, max_wait=60 * 60 * 24)
predict_job = best_model.request_predictions(dataset_id=pred_dataset.id)
testing_preds = predict_job.get_result_when_complete(max_wait=60 * 60 * 24)

Out [11]:

``````In [12]:

# Join predictions training and testing datasets
df_train = df_train.join(training_preds.set_index("row_id"))
df_test = df_test.join(testing_preds.set_index("row_id"))
display(df_train[[target_column, "prediction"]])
display(df_test[[target_column, "prediction"]])``````

``````In [13]:

# Compute the residuals on the training data
df_train["residuals"] = df_train[target_column] - df_train["prediction"]
df_train["residuals"].hist()
``````

Out[13]:

`<Axes: >`

### 5. Method: Absolute conformal

The first method you’ll implement, regarded here as “absolute conformal”, is as follows:

1. Take the absolute value of the out-of-sample residuals (these will be the conformity scores)
2. Compute the quantile associated with the specified COVERAGE_LEVEL on the conformity scores
3. Add and subtract this quantile value to the prediction

The resulting prediction intervals are guaranteed to be symmetric and the same width (since you’re simply applying a scalar value across all rows). For more information regarding this approach, see Section 2.3.

``````In [14]:

# Compute the conformity

df_train["abs_residuals"] = df_train["residuals"].abs()
abs_residuals_q = df_train["abs_residuals"].quantile(COVERAGE_LEVEL)
abs_residuals_q``````

Out [14]:

`11.028431310477108`

``````In [15]:

# Using the conformity score, create the prediction intervals
df_test["method_1_lower"] = df_test["prediction"] - abs_residuals_q
df_test["method_1_upper"] = df_test["prediction"] + abs_residuals_q``````
``````In [16]:

# Compute metrics
method_1_coverage = compute_coverage(
y_test=df_test[target_column].values,
y_lower=df_test["method_1_lower"].values,
y_upper=df_test["method_1_upper"].values,
significance=1 - COVERAGE_LEVEL,
name="Absolute Conformal",
)``````
``````Absolute Conformal: Percentage in the range (expecting 90.00): 89.000000
Absolute Conformal: Average width: 22.056863``````

### 6. Method: Signed conformal

“Signed conformal” follows a very similar procedure as the previous one:

1. Compute lower and upper quantile levels based on the specified COVERAGE_LEVEL
2. Apply these quantile levels to the out-of-sample residuals (i.e., conformity scores)
3. Add these quantile values to the prediction

The main advantage to this approach over the previous one is that the prediction intervals are not forced to be symmetric, which can lead to better coverage for skewed targets. For more information regarding this approach, see Section 3.2.

``````In [17]:

# Compute lower and upper quantile levels to use based on the coverage
lower_coverage_q = round((1 - COVERAGE_LEVEL) / 2, 2)
upper_coverage_q = COVERAGE_LEVEL + (1 - COVERAGE_LEVEL) / 2
lower_coverage_q, upper_coverage_q``````

Out [17]:

`(0.05, 0.95)`

``````In [18]:

# Compute quantiles on the conformity scores
residuals_q_low = df_train["residuals"].quantile(lower_coverage_q)
residuals_q_high = df_train["residuals"].quantile(upper_coverage_q)
residuals_q_low, residuals_q_high``````

Out [18]:

`(-10.573999229291612, 11.617478915155703)`

``````In [19]:

# Using the quantile levels, create the prediction intervals
df_test["method_2_lower"] = df_test["prediction"] + residuals_q_low
df_test["method_2_upper"] = df_test["prediction"] + residuals_q_high``````
``````In [20]:

# Compute coverage / width
method_2_coverage = compute_coverage(
y_test=df_test[target_column].values,
y_lower=df_test["method_2_lower"].values,
y_upper=df_test["method_2_upper"].values,
significance=1 - COVERAGE_LEVEL,
name="Signed Conformal",
)``````

`Signed Conformal: Percentage in the range (expecting 90.00): 88.900000 `

`Signed Conformal: Average width: 22.191478`

### 7. Method: Locally-weighted conformal

While the primary advantage of the previous two methods is their simplicity, the disadvantage is that each prediction interval ends up being the exact same width. In many cases, it’s desirable to have varying widths that reflect the degree of confidence (i.e., harder to predict rows get a larger prediction interval and vice versa). To this end, you can make them more adaptive by using an auxiliary model to help augment the width on a per-row basis, depending on how much error we’d expect to see in a particular row. The “locally-weighted conformal” method is as follows:

1. Take the absolute value of the out-of-sample residuals
2. Build a model that regresses against the absolute residuals using the same feature set
3. Compute the out-of-sample predictions from the absolute residuals model
4. Scale the out-of-sample residuals using the the auxillary model’s predictions to create the conformity scores
5. Compute the quantile associated with the specified COVERAGE_LEVEL on the conformity scores
6. Multiply this quantile value and the auxillary model’s predictions together (this will result in a locally-weighted offset to apply, specific to each row)
7. Add and subtract this multiplied value to each prediction

Although this approach is more involved, it addresses the disadvantage above by making the prediction intervals adaptive to each row while still being symmetric. For more information regarding this approach, see Section 5.2.

``````In [21]:

# Starting project to predict absolute residuals
project_abs_residuals = dr.Project.start(
sourcedata=df_train.drop(
columns=[
target_column,
"partition_id",
"prediction",
"residuals",
],
axis=1,
),
project_name=f"Predicting absolute residuals from {project.project_name}",
target="abs_residuals",
worker_count=-1,
)
project_abs_residuals.wait_for_autopilot(check_interval=120)``````
``````In progress: 8, queued: 0 (waited: 0s)
In progress: 8, queued: 0 (waited: 1s)
In progress: 8, queued: 0 (waited: 1s)
In progress: 8, queued: 0 (waited: 2s)
In progress: 8, queued: 0 (waited: 3s)
In progress: 8, queued: 0 (waited: 5s)
In progress: 8, queued: 0 (waited: 9s)
In progress: 8, queued: 0 (waited: 16s)
In progress: 8, queued: 0 (waited: 29s)
In progress: 7, queued: 0 (waited: 55s)
In progress: 1, queued: 0 (waited: 108s)
In progress: 3, queued: 0 (waited: 211s)
In progress: 1, queued: 0 (waited: 331s)
In progress: 0, queued: 0 (waited: 452s)
In progress: 0, queued: 0 (waited: 572s)``````
``````In [22]:

# Get recommended model
best_model_abs_residuals = dr.ModelRecommendation.get(
project_abs_residuals.id
).get_model()
best_model_abs_residuals
``````
``````Out [22]:

Model('RandomForest Regressor')

``````
``````In [23]:

# Compute training predictions and join
df_train = df_train.join(
compute_training_predictions(model=best_model_abs_residuals)
.rename(columns={"prediction": f"abs_residuals_prediction"})
.set_index("row_id")
.drop(columns=["partition_id"], axis=1)
)
``````
``````In [24]:

# Now compute prediction on testing data and join
sourcedata=df_test, max_wait=60 * 60 * 24
)
df_test = df_test.join(
best_model_abs_residuals.request_predictions(
dataset_id=pred_dataset_abs_residuals.id
)
.get_result_when_complete(max_wait=60 * 60 * 24)
.rename(columns={"prediction": f"abs_residuals_prediction"})
.set_index("row_id")
)``````
``````In [25]:

# Now we need to compute our locally-weighted conformity score and take the quantile
scaled_abs_residuals = df_train["abs_residuals"] / df_train["abs_residuals_prediction"]
scaled_abs_residuals_q = scaled_abs_residuals.quantile(COVERAGE_LEVEL)
scaled_abs_residuals_q``````
``````Out [25]:

2.0517307447009157
``````
``````In [26]:

# Using the conformity score and absolute residuals model, create the prediction intervals
df_test["method_3_lower"] = (
df_test["prediction"] - df_test["abs_residuals_prediction"] * scaled_abs_residuals_q
)
df_test["method_3_upper"] = (
df_test["prediction"] + df_test["abs_residuals_prediction"] * scaled_abs_residuals_q
)``````
``````In [27]:

# Compute coverage / width
method_3_coverage = compute_coverage(
y_test=df_test[target_column].values,
y_lower=df_test["method_3_lower"].values,
y_upper=df_test["method_3_upper"].values,
significance=1 - COVERAGE_LEVEL,
name="Locally-Weighted Conformal",
)``````
``````Locally-Weighted Conformal: Percentage in the range (expecting 90.00): 89.800000
Locally-Weighted Conformal: Average width: 21.710734
``````

### 8. Method: Conformalized Quantile Regression

If you consider “locally-weighted conformal” to be a model-based extension of “absolute conformal”, then you could consider “conformalized quantile regression” to be a model-based extension of “signed conformal.” The goal is similar – create more adaptive prediction intervals, but it inherits the quality that the prediction intervals are not forced to be symmetric. The reference paper offers a symmetric and asymmetric formulation for the conformity scores. The former (Theorem 1) “allows coverage errors to be spread arbitrarily over the left and right tails” while the latter (Theorem 2) controls “the left and right tails independently, resulting in a stronger coverage guarantee” at the cost of slightly wider prediction intervals. Here, you’ll use the symmetric version. The full method is as follows:

1. Compute lower and upper quantile levels based on the specified COVERAGE_LEVEL
2. Train two quantile regression models at the lower and upper quantile levels on the training set
3. Compute the out-of-sample predictions for both quantile models
4. Compute the conformity scores, E, such that E = max[^qlowery, y^qupper]  where у is the target, ^qlower is the lower quantile predictions, and ^qupper is the upper quantile predictions
5. Compute the quantile associated with the specified COVERAGE_LEVEL on E
6. Add and subtract this quantile value to the quantile predictions

Notably, this approach is completely independent of your main model. That is, it doesn’t use any information about the recommended model defined earlier. This may or may not be desired, depending on the user’s preference or use case requirements. To ensure the main model’s predictions fall within the prediction interval, you’ll simply extend the interval’s boundary to be equal to the prediction itself (if the prediction lies outside of the respective prediction interval). Additionally, when using non-linear quantile regression methods (e.g., tree-based approaches, neural networks), it’s possible to experience quantile crossing (i.e., non-monotonic quantile predictions). To combat this, the referenced paper offers a solution via rearrangement, which is implemented here.

There are two ways to run quantile regression models in DataRobot:

1. Set the project metric to quantile loss (which is currently a public-preview feature)
2. Use certain blueprints with algorithms that support quantile loss as a hyperparameter in your current project. These includes gradient boosted trees from scikit-learn and Keras neural networks.

In this notebook, you’ll use the second approach, since it’s generally available. This involves using DataRobot’s advanced tuning functionality to change the loss function to the desired quantile loss.

``````In [28]:

# Get a GBT from scikit-learn (using the first one)
models = project.get_models()
gbt_models = [x for x in models if x.model_type.startswith("Gradient Boosted")]

# Check for GBT model. If none, make one.
if gbt_models:
# Get most accurate one on validation set
gbt_model = gbt_models[0]

else:
# Pull models (will usually be at least one blueprint with a scikit-learn GBT)
gbt_bps = [
x
for x in project.get_blueprints()
]

# Get first one
gbt_bp = gbt_bps[0]

# Train it
gbt_model = wait_for_async_model_creation(
project_id=project.id,
model_job_id=project.train(gbt_bp),
max_wait=60 * 60 * 24,
)

gbt_model``````
``````Out [28]:

Model('Gradient Boosted Greedy Trees Regressor with Early Stopping (Least-Squares Loss)')
``````
``````In [29]:

# Train it on all the data
model_job_id = gbt_model.train(
sample_pct=100,
featurelist_id=gbt_model.featurelist_id,
)
gbt_model_100 = wait_for_async_model_creation(
project_id=project.id, model_job_id=model_job_id, max_wait=60 * 60 * 24
)
gbt_model_100
``````
``````Out [29]:

Model('Gradient Boosted Greedy Trees Regressor with Early Stopping (Least-Squares Loss)')

``````
``````In [30]:

# Train quantile models
quantile_models = {lower_coverage_q: None, upper_coverage_q: None}

# Tune main keras model
for q in quantile_models.keys():
# Start

# Set loss and level
tune.set_parameter(
)
tune.set_parameter(
)

# Save job
quantile_models[q] = tune.run()

# Wait and get resulting models
for q in quantile_models.keys():
quantile_models[q] = quantile_models[q].get_result_when_complete(
max_wait=60 * 60 * 24
)

quantile_models
``````
``````Out [30]:

{0.05: Model('Gradient Boosted Greedy Trees Regressor with Early Stopping (Quantile Loss)'),
0.95: Model('Gradient Boosted Greedy Trees Regressor with Early Stopping (Quantile Loss)')}``````
``````In [31]:

# Compute training predictions
for q in quantile_models.keys():
df_train = df_train.join(
compute_training_predictions(model=quantile_models[q])
.rename(columns={"prediction": f"quantile_prediction_{q}"})
.set_index("row_id")
.drop(columns=["partition_id"], axis=1)
)

# Check
df_train[
[
target_column,
f"quantile_prediction_{lower_coverage_q}",
f"quantile_prediction_{upper_coverage_q}",
]
]``````

Out[31]:

``````In [32]:

# Making prediction on test data
quantile_models_test_predict = quantile_models.copy()
for q in quantile_models.keys():
quantile_models_test_predict[q] = quantile_models[q].request_predictions(
dataset_id=pred_dataset.id
)

# Joining the results
for q in quantile_models.keys():
df_test = df_test.join(
quantile_models_test_predict[q]
.get_result_when_complete(max_wait=60 * 60 * 24)
.rename(columns={"prediction": f"quantile_prediction_{q}"})
.set_index("row_id")
)

# Check
df_test[
[
target_column,
f"quantile_prediction_{lower_coverage_q}",
f"quantile_prediction_{upper_coverage_q}",
]
]``````

Out[32]:

``````In [33]:

# Implement quantile rearrangement
q_crossing_train = (
df_train[f"quantile_prediction_{lower_coverage_q}"]
> df_train[f"quantile_prediction_{upper_coverage_q}"]
).sum()
q_crossing_test = (
df_test[f"quantile_prediction_{lower_coverage_q}"]
> df_test[f"quantile_prediction_{upper_coverage_q}"]
)
print(
f"Number of rows with quantile crossing in training set (before rearrangement): {q_crossing_train}"
)
print(
f"Number of rows with quantile crossing in testing set (before rearrangement): {q_crossing_test}"
)

# Capture quantile columns
quantile_pred_cols = [x for x in df_train.columns if x.startswith("quantile")]

# On training set
df_train = df_train.drop(quantile_pred_cols, axis=1).join(
quantile_rearrangement(
test_preds=df_train[
[
f"quantile_prediction_{lower_coverage_q}",
f"quantile_prediction_{upper_coverage_q}",
]
],
quantile_low=lower_coverage_q,
quantile_high=upper_coverage_q,
)
)

# On testing set
df_test = df_test.drop(quantile_pred_cols, axis=1).join(
quantile_rearrangement(
test_preds=df_test[
[
f"quantile_prediction_{lower_coverage_q}",
f"quantile_prediction_{upper_coverage_q}",
]
],
quantile_low=lower_coverage_q,
quantile_high=upper_coverage_q,
)
)

# Check again
q_crossing_train = (
df_train[f"quantile_prediction_{lower_coverage_q}"]
> df_train[f"quantile_prediction_{upper_coverage_q}"]
).sum()
q_crossing_test = (
df_test[f"quantile_prediction_{lower_coverage_q}"]
> df_test[f"quantile_prediction_{upper_coverage_q}"]
)
print(
f"Number of rows with quantile crossing in training set (after rearrangement): {q_crossing_train}"
)
print(
f"Number of rows with quantile crossing in testing set (after rearrangement): {q_crossing_test}"
)``````
``````Number of rows with quantile crossing in training set (before rearrangement): 7
Number of rows with quantile crossing in testing set (before rearrangement): 0       False
1       False
2       False
3       False
4       False
...
1995    False
1996    False
1997    False
1998    False
1999    False
Length: 2000, dtype: bool
Number of rows with quantile crossing in training set (after rearrangement): 0
Number of rows with quantile crossing in testing set (after rearrangement):
0       False
1       False
2       False
3       False
4       False
...
1995    False
1996    False
1997    False
1998    False
1999    False
Length: 2000, dtype: bool``````
``````In [34]:

# Now we compute our conformity score and take the quantile
E_cqr = np.maximum(
df_train[f"quantile_prediction_{lower_coverage_q}"] - df_train[target_column],
df_train[target_column] - df_train[f"quantile_prediction_{upper_coverage_q}"],
)
E_cqr_q = E_cqr.quantile(COVERAGE_LEVEL)
E_cqr_q``````
``````In [34]:

1.7189887578306196``````
``````In [35]:

# Create the prediction intervals
df_test["method_4_lower"] = df_test[f"quantile_prediction_{lower_coverage_q}"] - E_cqr_q
df_test["method_4_upper"] = df_test[f"quantile_prediction_{upper_coverage_q}"] + E_cqr_q``````
``````In [36]:

# Extend to make sure the prediction is inside the interval
df_test["method_4_lower"] = df_test[["method_4_lower", "prediction"]].min(axis=1)
df_test["method_4_upper"] = df_test[["method_4_upper", "prediction"]].max(axis=1)``````
``````In [37]:

# Compute coverage / width
method_4_coverage = compute_coverage(
y_test=df_test[target_column].values,
y_lower=df_test["method_4_lower"].values,
y_upper=df_test["method_4_upper"].values,
significance=1 - COVERAGE_LEVEL,
name="Conformalized Quantile Regression",
)``````
``````Conformalized Quantile Regression: Percentage in the range (expecting 90.00): 89.700000
Conformalized Quantile Regression: Average width: 21.706370
``````

### 9. Comparing methods

Below you can see that the more advanced methods (i.e., “locally-weighted conformal” and “conformalized quantile regression”) yield similar coverage rates while producing smaller prediction intervals on average. Notably, this is just one dataset, and it’s suggested to empirically experiment with your own data to find the best method for your use case.

``````In [38]:

# Organize
summary = pd.DataFrame(
{
"Coverage": [
method_1_coverage[0],
method_2_coverage[0],
method_3_coverage[0],
method_4_coverage[0],
],
"Average Width": [
method_1_coverage[1],
method_2_coverage[1],
method_3_coverage[1],
method_4_coverage[1],
],
"Method": [
"Absolute Conformal",
"Signed Conformal",
"Locally-Weighted Conformal",
"Conformalized Quantile Regression",
],
}
)
summary``````

Out[38]:

### 10. Conclusion

This notebook demonstrates how one could build prediction intervals for any DataRobot model using methods derived from the conformal inference space. Conformal inference is a popular framework to use for generating such prediction intervals because they don’t require strict distributional assumptions to achieve the desired coverage, so long as the testing data is exchangeable with the training data. This characteristic was confirmed in the analysis done here. Because each approach offers different pros and cons, it’s worthwhile to use this AI Accelerator as a starting point for your own experiments to decide which one to implement for your use case. DataRobot offers the ability to easily implement each of these methods, even for the more advanced techniques. For more information on the topic of conformal inference, see the following introductory paper.

Get Started with Free Trial

Experience new features and capabilities previously only available in our full AI Platform product.

#### Get Started with Prediction Intervals

Explore more Industry Agnostic AI Accelerators