Platt Scaling for Classification Models

January 24, 2020

This post was originally part of the DataRobot Community.

This article introduces Platt scaling, a popular calibration method. For many problems it is convenient to obtain a probability P(y=1|x), that is, a classification that gives not only an answer but also a degree of certainty about that answer. However, some classification models (such as SVMs and decision trees) either do not provide such a probability or provide poor probability estimates.

Platt scaling is a way of transforming the outputs of a non-probabilistic classification model into a probability distribution over classes. It amounts to training a logistic regression model on the classifier outputs.

We will see an example where we train an SVM and then train the parameters of an additional sigmoid function to map the SVM outputs into probabilities.

\(\mathrm{P}(y=1 \mid x) = \frac{1}{1 + \exp(A f(x) + B)}\)

We would like to obtain A and B, two scalar parameters that are learned by the algorithm; f(x) denotes the classifier's output for input x.
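To make the role of A and B concrete, here is a minimal base R sketch of the sigmoid. The helper name `platt_sigmoid` and the parameter values are illustrative assumptions, not values from the model fitted in this article.

```r
# Sketch of the Platt sigmoid: P(y = 1 | x) = 1 / (1 + exp(A * f + B)).
# `platt_sigmoid` is a hypothetical helper name; A and B are made-up values.
platt_sigmoid <- function(f, A, B) {
  1 / (1 + exp(A * f + B))
}

f_values <- c(-2, 0, 2)   # example classifier outputs (decision values)
A <- -1.5                 # negative A: a larger f gives a higher probability
B <- 0.2
p <- platt_sigmoid(f_values, A, B)
print(round(p, 3))
```

With a negative A the probability rises monotonically with f(x), which is the usual situation when larger classifier outputs indicate the positive class.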

This idea was suggested in Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods published in 1999 by John C. Platt.

Packages Needed

  • caret
  • kernlab

Functions Needed

  1. createDataPartition()
  2. train()
  3. predict()

We will use spam, a Spam Email database that comes with the kernlab package.

Dataset description: A dataset collected at Hewlett-Packard Labs that classifies 4601 emails as spam or non-spam. In addition to the class label, there are 57 variables indicating the frequency of certain words and characters in each email.

The last column (variable 58) gives the type of the email: either “nonspam” or “spam” (i.e., unsolicited commercial email).

Load the caret and kernlab packages:

library(caret)
library(kernlab)

Load the spam data:

data(spam)

Create training and test sets:

inTrain <- createDataPartition(y=spam$type, p=0.75, list=FALSE) # stratified split; returns the training set row indices
training <- spam[inTrain,] # Training Set
testing <- spam[-inTrain,] # Test Set
dim(training) 
## [1] 3451   58
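For a factor outcome, createDataPartition() samples within each class, so the training and test sets keep roughly the same class proportions. As a rough illustration of that idea (not caret's actual implementation), here is a base R sketch on a made-up label vector; the names `toy_labels` and `train_idx` are hypothetical.

```r
# Base R sketch of a stratified 75/25 split on toy labels (illustrative data only).
set.seed(1)
toy_labels <- factor(rep(c("nonspam", "spam"), times = c(60, 40)))

# Sample 75% of the row indices within each class, then combine them.
idx_by_class <- split(seq_along(toy_labels), toy_labels)
sampled <- lapply(idx_by_class, function(idx) {
  sample(idx, size = round(0.75 * length(idx)))
})
train_idx <- sort(unlist(sampled, use.names = FALSE))

train_labels <- toy_labels[train_idx]
test_labels  <- toy_labels[-train_idx]

# Class proportions are the same in both parts.
print(prop.table(table(train_labels)))
print(prop.table(table(test_labels)))
```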

Fit predictive models over different tuning parameters:

set.seed(32343) # to allow reproducibility of results
modelFit <- train(type ~.,data=training, method="svmLinear") # Use the 'type' variable as labels; 'training' data to train
modelFit
## Support Vector Machines with Linear Kernel 
## 
## 3451 samples
##   57 predictors
##    2 classes: 'nonspam', 'spam' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## 
## Summary of sample sizes: 3451, 3451, 3451, 3451, 3451, 3451, ... 
## 
## Resampling results
## 
##   Accuracy  Kappa  Accuracy SD  Kappa SD
##   0.9       0.8    0.007        0.01    
## 
## Tuning parameter 'C' was held constant at a value of 1
## 

Final model using the best parameters

In the trainControl() call, you must set classProbs = TRUE for class probabilities to be returned.

modelFit <- train(type ~.,data=training, method="svmLinear", trControl = trainControl(method = "repeatedcv", repeats = 2, 
classProbs =  TRUE))
modelFit$finalModel
## Support Vector Machine object of class "ksvm" 
## 
## SV type: C-svc  (classification) 
##  parameter : cost C = 1 
## 
## Linear (vanilla) kernel function. 
## 
## Number of Support Vectors : 686 
## 
## Objective Function Value : -638 
## Training error : 0.067806 
## Probability model included.

Make test data predictions

Note: With type="prob", the returned values are already probabilities. kernlab's own predict method can also return votes; however, that option is not available through caret's predict().

predictProbs <- predict(modelFit,newdata=testing, type="prob")
head(predictProbs)
##    nonspam  spam
## 1 5.22e-05 1.000
## 2 3.07e-01 0.693
## 3 4.24e-01 0.576
## 4 8.45e-03 0.992
## 5 1.22e-01 0.878
## 6 7.43e-02 0.926
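One way to judge whether predicted probabilities like these are well calibrated is to compare, within probability bins, the mean predicted probability with the observed rate of positives. The following base R sketch does this on simulated data; the data, the bin width, and the variable names are assumptions for illustration and are unrelated to the spam model.

```r
# Reliability table sketch on simulated data (illustrative only).
set.seed(7)
n <- 2000
p_hat <- runif(n)                        # stand-in for predicted P(spam)
y <- rbinom(n, size = 1, prob = p_hat)   # labels drawn so that p_hat is well calibrated

# Group predictions into five equal-width probability bins.
bins <- cut(p_hat, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)

reliability <- data.frame(
  mean_predicted = as.vector(tapply(p_hat, bins, mean)),
  observed_rate  = as.vector(tapply(y, bins, mean)),
  n              = as.vector(table(bins)),
  row.names      = levels(bins)
)
print(round(reliability, 3))
```

For well-calibrated predictions the two columns stay close in every bin; large gaps suggest that a calibration step such as Platt scaling may help.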

Train a Logistic Regression model:

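labels <- testing$type
labels <- as.numeric(labels)-1 # factor levels nonspam/spam become 0/1
processed_data <- data.frame(predictProbs[,2],labels) # predicted spam probability and 0/1 label
LOGISTIC_model <- train(labels ~.,data=processed_data, method="glm",family=binomial(logit)) # logistic regression of the label on the SVM output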
LOGISTIC_model$finalModel
## 
## Call:  NULL
## 
## Coefficients:
##       (Intercept)  predictProbs...2.  
##             -3.78               8.76  
## 
## Degrees of Freedom: 1149 Total (i.e. Null);  1148 Residual
## Null Deviance:       1540 
## Residual Deviance: 449   AIC: 453

Display the Logistic Regression model coefficients:

LOGISTIC_model$finalModel$coefficients
##       (Intercept) predictProbs...2. 
##             -3.78              8.76

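A and B are now estimated, up to a sign convention. glm() models P(y=1|x) = 1 / (1 + exp(-(b0 + b1 f(x)))), so in the notation of the sigmoid above A = -b1 = -8.76 and B = -b0 = 3.78, where f(x) here is the SVM's predicted spam probability.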
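The relationship between logistic regression coefficients and Platt's parameters can be checked on simulated scores. The sketch below uses base R only; the data, seed, and variable names are assumptions made for illustration and are unrelated to the spam model.

```r
# Simulated calibration data (illustrative only, not the spam data).
set.seed(42)
n <- 500
score <- rnorm(n)                                          # stand-in for classifier outputs f(x)
y <- rbinom(n, size = 1, prob = plogis(2 * score - 0.5))   # 0/1 labels

# Logistic regression of the labels on the scores.
fit <- glm(y ~ score, family = binomial(link = "logit"))
b <- coef(fit)

# Convert to the sign convention of P(y = 1 | x) = 1 / (1 + exp(A * f + B)).
A <- -unname(b["score"])
B <- -unname(b["(Intercept)"])

# The Platt formula reproduces glm's fitted probabilities.
p_platt <- 1 / (1 + exp(A * score + B))
p_glm <- unname(fitted(fit))
print(c(A = A, B = B))
print(max(abs(p_platt - p_glm)))
```

The same conversion applies to any glm() fit with a binomial logit link: negate the slope to get A and negate the intercept to get B.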

About the author
Linda Haviland

Community Manager
