In 2017, the Property and Casualty Insurance industry saw few new customers enter the market: just 1% in auto and 4% in home, for a total of 2% overall (Bain). Due to the industry’s low volume of new policyholders, being able to retain existing customers becomes a significant priority. Although insurers put tremendous effort into accurately assessing risk and offering the lowest prices, Bain reports that more than half of US policyholders lapse because they can receive (on average) a 20% reduction in price elsewhere.
Unfortunately, many of the strategies insurers use today to reduce churn are largely reactive. Retention rates are important KPIs that allow insurers to keep track of their relationships with policyholders; however, KPIs only assess historical performance and don’t help insurers learn which policies will churn in the future. For underwriters to apply the appropriate intervention strategies, it’s critical that insurers understand which policies are at risk of churning.
AI helps insurers proactively increase their retention rates by predicting which policies are likely to churn in their upcoming renewals. After learning the complex patterns behind why policies churned in the past, AI models can apply those patterns to policies in the future. These models not only show underwriters the general drivers of churn across their portfolio, but also reveal the top reasons of predicted churn for each individual policy. Using these insights, underwriters can address policies at risk based on their unique attributes.
Senior managers can leverage the aggregated predictions of churn to develop data-driven forecasting on renewals, while pricing actuaries can also use insights from these models to improve the competitiveness of their pricing plans across the various segments of their book.
How valuable is this use case?
Depending on the action taken based on model predictions and the size of the book, an improvement in the overall retention rate by 1% could mean a significant improvement in renewal income. Take a book of $1 billion written premium for example: 1% of improved retention rate amounts to a $10 million increase in gross income from this book of business. In addition, model prediction guided actions could potentially improve overall loss ratio; with an improved bottom line, insurers can also profitably grow the top line, improving the health of the book of business.
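The arithmetic behind the $1 billion example can be written out as a rough sketch. It assumes the retained policies renew at the book's average premium, and the function name is illustrative only.

```python
def retention_uplift(written_premium: float, retention_gain: float) -> float:
    """Extra gross renewal premium from a retention-rate improvement.

    Assumes retained policies renew at the book's average premium.
    """
    return written_premium * retention_gain


# A $1 billion book with a 1 percentage point retention improvement.
uplift = retention_uplift(1_000_000_000, 0.01)
print(f"${uplift:,.0f}")  # $10,000,000
```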
About the Data
For illustrative purposes, we are going to use a synthetic historical dataset of a personal auto line where we already know whether past policies churned or not. Insurance churn rate is often evaluated at the policy level; therefore, all the features in this dataset are also organized at the policy level.
The target variable for this use case is a binary variable: a policy churned (1) or not (0). So this is a binary classification problem.
The features relevant to predicting this target revolve around policy data. Below are several examples of features that may be relevant. Beyond these features, we suggest incorporating any additional data your organization collects that could be relevant to predicting churn. DataRobot will help you distinguish which ones are important and which ones aren’t.
Sample Feature List
Avg Driver Age: The average age for all drivers on the policy
Avg Premium per Vehicle: Average premium per vehicle for the current policy term
Avg Vehicle Age: The average age of all vehicles on the policy
Full Coverage Proportion: Percentage of vehicles with full coverage (both liability and physical damage)
# drivers on the policy
Gender of drivers: 0 = all Female; 1 = all Male; 2 = mixed
Min Driver Age: Minimum driver age
Policy premium for the current term
Pct Premium Change: Relative premium change
Policy Credit Indicator: Policy Credit Indicator
Policy Lapse Indicator: Whether the policy has coverage lapses in the past year
Bodily Injury Limit
Multiple Policies Indicator
Target: Whether a policy has churned or not, 1 = Churned; 0 = Not Churned
Number of years policy has been insured by the carrier
Underwriting/Pricing Tiers, 1 = Best tier; 15 = worst tier
Years Prior Insurer: # years the policyholder was insured by the prior carrier
Number of vehicles on the policy
Personal auto insurers usually have several databases: policy, vehicle, and claims. The necessary features from the separate tables should be joined so that churn is evaluated on a policy level. A policy can have more than one vehicle, and churn is defined as when all of the vehicles are removed from the policy, not just one.
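The policy-level roll-up described above might look like the following minimal sketch. The table layout, column names, and the `renewed` flag are hypothetical; a real policy, vehicle, and claims schema will differ. The key point is that a policy counts as churned only when none of its vehicles remain.

```python
from collections import defaultdict

# Hypothetical rows; column names are illustrative, not a real schema.
policies = [
    {"policy_id": "P1", "tenure_years": 5},
    {"policy_id": "P2", "tenure_years": 1},
]
vehicles = [
    {"policy_id": "P1", "vehicle_age": 3, "premium": 900.0, "renewed": True},
    {"policy_id": "P1", "vehicle_age": 7, "premium": 600.0, "renewed": False},
    {"policy_id": "P2", "vehicle_age": 10, "premium": 750.0, "renewed": False},
]


def build_policy_rows(policies, vehicles):
    """Join vehicle rows onto policies and aggregate to one row per policy."""
    by_policy = defaultdict(list)
    for vehicle in vehicles:
        by_policy[vehicle["policy_id"]].append(vehicle)

    rows = []
    for policy in policies:
        vs = by_policy[policy["policy_id"]]
        if not vs:
            continue  # skip policies with no vehicle records
        n = len(vs)
        rows.append({
            **policy,
            "vehicle_count": n,
            "avg_vehicle_age": sum(v["vehicle_age"] for v in vs) / n,
            "avg_premium_per_vehicle": sum(v["premium"] for v in vs) / n,
            # Churn only if every vehicle left the policy, not just one.
            "churn": int(not any(v["renewed"] for v in vs)),
        })
    return rows


policy_rows = build_policy_rows(policies, vehicles)
```

Here P1 keeps one of its two vehicles, so it is not a churned policy, while P2 loses its only vehicle and is.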
DataRobot Automated Machine Learning automates many parts of the modeling pipeline. Instead of hand-coding and manually testing dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. In addition to training the models, DataRobot automates other steps in the modeling process such as processing and partitioning the dataset.
While we will jump straight to the model results, take a look here to see how to use DataRobot from start to finish and how to understand the data science methodologies embedded in its automation.
Interpret the Results
Feature Impact—Which features are important to the model:
For a selected model, it would be helpful to know which features are the key drivers of the model. The Feature Impact plot ranks the features from the most important to the least important and also shows the relative importance of those features. In the below example, we can see that BI_Limit is the most important feature for this model, followed by Avg Driver Age, Tier, Vehicle Count, and so forth.
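One common way to rank features by importance is permutation importance: shuffle one feature's values and measure how much the model's accuracy drops. The sketch below illustrates that general idea with a toy rule-based model and made-up data. It is not presented as DataRobot's exact Feature Impact computation, and the names `toy_model`, `bi_limit`, and `noise` are assumptions for illustration.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    hits = sum(model(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(model, rows, labels, feature, seed=0):
    """Accuracy drop after shuffling one feature's values across rows."""
    rng = random.Random(seed)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    return accuracy(model, rows, labels) - accuracy(model, permuted, labels)


def toy_model(row):
    """Toy stand-in for a trained model: low liability limits predict churn."""
    return int(row["bi_limit"] < 50)


limits = [25, 25, 50, 100, 100, 25, 50, 100]
rows = [{"bi_limit": limit, "noise": i % 3} for i, limit in enumerate(limits)]
labels = [toy_model(r) for r in rows]

importances = {
    feature: permutation_importance(toy_model, rows, labels, feature)
    for feature in ("bi_limit", "noise")
}
```

Because the toy model never reads `noise`, shuffling it cannot change any prediction and its importance is zero, while shuffling `bi_limit` can only hurt accuracy.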
Feature Effects—How does each feature drive the model prediction:
Now that we know which features are important to the model, we can use the Partial Dependence graph to learn how each feature affects the predictions. In the Partial Dependence plot for Tenure (see below), it can be observed that the probability of churn decreases monotonically with policy tenure. In other words, the longer a policy stays with a carrier, the less likely it will churn with everything else held equal.
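The idea behind a partial dependence curve can be sketched in a few lines: fix one feature at each value on a grid for every policy, leave the other features as they are, and average the predictions. The scoring function and data below are hypothetical stand-ins for a trained model and a real book of business.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve


def toy_predict(row):
    """Hypothetical churn probability: falls with tenure, higher for young drivers."""
    base = max(0.05, 0.5 - 0.04 * row["tenure"])
    return base + (0.1 if row["avg_driver_age"] < 25 else 0.0)


rows = [
    {"tenure": 1, "avg_driver_age": 22},
    {"tenure": 4, "avg_driver_age": 41},
    {"tenure": 9, "avg_driver_age": 58},
]

tenure_curve = partial_dependence(toy_predict, rows, "tenure", [0, 2, 4, 6, 8])
for tenure, avg_churn in tenure_curve:
    print(tenure, round(avg_churn, 3))
```

With this toy scorer the averaged churn probability falls at every step of the tenure grid, which is the monotonic pattern described for Tenure.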
Prediction Explanation—What are the drivers for each individual prediction:
People like explanations. When an underwriter sees a very high or very low prediction for policy churn, they may wonder which features contribute to it. Insights at the level of each prediction can not only help the underwriter understand how a prediction is made, but also increase their confidence in using the model. By default, DataRobot provides the top 3 Prediction Explanations, and the user can request up to 10. Model predictions and explanations can be downloaded as a CSV file, and you can control which predictions are included in the download by specifying thresholds for high and low predictions. The graph below shows the top 3 explanations for the 3 highest and 3 lowest predictions. From this graph, you can tell that, in general, the high predictions (i.e., high retention or low churn) are associated with long tenure and higher liability limits, while the low predictions (i.e., low retention or high churn) are associated with younger drivers and higher average premium per vehicle.
A Lift Chart is one way to evaluate model accuracy and effectiveness. The Lift Chart below shows how effectively the model differentiates policyholders who are less likely to renew (on the left) from those who are more likely to renew (on the right). The actual values (orange curve) closely track the predicted values (blue curve), which tells us that the model fits the data well.
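The numbers behind a lift chart can be approximated with a short sketch: sort policies by predicted value, split them into equal-size bins, and compare the average prediction with the average actual outcome in each bin. The binning scheme and the sample values here are assumptions for illustration, not a description of how DataRobot builds its chart.

```python
def lift_table(predicted, actual, n_bins=10):
    """Return (mean predicted, mean actual) per bin, lowest predictions first."""
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer rows than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_actual = sum(a for _, a in chunk) / len(chunk)
        table.append((mean_pred, mean_actual))
    return table


# Made-up predictions and outcomes for four policies, split into two bins.
table = lift_table([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0], n_bins=2)
```

A model that fits well shows the two columns rising together from the first bin to the last.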
After you are able to find the right model that best learns patterns in your data, DataRobot makes it easy to deploy the model into your desired decision environment. Decision environments are the ways in which the predictions generated by the model will be consumed by the appropriate stakeholders in your organization, and how these stakeholders will make decisions using the predictions to impact the overall process.
Ideally, the policy churn model should be integrated with the insurer’s policy administration system so that a policy churn score can be produced for every renewal policy. Complicated business rules are normally kept in the same system to trigger an underwriter review before a renewal policy is processed. “Policy churn score over xx” can be one of the defined business rules.
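A rule of the form “policy churn score over xx” could be sketched as follows. The function name, the policy IDs, and the cutoff used in the usage lines are hypothetical; the actual threshold is a business decision that the text leaves open.

```python
def flag_for_review(churn_scores, threshold):
    """Return policy IDs whose churn score exceeds the threshold, highest first."""
    flagged = [
        (score, policy_id)
        for policy_id, score in churn_scores.items()
        if score > threshold
    ]
    return [policy_id for _, policy_id in sorted(flagged, reverse=True)]


# Hypothetical scores and a purely illustrative cutoff.
scores = {"P1": 0.12, "P2": 0.81, "P3": 0.67}
review_queue = flag_for_review(scores, threshold=0.60)
print(review_queue)  # ['P2', 'P3']
```

Keeping the threshold as a required argument makes the cutoff an explicit choice that can be tuned during a pilot.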
Actuaries, product managers, and other management teams may want to receive monthly reports about policy retention, both actual and predicted.
There are several ways the model can be deployed, depending on how ready it is to be deployed.
DataRobot Drag and Drop or REST API: Before the model is fully integrated into production, a pilot may be beneficial for 1) testing the model performance using new data; 2) monitoring unexpected scenarios so business rules can be adjusted accordingly; and 3) increasing the end users’ confidence in using the model outputs to assist business decision making.
Connection to Other Systems: Once everyone is comfortable with the model and the process, integrating the model into production systems (the policy center, in this case) can maximize the value of the model.
Senior management team
Underwriters can use the predictions to determine whether any action can be taken proactively to avoid a policy churn. Product managers and pricing actuaries will use the predictions to better understand the competitive position of existing pricing plans, so that product managers can adjust their new business model and actuaries can take the findings into account in the next rate review.
Regular reports are going to be produced and distributed to different stakeholders. For organizations with dashboard capabilities, model predictions can be integrated with the dashboard tool so real-time reports can be accessed by different stakeholders.
If the REST API is used to deploy the model, various metrics such as service health, data drift, and accuracy can all be monitored within DataRobot’s platform.
If the user chooses to deploy the model outside of DataRobot, DataRobot MLOps can be leveraged to monitor essentially all the models deployed across the organization.
Fail to make predictions intuitive for underwriters to understand
Fail to help underwriters interpret the predictions and understand why the model makes the predictions
Fail to build in proper business rules to capture abnormal activities
Insurance companies are using machine learning and AI to increase top and bottom line through gaining competitive advantages, reducing expenses, and improving efficiencies. They are optimizing all areas of their business from underwriting to marketing in order to make data-driven decisions to lead to increased profitability.