Predict Overpaid Medical Claims (Fraud, Waste, Abuse)
Overview
Business Problem
According to a study published in the Journal of the American Medical Association (JAMA), overpayments (or fraud, waste, abuse) account for approximately 25% of total healthcare spend each year. The FBI estimates overpayments cost between 3% and 10% of total healthcare spending, or $109 to $365 billion annually. For payers, this not only results in direct losses from overpayments made to providers, but also leads to significant inefficiencies in their resource allocation. Payers accrue costly administrative expenses in both claims adjudication and outdated payment integrity programs as a result of these fraudulent activities.
Traditional payment integrity solutions leave healthcare payers vulnerable to overpayments, primarily because of two common challenges. First, the rules engines that payers rely on to flag claims frequently overgeneralize certain assumptions, so they have a high margin of error and often wrongly classify providers as high risk. Second, because these rules engines do not assign flagged claims a probabilistic score, investigative teams cannot prioritize them, and investigative resources are not directed toward recovering the greatest possible number of overpayments.
Intelligent Solution
AI helps investigative teams at healthcare payers and agencies maximize their impact on the bottom line by making better decisions faster. By learning the complex patterns behind past cases of overpayment, AI can predict the likelihood that new, incoming claims are overpayments. Healthcare payers and agencies can enable their investigative teams to identify high-risk claims as soon as evidence emerges in their data. AI will not only show which claims are likely to be overpayments, but will also reveal the top reasons behind each prediction. By understanding each claim’s unique drivers, investigators can accelerate their review and recovery process.
In addition to using supervised machine learning to identify known behaviors of overpayment, payers can use unsupervised machine learning models to uncover unknown behaviors by discovering claims that appear anomalous. Investigators can use this information to prioritize the review of anomalous claims and to retrain their supervised machine learning models with the results of their latest investigations. Regular retraining keeps blind spots from forming in the models, because they continually adapt to the latest patterns of overpayment. As a result, payers and agencies can stay ahead of the curve and reduce the number of overpaid claims that go unrecovered.
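To make the unsupervised idea concrete, here is a minimal sketch of one simple anomaly score: a robust (median-based) z-score on a single physician-level feature. This is a swapped-in illustration, not the unsupervised models DataRobot runs; the records, the choice of max_prescr_count, and the 3.5 cut-off (a common rule of thumb for this score) are assumptions.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores: distance from the median in units of the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values are identical, so MAD gives no usable scale.
        return [0.0 for _ in values]
    # 0.6745 makes the MAD comparable to a standard deviation for normally distributed data.
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(records, feature, threshold=3.5):
    """Return (npi, score) pairs whose |score| exceeds the threshold, most extreme first."""
    scores = robust_z_scores([r[feature] for r in records])
    flagged = [(r["npi"], round(s, 2)) for r, s in zip(records, scores) if abs(s) > threshold]
    return sorted(flagged, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical physician-level records (assumed values, not from the CMS dataset).
physicians = [
    {"npi": 1, "max_prescr_count": 56},
    {"npi": 2, "max_prescr_count": 48},
    {"npi": 3, "max_prescr_count": 61},
    {"npi": 4, "max_prescr_count": 52},
    {"npi": 5, "max_prescr_count": 410},
]
anomalies = flag_anomalies(physicians, "max_prescr_count")
print(anomalies)
```

Physicians flagged this way could be added to the review queue and, once investigated, become new labeled examples for the supervised model.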
Technical Implementation
About the Data
For this tutorial, we are going to be using a sample dataset provided by CMS Medicare that relates to healthcare overpayments involving physicians, beneficiaries, and medical equipment companies. The data has been aggregated at the physician level. There are almost 12K rows with 17 columns, including the target variable.
Problem Framing
The target variable for this use case is whether or not a claim was overpaid (fraud, waste, or abuse), which makes this a binary classification problem.
The input variables are the prescriptions, costs related to the prescriptions, the specialty of the physician, and other aggregated variables. Each row in the data represents an NPI (National Provider Identifier) and various attributes related to that physician.
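The physician-level layout can be sketched as a group-by over claim-level prescription lines. The input field names (drug, prescr_count, days_supply, cost) and the aggregation rules are assumptions for illustration; the exact definitions used to build the CMS sample dataset are not given here.

```python
from collections import defaultdict

# Hypothetical claim-level prescription lines; field names are illustrative only.
claim_lines = [
    {"npi": 1, "drug": "AZITHROMYCIN", "prescr_count": 12, "days_supply": 60, "cost": 240.0},
    {"npi": 1, "drug": "LEVOTHYROXINE SODIUM", "prescr_count": 10, "days_supply": 300, "cost": 150.0},
    {"npi": 2, "drug": "AMOXICILLIN", "prescr_count": 5, "days_supply": 50, "cost": 40.0},
]


def aggregate_by_physician(lines):
    """Collapse claim lines into one row per NPI with total, max, and mean features."""
    grouped = defaultdict(list)
    for line in lines:
        grouped[line["npi"]].append(line)
    rows = []
    for npi, items in sorted(grouped.items()):
        costs = [i["cost"] for i in items]
        rows.append({
            "npi": npi,
            "drug_list": " ".join(i["drug"] for i in items),  # text feature for text mining
            "total_prescr_count": sum(i["prescr_count"] for i in items),
            "total_prescr_days": sum(i["days_supply"] for i in items),
            "total_prescr_cost": sum(costs),
            "max_prescr_count": max(i["prescr_count"] for i in items),
            "max_days": max(i["days_supply"] for i in items),
            "max_cost": max(costs),
            "mean_cost": sum(costs) / len(costs),
        })
    return rows


physician_rows = aggregate_by_physician(claim_lines)
for row in physician_rows:
    print(row)
```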
Beyond the features listed below, we suggest incorporating any additional data your organization may collect that could be relevant to the use case. As you will see later, DataRobot is able to quickly differentiate important vs unimportant features.
Sample Feature List
Feature Name | Data Type | Description | Example |
---|---|---|---|
overpaid | Binary (Target) | Whether the claim was historically overpaid or not | 0 or 1 |
npi | Numeric | Unique physician identifier (NPI) | 1 |
city | Categorical | City of the physician | DETROIT |
state | Categorical | State of the physician | MI |
drug_list | Text | Names of the drugs prescribed by the physician | AZITHROMYCIN |
total_prescr_count | Numeric | Total count of drugs prescribed by the physician | 22 |
total_prescr_days | Numeric | Total days' supply of drugs prescribed historically | 560 |
total_prescr_cost | Numeric | Total cost of the drugs prescribed | 1837 |
max_prescr_count | Numeric | Maximum count of drugs prescribed by the physician | 56 |
max_days | Numeric | Maximum days' supply of drugs prescribed historically | 3082 |
max_cost | Numeric | Maximum cost of drugs prescribed historically | 2484 |
prescripts_per_drug | Numeric | Prescriptions per drug | 73 |
mean_days_per_drug | Numeric | Average days' supply per drug historically | 2032 |
mean_cost | Numeric | Average cost of drugs historically | 111 |
mean_unit_cost | Numeric | Average per-unit cost of drugs | 1.30 |
payments_from_pharma_comps | Numeric | Payments made by pharmaceutical companies | 92 |
speciality | Categorical | Specialty associated with payments data from pharmaceutical companies | Optometry |
*In DataRobot, every variable except the target is called a feature. We use the same terminology for the rest of the document.
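Before uploading, a quick check that each row matches the feature list can catch export problems early. This is a minimal sketch: the type labels mirror the Data Type column of the sample feature list, and the validation rules (for example, rejecting bool as a numeric value) are our own assumptions rather than DataRobot requirements.

```python
# Column -> expected kind, mirroring the Data Type column of the sample feature list.
EXPECTED_SCHEMA = {
    "overpaid": "binary",
    "npi": "numeric",
    "city": "categorical",
    "state": "categorical",
    "drug_list": "text",
    "total_prescr_count": "numeric",
    "total_prescr_days": "numeric",
    "total_prescr_cost": "numeric",
    "max_prescr_count": "numeric",
    "max_days": "numeric",
    "max_cost": "numeric",
    "prescripts_per_drug": "numeric",
    "mean_days_per_drug": "numeric",
    "mean_cost": "numeric",
    "mean_unit_cost": "numeric",
    "payments_from_pharma_comps": "numeric",
    "speciality": "categorical",
}


def validate_row(row):
    """Return a list of human-readable problems; an empty list means the row looks valid."""
    problems = []
    for column, kind in EXPECTED_SCHEMA.items():
        if column not in row:
            problems.append(f"missing column: {column}")
            continue
        value = row[column]
        if kind == "binary":
            ok = type(value) is int and value in (0, 1)
        elif kind == "numeric":
            # bool is a subclass of int, so exclude it explicitly.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:  # categorical and text columns are expected as strings
            ok = isinstance(value, str)
        if not ok:
            problems.append(f"{column}: expected {kind}, got {value!r}")
    return problems


# Example row built from the Example column of the table.
sample_row = {
    "overpaid": 1, "npi": 1, "city": "DETROIT", "state": "MI",
    "drug_list": "AZITHROMYCIN", "total_prescr_count": 22, "total_prescr_days": 560,
    "total_prescr_cost": 1837, "max_prescr_count": 56, "max_days": 3082,
    "max_cost": 2484, "prescripts_per_drug": 73, "mean_days_per_drug": 2032,
    "mean_cost": 111, "mean_unit_cost": 1.30, "payments_from_pharma_comps": 92,
    "speciality": "Optometry",
}
print(validate_row(sample_row))
```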
Model Training
DataRobot automates many parts of the modeling pipeline. Instead of having to hand-code and manually test dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. Before training and evaluating a diverse set of models on your data, DataRobot also automates preprocessing, feature transformation, and post-processing.
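The core idea of a model leaderboard can be sketched in a few lines: score each candidate's predictions on the same held-out data with one metric and sort. The candidate names and probabilities below are made up, and log loss is used here as a common binary-classification metric; this is not DataRobot's internal code.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Held-out labels and each candidate's predicted probabilities (hypothetical).
y_valid = [1, 0, 0, 1, 0]
candidates = {
    "prior_rate_baseline": [0.4, 0.4, 0.4, 0.4, 0.4],  # always predicts the positive rate
    "model_a": [0.8, 0.2, 0.3, 0.7, 0.1],
    "model_b": [0.6, 0.4, 0.5, 0.4, 0.3],
}

# Lower log loss is better, so an ascending sort puts the best model first.
leaderboard = sorted((log_loss(y_valid, probs), name) for name, probs in candidates.items())
for score, name in leaderboard:
    print(f"{name:<20} {score:.3f}")
```

Keeping a final holdout that no candidate sees during selection guards against choosing a model that only looks best on the validation data by chance.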
We will jump straight to interpreting the model results. Take a look here to see how to use DataRobot’s platform from start to finish and how to understand the data science methodologies embedded in its automation.
Interpret Results
Before we dive into individual plots to understand the model output, it is worth noting that the DataRobot AutoML interpretability suite is based on a model-agnostic framework. This means the same interpretability tools apply to every DataRobot model regardless of the underlying algorithm, so business and technical users alike can understand the results.
Feature Impact
With Feature Impact, DataRobot ranks the features by their feature impact score. DataRobot computes these scores with the permutation importance technique.
The feature with the highest impact on model performance is given a score of 100%, and all other features’ scores are scaled relative to that top feature. In our case, we learn that drug_list, max_prescr_count, max_days, and speciality are the four most impactful features contributing to the performance of this model.
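Permutation importance can be sketched as follows: measure a model's error, shuffle one feature's values to break its link with the target, measure again, and treat the increase as that feature's importance; then scale so the top feature reads 100%. The model below is a hand-written stand-in, log loss is our choice of error metric, and the repeat count and data are assumptions; DataRobot's own sampling and metric choices are not described here.

```python
import math
import random


def predict(row):
    """Hypothetical stand-in model: a hand-written logistic score (not a trained DataRobot model)."""
    z = -4.0 + 0.05 * row["max_prescr_count"] + 0.001 * row["max_days"]
    return 1 / (1 + math.exp(-z))


def log_loss(y_true, y_prob, eps=1e-15):
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def permutation_importance(rows, y, features, n_repeats=5, seed=0):
    """Average increase in log loss when one feature is shuffled, scaled so the top feature is 100."""
    rng = random.Random(seed)
    baseline = log_loss(y, [predict(r) for r in rows])
    raw = {}
    for feature in features:
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[feature] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
            increases.append(log_loss(y, [predict(r) for r in permuted]) - baseline)
        raw[feature] = sum(increases) / n_repeats
    top = max(raw.values())
    if top <= 0:
        return {f: 0.0 for f in raw}  # no feature helps; nothing to scale against
    return {f: 100 * v / top for f, v in raw.items()}


rows = [
    {"max_prescr_count": 10, "max_days": 100, "mean_unit_cost": 1.1},
    {"max_prescr_count": 20, "max_days": 300, "mean_unit_cost": 2.0},
    {"max_prescr_count": 30, "max_days": 200, "mean_unit_cost": 0.9},
    {"max_prescr_count": 120, "max_days": 400, "mean_unit_cost": 1.5},
    {"max_prescr_count": 140, "max_days": 150, "mean_unit_cost": 1.3},
    {"max_prescr_count": 150, "max_days": 250, "mean_unit_cost": 1.7},
]
y = [0, 0, 0, 1, 1, 1]
importance = permutation_importance(rows, y, ["max_prescr_count", "max_days", "mean_unit_cost"])
for feature, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{feature:<18} {score:6.1f}%")
```

Because mean_unit_cost is ignored by the stand-in model, shuffling it leaves every prediction unchanged and its importance is exactly zero, which is how permutation importance should treat an unused feature.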

Feature Effects
Through Feature Impact, we know which features are most important for model performance; however, it is equally critical to understand the marginal effect that the top features have on the predicted outcome. Feature Effects serves this purpose by providing a partial dependence plot that illustrates how changes in a feature’s value affect the average predicted value.
In this case, we see that max_prescr_count, which is the maximum number of prescriptions that a physician prescribed, has a linear relationship with the probability of overpayments.
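Partial dependence itself is a short computation: for each value on a grid, overwrite the feature with that value in every row, predict, and average. This minimal sketch uses a hypothetical stand-in model with invented coefficients and three made-up rows; it illustrates the technique, not DataRobot's implementation.

```python
import math


def predict(row):
    """Hypothetical stand-in model; coefficients are invented for illustration."""
    z = -4.0 + 0.05 * row["max_prescr_count"] + 0.001 * row["max_days"]
    return 1 / (1 + math.exp(-z))


def partial_dependence(rows, feature, grid, model):
    """For each grid value, set `feature` to that value in every row and average the predictions."""
    curve = []
    for value in grid:
        predictions = [model({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve


rows = [
    {"max_prescr_count": 10, "max_days": 100},
    {"max_prescr_count": 60, "max_days": 300},
    {"max_prescr_count": 140, "max_days": 250},
]
curve = partial_dependence(rows, "max_prescr_count", [0, 25, 50, 75, 100, 125, 150], predict)
for value, avg in curve:
    print(f"max_prescr_count={value:>3}: average predicted probability {avg:.3f}")
```

Because this stand-in is logistic, its curve is S-shaped rather than a straight line; a partial dependence plot simply reports whatever shape the model has learned.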

Similarly, we see that claims submitted by physicians with specialties in ‘Psychiatry & Neurology’ have the highest probability of being overpayments, followed by ‘Internal Medicine.’

Prediction Explanations
For each prediction, DataRobot provides an ordered list of explanations. Each explanation is a feature from the dataset and its corresponding value, accompanied by a qualitative indicator of the explanation’s strength (strong (+++), medium (++), or weak (+)) and influence (positive or negative).
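One simple way to produce explanations in this shape is to score each feature's contribution in a linear (logistic) model as its coefficient times the difference from a reference value, then rank by magnitude. This is a minimal sketch under that assumption: DataRobot's own explanation methods are not reproduced here, and the coefficients, reference values, and strength cut-offs are hypothetical.

```python
# Hypothetical linear-model coefficients and reference (average) values; all numbers are invented.
COEFFICIENTS = {"max_prescr_count": 0.05, "max_days": 0.001, "mean_unit_cost": -0.2}
REFERENCE = {"max_prescr_count": 60.0, "max_days": 250.0, "mean_unit_cost": 1.3}


def strength(contribution):
    """Map the size of a log-odds contribution to a qualitative label (assumed cut-offs)."""
    size = abs(contribution)
    if size >= 1.0:
        return "+++"
    if size >= 0.5:
        return "++"
    return "+"


def explain(row, top_n=3):
    """Rank features by |coefficient * (value - reference)| for one prediction."""
    contributions = {f: COEFFICIENTS[f] * (row[f] - REFERENCE[f]) for f in COEFFICIENTS}
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return [
        (feature, row[feature], strength(c), "positive" if c >= 0 else "negative")
        for feature, c in ranked[:top_n]
    ]


claim = {"max_prescr_count": 140, "max_days": 100, "mean_unit_cost": 4.3}
explanations = explain(claim)
for feature, value, label, direction in explanations:
    print(f"{feature}={value}: {label} ({direction})")
```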

Text Mining
Using a Word Cloud, DataRobot indicates which prescription names are related to medical overpayments.
Red prescription names are correlated with a high risk of overpayment, while blue prescription names are correlated with a low risk. The size of each word represents how frequently it occurs in the dataset.
Here we see the bigram levothyroxine sodium, the salt form in which levothyroxine, one of the most prescribed drugs in the US, is dispensed. In this dataset, levothyroxine is more commonly related to overpayments, especially when it is prescribed as levothyroxine sodium. Conversely, prescriptions for amoxicillin (a common antibiotic) are very rarely associated with overpaid claims.
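A rough, model-free approximation of the word-cloud coloring is to compare each unigram's and bigram's overpayment rate with the overall rate. The toy documents and labels below are invented, and DataRobot derives its word-cloud colors from its own text models rather than from this calculation. Bigrams that span two adjacent drug names (for example, "sodium amoxicillin") are an artifact of joining the list into one string; the minimum-count filter keeps rare ones out.

```python
from collections import defaultdict


def terms(text):
    """Lower-cased unigrams and bigrams appearing in one drug_list value."""
    tokens = text.lower().split()
    return set(tokens) | {" ".join(pair) for pair in zip(tokens, tokens[1:])}


def term_risk(docs, labels, min_count=2):
    """Return the overall overpayment rate and, per term, (document count, rate, color)."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for doc, label in zip(docs, labels):
        for term in terms(doc):
            counts[term] += 1
            positives[term] += label
    base_rate = sum(labels) / len(labels)
    table = {}
    for term, n in counts.items():
        if n < min_count:
            continue  # too rare to say anything
        rate = positives[term] / n
        table[term] = (n, rate, "red" if rate > base_rate else "blue")
    return base_rate, table


docs = [
    "LEVOTHYROXINE SODIUM AMOXICILLIN",
    "LEVOTHYROXINE SODIUM",
    "AMOXICILLIN AZITHROMYCIN",
    "AMOXICILLIN",
    "LEVOTHYROXINE SODIUM AZITHROMYCIN",
]
labels = [1, 1, 0, 0, 1]
base_rate, table = term_risk(docs, labels)
for term, (n, rate, color) in sorted(table.items()):
    print(f"{term:<22} n={n} rate={rate:.2f} {color}")
```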

Business Implementation
Decision Environment
After you find the model that best learns the patterns in your data, DataRobot makes it easy to deploy the model into your desired decision environment. A decision environment is the way in which the appropriate stakeholders in your organization consume the model’s predictions and use them to make better decisions, benefiting the overall process.
This is a critical piece of the implementation because it ensures that predictions are used in the real world to reduce overpayments in medical claims. In practice, Special Investigations Units (SIUs) at payers and agencies would be the teams responsible for consuming these predictions to investigate potentially overpaid claims.
Decision Maturity
Automation | Augmentation | Blend
The purpose of this model is not to fully automate the identification of overpaid claims. Instead, the model serves as the first line of defense against overpaid claims, separating obviously safe claims from those likely to be overpayments. This way, the SIU can focus its effort on the claims that matter most.
Model Deployment
There are multiple ways to integrate the model into the business with little disruption:
- Microsoft Power BI (as demonstrated below) would work well for the SIU.
- Create an application that the SIU can interact with. This can be done with the Applications Gallery provided by DataRobot.
Here we use Power BI to present DataRobot predictions and their associated prediction explanations in a dashboard. We assume that the SIU is not familiar with machine learning and that this would be the easiest way for them to consume our model’s predictions, since they would be using a tool they already know.
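If predictions are scored outside the dashboard, one straightforward hand-off is a CSV file that Power BI can load through its Text/CSV connector. This sketch assumes the predictions and explanation strings have already been retrieved from DataRobot (that step is not shown); the column names and the file name are hypothetical.

```python
import csv
import io

FIELDNAMES = ["npi", "overpaid_probability", "explanation_1", "explanation_2", "explanation_3"]


def write_dashboard_csv(predictions, stream):
    """Write one row per physician with the probability and up to three explanation strings."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)  # missing explanation columns become ""
    writer.writeheader()
    for p in predictions:
        row = {"npi": p["npi"], "overpaid_probability": round(p["probability"], 4)}
        for i, text in enumerate(p["explanations"][:3], start=1):
            row[f"explanation_{i}"] = text
        writer.writerow(row)


# Hypothetical scored output.
predictions = [
    {"npi": 5, "probability": 0.91237, "explanations": ["max_prescr_count=140 (+++)", "max_days=3082 (++)"]},
    {"npi": 2, "probability": 0.04, "explanations": []},
]

buffer = io.StringIO()  # in practice: open("siu_predictions.csv", "w", newline="")
write_dashboard_csv(predictions, buffer)
print(buffer.getvalue())
```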



Decision Stakeholders
Decision Executors
The Special Investigations Unit will be the direct consumer of the predictions.
Decision Managers
The decisions would flow through to the Chief Risk Officer who would be responsible for making sure that the process is working correctly.
Decision Authors
Depending on the company’s structure, either the data scientist team would work together with the SIU or the data scientists within the SIU would work directly with DataRobot to generate and deploy the models.
Decision Process
The people responsible for looking into potentially overpaid claims will receive the predictions alongside the Prediction Explanations. They then apply their own expertise to identify the actual overpaid cases and follow the procedure for cancelling the related claims. In this way, your SIU team can triage claims and review those with the highest probability of fraud, waste, or abuse (FWA) first. Transparency into the top risk drivers identified by the model also accelerates the review process. Overall, this helps your payer or agency maximize the utilization and return of your SIU team’s efforts.
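The triage step can be sketched as a filter plus a sort: keep claims above a probability cut-off and take the top N that the team can review in one cycle. The 0.5 cut-off, the capacity, and the claim_id field are assumptions to be tuned to the SIU's workload.

```python
def build_review_queue(scored_claims, capacity, min_probability=0.5):
    """Highest-probability claims first, limited to what the SIU can review in one cycle."""
    eligible = [c for c in scored_claims if c["probability"] >= min_probability]
    eligible.sort(key=lambda c: c["probability"], reverse=True)
    return eligible[:capacity]


# Hypothetical scored claims.
scored_claims = [
    {"claim_id": "A", "probability": 0.35},
    {"claim_id": "B", "probability": 0.92},
    {"claim_id": "C", "probability": 0.67},
    {"claim_id": "D", "probability": 0.81},
]
queue = build_review_queue(scored_claims, capacity=2)
print([c["claim_id"] for c in queue])
```

If recovery value matters more than hit rate, sorting by probability times claim amount (expected overpaid dollars) is a common alternative ordering.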
Model Monitoring
Predictions could be made in batch at the end of each working day, after all medical claims have been received. The cases most likely to be overpayments could then be made available the next day for review and further investigation. Both accuracy and drift should be monitored to determine when the model needs retraining.
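For the drift half of monitoring, one widely used summary is the population stability index (PSI), which compares a feature's distribution at training time with its distribution in a new scoring batch. The sketch below uses equal-width bins over the training range; the bin count, the small floor that avoids log(0), and the usual reading of PSI (below 0.1 stable, 0.1 to 0.25 moderate shift, above 0.25 significant shift) are conventions, not DataRobot settings, and the data is hypothetical.

```python
import math


def population_stability_index(train_values, scoring_values, n_bins=10, floor=1e-4):
    """PSI between two samples of one numeric feature, binned on the training range."""
    lo, hi = min(train_values), max(train_values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant training feature

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # out-of-range values go to the edge bins
            counts[idx] += 1
        # Floor empty bins so the log ratio below stays finite.
        return [max(c / len(values), floor) for c in counts]

    expected = proportions(train_values)
    actual = proportions(scoring_values)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


train = list(range(100))                      # hypothetical training-time max_prescr_count values
same_batch = list(range(100))
shifted_batch = [v + 50 for v in range(100)]  # a batch where prescribing counts jumped

print(round(population_stability_index(train, same_batch), 4))
print(round(population_stability_index(train, shifted_batch), 4))
```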
Implementation Risks
When the results of this use case are integrated into an existing application or system, there are few implementation risks. The most probable causes of failure would be:
- Overpaid cases that were never identified remain labeled as not overpaid in the training data, which can teach the model to overlook those patterns.
- Resistance from stakeholders to accepting the solution and making it part of their processes.

Explore More Use Cases
- Healthcare: Improve Patient Satisfaction Scores. Increase patient satisfaction scores by predicting which patients are likely to submit poor scores and the primary reasons. Design interventions to improve their satisfaction.
- Healthcare: Predict Suicide Warning Signs. Provide a supplementary assessment that helps prevent suicides and save lives by predicting ahead of time who is likely to commit suicide.
- Healthcare: Predict Which Patients Will Admit. Predict which patients are likely to be admitted to proactively improve their health.
- Healthcare: Predict Outpatient Appointment No Shows. Predict in advance which patients are likely to miss their appointments to reduce clinician downtime.