Predict Employee Happiness and Anticipate Turnover
Overview
Business Problem
In the current competitive landscape, employee happiness and retention are key driving factors behind an organization’s success. Companies of all sizes are in a constant battle for talent: to hire employees and, most importantly, to retain them. Getting an employee trained and productive is very resource-intensive and requires considerable upfront investments of both money and time, which makes it even more important to ensure that those employees are happy and don’t end up leaving. Being able to predict which employees are unhappy and the drivers behind their unhappiness would allow organizations to tactfully intervene and modify business practices that contribute to employee unhappiness and attrition.
Intelligent Solution
AI empowers organizations to identify the drivers of employee unhappiness and enables company leaders to address business practices that adversely affect employee satisfaction and, in turn, lead to attrition. Additionally, knowing why a particular employee is unhappy enables the organization to tailor a retention strategy to the individual instead of relying on broader policies that might end up being too generic.
Value Estimation
How would I measure ROI for my use case?
1) Reduced costs of replacing experienced employees, such as the recruiting fees, compensation premiums, and new-hire training typically incurred when bringing in new talent:
(Pre-DR Attrition Pct – Post-DR Attrition Pct)
x Avg Compensation
x Compensation Premium Pct
x Recruiting Fee Multiplier
x New Hire Training Cost Multiplier
= ROI (Savings)
2) Increased operational efficiencies by reducing the number of new hire transitions:
Business Unit (BU) Operational Output (# of units or $) Post-DR Attrition Model
– Business Unit (BU) Operational Output (# of units or $) Pre-DR Attrition Model
= ROI (Savings)
3) Reduced operational risk by lowering the impact of institutional knowledge walking out the door:
This one is more qualitative in nature and not easily quantifiable; however, the impact could also be significant.
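The first two ROI formulas above can be sketched in code as follows. This is a minimal illustration that transcribes the formulas term by term; the function names, parameter names, and input values are placeholders chosen for this example, not figures from any real organization.

```python
def attrition_cost_savings(
    pre_attrition_pct,
    post_attrition_pct,
    avg_compensation,
    compensation_premium_pct,
    recruiting_fee_multiplier,
    new_hire_training_cost_multiplier,
):
    """Formula 1: attrition reduction multiplied by each cost term."""
    return (
        (pre_attrition_pct - post_attrition_pct)
        * avg_compensation
        * compensation_premium_pct
        * recruiting_fee_multiplier
        * new_hire_training_cost_multiplier
    )


def operational_output_gain(post_model_output, pre_model_output):
    """Formula 2: business unit output after the model minus output before it."""
    return post_model_output - pre_model_output


# Placeholder inputs, for illustration only.
savings = attrition_cost_savings(
    pre_attrition_pct=0.15,
    post_attrition_pct=0.10,
    avg_compensation=80_000,
    compensation_premium_pct=0.10,
    recruiting_fee_multiplier=1.2,
    new_hire_training_cost_multiplier=1.1,
)
gain = operational_output_gain(post_model_output=1_200, pre_model_output=1_000)
print(f"Cost savings: {savings:.2f}, output gain: {gain}")
```

Substitute your own attrition rates and cost multipliers; the result is only as reliable as those estimates.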
Technical Implementation
Problem Framing
The target variable for this use case is whether or not the employee will churn or go on a leave of absence due to stress or negative results on employee surveys (Binary; True or False, 1 or 0, etc.). This choice in target makes this a binary classification problem.
The features below represent key factors that are important for predicting employee churn. They encompass each employee’s background, current role, and survey history, which will help DataRobot find relevant patterns across the employee’s HR profile to assess their churn risk.
Beyond the features listed below, we suggest incorporating any additional data your organization may collect that could be relevant to your specific employees. For example, one organization found that the distance between where a person lived and where they worked was predictive for happiness. You should apply your understanding of your specific organization’s context to identify what creative features might work well. As you will see later, DataRobot is able to help you quickly differentiate important/unimportant features.
Other data sources that may also be relevant include more detailed records of company activity (e.g., the average number of emails an employee sends per day), effective job offers, competitors’ average salaries, and data on activity outside the company.
Sample Feature List
Feature Name | Data Type | Description | Data Source | Example | Importance |
---|---|---|---|---|---|
Churn, Attrition or Leave of absence | Binary (Target) | Whether or not the employee churned or took a leave of absence within the following 60 to 180 days | HR data | False | – |
Age | Numeric | Age of the employee | HR data | 35 | Medium |
Gender | Category | Male or Female | HR data | Female | Low |
Education | Category | Education level | HR data | Below College | Medium |
Location | Geo | Where the employee lives | HR data | NY | Low |
MaritalStatus | Category | Marital status of employee | HR data | single | Low |
EducationField | Text | Major of the employee | HR data | Computer Science | Low |
Home_ownership | Category | MORTGAGE, RENT, OWN, OTHER | HR data | RENT | Low |
Department | Category | Department the employee belongs to | HR data | Information Technology | Low |
JobTitle | Category | Job title of employee | HR data | DataScientist | |
Salary | Numeric | Employee’s salary | HR data | 80,000 | High |
OverTime | Numeric | Average overtime hours worked by the employee over the last 3 months | HR data | 46 | High |
ManagerID | Category | The employee’s manager | HR data | J.J. | |
RelationshipwithManager | Numeric | How long the employee has worked under the same manager | HR data | 10 | Low |
Personnel Evaluation | Numeric | Personnel evaluation score | HR data | 4 | Medium |
JobSatisfactionScore | Numeric | Job satisfaction score from survey | Survey | 3 | Medium |
ManagerSatisfactionScore | Numeric | Manager satisfaction score from survey | Survey | 4 | Medium |
SalarySatisfactionScore | Numeric | Salary satisfaction score from survey | Survey | 3 | Medium |
AvgSurveyScore | Numeric | Average of survey 1 to X | Survey | 3.3 | Medium |
Data Preparation
It is very common for HR data to be stored and used as snapshots. History, though, is very important in machine learning: what often matters most is not the current value of a variable but how it has changed over time. For example, a Personnel Evaluation score has little relevance on its own for employees with mid to high scores, but how that score changes from year to year and quarter to quarter will be a key factor. Some other factors to take into account:
- How has the employee’s salary changed since last year?
- Have the employee’s overtime hours increased this year?
- Has the employee’s job title changed recently?
- Has the employee’s survey score drastically changed from last year?
- Has the employee’s manager changed recently?
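The questions above can be turned into change features by comparing two snapshots of the same employee. The sketch below is one way to do that; the `Snapshot` fields and the `change_features` function are hypothetical names chosen for this illustration, and it assumes you can retrieve a prior-period and a current-period snapshot for each employee.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One HR snapshot of one employee (hypothetical schema)."""
    employee_id: str
    salary: float
    overtime_hours: float
    job_title: str
    manager_id: str
    survey_score: float


def change_features(previous: Snapshot, current: Snapshot) -> dict:
    """Derive features describing how an employee's record has changed."""
    if previous.salary:
        salary_change_pct = (current.salary - previous.salary) / previous.salary
    else:
        salary_change_pct = None  # avoid dividing by a zero or missing salary
    return {
        "employee_id": current.employee_id,
        "salary_change_pct": salary_change_pct,
        "overtime_change": current.overtime_hours - previous.overtime_hours,
        "job_title_changed": current.job_title != previous.job_title,
        "manager_changed": current.manager_id != previous.manager_id,
        "survey_score_change": current.survey_score - previous.survey_score,
    }


# Made-up example values.
last_year = Snapshot("E001", 80_000, 40, "Data Scientist", "M01", 4.0)
this_year = Snapshot("E001", 84_000, 46, "Data Scientist", "M02", 3.0)
print(change_features(last_year, this_year))
```

The resulting columns can be added alongside the snapshot features so the model sees both the current state and the trend.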
Depending on the number of employees that have churned, you might need to format your data differently. More specifically, if only a few employees have churned, change your unit of analysis to person per period, so that each row represents one employee in one time period. If you format your data this way, use a group partition within DataRobot with the employeeID column as the key, so that all rows for the same employee stay in the same partition.
Also, the reasons for attrition tend to vary by job title and role. To analyze what drives attrition, split the dataset by job title and role.
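To make the person-per-period format concrete, the sketch below expands one employee record into one row per period and assigns every row of an employee to the same fold. It only illustrates the idea behind a group partition keyed on employeeID; it is not DataRobot's implementation, and the function names, the integer period index, and the fold count are assumptions made for this example.

```python
import zlib


def to_person_periods(employee_id, first_period, last_period, churned):
    """Expand one employee into one row per observed period.

    Only the final period carries churned=True, and only if the employee
    actually churned at the end of the observation window.
    """
    return [
        {
            "employeeID": employee_id,
            "period": period,
            "churned": churned and period == last_period,
        }
        for period in range(first_period, last_period + 1)
    ]


def group_fold(employee_id, n_folds=5):
    """Map an employee to a fold so that all of their rows stay together.

    crc32 is used because it is stable across runs, unlike the built-in
    hash() for strings.
    """
    return zlib.crc32(employee_id.encode("utf-8")) % n_folds


rows = to_person_periods("E001", first_period=1, last_period=4, churned=True)
for row in rows:
    row["fold"] = group_fold(row["employeeID"])
    print(row)
```

Keeping an employee's rows in a single fold prevents the model from being validated on a person it has already seen in training.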
Model Training
DataRobot Automated Machine Learning automates many parts of the modeling pipeline. Instead of hand-coding and manually testing dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. In addition to training the models, DataRobot automates other steps in the modeling process such as processing and partitioning the dataset.
Post-Processing
Most managers can see only the attrition rate of their own team members, so it is hard for them to notice which employees are at high risk of churning compared to other employees. You can turn probabilities into intuitive labels by using two types of thresholds: one set across all employees and one set at the specific department level (a low threshold across all employees and a mid threshold within the specific department).
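One possible reading of the two-threshold scheme is sketched below: a probability below the company-wide threshold is labeled Low, one between the two thresholds is labeled Medium, and one at or above the department threshold is labeled High. The label names, the threshold values, and the `risk_label` function are assumptions made for this illustration; choose thresholds that reflect your own data.

```python
def risk_label(probability, company_threshold, department_threshold):
    """Turn a churn probability into a label using two thresholds."""
    if company_threshold > department_threshold:
        raise ValueError(
            "This sketch assumes the company-wide threshold is the lower one."
        )
    if probability >= department_threshold:
        return "High"
    if probability >= company_threshold:
        return "Medium"
    return "Low"


# Hypothetical thresholds, for illustration only.
COMPANY_THRESHOLD = 0.3
DEPARTMENT_THRESHOLDS = {"Information Technology": 0.5, "Sales": 0.6}

for department, probability in [
    ("Information Technology", 0.2),
    ("Information Technology", 0.4),
    ("Sales", 0.7),
]:
    label = risk_label(
        probability, COMPANY_THRESHOLD, DEPARTMENT_THRESHOLDS[department]
    )
    print(department, probability, label)
```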
Business Implementation
Decision Environment
After you find the model that best learns the patterns in your data to predict employee happiness, DataRobot makes it easy to deploy the model into your desired decision environment. Decision environments are the ways in which the predictions generated by the model will be consumed by the appropriate stakeholders in your organization and how they will ultimately make decisions using the predictions that impact your process.
Decision Maturity
Automation | Augmentation | Blend
At its core, DataRobot empowers your HR professionals and business managers with the information they need to make informed decisions about employees and drive increased employee satisfaction.
While there are use cases where decisions can be automated in a data pipeline, an employee churn model is geared to augment the decisions of your management team. It acts as an intelligent machine that, when combined with the expertise of your HR professionals, will help improve your overall employee experience.
Model Deployment
DataRobot will provide your trusted decision makers with complete transparency on the top key drivers of employee happiness as well as the specific ones at the employee level. This approach will empower them to make both strategic and tactical decisions to ensure employee happiness. Below are a few approaches on how to incorporate the model’s predictions in your decision-making process:
- Connect model output to a centralized HR database by storing the list of predictions in a database that other business systems can access.
- Connect with other HR systems that are embedded in the prediction consumer’s day-to-day business workflow. Results can be integrated into Workday or other HR systems and BI dashboards. Common business intelligence integrations include Tableau, Power BI, and Excel.
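As a rough illustration of the first approach, the sketch below writes a batch of predictions to a database table that other systems could query. It uses an in-memory SQLite database purely as a stand-in for your HR database; the table name, column names, and sample rows are made up for this example.

```python
import sqlite3


def store_predictions(conn, predictions):
    """Write (employee_id, scored_on, churn_probability) rows to a table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS churn_predictions (
            employee_id TEXT NOT NULL,
            scored_on TEXT NOT NULL,
            churn_probability REAL NOT NULL,
            PRIMARY KEY (employee_id, scored_on)
        )
        """
    )
    # Re-scoring the same employee on the same date overwrites the old row.
    conn.executemany(
        "INSERT OR REPLACE INTO churn_predictions VALUES (?, ?, ?)",
        predictions,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a shared HR database
store_predictions(
    conn,
    [("E001", "2024-01-01", 0.12), ("E002", "2024-01-01", 0.67)],
)

# A downstream system or dashboard could then query the stored list.
high_risk = conn.execute(
    "SELECT employee_id FROM churn_predictions "
    "WHERE churn_probability >= ? ORDER BY employee_id",
    (0.5,),
).fetchall()
print(high_risk)
```

Because the table holds sensitive HR information, access to it should be restricted in the same way as the rest of your HR data.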
Decision Stakeholders
Decision executors are both the HR professionals as well as the business managers entrusted with enhancing employee experience. The diagram below depicts a workflow leveraging the same model.

Decision Process
Thresholds can be set to determine whether a prediction constitutes a foreseen churn event or not. Assign clear action items for each level of threshold so that the HR professionals and business managers can prescribe the necessary intervention strategies.
- Low Risk: No immediate action.
- Medium Risk: Investigate underlying features that might be contributing to higher attrition risk. Identify the ones that the business can act on immediately, and do so accordingly. Inspecting the Feature Impact and Feature Importance charts will help the business make better decisions.
- High Risk: For employees with a higher risk of churning, a more tactful approach might be required. Not every employee will have the same reason for being unhappy and considering leaving. Here the Prediction Explanations and the What-If application can be great tools to help explore potential alternatives that might improve a specific employee’s happiness at work and make them reconsider leaving. Coupled with your talented HR team, this highly personalized strategy can help stem the attrition of your valuable, talented employees.
Model Monitoring
- Decision Operators: HR IT/System Operations, HR Data Scientists
- Prediction Cadence: Both batch predictions on a quarterly basis and ad hoc ones
- Model Retraining Cadence: Models retrained once data drift reaches an assigned threshold; otherwise, retrain the models at the beginning of every new operating quarter.
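The retraining rule above can be sketched as a simple check. The text does not say which drift metric to use, so this example assumes the Population Stability Index (PSI), a commonly used drift measure, computed over binned feature counts; the function names, bin counts, and the 0.2 threshold are assumptions made for this illustration.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between a training-time and a scoring-time binned distribution."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("Both distributions must use the same bins.")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        # Floor the proportions so an empty bin does not cause log(0).
        expected_pct = max(expected / expected_total, eps)
        actual_pct = max(actual / actual_total, eps)
        psi += (actual_pct - expected_pct) * math.log(actual_pct / expected_pct)
    return psi


def should_retrain(psi, drift_threshold, is_new_quarter):
    """Retrain when drift reaches the threshold, or at each new quarter."""
    return psi >= drift_threshold or is_new_quarter


# Made-up bin counts for one feature (e.g., overtime hours in four bins).
training_bins = [250, 400, 250, 100]
scoring_bins = [100, 300, 350, 250]
psi = population_stability_index(training_bins, scoring_bins)
print(f"PSI = {psi:.3f}, retrain = {should_retrain(psi, 0.2, False)}")
```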
Implementation Risks
The monitoring risk of this model is relatively low. For this type of use case, the model is normally retrained once a quarter or potentially even less frequently. On the prediction side, given the frequency we highlighted above (batch and ad hoc), the overall prediction implementation risk is very low. There are potential operational risks regarding the proper use of the model and the results it produces; in most organizations, this type of information will be strictly contained within the HR organization and only shared with key business leaders.
Trusted AI
In addition to traditional risk analysis, the following elements of AI Trust may require attention in this use case.
Target leakage: Target leakage occurs when information that would not be available at the time of prediction is used to train the model. That is, particular features may leak information about the eventual outcome, which artificially inflates the performance of the model in training. This use case requires the aggregation of historical data, making it vulnerable to potential target leakage. In the design of this model and the preparation of data, it is pivotal to identify the point of prediction and ensure that no data recorded after that time (e.g., survey-related data, hours worked, salary, title, manager changes, etc.) is included. DataRobot additionally supports robust target leakage detection in the second round of exploratory data analysis and the selection of the Informative Features feature list during Autopilot.
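One simple guard against this kind of leakage during data preparation is to drop every record dated after the point of prediction before aggregating features. The sketch below shows the idea; the record layout, the `recorded_on` field, and the sample dates are hypothetical.

```python
from datetime import date


def records_up_to_prediction_point(records, prediction_date):
    """Keep only records that were already known at the point of prediction."""
    return [r for r in records if r["recorded_on"] <= prediction_date]


# Made-up event history for one employee.
history = [
    {"event": "survey", "recorded_on": date(2023, 3, 15)},
    {"event": "salary_change", "recorded_on": date(2023, 6, 1)},
    {"event": "manager_change", "recorded_on": date(2023, 9, 20)},
]

prediction_date = date(2023, 7, 1)
usable = records_up_to_prediction_point(history, prediction_date)
print([r["event"] for r in usable])
```

Features such as averages or year-over-year changes should then be computed only from the filtered records.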
Bias & Fairness: This use case leverages features that may be categorized as protected or may be sensitive (age, gender, location, education, salary, etc.). Therefore, it may be advisable to assess whether error rates are equal across protected or sensitive groups in the model; for example, compare whether male and female employees have equivalent false negative and false positive rates, so that no one gender group is disproportionately and erroneously targeted for intervention. Mitigation techniques may be explored at various stages of the modeling process if deemed necessary. Special handling may also be needed if a protected or sensitive characteristic surfaces as one of the primary drivers of an individual’s potential attrition in prediction explanations.
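The error-rate comparison described above can be sketched as follows. This is a minimal, hand-rolled check rather than a description of DataRobot's bias and fairness tooling; the function name, the row layout, and the sample rows are assumptions made for this example.

```python
from collections import defaultdict


def error_rates_by_group(rows):
    """Compute false positive and false negative rates for each group.

    rows: iterable of (group, actual_churn, predicted_churn) with booleans.
    A rate is None when the group has no actual negatives or positives.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "pos": 0, "neg": 0})
    for group, actual, predicted in rows:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # churned, but the model missed it
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # stayed, but the model flagged them
    return {
        group: {
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for group, c in counts.items()
    }


# Made-up labels and predictions, for illustration only.
sample = [
    ("Female", True, True),
    ("Female", True, False),
    ("Female", False, False),
    ("Female", False, True),
    ("Male", True, True),
    ("Male", False, False),
]
for group, rates in error_rates_by_group(sample).items():
    print(group, rates)
```

Large gaps between groups in either rate are a signal to investigate further before acting on the model's predictions.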
