Predict Employee Happiness and Anticipate Turnover

Nurture your workforce by predicting the future happiness of employees and proactively reduce the likelihood of employee churn or turnover.

Business Problem

In the current competitive landscape, employee happiness and retention are key driving factors behind an organization’s success. Companies of all sizes are in a constant battle for talent: both to hire employees and, most importantly, to retain them. Getting an employee trained and productive is very resource-intensive and requires considerable upfront investments of both money and time, which makes it even more important to ensure that those employees are happy and don’t end up leaving. Being able to predict which employees are unhappy, and the drivers behind their unhappiness, would allow organizations to tactfully intervene and modify business practices that contribute to employee unhappiness and attrition.

Intelligent Solution

AI empowers organizations to identify drivers of employee unhappiness and enables company leaders to address business practices that adversely impact employee satisfaction and, in turn, lead to employee attrition. Additionally, being able to identify why a particular employee is unhappy enables the organization to tailor a retention strategy to the individual instead of relying only on broader policies that might end up being too generic.

Value Estimation

How would I measure ROI for my use case? 

1) Reduced costs of replacing experienced employees, such as the recruiting fees, compensation premiums, and new-hire training typically incurred when bringing in new talent:

(Pre-DR Attrition Pct – Post-DR Attrition Pct) 

x Avg Compensation 

x Compensation Premium Pct 

x Recruiting Fee Multiplier

x New Hire Training Cost Multiplier

= ROI (Savings) 
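The arithmetic of this first formula can be sketched in a few lines of Python. This is a minimal illustration that multiplies the terms exactly as listed above; the function name and all input values are hypothetical and should be replaced with your organization's own figures.

```python
def attrition_savings(pre_attrition_pct, post_attrition_pct, avg_compensation,
                      compensation_premium_pct, recruiting_fee_multiplier,
                      new_hire_training_cost_multiplier):
    """Multiply the terms of the first ROI formula in the order they are listed."""
    attrition_reduction = pre_attrition_pct - post_attrition_pct
    return (attrition_reduction
            * avg_compensation
            * compensation_premium_pct
            * recruiting_fee_multiplier
            * new_hire_training_cost_multiplier)


# Hypothetical inputs, chosen only to illustrate the calculation.
savings = attrition_savings(
    pre_attrition_pct=0.15,
    post_attrition_pct=0.10,
    avg_compensation=80_000,
    compensation_premium_pct=0.10,
    recruiting_fee_multiplier=1.2,
    new_hire_training_cost_multiplier=1.5,
)
print(f"Estimated savings: {savings:,.2f}")
```

The result is expressed in the same currency unit as Avg Compensation.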

2) Increased operational efficiencies by reducing the number of new hire transitions:

Business Unit (BU) Operational Output (# of units or $) Post-DR Attrition Model 

– Business Unit (BU) Operational Output (# of units or $) Pre-DR Attrition Model 

= ROI (Savings)

3) Reduced operational risk by lowering the impact of institutional knowledge walking out the door:

This one is more qualitative in nature and not easily quantifiable; however, the impact could also be significant.

Technical Implementation

Problem Framing

The target variable for this use case is whether or not the employee will churn or go on a leave of absence due to stress or negative results on employee surveys (Binary; True or False, 1 or 0, etc.). This choice in target makes this a binary classification problem.

The features below represent key factors that are important for predicting employee churn. They encompass each employee’s background, current role, and survey history, which will help DataRobot find relevant patterns across the employee’s HR profile to assess their churn risk.

Beyond the features listed below, we suggest incorporating any additional data your organization may collect that could be relevant to your specific employees. For example, one organization found that the distance between where a person lived and where they worked was predictive for happiness. You should apply your understanding of your specific organization’s context to identify what creative features might work well. As you will see later, DataRobot is able to help you quickly differentiate important/unimportant features. 

Other external data sources that may also be relevant include more detailed company activity data (e.g., the average number of emails an employee sends per day), effective job offers, competitors’ average salaries, and data on activity outside the company.

Sample Feature List

Feature Name | Data Type | Description | Data Source | Example | Importance
--- | --- | --- | --- | --- | ---
Churn, Attrition or Leave of absence | Binary (Target) | Whether or not the employee churned or took a leave of absence between 60 and 180 days later | HR data | False |
Age | Numeric | Age of the employee | HR data | 35 | Medium
Gender | Category | Male or Female | HR data | Female | Low
Education | Category | Education level | HR data | Below College | Medium
Location | Geo | Where the employee lives | HR data | NY | Low
MaritalStatus | Category | Marital status of the employee | HR data | single | Low
EducationField | Text | Major of the employee | HR data | Computer Science | Low
Home_ownership | Category | MORTGAGE, RENT, OWN, OTHER | HR data | RENT | Low
Department | Category | Department the employee belongs to | HR data | Information Technology | Low
JobTitle | Category | Job title of the employee | HR data | DataScientist |
Salary | Numeric | Employee’s salary | HR data | 80,000 | High
OverTime | Numeric | Average overtime hours over the last 3 months | HR data | 46 | High
ManagerID | Category | Manager of the employee | HR data | J.J. |
RelationshipwithManager | Numeric | How long the employee has worked under the same manager | HR data | 10 | Low
Personnel Evaluation | Numeric | Personnel evaluation score | HR data | 4 | Medium
JobSatisifactionScore | Numeric | Job satisfaction score from survey | Survey | 3 | Medium
ManagerSatisifactionScore | Numeric | Manager satisfaction score from survey | Survey | 4 | Medium
SalarySatisifactionScore | Numeric | Salary satisfaction score from survey | Survey | 3 | Medium
AvgSurveyScore | Numeric | Average of survey 1 to X | Survey | 3.3 | Medium

Data Preparation

It is very common for HR data to be stored and used as snapshots. History, though, is very important in machine learning: what matters most is not the current values of the variables but how those values have changed over time. For example, Personnel Evaluation is not highly relevant for employees with mid to high scores, but how Personnel Evaluation scores change from year to year and quarter to quarter will be a key factor. Some other factors to take into account:

  • How does the salary change from last year?
  • Has the employee’s overtime hours increased this year?
  • Has the employee’s job title changed recently?
  • Has the employee’s survey score drastically changed from last year?
  • Has the employee’s manager changed recently?
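The questions above can be turned into change features by comparing two snapshots of the same employee. The following is a minimal sketch that assumes each snapshot is a plain dictionary keyed by the feature names in the sample feature list; the function name, the derived feature names, and the sample values are hypothetical.

```python
def change_features(previous, current):
    """Derive change features from two snapshots of one employee."""
    return {
        # How does the salary change from last year?
        "salary_change_pct": (current["Salary"] - previous["Salary"]) / previous["Salary"],
        # Has the employee's overtime increased?
        "overtime_increased": current["OverTime"] > previous["OverTime"],
        # Has the employee's job title changed?
        "job_title_changed": current["JobTitle"] != previous["JobTitle"],
        # How much has the survey score moved?
        "survey_score_change": current["AvgSurveyScore"] - previous["AvgSurveyScore"],
        # Has the employee's manager changed?
        "manager_changed": current["ManagerID"] != previous["ManagerID"],
    }


# Hypothetical snapshots for one employee, one year apart.
last_year = {"Salary": 80_000, "OverTime": 30, "JobTitle": "DataScientist",
             "AvgSurveyScore": 3.8, "ManagerID": "J.J."}
this_year = {"Salary": 82_000, "OverTime": 46, "JobTitle": "DataScientist",
             "AvgSurveyScore": 3.3, "ManagerID": "K.L."}

features = change_features(last_year, this_year)
print(features)
```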

Depending on the number of employees who have churned, you might need to format your data differently. More specifically, if only a few employees have churned, change your unit of analysis to person per period, so that each row describes one employee in one period. If you format your data this way, use a group partition in DataRobot with the employeeID column as the key, so that all rows for the same employee land in the same partition.
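To illustrate the idea, the sketch below reshapes per-employee histories into person-period rows and then assigns rows to folds by employeeID. It is a conceptual illustration in plain Python, not DataRobot's implementation: DataRobot performs group partitioning itself once you select the key column, and the function names and sample data here are hypothetical.

```python
from collections import defaultdict


def person_period_rows(histories):
    """Flatten {employee_id: [(period, features, churned), ...]} into one row per employee per period."""
    rows = []
    for employee_id, history in histories.items():
        for period, features, churned in history:
            rows.append({"employeeID": employee_id, "period": period,
                         **features, "churned": churned})
    return rows


def group_folds(rows, n_folds, key="employeeID"):
    """Assign all rows that share the same key value to the same fold."""
    groups = sorted({row[key] for row in rows})
    fold_of_group = {group: i % n_folds for i, group in enumerate(groups)}
    folds = defaultdict(list)
    for row in rows:
        folds[fold_of_group[row[key]]].append(row)
    return dict(folds)


# Hypothetical quarterly histories for three employees.
histories = {
    "E1": [("2023Q1", {"OverTime": 30}, 0), ("2023Q2", {"OverTime": 46}, 1)],
    "E2": [("2023Q1", {"OverTime": 10}, 0), ("2023Q2", {"OverTime": 12}, 0)],
    "E3": [("2023Q1", {"OverTime": 20}, 0)],
}

rows = person_period_rows(histories)
folds = group_folds(rows, n_folds=2)
```

Keeping every row of an employee in one fold prevents the model from being validated on an employee it has already seen in training.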

Also, the reasons for attrition tend to vary by job title and role. For causal analysis, we need to split the dataset by job title and role.

Model Training

DataRobot Automated Machine Learning automates many parts of the modeling pipeline. Instead of hand-coding and manually testing dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. In addition to training the models, DataRobot automates other steps in the modeling process such as processing and partitioning the dataset.


Most managers can only see the attrition rate of their own team members, so it is hard for them to notice which employees are at high risk of churning compared to other employees. You can turn probabilities into intuitive labels by using two types of thresholds: one for all employees and one at the department level (a low threshold across all employees and a mid threshold within a specific department).
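One possible reading of the two-threshold approach is sketched below: a probability below the company-wide threshold is labeled Low, a probability that clears the company-wide threshold but not the department threshold is labeled Medium, and a probability that clears both is labeled High. The function name, label names, and threshold values are assumptions made for illustration only.

```python
def risk_label(probability, company_threshold, department_threshold):
    """Map a churn probability to a label using a company-wide and a department-level threshold."""
    if company_threshold > department_threshold:
        raise ValueError("This sketch assumes the company-wide threshold is the lower one.")
    if probability >= department_threshold:
        return "High"
    if probability >= company_threshold:
        return "Medium"
    return "Low"


# Hypothetical thresholds: 0.30 across all employees, 0.50 within one department.
for probability in (0.20, 0.40, 0.70):
    print(probability, risk_label(probability, company_threshold=0.30, department_threshold=0.50))
```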

Business Implementation

Decision Environment 

Once you have found the model that best learns the patterns in your data to predict employee happiness, DataRobot makes it easy to deploy that model into your desired decision environment. Decision environments are the ways in which the model’s predictions are consumed by the appropriate stakeholders in your organization, and how those stakeholders ultimately use the predictions to make decisions that impact your process.

Decision Maturity 

Automation | Augmentation | Blend

At its core, DataRobot empowers your HR professionals and business managers with the information they need to make informed decisions about employees and drive increased employee satisfaction. 

While there are use cases where decisions can be automated in a data pipeline, an employee churn model is geared to augment the decisions of your management team. It acts as an intelligent machine that, when combined with the expertise of your HR professionals, will help improve your overall employee experience.

Model Deployment 

DataRobot will provide your trusted decision makers with complete transparency on the top key drivers of employee happiness as well as the specific drivers at the employee level. This approach will empower them to make both strategic and tactical decisions to ensure employee happiness. Below are a few approaches for incorporating the model’s predictions into your decision-making process:

  • Connect model output to a centralized HR database by storing the list in a database that can be accessed by other business systems 
  • Connect to other HR systems that are embedded in the prediction consumer’s day-to-day business workflow. Results can be integrated into Workday or other HR systems and BI dashboards. Common business intelligence integrations include Tableau, Power BI, and Excel.

Decision Stakeholders

Decision executors are the HR professionals and the business managers entrusted with enhancing employee experience. The diagram below depicts a workflow leveraging the same model.

Decision Process

Thresholds can be set to determine whether a prediction constitutes a foreseen churn event or not. Assign clear action items for each level of threshold so that the HR professionals and business managers can prescribe the necessary intervention strategies.  

  • Low Risk: No immediate action.
  • Medium Risk: Investigate underlying features that might be contributing to higher attrition risk. Identify the ones that the business can immediately act on to improve, and do so accordingly. Inspecting the Feature Impact and Feature Importance charts will help the business make better decisions.
  • High Risk: For employees with a higher risk of churning, a more tactful approach might be required. Not every employee will have the same reason for being unhappy and considering leaving. Here the Prediction Explanations and the What-If application can be great tools to help explain potential alternatives that might improve a specific employee’s happiness at work and make them reconsider a decision to leave. Coupled with your talented HR team, this highly personalized strategy can help stem the attrition of your valuable, talented employees.
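The tiers above can be wired to action items with a small lookup, as in the following sketch. The cutoff values, the employee IDs, and the function names are hypothetical, and the action text is only a condensed paraphrase of the list above.

```python
ACTIONS = {
    "Low": "No immediate action.",
    "Medium": "Investigate contributing features and act on those the business can change.",
    "High": "Review the employee-level explanations and plan a personalized retention strategy with HR.",
}


def tier_for(probability, medium_cutoff, high_cutoff):
    """Return the risk tier for one predicted churn probability."""
    if probability >= high_cutoff:
        return "High"
    if probability >= medium_cutoff:
        return "Medium"
    return "Low"


def action_queue(predictions, medium_cutoff=0.3, high_cutoff=0.6):
    """Return (employee_id, tier, action) tuples, highest predicted risk first."""
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    queue = []
    for employee_id, probability in ranked:
        tier = tier_for(probability, medium_cutoff, high_cutoff)
        queue.append((employee_id, tier, ACTIONS[tier]))
    return queue


# Hypothetical predicted churn probabilities.
queue = action_queue({"E1": 0.41, "E2": 0.72, "E3": 0.08})
for employee_id, tier, action in queue:
    print(employee_id, tier, action)
```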

Model Monitoring
  • Decision Operators: HR IT/System Operations, HR Data Scientists 
  • Prediction Cadence: Both batch predictions on a quarterly basis and ad hoc ones 
  • Model Retraining Cadence: Models retrained once data drift reaches an assigned threshold; otherwise, retrain the models at the beginning of every new operating quarter.
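The retraining rule above can be expressed as a simple check, sketched below. It assumes that operating quarters coincide with calendar quarters and that drift is summarized as a single score from your monitoring tooling; both are assumptions, and the function names and sample values are hypothetical.

```python
from datetime import date


def quarter_start(day):
    """Return the first day of the calendar quarter that contains `day`."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)


def should_retrain(drift_score, drift_threshold, last_trained, today):
    """Retrain when drift reaches the threshold, or when a new quarter has started since the last training."""
    if drift_score >= drift_threshold:
        return True
    return last_trained < quarter_start(today)


# Hypothetical checks.
print(should_retrain(0.25, 0.20, date(2024, 4, 2), date(2024, 5, 10)))  # drift at or above threshold
print(should_retrain(0.05, 0.20, date(2024, 4, 2), date(2024, 5, 10)))  # already trained this quarter
print(should_retrain(0.05, 0.20, date(2024, 3, 15), date(2024, 4, 1)))  # new quarter has started
```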

Implementation Risks

The monitoring risk of this model is relatively low. For this type of use case, the model is normally retrained once a quarter or potentially even less frequently. On the prediction side, given the cadence highlighted above (batch and ad hoc), the overall prediction implementation risk is very low. There are potential operational risks regarding the proper use of the model and the results it produces; in most organizations, this type of information will be strictly contained within the HR organization and shared only with key business leaders.

Trusted AI

In addition to traditional risk analysis, the following elements of AI Trust may require attention in this use case. 

Target leakage: Target leakage occurs when information that would not be available at the time of prediction is used to train the model. That is, particular features may leak information about the eventual outcome and artificially inflate the performance of the model in training. This use case requires the aggregation of historical data, making it vulnerable to potential target leakage. In the design of this model and the preparation of data, it is pivotal to identify the point of prediction and ensure that no data (e.g., survey-related data, hours worked, salary, title, manager changes, etc.) from after that time is included. DataRobot additionally supports robust target leakage detection in the second round of exploratory data analysis and the selection of the Informative Features feature list during Autopilot.
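As a minimal sketch of enforcing the point of prediction during data preparation, the snippet below keeps only records dated on or before a chosen prediction point before any aggregation takes place. The record layout, the recorded_on field, and the sample values are hypothetical.

```python
from datetime import date


def records_available_at(records, prediction_point):
    """Keep only records dated on or before the point of prediction."""
    return [r for r in records if r["recorded_on"] <= prediction_point]


# Hypothetical event records for one employee.
records = [
    {"employeeID": "E1", "field": "AvgSurveyScore", "value": 3.8, "recorded_on": date(2023, 3, 31)},
    {"employeeID": "E1", "field": "OverTime", "value": 46, "recorded_on": date(2023, 6, 30)},
    {"employeeID": "E1", "field": "AvgSurveyScore", "value": 2.1, "recorded_on": date(2023, 9, 30)},
]

prediction_point = date(2023, 6, 30)
usable = records_available_at(records, prediction_point)
print(len(usable), "of", len(records), "records can be used for training features")
```

Applying the cutoff before computing aggregates such as averages or year-over-year changes keeps post-prediction information out of every derived feature.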

Bias & Fairness: This use case leverages features that may be categorized as protected or may be sensitive (age, gender, location, education, salary, etc.). Therefore, it may be advisable to assess whether error rates are equal across protected or sensitive groups in the model; for example, compare whether male and female employees have equivalent false negative and false positive rates, so that no one gender group is disproportionately and erroneously targeted for intervention. Mitigation techniques may be explored at various stages of the modeling process if deemed necessary. Special handling may also be needed if a protected or sensitive characteristic is surfaced as one of the primary drivers of an individual’s potential attrition in prediction explanations.
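The error-rate comparison described above can be sketched as follows. This is a minimal illustration in plain Python that assumes binary actual and predicted labels (1 = churn) and a group value per employee; the function name and the sample records are hypothetical and are not drawn from real data.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    `records` is an iterable of (group, actual, predicted) with 0/1 labels.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, actual, predicted in records:
        if actual == 1:
            counts[group]["tp" if predicted == 1 else "fn"] += 1
        else:
            counts[group]["fp" if predicted == 1 else "tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        rates[group] = {
            "false_positive_rate": c["fp"] / negatives if negatives else None,
            "false_negative_rate": c["fn"] / positives if positives else None,
        }
    return rates


# Hypothetical (group, actual, predicted) records.
records = [
    ("Female", 1, 1), ("Female", 1, 0), ("Female", 0, 0), ("Female", 0, 1),
    ("Male", 1, 1), ("Male", 1, 1), ("Male", 0, 0), ("Male", 0, 0),
]

rates = error_rates_by_group(records)
for group, group_rates in rates.items():
    print(group, group_rates)
```

A large gap between groups in either rate is a signal to investigate further before acting on the predictions.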
