HR Recruiting Use Case Series: Part 2
One of the most fundamental challenges of machine learning for HR is defining success. How do you define success for a new hire? Would all your hiring managers and HR leaders answer this question the same way? Does success mean getting hired, or does a new hire need to stay for a certain amount of time? Does success mean getting promoted? What about people who outperform expectations but don’t move up the ladder? All of these are valid definitions of success, and each could also be part of a composite definition that cuts across the dimensions of attendance, behaviors, retention, trajectory, and performance.
In the context of recruiting algorithms, whatever you (the analyst or data scientist) define as success is the profile that your machine learning algorithm will learn to optimize for. My blog post today is about crafting the best possible target when designing a recruiting algorithm. The target feature encodes your definition of success and does not need to be the same definition that other recruiting teams use. I will present choices and considerations specific to recruiting algorithms.
Alternate Definitions of Success
Here are nine approaches for defining success, separated into pre-hire and post-hire stages. In my experience, #4 and #7 are the most common, but I have a personal preference for #2, #3, and a blend of #6 through #9.
Pre-Hire Definitions of Success
1. Who gets an initial interview?
2. Who reaches the penultimate stage in the recruiting process?
3. Who gets an offer?
4. Who gets hired?
5. Who starts?
Post-Hire Definitions of Success
6. Who reaches X months of tenure?
7. Who achieves a high performance rating in their first review cycle?
8. Who meets or surpasses quantitative metrics for the role? (e.g., call resolution rate, customer satisfaction scores, sales revenue closed, attendance, safety incidents)
9. Who gets promoted within a certain number of years?
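Because the five pre-hire definitions are successive stages of a single funnel, each one can be expressed as a threshold on the furthest stage an applicant reached. The Python sketch below illustrates this under stated assumptions: the `Stage` values and the `pre_hire_label` helper are hypothetical, and it assumes that every applicant who reached a later stage also passed every earlier one, which may not hold in every applicant tracking system.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical ordered recruiting stages; a higher value is further along."""
    APPLIED = 0
    INTERVIEWED = 1  # definition #1
    FINALIST = 2     # definition #2 (penultimate stage)
    OFFERED = 3      # definition #3
    HIRED = 4        # definition #4 (offer accepted)
    STARTED = 5      # definition #5


def pre_hire_label(furthest_stage: Stage, success_stage: Stage) -> int:
    """Return 1 if the applicant reached at least `success_stage`, else 0."""
    return int(furthest_stage >= success_stage)


applicants = {
    "a1": Stage.APPLIED,
    "a2": Stage.FINALIST,
    "a3": Stage.OFFERED,  # received an offer but declined
    "a4": Stage.STARTED,
}

# The same data labeled under three of the pre-hire definitions
for target in (Stage.FINALIST, Stage.OFFERED, Stage.HIRED):
    labels = {aid: pre_hire_label(stage, target) for aid, stage in applicants.items()}
    print(target.name, labels)
```

Moving the threshold up the funnel, from FINALIST to OFFERED to HIRED, removes applicants from the positive class without changing anything else about the data.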
Hires vs Offers
Instead of training your model on the applicants who get hired (#4), consider training on the applicants who receive an offer (#3). The difference between the two definitions is simple: “applicants hired” excludes people whom your hiring managers wanted but who chose not to join. Hiring managers may even have wanted these people more than the person who was eventually hired, since additional offers of employment are made in response to a top-choice candidate declining. ‘Offers’ is the better choice if your goal is to teach a machine to make decisions that reflect hiring managers’ preferences.
Furthermore, a model trained on hires (#4) may perform poorly because candidates decline job offers for reasons that have nothing to do with their qualifications. Suppose our organization is looking for accounting professionals. If two applicants whose resumes contain the text “Master in Accounting” both receive offers, but only one is hired because the other declines, a model trained on hires cannot learn that this was a positive attribute. With ‘hires’ as the definition of success, one applicant would be in the positive class and the other in the negative class, which is akin to the text feature being irrelevant. With ‘offers’ (#3) as the definition of success, both records would be in the positive class.
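A toy version of the accounting example makes the label difference concrete. The `Applicant` record and its field names are hypothetical; the point is only that the two applicants sharing the feature disagree under the ‘hires’ target and agree under the ‘offers’ target.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    """Hypothetical applicant record for illustration only."""
    applicant_id: str
    has_masters_in_accounting: bool
    received_offer: bool
    was_hired: bool


# Both applicants share the resume feature and both received offers
pool = [
    Applicant("a1", True, True, True),   # accepted the offer
    Applicant("a2", True, True, False),  # declined the offer
]

hires_labels = [int(a.was_hired) for a in pool]        # target #4
offers_labels = [int(a.received_offer) for a in pool]  # target #3
print("hires target: ", hires_labels)   # one positive, one negative
print("offers target:", offers_labels)  # both positive
```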
Offers vs Finalists
Defining applicant success as ‘reaching the final stage’ (#2) is another option to consider. It has a few advantages over defining success based on ‘offers’ (#3), especially for highly selective organizations. In an organization that makes offers to 1 in 100 applicants but has four very high-quality final-stage candidates for every opening, it may be better to train on a 4:96 positive-to-negative split instead of 1:99. The four final-stage applicants have all already surpassed 96 others and have all cleared the quality bar of meeting the ultimate decision maker. Furthermore, the decision maker’s choice among the four finalists probably depends far more on interview performance, which your model likely cannot see, than on any attributes on the resume or application.
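The class-balance arithmetic in this example can be sketched as follows. The per-opening numbers (100 applicants, 1 offer, 4 finalists) come from the text; the 50 historical openings and the `training_split` helper are hypothetical.

```python
def training_split(positives_per_opening: int,
                   applicants_per_opening: int,
                   openings: int) -> tuple[int, int]:
    """Return (positive, negative) example counts pooled across all openings."""
    positives = positives_per_opening * openings
    negatives = (applicants_per_opening - positives_per_opening) * openings
    return positives, negatives


APPLICANTS_PER_OPENING = 100
OPENINGS = 50  # hypothetical number of historical requisitions

for target, k in {"offers (#3)": 1, "finalists (#2)": 4}.items():
    pos, neg = training_split(k, APPLICANTS_PER_OPENING, OPENINGS)
    print(f"{target}: {k}:{APPLICANTS_PER_OPENING - k} per opening, "
          f"{pos} positive vs {neg} negative examples overall")
```

Beyond the ratio itself, the finalist target gives the model four times as many positive examples to learn from out of the same historical data.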
This is also more practical from a change-management perspective. Hiring managers usually want to interview a short list of finalists rather than let an algorithm pick which applicant receives an offer. Training an algorithm to identify the finalists whom the hiring manager will meet most closely matches how the model will be used in practice.
Modeling on post-hire outcomes (#6–#9) has challenges of its own.
My number-one tip for post-hire targets is to define them so that two-thirds or more of your workforce falls in the successful ‘positive’ class. After all, you wouldn’t be in business if only 10% of your workforce were successful, so don’t set impossible standards. An excellent way to achieve this is with OR statements applied to tenure, performance, and mobility outcomes.
For example, an employee could be classified as successful if they have above-average performance in their first review cycle OR tenure in their organization that exceeds the 50th percentile. Because the tenure condition alone captures roughly half the workforce, the OR statement should classify at least half the workforce as successful. In practice, I’ve seen organizations use mostly AND statements, resulting in success classes that are small and unrealistic. This leads to a falsely pessimistic view of pipeline quality and delays in filling roles.
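The sketch below compares OR and AND versions of this example definition on a small, made-up workforce of ten employees. The data, the boolean “above-average first review” flag, and the helper names are all hypothetical; with real data you would compute the median tenure within the relevant organization.

```python
from statistics import median

# Hypothetical workforce: (above_average_first_review, tenure_months)
workforce = [
    (True, 6), (False, 30), (True, 40), (False, 12), (True, 18),
    (False, 48), (False, 9), (True, 24), (False, 15), (True, 36),
]

# 50th-percentile tenure across this made-up organization
median_tenure = median([tenure for _, tenure in workforce])


def success_or(above_avg: bool, tenure: float) -> bool:
    """Successful if EITHER condition holds."""
    return above_avg or tenure > median_tenure


def success_and(above_avg: bool, tenure: float) -> bool:
    """Successful only if BOTH conditions hold."""
    return above_avg and tenure > median_tenure


or_rate = sum(success_or(p, t) for p, t in workforce) / len(workforce)
and_rate = sum(success_and(p, t) for p, t in workforce) / len(workforce)
print(f"OR definition:  {or_rate:.0%} of employees successful")
print(f"AND definition: {and_rate:.0%} of employees successful")
```

On this invented data the OR rule labels 70% of employees successful and the AND rule only 30%. Real numbers will differ, but an OR rule can never label fewer employees successful than either of its conditions does on its own.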
Another reason to use OR statements in defining post-hire success is that they recognize the greater diversity of employee strengths. One call center representative might have long tenure and know all the company policies, another may have fantastic customer service scores, another may be very fast at resolving complaints, and another may have been promoted to a managerial role. A set of OR statements applied to tenure, performance, and mobility outcomes recognizes many paths to success, creating a more trustworthy and ethical AI system.
If the CHRO (Chief Human Resources Officer) or hiring managers at your organization want you to build a model based only on ‘top performers,’ use a picture such as Figure 2 and ideas from the first post in this series to explain why this may be a bad idea. You’ll also need to know your historical hire rate (i.e., the percentage of applicants hired to fill vacancies). Look at the sample picture: the historical hire rate is 30%, so this organization needs a model that will assign roughly 30% of applicants to the positive class, or its open requisitions will pile up. The 20/80 positive/negative training split shown in the image should result in a useful model, because enough applicants should score well on the model if the applicant pool stays roughly the same.
However, if success is narrowly defined as the top 10% of existing employees (which would be 1.5 robots in the image), then only 10% of 30%, or 3 out of 100 applicants, would be in the positive class. The resulting model will look for elusive characteristics learned from that 3%, characteristics you simply won’t find in 30% of your applicant pool.
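The arithmetic in this example reduces to multiplying the hire rate by the fraction of hires counted as successful. The `positive_share` helper is hypothetical, and reading the 20/80 split as two-thirds of hires (20 of every 30) assumes the split is measured over all applicants.

```python
def positive_share(hire_rate: float, success_fraction_of_hires: float) -> float:
    """Fraction of all applicants who end up in the positive class."""
    return hire_rate * success_fraction_of_hires


HIRE_RATE = 0.30  # historical hire rate from the example

broad = positive_share(HIRE_RATE, 2 / 3)  # 20 of every 30 hires -> the 20/80 split
narrow = positive_share(HIRE_RATE, 0.10)  # only the "top 10%" of employees

print(f"broad definition:  {broad:.0%} of applicants positive")
print(f"narrow definition: {narrow:.0%} of applicants positive")
print(f"model must flag about {HIRE_RATE:.0%} of applicants to keep up with openings")
```

The broad definition leaves the positive class close to the 30% the organization must hire; the narrow one asks the model to find 30% of applicants who resemble a 3% slice.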
To wrap up, remember that every post-hire definition of success is a subset of ‘hires.’ Earlier, I described why ‘finalists’ (#2) and ‘offers’ (#3) may be better choices than ‘hires’ (#4). By defining your target using any post-hire attribute (#6–#9), you accept that finalists and applicants who received offers but did not join will be in the negative class of your models.
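This consequence can be checked in a few lines: under a post-hire target, anyone who never joined is labeled negative no matter how far they got. The stage strings, the `succeeded_post_hire` field, and `post_hire_label` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Applicant:
    """Hypothetical applicant record for illustration only."""
    applicant_id: str
    furthest_stage: str  # assumed values: "applied", "finalist", "offered", "hired"
    succeeded_post_hire: Optional[bool] = None  # only observable for hires


def post_hire_label(a: Applicant) -> int:
    """Only hires can be positive; every non-hire is forced into the negative class."""
    return int(a.furthest_stage == "hired" and bool(a.succeeded_post_hire))


pool = [
    Applicant("finalist_not_offered", "finalist"),
    Applicant("offer_declined", "offered"),
    Applicant("hire_successful", "hired", True),
    Applicant("hire_unsuccessful", "hired", False),
]
labels = {a.applicant_id: post_hire_label(a) for a in pool}
print(labels)
```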
This is the second post in a series on designing recruiting algorithms for HR. (In the first post, I covered selecting between screen-in and screen-out approaches.) In future posts, I’ll cover topics including the choice of initial training data, common pitfalls when retraining, model blind spots, and ideas for improving baseline model performance.