Classify Incoming Emails/Tickets into Predefined Categories

Predict the email or ticket category in order to route it to the team that is best able to resolve it.

Business Problem

Every day, organizations receive thousands of emails that people must read and interpret to determine which team or employee should handle each one. For example, in an organization's support function, numerous tickets can arrive every couple of minutes. Someone reads each request to determine the criticality of the ticket and the person or team who should handle the issue.

Intelligent Solution

Machine learning models can be trained to predict where each email/ticket should be routed, along with the likely urgency of the request. We can leverage Robotic Process Automation (RPA) to gather the data within the email and submit it to DataRobot. After the prediction, we can use RPA to route each email/ticket to the correct category.

Value Estimation

What has ROI looked like for this use case? 

Organizations that have implemented the DataRobot + RPA solution have seen savings of over $1 million. Some of the larger corporations receive 8,000+ emails/tickets daily and have a team (usually offshore) manually reading and routing the new emails/tickets. (See this public reference for American Fidelity.)

How would I measure ROI for my use case? 

Here’s one way to estimate the ROI. Assume 5,000 emails/tickets arrive per day and each takes 1 minute to read, for a total of 5,000 minutes per day. That’s about 83 hours per day that the teams spend sorting emails/tickets to the correct team. If each Full-Time Equivalent (FTE) handling the emails/tickets has an all-in cost (including benefits) of $50.00 per hour, the DataRobot + RPA solution will save the organization around $4,000 per day (83.3 hours × $50 ≈ $4,167). Multiplied by 261 working days per year, the total savings come to over $1 million per year.
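The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the estimate as described; the function name and parameters are illustrative, and the inputs are the assumptions stated in the paragraph (5,000 emails/tickets per day, 1 minute each, $50.00 per hour, 261 working days).

```python
def annual_routing_savings(emails_per_day, minutes_per_email, hourly_cost, working_days):
    """Estimate the manual routing effort and its cost (illustrative helper)."""
    hours_per_day = emails_per_day * minutes_per_email / 60
    daily_cost = hours_per_day * hourly_cost
    annual_cost = daily_cost * working_days
    return hours_per_day, daily_cost, annual_cost


# Inputs taken from the example above.
hours, daily, annual = annual_routing_savings(
    emails_per_day=5000, minutes_per_email=1, hourly_cost=50.0, working_days=261
)
print(f"{hours:.1f} hours/day, ${daily:,.0f}/day, ${annual:,.0f}/year")
```

Swap in your own volumes, handling time, and labor cost to get an estimate for your organization.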

Technical Implementation

About the Data

For illustrative purposes, in this tutorial we work with a sample dataset of email/ticket requests that are directed to a central “Technology Admin” group which addresses various user requests such as supplies, and systems and facilities access. 

This real-world dataset has already been cleansed by the original authors to remove any identifying information. In addition to the email/ticket content, the data also contains additional information; for this use case we focus primarily on the ticket’s category and urgency. 

We’ll build two separate models, one for Category classification and one for Urgency classification. Each learns its task independently: the first categorizes the ticket and the second assigns an urgency to it.

Problem Framing

For the Category classification model, the target variable is category, which takes one of four values: system & database access, server access, facilities access, and equipment & supplies.

For the Urgency classification model, the target variable is urgency, which takes one of four values: low, moderate, high, and critical.

For both Category and Urgency models the independent variables that the models are learning from are the “title” and “body” of the email/ticket. 

Sample Feature List

Ticket Category Features

Feature Name | Data Type | Description | Data Source | Example
title | text | email/ticket title | email/ticket system | “Need new monitor”
body | text | email/ticket body | email/ticket system | “Hi, changed office location, need new monitor.”
category | categorical | request category | email/ticket system | equipment & supplies

Ticket Urgency Features

Feature Name | Data Type | Description | Data Source | Example
title | text | email/ticket title | email/ticket system | “Need new monitor”
body | text | email/ticket body | email/ticket system | “Hi, changed office location, need new monitor.”
urgency | categorical | request urgency | email/ticket system | moderate

Data Preparation

The original dataset was relatively clean; however, we did some additional data preparation, as listed below:

  • Dropped some of the other columns that were not relevant to our modeling exercise.
  • Labeled both the categories and the urgency thresholds.
  • Sampled 10K observations without replacement from the original dataset.
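The preparation steps above can be sketched with the Python standard library. This is a minimal illustration and not the authors’ actual preparation code: the column names other than title, body, category, and urgency, the numeric urgency codes, and the toy rows are all hypothetical. Category labels could be mapped the same way.

```python
import random

# Hypothetical column names and urgency codes, used only for illustration.
MODEL_COLUMNS = ["title", "body", "category", "urgency"]
URGENCY_LABELS = {0: "low", 1: "moderate", 2: "high", 3: "critical"}


def prepare_rows(rows, sample_size, seed=0):
    """Drop irrelevant columns, label urgency, and sample without replacement."""
    prepared = []
    for row in rows:
        trimmed = {col: row[col] for col in MODEL_COLUMNS}  # drop other columns
        trimmed["urgency"] = URGENCY_LABELS[trimmed["urgency"]]  # code -> label
        prepared.append(trimmed)
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    # random.sample draws without replacement.
    return rng.sample(prepared, min(sample_size, len(prepared)))


# Toy rows standing in for the raw export (made up for this example).
raw_rows = [
    {
        "ticket_id": i,
        "assignee": "n/a",
        "title": f"Request {i}",
        "body": f"Details for request {i}",
        "category": "equipment & supplies",
        "urgency": i % 4,
    }
    for i in range(20)
]
sample = prepare_rows(raw_rows, sample_size=5)
```

For the tutorial dataset, sample_size would be 10,000 rather than 5.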

In general the data prep tasks will differ depending on how your own data is structured. Email or ticket data is normally stored in the underlying application’s database, and in addition to the text features there’s also additional metadata that can be effectively leveraged for this type of modeling. 

Model Training

DataRobot automates many parts of the modeling pipeline, so this tutorial focuses on the specifics of the use case rather than the generic parts of the modeling process. For more details on the DataRobot modeling process, see the DataRobot documentation.

Interpret Results

  1. Feature Impact shows that title is the more impactful of the two features for category classification. For ticket urgency, on the other hand, body is the more impactful.

[Figures: Feature Impact for Ticket Category and Ticket Urgency]

  2. The word clouds show strong associations between particular words and their respective categories and urgencies. For example, in the category model (below), words such as access, card, and access card are highly associated (dark red color) with the facilities access category. In the ticket urgency model, words such as servers and access are highly associated with the critical urgency level.

[Figures: word clouds for Ticket Category and Ticket Urgency]

Evaluate Accuracy

Because both are multiclass classification models, we’ve included the multiclass confusion matrices for Ticket Category and Ticket Urgency below. In the Ticket Category matrix, for example, the model achieves very high precision and recall for the facilities access category. Similarly, in the Ticket Urgency matrix, the model achieves almost perfect precision and recall for the low urgency class.

[Figures: multiclass confusion matrices for Ticket Category and Ticket Urgency]
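For readers who want to see how per-class precision and recall follow from a multiclass confusion matrix, here is a small sketch. The counts in the toy matrix are invented for illustration and are not the results from this use case; the function name is also illustrative.

```python
def precision_recall(matrix, labels):
    """Per-class precision and recall.

    matrix[i][j] is the number of tickets whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    scores = {}
    for k, label in enumerate(labels):
        true_positives = matrix[k][k]
        predicted_as_k = sum(row[k] for row in matrix)  # column total
        actually_k = sum(matrix[k])                     # row total
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        scores[label] = (precision, recall)
    return scores


labels = ["low", "moderate", "high", "critical"]
# Toy counts, made up for this example (rows = actual, columns = predicted).
toy_matrix = [
    [48, 2, 0, 0],
    [3, 40, 7, 0],
    [0, 5, 42, 3],
    [0, 0, 4, 46],
]
scores = precision_recall(toy_matrix, labels)
for label, (precision, recall) in scores.items():
    print(f"{label}: precision={precision:.2f}, recall={recall:.2f}")
```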


There’s not much need to post-process the results in this use case. However, because this is multiclass classification, a business rule might be required to establish a cutoff prediction probability. A simple default is to route each ticket according to the class with the highest predicted probability.
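One possible form of such a business rule is sketched below: route to the highest-probability class, and fall back to a manual queue when that probability is under the cutoff. The cutoff value of 0.6, the queue name, and the example probabilities are assumptions chosen for illustration, not recommendations from this use case.

```python
def route_ticket(class_probabilities, cutoff=0.6, fallback_queue="manual review"):
    """Return (destination, probability) for one ticket.

    class_probabilities maps each class name to its predicted probability.
    The cutoff and fallback queue name are hypothetical defaults.
    """
    best_class = max(class_probabilities, key=class_probabilities.get)
    best_probability = class_probabilities[best_class]
    if best_probability >= cutoff:
        return best_class, best_probability
    # Not confident enough: hand the ticket to a person instead.
    return fallback_queue, best_probability


confident = route_ticket(
    {"facilities access": 0.91, "server access": 0.05,
     "system & database access": 0.03, "equipment & supplies": 0.01}
)
uncertain = route_ticket(
    {"facilities access": 0.40, "server access": 0.35,
     "system & database access": 0.15, "equipment & supplies": 0.10}
)
print(confident)
print(uncertain)
```

Setting cutoff to 0 reproduces the default behavior of always routing to the highest-probability class.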

Business Implementation

Decision Environment 

After you find the model that best learns the patterns in your data in order to predict ticket categories and levels of urgency, use DataRobot to deploy the model into your desired decision environment. Decision environments are the different ways in which the predictions generated by the model will be consumed by the appropriate stakeholders in your organization and how they will ultimately make decisions using the predictions that impact your process. 

A very common integration is through one of the major RPA tools, where the RPA bot feeds new records to the deployed models, consumes the predictions on the other end, and takes the appropriate actions as instructed. Alternatively, the models can be integrated directly into your email or ticket applications via REST API.
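As a rough sketch of the REST route, the snippet below builds (but does not send) an HTTP POST carrying the title and body of new tickets. The URL, the bearer-token header, and the JSON payload shape are hypothetical placeholders; the actual endpoint, authentication scheme, and request format must be taken from your deployment’s prediction API documentation.

```python
import json
import urllib.request

# Hypothetical placeholder values; replace with those of your own deployment.
PREDICTION_URL = "https://example.invalid/deployments/ticket-category/predictions"
API_TOKEN = "YOUR_API_TOKEN"


def build_prediction_request(url, api_token, tickets):
    """Build a POST request with one JSON record per ticket (assumed format)."""
    records = [{"title": t["title"], "body": t["body"]} for t in tickets]
    return urllib.request.Request(
        url,
        data=json.dumps(records).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_prediction_request(
    PREDICTION_URL,
    API_TOKEN,
    [{"title": "Need new monitor",
      "body": "Hi, changed office location, need new monitor."}],
)
# Sending would be: urllib.request.urlopen(request, timeout=30)
```

Keeping request construction separate from sending makes the payload easy to inspect and test before it is wired into the email or ticket application.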

Decision Maturity 

Automation | Augmentation | Blend

For this use case, DataRobot enables your organization to intelligently automate mundane tasks so that your employees can spend their time on more creative work. Additionally, it provides a better experience to your employees and customers by quickly and efficiently prioritizing and resolving their issues.

Model Deployment 

The model output for this use case will be connected directly to either the RPA bot or the underlying application via the REST API interface. A possible parallel route is a “human in the middle” approach, in which a trained analyst intervenes and makes a tie-breaker decision whenever the model’s predicted probabilities fall below a certain cutoff threshold. To learn more about RPA integrations with DataRobot, see the DataRobot documentation.

Model Monitoring

Decision Operators: IT/System Operations, Data Scientists 

Prediction Cadence: Daily predictions through the API or RPA DataRobot connector

Model Retraining Cadence: Model retraining will be driven mainly by specific procedural changes. If a new process is introduced or an old one is phased out, the models will need to be retrained. Examples include adding a new ticket category, or merging multiple categories into one because of organizational changes in the teams that address those ticket requests. As a safety precaution, models can be scheduled for monthly or quarterly retraining to leverage the most recently available data.

Implementation Risks

Implementation risk is closely tied to the integration options highlighted in the “Decision Environment” section:

  • RPA bot failure: RPA bots normally string together multiple steps; sometimes they can fail, interrupting the entire flow so that the intended actions do not get executed.
  • Pipeline failure: The API call to the model can fail for various reasons, including network failures. This would break the flow of the orchestrated steps so that the steps after the breakpoint would not execute.
  • Email or ticket application failure: The underlying email or ticketing application itself could fail during data extraction or ingestion, causing the process flow to break.
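One common way to soften transient failures such as a dropped network connection is to retry the failing step with exponential backoff before giving up. The sketch below is a generic pattern and not part of DataRobot or any RPA tool; the function names, attempt count, and delays are illustrative assumptions.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run operation(), retrying with exponential backoff on failure.

    Catches every Exception for brevity; in practice, catch only the
    errors that are worth retrying (for example, network timeouts).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"operation failed after {attempts} attempts") from last_error


# Demo with a stand-in for the prediction call that fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky_prediction_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return {"category": "facilities access"}


# Record the delays instead of actually sleeping, to keep the demo fast.
result = call_with_retries(flaky_prediction_call, sleep=delays.append)
print(result, delays)
```

If every attempt fails, the raised error can be used to send the ticket to a manual queue so that it is not silently dropped.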
