Every day, organizations receive thousands of emails that someone must read and interpret to decide which team or employee should handle them. In a support function, for example, new tickets can arrive every few minutes. Someone reads each request to determine how critical the ticket is and which person or team should handle the issue.
Intelligent Solution
AI models can be trained to predict where these emails/tickets should be routed and how urgent each request is. We can use Robotic Process Automation (RPA) to gather the data from each email and submit it to DataRobot. After the prediction, RPA can route the email/ticket to the correct category.
Value Estimation
What has ROI looked like for this use case?
Organizations that have implemented the DataRobot + RPA solution have seen savings of over $1 million. Some of the larger corporations receive 8,000+ emails/tickets daily and have a team (usually offshore) manually reading and routing the new emails/tickets. (See this public reference for American Fidelity.)
How would I measure ROI for my use case?
Here’s a way to calculate the ROI: Assume there are 5,000 emails/tickets a day and each one takes 1 minute to read (that is, 5,000 minutes per day). That’s about 83 hours each day spent sorting emails/tickets to the correct team. If each Full-Time Equivalent (FTE) handling the emails/tickets has an all-in cost (including benefits) of $50.00 per hour, the DataRobot + RPA solution could save the organization roughly $4,170 per day. Multiplied by 261 working days per year, the total is just under $1.1 million per year.
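The arithmetic above can be reproduced with a short Python sketch so that you can substitute your own volumes and costs. The constant and function names are illustrative, and the calculation assumes the solution removes all manual reading time.

```python
# Illustrative ROI arithmetic using the figures quoted in the text.
EMAILS_PER_DAY = 5_000
MINUTES_PER_EMAIL = 1
HOURLY_COST_USD = 50.00        # all-in FTE cost, including benefits
WORKING_DAYS_PER_YEAR = 261


def estimate_savings(emails_per_day, minutes_per_email, hourly_cost, working_days):
    """Return (hours saved per day, savings per day, savings per year)."""
    hours_per_day = emails_per_day * minutes_per_email / 60
    daily_savings = hours_per_day * hourly_cost
    return hours_per_day, daily_savings, daily_savings * working_days


hours, daily, yearly = estimate_savings(
    EMAILS_PER_DAY, MINUTES_PER_EMAIL, HOURLY_COST_USD, WORKING_DAYS_PER_YEAR
)
print(f"{hours:.1f} hours/day, ${daily:,.0f}/day, ${yearly:,.0f}/year")
```

Changing any one input, such as the reading time per ticket, shows how sensitive the estimate is to that assumption.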
Technical Implementation
About the Data
For illustrative purposes, in this tutorial we work with a sample dataset of email/ticket requests directed to a central “Technology Admin” group, which handles user requests such as supplies and access to systems and facilities.
This real-world dataset has already been cleansed by the original authors to remove any identifying information. Besides the email/ticket content, the data contains other fields; for this use case we focus primarily on the ticket’s category and urgency.
We’ll build two separate models, one for Category classification and one for Urgency classification, so that each ticket is independently categorized and assigned an urgency.
Problem Framing
For the Category classification model, the target variable is category, which takes one of four values: system & database access, server access, facilities access, equipment & supplies.
For the Urgency classification model, the target variable is urgency, which takes one of four values: low, moderate, high, critical.
For both the Category and Urgency models, the independent variables the models learn from are the “title” and “body” of the email/ticket.
Sample Feature List
Ticket Category Features
Feature Name | Data Type | Description | Data Source | Example
title | text | email/ticket title | email/ticket system | “Need new monitor”
body | text | email/ticket body | email/ticket system | “Hi, changed office location, need new monitor.”
category | categorical | request category | email/ticket system | equipment & supplies
Ticket Urgency Features
Feature Name | Data Type | Description | Data Source | Example
title | text | email/ticket title | email/ticket system | “Need new monitor”
body | text | email/ticket body | email/ticket system | “Hi, changed office location, need new monitor.”
urgency | categorical | request urgency | email/ticket system | moderate
Data Preparation
The original dataset was relatively clean; however, we did some additional data preparation:
Dropped some of the other columns that were not relevant to our modeling exercise.
Labeled both the categories and the urgency thresholds.
Sampled 10K observations without replacement from the original dataset.
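Two of the preparation steps listed above, dropping irrelevant columns and sampling without replacement, can be sketched in plain Python. This is a minimal illustration rather than the authors’ actual pipeline: the extra columns (`ticket_id`, `assignee`), the toy rows, and the function name are all hypothetical.

```python
import random

# Columns the two models need; everything else is dropped.
KEEP_COLUMNS = ("title", "body", "category", "urgency")


def prepare(rows, sample_size, seed=0):
    """Keep only the modeling columns, then sample rows without replacement."""
    trimmed = [{col: row[col] for col in KEEP_COLUMNS} for row in rows]
    rng = random.Random(seed)          # seeded so the sample is reproducible
    k = min(sample_size, len(trimmed))  # cannot sample more rows than exist
    return rng.sample(trimmed, k)       # random.sample never repeats an element


# Toy stand-in for the raw export (hypothetical columns and values).
raw_rows = [
    {
        "ticket_id": i,
        "title": f"title {i}",
        "body": f"body {i}",
        "category": "server access",
        "urgency": "low",
        "assignee": "n/a",
    }
    for i in range(50)
]

sample = prepare(raw_rows, sample_size=10)
print(len(sample), sorted(sample[0]))
```

For the tutorial dataset the equivalent call would use `sample_size=10_000`.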
In general the data prep tasks will differ depending on how your own data is structured. Email or ticket data is normally stored in the underlying application’s database, and in addition to the text features there’s also additional metadata that can be effectively leveraged for this type of modeling.
Model Training
DataRobot automates many parts of the modeling pipeline, so for the sake of this tutorial we will be more focused on the specific use case rather than the generic parts of the modeling process. For more details on the DataRobot modeling process, take a look here.
Interpret Results
Looking at Feature Impact, we see that title is the more impactful of the two features for category classification. For ticket urgency, on the other hand, body is the more impactful.
[Figures: Feature Impact charts for Ticket Category and Ticket Urgency]
The word clouds show strong associations between individual words and their respective categories and urgencies. For example, in the Category classification model, words such as access, card, and access card are highly associated (dark red color) with the facilities access category. In the Urgency classification model, words such as servers and access are highly associated with the critical urgency level.
[Figures: word clouds for Ticket Category and Ticket Urgency]
Evaluate Accuracy
Because both are multiclass classification models, we evaluate them with the Multiclass Confusion Matrix for Ticket Category and for Ticket Urgency. In the Ticket Category matrix, for example, the model achieves very high Precision and Recall for the facilities access category. Similarly, in the Ticket Urgency matrix, the model achieves almost perfect Precision and Recall for the low urgency class.
[Figures: multiclass confusion matrices for Ticket Category and Ticket Urgency]
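To make the Precision and Recall figures concrete, the sketch below derives both per class from a confusion matrix. The counts in `toy_matrix` are invented for illustration and are not the results of the models in this tutorial; the function name is likewise an assumption.

```python
def precision_recall(matrix, labels):
    """Per-class (precision, recall) from a square confusion matrix.

    matrix[i][j] is the number of tickets whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    scores = {}
    for k, label in enumerate(labels):
        true_positives = matrix[k][k]
        predicted_as_k = sum(row[k] for row in matrix)  # column total
        actually_k = sum(matrix[k])                     # row total
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        scores[label] = (precision, recall)
    return scores


URGENCY_LABELS = ["low", "moderate", "high", "critical"]

# Made-up counts, rows = actual class, columns = predicted class.
toy_matrix = [
    [90, 8, 2, 0],
    [10, 70, 15, 5],
    [0, 12, 60, 8],
    [0, 0, 5, 15],
]

scores = precision_recall(toy_matrix, URGENCY_LABELS)
for label, (precision, recall) in scores.items():
    print(f"{label:>8}: precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the tickets routed to this class, how many belonged there", while Recall answers "of the tickets that belonged to this class, how many were routed there".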
Post-Processing
There’s not much need to post-process the results in this use case. However, because this is a multiclass classification problem, a business rule might be required to establish a cutoff prediction probability. A simple default is to route each ticket to the class with the highest predicted probability.
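The routing rule described above can be sketched as follows. The 0.6 cutoff, the "manual review" destination, and the function name are assumptions for illustration; in practice the cutoff is a business decision.

```python
def route(probabilities, cutoff=0.6):
    """Pick the highest-probability class, or fall back to manual review.

    probabilities maps each class label to its predicted probability.
    Returns (destination, confidence).
    """
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    if confidence < cutoff:
        return "manual review", confidence
    return label, confidence


confident = route({
    "system & database access": 0.05,
    "server access": 0.10,
    "facilities access": 0.80,
    "equipment & supplies": 0.05,
})
uncertain = route({
    "system & database access": 0.30,
    "server access": 0.40,
    "facilities access": 0.20,
    "equipment & supplies": 0.10,
})
print(confident, uncertain)
```

Setting `cutoff=0.0` reproduces the default behavior of always routing to the top class.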
Business Implementation
Decision Environment
After you find the model that best predicts ticket categories and levels of urgency, use DataRobot to deploy it into your desired decision environment. Decision environments are the ways in which the model’s predictions are consumed by the appropriate stakeholders in your organization, and how those stakeholders use the predictions to make decisions that affect your process.
A very common integration is through one of the major RPA tools, where the RPA Bot feeds new records to the deployed models, consumes the predictions, and takes the appropriate actions as instructed. Alternatively, the models can be integrated directly into your email or ticket applications via REST API.
Decision Maturity
Automation | Augmentation | Blend
For this use case, DataRobot enables your organization to intelligently automate mundane tasks so that your employees can spend their time on more creative work. Additionally, it provides a better experience to your employees and customers by quickly and efficiently prioritizing and resolving their issues.
Model Deployment
The model output for this use case will be connected directly to either the RPA Bot or the underlying application via the REST API. A possible parallel route is a “human in the middle” approach, in which a trained analyst intervenes and makes a tie-breaker decision whenever the model’s predicted probabilities fall below a certain cutoff threshold. See here to learn more about RPA integrations with DataRobot.
Model Monitoring
Decision Operators: IT/System Operations, Data Scientists
Prediction Cadence: Daily predictions through the API or RPA DataRobot connector
Model Retraining Cadence: Model retraining will be driven mainly by specific procedural changes. If a new process is introduced or an old one is phased out, the models will need to be retrained. Examples include adding a new ticket category, or merging multiple categories into one because of organizational changes on the teams that handle those ticket requests. As a safety precaution, models can also be scheduled for monthly or quarterly retraining to leverage the most recently available data.
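The two retraining triggers described above (a change in the set of ticket categories, or a scheduled refresh) can be expressed as a simple check. This is a hedged sketch: the function name and the 90-day default, standing in for "quarterly", are assumptions.

```python
from datetime import date


def needs_retraining(trained_labels, current_labels, last_trained, today,
                     max_age_days=90):
    """Return True if the label set changed or the model is older than the limit."""
    if set(trained_labels) != set(current_labels):
        return True  # a category was added, removed, or merged
    return (today - last_trained).days >= max_age_days


trained = ["system & database access", "server access",
           "facilities access", "equipment & supplies"]
merged = ["system & database access", "server access", "facilities & equipment"]

label_change = needs_retraining(trained, merged, date(2024, 1, 1), date(2024, 1, 15))
recent_model = needs_retraining(trained, trained, date(2024, 1, 1), date(2024, 1, 31))
stale_model = needs_retraining(trained, trained, date(2024, 1, 1), date(2024, 5, 1))
print(label_change, recent_model, stale_model)
```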
Implementation Risks
Implementation risk is closely tied to the integration options highlighted in the “Decision Environment” section:
RPA Bot failure: RPA Bots normally string together multiple steps; sometimes they can fail, causing interruption of the entire flow such that intended actions do not get executed.
Pipeline failure: The API call to the model can fail for various reasons including network failures. This would break the flow of the orchestrated steps so that the subsequent steps after the breakpoint would not execute.
Email or Ticket Application Failure: The underlying email or ticketing application itself could fail during data extraction or ingestion, causing the process flow to break.
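One common way to reduce the pipeline failure risk listed above is to retry the prediction call a few times and, if it still fails, hand the ticket to a manual queue instead of dropping it. The sketch below assumes a caller-supplied `predict` function (hypothetical; it stands in for whatever makes the API call) and treats `OSError` and its subclasses, such as `ConnectionError` and `TimeoutError`, as retryable.

```python
import time


def predict_with_retry(predict, record, attempts=3, base_delay=0.5,
                       sleep=time.sleep):
    """Call predict(record), retrying with exponential backoff on OSError.

    Returns the prediction, or None if every attempt fails so the caller
    can send the ticket to a manual queue.
    """
    for attempt in range(attempts):
        try:
            return predict(record)
        except OSError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return None


# Simulated flaky endpoint: fails twice, then succeeds.
calls = {"count": 0}


def flaky_predict(record):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return {"category": "server access"}


def always_down(record):
    raise TimeoutError("simulated outage")


ticket = {"title": "Need server login", "body": "Please grant access."}
result = predict_with_retry(flaky_predict, ticket, sleep=lambda seconds: None)
fallback = predict_with_retry(always_down, ticket, sleep=lambda seconds: None)
print(result, fallback)
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; production code would leave the default in place.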