
Choosing a prediction modeling technique

May 7, 2020

This article was originally published on Algorithmia’s website. The company was acquired by DataRobot in 2021. This article may not be entirely up to date and may refer to products and offerings that no longer exist. Find out more about DataRobot MLOps here.

We encounter the results of prediction models every day. We get offered the right incentive to buy a product. We get a text from the bank when something strange is happening with our accounts. And we rarely have to see spam in our inboxes. This is all thanks to the models built on the data that we generate every day.

Read on to learn more about the business of prediction modeling and how you can develop and train your own. 

Is predictive modeling machine learning?

Predictive modeling and machine learning are related, but have slightly different definitions. Predictive modeling is often defined as the use of statistical models to predict outcomes. Machine learning is a subset of artificial intelligence that refers to the use of computers to construct predictive models. In more recent years, however, the terms have been used synonymously.

How is prediction modeling used in business?

Here are a few examples of how predictive modeling is used in the business environment: 

Sales 

One of the most important first steps in outbound sales is identifying the right potential customers to contact. Sales teams often use lead scoring models based on predictive analytics to identify the most likely and lucrative future customers. 

Marketing

Predictive modeling is used in marketing in several different ways. Take, for example, the recommended products that a customer sees on an e-commerce site. These items are the results of models based on the purchasing behavior of the customer and of users with a similar profile.

In email marketing, content such as the subject line and body text is often drafted based on models that predict how likely someone is to open the email and click through on a call to action.

Customer retention 

Customer churn is one of the most common use cases for prediction modeling in business. This is because it’s typically cheaper to keep an existing customer than to onboard a new one. Churn models predict how likely it is for a customer to discontinue using your product or service based on their previous actions. These models can also be used by customer service agents to offer relevant incentives to customers most at risk of churn to keep them as customers longer.

Fraud detection 

Financial services companies use anomaly detection models to detect possible fraudulent activities. For example, when a credit card customer is sent an alert requesting that they verify a transaction, this is triggered by actions that don’t match the person’s typical behavior.
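As a rough illustration of the idea of flagging activity that doesn't match typical behavior, here is a minimal sketch that compares a new transaction amount with a customer's past amounts. The function name, the three-standard-deviation threshold, and the sample amounts are all assumptions made for this example. Real fraud models use many more signals than the amount alone.

```python
from statistics import mean, stdev

def is_anomalous(history, amount, threshold=3.0):
    """Flag `amount` if it lies more than `threshold` standard deviations
    from the mean of the customer's past transaction amounts.
    `history` needs at least two values so a standard deviation exists."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past amount was identical, so any different amount is unusual.
        return amount != mu
    return abs(amount - mu) / sigma > threshold

# Made-up purchase history for one customer (illustrative only).
past_amounts = [42.0, 38.5, 51.0, 45.25, 40.0, 47.5]

print(is_anomalous(past_amounts, 44.0))   # a typical amount
print(is_anomalous(past_amounts, 900.0))  # far outside the usual range
```

A check like this would trigger the verification alert only for the second transaction.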

Demand forecasting

Operations and supply chain professionals use predictive analytics to help them determine the type and number of products to produce and ship. These models are based on past customer/buyer behavior in addition to more current global and economic factors. 

What are common predictive modeling techniques?

Before we get into too much detail, let’s discuss the two main types of machine learning models: supervised and unsupervised learning.

Supervised learning models have a specified target output which is either a classification (label) or a continuous variable. The purpose of supervised learning models is to predict a specified outcome. Unsupervised learning models, on the other hand, don’t have any sort of target variable. These models are often used during exploratory data analysis to uncover patterns or natural groupings within data. 

Although we are going to focus primarily on supervised machine learning models in this piece, note that unsupervised techniques do play a role in predictive modeling. For example, a customer churn model may actually begin with an unsupervised task like clustering to uncover groups of similar people within a group at high risk of churn. This more nuanced information can help you build models to predict the right incentives to retain an at-risk customer.
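To make the clustering step concrete, here is a minimal sketch of k-means, one common clustering algorithm (the article doesn't name a specific one, so this choice is an assumption). The two features, the sample customers, and the starting centroids are made up for illustration.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]

    def nearest(p):
        # Index of the centroid closest to point p (Euclidean distance).
        return min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        # Keep a centroid where it is if no points were assigned to it.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    labels = [nearest(p) for p in points]
    return labels, centroids

# Hypothetical at-risk customers: (logins per month, support tickets filed).
customers = [(1, 5), (2, 6), (1, 6), (10, 1), (11, 0), (12, 1)]
labels, centers = k_means(customers, centroids=[customers[0], customers[3]])
print(labels)
```

No target variable is involved here. The algorithm only groups customers whose behavior looks alike, and each group can then be studied or modeled separately.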

Classification vs. regression

Supervised learning techniques can be split roughly into two categories: classification and regression. The target variable of a classification model is the class or category that a new observation belongs to. A variation on this is class probability estimation which predicts the likelihood of a new observation belonging to a particular class. In regression models, the target variable is a numerical value. 

The following are a few examples of prediction algorithms. Note that some can be used for classification or regression.

Linear regression 

Linear regression is one of the easiest machine learning algorithms to understand. It models the target response (dependent variable) as a linear function of one or more independent variables.

In a linear regression model used to predict a potential customer’s spending, independent variables could include factors such as income, age, and how frequently they’ve used your services over a period of time. 
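Here is a minimal sketch of a linear regression with a single independent variable, fitted by ordinary least squares. It is simplified to one variable for readability; the spending example above would use several. The function names and the sample numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one variable: returns (slope, intercept)
    of the line that minimizes the squared prediction error."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def predict(x, slope, intercept):
    return slope * x + intercept

# Hypothetical data: visits per month and the resulting monthly spending.
visits = [1, 2, 3, 4, 5]
spending = [30, 50, 70, 90, 110]

slope, intercept = fit_line(visits, spending)
print(slope, intercept)
print(predict(6, slope, intercept))
```

Because the sample data lies exactly on a line, the fit recovers it: each additional visit adds 20 to predicted spending, on top of a baseline of 10.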

K-nearest neighbor 

Nearest neighbor models can be used for classification or regression. Predictions are based on the distance between a new observation and existing data points. The “k” in this model represents the number of data points or “neighbors” that the new observation is compared against.

In a classification model, the new observation is put in the same class as the majority of its neighbors. In a regression model, the prediction is typically an average of the numerical value of the neighbors. 
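Both variants can be sketched in a few lines. The example below assumes Euclidean distance and an unweighted vote or average, which are common defaults but not the only options. The function names and data are hypothetical.

```python
from collections import Counter
from math import dist

def nearest_neighbors(train, query, k):
    """Return the k (features, target) pairs closest to `query`."""
    return sorted(train, key=lambda row: dist(row[0], query))[:k]

def knn_classify(train, query, k=3):
    """Majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in nearest_neighbors(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Average of the numerical targets of the k nearest neighbors."""
    neighbors = nearest_neighbors(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)

# Hypothetical labeled customers: (feature vector, class).
labeled = [((1, 1), "stay"), ((1, 2), "stay"), ((2, 1), "stay"),
           ((8, 8), "churn"), ((9, 8), "churn"), ((8, 9), "churn")]
print(knn_classify(labeled, (2, 2)))
print(knn_classify(labeled, (7, 7)))

# Hypothetical numeric targets: (feature vector, spending).
valued = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]
print(knn_regress(valued, (2,)))
```

Choosing an odd k for a two-class problem, as here, avoids tied votes.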

Decision tree

Decision trees can be used for classification or regression. They work by splitting a complete dataset into successive subsets, with each split based on the value of a categorical feature or on a numerical threshold. The splitting continues until it reaches a terminal node, where the subset cannot be divided any further.

Banks sometimes use some form of a decision tree to decide whether or not to offer a customer a loan. The very visual nature of decision trees is also useful when a modeler needs to transparently demonstrate how they came to a decision.
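The sketch below shows how a single split might be chosen. It is not a full tree: it picks one numerical threshold, and a real tree would repeat the same search on each resulting subset. It assumes Gini impurity as the splitting criterion, which is one common choice. The loan data and function names are made up.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try a threshold midway between each pair of adjacent sorted values and
    return the (threshold, weighted_impurity) pair with the lowest impurity."""
    pairs = sorted(zip(values, labels))
    best = None
    for (low, _), (high, _) in zip(pairs, pairs[1:]):
        if low == high:
            continue  # no threshold can separate identical values
        threshold = (low + high) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical loan applicants: income in thousands and the past decision.
incomes = [20, 25, 30, 60, 70, 80]
decisions = ["deny", "deny", "deny", "approve", "approve", "approve"]
print(best_split(incomes, decisions))
```

The chosen threshold reads as a plain rule ("income above 45 thousand"), which is what makes a tree easy to explain to a non-technical audience.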

Logistic regression 

Logistic regression is actually a class probability estimation model. In marketing, logistic regression is often the basis of propensity models that predict how likely it is for a customer to make a purchase. Providing a more granular view of a customer’s possible choice provides marketers with the information they need to develop more targeted and relevant outreach.
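To show what "class probability estimation" means in practice, the sketch below turns a weighted sum of features into a probability between 0 and 1 using the logistic (sigmoid) function. The weights and bias are hypothetical values chosen for illustration. In a real propensity model they would be learned from historical data.

```python
from math import exp

def sigmoid(z):
    """Map any real-valued score to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def purchase_probability(features, weights, bias):
    """Logistic regression prediction: sigmoid of a linear score."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Hypothetical coefficients for (site visits last week, items in cart).
weights = [0.8, 0.5]
bias = -3.0

low = purchase_probability([0, 0], weights, bias)
high = purchase_probability([3, 2], weights, bias)
print(round(low, 3), round(high, 3))
```

The output is a likelihood rather than a hard yes-or-no label, so a marketer can rank customers or choose their own cutoff.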

How do you train a predictive model?

You can find more on developing machine learning models here, but a few steps are worth emphasizing:

Exploring the data 

Before diving into the data in an attempt to answer a question, it’s important to see and understand what’s there. You may want to visualize it with charts or graphs in order to uncover patterns that might not be obvious from a spreadsheet.

Dividing the data

Building a model requires you to divide your dataset into at least three subsets.

  • Training data – This is the largest subset, and it is the data the model learns from.
  • Validation data – This is the subset against which the model is repeatedly evaluated. As you refine the model, you will continue to test it against the validation dataset.
  • Test data – This is the final dataset you will use to evaluate the model’s fit. To keep that evaluation unbiased, this dataset should only be used once.
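The three-way split described above can be sketched as follows. The 70/15/15 proportions and the fixed random seed are assumptions for this example. The article only says that the training set should be the largest.

```python
import random

def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the rows, then cut them into training, validation, and test
    subsets. Whatever remains after the first two cuts becomes the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test

# 100 placeholder rows standing in for a real dataset.
train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Shuffling before cutting matters when the rows are stored in a meaningful order, such as by date or by customer. For time-ordered data, a chronological split may be more appropriate than a random one.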

Developing the model

The algorithms on which you choose to base your model need to be relevant to the question you are trying to answer. In addition, your algorithm options may be expanded or limited by the data that you have.

About the author
DataRobot

Value-Driven AI

DataRobot is the leader in Value-Driven AI – a unique and collaborative approach to AI that combines our open AI platform, deep AI expertise and broad use-case implementation to improve how customers run, grow and optimize their business. The DataRobot AI Platform is the only complete AI lifecycle platform that interoperates with your existing investments in data, applications and business processes, and can be deployed on-prem or in any cloud environment. DataRobot and our partners have a decade of world-class AI expertise collaborating with AI teams (data scientists, business and IT), removing common blockers and developing best practices to successfully navigate projects that result in faster time to value, increased revenue and reduced costs. DataRobot customers include 40% of the Fortune 50, 8 of top 10 US banks, 7 of the top 10 pharmaceutical companies, 7 of the top 10 telcos, 5 of top 10 global manufacturers.
