Predict Whether a Parts Shortage Will Occur

Predict part shortages or late shipments in a supply chain network so that businesses can prepare for foreseeable delays and take data-driven corrective action


Business Problem

Preventing parts shortages, especially those that occur at the last minute, is critical to any supply chain network. Parts shortages not only lead to underutilized machines and transportation, but also cause a domino effect of late deliveries through the entire network. In addition, the discrepancies between the forecasted and actual number of parts that arrive on time prevent supply chain managers from optimizing their materials plans.

Parts shortages are often caused by delays in their shipment. To mitigate the impact delays will have on their supply chain, manufacturers adopt approaches such as holding excess inventory, optimizing product designs for more standardization, and moving away from single-sourcing strategies. However, most of these approaches add up to unnecessary costs for parts, storage, and logistics.

In many cases, late shipments persist until supply chain managers can evaluate the root cause and then implement short-term and long-term adjustments that prevent them from recurring. Unfortunately, supply chain managers have been unable to efficiently analyze the historical data available in MRP systems because of the time and resources required.

Intelligent Solution

AI helps supply chain managers reduce parts shortages by predicting the occurrence of late shipments, giving them time to intervene. By learning from past cases of late shipments and their associated features, AI applies these patterns to future shipments to predict the likelihood that those shipments will also be delayed. Unlike complex MRP systems, AI provides supply chain managers with the statistical reasons behind each late shipment in an intuitive but scientific way. For example, when AI notifies supply chain managers of a late shipment, it will also explain why, offering reasons such as the shipment’s vendor, mode of transportation, or country.

Then, using this information, supply chain managers can apply both short-term and long-term solutions to prevent late shipments. In the short term, they can prevent the delay of an individual shipment by adjusting its transportation or delivery route based on its unique characteristics. In the long term, supply chain managers can conduct aggregated root-cause analyses to discover and solve the systematic causes of delays. They can use this information to make strategic decisions, such as choosing vendors located in more accessible geographies or reorganizing their shipment schedules and quantities.

Value Estimation

How would I measure ROI for my use case? 

The ROI for implementing this solution can be estimated by considering the following factors: 

  1. Starting with the manufacturing company and production line stoppage, the cycle time of the production process can be used to understand how much of the production loss relates to part shortages. For example, if the cycle time (time taken to complete one part) is 60 seconds and each day 15 minutes of production are lost to part shortages, then total production loss is equivalent to 15 products, which can be translated to loss in profit of 15 products in a day. A similar calculation can be used to estimate annual loss due to part shortage.
  2. For a logistics provider, predicting part shortages early can increase savings in terms of reduced inventory. This can be roughly measured by capturing the difference in parts stock maintained before and after implementation of the AI solution. Multiplying that difference in stock by the holding and inventory cost per unit gives the overall ROI. Furthermore, when the demand for parts is left unfulfilled (because of part shortages), the opportunity cost of the unsatisfied demand translates directly into lost business.
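The two estimates above can be sketched in a few lines of Python. This is only a rough illustration: the cycle time and minutes lost come from the example in the text, while the profit per unit, number of working days, stock levels, and holding cost are hypothetical values that you would replace with your own.

```python
def units_lost_per_day(cycle_time_seconds: float, minutes_lost_per_day: float) -> float:
    """Units not produced each day because the line waited on missing parts."""
    return minutes_lost_per_day * 60 / cycle_time_seconds


def annual_profit_loss(units_per_day: float, profit_per_unit: float, working_days: int) -> float:
    """Scale the daily loss to a year of production."""
    return units_per_day * profit_per_unit * working_days


def inventory_savings(stock_before: float, stock_after: float, holding_cost_per_unit: float) -> float:
    """Savings from holding less stock once shortages are predicted early."""
    return (stock_before - stock_after) * holding_cost_per_unit


# Figures from the text: 60-second cycle time, 15 minutes lost per day.
lost_units = units_lost_per_day(cycle_time_seconds=60, minutes_lost_per_day=15)
print(lost_units)  # 15.0

# Hypothetical inputs, for illustration only.
PROFIT_PER_UNIT = 20.0
WORKING_DAYS = 250
print(annual_profit_loss(lost_units, PROFIT_PER_UNIT, WORKING_DAYS))  # 75000.0
print(inventory_savings(stock_before=5000, stock_after=4200, holding_cost_per_unit=2.5))  # 2000.0
```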

Technical Implementation

About the Data

For illustrative purposes, we use a sample dataset provided by the President’s Emergency Plan for AIDS Relief (PEPFAR), which is publicly available on Kaggle. This dataset provides supply chain health commodity shipment and pricing data. Specifically, the dataset identifies Antiretroviral (ARV) and HIV lab shipments to supported countries. In addition, the dataset provides the commodity pricing and associated supply chain expenses necessary to move the commodities to other countries for use. We use this dataset to represent how a manufacturing or logistics company can leverage AI models to improve their decision making.

Problem Framing

The target variable for this use case is whether or not the shipment would be delayed (Binary; True or False, 1 or 0, etc.). This choice in target (Late_delivery) makes this a binary classification problem. The distribution of the target variable is imbalanced, with 11.4% being 1 (late delivery) and 88.6% being 0 (on time delivery). (See here for more information about imbalanced data in machine learning.)

The features below represent some of the factors that are important in predicting delays. The feature list encompasses all of the information in each purchase order sent to the vendor, which would eventually be used to make predictions of delays when new purchase orders are raised.

Beyond the features listed below, we suggest incorporating any additional data your organization may collect that could be relevant to delays. As you will see later, DataRobot is able to quickly differentiate important features from unimportant ones.

These features are generally stored across proprietary data sources available in the ERP systems of the organization. 

Sample Feature List
Feature Name | Data Type | Description | Data Source | Example
Supplier name | Categorical | Name of the vendor who would be shipping the delivery | Purchase order | Ranbaxy, Sun Pharma etc.
Part description | Text | The details of the part/item that is being shipped | Purchase order | 30mg HIV test kit, 600mg Lamivudine capsules
Order quantity | Numeric | The amount of item that was ordered | Purchase order | 1000, 300 etc.
Line item value | Numeric | The unit price of the line item ordered | Purchase order | 0.39, 1.33
Scheduled delivery date | Date | The date at which the order is scheduled to be delivered | Purchase order | 2-Jun-06
Delivery recorded date | Date | The date at which the order was eventually delivered | ERP system | 2-Dec-06
Manufacturing site | Categorical | The site of the vendor where the manufacturing was done since the same vendor can ship parts from different sites | Invoice | Sun Pharma, India
Product Group | Categorical | The category of the product that is ordered | Purchase order | HRDT, ARV
Mode of delivery | Categorical | The mode of transport for part delivery | Invoice | Air, Truck
Late Delivery | Target (Binary) | Whether the delivery was late or on-time | ERP System, Purchase Order | 0 or 1

Data Preparation

The dataset contains historical information on procurement transactions. Each row in the dataset is an individual order whose delivery needs to be predicted. Every order has a scheduled delivery date and an actual delivery date, and the difference between the two was used to define the target variable (Late_delivery). If the actual delivery date was later than the scheduled date, the target variable was set to 1; otherwise it was set to 0. Overall, the dataset contains about 10,320 rows and 26 features, including the target variable.
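As a minimal sketch of this rule, the snippet below derives Late_delivery from the two dates using only the Python standard library. The date format follows the examples in the feature table (such as 2-Jun-06). The dictionary keys and the second and third orders are made up for illustration, and month names are assumed to be in English.

```python
from datetime import datetime

DATE_FORMAT = "%d-%b-%y"  # matches dates such as "2-Jun-06"


def late_delivery(scheduled: str, delivered: str) -> int:
    """Return 1 if the order arrived after its scheduled date, else 0."""
    scheduled_date = datetime.strptime(scheduled, DATE_FORMAT)
    delivered_date = datetime.strptime(delivered, DATE_FORMAT)
    return int(delivered_date > scheduled_date)


# Hypothetical orders; the key names are placeholders, not actual column names.
orders = [
    {"scheduled": "2-Jun-06", "delivered": "2-Dec-06"},    # arrived late
    {"scheduled": "14-Nov-06", "delivered": "14-Nov-06"},  # arrived on the scheduled day
    {"scheduled": "27-Aug-06", "delivered": "20-Aug-06"},  # arrived early
]

for order in orders:
    order["Late_delivery"] = late_delivery(order["scheduled"], order["delivered"])

print([order["Late_delivery"] for order in orders])  # [1, 0, 0]
```

An order delivered exactly on its scheduled date counts as on time here, consistent with the rule that only deliveries after the scheduled date are late.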

Model Training

DataRobot Machine Learning automates many parts of the modeling pipeline. Instead of hand-coding and manually testing dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. In addition to training the models, DataRobot automates other steps in the modeling process such as processing and partitioning the dataset.

While we will jump straight to interpreting the model results, you can take a look here to see how DataRobot works from start to finish, and to understand the data science methodologies embedded in its automation. 

One point to highlight: because we are dealing with an imbalanced dataset, DataRobot automatically recommends LogLoss as the optimization metric for identifying the most accurate model, since it is an error metric that penalizes wrong predictions, and does so most heavily when they are confident.
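To make the behavior of LogLoss concrete, here is a small standard-library sketch of the usual binary log loss formula. It is not DataRobot's implementation, and the labels and probabilities are invented for illustration.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# A late shipment (label 1) scored with high vs. low predicted probability.
print(round(log_loss([1], [0.9]), 3))  # 0.105
print(round(log_loss([1], [0.1]), 3))  # 2.303

# Hypothetical imbalanced sample: 2 late shipments out of 10.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_on_time = [0.01] * 10  # confidently predicts "on time" for every order
base_rate = [0.2] * 10        # predicts the observed late rate for every order

print(log_loss(y_true, base_rate) < log_loss(y_true, always_on_time))  # True
```

The last comparison shows why the metric suits imbalanced targets: a model that confidently calls every order "on time" is right for most rows, but it is penalized heavily for the few late shipments it misses.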

For this dataset, DataRobot found the most accurate model to be the Extreme Gradient Boosting Tree Classifier with unsupervised learning features, built using the open source XGBoost library.

Interpret Results

To give transparency on how the model works, DataRobot provides both global and local levels of model explanations. In broad terms, the model can be understood by looking at the Feature Impact graph, which shows the relative importance of the features in the dataset in relation to the selected target variable. The technique adopted by DataRobot to build this plot is called Permutation Importance.

As you can see, the model identified Pack Price, Country, Vendor, Vendor INCO Term, and Line item Insurance as some of the most critical factors affecting delays in the parts shipments. 

Feature impact - DataRobot AI Platform
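The general idea behind permutation importance can be sketched as follows: shuffle one feature's values, re-score the model, and see how much the score drops. This is a generic illustration and not necessarily how DataRobot computes Feature Impact. The toy model, the feature names, and the data are all hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling one feature's values."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        permuted = [{**row, feature: value} for row, value in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


def toy_model(row):
    """Hypothetical model: predicts a late delivery whenever the vendor is "B"."""
    return int(row["vendor"] == "B")


# Hypothetical data in which lateness depends only on the vendor.
rows = [{"vendor": "A" if i % 2 == 0 else "B", "order_quantity": 100 + i} for i in range(40)]
labels = [int(row["vendor"] == "B") for row in rows]

for feature in ("vendor", "order_quantity"):
    print(feature, permutation_importance(toy_model, rows, labels, feature))
```

Shuffling vendor breaks the link between the feature and the target, so accuracy falls sharply. Shuffling order_quantity changes nothing because the toy model never uses it, so its importance is zero.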

Moving to the local view of explainability, DataRobot also provides Prediction Explanations that enable you to understand the top 10 key drivers for each prediction generated. This offers you the granularity you need to tailor your actions to the unique characteristics behind each part shortage. 

For example, if a particular country is a top reason for a shipment delay, such as Nigeria or South Africa, you can take actions by reaching out to vendors in these countries and closely monitoring the shipment delivery across these routes.

Similarly, if there are certain vendors that are amongst the top reasons for delays, you can reach out to these vendors upfront and take corrective actions to avoid any delayed shipments which would affect the supply chain network. These insights help businesses make data-driven decisions to improve the supply chain process by incorporating new rules or alternative procurement sources.

Prediction explanations - DataRobot AI Platform

For text variables, such as Part description (included in the dataset), we can look at Word Clouds to discover the words or phrases that are highly associated with delayed shipments. Text features are generally the most challenging and time consuming to build models for, but with DataRobot each individual text column is automatically fitted as an individual classifier and is directly preprocessed with NLP techniques (TF-IDF, n-grams, etc.). In this case, we can see that the items described as nevirapine 10 mg are more likely to get delayed in comparison to other items.
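For readers unfamiliar with these preprocessing steps, the sketch below shows one common variant of word n-grams combined with TF-IDF weighting, using only the standard library. The exact preprocessing DataRobot applies is not described here, so treat this purely as an illustration. The part descriptions are invented.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Lowercased word n-grams of length 1 to n_max."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tf_idf(documents, n_max=2):
    """Per-document weights: term frequency times log(N / document frequency)."""
    doc_grams = [Counter(word_ngrams(doc, n_max)) for doc in documents]
    n_docs = len(documents)
    # Count how many documents contain each n-gram at least once.
    doc_freq = Counter(gram for grams in doc_grams for gram in grams)
    weights = []
    for grams in doc_grams:
        total = sum(grams.values())
        weights.append({
            gram: (count / total) * math.log(n_docs / doc_freq[gram])
            for gram, count in grams.items()
        })
    return weights


# Hypothetical part descriptions.
descriptions = [
    "nevirapine 10 mg oral suspension",
    "lamivudine 150 mg tablets",
    "hiv test kit 30 tests",
]
weights = tf_idf(descriptions)

# "nevirapine" appears in one description and "mg" in two,
# so "nevirapine" gets the larger weight in the first description.
print(weights[0]["nevirapine"] > weights[0]["mg"])  # True
```

Terms that are rare across descriptions receive higher weights, which is what lets a downstream classifier pick out distinctive phrases.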

Word cloud - DataRobot AI Platform

Evaluate Accuracy

To evaluate the performance of the model, DataRobot by default ran five-fold cross-validation, and the resulting AUC score (area under the ROC curve) was around 0.82. Since the AUC score on the holdout set (unseen data) was also around 0.82, we can be reassured that the model is generalizing well and is not overfitting. We use the AUC score to evaluate the model because AUC measures how well the model ranks its output (i.e., the probability of a delayed shipment) rather than relying on the actual probability values. The Lift Chart below shows how the predicted values (blue line) compare to the actual values (red line) when the data is sorted by predicted values. We see that the model slightly under-predicts for the orders that are more likely to get delayed, but overall it performs well. Furthermore, depending on the problem being solved, you can review the confusion matrix for the selected model and, if required, adjust the prediction threshold to optimize for precision or recall.

Lift chart - DataRobot
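Two ideas from this section can be illustrated with a short standard-library sketch. AUC depends only on how predictions are ranked, and moving the prediction threshold trades precision against recall. The labels and scores below are invented and are not results from the model discussed above.

```python
def roc_auc(y_true, y_score):
    """Probability that a random late order is scored above a random on-time order."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(y_true, y_score, threshold):
    """Precision and recall when scores at or above the threshold are flagged as late."""
    tp = fp = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_late = s >= threshold
        if predicted_late and y == 1:
            tp += 1
        elif predicted_late and y == 0:
            fp += 1
        elif not predicted_late and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels (1 = late) and predicted probabilities of delay.
y_true = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
y_score = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.90]

# Halving every score keeps the ranking, so the AUC is unchanged.
print(roc_auc(y_true, y_score) == roc_auc(y_true, [s / 2 for s in y_score]))  # True

print(precision_recall(y_true, y_score, threshold=0.5))
print(precision_recall(y_true, y_score, threshold=0.35))  # (0.6, 1.0)
```

Lowering the threshold from 0.5 to 0.35 catches every late order in this toy sample, at the cost of flagging more on-time orders. Which side of that trade-off matters more depends on the cost of a missed delay relative to the cost of an unnecessary intervention.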

Business Implementation

Decision Environment 

After the right model has been chosen, DataRobot makes it easy to deploy the model into your desired decision environment. Decision environments are the ways in which the predictions generated by the model will be consumed by the appropriate stakeholders in your organization, and how these stakeholders will make decisions using the predictions to impact the overall process. 

Decision Maturity 

Automation | Augmentation | Blend

The predictions from this use case can augment the decisions of supply chain managers as they foresee any upcoming delays in logistics. The model acts as an intelligent machine that, combined with the decisions of the managers, helps improve your entire supply chain network.

Model Deployment 

The model can be deployed using the DataRobot Prediction API, a REST API endpoint that returns predictions in near real time when scoring data from new orders is received.

Once the model has been deployed (in whatever way the organization decides), the predictions can be consumed in several ways. For example, a front-end application that acts as the supply chain’s reporting tool can be used to deliver new scoring data as an input to the model, which then bounces back predictions and Prediction Explanations in real time.
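As a rough sketch of what such a front-end integration might look like, the snippet below builds a JSON scoring request with Python's standard library. The URL, token, and authorization scheme are placeholders and assumptions, not the actual DataRobot endpoint or headers; use the values shown for your own deployment. The feature names follow the sample feature list.

```python
import json
import urllib.request

# Placeholders (assumptions): replace with the values for your own deployment.
PREDICTION_URL = "https://example.invalid/deployments/DEPLOYMENT_ID/predictions"
API_TOKEN = "YOUR_API_TOKEN"


def build_scoring_request(orders, url=PREDICTION_URL, token=API_TOKEN):
    """Package new purchase orders as a JSON POST request for scoring."""
    body = json.dumps(orders).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json; charset=UTF-8",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


# Hypothetical new order using feature names from the sample feature list.
new_orders = [
    {"Supplier name": "Sun Pharma", "Order quantity": 1000, "Mode of delivery": "Air"},
]
request = build_scoring_request(new_orders)
print(request.get_method(), request.full_url)

# Sending is left commented out because the URL above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     predictions = json.load(response)
```

Keeping request construction separate from sending makes it easy to test the payload before pointing the application at a live endpoint.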

Decision Stakeholders

The predictions and Prediction Explanations would be used by supply chain managers or logistics analysts to help them understand the critical factors or bottlenecks in the supply chain.

Decision Executors

Decision executors are the supply chain managers and procurement teams who are empowered with the information they need to ensure that the supply chain network is free from bottlenecks. These personnel have strong relationships with vendors and the ability to take corrective action using the model’s predictions.

Decision Managers

Decision managers are the executive stakeholders such as the Head of Vendor Development who manage large scale partnerships with key vendors. Based on the overall results, these stakeholders can perform quarterly reviews of the health of their vendor relationships to make strategic decisions on long-term investments and business partnerships.

Decision Authors

Decision authors are the business analysts or data scientists who would build this decision environment. These analysts could be the engineers/analysts from the supply chain, engineering, or vendor development teams in the organization who usually work in collaboration with the supply chain managers and their teams.

Decision Process

Based on the predictions and Prediction Explanations that identify potential bottlenecks, managers and executive stakeholders would reach out to and collaborate with the appropriate vendor teams in the supply chain network. These decisions could be both long- and short-term, depending on the severity of the impact of shortages on the business.

Model Monitoring 

One of the most critical components in implementing an AI solution is the ability to track the model's performance for data drift and accuracy. With DataRobot MLOps, you can deploy, monitor, and manage all models across the organization through a centralized platform. Tracking model health is very important for proper model lifecycle management, similar to product lifecycle management.

Implementation Risks

One of the major risks in implementing this solution in the real world is adoption at the ground level. Having strong and transparent relationships with vendors is also critical in taking corrective action. The risk is that vendors may not be ready to adopt a data-driven strategy and trust the model results. 
