Impute Missing Values in a Dataset

Executive Summary
Predicting/imputing missing values in a dataset for predictive modeling.


Business Problem

Regardless of the context, the soundness and reliability of an analysis are largely determined by the quality of the underlying data sources that inform it. Whether the data is used for simple summary statistics or for machine learning, inaccurate or incomplete data will harm the integrity of the analysis on some level. Missing values are especially problematic for AI and machine learning applications. When an attribute is frequently missing, the data does not give the model consistent examples of it, so the attribute is very difficult to incorporate. Without consistently complete records, the model has trouble identifying patterns between the affected variables, and its predictions for the target outcome can be thrown off. To use that attribute in a machine learning model, the missing values must be imputed in some manner. This requires estimating the missing values, which can pose a tremendous risk if done without caution. While attributes with too many missing values are unusable in machine learning applications, inaccurate imputed values are likely more dangerous because they are misleading.
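To make the problem concrete, here is a minimal Python sketch on hypothetical toy data (the records and the helper names `complete_rows` and `mean_impute` are illustrative only and not part of any DataRobot API). It shows that simply dropping incomplete rows discards data, and that the most basic fix, filling every gap with the average of the observed values, assigns the same estimate to very different records, which is exactly the kind of misleading imputation described above.

```python
from statistics import mean

# Hypothetical toy records: (square_feet, price); None marks a missing price.
records = [
    (1000, 200.0),
    (1500, 290.0),
    (2000, None),
    (2500, 510.0),
    (3000, None),
]


def complete_rows(rows):
    """Listwise deletion: keep only rows with no missing fields."""
    return [r for r in rows if None not in r]


def mean_impute(rows, col):
    """Replace None in column `col` with the mean of that column's observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = mean(observed)
    return [
        tuple(fill if (i == col and v is None) else v for i, v in enumerate(r))
        for r in rows
    ]


kept = complete_rows(records)
print(len(kept), "of", len(records), "rows usable without imputation")
# prints: 3 of 5 rows usable without imputation

filled = mean_impute(records, 1)
print(filled)
```

Note that the 2,000 and 3,000 square-foot records receive the identical imputed price, even though the complete records clearly show price rising with size.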

Intelligent Solution

Believe it or not, machine learning can actually help your team prepare a dataset for (yes, you guessed it) machine learning by imputing the missing values more intelligently. Instead of predicting future outcomes, the model uses known historical data to predict unknown historical values. When simple historical averaging doesn't get the job done, machine learning can predict missing values by identifying patterns in complete records that would not otherwise be accounted for. When accuracy is a priority, AI can provide granular insight into the rationale behind each imputed value and evaluate how well the predictions generalize across the dataset. Whether the imputed values are later used in AI applications or in other analyses, imputing with machine learning gives you more information than a historical average, and that information is valuable even if the imputed values ultimately prove unusable.
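The page does not describe how DataRobot implements this, so the following is only a minimal sketch of the general idea under stated assumptions: a single numeric predictor, with ordinary least squares standing in for a more capable model. The toy data and the helpers `fit_line`, `impute`, and `loo_error` are hypothetical. The model is trained on the complete records and then used to fill the gaps; a leave-one-out check (hide each known value, refit, measure the error) is one simple way to gauge how well the imputations generalize, and it is run against a mean-only baseline for comparison.

```python
from statistics import mean

# Hypothetical toy records: (square_feet, price); None marks a missing price.
records = [(1000, 200.0), (1500, 290.0), (2000, None), (2500, 510.0), (3000, None)]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_line(train):
    """Return a predictor fitted to the (x, y) pairs in `train`."""
    a, b = fit_line([x for x, _ in train], [y for _, y in train])
    return lambda x: a + b * x


def predict_mean(train):
    """Baseline: always predict the historical average of the observed targets."""
    avg = mean([y for _, y in train])
    return lambda x: avg


def impute(rows, make_predictor):
    """Fit on complete records, then fill each missing target with a prediction."""
    known = [(x, y) for x, y in rows if y is not None]
    predict = make_predictor(known)
    return [(x, y if y is not None else predict(x)) for x, y in rows]


def loo_error(rows, make_predictor):
    """Leave-one-out: hide each known value, refit on the rest, average the absolute error."""
    known = [(x, y) for x, y in rows if y is not None]
    errors = []
    for i, (x, y) in enumerate(known):
        predict = make_predictor(known[:i] + known[i + 1:])
        errors.append(abs(predict(x) - y))
    return mean(errors)


filled = impute(records, predict_line)
print(filled)
print("model LOO error:", loo_error(records, predict_line))
print("mean LOO error: ", loo_error(records, predict_mean))
```

On this toy data the fitted line gives the 2,000 and 3,000 square-foot records different prices consistent with their size, and its leave-one-out error is lower than the mean baseline's. With real data you would swap in a richer model and more predictors, but the pattern of "train on complete rows, predict the gaps, validate on held-out known values" stays the same.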

Explore More Industry Agnostic Use Cases
AI can help organizations across the board, no matter their industry, with a variety of internal and external challenges, from driving operational efficiency and optimizing expenditures to transforming marketing activities and improving forecasting.