Assessing Accuracy of Your Models
Accuracy in machine learning is multidimensional. Accuracy refers to a subset of model performance indicators that measure the model's aggregated errors in different ways. Although it is necessary to decide which error metric to optimize for, fixating on a single score is myopic when building production-ready AI systems. You can best evaluate accuracy through multiple tools and visualizations, along with explainability features and bias and fairness testing when appropriate.
How Do I Evaluate the Accuracy of My Model?
The characteristics of your predictive target help determine the best error metric by which to optimize your model. As a classic example, consider naive accuracy in a binary classification problem: that is, the percent of the time your model predicts the class of true or false correctly. If your target distribution is heavily imbalanced such that 97% of your training sample has the value of false, a model that predicts false every single time would have a naive accuracy score of 97%. Although impressive on paper, this is obviously useless to your goal of training a model that can find those rare cases of true.
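The pitfall above can be made concrete with a small sketch. The 97/3 class split mirrors the example in the text; the labels and predictions are invented for illustration. Naive accuracy looks excellent, while recall on the rare true class reveals the model is useless:

```python
# A synthetic, heavily imbalanced binary target: 97% False, 3% True.
y_true = [False] * 97 + [True] * 3

# A "model" that always predicts the majority class, False.
y_pred = [False] * 100

# Naive accuracy: fraction of predictions that match the label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the True class: of the actual Trues, how many were found?
true_positives = sum(t and p for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"naive accuracy: {accuracy:.0%}")  # 97%
print(f"recall on True: {recall:.0%}")    # 0% -- no rare cases found
```

A metric that accounts for the minority class, such as recall, precision, or F1, exposes what naive accuracy hides.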
How Do I Explain How Accurately a Model Is Performing?
When reporting the accuracy of a model, visualizations like the ones described above can be instrumental in communicating to both technical and nontechnical audiences. Comparisons to other modeling approaches are also valuable. For instance, it is always possible to generate some sort of naive or baseline model, such as a Majority Class Classifier, against which you can more clearly see the predictive lift of the chosen approach. Ideally, you should attempt and compare multiple competitive modeling approaches. The DataRobot leaderboard is populated with each end-to-end modeling approach built against a given dataset and enables direct comparisons between diverse methods. Ultimately, the model you choose might not be the most purely accurate one, because the other dimensions of performance and trust also weigh in, but understanding its accuracy holistically remains critical.
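The baseline comparison described above can be sketched as a miniature leaderboard. This is a hypothetical illustration, not DataRobot's implementation: the labels and candidate predictions are invented, and the approaches are ranked by F1 score so the predictive lift over a majority-class baseline is visible at a glance.

```python
def f1_score(y_true, y_pred):
    """F1 on the positive (True) class, computed from first principles."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented holdout labels and predictions from hypothetical approaches.
y_true = [True, False, False, True, False, False, True, False]

candidates = {
    "majority-class baseline": [False] * 8,
    "model A": [True, False, False, True, False, True, True, False],
    "model B": [True, False, False, False, False, False, True, False],
}

# Rank approaches by F1, best first -- a toy "leaderboard".
for name, preds in sorted(candidates.items(),
                          key=lambda kv: f1_score(y_true, kv[1]),
                          reverse=True):
    print(f"{name:25s} F1 = {f1_score(y_true, preds):.2f}")
```

The baseline scores 0.0 here, which is exactly the point: any candidate that cannot clearly beat it offers no real predictive lift.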
Robustness and Stability
A model in production encounters all sorts of unclean, chaotic data—from typos to anomalous events—which can trigger unintended behavior. Find out how to test whether your model is ready for the real world.
Prediction Speed
The speed of model scoring directly impacts how it can be used in your business process. Find out how the speed of predictions influences their trustworthiness.