What You Need to Know About Model Risk Management
Model risk refers to the inherent risks associated with running machine learning models in production. Read this post to learn how to manage model risk.
What is model risk?
Model risk is a term typically used in the context of financial models. It usually refers to assumptions in a machine learning model that prevent it from fully capturing financial risk. This can be the result of a model that is high in bias and does not represent the underlying data well, or one that is inaccurate in some other way. Failing to properly assess financial risk carries a whole host of negative implications, such as improper subprime lending decisions, incorrect predictions about the direction of financial markets or individual securities, and disastrous business decisions based on faulty financial forecasts.
However, in 2021, big data and machine learning are used widely across many different industries, and the definition of model risk has expanded to reflect this fact. Today, when someone speaks of model risk, they're usually referring to the inherent downstream problems associated with relying on any sort of statistical model.
Model risk refers to the inherent risks associated with running machine learning models in production. The fact is that there are always risks run when working with data models, and every organization should put in place governance and operational procedures to minimize and mitigate such risks.
Sources of model risk
Sources of risk during model development and training
Model risk can arise in any number of places within the model building process. For example, a model's outputs hinge on the data, assumptions, contexts, underlying math, and code with which it was trained. When any of these pieces contains errors or biases, those flaws will likely propagate to the outputs and introduce error.
And yet, even when using the utmost care to develop models, some small margin of fundamental inaccuracy is usually unavoidable. This is codified in the famous Bias-Variance Tradeoff principle of machine learning, which asserts that a model's expected error can be decomposed into the sum of its squared bias, its variance, and a component called irreducible error. This last part is inescapable. Even with a perfectly optimized and tuned model, there will always be some portion of the error that cannot be eliminated, owing to noise and the finite data with which we must work. No finite training set can provide a perfect measure of any real-world circumstance.
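The decomposition is easy to see empirically. The sketch below (an illustrative simulation, not a production method; the true function, noise level, and model class are all assumed) repeatedly retrains a simple linear model on fresh noisy samples and measures its squared bias and variance at a single test point. The expected squared error at that point is the sum of those two terms plus the noise variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed ground truth and irreducible noise level for this sketch.
f = lambda x: np.sin(2 * np.pi * x)
noise_sd = 0.3

def predictions_at(x0, degree=1, n_train=30, n_repeats=500):
    """Retrain a polynomial model on fresh noisy samples many times
    and collect its predictions at the single test point x0."""
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        x = rng.uniform(0, 1, n_train)
        y = f(x) + rng.normal(0, noise_sd, n_train)
        coefs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coefs, x0)
    return preds

x0 = 0.9
preds = predictions_at(x0)
bias_sq = (preds.mean() - f(x0)) ** 2  # squared bias: systematic miss
variance = preds.var()                 # variance: spread across retrainings
# Expected squared error at x0 decomposes as:
#   bias^2 + variance + noise_sd^2 (the irreducible part)
print(bias_sq, variance, noise_sd ** 2)
```

No amount of tuning removes the final `noise_sd ** 2` term; only the first two can be traded against each other.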
That said, there are many controllable aspects of model risk. The first of these is the data itself. While no dataset will ever be perfect, better models can almost always be obtained by gathering more, higher-quality, representative data. This follows from basic Monte Carlo reasoning: the more data used, the better the estimate of the real-world distribution. Thus, any time error is encountered in a model, the first instinct should be to go back and examine the data upon which it was trained.
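This Monte Carlo effect can be demonstrated in a few lines. In the sketch below (the distribution and its parameters are assumed purely for illustration), the error of a sample-mean estimate shrinks roughly as one over the square root of the sample size:

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean = 5.0  # parameter of the "real-world" distribution (assumed)

# Estimate the mean from samples of increasing size: the Monte Carlo
# error of the estimate shrinks roughly as 1 / sqrt(n).
errors = {}
for n in [100, 10_000, 1_000_000]:
    sample = rng.normal(loc=true_mean, scale=2.0, size=n)
    errors[n] = abs(sample.mean() - true_mean)

print(errors)
```

The same logic applies to any statistic a model must learn from data: more representative samples mean a tighter estimate of the underlying distribution, and hence lower model risk from that source.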
Next, many models are trained with inherent assumptions in mind. This is particularly the case when manual feature selection is used or rules about the behavior of a system are encoded into the model itself. For example, a drug discovery model might be constructed such that it produces outputs which adhere to chemical properties underlying molecular interactions, or a language model might encode certain linguistic structures such as the arrangement of tenses or the behavior of predicates in a sentence. Any errors in these underlying assumptions will likewise lead to overall errors in the model’s outputs and quantifiable risk.
Another thing to keep in mind is the context in which a model is trained. This ties back to the data a model uses. Normally, a model is trained to accurately reflect the characteristics of some underlying data distribution. When a model is trained on data from one distribution and then applied to data from some other distribution, it can make bad predictions. Famously, this has been demonstrated in image recognition. When classifiers are trained on pictures of people from one race but then applied to those of a different race, the model can inaccurately label features or make other gross misgeneralizations. Having a representative dataset that accurately reflects the context in which the model will ultimately be used is paramount for reducing model risk.
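A toy experiment makes the danger concrete. The sketch below (synthetic one-dimensional data with an assumed class separation and shift; real distribution shift is of course higher-dimensional and subtler) trains a simple threshold classifier on one distribution, then evaluates it both in-distribution and on shifted data:

```python
import numpy as np

rng = np.random.default_rng(7)

def make_data(n, shift=0.0):
    """Two classes as 1-D Gaussians; `shift` moves both classes,
    simulating a deployment context different from training."""
    x0 = rng.normal(0.0 + shift, 1.0, n)  # class 0
    x1 = rng.normal(3.0 + shift, 1.0, n)  # class 1
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return x, y

# "Train": place the decision threshold midway between the class means.
x_train, y_train = make_data(5000)
threshold = (x_train[y_train == 0].mean() + x_train[y_train == 1].mean()) / 2

def accuracy(x, y):
    return float(((x > threshold).astype(float) == y).mean())

acc_in = accuracy(*make_data(5000, shift=0.0))   # same distribution
acc_out = accuracy(*make_data(5000, shift=2.0))  # shifted distribution
print(acc_in, acc_out)
```

The model itself never changed; only the data it was asked to score did. This is exactly why the training context must match the deployment context.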
This out-of-distribution error is problematic, but it’s just one side of the coin. There are issues that also arise when your entire distribution of data is inherently biased. For example, humans have their own biases and these biases are codified in the data we generate. When models are trained on this data, the model learns to replicate these biases, even though we’d ultimately like our model to operate as humans should, not as they do. Whenever a model is being designed, it is important to contemplate what biases it might be inclined towards and to decide whether any adjustments to the model are ultimately needed.
Context is also useful in other ways. One frequently encountered issue with certain models is a lack of transparency into the bases for their decisions. Neural networks and other highly parameterized models can be so complex as to be opaque. While they may make highly accurate predictions, there is often little to no insight into why or how those predictions were made. Governor Lael Brainard of the Federal Reserve detailed this quite well in her recent speech, in which she discusses some of the nuanced complexities of model risk management and, in particular, the pitfalls of relying on black-box algorithms that offer no introspection into their inner workings. Oftentimes, understanding the context in which a model was trained can help to elucidate the reasoning behind its decisions, and sometimes it is preferable to do the extra work of using interpretable machine learning models within a particular problem space.
Finally, model risk can arise due to issues with the technical machinery upon which a model is based. For example, a model itself can be flat-out inaccurate. It could exhibit high bias and lack the features needed to accurately capture the relationships in the data, a phenomenon known as underfitting. It could also exhibit high variance and model the underlying data so closely that its ability to generalize to new data suffers; this is known as overfitting. Other issues include an improperly calibrated model, a model based on erroneous equations or statistical theory, or a model trained with optimization methods that fail to find a good optimum. The list is virtually endless.
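Underfitting and overfitting are easiest to see side by side. The sketch below (an assumed cubic ground truth with synthetic noise, chosen only for illustration) fits polynomials of three different degrees and compares training error against error on held-out data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of an assumed cubic ground truth.
f = lambda x: x**3 - x
x_train = rng.uniform(-1.5, 1.5, 20)
y_train = f(x_train) + rng.normal(0, 0.2, 20)
x_test = rng.uniform(-1.5, 1.5, 200)
y_test = f(x_test) + rng.normal(0, 0.2, 200)

def errors(degree):
    """Return (train MSE, test MSE) for a polynomial fit of this degree."""
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

underfit = errors(1)   # too simple: high bias, poor even on training data
good = errors(3)       # matches the true complexity
overfit = errors(15)   # fits the noise: low train error, high test error
print(underfit, good, overfit)
```

The degree-1 model is wrong everywhere; the degree-15 model looks excellent on its own training points yet degrades on new data. Held-out evaluation is what exposes the difference.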
And even if the math and procedures baked into a model are correct, technical errors can arise in its implementation. Programmer error is a frequent and well understood component of software engineering. For this reason, it is necessary to have excellent testing, CI/CD, and deployment processes in place in order to be able to quickly identify bugs in a model’s code and correct them. This is one reason why model governance is so important. Robust model validation is another critical piece of the puzzle.
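Model code benefits from the same kind of automated checks as any other software. The sketch below (the `predict` function and its weights are purely hypothetical stand-ins for a real scoring function) shows the style of lightweight validation tests a CI pipeline might run on every change:

```python
import numpy as np

# A hypothetical model under test: a stand-in linear scorer with a sigmoid.
def predict(features: np.ndarray) -> np.ndarray:
    weights = np.array([0.4, -0.2, 0.1])  # illustrative weights only
    return 1 / (1 + np.exp(-(features @ weights)))

def test_output_shape_and_range():
    x = np.zeros((8, 3))
    scores = predict(x)
    assert scores.shape == (8,)                   # one score per input row
    assert np.all((scores >= 0) & (scores <= 1))  # valid probabilities

def test_known_input():
    # All-zero features must map to exactly 0.5 under a sigmoid.
    assert np.allclose(predict(np.zeros((1, 3))), 0.5)

test_output_shape_and_range()
test_known_input()
print("all checks passed")
```

Checks like these will not catch a statistically bad model, but they do catch the programmer errors, such as shape bugs and invalid outputs, that account for a large share of production incidents.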
Sources of risk for models in production
While there are many sources of model risk during the model development and training process, more and more organizations are beginning to realize that what happens after model deployment is just as important—if not more so.
Sources of risk for models in production are often related to the technical debt an organization may have accrued. Specifically, some of these key sources of risk include:
- Not having a full catalog of all models in production
- Lack of documentation for which data is being used for which models
- Models that were deployed in the past that don’t meet current organizational standards for testing and documentation
- Inability to see dependencies between all models in production
- Poor monitoring of models, resulting in degrading performance or outright failure
This is why governance across the full ML lifecycle is crucial.
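Monitoring, the last item above, can start very simply. The sketch below (synthetic feature values and an arbitrary alert threshold, both assumptions for illustration; production systems typically use richer drift statistics) compares the live mean of a feature against its training-time reference and flags large shifts:

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_shift_zscore(reference, live):
    """Measure how far the live feature mean has moved from the
    training-time reference, in standard errors of the live mean."""
    se = reference.std(ddof=1) / np.sqrt(len(live))
    return abs(live.mean() - reference.mean()) / se

reference = rng.normal(0.0, 1.0, 10_000)  # feature values at training time
live_ok = rng.normal(0.0, 1.0, 1_000)     # production data, no drift
live_drift = rng.normal(0.5, 1.0, 1_000)  # production data, drifted mean

THRESHOLD = 4.0  # alert threshold in standard errors (a judgment call)
print(mean_shift_zscore(reference, live_ok) > THRESHOLD)
print(mean_shift_zscore(reference, live_drift) > THRESHOLD)
```

Even a crude check like this, run on a schedule for every deployed model, turns silent degradation into an actionable alert.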
How to manage model risk
This brings us to managing model risk. While we’ve discussed some specific pieces of a risk management strategy, it’s important to view it as an overarching philosophy of model development, not merely an ad-hoc approach deployed in response to individual error crises. Model risk is not a static quantity. It is something that can and must be mitigated.
A key component of this risk management process is governance. And as mentioned, governance should be applied across the complete ML lifecycle, not just to one area like model development. Machine learning governance refers to the overall process for how an organization controls access, implements policy, and tracks activity for models and their results. Among other things, this can include output validation, tests for precision, accuracy, and drift, model versioning that allows for the tracking of results over time, documenting model behavior, and controlling access rights to a given model.
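Several of those governance pieces, versioning, documented metrics, and gated approval, can be sketched as a toy model registry. The code below is purely illustrative (the class names, fields, and the accuracy gate are all invented for this example; real registries add access control, audit logs, and persistence):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelRecord:
    """One versioned entry in a toy model registry (illustrative only)."""
    name: str
    version: int
    owner: str
    metrics: dict
    approved: bool = False
    registered_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

class ModelRegistry:
    def __init__(self):
        self._models = {}

    def register(self, name, owner, metrics):
        """Add a new version; records are immutable once registered."""
        versions = self._models.setdefault(name, [])
        record = ModelRecord(name, len(versions) + 1, owner, metrics)
        versions.append(record)
        return record

    def approve(self, name, version, min_accuracy=0.8):
        """Gate deployment on a documented validation threshold."""
        record = self._models[name][version - 1]
        if record.metrics.get("accuracy", 0.0) < min_accuracy:
            raise ValueError("model fails the validation gate")
        record.approved = True
        return record

registry = ModelRegistry()
registry.register("churn", owner="ml-team", metrics={"accuracy": 0.91})
approved = registry.approve("churn", version=1)
print(approved.version, approved.approved)
```

The point is the shape of the process, not the code: every model has an owner, a version, recorded metrics, and an explicit approval step before it can reach production.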
DataRobot’s 2021 enterprise trends in machine learning report found that 56% of organizations struggle with governance, security, and auditability issues. So if you’re not sure where to start with machine learning governance, you’re not alone.
The complexity of machine learning governance is compounded by the fact that an organization’s policy must coexist with and complement industry regulations. This can be challenging, especially in highly regulated fields such as finance, medicine, biotech, and civil engineering where the rules are frequently changing. Adopting a cohesive model governance practice takes constant reflection, revision, and patience.