The Curse of Dimensionality – Combinatorial Explosions
Combinatorial explosions occur in some numeric problems when complexity increases rapidly as the number of possible combinations of inputs grows. This explosion in complexity can make some mathematical problems intractable to brute-force solutions. Combinatorial explosions are one manifestation of the curse of dimensionality.
The problem of combinatorial explosions occurs frequently in insurance pricing. For example, I have data for an auto/motor insurance pricing project, and it has 27 rating factors. My rating structure could use anywhere from 0 to all 27 of these rating factors, and I want to find the best combination of rating factors. How many combinations will I have to search through?
If it took me only 1 minute to analyze each combination (and that’s faster than I’ve ever been able to work), then it would take me approximately 21,515,067,731,468 billion years to try each combination. To put this into perspective, the universe is only 13.8 billion years old!
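The arithmetic can be checked with a short sketch. It assumes that a "combination" means an unordered subset of the 27 rating factors, so each factor is simply in or out of the rating structure, which gives 2^27 candidates and about 255 years of work at one minute each. The figure quoted above is far larger; it is of roughly the same magnitude as the number of ways to order all 27 factors (27!), so the sketch prints both counts. Which count lies behind the quoted figure is not stated in this article, and the variable names below are illustrative only.

```python
import math

N_FACTORS = 27
MINUTES_PER_YEAR = 60 * 24 * 365  # ignores leap years

# Assumption: each factor is either in or out of the rating structure,
# so the number of unordered subsets is 2**27 (the empty set included).
n_subsets = 2 ** N_FACTORS

# Same count, obtained by summing "27 choose k" over every subset size k.
assert n_subsets == sum(math.comb(N_FACTORS, k) for k in range(N_FACTORS + 1))

# For comparison only: the number of ways to order all 27 factors.
n_orderings = math.factorial(N_FACTORS)

years_for_subsets = n_subsets / MINUTES_PER_YEAR
years_for_orderings = n_orderings / MINUTES_PER_YEAR

print(f"subsets:   {n_subsets:,} -> {years_for_subsets:,.0f} years")
print(f"orderings: {n_orderings:.3e} -> {years_for_orderings:.3e} years")
```

Either way, the count grows so quickly with the number of factors that checking every candidate by hand is out of the question.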
But this is only a small part of the problem. Some rating factors interact with each other. For example, auto/motor insurers often find an interaction between the age of the driver and their gender.
Young drivers tend to cost more, and male drivers tend to cost more, but young male drivers cost even more than can be explained by the individual effects of age and gender. Generalized linear models (GLMs), one of the most popular statistical tools for actuaries, do not automatically capture such interactions. The user must explicitly define each interaction, specifying both which combinations of rating factors interact and the 3D mathematical function that describes each interaction. This rapidly increases the number of models that actuaries must test, and the pricing problem becomes even more intractable.
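To get a feel for how much interactions add to the search, here is a minimal sketch that counts only two-way interactions among the same 27 rating factors. It assumes each candidate pair is simply included or excluded, and it ignores the further choice of functional form for each interaction, so it understates the real search space.

```python
import math

N_FACTORS = 27

# Candidate two-way interactions: unordered pairs of distinct rating factors.
n_pairs = math.comb(N_FACTORS, 2)

# Assumption: each candidate pair is either in the model or not.
n_interaction_sets = 2 ** n_pairs

print(f"candidate pairs: {n_pairs}")
print(f"possible sets of pairs: a number with {len(str(n_interaction_sets))} digits")
```

With 351 candidate pairs, the number of possible sets of interactions has more than a hundred digits, before the main effects are even considered.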
Because of this, some insurers hire huge teams of actuaries who do nothing but search through combinations of rating factors, looking for incremental improvements in their pricing models. With the rise of modern machine learning models, there are smarter, faster ways to select which features to include in a model, and which feature interactions to include. For example, there is a technique called lasso regularization. The lasso adds a penalty proportional to the absolute value of each model coefficient, which shrinks the coefficients of unhelpful features all the way to zero. In this way, linear models can automatically do both feature selection and regularization at once.
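The selection effect of the lasso can be seen in a small, self-contained sketch. It uses a plain linear model with squared-error loss rather than a GLM, fits it by coordinate descent with soft-thresholding, and runs on synthetic data in which only the first two of five features matter. The data, the function names, and the penalty strength `alpha=0.1` are all assumptions chosen for illustration; this is not how any particular product or library implements the technique.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimise (1/(2n)) * sum((y - Xw)^2) + alpha * sum(|w|), with no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, because w starts at zero
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Covariance of feature j with the residual, with feature j's
            # own current contribution added back in.
            rho = sum(v * r for v, r in zip(col, residual)) / n + z * w[j]
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, col)]
            w[j] = new_wj
    return w


# Synthetic data (hypothetical): only features 0 and 1 drive the response.
rng = random.Random(0)
n_rows, n_features = 200, 5
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]

print("selected features:", selected)
print("coefficients:", [round(wj, 2) for wj in weights])
```

The irrelevant features end up with coefficients of exactly zero, while the two real effects are kept but shrunk slightly toward zero, which is the regularization half of the bargain.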
Recently there has been a new development. In their paper “Accurate Intelligible Models with Pairwise Interactions”, Lou, Caruana, Gehrke, and Hooker describe high-performance generalized additive models with pairwise interactions (GA2Ms) and present “a novel, computationally efficient method called FAST for ranking all possible pairs of features as candidates for inclusion into the model”. FAST quickly finds the feature interactions that are of most value.
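The idea of ranking candidate pairs can be illustrated with a deliberately naive sketch. It is not the FAST algorithm itself, whose efficiency tricks are described in the paper. It simply fits main effects first, then scores each pair of features by how much residual error a separate constant per cell of that pair would remove. The toy data (four binary features with one built-in interaction between features 0 and 1) and all names are hypothetical.

```python
from itertools import combinations, product

# Toy data: every combination of four binary features, so the design is balanced.
rows = list(product([0, 1], repeat=4))
# Hypothetical response: three main effects plus one interaction (features 0 and 1).
y = [1.0 * a + 2.0 * b + 0.5 * c + 3.0 * a * b for a, b, c, d in rows]

n = len(rows)
grand_mean = sum(y) / n


def level_mean(values, j, level):
    """Mean of values over the rows where feature j equals level."""
    picked = [v for row, v in zip(rows, values) if row[j] == level]
    return sum(picked) / len(picked)


# Additive fit: grand mean plus one main effect per feature level.
# For a balanced design like this one, that is the least-squares additive fit.
effects = [
    {level: level_mean(y, j, level) - grand_mean for level in (0, 1)}
    for j in range(4)
]
residuals = [
    yi - (grand_mean + sum(effects[j][row[j]] for j in range(4)))
    for row, yi in zip(rows, y)
]


def pair_score(i, j):
    """Drop in residual sum of squares from fitting one constant per (i, j) cell."""
    score = 0.0
    for li, lj in product([0, 1], repeat=2):
        cell = [r for row, r in zip(rows, residuals) if row[i] == li and row[j] == lj]
        cell_mean = sum(cell) / len(cell)
        score += len(cell) * cell_mean ** 2
    return score


# Rank every candidate pair, strongest interaction first.
ranking = sorted(
    ((pair_score(i, j), (i, j)) for i, j in combinations(range(4), 2)),
    reverse=True,
)
for score, pair in ranking:
    print(pair, round(score, 3))
```

Only the pair (0, 1) gets a non-zero score here, because it is the only interaction built into the toy data. On real data with many continuous features, doing this by brute force becomes expensive, which is the problem a fast ranking method addresses.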
When I applied GA2Ms to auto/motor insurance data, the FAST algorithm found and ranked possible feature interactions for me, shown above. As you can see, the strongest feature interaction was between latitude and longitude, the geographic rating features, which are usually quite important for this type of insurance. This made sense to me. It also found a number of interactions between the driver’s age and other features. These match my expectations in this area, such as the age and gender interaction shown earlier in this article.
In an exciting move to support our insurance customers, GA2M models have been made available in the latest version of DataRobot. Since insurers operate in an industry that is both highly regulated and highly competitive, they need models that are simultaneously accurate and intelligible, explainable and justifiable to the regulator. Now insurers have access to “intelligible models with state-of-the-art accuracy”.
Colin Priest is a Fellow of the Institute of Actuaries of Australia and has worked in a wide range of actuarial and insurance roles, including Appointed Actuary, pricing, reserving, risk management, product design, underwriting, reinsurance, relationship management, and marketing. Over his career, Colin has held a number of CEO and general management roles, where he has championed data science initiatives in financial services, healthcare, security, oil and gas, government and marketing. He frequently speaks at various global actuarial conferences.
Colin is a firm believer in data-based decision making and applying machine learning to the insurance industry. He is passionate about the science of healthcare and does pro-bono work to support cancer research.