How to Avoid the Trap of Being Brilliant, but Impractical
The best insurance pricing model in the world is useless unless it can be used in an insurer’s computer system. A recent Harvard Business Review article described a team of actuaries that built an impressive predictive model, using the latest in machine learning algorithms. But the insurer lacked the infrastructure to directly implement the trained model in a production setting, and the model was “too complex for the IT team to reproduce.” So how can you build insurance pricing models with the accuracy of machine learning, yet keep them simple enough to run in your computer system? Is it possible to have a pricing model that is both brilliant and practical?
When I used to manage pricing for insurers, changing the rating tables was a painful process. First, I would have to go to the executive team to get my project prioritized and IT resources allocated. Next, I would write a technical specification. Then the programmers would take a few months to code the mathematics. Finally, I would have to test that code and help debug why the answers didn’t match what I expected. It took 6 to 12 months, and by the time I got the new rating structure into production, it was already obsolete! And by manually coding the rating table, the insurer created extra model risk – the risk that the model implementation was different from what I had designed.
Einstein once said, “Everything should be made as simple as possible, but not simpler.” While some insurers have embraced machine learning, others have avoided it because the formulae seemed too complex to put into production. But to survive in an ultra-competitive market, insurers need accuracy, and accuracy requires complexity. By avoiding complexity, insurers are losing sales and profit to their competitors due to the problem of adverse selection.
The practical problems of computational complexity have been known for some time, and computer science has developed well-known and widely-used solutions. Modern software is very complex. For example, Google has 2 billion lines of code, and it’s all in one place. Compared to Google, even the most complex insurance pricing models are simple, with only a few thousand lines of formulae. So if Google can manage that level of complexity, what can insurers and actuaries learn from this?
Well, computer science uses the idea of abstraction to deal with complexity in software design. By keeping interactions simple between sections of computer systems, and separating the interactions between sections from the implementation within each section, computer programmers can add levels of functionality that would otherwise be too complex to handle. Basically, it means that one part of a computer system doesn’t need to know the details of how any other part works; the parts only need to know how to communicate with each other.
This design architecture enables humans to create enormously complex systems by concentrating on just a few issues at a time. It also enables parts of computer systems to be upgraded without breaking other parts of the system – even though the upgrade may work differently internally, if it communicates with the rest of the system using the same rules as before, then all is well.
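A minimal Python sketch of this idea, applied to premium rating (the class names, risk factors, and numbers are illustrative, not from any real insurer’s system): the quoting code depends only on a stable interface, so the rating implementation behind it can be swapped without touching the caller.

```python
from abc import ABC, abstractmethod

class RatingEngine(ABC):
    """Stable interface: callers only know this, never the internals."""
    @abstractmethod
    def price(self, risk: dict) -> float:
        """Return a premium for the given risk factors."""

class TableRatingEngine(RatingEngine):
    """Old implementation: a simple lookup-table style surcharge."""
    def price(self, risk: dict) -> float:
        base = 500.0
        return base * (1.2 if risk.get("young_driver") else 1.0)

class ModelRatingEngine(RatingEngine):
    """New implementation: stands in for a trained model's prediction."""
    def price(self, risk: dict) -> float:
        return 480.0 + 150.0 * float(risk.get("young_driver", False))

def quote(engine: RatingEngine, risk: dict) -> float:
    # The quoting system communicates through the interface only,
    # so upgrading the engine cannot break this code.
    return engine.price(risk)

risk = {"young_driver": True}
print(quote(TableRatingEngine(), risk))  # 600.0
print(quote(ModelRatingEngine(), risk))  # 630.0
```

The point is not the toy numbers but the shape: both engines “communicate using the same rules,” so the rest of the system neither knows nor cares which one is live.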
Insurance premium rating has the same architectural issues as complex computer systems:
- Insurance rates need to change frequently as the cost of claims changes rapidly, and as competitors change their rates. These changes should not break the rest of the insurance system.
- Insurance systems don’t need to know how the rating system works; they just need to send rating information and receive back the premium calculation.
- Additional risk is generated whenever a premium rating model must be transferred from one environment (where I designed the rating algorithm) to a different environment (where the algorithm will be used to calculate premium rates for customers).
While many data scientists are accustomed to using code to calculate predictions, this approach doesn’t address these three issues. If you change pricing models frequently, then every time you put a new model into production via code, you will need to:
- Compile the code, rebuilding your entire rating system
- Do regression testing to ensure that you haven’t broken any of the existing algorithms
- Do thorough testing of your new algorithm to ensure that the way the code works in your system is consistent with the way it worked in your development system
This is all time consuming and error-prone. A better solution is to not use code for production. Instead, make the pricing algorithm available via a REST API. Most modern software can call REST APIs, and APIs use abstraction to manage the complexity. DataRobot offers one-click model deployment via REST APIs. You can quickly switch between models without fear that the production implementation is different from the model you built. Whenever you want to switch to a new pricing algorithm, all you need to do is change the model ID in the API call. It’s as easy as that. By using machine learning in insurance, you can have both accuracy and practicality – a pricing model that is brilliant while still being practical.
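To make the “change one identifier” claim concrete, here is a hedged Python sketch of what such an API call might look like. The host, endpoint path, payload shape, and model IDs are all hypothetical, not DataRobot’s actual API; consult your vendor’s API reference for the real details.

```python
import json
from urllib import request

# Hypothetical scoring host; a real deployment would supply its own
# URL and authentication headers.
API_HOST = "https://pricing.example.com"

def build_scoring_request(model_id: str, risk_factors: dict) -> request.Request:
    """Build an HTTP POST that asks the deployed model to price one risk."""
    url = f"{API_HOST}/predictions/{model_id}"
    body = json.dumps({"rows": [risk_factors]}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

risk = {"driver_age": 23, "vehicle_group": 7, "postcode": "2000"}

# Switching to a new pricing model is just a different identifier;
# the calling system is otherwise unchanged.
old = build_scoring_request("model-2023-q4", risk)
new = build_scoring_request("model-2024-q1", risk)
print(old.full_url)  # https://pricing.example.com/predictions/model-2023-q4
print(new.full_url)  # https://pricing.example.com/predictions/model-2024-q1
```

Nothing in the insurance system recompiles or redeploys when the model changes: the API is the stable interface, and the model behind it is the swappable implementation.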