5 Data Science Challenges Banks Face (And How to Overcome Them)
Making predictions has been a part of the banking industry since the world was flat. These days, you would be hard-pressed to identify a line of business or function in a bank that doesn’t have multiple needs for predictive analytics.
Banks of all sizes are realizing that they must find new ways of capturing, organizing, and making data available, and must up their game with new tools and techniques for learning from their data and embedding data-based capabilities into products, services, client interactions, and operations.
Cheap commodity hardware, the rise of open-source technologies, and new machine learning algorithms have eliminated the technological barriers of yesteryear. Modelers can now crank through an enormous amount of data and let the computer do the hard work of finding the best predictors. The machine “learns” how to make predictions based on the data you provide.
In the last decade or so, the sheer number of different machine learning models that can be used to glean insight from your data has exploded. Regression models have waned. Now there are neural networks, random forests, support vector machines, and gradient-boosted trees, just to name a few. But this has given rise to a whole new set of challenges.
Why is Data Science So Hard?
It is a practical impossibility to know exactly which of the myriad of available modeling algorithms and techniques will give you the best result given the data you have and what you are trying to predict. So, data scientists have to try a lot of them. Not even the most experienced data scientists can know a priori which model will work the best. Given practical time and budget constraints and an enormous backlog of demand, data scientists usually rely on a few models they know well.
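To make "try a lot of them" concrete, here is a minimal sketch of a model bake-off: several candidates are scored on the same held-out folds and ranked. Everything in it is an illustrative assumption rather than anything from a real bank or product: the synthetic data, the three deliberately simple candidates, and the helper names (`make_data`, `cross_val_accuracy`, `leaderboard`). In practice the candidates would come from a modeling library, not from hand-written functions.

```python
import random

def make_data(n=200, seed=0):
    """Synthetic stand-in for real data: two features and a binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class 1 is shifted by 1.5 on both features, so the classes overlap.
        features = (rng.gauss(1.5 * label, 1.0), rng.gauss(1.5 * label, 1.0))
        rows.append((features, label))
    return rows

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fit_majority(train):
    """Baseline: always predict the most common label in the training data."""
    majority = int(2 * sum(y for _, y in train) >= len(train))
    return lambda x: majority

def fit_centroid(train):
    """Predict the label whose average feature vector is closest."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in train if y == label]
        centroids[label] = tuple(sum(col) / len(col) for col in zip(*points))
    return lambda x: min(centroids, key=lambda lab: dist2(x, centroids[lab]))

def fit_knn(train, k=5):
    """Predict by majority vote among the k closest training rows."""
    def predict(x):
        nearest = sorted(train, key=lambda row: dist2(x, row[0]))[:k]
        return int(2 * sum(y for _, y in nearest) > k)
    return predict

def cross_val_accuracy(fit, rows, folds=5):
    """Average held-out accuracy over `folds` interleaved splits."""
    scores = []
    for i in range(folds):
        test = rows[i::folds]
        train = [row for j, row in enumerate(rows) if j % folds != i]
        model = fit(train)
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return sum(scores) / len(scores)

def leaderboard(candidates, rows, folds=5):
    """Score every candidate on the same folds and rank best first."""
    scored = [(name, cross_val_accuracy(fit, rows, folds))
              for name, fit in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

CANDIDATES = {
    "majority baseline": fit_majority,
    "nearest centroid": fit_centroid,
    "5-nearest neighbours": fit_knn,
}
BOARD = leaderboard(CANDIDATES, make_data())
for name, accuracy in BOARD:
    print(f"{name:22s} {accuracy:.3f}")
```

Scoring every candidate on identical folds is the point of the design: the ranking then reflects the models, not the luck of a particular train/test split.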
It’s not enough to know that a model works well. With skeptical users, never mind regulators, the ability to explain how and why the model works is critical. Which data is important and when (all of the time, some of the time, only rarely)? Can you explain the reasons for a specific prediction (was it a single reason or a combination of factors)? Black box models that work brilliantly but without insight are of little value, even if you could get them past regulators — which you can’t.
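For the simplest case, a linear model such as logistic regression, the "reasons for a specific prediction" can be read directly off the coefficients. The sketch below assumes a model that has already been fitted; the coefficients, feature names, and portfolio averages are all made up for illustration. Each feature's contribution is its weight times how far the applicant sits from the average customer, measured in log-odds.

```python
import math

# Hypothetical coefficients of an already-fitted logistic regression.
INTERCEPT = -1.0
WEIGHTS = {"utilization": 2.0, "late_payments": 0.8, "years_as_customer": -0.1}
# Hypothetical portfolio averages, used as the reference point.
BASELINE = {"utilization": 0.30, "late_payments": 0.5, "years_as_customer": 6.0}

def log_odds(row):
    """Linear score of the model before the logistic transform."""
    return INTERCEPT + sum(WEIGHTS[f] * row[f] for f in WEIGHTS)

def probability(row):
    """Predicted probability of the positive outcome (e.g. default)."""
    return 1.0 / (1.0 + math.exp(-log_odds(row)))

def reason_codes(row):
    """Per-feature contribution to the log-odds relative to the baseline,
    sorted so the most influential feature comes first."""
    contributions = {f: WEIGHTS[f] * (row[f] - BASELINE[f]) for f in WEIGHTS}
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

APPLICANT = {"utilization": 0.90, "late_payments": 3, "years_as_customer": 2.0}
REASONS = reason_codes(APPLICANT)

print(f"predicted probability: {probability(APPLICANT):.3f}")
for feature, contribution in REASONS:
    print(f"{feature:18s} {contribution:+.2f} log-odds vs. average customer")
```

For a linear model this decomposition is exact: the contributions add up to the gap between the applicant's score and the average customer's score. For tree ensembles or neural networks no such exact split exists, and an approximation method is needed instead.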
There are almost always trade-offs to weigh. Perhaps a model is very good at sniffing out positive outcomes, missing very few, but at the cost of a high number of false positives, a classic problem in fraud detection and anti-money laundering (AML). Or a model is exceedingly accurate but can’t run at the speed needed to support actual business operations. Perhaps a model works very well for one location or customer group but, because of behavioral differences, less well for others. These trade-offs must be assessed before a model is deployed into operational use.
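The false-positive trade-off is easy to see with a small worked example. The ten scores and labels below are invented for illustration (1 marks a confirmed fraud case), and the three alert thresholds are arbitrary. Lowering the threshold catches more fraud but floods investigators with false alarms; raising it does the opposite.

```python
def confusion(scores, labels, threshold):
    """Count true/false positives and negatives when every score at or
    above `threshold` raises an alert."""
    tp = fp = fn = tn = 0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_fraud:
            tp += 1
        elif flagged:
            fp += 1
        elif is_fraud:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall(scores, labels, threshold):
    """Precision: share of alerts that are real fraud.
    Recall: share of real fraud that raised an alert."""
    tp, fp, fn, _ = confusion(scores, labels, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented model scores and outcomes for ten transactions.
SCORES = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
LABELS = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for threshold in (0.35, 0.65, 0.85):
    precision, recall = precision_recall(SCORES, LABELS, threshold)
    print(f"threshold {threshold:.2f}: precision {precision:.2f}, recall {recall:.2f}")
# threshold 0.35: precision 0.57, recall 1.00
# threshold 0.65: precision 0.75, recall 0.75
# threshold 0.85: precision 1.00, recall 0.50
```

Which row is "best" is a business decision, not a statistical one: it depends on what a missed fraud costs compared with an unnecessary investigation.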
Even the best model can’t correct for errors, gaps, or biases in your data. Nowadays, it’s cheap and easy to store data. That wasn’t always the case. This means that you may have limited history to work with — will that be good enough? Maybe you have data for one geography but are considering expanding into another — will a model trained on the one work for the other? Are all of the data features used to create your model appropriate? Can you defend use of all the data for decision-making purposes, even if the model finds that age, for example, is a good predictor? What if your history has embedded biases? If human-originated biases are reflected in the training data, this bias will be reflected in the resultant model.
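One simple, partial check for embedded bias is to compare historical outcome rates across groups before training on that history. The sketch below uses invented approval records split by an age band. The 0.8 cutoff is the widely quoted "four-fifths" rule of thumb, used here purely as an assumption for illustration and not as a legal or regulatory test. A low ratio does not prove unfair treatment, but it does tell you where to look.

```python
from collections import defaultdict

def approval_rates(records):
    """Share of approved applications per group.
    `records` is an iterable of (group, was_approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in records:
        totals[group] += 1
        approved[group] += int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}

def impact_ratios(rates, reference):
    """Each group's approval rate divided by the reference group's rate.
    Assumes the reference group has a non-zero approval rate."""
    return {group: rate / rates[reference] for group, rate in rates.items()}

# Invented history: 100 decisions per age band.
HISTORY = (
    [("under_30", True)] * 30 + [("under_30", False)] * 70
    + [("30_plus", True)] * 60 + [("30_plus", False)] * 40
)

RATES = approval_rates(HISTORY)
RATIOS = impact_ratios(RATES, reference="30_plus")
CUTOFF = 0.8  # rule-of-thumb assumption, see the note above
FLAGGED = [group for group, ratio in RATIOS.items() if ratio < CUTOFF]

for group, ratio in RATIOS.items():
    print(f"{group:10s} approval rate {RATES[group]:.2f}, ratio {ratio:.2f}")
print("groups to investigate:", FLAGGED)
```

A model trained on this history would be likely to reproduce the gap, even if age were removed as an input, whenever other features act as proxies for it.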
Will your model continue to perform as conditions change, for example as customer behaviors evolve? Many business conditions are a moving target, and your models need to continuously learn and improve.
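A common way to notice that conditions have changed is to compare the distribution of a score or input at training time with what the model sees today. The sketch below computes a population stability index (PSI) over fixed buckets. The bucket edges, the synthetic data, and the thresholds in the closing comment are illustrative assumptions; the usual cutoffs are industry rules of thumb, not standards.

```python
import math

def bucket_shares(values, edges, floor=1e-4):
    """Fraction of values in each bucket defined by sorted `edges`.
    Shares are floored so an empty bucket does not produce log(0)."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        index = sum(value > edge for edge in edges)  # edges exceeded = bucket
        counts[index] += 1
    return [max(count / len(values), floor) for count in counts]

def psi(expected, actual, edges):
    """Population stability index: sum over buckets of
    (actual share - expected share) * ln(actual share / expected share)."""
    expected_shares = bucket_shares(expected, edges)
    actual_shares = bucket_shares(actual, edges)
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected_shares, actual_shares))

EDGES = [0.25, 0.50, 0.75]
# Synthetic scores: evenly spread at training time...
TRAINING = [i / 100 for i in range(100)]
# ...and skewed toward higher values in recent production data.
RECENT = [math.sqrt(score) for score in TRAINING]

print(f"PSI, training vs. itself: {psi(TRAINING, TRAINING, EDGES):.3f}")
print(f"PSI, training vs. recent: {psi(TRAINING, RECENT, EDGES):.3f}")
# A frequently quoted reading: below 0.1 stable, 0.1 to 0.25 worth watching,
# above 0.25 a shift large enough to consider retraining.
```

A high PSI says the population has moved, not that the model is wrong. It is a trigger for checking accuracy on recent outcomes and, if needed, retraining.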
Automated Machine Learning Can Help Your Bank Overcome These Challenges
Automated machine learning (invented by DataRobot) solves many of these challenges and makes the others more manageable. With these advances, it is now possible to get substantially more productivity from your team of overworked data scientists, and to bring solutions to market faster.
Automated machine learning:
Finds the best model for your particular situation through competitive elimination from an extensive resource library of common models—cutting hundreds or even thousands of hours off the time required to find the best model for your situation
Ranks the top performing models using any one of several different metrics available so you can evaluate and select from among the models best suited to your particular problem
Provides transparency into each model’s use of data, telling you not just which data is most important, but when to deploy it
Explains individual predictions, down to specific data features and their values
Provides diagnostics for understanding each model’s accuracy using a variety of standard metrics
Provides tools for understanding and making tradeoff decisions (e.g., between speed and accuracy, positive versus negative predictive value, when and where additional models may help)
Automatically creates most of the documentation required for model validation and model risk management (that data scientists, almost universally, dislike spending time on)
Reduces the cost, difficulty, and risk of deploying models into your production environment by providing minimally invasive deployment options such as code generation, API deployment, and deployment to Hadoop
Makes it easier to monitor model performance and detect drift or performance degradation over time, alerting modelers to the need for retraining or creation of challenger models
Makes retraining models on new data and redeploying models into production simple, fast, and low risk
Want to learn more about how your bank can get high value from AI and machine learning? Read more in our white paper, Intelligence Briefing: How Banks Are Winning with AI and Machine Learning.
About the Author:
H.P. Bunaes is the GM of Financial Services at DataRobot and in this role has helped banks and fintechs, large and small, all over the world leverage AI and machine learning for predictive analytics and data mining. H.P. has over 30 years of experience in banking and held a variety of leadership positions at SunTrust and, before that, at FleetBoston. H.P. is a graduate of Trinity College and earned his advanced degree at the Massachusetts Institute of Technology.