Winning with accurate and actionable information in banking

March 9, 2016

The Situation

In today’s information-driven economy, nowhere is clean, connected and trustworthy information more vital than in banking, where information is the lifeblood of the business. In banking, accurate, timely and actionable information is the difference between market leaders and also-rans. The banking business model is based on the concept of leverage. Banks raise capital through deposits, borrowings and the sale of financial instruments, then invest that capital by making loans and mortgages. Bank earnings are driven by the spread between interest earned on loans and interest paid out on deposits and borrowings. This is a simplification, but it is the crux of how the banking business model works.

The Opportunity

The wrinkle in the banking model is the need for reserve capital. For everyone to have trust and confidence in the system, and for the model to work, banks must maintain reserve capital to meet their financial obligations in case depositors or creditors come looking for their money. The actual level of reserve capital is determined by government regulations, but for current purposes it’s fair to assume that for every dollar banks raise they are required to keep up to 10 cents in reserves. Imagine what happens if the banks have dirty data, such as multiple instances of a single depositor or a single liability: the reserve capital requirements go up at the expense of deployed capital. Banks are left with idle interest-bearing deposits or borrowings and lower deployed capital, resulting in increased cost, reduced revenue and lower margins. So how do banks address these data quality issues? Simple: have IT use its classic, rule-based data quality tools to clean up the data, remove duplicates and standardize, and everyone’s happy. Not so fast!
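To put rough numbers on that reserve effect, here is a minimal sketch. It assumes the flat 10 percent reserve ratio used above for illustration (real requirements vary by regulation), and the depositor names and amounts are invented.

```python
from decimal import Decimal

# Assumption: the "up to 10 cents per dollar" figure from the text, applied as a flat ratio.
RESERVE_RATIO = Decimal("0.10")


def required_reserve(deposits):
    """Return the reserve capital required for a list of (depositor, amount) records."""
    total = sum((amount for _, amount in deposits), Decimal("0"))
    return total * RESERVE_RATIO


# Hypothetical records: the second row is the same depositor as the first,
# entered a second time under a different spelling.
dirty = [
    ("John A. Smith", Decimal("100000")),
    ("Smith, John A", Decimal("100000")),
    ("Maria Lopez", Decimal("50000")),
]
clean = [dirty[0], dirty[2]]

# Reserve capital tied up only because of the duplicate record.
excess = required_reserve(dirty) - required_reserve(clean)
print(f"Excess reserve caused by the duplicate: ${excess:,.2f}")
```

In this toy case the duplicated $100,000 deposit ties up an extra $10,000 that could otherwise have been deployed as loans.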

The case for business analyst-friendly, smart, enterprise-grade data preparation

Look at the sample of depositor information on the right. The classic, rule-based data quality approach, using traditional tools to clean, deduplicate and standardize data, won’t work in this case: even after customer names are cleaned, different addresses or phone numbers might cause these records to be flagged as unique, leaving duplicates in the data. The massive volume, variety and velocity of human- and machine-generated data, and the ever-changing nature of data quality issues, further complicate things. Classic, rule-based data quality systems simply can’t keep pace with the data deluge and its unknowns. Enter a business analyst-friendly, smart, self-service data preparation platform like Paxata. Here’s how a modern data prep solution like Paxata addresses these challenges:

Business analyst-friendly, self-service solution: Paxata provides an Excel-like interface for nontechnical business users to interactively and visually clean, combine and transform data without writing code, sampling data or building schemas. A banking analyst who has the right business context and works interactively with the depositor information can quickly spot the issues and, in a few clicks, automatically clean, deduplicate and standardize the data. Paxata balances the business’s need for information with IT’s need for governance, so the business is empowered with self-service information creation and consumption within the guardrails of governance provided by IT.

Smart: With the massive volume, variety and velocity of data confronting banks, business analyst-friendliness isn’t enough. Analysts need an intelligent solution that can automatically spot and address known and unknown data quality issues at scale across multiple business units, geographic locations, customer segments and product categories. Enter Paxata’s smart data preparation solution, which leverages machine learning, natural language processing (NLP), semantic analysis, in-memory processing and distributed processing technologies based on commodity hardware, such as Spark and Hadoop, to clean, combine, shape, enrich and transform data and address known and unknown data quality issues. Paxata’s cluster and edit functionality uses NLP algorithms such as n-gram, fingerprinting and metaphone to automatically discover clusters of similar entity values and recommend combining them into a single golden record. In contrast to the classic rule-based approach, Paxata is continuously learning and evolving to automatically address known and unknown data quality issues.

Enterprise-grade: Finally, as issues are uncovered and fixed, every change should be captured and tracked to address information security and governance requirements and to give bankers the accurate and trustworthy information they need to confidently run the business. Paxata provides auditing, lineage, versioning, recording and reordering capabilities to track, undo and redo changes, operationalize the work, and provide context on the steps taken to prepare data.
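To give a feel for the fingerprinting and n-gram techniques mentioned under "Smart", here is a minimal sketch of key-collision clustering, the general approach popularized by open-source tools such as OpenRefine. It is not Paxata's implementation. The function names and the sample depositor names are assumptions made for illustration, and metaphone (a phonetic algorithm) is left out to keep the example short.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Key-collision fingerprint: fold to ASCII, lowercase, drop punctuation, sort unique tokens."""
    text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s]", "", text.strip().lower())
    return " ".join(sorted(set(text.split())))


def ngram_fingerprint(value, n=2):
    """N-gram fingerprint: sorted unique character n-grams of the letters and digits only."""
    text = re.sub(r"[^a-z0-9]", "", value.lower())
    grams = {text[i:i + n] for i in range(len(text) - n + 1)}
    return "".join(sorted(grams))


def cluster(values, key=fingerprint):
    """Group values whose keys collide; return only groups with more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[key(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical depositor names with the kinds of variation a rule-based exact match misses.
names = ["John A. Smith", "Smith, John A", "john a smith", "Maria Lopez", "Maria Lopze"]

# Word order, case and punctuation differences collapse to one key.
print(cluster(names))

# 1-gram fingerprints additionally catch transposed letters such as "Lopez" / "Lopze".
print(cluster(names, key=lambda v: ngram_fingerprint(v, 1)))
```

Each cluster is only a candidate for merging into a golden record. Looser keys such as the 1-gram fingerprint catch more typos but also produce more false matches, which is why the text describes these as recommendations for the analyst to confirm.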

In conclusion, accurate and actionable information is paramount to success in banking and Paxata is the fastest path to turn raw data into reliable information. You can learn more about Paxata here.

About the author
DataRobot

Enabling the AI-Driven Enterprise

The leader in enterprise AI, delivering trusted AI technology and enablement services to global enterprises competing in today’s Intelligence Revolution. Its enterprise AI platform maximizes business value by delivering AI at scale and continuously optimizing performance over time.
