
Data Preparation Fueling Your Enterprise Data Fabric

May 15, 2019

Over the past couple of months, I’ve had the opportunity to speak with numerous Chief Data Officers (CDOs), IT leaders, and business leaders about approaches to accelerating their data initiatives. Everyone seems to agree on the need for better data and the necessity of getting it into the hands of stakeholders sooner. It is also becoming increasingly clear that traditional approaches to data preparation via IT-led data integration, master data management, and data quality are no longer sufficient.


Traditional Approaches Will Not Scale

It was telling that a CDO from a global bank attending a recent Evanta CDO Inner Circle Dinner in New York suggested that “we have to completely rearchitect” to meet the business demands for relevant and fresh data at the speed of business.

  • Asking IT is Dead: For more than 15 years, the standard way for the business to obtain data for a new report was to ask IT. This typically led to a back-and-forth exchange between the business requestor and the developer trying to match the requirements with a data extract; often, the business side would receive the desired data several weeks later. This method may have been satisfactory in a pre-internet world, but it fails miserably in today’s fast-moving business environment.
  • Self-Service BI Failed: In recent years, IT’s response to this bottleneck was essentially to toss data over the fence to the business side, who would then put it into Tableau, Qlik, Excel, or another visualization tool. Two unfortunate results followed:
    • The 80% effort that IT developers spent on data preparation, shaping, and cleaning shifted over to business analysts, using tools that were never designed for these tasks.
    • Everything done by business analysts became siloed data artifacts that could neither be shared with other users nor governed, leading to chaos.

Data Preparation Emerges as a Clear Category

Over the past 5+ years, a number of vendors delivered a new breed of data management software that focused on the task of data preparation. While data always went through some form of preparation, the difference now is that modern data preparation tools are designed specifically for business users. Empowering business users with self-service data preparation solves both problems stated above: 1) there is no longer a need to ask IT to develop refined data sets, and 2) business users now have a purpose-built tool to profile, shape, clean, and publish the data to their BI or visualization tool of choice. 


Data Preparation Beyond Analytics

As the bank CDO’s comment at the Evanta event suggests, a new architectural vision is emerging. Gartner’s recent Market Guide for Data Preparation Tools report suggests that “by 2022, data preparation will become a critical capability in more than 60% of data integration, analytics/BI, data science, data engineering and data lake enablement platforms.” In the May 2019 Forrester report titled Big Data Fabric 2.0 Drives Data Democratization, Principal Analyst Noel Yuhanna writes that “businesses are reporting that integrating data from silos to support real-time insights has become a nightmare,” and that “without a big data fabric strategy, organizations will likely spend more time and effort ingesting, integrating, curating, and securing data insights.”

A data preparation platform that is part of your fabric can power more than just ad hoc analytical requests. At the Gartner Data and Analytics Symposium in Orlando, Cox Automotive, Nationwide Insurance, and AdhereHealth joined me in a session to discuss their use of the Data Prep self-service solution. Read my event blog post here.

With Data Prep, Nationwide Insurance now has more than 20 different projects or initiatives, ranging from financial reporting to creating customer masters. Cox Automotive is using it to power its data quality program. And AdhereHealth is onboarding medical and health-related data from a broad range of external sources to drive personalized medication adherence insights.


Closing Thoughts

You can read more Data Prep case studies on our website. Vanguard organizations are indeed rearchitecting to gain the ability to manage and leverage their data as an asset. By making an enterprise-grade data preparation platform part of their data fabric, forward-thinking organizations are now able to accelerate their analytics and data science initiatives, and the same data preparation can be used to accelerate your application consolidation and migration initiatives.

About the author

Value-Driven AI

DataRobot is the leader in Value-Driven AI – a unique and collaborative approach to AI that combines our open AI platform, deep AI expertise and broad use-case implementation to improve how customers run, grow and optimize their business. The DataRobot AI Platform is the only complete AI lifecycle platform that interoperates with your existing investments in data, applications and business processes, and can be deployed on-prem or in any cloud environment. DataRobot and our partners have a decade of world-class AI expertise collaborating with AI teams (data scientists, business and IT), removing common blockers and developing best practices to successfully navigate projects that result in faster time to value, increased revenue and reduced costs. DataRobot customers include 40% of the Fortune 50, 8 of the top 10 US banks, 7 of the top 10 pharmaceutical companies, 7 of the top 10 telcos, and 5 of the top 10 global manufacturers.
