
Rapid Data Profiling Delivering Self-Service Data Trust

April 10, 2018

According to a recent study, “The State of Data Quality in the Enterprise, 2018”, companies face two major obstacles in their quest to trust their data: the significant variety of data sources and a complex mix of data types. As a result, data preparation activities are difficult, and the analytical initiatives that depend on them are slowed down. Register for our webcast, “Rapid Data Profiling to Streamline Customer Data Onboarding”, where Mike White and Krupa Natarajan from Paxata will walk you through the elements of Rapid Data Profiling and share some customer examples.

The Case For Self Service Data Preparation

It is commonly proclaimed by industry observers and business leaders that data is the fuel of the digital economy. The challenge, of course, is that everyone has data, so what will make your business stand out in the data-fueled race? For most, faster and smarter use of that data can provide a major leg up. But while the world’s data repositories are expected to grow to 45,000 exabytes by 2020, and BI tools have become more pervasive than ever before, the path between that data and the BI tool still runs through a proverbial straw.

IT cannot be the only place where data is prepared for analytics. Enter self-service data preparation: data analysts and power users are empowered to find, understand, and shape data from a variety of sources, and then publish the resulting dataset into their BI tool of choice. The resulting gains in productivity and accuracy substantially improve analytical outcomes. Of course, the quality of those outcomes depends on the quality of the data itself.

Researchers, for example, have traditionally used Excel or Perl scripts to bring these different data sources together, flatten XML structures, parse delimited files, and compare the profile of a given patient to a cohort of other patients, a manual and time-consuming process. Moreover, most researchers lack the data science skills that traditional data prep and ETL solutions require.

Interactive Rapid Data Profiling

The diagram above outlines the various stages of the data prep lifecycle. The analyst's first question is: what data is available? Maybe it is customer data and purchase transactions. Once you have located the available data, you need to understand what is in it and what it looks like. Rapid data profiling lets you quickly and visually understand the shape of the data. It helps you spot possible data quality issues, such as duplicates and misspellings of states, countries, products, and companies, and then guides you in fixing those problems interactively.
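To make the profiling step more concrete, here is a minimal Python sketch of the kind of summary a profile can compute, using only the standard library. It is not Paxata's implementation: the function names (`profile_column`, `find_duplicate_rows`, `suggest_spelling_fixes`), the 0.8 similarity threshold, and the sample customer rows are hypothetical choices for illustration. Misspellings are flagged with a simple heuristic: a value that closely resembles a more frequent value is probably a variant of it.

```python
from collections import Counter
from difflib import SequenceMatcher


def profile_column(values):
    """Summarize one column: row count, missing values, distinct values, most common values."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "top": counts.most_common(3),
    }


def find_duplicate_rows(rows):
    """Return each row (a dict of column -> value) that occurs more than once."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return [dict(key) for key, n in counts.items() if n > 1]


def suggest_spelling_fixes(values, threshold=0.8):
    """Map rare variants to a more frequent, similar-looking value.

    Hypothetical heuristic: a value is a likely misspelling if its
    case-insensitive difflib similarity ratio to a strictly more frequent
    value is at least `threshold`.
    """
    counts = Counter(v for v in values if v)
    fixes = {}
    for rare, rare_n in counts.items():
        best, best_ratio = None, threshold
        for common, common_n in counts.items():
            if common_n <= rare_n:
                continue  # only map toward values that occur more often
            ratio = SequenceMatcher(None, rare.lower(), common.lower()).ratio()
            if ratio >= best_ratio:
                best, best_ratio = common, ratio
        if best is not None:
            fixes[rare] = best
    return fixes


if __name__ == "__main__":
    # Hypothetical sample data.
    customers = [
        {"customer": "Acme", "state": "California"},
        {"customer": "Acme", "state": "California"},
        {"customer": "Globex", "state": "Califronia"},
        {"customer": "Initech", "state": "Texas"},
        {"customer": "Umbrella", "state": ""},
    ]
    states = [row["state"] for row in customers]
    print(profile_column(states))
    print(find_duplicate_rows(customers))  # [{'customer': 'Acme', 'state': 'California'}]
    print(suggest_spelling_fixes(states))  # {'Califronia': 'California'}
```

A string-similarity ratio is only one way to detect likely misspellings; it keeps the sketch dependency-free, but a real tool would typically combine it with reference lists (for example, official state or country names) and human review before applying any fix.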

Paxata Rapid Data Profiling In Action

During the webcast on Wednesday, April 25, at 11:00 am PDT (2:00 pm EDT), we will introduce rapid data profiling, share examples of customer success, and demonstrate a modern, point-and-click approach to streamlining your data profiling and remediation efforts:

See scenarios and customer use cases where rapid data profiling changed the game

Generate a profile report with a single click and obtain summary insights, including remediation guidance

Leverage built-in views and algorithmic intelligence to detect data quality issues and remediate them across the entire dataset in real time

Generate data quality business rules using Excel-like syntax to discover and fix exceptions

Publish and automate a clean and conformed output for a data source or analytics solution
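The business-rule item in the agenda above can be illustrated in a generic way. Paxata's rules use an Excel-like syntax that is not reproduced here; this minimal sketch instead expresses two hypothetical rules (`customer_not_blank`, `country_is_iso_alpha2`) as plain Python predicates, lists every row that violates a rule, and applies a value mapping to fix the exceptions. The rule names, columns, and sample rows are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


# Hypothetical business rules; each returns True when a row passes.
def customer_not_blank(row: Row) -> bool:
    return bool(row.get("customer", "").strip())


def country_is_iso_alpha2(row: Row) -> bool:
    # Checks only the shape of an ISO 3166-1 alpha-2 code (two uppercase
    # letters), not membership in the official list of codes.
    code = row.get("country", "")
    return len(code) == 2 and code.isalpha() and code.isupper()


RULES: Dict[str, Callable[[Row], bool]] = {
    "customer_not_blank": customer_not_blank,
    "country_is_iso_alpha2": country_is_iso_alpha2,
}


def find_exceptions(rows: List[Row], rules: Dict[str, Callable[[Row], bool]]) -> List[Tuple[int, str]]:
    """Return (row_index, rule_name) for every rule that a row fails."""
    return [
        (i, name)
        for i, row in enumerate(rows)
        for name, rule in rules.items()
        if not rule(row)
    ]


def apply_fix(rows: List[Row], column: str, mapping: Dict[str, str]) -> List[Row]:
    """Return copies of rows with values in `column` replaced according to `mapping`."""
    fixed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        if column in new_row:
            new_row[column] = mapping.get(new_row[column], new_row[column])
        fixed.append(new_row)
    return fixed


if __name__ == "__main__":
    # Hypothetical sample data.
    rows = [
        {"customer": "Acme", "country": "US"},
        {"customer": "", "country": "US"},
        {"customer": "Globex", "country": "USA"},
    ]
    print(find_exceptions(rows, RULES))
    # [(1, 'customer_not_blank'), (2, 'country_is_iso_alpha2')]
    cleaned = apply_fix(rows, "country", {"USA": "US"})
    print(find_exceptions(cleaned, RULES))
    # [(1, 'customer_not_blank')]
```

Keeping rules as named, independent predicates means each exception can be reported against the specific rule it broke, and fixes can be re-checked by simply running the same rules again on the cleaned output.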

Parting Thoughts

We are living in the middle of a data arms race, and it is clear that the world of data is never going to be homogeneous. Data lives everywhere: on-premises and in multiple clouds. And that variety will keep growing. The key to transformative insights is organizational agility: how quickly an organization can move from ideas, to pilots, to operationalized insights.

About the author

Value-Driven AI

DataRobot is the leader in Value-Driven AI, a unique and collaborative approach to AI that combines our open AI platform, deep AI expertise, and broad use-case implementation to improve how customers run, grow, and optimize their business. The DataRobot AI Platform is the only complete AI lifecycle platform that interoperates with your existing investments in data, applications, and business processes, and it can be deployed on-prem or in any cloud environment. DataRobot and our partners have a decade of world-class AI expertise collaborating with AI teams (data scientists, business, and IT), removing common blockers and developing best practices to successfully navigate projects that result in faster time to value, increased revenue, and reduced costs. DataRobot customers include 40% of the Fortune 50, 8 of the top 10 US banks, 7 of the top 10 pharmaceutical companies, 7 of the top 10 telcos, and 5 of the top 10 global manufacturers.
