
Rapid Data Profiling Delivering Self-Service Data Trust

April 10, 2018

According to a recent study, “The State of Data Quality in the Enterprise, 2018,” companies face two major obstacles in their quest to trust their data: the sheer variety of data sources and a complex mix of data types. As a result, data preparation is difficult, and the analytical initiatives that depend on it slow down. Register for our webcast, “Rapid Data Profiling to Streamline Customer Data Onboarding,” where Mike White and Krupa Natarajan from Paxata will walk you through the elements of Rapid Data Profiling and share some customer examples.

The Case for Self-Service Data Preparation

Industry observers and business leaders commonly proclaim that data is the fuel of the digital economy. The challenge, of course, is that everyone has data, so what will make your business stand out in the data-fueled race? For most, it is the faster and smarter use of that data that gives them a major leg up. But while the world’s data repositories are expected to explode to 45,000 exabytes by 2020, and BI tools have become more pervasive than ever, the path from that data to the BI tool still runs through a proverbial straw.

IT cannot be the only place where data is prepared for analytics. Enter self-service data preparation: data analysts and power users are empowered to find, understand, and shape data from a variety of sources, and then publish the resulting dataset to their BI tool of choice. The resulting gains in productivity and accuracy substantially improve analytical outcomes. Of course, the quality of those outcomes depends on the quality of the data itself.

Researchers, for example, have traditionally used Excel or Perl scripts to bring these different data sources together, flatten XML structures, parse delimited files, and compare the profile of a given patient to a cohort of other patients, which is a manual and time-consuming process. Additionally, most researchers lack the data science skills that traditional data prep and ETL solutions require.
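To make concrete what that hand-written scripting involves, here is a minimal Python sketch of the flatten, parse, and combine steps. It is our own illustration, not any particular team's workflow: the sample XML feed, the pipe-delimited lab file, and the helper names `flatten`, `parse_delimited`, and `join_on` are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample inputs: a nested XML patient feed and a pipe-delimited lab file.
PATIENTS_XML = """<patients>
  <patient id="p1"><name>Ana</name><address><state>CA</state></address></patient>
  <patient id="p2"><name>Ben</name><address><state>NY</state></address></patient>
</patients>"""

LABS_PSV = """patient_id|test|value
p1|glucose|98
p2|glucose|130
p1|ldl|110
"""


def flatten(elem, prefix=""):
    """Flatten an XML element into a dict keyed by dotted paths (attributes get '@')."""
    row = {f"{prefix}@{key}": value for key, value in elem.attrib.items()}
    for child in elem:
        path = f"{prefix}{child.tag}"
        if len(child):  # nested element: recurse with a longer prefix
            row.update(flatten(child, path + "."))
        else:  # leaf element: keep its text
            row[path] = (child.text or "").strip()
    return row


def parse_delimited(text, delimiter="|"):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def join_on(left, right, left_key, right_key):
    """Inner join: attach each right-hand row to the left-hand row it references."""
    index = {row[left_key]: row for row in left}
    return [{**index[r[right_key]], **r} for r in right if r[right_key] in index]


patients = [flatten(p) for p in ET.fromstring(PATIENTS_XML).findall("patient")]
labs = parse_delimited(LABS_PSV)
combined = join_on(patients, labs, "@id", "patient_id")

for row in combined:
    print(row)
```

Running it prints three combined rows, one per lab result. Even this toy version needs a custom flattening rule and a join key chosen by hand, which is the kind of work that becomes brittle as sources multiply.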

Interactive Rapid Data Profiling

In our diagram above, we outline the various stages of the data prep lifecycle. The analyst’s first question is: what data is available? Maybe it is customer data and purchase transactions. Once you have located the available data, you need to understand what is in it and what it looks like. Rapid data profiling lets you quickly and visually understand the shape of the data. It surfaces possible data quality issues, such as duplicates and misspelled states, countries, products, or company names, and then guides you through fixing those problems interactively.
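As a rough illustration of what a profile computes, the Python sketch below summarizes each column, finds exact duplicate rows, and suggests corrections for misspelled values. It is not Paxata’s algorithm: the sample customer rows, the reference list of states, the 0.8 similarity cutoff, and the helper names are assumptions chosen for the example, and spelling suggestions use the standard library’s `difflib.get_close_matches`.

```python
from collections import Counter
from difflib import get_close_matches


def profile(rows):
    """Per-column summary: missing values, distinct values, most common values."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "top": counts.most_common(3),
        }
    return report


def duplicate_rows(rows):
    """Rows that occur more than once, compared on every field."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return [dict(key) for key, n in counts.items() if n > 1]


def suggest_spellings(values, reference, cutoff=0.8):
    """Map values missing from a reference list to their closest reference spelling."""
    suggestions = {}
    for value in sorted(set(values) - set(reference)):
        match = get_close_matches(value, reference, n=1, cutoff=cutoff)
        if match:
            suggestions[value] = match[0]
    return suggestions


# Hypothetical sample data.
customers = [
    {"name": "Acme", "state": "California"},
    {"name": "Globex", "state": "Califronia"},
    {"name": "Acme", "state": "California"},
    {"name": "Initech", "state": ""},
    {"name": "Umbrella", "state": "Nevada"},
]
STATES = ["California", "Nevada", "New York", "Texas"]

report = profile(customers)
print(report["state"])
print(duplicate_rows(customers))
print(suggest_spellings([c["state"] for c in customers if c["state"]], STATES))
```

Here the profile flags one missing state, the duplicated Acme row, and suggests “California” for “Califronia”. An interactive tool would present these same findings visually and let the analyst accept or reject each fix.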

Paxata Rapid Data Profiling In Action

During the webcast on Wednesday, April 25 at 11:00 am PDT (2:00 pm EDT), we will introduce rapid data profiling, share examples of customer success, and demonstrate a modern, point-and-click approach to streamlining your data profiling and remediation efforts:

See scenarios and customer use cases where rapid data profiling changed the game

Generate a profile report with a single click and obtain summary insights and remediation guidance

Leverage built-in views and algorithmic intelligence to detect data quality issues and remediate them across the entire dataset in real time

Generate data quality business rules using Excel-like syntax to discover and fix exceptions

Publish and automate a clean and conformed output for a data source or analytics solution
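To sketch what “business rules to discover and fix exceptions” can mean in practice, the Python example below models each rule as a check plus an optional fix, applies the rules across a dataset, and logs the rows that still fail. It does not reproduce Paxata’s Excel-like rule syntax; the `Rule` class, the three sample rules, and the sample rows are hypothetical, and the Excel formulas in the comments are only analogies.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict


@dataclass
class Rule:
    name: str
    check: Callable[[Row], bool]                # True when the row passes
    fix: Optional[Callable[[Row], Row]] = None  # optional remediation


def apply_rules(rows, rules):
    """Apply each rule; try its fix on failure; log (row index, rule name) if still failing."""
    cleaned, exceptions = [], []
    for i, original in enumerate(rows):
        row = dict(original)  # never mutate the input rows
        for rule in rules:
            if rule.check(row):
                continue
            if rule.fix is not None:
                row = rule.fix(row)
                if rule.check(row):
                    continue
            exceptions.append((i, rule.name))
        cleaned.append(row)
    return cleaned, exceptions


ZIP5 = re.compile(r"\d{5}")

rules = [
    # Similar in spirit to the Excel check =LEN(TRIM(A2))>0; no automatic fix.
    Rule("name_present", lambda r: r["name"].strip() != ""),
    # Similar to =EXACT(B2, UPPER(B2)); fix by upper-casing the state code.
    Rule("state_upper", lambda r: r["state"] == r["state"].upper(),
         lambda r: {**r, "state": r["state"].upper()}),
    # Five-digit ZIP; restore leading zeros that spreadsheets often drop.
    Rule("zip_5_digits", lambda r: ZIP5.fullmatch(r["zip"]) is not None,
         lambda r: {**r, "zip": r["zip"].zfill(5)}),
]

# Hypothetical sample data.
rows = [
    {"name": "Acme", "state": "ca", "zip": "94105"},
    {"name": "Globex", "state": "MA", "zip": "2134"},
    {"name": " ", "state": "NY", "zip": "10001"},
    {"name": "Initech", "state": "TX", "zip": "78-70"},
]

cleaned, exceptions = apply_rules(rows, rules)
print(cleaned)
print(exceptions)
```

Fixable problems (a lowercase state, a ZIP code missing its leading zero) are corrected in place, while rows that no rule can repair (a blank name, a malformed ZIP) are returned as exceptions for a person to review.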

Parting Thoughts

We are living in the middle of a data arms race, and it is clear the world is never going to be homogeneous. Data lives everywhere: on premises and across multiple clouds, and its variety will keep growing. Transformative insight depends on how quickly an organization can move from ideas, to pilots, to operationalized insights.

About the author

Enabling the AI-Driven Enterprise

The leader in enterprise AI, delivering trusted AI technology and enablement services to global enterprises competing in today’s Intelligence Revolution. Its enterprise AI platform maximizes business value by delivering AI at scale and continuously optimizing performance over time.

Meet DataRobot