
Why You Should Adopt Zero-Code Data Preparation Now

January 17, 2019 · 3 min read

Arguably, data preparation has existed as long as data digitization has – in the form of data integration, ETL (extract, transform, load), data quality, and master data management, among others. Interestingly, the emergence of self-service data preparation applications has revived old debates about the value of existing versus modern approaches. The biggest difference between the two can be summarized as follows: older tools are developer-centric and require coding or programming proficiency, while modern tools offer zero- or low-code, visual, point-and-click experiences aimed at business users.

The 80/20 Data Preparation Principle Is Still Prevalent

For the past 20 years, we have heard that 80% of analytical effort is spent gathering and preparing data, while only 20% is actually spent generating insights. That ratio was challenging enough when most analytics repeatedly answered a few known questions; today's business environment demands answers to many more questions and often requires multiple exploratory iterations. Data science, machine learning, and artificial intelligence projects add further complexity. If your business is hinging its future on becoming data-driven, can you really afford to spend 80% of your effort on data preparation, repeated across every data project?

Modern Data Preparation Tools

Modern data preparation tools such as Paxata Self-Service Data Preparation bring two critical elements to the forefront:

  • Integration of previously disparate tools such as ETL, data quality, and MDM (master data management) into a single toolset supported by a rich, cloud-based platform.
  • User experience purpose-built for business users and analysts with a visual, Excel-like interface that allows users to find, profile, clean, enrich, join, and publish data with point-and-click actions — without requiring coding.

But I Know Python, R, Informatica PowerCenter, SQL (Name Your Technology)

While these technologies are obviously powerful and will remain in your stack, the key question is: what is the best use of each one? If Python is where you want to run your data science models, then keep it for running the models. But should you seriously be coding a "Find and Replace" in Python or R just to standardize all US STATE entries to full state names (e.g., California) rather than abbreviations (e.g., CA)? A sketch of what that code looks like follows.
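For context, here is roughly what that code looks like. This is a minimal pandas sketch with a hypothetical column name and a deliberately abbreviated mapping; it is not taken from any particular pipeline:

```python
# Minimal sketch: standardizing US state abbreviations to full names.
# The DataFrame, column name, and mapping below are illustrative only.
import pandas as pd

df = pd.DataFrame({"state": ["CA", "California", "NY", "TX", "New York"]})

# Abbreviation-to-full-name lookup (truncated to three states for brevity).
ABBREV_TO_FULL = {"CA": "California", "NY": "New York", "TX": "Texas"}

# Values found in the lookup are replaced; full names pass through unchanged.
df["state"] = df["state"].replace(ABBREV_TO_FULL)

print(df["state"].tolist())
# ['California', 'California', 'New York', 'Texas', 'New York']
```

Trivial for a developer, yes, but it is exactly the kind of step a business analyst could complete in seconds with a point-and-click replace in a tool like Paxata.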

Asking IT Is No Longer an Option

Second, do you really want to keep your data-to-insights engagement model locked in a mode where your business team asks IT for a dataset, IT develops it using their toolset, and then passes its interpretation of the request back to the user? This approach puts an incredible burden on very scarce resources (IT developers and data scientists) and often requires multiple iterations before the desired dataset is produced.

The Benefits of Adopting Zero-Code Over Traditional Developer Code-centric Environments

  • Empowers business users, who have the context and understanding of the data, to prepare the data themselves.
  • Delivers substantial productivity gains over coding approaches in original development, reuse, and maintainability.
  • Enables collaborative, emergent data governance, as every action performed on the data is recorded in a clear audit trail marking exactly where and when the data was used.
  • Improves IT productivity, as IT can focus on large production data pipelines instead of iterating back and forth on exploratory requests.
  • Increases business decision velocity, which should translate into better business outcomes.

Closing thoughts

I recently spoke to a friend in product marketing, who excitedly told me about spending hours in Python extracting data from Marketo and Twitter, coding the joins, removing duplicates, and matching customer records across datasets. My question is this: Is this truly the best use of a product marketer's time? A toy version of that work appears below.
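To make the anecdote concrete, here is a toy version of that kind of hand-coded wrangling in pandas. The sources, column names, and matching rule (a lower-cased email address) are all hypothetical:

```python
# Illustrative sketch of a hand-coded join, dedupe, and record match.
# The data, columns, and match key are invented for this example.
import pandas as pd

marketo = pd.DataFrame({
    "email": ["ana@acme.com", "Ana@Acme.com", "bo@initech.io"],
    "campaign": ["webinar", "webinar", "ebook"],
})
twitter = pd.DataFrame({
    "email": ["ANA@ACME.COM", "bo@initech.io"],
    "handle": ["@ana", "@bo"],
})

# Normalize the match key so records from both sources line up.
for df in (marketo, twitter):
    df["email"] = df["email"].str.lower()

# Remove duplicate customer records, then join the two sources.
marketo = marketo.drop_duplicates(subset="email")
combined = marketo.merge(twitter, on="email", how="left")

print(combined)
```

Add real-world schema drift, fuzzy name matching, and API extraction on top of this toy case, and "hours in Python" becomes entirely believable, which is precisely the time a self-service tool aims to give back.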

In a recent Paxata webcast, Forrester principal analyst Noel Yuhanna spoke of the emergence of the big data fabric and the need to modernize data architecture around it. Noel posited that embracing zero-code data preparation is one of its key requirements.

Your business may strive to be data-driven, but if you collect petabytes of data in your data lakes while accessing it through a proverbial straw, you will never see the velocity of insights, nor realize the business value, that the data should bring to your organization.
