
Why You Should Adopt Zero-Code Data Preparation Now

January 17, 2019
· 3 min read

Arguably, data preparation has existed as long as data digitization has – in the form of data integration, ETL (extract, transform, load), data quality, and master data management, among others. Interestingly, the emergence of self-service data preparation applications coincides with the retelling of old narratives around the value of existing versus modern approaches. The biggest difference between the two can be summarized as follows: older tools are developer-centric and require coding or programming proficiency, while modern tools offer zero- or low-code, visual point-and-click experiences aimed at business users.

The 80/20 Data Preparation Principle Is Still Prevalent

For the past 20 years, we have heard that 80% of analytical effort is spent on gathering and preparing data, while only 20% is actually spent on generating insights. This was challenging enough in earlier times, when most analytics were aimed at repeatedly answering a few known questions; today's business environment demands answers to many more questions and often requires multiple explorative iterations. Data science, machine learning, and artificial intelligence projects add further complexity. If your business is hinging its future on becoming data-driven, can you really afford to spend 80% of your effort on data preparation, repeated across your data projects?

Modern Data Preparation Tools

Modern self-service data preparation tools bring two critical elements to the forefront:

  • Integration of previously disparate tools such as ETL, data quality, and MDM (master data management) into a single toolset supported by a rich, cloud-based platform.
  • User experience purpose-built for business users and analysts with a visual, Excel-like interface that allows users to find, profile, clean, enrich, join, and publish data with point-and-click actions — without requiring coding.

But I Know Python, R, Informatica PowerCenter, SQL (Name Your Technology)

While these technologies are obviously powerful and will remain in your stack, the key question is: what is the best use of each technology? If Python is where you want to run your data science models, then keep it for running the models. But should you really hand-code a “Find and Replace” in Python or R just to standardize all US STATE entries from abbreviated versions (e.g., CA) to full state names (e.g., California)?
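To make the contrast concrete, here is a minimal sketch of what even this trivial standardization chore looks like when hand-coded in Python. The column of sample values and the deliberately abbreviated lookup table are invented for illustration; a real version would need all 50 states plus handling for typos and casing.

```python
# Hypothetical "Find and Replace" chore: standardize US state entries
# from abbreviations to full names. Sample data is invented.
# A real lookup would cover all 50 states.
STATE_NAMES = {"CA": "California", "NY": "New York", "TX": "Texas"}

def standardize_state(value):
    """Return the full state name; pass through values that are already full names."""
    return STATE_NAMES.get(value, value)

records = ["CA", "California", "NY", "TX"]
cleaned = [standardize_state(v) for v in records]
print(cleaned)  # ['California', 'California', 'New York', 'Texas']
```

A zero-code tool collapses this to a point-and-click replace; in code, you also own the testing, maintenance, and re-use of the snippet across every project that needs it.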

Asking IT is No Longer an Option

Second, do you really want to keep your data-to-insights engagement model locked in a mode where your business team asks IT for a dataset, IT develops it with its own toolset, and then hands its interpretation of the request back to the user? This approach puts an incredible burden on very scarce resources (IT developers and data scientists) and often requires multiple iterations before the desired dataset is produced.

The Benefits of Adopting Zero-Code Over Traditional Developer Code-centric Environments

  • Empowers business users, who have the context and understanding of the data, to prepare the data themselves.
  • Brings exponential productivity gains over coding approaches in terms of original development, re-use, and maintainability of programs.
  • Collaborative and emergent data governance, as all actions performed on the data are recorded with clear audit trails showing exactly where and when the data was used.
  • Better IT productivity, as they can now focus on larger production data pipelines versus iterating back and forth on exploratory requests.
  • Improved business decision velocity, which hopefully results in better business outcomes.

Closing thoughts

I recently spoke to a product marketing friend, who excitedly told me about spending hours in Python extracting data from Marketo and Twitter, coding the joins, removing duplicates, and matching customer records across datasets. My question is this: is this truly the best use of a product marketer’s time?
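As a rough illustration of those hand-coded chores, here is a hedged pandas sketch. The column names, sample rows, and matching key (email) are all assumptions for illustration, not the marketer's actual pipeline.

```python
# Illustrative sketch: joining two marketing extracts, removing duplicates,
# and matching customer records across sources. All data is invented.
import pandas as pd

marketo = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "b@x.com"],   # note the duplicate row
    "campaign": ["launch", "webinar", "webinar"],
})
twitter = pd.DataFrame({
    "email": ["a@x.com", "c@x.com"],
    "handle": ["@alice", "@carol"],
})

# Remove exact duplicates, then match records across sources on email.
marketo = marketo.drop_duplicates()
matched = marketo.merge(twitter, on="email", how="outer")

print(len(matched))  # 3 rows: one matched, two unmatched on either side
```

Even this toy version glosses over the hard parts, such as fuzzy matching of names and inconsistent identifiers, which is exactly where the hours disappear.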

In a recent Data Prep webcast, Forrester principal analyst Noel Yuhanna spoke of the emerging need to modernize data architecture with a big data fabric. Noel posited that embracing zero-code data preparation is one of the key requirements.

Your business may strive to be data-driven, but if you collect petabytes of data in your data lakes while accessing it through a proverbial straw, you will never achieve the velocity of insights or realize the business value that data should bring to your organization.
