Self-Service Data Preparation Powered by Big Data Fabric: The Secret to Becoming Information-Inspired

November 6, 2018
by
4 min

Last week, industry analyst firm Forrester Research recognized Paxata as a leader in The Forrester Wave™: Data Preparation Solutions, Q4 2018. This report updated The Forrester Wave™: Data Preparation Tools, Q2 2017, in which Paxata was also recognized as a leader. The honor adds to the leadership recognition Paxata received earlier this year in The Forrester Wave™: Big Data Fabric, Q2 2018. In fact, Paxata is the only vendor recognized as a leader in both of these Wave reports. While it is always a pleasure to celebrate our leadership positions, what matters more is that these accolades bring together two critical data management concepts, big data fabric and self-service data preparation, and confirm that a modern data management architecture must include world-class self-service data preparation capabilities.

Big Data Fabric Defined

Big data fabric is a data management framework that brings your diverse, multi-cloud, and hybrid data landscape under management. It stresses the importance of working with data where it lies, and the need for end-to-end metadata management and governance. It also calls for using this same data architecture to fuel all of the data-driven use cases in your organization, such as a single customer view, operational efficiencies, analytics, data science, or new products. Forrester deems data preparation, i.e., the ability to easily find, clean, shape, and prepare data from your data landscape for any downstream use case, a critical capability for big data fabric.

Self-Service Data Preparation Defined

Adding self-service to the general description of data prep means empowering not only traditional technical experts such as developers, but also everyday business consumers of data, such as data analysts, citizen data scientists, citizen data engineers, and business analysts. Through an easy-to-use, Excel-like visual interface, users can interact with their data in real time. Intelligent algorithms embedded in the system can dynamically profile the data and recommend ways to clean or standardize it.
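To make "profile the data and recommend ways to standardize it" concrete, here is a minimal Python sketch using only the standard library. It assumes a column arrives as a plain list of strings (with `None` for missing values), and the function names and the case/whitespace heuristic are hypothetical illustrations, not Paxata's actual algorithms.

```python
from collections import Counter, defaultdict

def is_missing(value):
    """Treat None and blank strings as missing values."""
    return value is None or value.strip() == ""

def profile_column(values):
    """Compute simple profile statistics for one column of string values."""
    present = [v for v in values if not is_missing(v)]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }

def recommend_standardization(values):
    """Suggest a canonical spelling for values that differ only in case or
    surrounding whitespace. Returns {variant: suggested_replacement}."""
    groups = defaultdict(Counter)
    for v in values:
        if not is_missing(v):
            # Group variants under a normalized key (trimmed, lowercased).
            groups[v.strip().lower()][v] += 1
    suggestions = {}
    for variants in groups.values():
        if len(variants) < 2:
            continue  # only one spelling seen; nothing to standardize
        # Use the most frequent spelling, trimmed, as the canonical form.
        canonical = variants.most_common(1)[0][0].strip()
        for variant in variants:
            if variant != canonical:
                suggestions[variant] = canonical
    return suggestions

# Hypothetical sample column with inconsistent spellings and one missing value.
cities = ["New York", "new york", " New York", "Boston", None, "boston", "Boston"]
print(profile_column(cities))
print(recommend_standardization(cities))
```

A real system would use richer signals (data types, patterns, fuzzy matching), but the shape is the same: profile first, then turn the profile into suggestions the user can accept or reject.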

Self-Service Data Prep Plus Big Data Fabric Accelerates Your Data-Driven Initiatives

We have all heard that data is the new oil fueling digital transformation. While true, this is only half the story: on its own, the only thing raw data can fuel is the pockets of whoever you pay to store it. To yield real insight, raw data must be turned into information: data that has context and is clean, complete, and consumable.

The challenge is that this must be done across a highly dispersed data landscape at scale, so that every person, process, app, and device in the enterprise can become smarter and more informed.

Here are five key requirements that will determine your success:

  • Ease of use to empower business consumers at scale. To empower the entire business with information, we must first enable people to get to the data, wherever it resides. Then we must help them understand that data and prep it themselves for their needs, whether for a new BI report, an Excel model, or a data science project. Accomplishing this requires an easy, user-friendly experience that lets them prep data through intuitive, point-and-click interfaces that do not require programming skills.
  • Intelligence to ensure business users don’t get it wrong. Embedded algorithms and AI (artificial intelligence) should continually profile the data, guide users on ways to clean and standardize it, and recommend ways to join or combine datasets. Not only does this accelerate the process, it also provides “guardrails” that keep casual users from making errors, such as accidentally producing a Cartesian product when joining datasets.
  • Powered by an adaptive, elastic architecture that can scale out and contract as needed. Successful data projects require speed on one side and the ability to scale to large data volumes and compute capacity on the other. Your big data fabric and data prep architecture must deliver both. You need to be able to quickly spin up a cluster in any cloud environment, load millions of rows into your visual data prep interface, publish the results, and then tear it all down when the project is done. This affects not only your business’s agility, but also the cost of delivering insights to your organization.
  • Enterprise governance and security. While this may seem obvious, it bears repeating: siloed point solutions do not provide end-to-end data governance, data lineage, security, or any of the other building blocks of an enterprise data governance strategy.
  • Collaboration and sharing. If the entire organization is going to embrace the data-to-information journey, then it must accept that data sharing is the ultimate team sport. It requires peer-to-peer collaboration on projects and the sharing of results. It means reusing other users’ datasets rather than reinventing the wheel. It also calls for those with deep technical skills to educate and collaborate with casual users, so that the entire organization can learn, grow, and become more data-savvy.
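The join "guardrail" mentioned under "Intelligence to ensure business users don’t get it wrong" can be illustrated by checking a join before running it. The minimal Python sketch below assumes join keys arrive as plain lists; `estimate_join_rows`, `join_guardrail`, and the `max_growth` threshold are hypothetical names chosen for illustration, not Paxata's actual implementation. The underlying arithmetic is standard: an inner equi-join returns, for every key present on both sides, the left count times the right count, so a key duplicated on both sides multiplies rows.

```python
from collections import Counter

def estimate_join_rows(left_keys, right_keys):
    """Exact number of rows an inner equi-join on these keys would return:
    for every key present on both sides, left count times right count.
    Missing (None) keys are skipped, mirroring SQL, where NULL never matches."""
    left = Counter(k for k in left_keys if k is not None)
    right = Counter(k for k in right_keys if k is not None)
    return sum(n * right[k] for k, n in left.items() if k in right)

def join_guardrail(left_keys, right_keys, max_growth=2.0):
    """Return a warning message if the join would blow up, else None.

    max_growth is a hypothetical threshold: the largest acceptable ratio of
    output rows to the size of the larger input.
    """
    estimated = estimate_join_rows(left_keys, right_keys)
    baseline = max(len(left_keys), len(right_keys), 1)
    if estimated > max_growth * baseline:
        return (
            f"Join would return {estimated} rows from inputs of "
            f"{len(left_keys)} and {len(right_keys)} rows; "
            "check that the join key is unique on at least one side."
        )
    return None

# A key that is unique on one side keeps the output close to the input size.
print(join_guardrail([1, 2, 3, 3], [1, 2, 3]))  # None
# Joining on a constant key (e.g. every row has country "US") is effectively
# a Cartesian product: 1,000 x 500 = 500,000 rows.
print(join_guardrail(["US"] * 1000, ["US"] * 500))
```

Because the check only counts keys, it is cheap compared with the join itself, which is what makes it practical to run on every join a casual user attempts.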

About the author
DataRobot

Enabling the AI-Driven Enterprise

DataRobot is the leader in enterprise AI, delivering trusted AI technology and enablement services to global enterprises competing in today’s Intelligence Revolution. Its enterprise AI platform maximizes business value by delivering AI at scale and continuously optimizing performance over time.
