Top 3 Criteria for Selecting a Data Preparation Tool

November 16, 2017

Organizations looking to add modern data preparation to their analytics technology arsenal have multiple choices, ranging from self-service solutions aimed at line-of-business users to modules from legacy, IT-centric data management platforms.

Diverse use cases, varied skill levels, and unique business requirements make selecting a data preparation tool complex and confusing. Knowing the right evaluation and selection criteria goes a long way toward helping organizations clarify their goals and guiding them through the decision-making process.

In our experience, these three criteria are most commonly considered:

  1. User Interface: Some data preparation tools offer visual drag-and-drop or spreadsheet-like user interfaces. Others rely on scripting or coding to express data preparation instructions. If non-technical users will be using the tool, a spreadsheet-like interface is highly advantageous, because many business analysts already know and use Excel, and a familiar, Excel-like interface feels natural and intuitive to them. For this group of users, working directly with data and logic instead of abstractions and workflows increases confidence and accelerates iterative data discovery and preparation cycles.
  2. Governance: Data preparation tools vary widely in their approach to data governance, but because workflow is a fundamental part of data preparation, all tools offer data lineage tracking. Self-documenting data preparation tools offer especially strong data lineage capabilities: they record each data preparation step as it occurs. As the data changes, each operation that transforms, cleans, or blends data is documented automatically. For example, if a user removes whitespace from a column, that action is documented, which makes the step repeatable and lets users govern data as they discover it.
  3. Sampling Limitations for Profiling Data: When working with data that is highly standard and predictable, it is acceptable to build data preparation processes on data samples and then apply those processes to the entire data collection. However, when data is less well understood and its structure is highly complex, the probability of unexpected outcomes increases. In this case, samples may not include all of the outliers and anomalies that exist in the full data collection. When working with uncertain data, it is critical that the data preparation tool can work with the entire data set, not just a sample. This helps avoid unpleasant surprises that can arise from sampling alone.
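
To make the second and third criteria concrete, here is a minimal Python sketch. It does not reflect any particular vendor's product or API: `RecordedPipeline`, `strip_whitespace`, and `profile_non_numeric` are hypothetical names, and the data is made up. It shows a preparation step being logged as it is applied, then replayed on the full data set, where profiling surfaces anomalies that a naive "first 100 rows" sample never contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Step = Callable[[Row], Row]


@dataclass
class RecordedPipeline:
    """Hypothetical self-documenting pipeline: every applied step is logged."""
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def apply(self, rows: List[Row], description: str, fn: Step) -> List[Row]:
        """Apply fn to a copy of each row and record the step for lineage."""
        self.steps.append((description, fn))
        return [fn(dict(row)) for row in rows]

    def replay(self, rows: List[Row]) -> List[Row]:
        """Re-run every recorded step, in order, on another data set."""
        for _, fn in self.steps:
            rows = [fn(dict(row)) for row in rows]
        return rows

    def lineage(self) -> List[str]:
        """Human-readable list of the steps applied so far."""
        return [f"{i}. {desc}" for i, (desc, _) in enumerate(self.steps, start=1)]


def strip_whitespace(column: str) -> Step:
    """Build a step that trims leading and trailing whitespace in one column."""
    def step(row: Row) -> Row:
        row[column] = row[column].strip()
        return row
    return step


def profile_non_numeric(rows: List[Row], column: str) -> List[str]:
    """Return the values in a column that do not parse as a number."""
    bad = []
    for row in rows:
        try:
            float(row[column])
        except ValueError:
            bad.append(row[column])
    return bad


# Made-up data: 1,000 rows with two anomalies near the end.
full = [{"amount": f" {i}.50 "} for i in range(1000)]
full[997]["amount"] = " N/A "
full[998]["amount"] = "12,40"

sample = full[:100]  # a naive sample that happens to miss both anomalies

pipeline = RecordedPipeline()
cleaned_sample = pipeline.apply(
    sample, "strip whitespace from 'amount'", strip_whitespace("amount")
)

print(pipeline.lineage())                                    # recorded steps
print(profile_non_numeric(cleaned_sample, "amount"))         # profile of the sample
print(profile_non_numeric(pipeline.replay(full), "amount"))  # profile of the full set
```

The sample profiles as clean, while the replay over all 1,000 rows reports the two malformed values. A random sample of 100 rows would also miss both anomalies most of the time, which is the risk the third criterion describes.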

Having a good set of criteria is essential for choosing a data preparation tool that meets your needs today and grows with your organization in the future.


About the author
DataRobot

