Top 3 Criteria for Selecting a Data Preparation Tool

November 16, 2017

Organizations looking to add modern data preparation to their analytics technology arsenal have multiple choices, ranging from line-of-business, self-service solutions to modules from legacy, IT-centric data management platforms.

Diverse use cases, varied skill levels, and unique business requirements make the data preparation tool selection process complex and confusing. Knowing the correct evaluation and selection criteria would go a long way towards helping organizations clarify goals and guiding them through the decision-making process.

In our experience, these three criteria are most commonly considered:

  1. User Interface: Some data preparation tools offer visual drag-and-drop or spreadsheet-like user interfaces. Others rely on scripting or coding to express data preparation instructions. If non-technical users will be using the data preparation tool, a spreadsheet-like user interface is highly advantageous, because many business analysts already know and use Excel. The familiar, Excel-like user interface is natural and intuitive to them. For this group of users, working directly with data and logic instead of abstractions and workflows increases their confidence and accelerates iterative data discovery and preparation cycles.
  2. Governance: Data preparation tools vary widely in their approach to data governance, but because workflow is a fundamental part of data preparation, all tools offer data lineage tracking. Self-documenting data preparation tools offer especially strong data lineage capabilities. They record each data preparation step as it occurs. As the data changes, each operation that transforms, cleans, or blends data is documented automatically. For example, if a user removes whitespace from a column, that action is documented, which makes the work repeatable and lets users govern data as they discover it.
  3. Sampling Limitations for Profiling Data: When working with data that is highly standard and predictable, it is acceptable to work with data samples to build data preparation processes and then apply those processes to an entire data collection. However, when data is less well known and its structure is highly complex, the probability of unexpected outcomes increases. In this case, samples may not include all of the outliers and anomalies that exist in the full data collection. When working with uncertain data, it is critical for the data preparation tool to be able to work with the entire data set, not just a sample. This helps avoid unpleasant surprises that can arise from sampling alone.
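
To make the governance and sampling criteria more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the behavior of any particular product. The names `RecordedPrep`, `strip_whitespace`, and `non_numeric_values` are hypothetical and introduced only for this example, the data is synthetic, and the "sample" is assumed to be simply the first 1,000 rows of the collection.

```python
class RecordedPrep:
    """Hypothetical helper: applies preparation steps and records each one."""

    def __init__(self):
        self.steps = []  # (description, function) pairs in the order applied

    def apply(self, rows, description, func):
        """Run one step on the rows and document it as it happens."""
        self.steps.append((description, func))
        return func(rows)

    def lineage(self):
        """Human-readable record of every step taken so far."""
        return [f"{i}. {desc}" for i, (desc, _) in enumerate(self.steps, start=1)]

    def replay(self, rows):
        """Repeat the recorded steps, in order, on another data set."""
        for _, func in self.steps:
            rows = func(rows)
        return rows


def strip_whitespace(column):
    """Build a step that trims leading and trailing whitespace in one column."""
    def step(rows):
        return [{**row, column: row[column].strip()} for row in rows]
    return step


def non_numeric_values(rows, column):
    """Profile a column: collect the distinct values that do not parse as numbers."""
    bad = set()
    for row in rows:
        try:
            float(row[column])
        except ValueError:
            bad.add(row[column])
    return bad


# Criterion 2: the whitespace example from the text, recorded as it occurs.
prep = RecordedPrep()
customers = [{"name": "  Alice "}, {"name": "Bob  "}]
cleaned = prep.apply(customers, "Strip whitespace from 'name'", strip_whitespace("name"))
replayed = prep.replay([{"name": " Carol"}])  # same steps, new data

# Criterion 3: synthetic data with three anomalies placed deep in the collection.
orders = [{"amount": str(i)} for i in range(10_000)]
orders[5_000]["amount"] = "N/A"
orders[7_500]["amount"] = ""
orders[9_999]["amount"] = "12,5"

sample = orders[:1_000]  # assumption: only the first 1,000 rows are profiled
issues_in_sample = non_numeric_values(sample, "amount")
issues_in_full = non_numeric_values(orders, "amount")

print(prep.lineage())
print(len(issues_in_sample), len(issues_in_full))
```

In this sketch, the recorded steps double as documentation and as a repeatable process, and the sample reports no problems while the pass over the full collection surfaces all three bad values. A random sample could catch some of them by chance, but it offers no guarantee.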

Having a good set of criteria is essential for choosing a data preparation tool that meets your needs today and grows with your organization in the future.

About the author

Value-Driven AI

DataRobot is the leader in Value-Driven AI – a unique and collaborative approach to AI that combines our open AI platform, deep AI expertise and broad use-case implementation to improve how customers run, grow and optimize their business. The DataRobot AI Platform is the only complete AI lifecycle platform that interoperates with your existing investments in data, applications and business processes, and can be deployed on-prem or in any cloud environment. DataRobot and our partners have a decade of world-class AI expertise collaborating with AI teams (data scientists, business and IT), removing common blockers and developing best practices to successfully navigate projects that result in faster time to value, increased revenue and reduced costs. DataRobot customers include 40% of the Fortune 50, 8 of the top 10 US banks, 7 of the top 10 pharmaceutical companies, 7 of the top 10 telcos, and 5 of the top 10 global manufacturers.
