3 Enterprise Requirements Where Data Prep with Excel is Less Than Stellar Background

3 Enterprise Requirements Where Data Prep with Excel is Less Than Stellar

October 10, 2017
· 2 min read

Excel has long been the tool for business analysts to perform lightweight data preparation tasks – identifying outliers and errors, aggregating values, and combining data into one spreadsheet for analytics.  However, all too often, business users waste time using Excel to manually profile and process data.

Truth is that Excel is inadequate for enterprise projects that comprise large-scale data sets, involve group collaboration, and require data accuracy in a short amount of time.

Among many, there are 3 areas where Excel’s limitations are – to nicely put it – limiting and too time consuming for data preparation at scale:

1) Interactive with Data Beyond 1 Million Rows: With Excel, data is limited to a million rows. Even with less than that amount, the larger the number of rows, the slower Excel gets and the greater the chance of Excel crashing – and taking all of the user’s changes down with it.

2) Data Profiling: To profile data in Excel, users typically create filters and pivot tables – but problems arise when a column contains thousands of distinct values or when there are duplicates resulting from different spellings. And because Excel filters have no visual representation for each value, the user must switch back and forth between pivot tables and filtered data to get a (partial) understanding of the data.

3) Data Governance and Trust: With Excel, there is no actual audit trail or data lineage. You can’t see the steps taken to cleanse a particular dataset, aside from spending your time making sense out of complex macros. And even with that, you must save every version of Excel and apply comments to mark significant changes.

These requirements and more demonstrate where data preparation with Excel entirely lacks ‘enterprise’ readiness.

free trial
DataRobot Data Prep

Interactively explore, combine, and shape diverse datasets into data ready for machine learning and AI applications

Try now for free
About the author

Value-Driven AI

DataRobot is the leader in Value-Driven AI – a unique and collaborative approach to AI that combines our open AI platform, deep AI expertise and broad use-case implementation to improve how customers run, grow and optimize their business. The DataRobot AI Platform is the only complete AI lifecycle platform that interoperates with your existing investments in data, applications and business processes, and can be deployed on-prem or in any cloud environment. DataRobot and our partners have a decade of world-class AI expertise collaborating with AI teams (data scientists, business and IT), removing common blockers and developing best practices to successfully navigate projects that result in faster time to value, increased revenue and reduced costs. DataRobot customers include 40% of the Fortune 50, 8 of top 10 US banks, 7 of the top 10 pharmaceutical companies, 7 of the top 10 telcos, 5 of top 10 global manufacturers.

Meet DataRobot
  • Listen to the blog
  • Share this post
    Subscribe to DataRobot Blog
    Newsletter Subscription
    Subscribe to our Blog