Best Practices for Creating and Operationalizing Data Lakes

August 30, 2018
· 2 min read

Data lakes have a wide range of use cases. For some, a data lake augments an enterprise data warehouse. For others, it is a staging area that technical teams use for data science and machine learning. And for most, it has become a place to archive data with the intention of unlocking its value later. As Michele Goetz of Forrester puts it, “ultimately, data lakes are a mechanism to rationalize data ecosystems, scale and democratize data, and serve a wider number of business use cases than previous data repositories.” [1]

When implementing a data lake, it's best to think of it as an application investment. Always start with an evaluation phase in which all stakeholders can set their expectations and the team can assess requirements. Data lakes often begin as innovation grounds where ideas for new data products or process optimizations take shape. Regardless, in the evaluation phase it is important to define the data lake's success criteria in terms of its primary and secondary goals, and to show results continually. For starters, these questions can guide the implementation plan:

  • Who will the data lake serve and for which business use cases?
  • Will the primary users be data analysts, data scientists, data engineers, or a combination of these roles?
  • Do the users have the right skillsets? Are there technologies that can capitalize on the existing skillset of the team?
  • What discovery and exploration tools can help unlock the value of the data lake quickly and continuously?
  • What ultimately defines the return on investment in the data lake? What are the short-term and long-term goals?

While responsibility for implementing the data lake often lies with technical teams, the success and longevity of the project depend on adoption and on whether business teams and executive stakeholders can see the value in each phase of the implementation. This is often a challenge, because the tools and interfaces used to interact with data lakes are technical and too hard for business teams to use. To overcome this, perhaps the best piece of advice comes from Gartner: “Often, IT doesn’t understand what the data means, and some companies do not allow IT to know what it means. IT should architect the data lake with the focus on self-service capabilities so that the business can derive value from this data.” [2] This is where self-service data preparation tools such as Data Prep can accelerate the value realization of data lake projects.

To help you create a proper strategy and well-crafted plan for building and operationalizing your data lake, we have created a best practices guide. 


  1. Forrester Research, Battle of the Data Lakes: Customer Insight Versus Enterprise Platform, February 16, 2018
  2. Gartner, Inc., Use Design Patterns to Increase the Value of Your Data Lake, Henry Cook and Thornton Jared Craig, May 29, 2018

About the author

Value-Driven AI

DataRobot is the leader in Value-Driven AI, a unique and collaborative approach to AI that combines our open AI platform, deep AI expertise, and broad use-case implementation to improve how customers run, grow, and optimize their business. The DataRobot AI Platform is the only complete AI lifecycle platform that interoperates with your existing investments in data, applications, and business processes, and it can be deployed on-prem or in any cloud environment. DataRobot and our partners have a decade of world-class AI expertise collaborating with AI teams (data scientists, business, and IT), removing common blockers and developing best practices to successfully navigate projects that result in faster time to value, increased revenue, and reduced costs. DataRobot customers include 40% of the Fortune 50, 8 of the top 10 US banks, 7 of the top 10 pharmaceutical companies, 7 of the top 10 telcos, and 5 of the top 10 global manufacturers.
