
Best Practices for Creating and Operationalizing Data Lakes

August 30, 2018

Data lakes have a wide range of use cases. For some, a data lake augments an enterprise data warehouse. For others, it is a staging area that technical teams use for data science and machine learning. And for most, it has become a storage area for archiving data, with the intention of unlocking its value later. As Michelle Goetz of Forrester puts it, “ultimately, data lakes are a mechanism to rationalize data ecosystems, scale and democratize data, and serve a wider number of business use cases than previous data repositories.”[1]

When implementing a data lake, it's best to treat it as an application investment. Always start with an evaluation phase in which all stakeholders can set their expectations and the team can assess requirements. Data lakes often begin as innovation grounds where ideas for new data products or process optimizations take shape. Whatever its origins, the evaluation phase should define the data lake's success criteria in terms of its primary and secondary goals, and the team should continually demonstrate results. As a starting point, these questions can guide the implementation plan:

  • Who will the data lake serve and for which business use cases?
  • Will the primary users be data analysts, data scientists, data engineers, or a combination of these roles?
  • Do the users have the right skill sets? Are there technologies that can capitalize on the team's existing skills?
  • What discovery and exploration tools can help unlock the value of the data lake quickly and continuously?
  • What ultimately defines the return on investment in the data lake? What are the short-term and long-term goals?

While the responsibility for implementing the data lake often lies with technical teams, the project's success and longevity depend on adoption, and on whether business teams and executive stakeholders can see its value in each phase of the implementation. This is often a challenge, because the tools and interfaces used to interact with data lakes are technical and too hard for business teams to use. To overcome this, perhaps the best piece of advice comes from Gartner: “Often, IT doesn’t understand what the data means, and some companies do not allow IT to know what it means. IT should architect the data lake with the focus on self-service capabilities so that the business can derive value from this data.”[2] This is where self-service data preparation tools such as Data Prep can help data lake projects deliver value sooner.

To help you create a proper strategy and well-crafted plan for building and operationalizing your data lake, we have created a best practices guide. 

References:

  1. Forrester Research, Battle of the Data Lakes: Customer Insight Versus Enterprise Platform, February 16, 2018
  2. Gartner, Inc., Use Design Patterns to Increase the Value of Your Data Lake, Henry Cook and Thornton Jared Craig, May 29, 2018

About the author
DataRobot

The Next Generation of AI

DataRobot AI Platform is the next generation of AI. The unified platform is built for all data types, all users, and all environments to deliver critical business insights for every organization. DataRobot is trusted by global customers across industries and verticals, including a third of the Fortune 50.
