Cloudera + DataRobot
Get the most value out of your data by integrating enterprise-grade scalability and predictive modeling.
Cloudera and DataRobot have partnered to deliver an integrated technology solution that runs in Hadoop and uses automation and machine learning to extract more value from all your data.
The DataRobot integration with Cloudera Enterprise combines data science, automated machine learning, and the processing power of the Hadoop environment, so users can discover and optimize predictive models in less time and reach business insight faster.
The DataRobot integration with Cloudera Data Science Workbench gives data scientists and business analysts a self-service approach to data science:
- Develop models in Python, R, or Scala without worrying about the details of Hadoop and Spark
- Use automated machine learning to tune and optimize models for accuracy
- Access data and distributed processing easily and securely
Cloudera was founded in 2008 by some of the brightest minds at Silicon Valley’s leading companies, including Google (Christophe Bisciglia), Yahoo! (Amr Awadallah), Oracle (Mike Olson), and Facebook (Jeff Hammerbacher).
The founders shared a core belief that open source, open standards, and open markets are best. That belief remains central to our values. Doug Cutting, co-creator of Hadoop, joined the company in 2009 as Chief Architect and remains in that role.
DataRobot uses Cloudera Custom Service Descriptors for application management and monitoring, so you manage and monitor DataRobot with the same tools you use today. DataRobot uses Cloudera Manager to distribute runtime libraries to the Hadoop DataNodes, so adding a node requires no extra installation or configuration.
DataRobot uses Sentry for fine-grained role-based authorization, and supports Kerberos and LDAP protocols. With DataRobot, your security protocols are exactly the same as they are for your other applications. Because DataRobot integrates natively and leverages HDFS, it inherits the encryption policies you implement in Cloudera. You do not need to implement additional controls to ensure the security of your data.
Cloudera Manager tracks DataRobot lifecycle events and security-related events just like any other process running in your Cloudera cluster. Cloudera also tracks the DataRobot analysis and model files, and Cloudera Navigator provides you with a visualization of data lineage.
DataRobot workloads run in YARN containers, so DataRobot coexists with your other applications. You do not need to partition your cluster to prevent resource conflicts; YARN handles that for you.
DataRobot runs on standard Hadoop-spec commodity hardware and requires no long-running processes on Hadoop DataNodes. Unlike some commercial tools, DataRobot does not require you to replace or upgrade your Hadoop servers. It also needs no proprietary storage layer: users work directly with HDFS files, and the application uses HDFS to store analysis datasets and models.
DataRobot uses Apache Spark for high-performance in-memory model scoring. Because DataRobot builds on Cloudera's scale-out architecture, you can provision scoring capacity to match the volume you need at your desired level of throughput.
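As a rough illustration of sizing scale-out scoring, the Python sketch below estimates how many Spark executors a throughput target might need. It is a hypothetical back-of-the-envelope model, not a DataRobot or Cloudera tool: the function name `executors_needed`, the per-executor throughput figure, and the assumption that throughput scales linearly with executor count are all illustrative assumptions.

```python
import math


def executors_needed(target_rows_per_sec: float, rows_per_sec_per_executor: float) -> int:
    """Estimate the Spark executor count needed to reach a scoring throughput target.

    Hypothetical model: assumes scoring throughput scales linearly with the
    number of executors. Real clusters lose some efficiency to scheduling,
    I/O, and data skew, so treat the result as a lower bound.
    """
    if target_rows_per_sec <= 0 or rows_per_sec_per_executor <= 0:
        raise ValueError("throughput values must be positive")
    # Round up: a fractional executor still means provisioning a whole one.
    return math.ceil(target_rows_per_sec / rows_per_sec_per_executor)


# Hypothetical numbers: a 50,000 rows/s target with 8,000 rows/s measured per executor.
print(executors_needed(50_000, 8_000))  # prints 7
```

In practice you would measure per-executor throughput on your own cluster and leave headroom above the estimate, since scaling is rarely perfectly linear.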