Cloudera + DataRobot

Automated machine learning powered by the enterprise data hub

Get the most value out of your data by integrating enterprise-grade scalability and predictive modeling.

Cloudera and DataRobot have partnered to deliver an integrated solution that runs on Hadoop and uses automation and machine learning to extract the most value from all your data.

The DataRobot integration with Cloudera Enterprise combines data science, automated machine learning, and the massive processing power of the Hadoop environment. Users reach business insight faster because they can discover and optimize predictive models in less time.

The DataRobot integration with Cloudera Data Science Workbench gives data scientists and business analysts a self-service approach to data science:

  • Develop models in Python, R, or Scala without having to worry about the details of Hadoop and Spark
  • Use automated machine learning to tune and optimize models for the highest accuracy
  • Access data and distributed processing easily and securely

Management

DataRobot uses Cloudera Custom Service Descriptors, so you manage and monitor the application with the same tools you use today. DataRobot uses Cloudera Manager to distribute runtime libraries to the Hadoop data nodes, so there are no extra installation or configuration tasks when you add a node.

Security & Encryption

DataRobot uses Sentry for fine-grained role-based authorization, and supports Kerberos and LDAP protocols. With DataRobot, your security protocols are exactly the same as they are for your other applications. Because DataRobot integrates natively and leverages HDFS, it inherits the encryption policies you implement in Cloudera. You do not need to implement additional controls to ensure the security of your data.

Auditing & Lineage

Cloudera Manager tracks DataRobot lifecycle events and security-related events just like any other process running in your Cloudera cluster. Cloudera also tracks the DataRobot analysis and model files, and Cloudera Navigator provides you with a visualization of data lineage.

YARN for Workload Management

DataRobot workloads run in YARN containers, so DataRobot coexists with your other applications. You do not need to partition your cluster to prevent resource conflicts – YARN handles that for you.

Light Footprint

DataRobot runs on standard Hadoop-spec commodity hardware and requires no long-running processes on Hadoop data nodes. Unlike some commercial tools, it does not require you to replace or upgrade your Hadoop servers. DataRobot also does not need a proprietary storage layer: users can work directly with HDFS files, and the application uses HDFS to store analysis datasets and models.

Spark Scoring

DataRobot uses Apache Spark for high-performance in-memory model scoring. Because DataRobot leverages Cloudera’s scale-out architecture, you can provision scoring capacity to match the volume and throughput you need.
