
DataRobot Enables Scalable Feature Engineering and Inference Leveraging Snowflake Snowpark/Java UDFs

June 9, 2021
by Miles Adkins and Chris Cozzi

Although machine learning and AI have begun to mature as fields, organizations often struggle to apply AI successfully, an issue DataRobot calls the “AI production gap.”

One of the most common issues we see across the machine learning life cycle (feature engineering, training, scoring, etc.) is its reliance on a variety of separate, often slow compute environments. This unduly increases model pipeline complexity: data has to be pulled out of the production databases where it lives, and machine learning operations teams are handed a series of model artifacts whose dependencies must be reassembled before the model can be put into production.

To help combat this problem, DataRobot introduced Portable Prediction Servers in the DataRobot 6.3 release, allowing organizations to bring any DataRobot model closer to their production data and integrate it into existing pipelines and applications.

With Snowflake’s announcement of Snowpark and Java UDFs (user-defined functions), DataRobot has continued to build on this theme. With Snowpark, data preparation tasks in Zepl can be pushed down into Snowflake for in-database feature engineering. To further close the gap between the compute environments that run models and the data those models consume, DataRobot Java Scoring Code can be paired with Snowflake Java UDFs for in-database model scoring (inference).

Using Snowpark in Zepl for Feature Engineering

Snowpark is a new developer experience for Snowflake that lets you build efficient, powerful pipelines with familiar constructs in your programming language of choice. Snowflake has always delivered performance and ease of use for users familiar with SQL; Snowpark now lets users write Scala and Java code against a widely used, familiar DataFrame model.

Inside a Zepl notebook, users simply set the cell runtime to “%snowpark” and configure a “Snowpark” data source. From there, code executed in Zepl is translated and pushed down to Snowflake, where the data already lives, taking advantage of Snowflake’s performance, scalability, and concurrency.

Here is sample Scala code that queries a Snowflake table and returns a set of filtered rows:

[Screenshot: sample Scala code in a Zepl notebook that queries a Snowflake table and returns a set of filtered rows]
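The pushdown idea described above can be sketched without a Snowflake account. `LazyFrame` below is a hypothetical toy class, not part of Snowpark: each call only records an operation, and `toSql()` renders the whole chain as one query. That is the general shape of how a DataFrame library can ship work to the database instead of pulling rows to the client. The table and column names used with it are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy for illustrating query pushdown; this is NOT the Snowpark API.
// Each transformation only records an operation. Nothing touches data until
// toSql() renders the whole chain as one SQL statement for the database to run.
final class LazyFrame {
    private final String table;
    private final List<String> predicates; // filters, combined with AND
    private final List<String> columns;    // empty means SELECT *

    private LazyFrame(String table, List<String> predicates, List<String> columns) {
        this.table = table;
        this.predicates = predicates;
        this.columns = columns;
    }

    /** Starts a plan that reads from the named table; no rows are fetched. */
    static LazyFrame table(String name) {
        return new LazyFrame(name, List.of(), List.of());
    }

    /** Returns a new plan with one more filter; this plan is left unchanged. */
    LazyFrame filter(String sqlPredicate) {
        List<String> next = new ArrayList<>(predicates);
        next.add(sqlPredicate);
        return new LazyFrame(table, List.copyOf(next), columns);
    }

    /** Returns a new plan that keeps only the given columns. */
    LazyFrame select(String... cols) {
        return new LazyFrame(table, predicates, List.of(cols));
    }

    /** Renders the recorded plan as a single SQL query string. */
    String toSql() {
        String projection = columns.isEmpty() ? "*" : String.join(", ", columns);
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(projection).append(" FROM ").append(table);
        if (!predicates.isEmpty()) {
            List<String> wrapped = new ArrayList<>();
            for (String p : predicates) {
                wrapped.add("(" + p + ")"); // parentheses keep each filter's precedence intact
            }
            sql.append(" WHERE ").append(String.join(" AND ", wrapped));
        }
        return sql.toString();
    }
}
```

For example, `LazyFrame.table("transactions").filter("amount > 100").filter("region = 'EMEA'").select("customer_id", "amount").toSql()` yields `SELECT customer_id, amount FROM transactions WHERE (amount > 100) AND (region = 'EMEA')`: one query executed where the data lives, returning only the filtered rows. A real library also handles identifier quoting, typed column expressions, and injection safety, all of which this toy ignores.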

Pairing DataRobot Scoring Code Models with Java UDFs

With Java UDFs, customers can run Java functions directly inside Snowflake’s Data Cloud, with better performance, scalability, and concurrency than calling hosted external services.

One mechanism DataRobot supports for exporting and running models in external environments is Java Scoring Code. These Scoring Code JARs contain prediction calculation logic identical to the DataRobot API, can be run anywhere Java code can be executed, and are often the best choice for low-latency, high-scale scoring.

Models that support Java Scoring Code can be identified by their tag in the leaderboard:

[Screenshot: Leaderboard tag identifying a model that supports Java Scoring Code]

Once models are deployed to a Snowflake Prediction Environment, users execute the generated script to upload the JAR and create an associated UDF to perform inference directly in Snowflake.
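As a hypothetical illustration of the two pieces such a script sets up (an uploaded JAR and a UDF whose handler calls into it), here is a minimal Java sketch. It is not the script DataRobot generates: the SQL in the comments uses standard Snowflake `PUT` and `CREATE FUNCTION ... LANGUAGE JAVA` syntax with placeholder stage, file, function, and table names, and a stand-in logistic model with made-up coefficients takes the place of the Scoring Code JAR, whose own API is not reproduced.

```java
// Hypothetical sketch of a Snowflake Java UDF handler that wraps a scoring model.
// ChurnModel, StandInLogisticModel, and ScoringUdf are illustrative names only; a real
// DataRobot Scoring Code JAR exposes its own API, which is not reproduced here.
//
// Assumed deployment steps in Snowflake SQL (stage, file, function, and table
// names are placeholders):
//   PUT file:///tmp/scoring-udf.jar @model_stage AUTO_COMPRESS = FALSE;
//   CREATE OR REPLACE FUNCTION predict_churn(tenure_months DOUBLE, monthly_spend DOUBLE)
//     RETURNS DOUBLE
//     LANGUAGE JAVA
//     IMPORTS = ('@model_stage/scoring-udf.jar')
//     HANDLER = 'ScoringUdf.predict';
//   SELECT customer_id, predict_churn(tenure_months, monthly_spend) FROM customers;

/** Stand-in for whatever prediction entry point the exported model provides. */
interface ChurnModel {
    double score(double tenureMonths, double monthlySpend);
}

/** Toy logistic model with made-up coefficients, used only so the sketch runs. */
final class StandInLogisticModel implements ChurnModel {
    private final double intercept;
    private final double tenureWeight;
    private final double spendWeight;

    StandInLogisticModel(double intercept, double tenureWeight, double spendWeight) {
        this.intercept = intercept;
        this.tenureWeight = tenureWeight;
        this.spendWeight = spendWeight;
    }

    @Override
    public double score(double tenureMonths, double monthlySpend) {
        double z = intercept + tenureWeight * tenureMonths + spendWeight * monthlySpend;
        return 1.0 / (1.0 + Math.exp(-z)); // logistic link maps z to a probability in (0, 1)
    }
}

// In a real JAR this would typically be a public class in its own ScoringUdf.java file.
final class ScoringUdf {
    // Built once when the class is loaded, rather than once per scored row.
    private static final ChurnModel MODEL = new StandInLogisticModel(-1.0, -0.05, 0.02);

    /** Handler method: boxed Doubles let a SQL NULL arrive as null, which is returned as NULL. */
    public static Double predict(Double tenureMonths, Double monthlySpend) {
        if (tenureMonths == null || monthlySpend == null) {
            return null;
        }
        return MODEL.score(tenureMonths, monthlySpend);
    }
}
```

Keeping the handler a thin, static wrapper means the scoring logic can be unit-tested locally with plain Java calls such as `ScoringUdf.predict(12.0, 80.0)` before the JAR is uploaded.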


With the Java UDF, users can make predictions anywhere they already use Snowflake, all without moving any data outside the database.

To close the loop and understand a model’s performance on data in production, users can ingest service and prediction data back into DataRobot MLOps for analysis.
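The following hypothetical Java sketch shows one way these two kinds of data could be captured around each scoring call before being shipped to a monitoring system. `PredictionMonitor` is an illustrative class, not the DataRobot MLOps API, and it assumes that service data means request counts, error counts, and latency, while prediction data means the predicted values themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;

// Hypothetical in-process collector; this is NOT the DataRobot MLOps library API.
// Assumption: "service data" = request count, errors, latency; "prediction data" =
// the predicted values. A real integration would periodically export both for analysis.
// Not thread-safe: concurrent scoring would need external synchronization.
final class PredictionMonitor {
    private long requestCount = 0;
    private long errorCount = 0;
    private long totalLatencyNanos = 0;
    private final List<Double> predictions = new ArrayList<>();

    /** Runs one scoring call, timing it and recording either its output or its failure. */
    double record(DoubleSupplier scoringCall) {
        long start = System.nanoTime();
        requestCount++;
        try {
            double prediction = scoringCall.getAsDouble();
            predictions.add(prediction);
            return prediction;
        } catch (RuntimeException e) {
            errorCount++;
            throw e; // surface the failure to the caller after counting it
        } finally {
            totalLatencyNanos += System.nanoTime() - start;
        }
    }

    long requestCount() { return requestCount; }

    long errorCount() { return errorCount; }

    /** Mean wall-clock time per call in milliseconds, or 0 if nothing was recorded. */
    double meanLatencyMillis() {
        return requestCount == 0 ? 0.0 : totalLatencyNanos / 1_000_000.0 / requestCount;
    }

    /** Mean of the successful predictions, or NaN if there are none. */
    double meanPrediction() {
        return predictions.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    /** Immutable snapshot of the recorded predictions, e.g. for export. */
    List<Double> predictions() { return List.copyOf(predictions); }
}
```

Counting errors and latency separately from the predicted values keeps operational health and model behavior distinguishable; retaining the raw predictions also allows their distribution to be compared against training data later.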


Availability


The Zepl integration that supports a Snowpark runtime and the integration for running Java Scoring Code in Snowflake Java UDFs will be available in public beta in the DataRobot 7.1 release, going live June 15, 2021.

About the authors
Miles Adkins

Partner Sales Engineer at Snowflake

Miles is a partner sales engineer at Snowflake and leads the joint technical go-to-market for Snowflake’s machine learning and data science partners.


Chris Cozzi

Chris Cozzi is a product manager on DataRobot MLOps. Before he joined DataRobot, Chris worked on product management in enterprise analytics and healthcare marketing.
