DataRobot at PyData Silicon Valley 2014: Highlights, Tutorial and Slides

May 7, 2014


We just returned from a great weekend in Menlo Park, CA (at Facebook’s HQ) where we attended PyData Silicon Valley 2014. The conference brought together top users and developers of data analysis tools in Python. It was a great place to share ideas on how to best apply the language and tools to address challenges in data management, processing, analytics and visualization.

We’d first like to thank NumFOCUS and all the volunteers for organizing this great event, and Facebook for hosting it at their headquarters.

This was our third time attending PyData, and this time we were especially looking forward to hearing how companies like Facebook use Python to analyze petabytes of data.

Here are some of our highlights of the conference:

Jason Sundram of Facebook gave a fantastic talk on “A Full Stack Approach to Data Visualization: Terabytes (and Beyond) at Facebook”. Jason demonstrated how he turns terabytes of Facebook data into compelling, interactive, data-driven applications. He showed the audience the different types of visualizations his team creates and, in particular, how some of them are displayed on a massive panel wall at Facebook.

Full house at Jason Sundram’s talk on viz of big data (aka small RAM). #pydata pic.twitter.com/dsMCmEtjix

— Eli Bressert (@astrobiased) May 3, 2014

Another talk we really enjoyed was by Greg Lamp from YHat on “ggplot for python”. Greg explained how ggplot provides a high-level grammar that allows users to quickly and easily make visually appealing plots. He showed how ggplot works by analyzing a dataset of baseball pitches and identifying the vulnerabilities of certain players (such as a low hit rate in certain regions of the strike zone) in an intuitive plot. You can view his whole tutorial here.

DataRobot and PyData

One of our data scientists, Peter Prettenhofer, led a tutorial on Friday on his favorite data science algorithm, Gradient Boosted Regression Trees (GBRT). GBRT is a powerful statistical learning technique with applications in a variety of areas, ranging from web page ranking to environmental niche modeling. This algorithm is a key ingredient of many winning solutions in data mining competitions such as the Netflix Prize, the GE Flight Quest, and the Heritage Health Prize. Peter is the primary author of a popular implementation of Gradient Boosted Regression Trees in the Python machine learning toolkit, scikit-learn.

Peter began his talk with a brief introduction to the GBRT model [slides here] and continued with an in-depth tutorial on applying GBRT successfully in practice using scikit-learn. He covered topics including regularization, model tuning, and model interpretation, all of which can significantly improve your score on Kaggle. Peter ran the tutorial through an IPython notebook, which you can download here.
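To give a flavor of those three topics, here is a minimal sketch using scikit-learn's `GradientBoostingRegressor`. It is not taken from Peter's notebook: the synthetic dataset (`make_friedman1`), the hyperparameter values, and the variable names are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration, not the tutorial's dataset).
# Only the first 5 of the 10 features influence the target.
X, y = make_friedman1(n_samples=1200, n_features=10, noise=1.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Regularization: shrinkage (learning_rate), shallow trees (max_depth),
# and stochastic gradient boosting (subsample < 1.0).
model = GradientBoostingRegressor(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=3,
    subsample=0.8,
    random_state=0,
)
model.fit(X_train, y_train)

# Tuning: staged_predict yields predictions after each boosting stage,
# so one fit is enough to see how validation error varies with tree count.
val_errors = [
    mean_squared_error(y_val, pred) for pred in model.staged_predict(X_val)
]
best_n_estimators = int(np.argmin(val_errors)) + 1

# Interpretation: impurity-based feature importances, normalized to sum to 1.
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]

print("best number of trees:", best_n_estimators)
print("features ranked by importance:", ranking.tolist())
```

A lower `learning_rate` generally needs more trees to reach the same fit, which is why the two are usually tuned together. In a real project the number of trees would be chosen on a validation set or by cross-validation, with a separate test set held back for the final score.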

Gradient Boosted Regression Trees in scikit-learn from DataRobot

@DataRobot @pprett Great talk! Very informative. Thanks! #pydata — Alessandro Gagliardi (@MadDataScience) May 2, 2014

In addition to leading this tutorial session, we also sponsored and exhibited at the conference. We were thrilled to give demos of the DataRobot beta product to many attendees. If you attended the event, saw our demo, and would like to try it yourself, request an invite to our beta program.

Until next time, PyData.
