DataRobot at PyData London 2014: Highlights and Presentation Slides

February 26, 2014


We just returned from an exciting weekend in London, where we attended PyData London 2014. The conference brought together users and developers of data analysis tools in Python to share their experiences, techniques, triumphs, and pitfalls in using Python for different projects.

This was the first PyData conference to be held in Europe, and it was fantastic to experience the thriving data science and start-up community both in London and at the conference itself. We’d like to thank NumFOCUS and Ian Ozsvald (@ianozsvald) and his team for hosting such an important event, packed with exciting and informative talks. PyData was hosted at Level39, an accelerator/hacker space in the Canary Wharf financial district with a spectacular view over London.


It was our second time attending PyData, and this time we were especially looking forward to learning about the use of Python in the finance industry and meeting people who were doing just that.

These are our highlights from the conference:

Felix Fernandez and the Financial Industry 

Felix Fernandez, CIO of the Cash & Derivatives IT department within Deutsche Börse IT (German Stock Exchange), gave the first keynote of the conference on the role of Python in the financial industry. His talk was a reflection on his experience using Python for financial applications such as pricing simulation, quantitative modeling and automation, and the potential of Python as a universal tool for end-to-end development.

We hope (and believe) that many other companies and industries will start to embrace Python as a one-stop shop for data analysis and application development.

Gael Varoquaux and Imaging Data

The second keynote was given by Gael Varoquaux, a neuroscience researcher by day and avid open-source developer and project lead by night. Gael delivered an exciting talk on “Building a Cutting-Edge Data Processing Environment on a Budget”, which included some of the (universal) lessons learned at his lab while analyzing brain-imaging data. He also presented some fascinating insights into community dynamics in open-source software and shared his experience of how to build (and how not to build) a community around a software project.


Dirk Gorissen and Mobility Patterns

Later on, Dirk Gorissen spoke about measuring and predicting departures from routine in human mobility. He presented a Bayesian framework that analyzes an individual’s mobility patterns and identifies departures from them. The framework can detect both spatial and temporal departures from routine based on heterogeneous sensor data (GPS, cell tower, eyewitness). If you are interested in building the next-generation Google Now, you should check out the recorded talk. But beware: you will need a solid grounding in graphical models.


James Powell and ForPythonQuants

James Powell, Python guru and chair of the New York Python Meetup, gave a number of entertaining and impressive talks on advanced Python, including “Embeddings of Python” and the “Generator Showcase Showdown”. We highly recommend his talks to any aspiring Pythonista! (You can check out his blog, where he posts a lot of this material.) James is also organizing a conference on Python in quantitative finance, ForPythonQuants, in NYC on the 14th of March. If you are into quantitative finance and live in the Boston or New York area, be sure to sign up. We’ll see you there!

DataRobot and PyData

One of our data scientists, Peter Prettenhofer, took the opportunity to talk about his favorite data-science algorithm, Gradient Boosted Regression Trees, a powerful statistical learning technique with applications in a variety of areas, ranging from web page ranking to environmental niche modeling. This algorithm is the key ingredient of many winning solutions in data-mining competitions such as the Netflix Prize, Heritage Health Prize, or the GE Flight Quest, which was won by our Data Science lead, Xavier Conort.
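For readers new to the technique, here is a minimal sketch of the core idea in plain Python. It is our own illustration, not code from Peter’s talk and not scikit-learn’s implementation: it assumes a single input feature, squared-error loss, and depth-one trees (“stumps”), and all function names are hypothetical. Each round fits a stump to the current residuals and adds a shrunken copy of it to the ensemble.

```python
import math


def fit_stump(xs, residuals):
    """Fit a depth-one regression tree: one threshold, one mean per side."""
    best = None
    distinct = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values,
    # so both sides of every candidate split are non-empty.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbrt(xs, ys, n_estimators=50, learning_rate=0.1):
    """Boost stumps on residuals (the negative gradient of squared error)."""
    base = sum(ys) / len(ys)  # initial model: the mean of the targets
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def predict_gbrt(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of the model on the given points."""
    return sum((predict_gbrt(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)


# Toy data: a sine curve sampled at 40 points.
xs = [i / 10.0 for i in range(40)]
ys = [math.sin(x) for x in xs]

for n in (1, 10, 100):
    model = fit_gbrt(xs, ys, n_estimators=n)
    print(f"{n:3d} stumps -> training MSE {mse(model, xs, ys):.5f}")
```

The training error falls as stumps are added. In practice you would use deeper trees, several features, and held-out data to choose the number of trees and the learning rate.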

Peter is the primary author of a popular implementation of Gradient Boosted Regression Trees in the Python machine learning toolkit, scikit-learn. His talk was an in-depth discussion on how to apply Gradient Boosting successfully in practice. The talk was recorded but is not yet available, so in the meantime you can find the slides below:

In addition to giving one of the talks, we also gave away DataRobot t-shirts, talked to attendees about their use of predictive modeling, and even gave some people an early peek at our product.

PyData was a huge success this year, and we’re already looking forward to the next PyData of 2014, in Berlin. See you there!