Statistical Learning in Python Blog Series Kick-off
This blog post is the first in a series that follows along with the “StatLearning” MOOC by Trevor Hastie and Rob Tibshirani in Winter 2014. We’ll show how to apply many of the techniques they cover using Python instead of R.
Ever since I was first exposed to data science and statistical machine learning, one book has claimed the prime real estate on my desk: The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. It is the seminal work on statistical learning and covers a wide range of statistical techniques for data analysis that we at DataRobot use on a daily basis. To me, the best part of the book is that it presents methods from both statistics and machine learning in a coherent and accessible way.
My colleagues and I were thrilled when two of the authors, Trevor Hastie and Robert Tibshirani, announced a Massive Open Online Course on statistical learning: StatLearning. It’s free to the general public and will be hosted on Stanford’s OpenEdX platform. The course runs from January 21, 2014 through March 22, 2014.
The course is based on a new book that they co-authored with Gareth James and Daniela Witten, An Introduction to Statistical Learning. The book covers much of the same material as The Elements of Statistical Learning, but at a level accessible to a broader audience, and it includes many examples of applied statistical learning in R, a domain-specific language for statistical computing. The course, like the book, will include many practical examples of statistical computing using R.
“R is the most powerful statistical computing language on the planet.” – Norman Nie
At DataRobot, R is one of two key languages we use on a day-to-day basis (the other being Python). We agree with Norman Nie: R definitely is the most powerful statistical computing language on the planet. However, many (if not most) productionalized data science projects cannot be realized in R alone.
Python is a general-purpose programming language with a strong scientific computing stack that includes many of the statistical learning techniques taught in the course. Since more and more people are using Python for data science, we decided to create a blog series that follows along with the StatLearning course and shows how many of the techniques it presents can be applied using tools from the Python ecosystem: “numpy”, “scipy”, “pandas”, “matplotlib”, “scikit-learn”, and “statsmodels”. Over the next two months we will reproduce many of the examples presented in the course using Python in place of R. From time to time, we may also cover supplemental material or interesting case studies.
If you haven’t used Python yet,
This post was written by Jeremy Achin and Peter Prettenhofer. Please post any feedback,
Peter Prettenhofer is VP of Engineering at DataRobot. He studied computer science at Graz University of Technology, Austria, and Bauhaus University Weimar, Germany, focusing on machine learning and natural language processing. He is a contributor to scikit-learn, where he co-authored a number of modules such as Gradient Boosted Regression Trees, Stochastic Gradient Descent, and Decision Trees.