Bootstrapping a Modern Data Science Education

August 27, 2018
by

Whether you live in Tokyo or El Salvador, if you are moderately intelligent and possess grit and passion, you can obtain a data science education. This may be an idealistic overstatement, but with the dawn of the Internet age, it is truer now than ever before.

Many of the world’s universities, such as Stanford, MIT, and Tsinghua, are releasing their courses online as Massive Open Online Courses, or MOOCs for short. It’s an exciting time to enter the data science field, and universities need to think through the value they can add as content becomes commoditized (though this is a larger subject worthy of its own piece).

DataRobot is known as a product company, but few realize that DataRobot is also an education company. Below are some thoughts and resources to help you accelerate your data science education, as well as an introduction to a game changer: automation, which makes modeling accessible to all.

First, let’s decompose what it means to be a data scientist. There are many definitions, but it is best represented visually in this Venn diagram popularized by Drew Conway:

That is to say, one needs some knowledge of the underlying techniques to be applied, the programming ability to turn the vision into reality, and the domain knowledge to understand how an analysis or model will add value. If you are sharpening your skill set in any of these three areas, you are systematically improving yourself as a data scientist.

Let’s look into some references within each skill set:

Math & Statistics:

• Prediction is at the heart of data science, and some would say it’s the core of science itself. The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman is often considered the gold standard for introducing the theory behind many common predictive modeling algorithms. An Introduction to Statistical Learning is a simplified reference written by some of the same authors and their colleagues. The authors also offer a Statistical Learning MOOC through Stanford, so why not hear the algorithm explanations straight from the horse’s mouth?

• We recommend starting with linear or logistic regression as a base. Logistic regression is a building block of fancier algorithms, such as the deep learning algorithms that you often read about today.
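To make the link between logistic regression and deep learning concrete, here is a minimal sketch of logistic regression written as a single sigmoid "neuron" trained by gradient descent. It uses only the Python standard library. The function names, the learning rate, and the toy data (hours studied versus passing an exam) are made up for illustration and do not come from any particular course or library.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Probability of the positive class for one feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic_regression(X, y, lr=0.1, epochs=5000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # gradient of log loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: hours studied -> passed the exam (1) or not (0).
X = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic_regression(X, y)
print("P(pass | 1 hour)  =", round(predict_proba([1.0], w, b), 3))
print("P(pass | 4 hours) =", round(predict_proba([4.0], w, b), 3))
```

A feed-forward neural network stacks many units of this shape (a weighted sum followed by a nonlinearity) and trains them with the same gradient-based idea, which is why logistic regression is a useful first algorithm to understand well.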

Programming Ability:

• Data science is a team sport, and every software engineering project you undertake is also a collaboration. At a minimum, it is a collaboration between you and your future self. Therefore, you want to strive for code that is readable, well-documented, and tested, and thus easier to maintain. These are universal principles regardless of whether you write the analysis in R, Python, Julia, or another language. Remember to ‘be kind to your future self’.

• R and Python are the modern lingua francas of data science. Within DataRobot, a question we are often asked is, “Which one should I use?” Ultimately, we see this question as counterproductive. Instead, we recommend focusing on the problem at hand and reaching for the tool that you believe presents the lowest friction on the way to a solution. One can try both and then build a deeper foundation in the one that comes more naturally. Within DataRobot, we have made it a point to offer our DataRobot API in both languages so the user is not faced with an ultimatum.
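As one small illustration of the "readable, well-documented, and tested" advice above, here is a sketch in Python. The function and its test are hypothetical examples chosen for brevity, not part of any DataRobot product or API; the same habits apply equally in R.

```python
def mean_absolute_error(actual, predicted):
    """Return the average absolute difference between two sequences.

    Both inputs must be non-empty and of equal length; otherwise a
    ValueError is raised so that mistakes surface early.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must not be empty")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def test_mean_absolute_error():
    """Small checks your future self can rerun after any change."""
    # Perfect predictions give zero error.
    assert mean_absolute_error([1, 2, 3], [1, 2, 3]) == 0
    # Errors of 1, 0, and 2 average to 1.0.
    assert mean_absolute_error([1, 2, 3], [2, 2, 5]) == 1.0
    # Mismatched lengths are rejected rather than silently truncated.
    try:
        mean_absolute_error([1], [1, 2])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for mismatched lengths")


test_mean_absolute_error()
```

The descriptive name, the docstring, and the explicit error messages cost a few extra lines, but they are what let a collaborator (or you, six months later) trust and reuse the function without rereading every line.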

Domain Knowledge:

• When presenting your work, the last thing you want to convey is the image of an out-of-touch data scientist who is disconnected from the real world. Therefore, our only recommendation for a first project is to apply your knowledge to a field you are intrinsically passionate about. The storytelling and communication aspects of the project will also be easier as a result.

There are many great educational platforms, such as Coursera, edX, Udemy, Udacity, DataCamp, Kaggle Learn, and fast.ai. No one platform has a monopoly on great content, and within the rapidly changing space of data science, you should seek truth and knowledge wherever you find it.

However, as unbiased as we try to be within DataRobot, we can’t hide our fondness for DataCamp. Its focus on data science and its learning-by-doing approach make it a company worth checking out as a budding data scientist.

Full Disclosure: DataRobot employees teach courses on DataCamp.

“The measure of how well you learned something is the degree to which you can build with it” – Rachel Thomas, Co-Founder fast.ai

The economy is built upon products and services. To make a living as a data scientist, you need to build something or teach others. Therefore, we strongly recommend that you get into the trenches and start applying your knowledge as soon as possible. This can be done through a Kaggle competition, a reproducible analysis in a blog post, an open source contribution, a real-world application within your community, and so on.

The Game Changer in Automation

Acquiring a quantitative skill set is essential in the modern economy, but there is no getting around the significant upfront time investment it requires. These skills take months or years to acquire, and you may find out later that much of what you learned is not valued by the business. (For example, your boss probably will not care how many hidden layers you used in your neural network.)

DataRobot was one of the first to build a platform that automates the mathematical and programming aspects of the workflow, enabling the user to operate at a higher level of abstraction and focus on the core business problem.

Education can now begin at the practical business level, allowing those with knowledge of the business and the data to be immediately useful. This practical philosophy is the driving force behind DataRobot University. DataRobot offers one- and two-day intensive DataRobot University training courses around the world for data scientists and non-data scientists alike.

DataRobot University also offers classes for the executive team, designed to help executives identify machine learning opportunities and cut through the hype. The educational strategy should not be overlooked when scaling data science across an organization and transforming into an AI-driven enterprise.

See whether an upcoming course is being held near you: https://www.datarobot.com/education/

The path to becoming a data scientist is not limited to those with a PhD. Machine learning has the potential to make an impact on almost every facet of modern life. You don’t have to ask for anybody’s permission to acquire the knowledge. Take it. This is not to say it will be easy, but the implications of this technology will only increase over time. We wish you luck on your journey, and we will see you on the other side.

– DataRobot Team

Igor Veksler is on the Customer Facing Data Science team at DataRobot. The team’s mandate is to focus on customer success and to work collaboratively with customers in enabling the AI-driven enterprise.