Tackling Daily Fantasy Football with Data Science

January 6, 2020

As a football fan who is also a data scientist, the one question I get asked the most by my friends is this:

“Can you help me win my fantasy football league using predictive analytics and modeling, Mr. Data Scientist?” 

Despite being a lifelong football fan (Go Bears!), I’ve never played fantasy football, but I’m certainly aware of it; how can you not be? More than 59 million Americans played fantasy sports in 2017, according to the Fantasy Sports & Gaming Association. That number has only grown since, and over the past few years fantasy sports has reached its latest – and perhaps most significant – evolution: the creation of Daily Fantasy Sports (DFS).

In a typical fantasy football league, you draft one team of real-life players and track their production throughout the whole season. DFS companies (like DraftKings and FanDuel) have attempted to ratchet up the competition and excitement by turning those season-long competitions into hundreds of daily and weekly contests where players can draft as many teams as they want. The prize pools of these contests are also massive; players can win up to $1 million by being good at DFS!

Unfortunately, having never played, I’m probably not very good at DFS. But this struck me as a problem well suited to predictive modeling. Once I saw how easy the DataRobot platform makes model building by automating huge parts of the process, I thought this would be a good modeling experiment to see whether this DFS novice could compete, given the right data, a powerful Enterprise AI platform, and some advice from the right subject matter expert.

Method to My Madness

Like any good data scientist, I started my project by framing and understanding the “business problem” with the help of a subject matter expert; in this case, a DataRobot employee, Gareth Goh, who used to be a professional DFS player and was a 2018 DraftKings World Championship qualifier. He outlined the basic framework for what we were trying to solve:

  • Predict which NFL players would play well on a given Sunday, based on their matchup and other contextual factors.
  • Identify the players that would provide the biggest return for value (fantasy points production relative to their assigned DraftKings salary).
  • Build a roster within the constraints of the $50,000 “salary cap”.
  • Use game theory principles to “zig while others zag”. These large-field tournaments on DraftKings are filled with tens of thousands of different fantasy lineups, and the prizes are so top-heavy that you need to differentiate your lineup and strategy as much as possible to have a shot at first place.
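The “return for value” idea in the second bullet is often expressed as projected points per $1,000 of salary. Here is a minimal sketch of that calculation, assuming made-up player names, projections, and salaries (none of these come from the actual project):

```python
def value_per_1k(projected_points: float, salary: int) -> float:
    """Projected fantasy points per $1,000 of salary."""
    return projected_points / (salary / 1000)

# Hypothetical players: (name, projected points, salary in dollars)
players = [
    ("RB_A", 20.0, 8000),
    ("RB_B", 15.0, 5000),
    ("WR_A", 18.0, 7500),
]

# Rank players from best to worst value
ranked = sorted(players, key=lambda p: value_per_1k(p[1], p[2]), reverse=True)
for name, pts, salary in ranked:
    print(f"{name}: {value_per_1k(pts, salary):.2f} pts per $1k")
```

In this toy data the cheaper running back ranks first on value even though his raw projection is the lowest, which is the trade-off the salary cap forces you to think about.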

With that in mind, I set out to build a model using DataRobot to predict player performance. I gathered data and projections from various fantasy data providers, taking a “wisdom-of-the-crowd” approach, and added lagged weekly player stats going back several weeks to enhance the projections. I then prepped and cleaned that data and loaded it into DataRobot to build my predictive model, letting DataRobot handle all the additional feature engineering.
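To make the data prep step concrete, here is a rough sketch of the two kinds of features described above: a consensus projection averaged across providers, and lagged weekly scores. The provider names, numbers, and function names are all hypothetical, and a plain average is only one possible way to read “wisdom of the crowd”; the original post does not say how the projections were combined.

```python
from statistics import mean
from typing import Dict, List, Optional

def consensus_projection(projections: Dict[str, float]) -> float:
    """Average the projections from several providers for one player."""
    return mean(projections.values())

def lagged_features(weekly_points: List[float], n_lags: int = 3) -> Dict[str, Optional[float]]:
    """Build lag features from past weekly scores (most recent week last).

    lag_1 is last week's score, lag_2 the week before, and so on.
    Weeks with no history are filled with None.
    """
    feats: Dict[str, Optional[float]] = {}
    for k in range(1, n_lags + 1):
        feats[f"lag_{k}"] = weekly_points[-k] if len(weekly_points) >= k else None
    return feats

# Hypothetical inputs for a single player
providers = {"provider_a": 14.0, "provider_b": 16.0, "provider_c": 15.0}
history = [12.5, 18.0]  # only two prior weeks available

row = {"consensus": consensus_projection(providers), **lagged_features(history)}
print(row)
```

Each player-week would become one such row in the training data, with the actual fantasy points scored as the target.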


DataRobot’s blueprints provide different layers of visibility in how features are being engineered.

I then built a simple optimizer that applied constraints – the $50,000 salary cap, along with the positions (one Quarterback, two Running Backs, three Wide Receivers, one Tight End, one Flex, and one Defense / Special Teams) – and took in the model’s predictions to build a lineup. I entered it into a tournament on DraftKings in week 1… and proceeded to win zero dollars! Back to the drawing board for week 2.
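The optimizer itself isn’t shown in this post, but the idea can be sketched with a brute-force search over a tiny player pool. This is an illustration only: the player pool is made up, the Flex slot is assumed to accept a running back, wide receiver, or tight end, and a real optimizer over a full slate would more likely use integer programming than exhaustive search.

```python
from itertools import combinations, product

SALARY_CAP = 50_000
# Roster slots from the text: fixed positions plus one FLEX.
SLOTS = {"QB": 1, "RB": 2, "WR": 3, "TE": 1, "DST": 1}
FLEX_POSITIONS = ("RB", "WR", "TE")  # assumption: FLEX may be an RB, WR or TE

def best_lineup(players, cap=SALARY_CAP):
    """Exhaustively search for the highest-projected lineup under the cap.

    players: list of (name, position, salary, projected_points) tuples.
    Returns (total_points, lineup), or None if no lineup fits under the cap.
    """
    # All ways to fill each fixed slot group, e.g. every pair of RBs
    per_slot = [
        list(combinations([p for p in players if p[1] == pos], n))
        for pos, n in SLOTS.items()
    ]
    best = None
    for groups in product(*per_slot):
        core = tuple(p for group in groups for p in group)
        used = {p[0] for p in core}
        # Try every remaining RB/WR/TE in the FLEX slot
        for flex in players:
            if flex[1] not in FLEX_POSITIONS or flex[0] in used:
                continue
            lineup = core + (flex,)
            if sum(p[2] for p in lineup) > cap:
                continue
            points = sum(p[3] for p in lineup)
            if best is None or points > best[0]:
                best = (points, lineup)
    return best

# Hypothetical player pool: (name, position, salary, projected points)
pool = [
    ("QB1", "QB", 7000, 22.0), ("QB2", "QB", 5500, 18.0),
    ("RB1", "RB", 9000, 24.0), ("RB2", "RB", 6500, 17.0), ("RB3", "RB", 4500, 11.0),
    ("WR1", "WR", 8500, 21.0), ("WR2", "WR", 6000, 15.0),
    ("WR3", "WR", 5000, 13.0), ("WR4", "WR", 3500, 8.0),
    ("TE1", "TE", 6000, 14.0), ("TE2", "TE", 3000, 7.0),
    ("DST1", "DST", 3500, 8.0), ("DST2", "DST", 2500, 6.0),
]

result = best_lineup(pool)
if result:
    points, lineup = result
    print(f"{points:.1f} projected points, ${sum(p[2] for p in lineup):,} salary")
    for name, pos, salary, proj in lineup:
        print(f"  {pos:<3} {name:<5} ${salary:,}  {proj:.1f}")
```

In this pool the most expensive player at every slot would blow through the cap, so the search has to trade projected points against salary somewhere, which is exactly what the constraint is for.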

Gareth mentioned that most players enter multiple lineups – as many as 150 in a tournament – to cover multiple combinations. With that in mind, I extended the optimizer to build more lineups and adjusted the project to produce prediction interval estimates along with point estimates. Using these estimates, I could now build a distribution for each player. I also added a new constraint to the optimizer that took each player’s distribution into consideration while being careful not to be overleveraged on any single player (for example, not having Tom Brady in more than 50% of my optimizer’s lineups).
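Here is one way the distribution and exposure ideas could fit together, as a sketch under several assumptions that are mine rather than the post’s: each player’s score is treated as normally distributed, the prediction interval is assumed to be a central 80% interval, and a “lineup” is simplified to the top few players by sampled score, ignoring salary and positions. All player data is made up.

```python
import random
from collections import Counter
from statistics import NormalDist

def sigma_from_interval(low, high, coverage=0.80):
    """Std. dev. of a normal whose central `coverage` interval is [low, high]."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return (high - low) / (2 * z)

def build_lineups(players, n_lineups, lineup_size, max_exposure, seed=0):
    """Build lineups from sampled scores while capping each player's exposure.

    players: list of dicts with 'name', 'point', 'low', 'high'.
    A player already in max_exposure * n_lineups lineups is excluded
    from later lineups.
    """
    rng = random.Random(seed)
    max_count = int(max_exposure * n_lineups)
    counts = Counter()
    lineups = []
    for _ in range(n_lineups):
        eligible = [p for p in players if counts[p["name"]] < max_count]
        if len(eligible) < lineup_size:
            raise ValueError("exposure cap too tight for this player pool")
        # One simulated score per eligible player
        sampled = {
            p["name"]: rng.gauss(p["point"], sigma_from_interval(p["low"], p["high"]))
            for p in eligible
        }
        lineup = sorted(sampled, key=sampled.get, reverse=True)[:lineup_size]
        counts.update(lineup)
        lineups.append(lineup)
    return lineups

def exposure(lineups):
    """Fraction of lineups each player appears in."""
    counts = Counter(name for lineup in lineups for name in lineup)
    return {name: c / len(lineups) for name, c in counts.items()}

# Hypothetical point estimates and 80% prediction intervals
players = [
    {"name": f"P{i}", "point": pt, "low": lo, "high": hi}
    for i, (pt, lo, hi) in enumerate(
        [(22, 14, 30), (20, 10, 30), (18, 12, 24), (16, 6, 26),
         (15, 9, 21), (13, 5, 21), (12, 8, 16), (10, 2, 18)]
    )
]

lineups = build_lineups(players, n_lineups=10, lineup_size=3, max_exposure=0.5)
print(exposure(lineups))
```

Sampling from the distributions, rather than always using the point estimates, is what lets high-variance players show up in some lineups, and the exposure cap keeps any single player from dominating the set.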


The model and optimizer generated dozens of lineups, of which I picked a handful to enter into tournaments in week 2…to much better results! My best lineup won $55 on an entry fee of just $3, a pretty significant ROI.
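For reference, the return on that single winning entry works out as follows. This covers only the $3 entry that won $55; the post does not give the fees or results for the other lineups entered that week, so it says nothing about the week’s overall return.

```python
def roi(winnings: float, entry_fee: float) -> float:
    """Net return on investment as a fraction of the entry fee."""
    return (winnings - entry_fee) / entry_fee

print(f"{roi(55, 3):.0%}")  # prints 1733%
```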


Lessons Learned from the Gridiron

The rest of the season, as with any sort of gambling (especially on football), saw lots of ups and downs as I tried to learn more about the nuances of the game while continuously tweaking my model. I remained convinced that this was a very natural data science problem: using historical data with constraints and appropriate context to make predictions, and then using those predictions to make “business” ROI-driven decisions, is exactly what we help companies do here at DataRobot.


The DataRobot model struggled to accurately predict the performance of Defense/Special Teams – not a huge surprise; even amid the wild randomness of the National Football League, defensive performances are particularly full of variance.


The Running Back model, on the other hand, proved to be very accurate in its predictions and points projections.

But football is a notoriously variance-driven sport, and the game theory nature (and overall difficulty) of daily fantasy sports made me realize that subject matter expertise, in conjunction with the best predictive models, is always crucial. The DataRobot model did a good job predicting running back and wide receiver scoring, while struggling more with defenses / special teams. Intuitively, this makes sense; running back production is largely a function of volume: the more carries a running back gets, the more likely they are to rack up yardage, touchdowns, and fantasy points. Defensive production, meanwhile, is one of the most random aspects of a very random sport; fantasy scoring for defenses is generally lower than for other positions, so an impossible-to-predict event like a defensive unit scoring a touchdown can skew the projections significantly.
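One simple way to compare accuracy across positions like this is to compute an error metric per position. The sketch below uses mean absolute error on a few made-up rows; the numbers are purely illustrative and are not results from the project, and the post does not say which metric was actually used.

```python
from collections import defaultdict

def mae_by_position(rows):
    """Mean absolute error per position.

    rows: iterable of (position, predicted_points, actual_points).
    """
    errors = defaultdict(list)
    for pos, pred, actual in rows:
        errors[pos].append(abs(pred - actual))
    return {pos: sum(e) / len(e) for pos, e in errors.items()}

# Made-up predictions and outcomes, for illustration only
rows = [
    ("RB", 15.0, 17.0),
    ("RB", 10.0, 9.0),
    ("DST", 7.0, 15.0),  # e.g. an unexpected defensive touchdown
    ("DST", 8.0, 2.0),
]

print(mae_by_position(rows))
```

Because defenses score fewer points overall, it can also help to look at error relative to typical scoring at each position rather than raw points alone.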

Beyond the accuracy of the model, the DataRobot platform also allowed me to build a pretty complex workflow in very little time, iterating fast and exploring more complex scenarios in addition to spitting out predictions. Fantasy football is an extremely random game within a very random sport, but DataRobot’s automation and accuracy make it easy for a novice with access to some quality data to take their shot at playing Daily Fantasy Sports.


About the author
Matt Marzillo

Data Scientist

Matt Marzillo is a data scientist at DataRobot, based out of Chicago. He currently supports customers on data science projects, with a primary focus on healthcare, including payers, providers, and life science organizations. Prior to DataRobot, Matt worked with several healthcare companies, both as an internal data science leader and as a consultant. Matt has an MS in Predictive Analytics from Northwestern University, where he still holds an appointment as an Adjunct Instructor.
