DataRobot’s Oscar Prediction 2020: Who Will Win Best Picture?
This post is a fun and unique take on predicting Best Picture for the 92nd Academy Awards.
The 92nd Academy Awards (Oscars) take place on Sunday, February 9, 2020. This is a big night for the film industry, as the awards signify international recognition of excellence for actors, directors, writers, designers, and many others. One of the biggest awards of the night is Best Picture, with nominees spanning a range of genres. To celebrate the world’s love of art and our love of data, we’ve used automated machine learning to put together our own prediction for which nominee is most likely to win Best Picture this year.
How We’re Predicting Best Picture
To make this prediction, I first collected data on previous Best Picture nominees and winners from the last 92 years. Using information available in IMDb, I gathered data about each movie, such as the title, actors, plot, director, writers, and genre. I also included the release date, runtime, country of origin, language, and domestic and worldwide box office figures. To capture what the public and the experts think, I added audience and critic ratings from both IMDb and Rotten Tomatoes.
Once the data was aggregated, I simply dropped the file into DataRobot, which automatically built over 100 models. I chose the most accurate of those models: an ensemble of a Light GBM Random Forest Classifier and an Auto-tuned K-Nearest Neighbors Classifier. DataRobot handled the feature engineering as well as the model building.
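To give a feel for what an ensemble does, here is a minimal sketch in Python. It assumes a simple average blend, which is only one way to combine models; the post does not say how DataRobot combined the two classifiers. The function name `blend_and_rank`, the film names, and every score below are made-up placeholders, not real model output.

```python
from statistics import mean


def blend_and_rank(model_scores):
    """Average each nominee's win probability across models, then rank.

    model_scores maps a model name to a {nominee: probability} dict.
    Returns a list of (nominee, blended_probability), highest first.
    """
    nominees = next(iter(model_scores.values())).keys()
    blended = {
        nominee: mean(scores[nominee] for scores in model_scores.values())
        for nominee in nominees
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# Placeholder scores for illustration only (NOT the actual predictions).
model_scores = {
    "random_forest": {"Film A": 0.60, "Film B": 0.30, "Film C": 0.10},
    "k_nearest_neighbors": {"Film A": 0.40, "Film B": 0.50, "Film C": 0.10},
}

ranking = blend_and_rank(model_scores)
for rank, (nominee, probability) in enumerate(ranking, start=1):
    print(rank, nominee, round(probability, 2))
```

Here the two models disagree about Film A and Film B, and the blend settles the question by averaging their scores. That is the basic appeal of an ensemble: one model's blind spot can be offset by the other.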
Now, it’s time to reveal the results!
The Best Picture Award Goes to…
|Rank (most likely to win Best Picture first)|Nominee for Best Picture|
|2|Ford v Ferrari|
A (Best) Picture is Worth a Thousand Words
An interesting way to visually assess textual data is with a word cloud. This allows you to see which words stand out the most for any given dataset. Below are two word clouds for the Best Picture prediction.
The redder the word, the more likely a film associated with it is to win Best Picture. The bluer the word, the less likely that film is to win. The size of the word indicates how often it shows up across all the nominated movies (large words appear more often than small ones).
The word cloud below is for the “genre” feature. As you can see, ‘drama’ is the biggest word and it’s red. This indicates that the Best Picture winner is more likely to be a drama (whereas ‘western’ is small and blue, so a western is less likely to win Best Picture). All three of our top selections are labeled as dramas, which is reflected in the word cloud results below:
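The two quantities behind a word cloud like this can be sketched in a few lines of Python. This is a rough illustration, not DataRobot's method: it uses the difference between a word's win rate and the overall win rate as a stand-in for color, and the number of plots containing the word as size. The function name `word_cloud_stats`, the plots, and the outcomes are all made up.

```python
from collections import defaultdict


def word_cloud_stats(plots, won):
    """Return {word: (plot_count, win_rate_lift)}.

    plot_count is the number of plots containing the word (drives size).
    win_rate_lift is the win rate among films whose plot contains the word
    minus the overall win rate (drives color: positive leans red,
    negative leans blue).
    """
    overall_win_rate = sum(won) / len(won)
    counts = defaultdict(int)
    wins = defaultdict(int)
    for text, outcome in zip(plots, won):
        # Count each word once per plot.
        for word in set(text.lower().split()):
            counts[word] += 1
            wins[word] += outcome
    return {
        word: (counts[word], wins[word] / counts[word] - overall_win_rate)
        for word in counts
    }


# Made-up toy plots and outcomes (1 = won, 0 = lost), for illustration only.
plots = [
    "a soldier finds love during war",
    "a quiet life on a farm",
    "love and war at sea",
    "life in the city",
]
won = [1, 0, 1, 0]

stats = word_cloud_stats(plots, won)
print(stats["love"], stats["life"])
```

In this toy data, two words can appear equally often and so be drawn at the same size, yet land at opposite ends of the color scale because the films that use them had different outcomes.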
Word Cloud for “genre”.
This word cloud is for the “plot” feature and covers a range of topics found in the movies in the data set. Here, a film about ‘love’ (large and red) is more likely to win Best Picture than a film about ‘life’ (large and blue). Both words appear often (hence their large size compared to other topics), but because ‘love’ is red, films about it are more likely to win.
In terms of our predictions, ‘war’ stands out in the word cloud as a big red word. Since 1917 is a war movie, this suggests it has a good chance of winning Best Picture.
Word Cloud for “plot”.
That’s a Wrap for this Prediction
If you’re planning to watch the Oscars and have a favorite movie that you’d like to win for Best Picture, think about all of the data involved in the planning, filming, and viewing stages. Data science provides insight and support for all sorts of things like ticket sales, release dates, expenses, and more. Here’s to a great (data-driven) awards season!
Interested in learning more about the art of AI storytelling in film? Check out this customer video with StoryFit.