GRAMMY Predictions 2022: Composing DataRobot Blueprints
This blog is meant to be a fun take on predicting Song and Record of the Year for the 64th Annual GRAMMY Awards.
It’s that time of year again. The GRAMMY awards are right around the corner, which means that we have yet another opportunity to leverage DataRobot to make some fun predictions as to who will win Song and Record of the Year. Last year, Billie Eilish went back-to-back, winning Record of the Year for her smash hit “everything i wanted,” and H.E.R. took home Song of the Year for her powerful and all-important ballad “I Can’t Breathe.”
For those who haven’t read the prior blogs (2019, 2020, 2021), the idea behind this task is to leverage various sources of data for each track (betting odds, audio analysis, lyric sentiment) to rank which ones are most likely to win the aforementioned awards. Notably (and something I emphasize with each blog), the Recording Academy’s criteria for choosing winners are far more complex than we’ll ever be able to express in a tabular dataset. This yearly exercise is about demonstrating the diversity of problem domains that machine learning can be applied to rather than correctly predicting the winners, though that has happened in the past. So far, DataRobot has correctly predicted Song of the Year (2019) and Record of the Year (2020), and the winning track has appeared among the three most likely tracks for Song of the Year (2020) and Record of the Year (2021).
This past summer, DataRobot announced the addition of Composable ML to our AI Cloud platform. This enhancement gives data scientists the flexibility to customize DataRobot blueprints using either built-in tasks or custom code. This sort of capability has been one of the most (if not the most) requested features I’ve heard about from data scientists since joining DataRobot, and I count myself among them. Given this new capability, I decided to try it out for this year’s awards ceremony.
First, as I’ve done in previous years, I model each award separately, leveraging DataRobot’s modeling APIs to iterate through hundreds of blueprints. Throughout the process, I perform various tasks such as feature selection (i.e., eliminating the least important factors), hyperparameter tuning (i.e., changing the settings of the blueprint), and trying different training durations (i.e., searching for the optimal number of previous award shows to train on). Finally, I select the DataRobot blueprint that performs the best over the five most recent award shows (i.e., the one that appears on top of the leaderboard).
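As a rough illustration of that last selection step, here is a minimal sketch in plain Python. It does not use the DataRobot client; the blueprint names, the scores, and the helpers `auc`, `mean_backtest_auc`, and `pick_best` are all hypothetical, and averaging AUC per award show is an assumption about how the leaderboard metric might be aggregated. It also uses two toy award shows rather than five to keep the example short.

```python
from statistics import mean


def auc(labels, scores):
    """Rank-based AUC: chance that a winner outscores a non-winner (ties count half)."""
    winners = [s for y, s in zip(labels, scores) if y == 1]
    others = [s for y, s in zip(labels, scores) if y == 0]
    if not winners or not others:
        raise ValueError("need at least one winner and one non-winner")
    wins = sum(
        1.0 if w > o else 0.5 if w == o else 0.0
        for w in winners
        for o in others
    )
    return wins / (len(winners) * len(others))


def mean_backtest_auc(backtests):
    """Average AUC over held-out award shows; each backtest is (labels, scores)."""
    return mean(auc(labels, scores) for labels, scores in backtests)


def pick_best(candidates):
    """Return the candidate name with the highest mean backtest AUC."""
    return max(candidates, key=lambda name: mean_backtest_auc(candidates[name]))


# Made-up predictions for two hypothetical blueprints over two past award shows.
# Each tuple is (1 = winner / 0 = nominee that lost, predicted win probability).
candidates = {
    "blueprint_a": [([1, 0, 0, 0], [0.6, 0.2, 0.3, 0.1]),
                    ([0, 1, 0, 0], [0.5, 0.4, 0.3, 0.2])],
    "blueprint_b": [([1, 0, 0, 0], [0.3, 0.2, 0.4, 0.1]),
                    ([0, 1, 0, 0], [0.5, 0.2, 0.3, 0.4])],
}

best = pick_best(candidates)
print(best, round(mean_backtest_auc(candidates[best]), 4))
```

In practice the leaderboard does this ranking for you; the sketch only shows what "best over the most recent award shows" means as a computation.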
The best blueprint for predicting Record of the Year winners has an AUC value of 0.8276 (for all the curious data scientists out there). In the past when I’ve done this analysis, I’ve found success in manually creating sentiment-related features around the track lyrics (e.g., what percent of the words in the song are considered profane? How many words are related to feelings of joy?). However, with the advent of Composable ML, I now have access to built-in sentiment analysis tasks in DataRobot to preprocess the lyrics for me. Hence, I can now create a modified version of the blueprint that includes this preprocessing step.
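To make the manual approach concrete, here is a minimal sketch of the kind of hand-built lyric features described above (percent of profane words, count of joy-related words). The word lists and the `lyric_features` helper are hypothetical stand-ins, not the lexicons or code used for the actual models, and this is not DataRobot’s built-in sentiment task.

```python
import re

# Tiny, hypothetical word lists for illustration only; a real analysis would
# rely on a proper sentiment or emotion lexicon.
PROFANE_WORDS = {"damn", "hell"}
JOY_WORDS = {"happy", "joy", "smile", "love"}


def lyric_features(lyrics):
    """Turn raw lyrics into a few simple sentiment-style tabular features."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    total = len(words)
    profane = sum(word in PROFANE_WORDS for word in words)
    joy = sum(word in JOY_WORDS for word in words)
    return {
        "word_count": total,
        # Guard against empty lyrics (e.g., instrumental tracks).
        "pct_profane": 100.0 * profane / total if total else 0.0,
        "joy_word_count": joy,
    }


features = lyric_features("Damn, I was so happy. Happy days, love and joy!")
print(features)
```

Each track’s lyrics become one row of numeric columns that can sit alongside the betting odds and audio features in the training data.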
With this newly created blueprint, I’ve actually increased my AUC value to 0.8621 (a relative improvement of about 4%), which means I’m performing 72% better than random guessing (an AUC of 0.5). In light of this, I tried adding the same text preprocessing to the best Song of the Year model; however, it didn’t improve performance. This exemplifies the beauty behind DataRobot’s Composable ML: we have the freedom to experiment and receive quick empirical validation of whether our adjustments work. For Record of the Year, it did work, and for Song of the Year, it didn’t (and that’s okay).
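For readers who want to check the arithmetic, the two percentages can be reproduced from the reported AUC values, assuming the gain is measured relative to the original blueprint and that random guessing corresponds to an AUC of 0.5. The variable names are only for illustration.

```python
baseline_auc = 0.8276   # original Record of the Year blueprint
composed_auc = 0.8621   # blueprint with the added sentiment preprocessing
random_auc = 0.5        # assumed AUC of random guessing

# Relative improvement of the modified blueprint over the original one.
relative_gain = (composed_auc - baseline_auc) / baseline_auc

# How much better the modified blueprint is than random guessing.
lift_over_random = (composed_auc - random_auc) / random_auc

print(f"relative gain: {relative_gain:.1%}")
print(f"lift over random: {lift_over_random:.1%}")
```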
And the tracks to watch out for are…
Song of the Year 2022
|Artist|Track|Win probability|
|---|---|---|
|Billie Eilish|“Happier Than Ever”|50.82%|
|Olivia Rodrigo|“drivers license”|32.86%|
|Silk Sonic|“Leave The Door Open”|16.44%|
Record of the Year 2022
|Artist|Track|Win probability|
|---|---|---|
|Olivia Rodrigo|“drivers license”|38.83%|
|Billie Eilish|“Happier Than Ever”|26.00%|
|Silk Sonic|“Leave The Door Open”|24.74%|
This year, the models rank the same three tracks at the top for both awards, with Billie Eilish favored to take her second Song of the Year award and Olivia Rodrigo her first Record of the Year win. However, it wouldn’t be surprising to see the dynamic duo of Bruno Mars and Anderson .Paak (officially named Silk Sonic) win either award. While she may not have the highest probability of a Record of the Year win, Eilish still has a good shot at it. And, if she does win, she will be the first person to win this award three times in a row.
And there we have it: another year in the books! Will Billie Eilish make history yet again? Tune in on April 3rd to find out. For more information about Composable ML, visit our publicly available documentation.