Taking a Dip in U.S. Shark Tank Data
Shark Tank is a well-known entrepreneurial-themed reality TV show in the United States that has become popular for several reasons: it inspires potential entrepreneurs, teaches lessons about entrepreneurship, and is entertaining.
For the startup founders presenting on Shark Tank, it is a once-in-a-lifetime opportunity to, hopefully, secure a Shark investment in their startups. Likewise, the investing Sharks, who themselves are already multi-millionaires or billionaires, can help these promising startups and make significant returns on their investments. For example, Scrub Daddy has made over $170M in total sales since Lori Greiner’s 2012 investment of $200K for a 20% stake.
To better understand the U.S. Shark Tank phenomenon, we combined data from Sharkalytics (which has data up to Season 7 and some interesting Shark Tank visualizations), ABC, Wikipedia, Halle Tecco’s database, and Google. This U.S. Shark Tank dataset comprises all 803 startup pitches from 199 episodes in Seasons 1 to 9. For each pitch, we know the date, episode, description and ask, basic startup information, entrepreneur gender, and deal.
In this blog post, we predict three things: viewership, whether a startup will get a deal before it pitches in the tank, and which Shark is likely to win a deal.
Shark Tank started losing TV viewers after Season 6 and there were various ideas to address it. We want to estimate the U.S. viewers (millions) for future episodes and review some factors affecting viewership numbers.
- Data: 803 pitches and 22 features (target feature is viewers)
- Metric: R Squared
- Regression Models: 13
- Partitioning: random, 8-fold cross validation (about 100 pitches per fold)
The best model for views prediction is an Average Blender of eXtreme Gradient Boosted Trees Regressor and RuleFit Classifier with a good R Squared of 88%.
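To make the metric and the blending idea concrete, here is a minimal sketch in plain Python. It is not DataRobot's implementation: the function names `r_squared` and `average_blend` are hypothetical, and the viewer numbers are invented for illustration rather than taken from the Shark Tank dataset. It shows how R Squared is computed and how averaging two models' predictions can offset their individual errors.

```python
from statistics import mean


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def average_blend(*prediction_lists):
    """Average the predictions of several models, row by row."""
    return [mean(row) for row in zip(*prediction_lists)]


# Illustrative numbers only (NOT from the Shark Tank dataset):
# viewers in millions for five hypothetical episodes.
actual = [7.0, 6.0, 5.0, 4.0, 3.0]
model_a = [7.5, 6.5, 5.5, 4.5, 3.5]  # hypothetical model, always 0.5 too high
model_b = [6.5, 5.5, 4.5, 3.5, 2.5]  # hypothetical model, always 0.5 too low

blended = average_blend(model_a, model_b)

print("R Squared, model A:", r_squared(actual, model_a))
print("R Squared, model B:", r_squared(actual, model_b))
print("R Squared, blend:  ", r_squared(actual, blended))
```

In this toy case the two models err in opposite directions, so the blend cancels their errors completely. Real models rarely complement each other this neatly, but the same mechanism is why a blender can outperform its components.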
Top 10 features include season, episode, day of week, ask equity, sharks, industry, state, deal valuation, ask amount, and ask valuation.
season: In retrospect, Season 6 (2014 to 2015) is the peak, with TV episodes that have >7M viewers. Viewership in subsequent seasons has continued to drop. The availability of Shark Tank on popular online streaming websites such as Hulu and Netflix could be one reason for the decline. It is not clear if the drop in viewership will continue in Season 10 and beyond, so season data may not be a useful feature for views prediction.
episode: During episodes 14 to 20 (seasons 8 and 9 had a total of 24 episodes each), there is usually higher TV viewership. This could be due to the winter months, when people can spend more time at home watching TV.
day of week: Fridays (represented by 4) on TV prime time tend to have higher viewership, compared to Tuesdays and Sundays (represented by 1 and 6 respectively).
sharks: The size of each Shark's name indicates how often they appear together on Shark Tank, and the color indicates the correlation to viewership numbers. Red is highest viewership, and blue is lowest viewership.
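The day codes above (Tuesday = 1, Friday = 4, Sunday = 6) are consistent with the common Monday = 0 convention, which is also what Python's `date.weekday()` returns. The sketch below assumes that convention was used to derive the feature. The helper name `day_of_week_code` is hypothetical, and the dates are arbitrary calendar dates chosen to land on those weekdays, not actual air dates.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week_code(air_date):
    """Encode a date as Monday=0 ... Sunday=6 (assumed convention)."""
    return air_date.weekday()


# Arbitrary example dates (not actual Shark Tank air dates).
examples = [date(2015, 1, 2), date(2015, 1, 6), date(2015, 1, 4)]

for d in examples:
    code = day_of_week_code(d)
    print(d.isoformat(), code, DAY_NAMES[code])
```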
Here are some interesting insights which can be easily observed using DataRobot:
- The highest viewership was from episodes where Herjavec and Cuban appear together, or when there were some guest Sharks in the earlier seasons, such as Nick Woodman, John Paul DeJoria, and Steve Tisch
- When O’Leary appears together with Cuban or Corcoran, there is typically higher viewership than when Greiner and Corcoran appear together
- The lowest viewership was from guest Sharks (e.g. Alex Rodriguez, Rohan Oza, Sara Blakely, and Bethenny Frankel) in the more recent seasons
It can be nerve-wracking for the startup founders before their investor pitch. We want to do deal prediction to understand: What is the likelihood that a startup will receive a deal before they enter the tank? What are the reasons influencing a deal or no deal?
- Data: 803 pitches and 11 features (target feature is deal)
- Metric: AUC (Area Under the ROC Curve is chosen because it can be explained intuitively to business users)
- Binary Classification Models: 14
- Partitioning: stratified, 8-fold cross validation
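One intuitive reading of AUC is that it is the probability that a randomly chosen pitch that got a deal receives a higher predicted score than a randomly chosen pitch that did not, so 0.5 corresponds to random guessing and 1.0 to a perfect ranking. The sketch below computes AUC directly from that definition. The function name `auc_by_pairs` is hypothetical, and the labels and scores are made up for illustration.

```python
from itertools import product


def auc_by_pairs(labels, scores):
    """AUC as the share of (deal, no-deal) pairs ranked correctly.

    A tie between a deal score and a no-deal score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = deal, 0 = no deal, with hypothetical model scores.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.2]

# 5 of the 6 (deal, no-deal) pairs are ranked correctly.
print(auc_by_pairs(labels, scores))
```

The pairwise loop is quadratic in the number of pitches, which is fine for a dataset of a few hundred rows. Library implementations typically use a rank-based formula instead.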
The best deal prediction model is an Average Blender of Elastic-Net Classifier and eXtreme Gradient Boosted Trees Classifier with AUC of 60%.
Top 10 features include description, state, ask amount, ask valuation, gender, ask equity, industry, episode, sharks, and season.
description: “Quality” and “design” (see Tipsy Elves [pitch]) are some keywords associated with deals, while “services” is a keyword closely associated with no deals. This makes sense because Shark Tank is more about finding tangible products which can become more successful, whereas services-based startup concepts are harder to present and understand. Using these insights, entrepreneurs can tailor their startup’s description to improve their chances of getting a deal.
state: “IL” (see SockTABS [pitch]), “FL”, and “UT” have startups with higher deal probability, compared to “OR” and “MA”.
gender: Female entrepreneurs (see R. Riveter [pitch]) and mixed teams have a slightly higher chance of receiving an investment.
industry: “Media / Entertainment” (see Ten Thirty One Productions [pitch]), “Lifestyle / Home”, and “Food and Beverage” are more popular categories; “Automotive”, “Fashion / Beauty”, and “Others” (such as “Business Services”) are less so.
Each Shark present at a pitch has to decide whether to make an offer alone or team up with other Sharks. If an offer is made, the startup founder can accept, negotiate, or reject it, or choose another Shark’s competing offer. In other words, it can be tough for a Shark to win deals, so we want to do a Shark prediction: Which Shark is likely to win this deal and why?
- Data: 11 features (target feature is deal_by_<shark_name>)
- Metric: AUC
- Binary Classification Models: 14
|Shark Name|Attendance|Wins|Prediction Accuracy (AUC)|
|---|---|---|---|
|Lori Greiner|72%|18%|60% using Average Blender of Elastic-Net and Light Gradient Boosting Classifiers on stratified 5-fold cross validation|
|Barbara Corcoran|58%|16%|64% using Average Blender of Elastic-Net and TensorFlow Classifiers on stratified 4-fold cross validation|
|Robert Herjavec|92%|11%|54% using Average Blender of RandomForest and Generalized Additive2 Classifiers on stratified 7-fold cross validation|
|Kevin O’Leary|96%|8%|55% using Average Blender of Elastic-Net and Light Gradient Boosting Classifiers on stratified 7-fold cross validation|
|Daymond John|68%|14%|53% using Average Blender of SVM and N-Gram Text Classifiers on stratified 5-fold cross validation|
|Mark Cuban|89%|18%|54% using Average Blender of Elastic-Net and TensorFlow Classifiers on stratified 7-fold cross validation|
We dive deeper into Barbara Corcoran’s model because it has the highest AUC of 64%.
Top 10 features for Corcoran include description, ask valuation/equity/amount, gender, season, industry, state, episode, and sharks.
Corcoran typically offers and wins the investment when the ask valuation is less than $2M, the ask equity is 10% or less, and the ask amount is less than $100K. She tends to support female entrepreneurs and food and beverage startups, and typically has more wins towards the season end.
The above are explanations for top predictions from validation data, where Corcoran is most likely to win and not win. For example, ID 139 refers to Wild Friends Food [pitch] where Keeley & Erika asked for $50K in exchange for 10% equity. Corcoran made the only offer and was eventually the investor. Another example is ID 322 which refers to SynDaver Labs [pitch] where the male founder asked for and accepted a $3M offer from Herjavec. Corcoran was the first Shark to go out.
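The Wild Friends Food ask can be checked against Corcoran's typical ranges with simple arithmetic: an ask of $50K for 10% equity implies a valuation of $50K / 10% = $500K. The sketch below is only a restatement of the three numeric thresholds described for Corcoran, not the model itself. The function names are hypothetical, and it assumes ask valuation is derived as ask amount divided by ask equity.

```python
def implied_valuation(ask_amount, ask_equity_pct):
    """Valuation implied by an ask, e.g. $50K for 10% implies $500K.

    Assumes valuation = amount / equity, with equity given in percent.
    """
    return ask_amount * 100 / ask_equity_pct


def fits_corcoran_ranges(ask_amount, ask_equity_pct):
    """Check the three numeric ranges described for Corcoran.

    This is a hand-written rule of thumb, not the fitted model.
    """
    return (
        implied_valuation(ask_amount, ask_equity_pct) < 2_000_000
        and ask_equity_pct <= 10
        and ask_amount < 100_000
    )


# Wild Friends Food: asked for $50K in exchange for 10% equity.
print(implied_valuation(50_000, 10))
print(fits_corcoran_ranges(50_000, 10))
```

Under these assumptions, the Wild Friends Food ask falls inside all three ranges, which is consistent with Corcoran making the only offer.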
Using DataRobot, we built many high-quality models automatically on numeric, categorical, and textual data, and we explored insights from the best models for views, deal, and Shark predictions.
Future work could apply DataRobot to the full pitch transcript (in text) extracted from U.S. Shark Tank video/audio, which captures the actual proceedings (e.g. entrepreneur’s introduction and demonstration, Q&A, and Sharks’ discussions). In addition, this work can probably be extended to “Shark Tanks” from other countries, usually known as Dragons’ Den, such as the Canadian or UK versions.
Clifton is a Customer Facing Data Scientist (CFDS) at DataRobot working in Singapore and leads the Asia Pacific (APAC) CFDS team. His vertical domain expertise is in banking, insurance, and government, and his horizontal domain expertise is in cybersecurity, fraud detection, and public safety. Clifton’s PhD and Bachelor’s degrees are from Clayton School of Information Technology, Monash University, Australia. In his free time, Clifton volunteers professional services to events, conferences, and journals. He was also part of teams which won some analytics competitions.