Taking a Dip in U.S. Shark Tank Data

December 11, 2018


Shark Tank is a well-known entrepreneurial-themed reality TV show in the United States that has become popular for several reasons: it inspires potential entrepreneurs, teaches lessons about entrepreneurship, and is entertaining.

For the startup founders presenting on Shark Tank, it is a once-in-a-lifetime opportunity to, hopefully, secure a Shark investment in their startups. Likewise, the investing Sharks, who themselves are already multi-millionaires or billionaires, can help these promising startups and make significant returns on their investments. For example, Scrub Daddy has over $170M in total sales since Lori Greiner’s 2012 investment of $200K for 20% equity.

To better understand the U.S. Shark Tank phenomenon, we combined data from Sharkalytics (which has data up to Season 7 and some interesting Shark Tank visualizations), ABC, Wikipedia, Halle Tecco’s database, and Google. This U.S. Shark Tank dataset comprises all 803 startup pitches from 199 episodes in Seasons 1 to 9. For each pitch, we know the date, episode, description and ask, basic startup information, entrepreneur gender, and deal.

In this blog, we are going to predict three things: the viewership of an episode, whether a startup will get a deal (predicted before it pitches in the tank), and which Shark is likely to win a deal.

Views prediction

Shark Tank started losing TV viewers after Season 6 and there were various ideas to address it. We want to estimate the U.S. viewers (millions) for future episodes and review some factors affecting viewership numbers.

Experimental design:

Data: 803 pitches and 22 features (target feature is viewers)
Metric: R Squared
Regression Models: 13
Partitioning: random, 8-fold cross validation (each fold with about 100 pitches)
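As a rough illustration of the partitioning step, the sketch below deals 803 row indices into 8 random folds using only the Python standard library. It is not DataRobot's implementation; the function name `random_kfold` and the seed are assumptions made for this example.

```python
import random

def random_kfold(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

folds = random_kfold(n_rows=803, k=8)
fold_sizes = sorted(len(fold) for fold in folds)
print(fold_sizes)  # five folds of 100 pitches and three folds of 101

# Each fold serves once as the validation set; the other seven are used for training.
for i, validation in enumerate(folds):
    training = [row for j, fold in enumerate(folds) if j != i for row in fold]
    assert len(training) + len(validation) == 803
```

Because 803 is not a multiple of 8, the folds cannot all be the same size, but every pitch lands in exactly one validation fold.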

The best model for views prediction is an Average Blender of eXtreme Gradient Boosted Trees Regressor and RuleFit Classifier with a good R Squared of 88%.
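To make the two ideas in that sentence concrete, here is a minimal sketch of an average blender (the row-by-row mean of several models' predictions) and of R Squared, where 88% corresponds to 0.88. The viewer counts and the two sets of predictions are made up for illustration; they are not values from the dataset or from the models named above.

```python
def average_blend(*model_predictions):
    """Average the predictions of several models, row by row."""
    return [sum(row) / len(row) for row in zip(*model_predictions)]

def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up viewer counts (millions) for four episodes, for illustration only.
actual = [5.0, 6.0, 7.0, 8.0]
model_a = [5.5, 6.5, 6.5, 7.5]  # hypothetical predictions of a first model
model_b = [4.5, 6.5, 7.5, 8.5]  # hypothetical predictions of a second model

blended = average_blend(model_a, model_b)
print(round(r_squared(actual, model_a), 2))  # 0.8
print(round(r_squared(actual, model_b), 2))  # 0.8
print(round(r_squared(actual, blended), 2))  # 0.95
```

In this toy case the blend scores higher than either model alone because the two models' errors partly cancel, which is the usual motivation for blending.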

Top 10 features include season, episode, day of week, ask equity, sharks, industry, state, deal valuation, ask amount, and ask valuation.

season: In retrospect, Season 6 (2014 to 2015) is the peak with TV episodes that have >7M viewers. Viewership in subsequent seasons has continued to drop. The availability of Shark Tank on popular online streaming websites such as Hulu and Netflix could be one reason for the decline. It is not clear if the drop in viewership will continue in Season 10 and beyond, so season data may not be a useful feature for views prediction.

episode: During episodes 14 to 20 (seasons 8 and 9 had a total of 24 episodes each), there is usually higher TV viewership. This could be because people spend more time at home watching TV during the winter months.

day of week: Friday prime-time episodes (Friday is represented by 4) tend to have higher viewership than episodes on Tuesdays and Sundays (represented by 1 and 6, respectively).
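The codes quoted for Tuesday (1), Friday (4), and Sunday (6) match the Monday = 0 convention used by Python's `date.weekday()`. Assuming the dataset follows that convention, the feature could be derived from an air date as sketched below. The dates are arbitrary calendar dates chosen to hit each weekday, not air dates taken from the dataset.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def day_of_week_code(air_date):
    """Return 0 for Monday through 6 for Sunday."""
    return air_date.weekday()

# Arbitrary example dates (assumption: not actual Shark Tank air dates).
for example in (date(2018, 12, 11), date(2018, 12, 14), date(2018, 12, 16)):
    code = day_of_week_code(example)
    print(example.isoformat(), code, DAY_NAMES[code])
```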

sharks: The size of the Shark names indicates how often they appear together on Shark Tank, and the color indicates the correlation with viewership numbers. Red is the highest viewership, and blue is the lowest.

Here are some interesting insights which can be easily observed using DataRobot:

The highest viewership was from episodes where Herjavec and Cuban appear together, or when there were some guest Sharks in the earlier seasons, such as Nick Woodman, John Paul DeJoria, and Steve Tisch.
When O’Leary appears together with Cuban or Corcoran, there is typically higher viewership than when Greiner and Corcoran appear together.
The lowest viewership was from episodes with guest Sharks (e.g. Alex Rodriguez, Rohan Oza, Sara Blakely, and Bethenny Frankel) in the more recent seasons.

Deal prediction

It can be nerve-wracking for the startup founders before their investor pitch. We want to do deal prediction to understand: What is the likelihood that a startup will receive a deal before they enter the tank? What are the reasons influencing a deal or no deal?

Experimental design:

Data: 803 pitches and 11 features (target feature is deal)
Metric: AUC (Area Under the ROC Curve is chosen because it can be explained intuitively to business users)
Binary Classification Models: 14
Partitioning: stratified, 8-fold cross validation
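The intuitive reading of AUC mentioned above is that it equals the probability that a randomly chosen pitch that got a deal receives a higher predicted score than a randomly chosen pitch that did not. The sketch below computes AUC directly from that definition. The labels and scores are made up for illustration, and the function name `auc` is an assumption of this example.

```python
def auc(labels, scores):
    """Share of (deal, no-deal) pairs in which the deal pitch has the higher score.

    Tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up outcomes (1 = deal, 0 = no deal) and predicted deal probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.4, 0.3, 0.5]
print(round(auc(labels, scores), 3))  # 0.667: 6 of the 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is a helpful frame when reading the AUC values reported in this post.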

The best deal prediction model is an Average Blender of Elastic-Net Classifier and eXtreme Gradient Boosted Trees Classifier with AUC of 60%.

Top 10 features include description, state, ask amount, ask valuation, gender, ask equity, industry, episode, sharks, and season.

description: “Quality” and “design” (see Tipsy Elves [pitch]) are some keywords associated with deals. “Services” is a keyword closely associated with no deals. This makes sense because Shark Tank is more about finding tangible products that can become more successful, whereas services-based startup concepts are harder to present and understand. Using these insights, entrepreneurs can customize their startup’s description to improve their chances of a deal.
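One simple way to look for keyword associations like these is to compute, for each word, the share of pitches containing it that ended in a deal. The sketch below does this with the standard library. It is a deliberately simplified stand-in, not the text model used in the blended classifier, and the descriptions and outcomes are made up rather than taken from the dataset.

```python
from collections import defaultdict

def keyword_deal_rates(pitches, min_count=2):
    """Map each word to the share of pitches containing it that ended in a deal."""
    counts = defaultdict(lambda: [0, 0])  # word -> [pitches with word, deals among them]
    for description, deal in pitches:
        for word in set(description.lower().split()):
            counts[word][0] += 1
            counts[word][1] += int(deal)
    # Ignore words seen in fewer than min_count pitches; their rates are too noisy.
    return {w: deals / n for w, (n, deals) in counts.items() if n >= min_count}

# Made-up (description, deal) pairs for illustration only.
toy_pitches = [
    ("quality holiday sweaters", True),
    ("quality product design", True),
    ("simple sock design", True),
    ("tutoring services", False),
    ("cleaning services", False),
]
rates = keyword_deal_rates(toy_pitches)
for word, rate in sorted(rates.items()):
    print(word, rate)
```

With only a few hundred pitches, raw rates like these are noisy, which is one reason a regularized model is a more robust choice for real text features.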

state: “IL” (see SockTABS [pitch]), “FL”, and “UT” have startups with higher deal probability, compared to “OR” and “MA”.

gender: Female entrepreneurs (see R. Riveter [pitch]) and mixed teams have a slightly higher chance of receiving an investment.

industry: “Media / Entertainment” (see Ten Thirty One Productions [pitch]), “Lifestyle / Home”, “Food and Beverage” are more popular categories; “Automotive”, “Fashion / Beauty”, “Others” (such as “Business Services”) are less so.

Shark prediction

Each Shark who is present at a pitch has to decide whether to make an offer alone or team up with other Sharks. If an offer is made, the startup founder can accept, negotiate, or reject it, or choose another Shark’s competing offer. In other words, it can be tough for a Shark to win deals, so we want to do a Shark prediction: Which Shark is likely to win this deal and why?

Experimental design:

Data: 11 features (target feature is deal_by_<shark_name>)
Metric: AUC
Binary Classification Models: 14

Shark Name	Attendance	Wins	AUC and best model
Lori Greiner	72%	18%	60% using Average Blender of Elastic-Net and Light Gradient Boosting Classifiers on stratified 5-fold cross validation
Barbara Corcoran	58%	16%	64% using Average Blender of Elastic-Net and TensorFlow Classifiers on stratified 4-fold cross validation
Robert Herjavec	92%	11%	54% using Average Blender of RandomForest and Generalized Additive2 Classifiers on stratified 7-fold cross validation
Kevin O’Leary	96%	8%	55% using Average Blender of Elastic-Net and Light Gradient Boosting Classifiers on stratified 7-fold cross validation
Daymond John	68%	14%	53% using Average Blender of SVM and N-Gram Text Classifiers on stratified 5-fold cross validation
Mark Cuban	89%	18%	54% using Average Blender of Elastic-Net and TensorFlow Classifiers on stratified 7-fold cross validation
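Every model in the table uses stratified cross validation, which keeps the share of wins roughly equal across folds. This matters when the positive class is small, as it is for each Shark. The sketch below shows one simple way to stratify, by dealing the rows of each class round-robin across the folds. It is an illustration under assumed counts (100 pitches, 16 wins), not DataRobot's partitioning code.

```python
import random

def stratified_kfold(labels, k, seed=0):
    """Assign each row a fold number so every fold keeps a similar share of each class."""
    rng = random.Random(seed)
    fold_of_row = [None] * len(labels)
    for label in sorted(set(labels)):
        rows = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(rows)
        # Deal the rows of this class round-robin across the folds.
        for position, row in enumerate(rows):
            fold_of_row[row] = position % k
    return fold_of_row

# Hypothetical target: 100 pitches, 16 of them won by the Shark (made-up counts).
labels = [1] * 16 + [0] * 84
folds = stratified_kfold(labels, k=4)

for fold in range(4):
    in_fold = [y for y, f in zip(labels, folds) if f == fold]
    print(fold, len(in_fold), sum(in_fold))  # each fold: 25 pitches, 4 wins
```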

We dive deeper into Barbara Corcoran’s model because it has the highest AUC of 64%.

Top 10 features for Corcoran include description, ask valuation/equity/amount, gender, season, industry, state, episode, and sharks.

Corcoran typically offers and wins the investment when the ask valuation is less than $2M, the ask equity is 10% or less, and the ask amount is less than $100K. She tends to support female entrepreneurs and food and beverage startups, and typically has more wins toward the end of a season.

The above are explanations for top predictions from validation data, where Corcoran is most likely to win and not win. For example, ID 139 refers to Wild Friends Food [pitch] where Keeley & Erika asked for $50K in exchange for 10% equity. Corcoran made the only offer and was eventually the investor. Another example is ID 322, which refers to SynDaver Labs [pitch] where the male founder asked for and accepted a $3M offer from Herjavec. Corcoran was the first Shark to go out.
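The thresholds quoted for Corcoran and the Wild Friends Food example can be checked with simple arithmetic. The sketch below assumes that ask valuation is the usual implied valuation (ask amount divided by the equity fraction offered); the post does not state how the dataset computes it. The function names are hypothetical, and the rule is only a rule of thumb read off the text, not the model itself.

```python
def implied_valuation(ask_amount, ask_equity_pct):
    """Valuation implied by asking ask_amount dollars for ask_equity_pct percent of the company."""
    return ask_amount * 100 / ask_equity_pct

def fits_corcoran_profile(ask_amount, ask_equity_pct):
    """Check the three thresholds quoted in the text (a rule of thumb, not the model)."""
    return (
        implied_valuation(ask_amount, ask_equity_pct) < 2_000_000
        and ask_equity_pct <= 10
        and ask_amount < 100_000
    )

# Wild Friends Food asked for $50K in exchange for 10% equity.
print(implied_valuation(50_000, 10))      # 500000.0
print(fits_corcoran_profile(50_000, 10))  # True
```

A $3M ask fails the ask-amount threshold whatever equity is offered, which is consistent with Corcoran going out early in the second example.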

Conclusion

Using DataRobot, we automatically built many models on numeric, categorical, and textual data, and we explored insights from the best models for views, deal, and Shark predictions.

There can be future work to use DataRobot on the full pitch transcript (in text) extracted from U.S. Shark Tank video/audio, which captures the actual proceedings (e.g. entrepreneur’s introduction and demonstration, Q&A, and Sharks’ discussions). In addition, this work can probably be extended to “Shark Tanks” from other countries, usually known as Dragons’ Den, such as the Canadian or UK versions.

About the author

Clifton Phua

Customer Facing Data Scientist, DataRobot

Clifton is a Customer Facing Data Scientist (CFDS) at DataRobot working in Singapore and leads the Asia Pacific (APAC) CFDS team. His vertical domain expertise is in banking, insurance, and government, and his horizontal domain expertise is in cybersecurity, fraud detection, and public safety. Clifton’s PhD and Bachelor’s degrees are from Clayton School of Information Technology, Monash University, Australia. In his free time, Clifton volunteers professional services to events, conferences, and journals. He was also part of teams that won analytics competitions.
