SIGNALS PREDICTING ABOVE AVERAGE
ACQUISITION PRICES OF STARTUPS
MIT Sloan School of Management
Michelle Villagra and Victoria Young
VC Investments Top $48B in 2014
Objective: Find signals that can predict the likelihood of a startup being
successfully acquired above the average acquisition price of $43M.
Michelle Villagra and Victoria Young | MIT Sloan
Q: What are predictive signals in startup data?
Annual venture capital investments have reached their highest
level in over a decade. What signals can predict the likelihood
of a startup being successfully acquired at an above-average
acquisition price?
Project Scope | Available Data
Startup Info:
  Name
  Acquiring Company
  Number of Employees (at the startup)
Time:
  Founding date
  Funding date
  Created Variables: Years until acquisition, Acquired after 2000, Number of years in business
Financial:
  Acquisition amount
  Total funding raised
  Short term assets (cash)
  Created Variables: Funding to cash/price
Available Crunchbase data on companies that were acquired before July 25th, 2013.
Hypothesis: Amount of $ Raised Is Key Signal
Since 2007, the average successful US startup has raised $41 million and exited at
$242.9 million, with larger exits strongly correlated with companies
that raised more money.
AboveAverageMedianAcquireFactor ~ raised_amount + founded_year +
total_money + acquired_year + YearstoAcquire + Years2Funding +
YearsInBiz + funded_year + AcquiredAfter2000 + Funding2Cash
Methodology | Approach
Data Extraction: We pulled data from the Enigma Database as .csv files.
Data Organization
(A) Formatting: Cleaned up money amounts that had symbols and letters mixed
in with numbers (e.g. $10M USD to 10,000,000)
(B) Cleaning: Removed N/As and duplicates to avoid skewing results
(C) Integration: Merged 3 separate .csv files
(D) Feature Expansion: Created new variables based on existing data (e.g. “Time
In Business” = 2013 - Founding Year)
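Steps (A) and (D) could be sketched in Python roughly as follows. This is a minimal illustration, not the authors' code: the names `parse_money`, `years_in_business`, and `SUFFIXES` are hypothetical, and the set of suffixes handled is an assumption. Only the "$10M USD" example and the 2013 reference year come from the slide.

```python
import re

# Assumed multipliers for letter suffixes in the raw amounts (hypothetical).
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def parse_money(raw):
    """Convert a string such as '$10M USD' to an integer amount.

    Returns None when no number can be found (for example 'N/A').
    """
    if raw is None:
        return None
    text = raw.strip().upper().replace(",", "")
    match = re.search(r"(\d+(?:\.\d+)?)\s*([KMB])?", text)
    if match is None:
        return None
    value = float(match.group(1))
    suffix = match.group(2)
    if suffix:
        value *= SUFFIXES[suffix]
    return int(round(value))

def years_in_business(founding_year, reference_year=2013):
    """Created variable from step (D): reference year minus founding year."""
    return reference_year - founding_year

amount = parse_money("$10M USD")   # 10000000
age = years_in_business(2005)      # 8
```

Returning None for unparseable values keeps the N/A rows identifiable, so they can be dropped explicitly in the cleaning step rather than silently becoming zeroes.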
Analysis | Strategy
● Analysis: Logistic Regression, CART, Random Forest, Clustering
● Baseline: Average median acquisition price since 1996 is $43M*;
71.4% of startups in the data were acquired above this amount.
● Dependent variable: Binary, Above Average Acquisition (>$43M)
● Independent variables: Founding year, amount of funds raised,
total money, acquired year, number of years in business, number of
years from founding to acquisition, number of years until funded,
funded year, number of years acquired after 2000, ratio of funds
raised to acquisition amount.
*WilmerHale VC Report
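The baseline and the dependent variable can be illustrated with a short Python sketch. The function names and the sample prices are hypothetical: the seven prices are made up so that 5 of 7 exceed the threshold, which happens to reproduce the 71.4% share. They are not taken from the dataset.

```python
THRESHOLD = 43_000_000  # $43M benchmark quoted on the slide

def label_above_average(prices, threshold=THRESHOLD):
    """Binary dependent variable: 1 if the acquisition price exceeds the threshold."""
    return [1 if price > threshold else 0 for price in prices]

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)

# Hypothetical acquisition prices in USD, for illustration only.
sample_prices = [120e6, 60e6, 45e6, 300e6, 50e6, 20e6, 10e6]
labels = label_above_average(sample_prices)
baseline = majority_baseline_accuracy(labels)
print(labels, round(baseline, 3))
```

Any model is then judged by whether its accuracy exceeds this "always predict the majority class" figure.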
Challenges | Analysis
Accurate Data: More than 30,000 observations (spread across 3 separate csv files)
were available, but many had N/As or zeroes, so we could not tell whether
a given data point was accurate. Also, when merging on the startup name,
many startups appeared in only one csv file, so we lost many observations
during the merge.
Consistent Data: The data was inconsistent across the variables we wanted to
include in our model. For example, money was represented in different units,
and many values were N/A. We also had to remove observations because of
formatting issues or because the companies were non-US.
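The loss of observations when merging on the startup name can be seen in a small Python sketch. Everything here is an assumption made for illustration: the helper names, the column names, and the three tiny tables are hypothetical, and the name normalization is one possible way to reduce mismatches rather than something the slides describe.

```python
def normalize_name(name):
    """Lowercase and collapse whitespace so 'ACME ' and 'acme' match."""
    return " ".join(name.lower().split())

def inner_merge(*tables):
    """Merge lists of dict rows on 'name', keeping only names found in every table."""
    # Index each table by normalized name; a repeated name keeps its last row.
    indexed = [{normalize_name(row["name"]): row for row in table} for table in tables]
    shared = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for key in sorted(shared):
        combined = {}
        for index in indexed:
            combined.update(index[key])
        combined["name"] = key
        merged.append(combined)
    return merged

# Hypothetical rows standing in for the three csv files.
companies = [
    {"name": "Acme", "founded_year": 2005},
    {"name": "Beta Labs", "founded_year": 2008},
    {"name": "Gamma", "founded_year": 2010},
]
funding = [
    {"name": "acme ", "raised_amount": 12_000_000},
    {"name": "Beta Labs", "raised_amount": 5_000_000},
]
acquisitions = [
    {"name": "ACME", "price": 60_000_000},
    {"name": "Gamma", "price": 20_000_000},
]

merged = inner_merge(companies, funding, acquisitions)
lost = len(companies) - len(merged)
print(len(merged), lost)
```

Only startups present in all three tables survive an inner merge, which is why two of the three hypothetical companies drop out here.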
Models | Accuracy
Model            Baseline   Data
Log              .714       .728
CART             .714       .877
Random Forest    .714       .755
Clusters         N/A        N/A
We ran a logistic regression, CART, Random Forest, and clustering in order to look for
the best model for identifying predictive signals. Because our number of observations
became very limited after we merged different data sets, the accuracy for our random
forest was lower than expected. Ultimately our CART performed best.
Models | Results
Based on our dataset, the CART model had the best overall accuracy. The tree it
produced lets us prioritize the factors that matter most as predictive signals of a
startup’s success. It sets a benchmark of $12M in funding raised as predictive of an
above-average acquisition amount, with 87.7% accuracy.
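To show how a CART-style tree arrives at a funding benchmark, here is a minimal Python sketch of a single split chosen by Gini impurity. It is not the authors' tree: the function names are hypothetical, and the eight data points are synthetic, picked so that the best split lands at 12 (in $M) to mirror the benchmark above.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Synthetic data: funding raised in $M and whether the exit was above average.
raised_musd = [2, 5, 8, 11, 13, 20, 35, 50]
above_avg = [0, 0, 1, 0, 1, 1, 1, 1]

threshold, score = best_split(raised_musd, above_avg)
print(threshold, score)
```

A full CART model repeats this search over every variable and then recurses on each side of the chosen split; the sketch covers only the first step for one variable.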
Models | Results
Cluster 1: Relatively young companies,
funded around 2009, that raised ~$5M
and were acquired within 8 years of founding.
Cluster 2: Younger companies, funded
around 2009, that raised ~$7M and were
acquired within 7 years of founding.
Cluster 3: Older companies, funded
around 2008, that raised ~$13.5M, were
acquired within 12 years of founding, and
held $63M in short term assets.
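The slides do not say which clustering method produced these groups. As one possibility, the Python sketch below runs plain k-means on two features. The algorithm choice, the function name `kmeans`, the starting centroids, and the nine points are all assumptions for illustration; the points are synthetic and are not the clusters reported above.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: math.dist(point, centroids[c]),
            )
            clusters[nearest].append(point)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Synthetic (funding raised in $M, years to acquisition) pairs.
points = [
    (2.0, 3.0), (3.0, 4.0), (2.5, 3.5),
    (8.0, 7.0), (9.0, 8.0), (8.5, 7.5),
    (14.0, 12.0), (13.0, 13.0), (13.5, 12.5),
]
start = [(2.0, 3.0), (8.0, 7.0), (14.0, 12.0)]

centroids, clusters = kmeans(points, start)
print(centroids)
```

With real data the variables sit on very different scales (years versus dollars), so they would normally be standardized before computing distances.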
Data | Visualizations
Moving Forward | Next Steps
Model Testing & Optimization: Now that we have reached some baseline
metrics and an improvement in accuracy over the baseline, we need to
keep testing and optimizing the model, gaining access to more valid
observations and incorporating new variables as they become available
to test for significance.
Analysis To Action: To make the analysis actionable, we would
need to conduct additional research by expanding the number of
observations and adding variables to test for significance, including
revenue, number of customers, App Annie download data, key investors,
and year-over-year growth.
THANK YOU!
MIT Sloan School of Management
Michelle Villagra and Victoria Young

Crunchbase Signals Predicting Above Average Acquisition Price of Startups
