 
Exploratory Data Analysis for Machine Learning on:
Worldwide Startup Companies
 
Fitrie Ratnasari 
28th November, 2020 
 
 
 
   
 
 
 
Dataset Description & Initial Planning 
 
 
The dataset comes from Crunchbase in CSV format and contains worldwide startup companies
recorded from 1902 until 2014. It consists of 54,294 entries (rows) with 38 attributes of
data types object and float, as shown in the picture below:
 
 
 
The objective of this project is to uncover the insights hidden in the dataset across the years
from 1902 until 2014, including when each company was founded, its status, corporate actions
such as Merger & Acquisition (M&A) or becoming a public company, and its funding status, that is,
whether it has received seed funding, angel funding or venture capital investment.
For further analysis we would like to estimate how likely companies are to succeed, using their
status, market, and funding received as a basis, by examining the correlation amongst attributes.
Various studies define a successful startup as one that makes a large amount of money for its
founders, investors and first employees in one of two ways. The company can either have an IPO
(Initial Public Offering) by going to a public stock market (e.g. Facebook going public, allowing
everyone to invest in the company by buying shares sold by its insiders on the U.S. stock
market), or be acquired by or merged with another company (M&A; e.g. Microsoft acquiring
LinkedIn for $26B), where those who have previously invested receive immediate cash in return
for their shares. This process is often called an exit strategy (Guo, Lou, & Pérez-Castrillo,
2015). This project will therefore consider both an IPO and an M&A (Mergers & Acquisitions)
process as the critical events that classify a startup as successful.
The initial plan, before any further data exploration, is to inspect each attribute's data type
and completeness to check whether they are appropriate. From that we can determine what kind of
data cleansing should be done.
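The initial inspection described above could look like the following minimal sketch. It does not read the real Crunchbase file: `raw_csv` and its three rows are hypothetical stand-ins that imitate the padded header names and string-formatted funding column mentioned later in this report.

```python
import io
import pandas as pd

# Hypothetical three-row sample imitating the raw export (the real file has 54,294 rows).
raw_csv = io.StringIO(
    "name, market , funding_total_usd ,status,founded_at\n"
    'Acme, Software ,"1,750,000",operating,2012-06-01\n'
    "Beta, Games , - ,acquired,\n"
    "Gamma,,,closed,2010-01-15\n"
)
df = pd.read_csv(raw_csv)

print(df.shape)         # number of rows and columns
print(df.dtypes)        # data type of every attribute
print(df.isna().sum())  # count of missing values per attribute
```

Printing the column list (`df.columns.tolist()`) is also a quick way to spot header names that carry stray spaces.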
 
 
 
Data Cleansing & Feature Engineering 
After acquiring the dataset we found that numerous data cleansing tasks are needed before any
further analysis: the dataset has messy formatting and header labels, many missing values, and
widely dispersed values that introduce outliers.
The data wrangling steps taken in this project are:
1. Fixing the spacing in header names such as ‘ market ‘ and ‘ funding_total_usd ‘.
2. Removing 4,855 duplicate rows.
3. Tackling uncommon formats.
The attribute ‘funding_total_usd’ is stored as a string with commas used as number
separators, so we remove the commas and change the data type to numeric.
4. Handling missing values.
Replace missing (NaN) values, such as those in ‘funding_total_usd’, with 0.
5. Detecting and handling outliers.
When plotting distributions, outliers make the visualization uninterpretable, so we remove
them using the interquartile range. Note that this step is used for Exploratory Data
Analysis only, not for Machine Learning (in ML we will transform the data instead, whether
by regression, polynomial regression or a log transformation).
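The five cleansing steps above could be sketched with pandas as follows. The sample rows are hypothetical, and the 1.5 × IQR fences are the conventional choice rather than a value stated in this report.

```python
import pandas as pd

# Hypothetical mini-sample with the same defects as the raw data.
df = pd.DataFrame({
    "name": ["A", "A", "B", "C", "D", "E", "F"],
    " market ": [" Software ", " Software ", " Games ", None,
                 " Mobile ", " Games ", " Software "],
    " funding_total_usd ": ["1,000,000", "1,000,000", "2,000,000", None,
                            "3,000,000", "4,000,000", "900,000,000"],
})

# 1. Strip stray spaces from the header names.
df.columns = df.columns.str.strip()

# 2. Drop exact duplicate rows.
df = df.drop_duplicates()

# 3. Remove the comma separators and convert the column to numeric.
df["funding_total_usd"] = pd.to_numeric(
    df["funding_total_usd"].str.replace(",", "", regex=False), errors="coerce"
)

# 4. Replace missing funding values with 0.
df["funding_total_usd"] = df["funding_total_usd"].fillna(0)

# 5. Keep only rows inside the 1.5 * IQR fences (assumed multiplier; for plots only).
q1, q3 = df["funding_total_usd"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["funding_total_usd"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df_eda = df[in_range]
```

Keeping the filtered rows in a separate `df_eda` frame leaves the full data untouched for the later Machine Learning stage.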
 
Feature Engineering also brings advantages, such as converting object data types into numeric
ones by One-Hot Encoding, and it can be used to transform attributes that contain outliers
(removing those rows altogether could reduce our training accuracy later in the Machine
Learning process). Hence for this dataset we use:
1. One-Hot Encoding for the attribute ‘status’.
2. New variables ‘get_seed_funding’, ‘get_angel_fund’ and ‘get_venture’, and most
importantly ‘successfull_code’, each coded 1 for ‘Yes’ and 0 for ‘No’.
3. A new attribute ‘founded_year’ in place of ‘founded_at’. Because the year in ‘founded_at’
is inconsistent with the year in ‘founded_month’, we extract the year from ‘founded_month’
into the new attribute ‘founded_year’ and subsequently drop the ‘founded_at’ column.
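A minimal sketch of these feature engineering steps is shown below. The sample rows, the amount columns `seed`, `angel` and `venture`, and the `YYYY-MM` layout of ‘founded_month’ are assumptions made for illustration. ‘successfull_code’ is approximated here by status ‘acquired’, which may differ from the definition used in the notebook.

```python
import pandas as pd

# Hypothetical sample; column names and formats are assumptions about the raw file.
df = pd.DataFrame({
    "status": ["operating", "acquired", "closed", "operating"],
    "seed": [50_000.0, 0.0, 0.0, 200_000.0],
    "angel": [0.0, 100_000.0, 0.0, 0.0],
    "venture": [0.0, 5_000_000.0, 0.0, 1_000_000.0],
    "founded_at": ["2012-06-01", "2007-01-01", "2010-03-09", "2011-11-30"],
    "founded_month": ["2012-06", "2008-01", "2010-03", "2011-11"],
})

# 1. One-hot encode 'status' and keep the original column alongside the dummies.
status_dummies = pd.get_dummies(df["status"], prefix="status").astype(int)
df = pd.concat([df, status_dummies], axis=1)

# 2. Binary flags: 1 if any money of that type was received, otherwise 0.
df["get_seed_funding"] = (df["seed"] > 0).astype(int)
df["get_angel_fund"] = (df["angel"] > 0).astype(int)
df["get_venture"] = (df["venture"] > 0).astype(int)
# Assumption: success is approximated by the 'acquired' status only.
df["successfull_code"] = (df["status"] == "acquired").astype(int)

# 3. Take the year from 'founded_month', then drop 'founded_at'.
df["founded_year"] = df["founded_month"].str[:4].astype(int)
df = df.drop(columns=["founded_at"])
```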
 
 
Key Finding and Insights 
Startups are known for innovating in the gap between a problem and its solution, and for being
growth-seeking businesses. The nature of the business therefore requires heavy funding, and it
is common to look for capital from a variety of sources such as angel investors and venture
capital.
This section has three sub-sections: startups, market, and funding.
 
A. START UP 
 
1. Top 5 Countries by Number of Startups:
 
The USA dominates in number of startups, with more than 50% of all startups worldwide. This
is consistent with the immense support ecosystem the US offers startups, from ideation to
scaling up the business. It is followed by England with 2,642 startup companies, Canada with
1,405 companies, China with 1,239 companies and Germany with 968 companies, as of 2014.
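Counts and shares like the ones above can be obtained with `value_counts`. The sketch below uses a hypothetical ten-row sample, and the column name `country_code` is an assumption.

```python
import pandas as pd

# Hypothetical sample of country codes, one entry per startup.
countries = pd.Series(
    ["USA"] * 6 + ["GBR"] * 2 + ["CAN", "CHN"], name="country_code"
)

top = countries.value_counts().head(5)          # absolute counts, largest first
share = countries.value_counts(normalize=True)  # fraction of all startups

print(top)
print(f"USA share: {share['USA']:.0%}")
```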
 
 
2. Start Up Status  
 
Of the 49,437 startup companies recorded from 1902 until 2014, 5.4% are closed, 86.9% are
operating and 7.7% have been acquired. Being acquired is an exit strategy and so counts as one
marker of a successful startup.
 
3. Start Up Founded Year Distribution 
 
From the figure above, the mid-1990s mark the start of worldwide startup growth: around 437
companies were recorded in 1995, and the number almost doubled in the following years, reaching
731 startup companies in 2001.
 
 
This period is known as the ‘Dotcom Bubble’, when technology companies drew the market into
over-valuation. In 1999, the height of the dotcom craze, there were 457 IPOs. Most were
Internet and technology stocks. Of those, 117 doubled in price on the first day of trading.
Tech and dotcom IPOs were minting new millionaires every day, both at the management level
and the retail investor level. But then the sell-off started on March 11, 2000. Investors
suddenly realized that a tech or Internet company with a billion-dollar valuation that has no
revenue or earnings and is saddled with debt has no future.
 
4. Those Who Survived the Dotcom Bubble and Became Tech Titans
 
After the Dotcom Bubble, companies with strong business revenues survived, namely Amazon,
Netflix, eBay, Google and Alibaba. Some of them are still leading tech companies today. The
FANG+ companies (Facebook, Amazon, Netflix, Google, Alibaba) are the tech titans that have
outperformed the wider market since the coronavirus (COVID-19) pandemic spurred record
sell-offs in March 2020. While other stocks hit their lows during this time, FANG+ companies
have risen by as much as 80% from their early-March lows, driven by their performance and
forward-looking valuations.
 
 
 
 
B. MARKET 
 
1. Top 15 StartUp Market Worldwide 
 
 
 
 
 
As the number of startups grows, they come to touch almost every sector that serves people as
a market. The most common category among startups worldwide is Software, followed by
Biotechnology, Mobile, E-Commerce, Curated Web, Enterprise Software, Health Care, Clean
Technology, Games, and embedded systems of hardware and software.
The figure below shows that E-Commerce is the most popular category among startup companies
in China, Indonesia and India, which differs slightly from the United States.
 
 
2. Most Favorable Category of Startup Product 
 
For almost three decades until 2014, Social Media, Curated Web and Mobile were the most
popular startup product categories worldwide. In the word cloud, the less popular a category
is, the smaller its word is drawn.
   
 
 
C. FUNDING 
 
1. Total Funding Distribution  
 
Total funding is the sum of all funding obtained, from seed, grant, angel investors, and
venture capital in all rounds. The picture above shows that the dispersion of total funding
across startups is very high, so we remove the outliers to understand the distribution
better, as can be seen below. The data show that most total funding amounts are below
USD 2.5 million, and that the top 10% of startup companies received 78% of all funding
across the globe.
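One way to compute a concentration figure like the top-10% share is sketched below. The ten funding totals are hypothetical, and defining the top decile as "above the 90th percentile" is an assumption about how the share was calculated.

```python
import pandas as pd

# Hypothetical total funding in USD for ten startups.
funding = pd.Series([
    0, 0, 100_000, 250_000, 500_000,
    750_000, 1_000_000, 2_000_000, 5_000_000, 90_000_000,
])

threshold = funding.quantile(0.90)                 # 90th percentile of total funding
top_decile = funding[funding > threshold]          # startups above that threshold
top_share = top_decile.sum() / funding.sum()       # their share of all funding
below_2_5m = (funding < 2_500_000).mean()          # fraction below USD 2.5 million

print(f"Top-decile share of funding: {top_share:.1%}")
print(f"Startups below USD 2.5M: {below_2_5m:.0%}")
```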
 
 
 
 
 
 
2. Total Funding of Various Unicorns in 2014
Unicorns were not as numerous in 2014 as they are today, but a few of them are still tech
titans today. Facebook, one of the unicorns in 2014, obtained the highest funding, almost
USD 2.5 billion, compared to Alibaba, Twitter, Cloudera and Uber.
 
 
3. Seed Funding vs. Angel Funding vs. Venture Investment
Seed money, sometimes called seed funding or seed capital, is a form of securities offering
in which an investor invests capital in a startup company in exchange for an equity stake or
convertible note stake in the company. The term seed suggests that this is a very early
investment, meant to support the business until it can generate cash of its own, or until it
is ready for further investments. Seed money options include friends and family funding, seed
venture capital funds, angel funding, and crowdfunding.
In this dataset, the difference between seed funding and angel investment is that seed
funding comes from institutional seed venture capital funds, whereas angel investment comes
from informal or private investors, called angel investors, who invest based on their
personal preference.
Venture capital, in turn, is a form of private equity and a type of financing that investors
provide to startup companies and small businesses that are believed to have long-term growth
potential. Venture capital generally comes from well-off investors, investment banks and
other financial institutions. However, it does not always take a monetary form; it can also
be provided in the form of technical or managerial expertise. In the dataset, the column
‘venture’ is the total investment amount from round A through round H.
The question is, how many startups receive seed funding, angel investment and venture
investment? We can see the difference in the three pie charts below.
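The share of startups receiving each kind of funding can be computed as the fraction of rows with a non-zero amount. The sketch below uses five hypothetical rows; the column names `seed`, `angel` and `venture` and the "amount greater than zero" rule are assumptions.

```python
import pandas as pd

# Hypothetical funding amounts in USD for five startups.
df = pd.DataFrame({
    "seed":    [50_000, 0, 0, 200_000, 0],
    "angel":   [0, 100_000, 0, 0, 0],
    "venture": [0, 5_000_000, 0, 1_000_000, 2_000_000],
})

# Fraction of startups with a non-zero amount in each column.
received_share = (df[["seed", "angel", "venture"]] > 0).mean()
print(received_share)
```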
 
 
 
 
 
 
The charts indicate that angel investment is the most difficult source of funding for a
startup to obtain: angel investments are very few, and strong networking is required to get
access to them. A business incubator can act as the hub between angel investors and startup
companies.
Meanwhile, around 28% of startups receive seed funding. From the investor's point of view,
giving seed funding has both advantages and disadvantages. The drawback is that the risk is
higher than for investors who put in money during a VC round, since the real market and
revenue numbers are not there yet. On the other hand, investors who choose the right startup
at the seed stage earn a higher return, as they need less money to join the shareholder list
than VC rounds require. High risk, high return.
Last but not least, the statistics show that out of every 100 startup companies, 47 are
backed by venture capital institutions. Even though this might make VC investment seem easy
to get, the startup must be ready for the full due diligence process. If a VC is interested
in the proposal, the firm or the investor must then perform due diligence, which includes a
thorough investigation of the company's business model, products, management, and operating
history, among other things.
Once due diligence has been completed, the firm or the investor will pledge an 
investment of capital in exchange for equity in the company. These funds may be 
provided all at once, but more typically the capital is provided in rounds. The firm or 
investor then takes an active role in the funded company, advising and monitoring its 
progress before releasing additional funds.  
The investor exits the company after a period of time, typically four to six years after the
initial investment, by initiating a merger, acquisition or initial public offering (IPO).
Companies that reach such an exit are the ones we label as successful startups later in this
project.
 
 
 
 
 
4. Do Successful Unicorns Require Seed Funding?
It is a fascinating fact that the FANG+ companies (Facebook, Amazon, Netflix, Google,
Alibaba), today's tech titans that account for a large share of market capitalization on
Wall Street, did not require seed funding right after they founded their products. They used
bootstrapping to fund themselves. On the other hand, Dropbox and Uber (both unicorns today)
received seed funding of USD 200,000 or less.
 
 
 
 
5. How much do funding amounts differ between seed funding, angel investment, and each
round of VC investment?
 
 
 
 
 
 
 
 
 
 
From the dataset we can conclude that seed funding has the lowest average amount among all
funding rounds and sources, around USD 776,350, while the average angel investment is around
USD 1 Million.
The averages for the Venture Capital rounds are:
Average of Round A : USD 6.9 Million
Average of Round B : USD 13.5 Million
Average of Round C : USD 21 Million
Average of Round D : USD 28 Million
Average of Round E : USD 32 Million
Average of Round F : USD 48 Million
Average of Round G : USD 83 Million
Average of Round H : USD 175 Million
In 2014, only 4 companies had received round H investment. Three of them are e-Commerce
companies: Flipkart (India), Deem (USA) and Locondo (Japan).
 
The fourth, Gumi, is a games company headquartered in Singapore.
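Per-round averages like those listed above could be computed as in the sketch below. It assumes the average is taken only over companies that actually received the round (zeros excluded), which this report does not state explicitly. The rows and the column names `round_A` and `round_B` are hypothetical.

```python
import pandas as pd

# Hypothetical amounts in USD; 0 means the company did not receive that round.
df = pd.DataFrame({
    "seed":    [100_000, 0, 300_000, 0],
    "round_A": [0, 4_000_000, 6_000_000, 0],
    "round_B": [0, 0, 10_000_000, 0],
})
amounts = df[["seed", "round_A", "round_B"]]

# Mask the zeros as missing so that mean() averages recipients only.
avg_when_received = amounts.where(amounts > 0).mean()

# Number of companies that received each round.
n_recipients = (amounts > 0).sum()

print(avg_when_received)
print(n_recipients)
```

Averaging over all rows instead, zeros included, would give much lower figures, so the choice should be stated next to any reported average.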
 
 
Hypothesis & Statistical Significance Testing 
After seeing the data, the following hypotheses arise:
1. Being a startup in the USA has a strong linear relationship with being a successful
startup, since the ecosystem is well established.
2. Startups that get venture investment tend to be successful startups.
3. Founded year is one of the predictors of a startup's success.
Statistical Significance Testing
 
The method used to test the hypotheses is to calculate the Pearson Correlation and p-value
between attributes.
 
Correlation is a measure of the extent of interdependence between variables. 
Causation is the relationship between cause and effect between two variables. 
It is important to know the difference between these two and that correlation does not imply 
causation. Determining correlation is much simpler than determining causation as causation may 
require independent experimentation. 
 
Pearson Correlation 
The Pearson Correlation measures the linear dependence between two variables X & Y. The 
resulting coefficient is a value between -1 and 1 inclusive, where: 
● 1: Total positive linear correlation. 
● 0: No linear correlation; the two variables show no linear relationship (a non-linear one is still possible).
● -1: Total negative linear correlation. 
 
P-value
The p-value is the probability of observing a correlation at least as strong as the one
measured if the two variables were actually uncorrelated. Normally, we choose a significance
level of 0.05, which means that we accept a 5% risk of calling a correlation significant when
there is none. By convention, when:
p-value is < 0.001: we say there is strong evidence that the correlation is significant.
p-value is < 0.05: there is moderate evidence that the correlation is significant.
p-value is < 0.1: there is weak evidence that the correlation is significant.
p-value is > 0.1: there is no evidence that the correlation is significant.
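As an illustration, the Pearson coefficient and its p-value can be obtained with `scipy.stats.pearsonr`. The two ten-element lists are hypothetical, and `evidence` is a helper introduced here only to encode the convention above.

```python
from scipy import stats

# Hypothetical binary features for ten startups (1 = yes, 0 = no).
get_venture = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
target = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]

# pearsonr returns the correlation coefficient and the two-sided p-value.
r, p_value = stats.pearsonr(get_venture, target)


def evidence(p):
    """Map a p-value to the strength-of-evidence labels used in this report."""
    if p < 0.001:
        return "strong"
    if p < 0.05:
        return "moderate"
    if p < 0.1:
        return "weak"
    return "none"


print(f"r = {r:.3f}, p-value = {p_value:.3f}, evidence: {evidence(p_value)}")
```

With only ten rows the p-value is far less reliable than on the full dataset, so the numbers printed here say nothing about the real data.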
 
 
 
A new ‘target’ variable is created to mark whether a startup is successful. As mentioned in
the first section, successful startups are those that have a Merger & Acquisition record or
have become public companies.
The result below shows that the linear relationship between being a USA startup and being a
successful startup is extremely weak. Since the p-value is < 0.001, the correlation between
country code USA and becoming a successful startup is statistically significant, but the
linear relationship is weak (~0.09), which means that building a startup in the USA does not
by itself mean it will be successful.
 
The same holds for the relationship between getting venture investment and becoming a
successful startup: since the p-value is < 0.001, the correlation is statistically
significant, but the linear relationship is weak (~0.12), which means getting venture
investment is only slightly associated with becoming a successful startup.
 
Hence we can conclude that Hypotheses 1 and 2 are not supported.
 
 
 
 
 
 
 
 
 
 
Suggestion & Summary 
There are several suggestions for taking the analysis further:
1. Use the most up-to-date dataset, as of Q3 2020, so that we can see the startup dynamics
during the coronavirus pandemic. This can be a basis for governments to mitigate the
failure of promising startup companies.
2. Predict which startup companies have a high probability of success (defined as Merger &
Acquisition or going IPO) by using a Machine Learning classification model.
 
Summary 
The dataset covers startups, their corporate actions and the investment they obtained. It is
sourced from Crunchbase in CSV format and contains worldwide startup companies recorded from
1902 until 2014. It consists of 54,294 entries (rows) with 38 attributes of data types object
and float, with quite messy formatting. Because of this limitation, several data cleansing
steps are required before conducting further analysis.
For further research, it would be valuable for the data to include variables that may
correlate with the target, such as the founders' years of experience, the number of patents,
copyrights or other goodwill of the company, the founders' grit score, customer count, etc.
With these we could see which factors boost the number of successful startups, and
subsequently improve predictions when training Machine Learning models.
 
 
 
Source Code Archive 
https://github.com/fitrieratna/notebook/blob/master/EDA_START_UP_v_finale.ipynb 
 
References 
1. IBM Machine Learning Foundation, Exploratory Data Analysis, 2020 
2. IBM Data Science For Professional: Machine Learning, 2020 
18 
 
 
 
19 

More Related Content

Similar to Exploratory Data Analysis of Worldwide Startup Companies using Python

Seven Forces Reshaping Enterprise Software
Seven Forces Reshaping Enterprise SoftwareSeven Forces Reshaping Enterprise Software
Seven Forces Reshaping Enterprise Software
Boston Consulting Group
 
ISACA IT GRC Conference 2008 Creating Business Value by means of Stakeholder ...
ISACA IT GRC Conference 2008 Creating Business Value by means of Stakeholder ...ISACA IT GRC Conference 2008 Creating Business Value by means of Stakeholder ...
ISACA IT GRC Conference 2008 Creating Business Value by means of Stakeholder ...
Arno Kapteyn
 
A data-driven look at financing trends, investors, and hot markets within the...
A data-driven look at financing trends, investors, and hot markets within the...A data-driven look at financing trends, investors, and hot markets within the...
A data-driven look at financing trends, investors, and hot markets within the...
Netreba
 
Private Equity: Powering Alpha Via AI, Analytics & Automation
Private Equity: Powering Alpha Via AI, Analytics & AutomationPrivate Equity: Powering Alpha Via AI, Analytics & Automation
Private Equity: Powering Alpha Via AI, Analytics & Automation
Cognizant
 
The Value of Signal (and the Cost of Noise): The New Economics of Meaning-Making
The Value of Signal (and the Cost of Noise): The New Economics of Meaning-MakingThe Value of Signal (and the Cost of Noise): The New Economics of Meaning-Making
The Value of Signal (and the Cost of Noise): The New Economics of Meaning-Making
Cognizant
 
IRJET- Data Analysis of Startups Investments and Funding Trends in India
IRJET- Data Analysis of Startups Investments and Funding Trends in IndiaIRJET- Data Analysis of Startups Investments and Funding Trends in India
IRJET- Data Analysis of Startups Investments and Funding Trends in India
IRJET Journal
 
IT Outsourcing 2024 Trends and Impacts to watch ou c0708891701d4a2f806742338e...
IT Outsourcing 2024 Trends and Impacts to watch ou c0708891701d4a2f806742338e...IT Outsourcing 2024 Trends and Impacts to watch ou c0708891701d4a2f806742338e...
IT Outsourcing 2024 Trends and Impacts to watch ou c0708891701d4a2f806742338e...
Acquaint Softtech Private Limited
 
Big Data Predictions for 2015
Big Data Predictions for 2015 Big Data Predictions for 2015
Big Data Predictions for 2015
Pentaho
 
Presentation To Seda Technology Programme
Presentation To Seda Technology ProgrammePresentation To Seda Technology Programme
Presentation To Seda Technology Programme
Elton050505
 
DealMarket Digest Issue 131 - 7 March 2014
DealMarket Digest Issue 131 - 7 March 2014DealMarket Digest Issue 131 - 7 March 2014
DealMarket Digest Issue 131 - 7 March 2014
Urs Haeusler
 
2014 - India Startup Landscape
2014 - India Startup Landscape2014 - India Startup Landscape
2014 - India Startup Landscape
P J
 
10 myths of iiot
10 myths of iiot10 myths of iiot
10 myths of iiot
SuryanarayanaTata
 
Data - the new oil #data #oil #technology #future
Data - the new oil #data #oil #technology #futureData - the new oil #data #oil #technology #future
Data - the new oil #data #oil #technology #future
Benjamin Rohé
 
Idiro Analytics - Analytics & Big Data
Idiro Analytics - Analytics & Big DataIdiro Analytics - Analytics & Big Data
Idiro Analytics - Analytics & Big Data
Idiro Analytics
 
DealMarket DIGEST Issue 167 // 19 December 2014
DealMarket DIGEST Issue 167 // 19 December 2014DealMarket DIGEST Issue 167 // 19 December 2014
DealMarket DIGEST Issue 167 // 19 December 2014
CAR FOR YOU
 
State of SMB Software Q2 2018
State of SMB Software Q2 2018State of SMB Software Q2 2018
State of SMB Software Q2 2018
Mark MacLeod
 
2020 Software Company Benchmark Report - 132 Companies
2020 Software Company Benchmark Report - 132 Companies2020 Software Company Benchmark Report - 132 Companies
2020 Software Company Benchmark Report - 132 Companies
Kelly Thomas
 
eTailing India Launches Big Data Report - 2015
eTailing India Launches Big Data Report - 2015 eTailing India Launches Big Data Report - 2015
eTailing India Launches Big Data Report - 2015
eTailing India
 
Design2Disrupt: Mark Plakias (Orange Institute) - Unicorns, Startups & Giants
Design2Disrupt: Mark Plakias (Orange Institute) -   Unicorns, Startups & GiantsDesign2Disrupt: Mark Plakias (Orange Institute) -   Unicorns, Startups & Giants
Design2Disrupt: Mark Plakias (Orange Institute) - Unicorns, Startups & Giants
VINTlabs | The Sogeti Trendlab
 

Similar to Exploratory Data Analysis of Worldwide Startup Companies using Python (20)

Seven Forces Reshaping Enterprise Software
Seven Forces Reshaping Enterprise SoftwareSeven Forces Reshaping Enterprise Software
Seven Forces Reshaping Enterprise Software
 
ISACA IT GRC Conference 2008 Creating Business Value by means of Stakeholder ...
ISACA IT GRC Conference 2008 Creating Business Value by means of Stakeholder ...ISACA IT GRC Conference 2008 Creating Business Value by means of Stakeholder ...
ISACA IT GRC Conference 2008 Creating Business Value by means of Stakeholder ...
 
A data-driven look at financing trends, investors, and hot markets within the...
A data-driven look at financing trends, investors, and hot markets within the...A data-driven look at financing trends, investors, and hot markets within the...
A data-driven look at financing trends, investors, and hot markets within the...
 
Group Project_WCMDMTJT
Group Project_WCMDMTJTGroup Project_WCMDMTJT
Group Project_WCMDMTJT
 
Private Equity: Powering Alpha Via AI, Analytics & Automation
Private Equity: Powering Alpha Via AI, Analytics & AutomationPrivate Equity: Powering Alpha Via AI, Analytics & Automation
Private Equity: Powering Alpha Via AI, Analytics & Automation
 
The Value of Signal (and the Cost of Noise): The New Economics of Meaning-Making
The Value of Signal (and the Cost of Noise): The New Economics of Meaning-MakingThe Value of Signal (and the Cost of Noise): The New Economics of Meaning-Making
The Value of Signal (and the Cost of Noise): The New Economics of Meaning-Making
 
IRJET- Data Analysis of Startups Investments and Funding Trends in India
IRJET- Data Analysis of Startups Investments and Funding Trends in IndiaIRJET- Data Analysis of Startups Investments and Funding Trends in India
IRJET- Data Analysis of Startups Investments and Funding Trends in India
 
IT Outsourcing 2024 Trends and Impacts to watch ou c0708891701d4a2f806742338e...
IT Outsourcing 2024 Trends and Impacts to watch ou c0708891701d4a2f806742338e...IT Outsourcing 2024 Trends and Impacts to watch ou c0708891701d4a2f806742338e...
IT Outsourcing 2024 Trends and Impacts to watch ou c0708891701d4a2f806742338e...
 
Big Data Predictions for 2015
Big Data Predictions for 2015 Big Data Predictions for 2015
Big Data Predictions for 2015
 
Presentation To Seda Technology Programme
Presentation To Seda Technology ProgrammePresentation To Seda Technology Programme
Presentation To Seda Technology Programme
 
DealMarket Digest Issue 131 - 7 March 2014
DealMarket Digest Issue 131 - 7 March 2014DealMarket Digest Issue 131 - 7 March 2014
DealMarket Digest Issue 131 - 7 March 2014
 
2014 - India Startup Landscape
2014 - India Startup Landscape2014 - India Startup Landscape
2014 - India Startup Landscape
 
10 myths of iiot
10 myths of iiot10 myths of iiot
10 myths of iiot
 
Data - the new oil #data #oil #technology #future
Data - the new oil #data #oil #technology #futureData - the new oil #data #oil #technology #future
Data - the new oil #data #oil #technology #future
 
Idiro Analytics - Analytics & Big Data
Idiro Analytics - Analytics & Big DataIdiro Analytics - Analytics & Big Data
Idiro Analytics - Analytics & Big Data
 
DealMarket DIGEST Issue 167 // 19 December 2014
DealMarket DIGEST Issue 167 // 19 December 2014DealMarket DIGEST Issue 167 // 19 December 2014
DealMarket DIGEST Issue 167 // 19 December 2014
 
State of SMB Software Q2 2018
State of SMB Software Q2 2018State of SMB Software Q2 2018
State of SMB Software Q2 2018
 
2020 Software Company Benchmark Report - 132 Companies
2020 Software Company Benchmark Report - 132 Companies2020 Software Company Benchmark Report - 132 Companies
2020 Software Company Benchmark Report - 132 Companies
 
eTailing India Launches Big Data Report - 2015
eTailing India Launches Big Data Report - 2015 eTailing India Launches Big Data Report - 2015
eTailing India Launches Big Data Report - 2015
 
Design2Disrupt: Mark Plakias (Orange Institute) - Unicorns, Startups & Giants
Design2Disrupt: Mark Plakias (Orange Institute) -   Unicorns, Startups & GiantsDesign2Disrupt: Mark Plakias (Orange Institute) -   Unicorns, Startups & Giants
Design2Disrupt: Mark Plakias (Orange Institute) - Unicorns, Startups & Giants
 

More from Fitrie Ratnasari

Analisa Sentimen Publik Terhadap Emiten Nikel (PT Antam Tbk.)
Analisa Sentimen Publik Terhadap Emiten Nikel (PT Antam Tbk.)Analisa Sentimen Publik Terhadap Emiten Nikel (PT Antam Tbk.)
Analisa Sentimen Publik Terhadap Emiten Nikel (PT Antam Tbk.)
Fitrie Ratnasari
 
Business Analytics Case on GYF Ads
Business Analytics Case on GYF AdsBusiness Analytics Case on GYF Ads
Business Analytics Case on GYF Ads
Fitrie Ratnasari
 
Programs for Digital Creativepreneurs
Programs for Digital Creativepreneurs Programs for Digital Creativepreneurs
Programs for Digital Creativepreneurs Fitrie Ratnasari
 
Michael Tampi - Peran Inkubator Bisnis
Michael Tampi - Peran Inkubator BisnisMichael Tampi - Peran Inkubator Bisnis
Michael Tampi - Peran Inkubator Bisnis
Fitrie Ratnasari
 
Wempy Dyocta Koto - "Venture Capital"
Wempy Dyocta Koto - "Venture Capital"Wempy Dyocta Koto - "Venture Capital"
Wempy Dyocta Koto - "Venture Capital"
Fitrie Ratnasari
 
Michael Tampi - Idea Validation @Pusat Kreatif Digital Bandung 2014
Michael Tampi - Idea Validation @Pusat Kreatif Digital Bandung 2014Michael Tampi - Idea Validation @Pusat Kreatif Digital Bandung 2014
Michael Tampi - Idea Validation @Pusat Kreatif Digital Bandung 2014
Fitrie Ratnasari
 
Wempy Dyocta Koto - How To Come Up with Unique Ideas @Pusat Kreatif Digital B...
Wempy Dyocta Koto - How To Come Up with Unique Ideas @Pusat Kreatif Digital B...Wempy Dyocta Koto - How To Come Up with Unique Ideas @Pusat Kreatif Digital B...
Wempy Dyocta Koto - How To Come Up with Unique Ideas @Pusat Kreatif Digital B...
Fitrie Ratnasari
 
Crowdfunding in Indonesia - Ministry of Tourism and Creative Economy
Crowdfunding in Indonesia - Ministry of Tourism and Creative EconomyCrowdfunding in Indonesia - Ministry of Tourism and Creative Economy
Crowdfunding in Indonesia - Ministry of Tourism and Creative Economy
Fitrie Ratnasari
 
Theses exam 2012 - Wideband Speech Reconstruction
Theses exam 2012 - Wideband Speech ReconstructionTheses exam 2012 - Wideband Speech Reconstruction
Theses exam 2012 - Wideband Speech Reconstruction
Fitrie Ratnasari
 

More from Fitrie Ratnasari (9)

Analisa Sentimen Publik Terhadap Emiten Nikel (PT Antam Tbk.)
Analisa Sentimen Publik Terhadap Emiten Nikel (PT Antam Tbk.)Analisa Sentimen Publik Terhadap Emiten Nikel (PT Antam Tbk.)
Analisa Sentimen Publik Terhadap Emiten Nikel (PT Antam Tbk.)
 
Business Analytics Case on GYF Ads
Business Analytics Case on GYF AdsBusiness Analytics Case on GYF Ads
Business Analytics Case on GYF Ads
 
Programs for Digital Creativepreneurs
Programs for Digital Creativepreneurs Programs for Digital Creativepreneurs
Programs for Digital Creativepreneurs
 
Michael Tampi - Peran Inkubator Bisnis
Michael Tampi - Peran Inkubator BisnisMichael Tampi - Peran Inkubator Bisnis
Michael Tampi - Peran Inkubator Bisnis
 
Wempy Dyocta Koto - "Venture Capital"
Wempy Dyocta Koto - "Venture Capital"Wempy Dyocta Koto - "Venture Capital"
Wempy Dyocta Koto - "Venture Capital"
 
Michael Tampi - Idea Validation @Pusat Kreatif Digital Bandung 2014
Michael Tampi - Idea Validation @Pusat Kreatif Digital Bandung 2014Michael Tampi - Idea Validation @Pusat Kreatif Digital Bandung 2014
Michael Tampi - Idea Validation @Pusat Kreatif Digital Bandung 2014
 
Wempy Dyocta Koto - How To Come Up with Unique Ideas @Pusat Kreatif Digital B...
Wempy Dyocta Koto - How To Come Up with Unique Ideas @Pusat Kreatif Digital B...Wempy Dyocta Koto - How To Come Up with Unique Ideas @Pusat Kreatif Digital B...
Wempy Dyocta Koto - How To Come Up with Unique Ideas @Pusat Kreatif Digital B...
 
Crowdfunding in Indonesia - Ministry of Tourism and Creative Economy
Crowdfunding in Indonesia - Ministry of Tourism and Creative EconomyCrowdfunding in Indonesia - Ministry of Tourism and Creative Economy
Crowdfunding in Indonesia - Ministry of Tourism and Creative Economy
 
Theses exam 2012 - Wideband Speech Reconstruction
Theses exam 2012 - Wideband Speech ReconstructionTheses exam 2012 - Wideband Speech Reconstruction
Theses exam 2012 - Wideband Speech Reconstruction
 

Exploratory Data Analysis of Worldwide Startup Companies using Python

  • 1. Exploratory Data Analysis for Machine Learning on: Worldwide Startup Companies. Fitrie Ratnasari, 28th November, 2020
  • 2. Dataset Description & Initial Planning
      The dataset comes from Crunchbase in CSV format and covers worldwide startup companies recorded from 1902 until 2014. It consists of 54,294 entries (rows) with 38 attributes whose data types are either object or float, as shown in the accompanying picture.
  • 3. The objective of this project is to uncover the insights hidden in the dataset across the years from 1902 until 2014: when each company was founded, its status, corporate actions such as Merger & Acquisition (M&A) or becoming a public company, and its funding status, that is, whether it received seed, angel, or venture capital investment.
      For further analysis we would like to know the probability of success for a company, taking its status, market, and funding received as a basis, by examining the correlation amongst attributes.
      In various studies, a successful startup is commonly defined through a two-way strategy that makes a large amount of money for its founders, investors, and first employees. A company can either have an IPO (Initial Public Offering) by going to a public stock market (e.g. Facebook going public, allowing everyone to invest in the company by buying shares sold by its insiders in the U.S. stock market) or be acquired by or merged (M&A) with another company (e.g. Microsoft acquiring LinkedIn for $26B), where those who have previously invested receive immediate cash in return for their shares. This process is often referred to as an exit strategy (Guo, Lou, & Pérez-Castrillo, 2015). This project therefore considers both an IPO and an M&A (Mergers & Acquisitions) process as the critical events that classify a startup as successful.
      The initial plan, before any further exploration, is to inspect the data thoroughly with regard to its data types and completeness and to check whether they are appropriate, so that we know what kind of data cleansing should follow.
  • 4. Data Cleansing & Feature Engineering
      After acquiring the dataset we found that numerous data cleansing tasks were needed before any further analysis, since the dataset is quite messy: inconsistent formatting, badly labelled headers, many missing values, and widely dispersed values that introduce outliers.
      The data wrangling steps taken in this project are:
      1. Fixing the spacing in headers such as ‘ market ‘ and ‘ funding_total_usd ‘.
      2. Removing 4,855 duplicate rows.
      3. Tackling uncommon formats. The attribute ‘funding_total_usd’ is stored as a string that uses commas as thousands separators, so we strip the commas and convert the data type to numeric.
      4. Handling missing values. Missing values, such as NaN in ‘funding_total_usd’, are replaced with 0.
      5. Detecting and handling outliers. When plotting a distribution, outliers can make the visualization uninterpretable, so we remove them using the interquartile range. Note that this step is used for Exploratory Data Analysis only, not for Machine Learning (in ML we will transform the data instead, whether through regression, polynomial regression, or a log transform).
      Feature Engineering also brings advantages, such as converting object data types into numeric ones with One-Hot Encoding, and it can be used to transform attributes that contain outliers (removing them altogether could reduce training accuracy later in the Machine Learning process). Hence in this dataset we use:
      1. One-Hot Encoding for the attribute ‘status’.
      2. New variables ‘get_seed_funding’, ‘get_angel_fund’, and ‘get_venture’, and most importantly ‘successfull_code’, with 1 for ‘Yes’ and 0 for ‘No’ in all cases.
      3. A new attribute ‘founded_year’ in place of ‘founded_at’. Because the year in ‘founded_at’ was found to be inconsistent with the year in ‘founded_month’, we extract the year from ‘founded_month’ into ‘founded_year’ and then drop the ‘founded_at’ column.
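The cleansing and feature engineering steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`clean_startups`, `iqr_filter`, `add_features`), the source columns `seed` and `angel`, the "amount greater than zero" rule for the flags, and the conventional 1.5 x IQR fence are all assumptions. The ‘successfull_code’ and ‘founded_year’ steps are left out because the text does not say how they are derived in detail.

```python
import pandas as pd


def clean_startups(df: pd.DataFrame) -> pd.DataFrame:
    """Header fix, de-duplication, and numeric conversion of funding totals."""
    df = df.copy()
    # Step 1: strip stray spaces from headers such as ' market '.
    df.columns = df.columns.str.strip()
    # Step 2: drop exact duplicate rows.
    df = df.drop_duplicates()
    # Step 3: remove comma thousands separators from the funding strings.
    funding = (
        df["funding_total_usd"].astype(str).str.replace(",", "", regex=False).str.strip()
    )
    # Step 4: unparseable or missing entries become NaN, then 0.
    df["funding_total_usd"] = pd.to_numeric(funding, errors="coerce").fillna(0)
    return df


def iqr_filter(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Step 5 (EDA only): keep rows within k * IQR of the quartiles (k is assumed)."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Binary funding flags plus one-hot columns for 'status'."""
    df = df.copy()
    # Assumed rule: a company "got" a funding type if the amount is positive.
    flags = {"seed": "get_seed_funding", "angel": "get_angel_fund", "venture": "get_venture"}
    for source, flag in flags.items():
        df[flag] = (df[source] > 0).astype(int)
    # One-hot encode 'status' into status_<value> columns of 0/1.
    dummies = pd.get_dummies(df["status"], prefix="status", dtype=int)
    return pd.concat([df, dummies], axis=1)
```

A typical call order would be `add_features(clean_startups(raw))`, applying `iqr_filter` only to the copy of the data used for plotting.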
  • 5. Key Findings and Insights
      Startups are known for innovating in the gap between a problem and its solution, and for being growth-seeking businesses. The nature of such businesses requires heavy funding, so it is common for them to look for capital from a variety of sources such as angel investors and venture capital.
      This section has 3 sub-sections: startup, market, and funding.
      A. START UP
      1. Top 5 Countries by Number of Startups
      The USA dominates the number of startups across the globe, with more than 50% of all startups worldwide. This is plausible, since the US has an extensive support ecosystem that helps startups grow from ideation to scale-up. It is followed by England with 2,642 startup companies, Canada with 1,405, China with 1,239, and Germany with 968, as of 2014.
  • 6. 2. Startup Status
      Of the 49,437 startup companies recorded from 1902 until 2014, 5.4% are closed, 86.9% are operating, and 7.7% have been acquired. Acquisition is one of the exit strategies that qualify a startup as successful.
      3. Startup Founded Year Distribution
      From the figure above, the mid-1990s mark the beginning of the worldwide growth in startups: around 437 companies were recorded for 1995, and the number had almost doubled to 731 startup companies by 2001.
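The status percentages quoted above are a normalized frequency count. A minimal pandas sketch, assuming a `status` column like the one described in the cleansing step (the helper name `status_shares` is hypothetical):

```python
import pandas as pd


def status_shares(status: pd.Series) -> pd.Series:
    """Percentage of companies per status, rounded to one decimal place."""
    # value_counts(normalize=True) returns fractions and ignores missing values.
    return (status.value_counts(normalize=True) * 100).round(1)
```

Because missing statuses are ignored, the shares are relative to companies with a known status, which may or may not match how the figures on the slide were computed.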
  • 7. This period is known as the ‘Dotcom Bubble’, when technology companies drew the market into over-valuation. In 1999, the height of the dotcom craze, there were 457 IPOs, most of them Internet and technology stocks. Of those, 117 doubled in price on the first day of trading. Tech and dotcom IPOs were minting new millionaires every day, both at the management level and at the retail investor level. But then the sell-off started on March 11, 2000. Investors suddenly realized that a tech or Internet company with a billion-dollar valuation but no revenue or earnings is saddled with debt and has no future.
      4. Those Who Survived the Dotcom Bubble and Became Tech Titans
      After the Dotcom Bubble, companies with strong business revenues survived, namely Amazon, Netflix, eBay, Google, and Alibaba. Some of them are still leading tech companies. The FANG+ companies (Facebook, Amazon, Netflix, Google, Alibaba) are the tech titans that have outperformed the wider market since the coronavirus (COVID-19) pandemic spurred record sell-offs in March. Unlike other stocks, which hit their lowest prices at that time, FANG+ stocks have risen by as much as 80% from their early-March lows, owing to their performance and forward-looking valuation.
  • 8. B. MARKET
      1. Top 15 Startup Markets Worldwide
      The growing number of startups touches almost every sector in which people are the market. The most common category among startups worldwide is Software, followed by Biotechnology, Mobile, E-Commerce, Curated Web, Enterprise Software, Health Care, Clean Technology, Games, and Embedded Hardware & Software.
      The figure below shows that E-Commerce is the most favored category among startup companies in China, Indonesia, and India, which differs slightly from the United States.
  • 9. 2. Most Favored Categories of Startup Product
      For almost 3 decades until 2014, Social Media, Curated Web, and Mobile were the most favored startup product categories worldwide. In the word cloud, the less favored a category is, the smaller its word is drawn.
  • 10. C. FUNDING
      1. Total Funding Distribution
      Total funding is the sum of all funding obtained across all rounds: seed, grants, angel investors, and venture capital. From the picture above we can see that the dispersion of total funding across startups is very high, so we take out the outliers to understand it better, as can be seen below. The data shows that most total funding amounts are below USD 2.5 million, and that the top 10% of startup companies received 78% of all funding across the globe.
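The concentration figure above (the top 10% of startups holding 78% of funding) is a top-decile share. One way such a share could be computed, sketched with pandas on a cleaned `funding_total_usd` series; the helper name `top_share` and the choice to rank by total funding and truncate the group size to a whole number are assumptions:

```python
import pandas as pd


def top_share(funding: pd.Series, top_fraction: float = 0.10) -> float:
    """Fraction of all funding held by the best-funded `top_fraction` of companies."""
    ranked = funding.sort_values(ascending=False)
    # Size of the top group, at least one company.
    n_top = max(1, int(len(ranked) * top_fraction))
    return float(ranked.iloc[:n_top].sum() / ranked.sum())
```

Whether companies with zero recorded funding are included changes the result considerably, so the population used should be stated alongside the figure.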
  • 11. 2. Total Funding of Various Unicorns in 2014
      There were not as many unicorns in 2014 as there are today, but a few of them are still tech titans. Facebook, one of the unicorns in 2014, obtained the highest funding, almost USD 2.5 billion, compared with Alibaba, Twitter, Cloudera, and Uber.
      3. Seed Funding vs. Angel Funding vs. Venture Investment
      Seed money, sometimes called seed funding or seed capital, is a form of securities offering in which an investor invests capital in a startup company in exchange for an equity stake or convertible note stake in the company. The term seed suggests that this is a very early investment, meant to support the business until it can generate cash of its own (see cash flow) or until it is ready for further investments. Seed money options include friends and family funding, seed venture capital funds, angel funding, and crowdfunding.
      In this dataset, the difference between seed funding and angel investment is that seed funding comes from seed venture capital institutions, whereas angel investment comes from informal or private investors, called angel investors, who invest according to their personal preference.
      Venture capital is a form of private equity and a type of financing that investors provide to startup companies and small businesses that are believed to have long-term
  • 12. growth potential. Venture capital generally comes from well-off investors, investment banks, and other financial institutions. However, it does not always take a monetary form; it can also be provided as technical or managerial expertise. In the dataset, the column ‘venture’ is the total investment amount from round A through round H.
      The question is: how many startups get seed funding, angel investment, and venture investment? The difference can be seen in the three (3) pie charts below.
  • 13. Angel investment is clearly the most difficult source of funding for a startup to obtain, as angel investments are very few and require strong networking to access. A business incubator can act as the hub between angel investors and startup companies.
      Meanwhile, around 28% of startups get seed funding. From an investor's perspective, giving seed funding has both advantages and disadvantages. The drawback is that the risk is higher than for investors who put money in during a VC round, since the real market and revenue numbers are not there yet. On the other hand, investors who choose the right startup at the seed stage get a higher return, as they do not need to inject as much money to join the shareholder list as investors in VC rounds do. High risk, high return.
      Last but not least, out of every 100 startup companies, the statistics show that 47 are backed by venture capital institutions. Even though this might make VC investment seem easy to get, a startup must be ready for the full due diligence process. If a VC is interested in the proposal, the firm or the investor must perform due diligence, which includes a thorough investigation of the company's business model, products, management, and operating history, among other things.
      Once due diligence has been completed, the firm or the investor will pledge an investment of capital in exchange for equity in the company. These funds may be provided all at once, but more typically the capital is provided in rounds. The firm or investor then takes an active role in the funded company, advising and monitoring its progress before releasing additional funds.
      The investor exits the company after a period of time, typically four to six years after the initial investment, by initiating a merger, acquisition, or initial public offering (IPO). Companies with such an exit are the ones we label as successful startups later in this project.
  • 14. 4. Do Successful Unicorns Require Seed Funding?
      Interestingly, the FANG+ companies (Facebook, Amazon, Netflix, Google, Alibaba), today's tech titans that hold the majority of market capitalization on Wall Street, did not require seed funding right after they founded their products. They used bootstrapping to fund themselves. On the other hand, Dropbox and Uber (also unicorns today) received seed funding of USD 200,000 or less.
  • 15. 5. How much do the amounts differ between seed funding, angel investment, and each round of VC investment?
      From the dataset we can conclude that seed funding has the lowest average amount among all funding rounds and sources, around USD 776,350, while the angel investment average is around USD 1 million.
      The averages for the venture capital rounds are:
      Average of Round A : USD 6.9 Million
      Average of Round B : USD 13.5 Million
      Average of Round C : USD 21 Million
      Average of Round D : USD 28 Million
      Average of Round E : USD 32 Million
      Average of Round F : USD 48 Million
      Average of Round G : USD 83 Million
      Average of Round H : USD 175 Million
      In 2014, only 4 companies received round H investment. Three of them are in e-Commerce: Flipkart (India), Deem (USA), and Locondo (Japan). The fourth, Gumi, is categorised under games and is headquartered in Singapore.
  • 16. Hypothesis & Statistical Significance Testing
      After seeing the data, the hypotheses that arise are:
      1. Being a startup in the USA has a strong linear relationship with being a successful startup, since the ecosystem is well established.
      2. Startups that get venture investment are expected to be successful startups.
      3. Founded year is one of the predictors of whether a startup succeeds.
      Statistical Significance Testing
      The hypotheses are tested by calculating the Pearson correlation and the p-value between attributes.
      Correlation is a measure of the extent of interdependence between variables.
      Causation is the relationship between cause and effect between two variables.
      It is important to know the difference between the two and that correlation does not imply causation. Determining correlation is much simpler than determining causation, as causation may require independent experimentation.
      Pearson Correlation
      The Pearson correlation measures the linear dependence between two variables X and Y. The resulting coefficient is a value between -1 and 1 inclusive, where:
      ● 1: Total positive linear correlation.
      ● 0: No linear correlation; the two variables have no linear relationship.
      ● -1: Total negative linear correlation.
      P-value
      The p-value is the probability of observing a correlation at least as strong as the measured one if the two variables were in fact uncorrelated. Normally we choose a significance level of 0.05, which means we accept a 5% chance of calling a correlation significant when none exists. By convention:
      p-value < 0.001: there is strong evidence that the correlation is significant.
      p-value < 0.05: there is moderate evidence that the correlation is significant.
      p-value < 0.1: there is weak evidence that the correlation is significant.
      p-value > 0.1: there is no evidence that the correlation is significant.
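The test described above can be sketched with `scipy.stats.pearsonr`, which returns the Pearson coefficient together with a two-sided p-value. The helper name `correlation_report` and the synthetic 0/1 data below are illustrative assumptions, not the Crunchbase columns or the notebook's code; the synthetic example only mimics the situation of a weak but statistically significant relationship in a large sample.

```python
import numpy as np
from scipy import stats


def correlation_report(x, y, alpha: float = 0.05) -> dict:
    """Pearson r, two-sided p-value, and a significance flag at level alpha."""
    r, p = stats.pearsonr(x, y)
    return {"r": float(r), "p_value": float(p), "significant": bool(p < alpha)}


# Synthetic illustration: a binary group flag and a binary outcome whose
# success rate is 5% in one group and 10% in the other.
rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, size=n)
outcome = (rng.random(n) < 0.05 + 0.05 * group).astype(int)
report = correlation_report(group, outcome)
```

With a sample this large, even a small coefficient yields a very small p-value, which is why significance and strength of the relationship have to be read separately.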
  • 17. A new ‘target’ variable is created to mark whether a startup is successful. As mentioned in the first section, successful startups are those that have a Merger & Acquisition record or have become public companies.
      The result below shows that the linear relationship between being a USA startup and being a successful startup is extremely weak. Since the p-value is < 0.001, the correlation between country code USA and becoming a successful startup is statistically significant, but the coefficient is small (~0.09). In other words, building a startup in the USA does not by itself mean it will be successful later on.
      The same holds for the relationship between getting venture investment and becoming a successful startup: since the p-value is < 0.001, the correlation is statistically significant, but the linear relationship is weak (~0.12), meaning that getting venture investment is only slightly associated with becoming a successful startup.
      Hence we conclude that Hypotheses 1 and 2 are not supported.
  • 18. Suggestion & Summary
      There are several suggestions for taking the analysis further:
      1. Use the most up-to-date dataset, as of Q3 2020, to see how startup dynamics changed during the coronavirus pandemic. This can be a basis for the Government to mitigate the failure of promising startup companies.
      2. Predict whether a startup company has a high probability of success (defined by Merger & Acquisition or IPO) using a Machine Learning classification model.
      Summary
      The dataset covers startups, their corporate actions, and the investment they obtained. It is sourced from Crunchbase in CSV format and contains worldwide startup companies recorded from 1902 until 2014. It consists of 54,294 entries (rows) with 38 attributes of object and float data types, with quite messy formatting. Because of this limitation, several data cleansing steps are required before conducting further analysis.
      For further research, it would be valuable if the data also included variables that correlate with the target, such as the founders' years of experience, the number of patents, copyrights, or other goodwill of the company, the founders' grit score, customer count, and so on. With these we could see which of them could boost the number of successful startups, and they could subsequently improve prediction when training Machine Learning models.
  • 19. Source Code Archive
      https://github.com/fitrieratna/notebook/blob/master/EDA_START_UP_v_finale.ipynb
      References
      1. IBM Machine Learning Foundation, Exploratory Data Analysis, 2020
      2. IBM Data Science For Professional: Machine Learning, 2020