Exploratory Data Analysis (EDA) is the first step in a Machine Learning project: it builds an understanding of the dataset by summarizing its characteristics, often by plotting them visually.
This paper also aims to draw insights from the data points, and at the end of the research the hypotheses are examined with Statistical Significance Testing.
Exploratory Data Analysis of Worldwide Startup Companies using Python
Exploratory Data Analysis for Machine Learning on:
Worldwide Startup Companies
Fitrie Ratnasari
28th November, 2020
Dataset Description & Initial Planning
The dataset comes from Crunchbase in CSV format and covers startup companies worldwide
recorded from 1902 until 2014. It consists of 54,294 entries (rows) with 38 attributes of
data types object and float, as the picture below shows:
The objective of this project is to uncover the insights hidden in the dataset across the years
from 1902 until 2014: when each company was founded, its status, corporate actions such as
Merger & Acquisition (M&A) or becoming a public company, and its funding status, that is,
whether it has received seed, angel or venture capital investment.
For further analysis we would like to estimate the probability of success for companies, using
their status, market, and funding received as a basis, by examining the correlation amongst
attributes.
In various studies, a successful startup is commonly defined as one that makes a large amount
of money for its founders, investors and first employees in one of two ways. The company can
either have an IPO (Initial Public Offering) by going to a public stock market (e.g. Facebook
going public, allowing everyone to invest in the company by buying shares sold by its insiders
in the U.S. stock market), or be acquired by or merged (M&A) with another company (e.g.
Microsoft acquiring LinkedIn for $26B), where those who have previously invested receive
immediate cash in return for their shares. This process is often referred to as an exit strategy
(Guo, Lou, & Pérez-Castrillo, 2015). This project will therefore consider both an IPO and a
process of M&A (Mergers & Acquisitions) as the critical events that classify a startup as
successful.
The initial plan, before any further data exploration, is to inspect every attribute thoroughly,
both its data type and its completeness, to judge whether they are appropriate. From that we
can decide what kind of data cleansing should be done.
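The inspection step described above can be sketched as follows. This is a minimal illustration, not the author's notebook: the inline sample stands in for the Crunchbase CSV, its rows are made up, and the helper name `summarize` is hypothetical. Only the padded header names ' market ' and ' funding_total_usd ' come from the report.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for the Crunchbase CSV export.
# The padded header names mirror the ones mentioned in the report.
SAMPLE_CSV = io.StringIO(
    "name, market ,status, funding_total_usd ,founded_year\n"
    'Alpha, Software ,operating,"1,750,000",2010\n'
    'Beta, Biotechnology ,acquired,"40,000",2007\n'
    "Gamma,,closed,,\n"
)


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, non-null count and missing share for every column."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "non_null": df.notna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
        }
    )


startups = pd.read_csv(SAMPLE_CSV)
overview = summarize(startups)
print(startups.shape)  # (rows, columns)
print(overview)
```

On the real file, the same summary would show which attributes need type conversion and which have many missing values.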
Data Cleansing & Feature Engineering
After acquiring the dataset we found that numerous data cleansing tasks are needed before any
further analysis: the dataset is quite messy in its formatting and header labelling, has quite
a lot of missing values, and is dispersed enough to introduce outliers.
In this project, the data wrangling steps taken are:
1. Fixing the spacing in headers such as ‘ market ‘ and ‘ funding_total_usd ‘.
2. Removing 4,855 duplicate rows.
3. Tackling uncommon formats.
The attribute ‘funding_total_usd’ is stored as a string that uses commas as thousands
separators, so we remove the commas and change the data type to numeric.
4. Handling missing values.
Missing values, such as NaN in ‘funding_total_usd’, are replaced with 0.
5. Detecting and handling outliers.
When plotting distributions, outliers make the visualization uninterpretable, so we
remove them using the interquartile range. Note that this step is used for Exploratory
Data Analysis only, not for Machine Learning (in ML we will transform the data instead,
using regression, polynomial regression or a log transform).
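The five wrangling steps above can be sketched in pandas as follows. This is an illustration under stated assumptions: the function names and the toy rows are hypothetical, and the 1.5 × IQR fence is the common convention, since the report only says that the interquartile range is used.

```python
import pandas as pd


def clean_startups(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply steps 1 to 4: header spacing, duplicates, number format, NaN."""
    df = raw.copy()
    # 1. Strip the stray spaces around header names.
    df.columns = df.columns.str.strip()
    # 2. Drop exact duplicate rows.
    df = df.drop_duplicates()
    # 3. Remove thousands separators, then convert the text to numbers.
    funding = (
        df["funding_total_usd"]
        .astype(str)
        .str.replace(",", "", regex=False)
        .str.strip()
    )
    df["funding_total_usd"] = pd.to_numeric(funding, errors="coerce")
    # 4. Replace missing funding with 0, as the report does.
    df["funding_total_usd"] = df["funding_total_usd"].fillna(0)
    return df.reset_index(drop=True)


def drop_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Step 5: keep rows within [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is assumed)."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]


# Made-up rows: one duplicate, one missing value, one extreme value.
raw = pd.DataFrame(
    {
        "name": ["A", "A", "B", "C", "D", "E", "F"],
        " market ": [" Software ", " Software ", " Mobile ", " Games ",
                     " Software ", " Mobile ", " Biotech "],
        " funding_total_usd ": ["1,000", "1,000", "2,000", None,
                                "3,000", "4,000", "9,000,000"],
    }
)

clean = clean_startups(raw)
trimmed = drop_iqr_outliers(clean, "funding_total_usd")
print(clean)
print(trimmed)
```

Keeping the outlier filter in its own function makes it easy to apply it for plots only and leave the cleaned frame untouched for later modelling.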
Feature Engineering also brings advantages: it turns object-type data into numeric data through
One-Hot Encoding, and it can transform attributes that contain outliers (removing those rows
altogether could reduce our training accuracy later in the Machine Learning process). Hence in
this dataset we use:
1. One-Hot Encoding for the attribute ‘status’.
2. New variables ‘get_seed_funding’, ‘get_angel_fund’, and ‘get_venture’, and most
importantly ‘successfull_code’, coded as 1 for ‘Yes’ and 0 for ‘No’ in all cases.
3. A new attribute ‘founded_year’ in place of ‘founded_at’. The year in ‘founded_at’ was
found to be inconsistent with the year in ‘founded_month’, so we extract the year from
‘founded_month’ into ‘founded_year’ and then drop the ‘founded_at’ column.
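A possible pandas version of these three feature engineering steps is sketched below. Several details are assumptions rather than facts from the report: the source column names `seed` and `angel`, the ‘YYYY-MM’ layout of ‘founded_month’, and the rule that `successfull_code` is 1 when the status is ‘acquired’. The toy rows are made up.

```python
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add one-hot status columns, binary funding flags and founded_year."""
    out = df.copy()

    # 1. One-hot encode 'status' (one 0/1 column per status value).
    status_dummies = pd.get_dummies(out["status"], prefix="status", dtype=int)
    out = pd.concat([out, status_dummies], axis=1)

    # 2. Binary flags: 1 if any money was received in that category.
    #    The source column names 'seed' and 'angel' are assumed.
    flags = {
        "seed": "get_seed_funding",
        "angel": "get_angel_fund",
        "venture": "get_venture",
    }
    for source, flag in flags.items():
        out[flag] = (out[source].fillna(0) > 0).astype(int)

    # Assumed rule: an acquired company counts as a successful exit.
    out["successfull_code"] = (out["status"] == "acquired").astype(int)

    # 3. Take the year from 'founded_month' (assumed 'YYYY-MM' text),
    #    then drop the inconsistent 'founded_at' column.
    year = pd.to_numeric(out["founded_month"].str.slice(0, 4), errors="coerce")
    out["founded_year"] = year.astype("Int64")
    return out.drop(columns=["founded_at"])


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "status": ["operating", "acquired", "closed"],
        "seed": [100000.0, 0.0, 50000.0],
        "angel": [0.0, 0.0, 20000.0],
        "venture": [0.0, 5000000.0, 0.0],
        "founded_at": ["2010-03-01", "2007-06-15", "2011-01-01"],
        "founded_month": ["2010-03", "2007-06", None],
    }
)

features = engineer_features(sample)
print(features.T)
```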
Key Finding and Insights
Startups are known for innovating in the gap between a problem and its solution, and for being
growth-seeking businesses. The nature of such a business requires heavy funding, and it is
common to look for capital from a variety of sources such as angel investors and venture
capital firms.
This section has 3 sub-sections: start up, market, and funding.
A. START UP
1. Top 5 Countries in Terms of Startup Quantity:
The USA dominates startup quantity across the globe, with more than 50% of all startups
worldwide. This is unsurprising, since the US has an immense support ecosystem for startups
to grow from ideation to scaling up the business. It is followed by England with 2,642
startup companies, Canada with 1,405, China with 1,239 and Germany with 968, as of 2014.
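A ranking like this can be produced with a frequency count. The sketch below assumes the country is stored in a column named `country_code` with ISO-style codes; both the column name and the toy values are assumptions for illustration.

```python
import pandas as pd


def top_countries(df: pd.DataFrame, column: str = "country_code", n: int = 5) -> pd.DataFrame:
    """Count startups per country and express each count as a share."""
    counts = df[column].value_counts().head(n)
    # Shares are relative to rows that actually have a country recorded.
    share = (counts / df[column].notna().sum() * 100).round(1)
    return pd.DataFrame({"startups": counts, "share_pct": share})


# Made-up sample: 6 US, 2 British, 1 Canadian startup, 1 unknown.
sample = pd.DataFrame(
    {"country_code": ["USA"] * 6 + ["GBR"] * 2 + ["CAN"] + [None]}
)

ranking = top_countries(sample)
print(ranking)
```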
2. Start Up Status
From 1902 until 2014, of the 49,437 startup companies recorded, 5.4% are closed, 86.9% are
operating and 7.7% have been acquired. Being acquired counts as one form of success for a
startup, since it is an exit strategy.
3. Start Up Founded Year Distribution
From the figure above, the growth of startups worldwide began in the mid-1990s: around 437
companies were recorded in 1995, and the number had almost doubled by 2001, with 731
startup companies.
This period is also known as the ‘Dotcom Bubble’, when technology companies drew the market
into over-valuation. In 1999, the height of the dotcom craze, there were 457 IPOs. Most were
Internet and technology stocks. Of those, 117 doubled in price on the first day of trading.
Tech and dotcom IPOs were minting new millionaires every day, both at the management level
and at the retail investor level. But then the sell-off started on March 11, 2000. Investors
suddenly realized that a tech or Internet company with a billion-dollar valuation but no
revenue or earnings, and saddled with debt, has no future.
4. Those Who Survived the Dotcom Bubble and Became Tech Titans
After the Dotcom Bubble, companies with strong business revenues survived, namely
Amazon, Netflix, eBay, Google, and Alibaba. Some of them are still leading tech companies
today. The FANG+ companies (FACEBOOK, AMAZON, NETFLIX, GOOGLE, ALIBABA) are the
tech titans that have outperformed the wider market since the coronavirus (COVID-19)
pandemic spurred record sell-offs in March 2020. Unlike other stocks, which hit their lowest
prices at that time, FANG+ stocks rose by as much as 80% from their early March lows
because of their performance and forward-looking valuation.
B. MARKET
1. Top 15 StartUp Market Worldwide
As the number of startups grows, they touch almost every sector in which people form a
market. The most common category among all startups worldwide is Software, followed by
Biotechnology, Mobile, E-Commerce, Curated Web, Enterprise Software, Health Care, Clean
Technology, Games, and Embedded Hardware & Software.
The figure below shows that E-Commerce is the most popular category amongst startup
companies in China, Indonesia and India, which differs slightly from the United States.
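The per-country comparison above amounts to finding the most frequent market within each country. A minimal sketch, assuming columns named `country_code` and `market` and ISO-style country codes (the helper name and toy rows are hypothetical):

```python
import pandas as pd


def top_market_by_country(df: pd.DataFrame, countries: list) -> pd.Series:
    """Return the most frequent market for each of the given countries."""
    subset = df[df["country_code"].isin(countries)]
    counts = (
        subset.groupby(["country_code", "market"])
        .size()
        .reset_index(name="startups")
    )
    # Sort so the largest market comes first, then keep one row per country.
    top = counts.sort_values("startups", ascending=False, kind="stable")
    top = top.drop_duplicates("country_code")
    return top.set_index("country_code")["market"].sort_index()


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "country_code": ["CHN", "CHN", "CHN", "USA", "USA", "USA", "USA", "IDN"],
        "market": ["E-Commerce", "E-Commerce", "Software",
                   "Software", "Software", "Software", "E-Commerce",
                   "E-Commerce"],
    }
)

favourites = top_market_by_country(sample, ["CHN", "IDN", "USA"])
print(favourites)
```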
2. Most Favorable Category of Startup Product
For almost 3 decades until 2014, Social Media, Curated Web, and Mobile were the most
popular startup product categories all over the world. In the word cloud, the less popular
a category is, the smaller its word is drawn.
C. FUNDING
1. Total Funding Distribution
Total funding is defined as the sum of all funding a startup has obtained: seed,
grant, angel investment and venture capital across all rounds. The picture above shows
that the dispersion of total funding across startups is very high, so we remove the
outliers to get a clearer view, as can be seen below. The data show that most startups
have total funding below USD 2.5 Million, and that the top 10% of startup companies
received 78% of all funding across the globe.
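The two steps in this paragraph, measuring how concentrated funding is and trimming outliers, can be sketched as follows. The source does not say which outlier rule was used; Tukey's rule (values above Q3 + 1.5 × IQR) is an assumed choice here, and the funding amounts are made-up numbers.

```python
from statistics import quantiles


def top_share(amounts, fraction=0.10):
    """Share of the grand total held by the top `fraction` of startups."""
    ranked = sorted(amounts, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


def drop_upper_outliers(amounts, k=1.5):
    """Drop values above Q3 + k * IQR (Tukey's rule, an assumed choice)."""
    q1, _, q3 = quantiles(amounts, n=4)
    upper = q3 + k * (q3 - q1)
    return [a for a in amounts if a <= upper]


# Hypothetical funding totals in USD, not taken from the dataset.
funding = [50_000, 120_000, 300_000, 450_000, 800_000,
           1_200_000, 1_500_000, 2_000_000, 2_400_000, 90_000_000]

print(f"Top 10% share of total funding: {top_share(funding):.0%}")
print(f"Startups kept after trimming: {len(drop_upper_outliers(funding))}")
```

With one very large value in the sample, the top decile holds most of the total, which is the same pattern the paragraph reports for the real data.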
2. Total Funding of Various Unicorns in 2014
There were fewer unicorns in 2014 than there are today, but a few of them are still
tech titans. Facebook, one of the unicorns in 2014, obtained the highest total funding,
almost USD 2.5 Billion, ahead of Alibaba, Twitter, Cloudera and Uber.
3. Seed Funding vs. Angel Funding vs. Venture Investment
Seed money, sometimes called seed funding or seed capital, is a form of securities
offering in which an investor invests capital in a startup company in exchange for an
equity stake or convertible note stake in the company. The term seed suggests that this is
a very early investment, meant to support the business until it can generate cash of its
own, or until it is ready for further investments. Seed money options include friends
and family funding, seed venture capital funds, angel funding, and crowdfunding.
In this dataset, seed funding and angel investment differ by source: seed funding
comes from institutional seed venture capital funds, whereas angel investment comes
from informal or private investors, called angel investors, who invest according to
their personal preference.
Venture capital, by contrast, is a form of private equity and a type of financing that
investors provide to startup companies and small businesses that are believed to have
long-term growth potential. Venture capital generally comes from well-off investors,
investment banks and other financial institutions. However, it does not always take a
monetary form; it can also be provided in the form of technical or managerial expertise.
In the dataset, the column ‘venture’ holds the total investment amount across round A
through round H.
The question is how many startups receive seed funding, angel investment and venture
investment. The three (3) pie charts below show the difference.
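The shares behind such pie charts can be sketched by counting startups with a non-zero amount per funding type. The column names "seed", "angel" and "venture" follow the dataset description, but treating zero as "no funding of this type" is an assumption, and the rows are invented for illustration.

```python
# Hypothetical rows; the values are made up and do not come from the dataset.
startups = [
    {"seed": 200_000, "angel": 0, "venture": 0},
    {"seed": 0, "angel": 0, "venture": 5_000_000},
    {"seed": 0, "angel": 100_000, "venture": 0},
    {"seed": 750_000, "angel": 0, "venture": 12_000_000},
    {"seed": 0, "angel": 0, "venture": 0},
]


def share_funded(rows, column):
    """Fraction of startups with a non-zero amount in `column`."""
    funded = sum(1 for row in rows if (row.get(column) or 0) > 0)
    return funded / len(rows) if rows else 0.0


for column in ("seed", "angel", "venture"):
    print(f"{column}: {share_funded(startups, column):.0%} of startups")
```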
Angel investment is the most difficult source of funding for a startup to obtain: angel
investments are few in number, and reaching angel investors requires strong
networking. A business incubator can act as the hub between angel investors and
startup companies.
Meanwhile, around 28% of startups receive seed funding. From an investor's point of
view, seed funding has both advantages and disadvantages. The drawback is that the
risk is higher than for investors who put money in during a VC round, since a proven
market and revenue figures are not yet there. On the other hand, investors who pick the
right startup at the seed stage earn a higher return, because they need less capital to
join the shareholder list than investors in VC rounds do. High risk, high return.
Last but not least, out of every 100 startup companies, 47 are backed by Venture
Capital institutions. Although this may make VC investment look easy to get, a startup
must be ready for the full due diligence process. If a VC firm is interested in the
proposal, the firm or the investor must then perform due diligence, which includes a
thorough investigation of the company's business model, products, management, and
operating history, among other things.
Once due diligence has been completed, the firm or the investor will pledge an
investment of capital in exchange for equity in the company. These funds may be
provided all at once, but more typically the capital is provided in rounds. The firm or
investor then takes an active role in the funded company, advising and monitoring its
progress before releasing additional funds.
The investor exits the company after a period of time, typically four to six years after the
initial investment, by initiating a merger, acquisition or initial public offering (IPO).
Startups with such an exit are labelled as successful later in this project.
4. Do the Successful Unicorns require seed funding?
Interestingly, the FANG+ companies (Facebook, Amazon, Netflix, Google, Alibaba),
today's tech titans that account for a major share of market capitalization on Wall
Street, did not need seed funding right after they were founded. They bootstrapped to
fund themselves. Dropbox and Uber (also unicorns today), on the other hand, received
seed funding of USD 200,000 or less.
5. How much do funding amounts differ between seed funding, angel investment and each
VC round?
From the dataset we can conclude that seed funding has the lowest average amount of all
funding rounds and sources, around USD 776,350, while the average angel investment
is around USD 1 Million.
The averages for the Venture Capital rounds are:
Average of Round A : USD 6.9 Million
Average of Round B : USD 13.5 Million
Average of Round C : USD 21 Million
Average of Round D : USD 28 Million
Average of Round E : USD 32 Million
Average of Round F : USD 48 Million
Average of Round G : USD 83 Million
Average of Round H : USD 175 Million
As of 2014, only 4 companies had received round H investment. Three of them are in
e-Commerce: Flipkart (India), Deem (USA) and Locondo (Japan).
The fourth is Gumi, a game company headquartered in Singapore.
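Per-round averages like those listed above can be sketched as a mean over the startups that actually received each round. Whether the source averaged over non-zero amounts only is not stated, so that choice is an assumption, as are the column names ("round_A" to "round_H") and the sample rows.

```python
from statistics import mean

# Assumed column names for the funding sources and VC rounds.
ROUNDS = ["seed", "angel"] + [f"round_{letter}" for letter in "ABCDEFGH"]


def average_nonzero(rows, column):
    """Mean of `column` over startups with a non-zero amount; None if there are none."""
    values = [row[column] for row in rows if row.get(column, 0) > 0]
    return mean(values) if values else None


# Hypothetical rows, not taken from the dataset.
rows = [
    {"seed": 500_000, "round_A": 6_000_000},
    {"seed": 1_000_000, "round_A": 0},
    {"seed": 0, "round_A": 8_000_000, "round_B": 14_000_000},
]

for column in ROUNDS:
    avg = average_nonzero(rows, column)
    if avg is not None:
        print(f"Average of {column}: USD {avg:,.0f}")
```

Averaging over all rows instead, zeros included, would give much lower figures, so the choice matters when comparing against published numbers.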
Hypothesis & Statistical Significance Testing
After exploring the data, the following hypotheses arise:
1. Being based in the USA has a strong linear relationship with being a successful
startup, since the ecosystem there is well established.
2. Startups that receive venture investment tend to become successful startups.
3. Founding year is one of the predictors of a startup's success.
Statistical Significance Testing
The hypotheses are tested for statistical significance by calculating the Pearson
Correlation and the p-value between attributes.
Correlation is a measure of the extent of interdependence between variables.
Causation is the relationship between cause and effect between two variables.
It is important to know the difference between these two and that correlation does not imply
causation. Determining correlation is much simpler than determining causation as causation may
require independent experimentation.
Pearson Correlation
The Pearson Correlation measures the linear dependence between two variables X & Y. The
resulting coefficient is a value between -1 and 1 inclusive, where:
● 1: Total positive linear correlation.
● 0: No linear correlation; the two variables have no linear relationship, although a non-linear one may still exist.
● -1: Total negative linear correlation.
P-value
The p-value is the probability of observing a correlation at least as strong as the one
measured if the two variables were in fact uncorrelated. Normally, we choose a significance
level of 0.05, which means that we accept a 5% risk of calling a correlation significant when
none exists. By convention, when:
● p-value is < 0.001: we say there is strong evidence that the correlation is significant.
● p-value is < 0.05: there is moderate evidence that the correlation is significant.
● p-value is < 0.1: there is weak evidence that the correlation is significant.
● p-value is > 0.1: there is no evidence that the correlation is significant.
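The definitions above can be sketched in plain Python. The source does not show which tool computed its values, so this is only an illustration: the p-value here uses a large-sample normal approximation to the usual t-test for a correlation, which is an assumption and is only reasonable when the number of rows is large. All function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def approx_p_value(r, n):
    """Two-sided p-value for 'no correlation', using a normal approximation (large n)."""
    if abs(r) >= 1:
        return 0.0  # perfect correlation: the test statistic is unbounded
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


def evidence(p):
    """Map a p-value to the evidence labels used in the text."""
    if p < 0.001:
        return "strong"
    if p < 0.05:
        return "moderate"
    if p < 0.1:
        return "weak"
    return "none"


# The same weak correlation is judged differently depending on sample size.
for n in (100, 10_000):
    p = approx_p_value(0.1, n)
    print(f"r = 0.10, n = {n}: evidence = {evidence(p)}")
```

The loop illustrates why a very small coefficient can still be highly significant in a dataset with tens of thousands of rows: significance describes how unlikely the result is under no correlation, not how strong the relationship is.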
A new ‘target’ variable is created to flag whether a startup is successful. As mentioned in the
first section, successful startups are those that have a Merger & Acquisition record or have
become public companies.
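A binary target of this kind could be derived as in the sketch below. The source does not list the raw values stored in the dataset, so the "status" field and the labels "acquired" and "ipo" are hypothetical stand-ins for an M&A record and a public listing.

```python
# Hypothetical status labels standing in for "M&A record" and "went public".
SUCCESS_STATUSES = {"acquired", "ipo"}


def make_target(status):
    """Return 1 for a successful startup (M&A or IPO), else 0."""
    return int((status or "").strip().lower() in SUCCESS_STATUSES)


# Hypothetical rows, not taken from the dataset.
startups = [
    {"name": "A", "status": "acquired"},
    {"name": "B", "status": "operating"},
    {"name": "C", "status": "IPO"},
    {"name": "D", "status": None},
]

for row in startups:
    row["target"] = make_target(row["status"])

print([(row["name"], row["target"]) for row in startups])
```

Missing statuses are treated as not successful here, which is a design choice; dropping such rows instead would be equally defensible.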
The result below shows that the linear relationship between being a USA startup and being a
successful startup is extremely weak (~0.09). Since the p-value is < 0.001, the correlation
between country code USA and becoming a successful startup is statistically significant, but
its weakness means that building a startup in the USA does not by itself make it successful.
The same holds for the relationship between getting venture investment and becoming a
successful startup: since the p-value is < 0.001, the correlation is statistically significant, but
the linear relationship is weak (~0.12), meaning that venture investment is only slightly
associated with becoming a successful startup.
Hence we conclude that Hypotheses 1 and 2 are not supported.
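One way to see how weak these relationships are is to square the coefficients: for a simple linear relationship, r squared is the share of variance in one variable that a linear fit on the other accounts for. The sketch below applies this to the two approximate coefficients reported above; the function name is hypothetical.

```python
def variance_explained(r):
    """Share of variance accounted for by a simple linear fit (r squared)."""
    return r * r


# Approximate coefficients as reported in the text.
results = [("USA vs. successful startup", 0.09),
           ("venture vs. successful startup", 0.12)]

for label, r in results:
    print(f"{label}: r = {r:.2f}, r^2 = {variance_explained(r):.4f}")
```

Both values are well under 2%, which is consistent with reading the correlations as statistically significant but practically weak.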
Suggestion & Summary
There are several suggestions for taking the analysis further:
1. Use the most recent dataset, from Q3 2020, to see how startup dynamics changed
during the coronavirus pandemic. This could serve as a basis for the Government to
mitigate the failure of promising startup companies.
2. Predict which startup companies have a high probability of succeeding (defined by
Merger & Acquisition or IPO) using a Machine Learning classification model.
Summary
The dataset covers startups, their corporate actions and the investment they obtained. It is
sourced from Crunchbase in CSV format and contains startup companies worldwide recorded
from 1902 until 2014. It consists of 54,294 entries (rows) with 38 attributes of mixed data
types (objects and floats) and quite messy formatting. Because of this limitation, several data
cleansing steps are required before conducting further analysis.
For further research, it would be valuable to include variables that are likely to correlate with
the target, such as the founders' years of experience, the number of patents, copyrights or
other goodwill of the company, the founders' grit score, customer count, etc. This would show
which of them are associated with more successful startups, and could in turn improve the
training data for Machine Learning prediction.