2. Thesis
1. Recent software and hardware advancements have made large datasets easier to collect and analyze; firms are finding new datasets and new ways to apply the resulting insights, especially in the insurance, lending, and hiring sectors
2. In lending, creditors can better understand applicant risk by analyzing non-traditional datasets and can use this information to target underserved potential borrowers or to reduce the interest rates charged to existing borrowers
3. In insurance, new data allows insurers to better understand the people or property being insured, enabling better risk management (such as improved preventative healthcare) and more efficient pricing of insurance products
4. In jobs and hiring, alternative datasets give employers valuable insights about an applicant from behavioral and social information, rather than relying only on static, structured indicators of past job and school performance
5. Startups can succeed in niche segments by building scalable products that rely on previously unused or unobserved datasets; incumbents need to leverage their already large customer bases to collect new data while preventing customer attrition
3. Advancements in Data Collection and Analysis
Smartphones, Wearables and Internet-of-Things (IoT)
Smartphones and Wearables
• Location data can be collected in real time by smartphones or automobiles, as well as through POS systems and APIs provided by credit card networks (e.g., Mastercard's Locations API)
• This can help businesses provide relevant services by
understanding the locations a customer frequents
• Medical and fitness data is continually recorded through
motion and health sensors built into devices
• Doctors can monitor health markers like heart rate in
real time as opposed to traditional static readings
• Insurance companies can dynamically adjust pricing
and better understand their liabilities using this data
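The bullets above note that location data can reveal the places a customer frequents. Here is a minimal sketch of one way to do that, assuming pings arrive as (latitude, longitude) pairs: snap each ping to a coarse grid cell by rounding, then count pings per cell. The rounding precision and the sample coordinates are illustrative assumptions. A production system would more likely use dedicated spatial clustering.

```python
from collections import Counter

Ping = tuple[float, float]  # (latitude, longitude) in decimal degrees


def frequent_places(pings: list[Ping], precision: int = 3, top_n: int = 3) -> list[tuple[Ping, int]]:
    """Return the grid cells with the most pings, busiest first.

    Rounding both coordinates to `precision` decimal places buckets nearby
    pings into the same cell; 3 places is roughly a 100 m cell (an
    illustrative choice, not a standard).
    """
    cells = Counter((round(lat, precision), round(lon, precision)) for lat, lon in pings)
    return cells.most_common(top_n)


if __name__ == "__main__":
    pings = [
        (37.77491, -122.41941),  # hypothetical pings near one location
        (37.77493, -122.41939),
        (37.77488, -122.41944),
        (37.80420, -122.27120),  # a one-off visit elsewhere
    ]
    for cell, visits in frequent_places(pings, top_n=2):
        print(cell, visits)
```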
Internet-of-Things (IoT)
• Enterprise IoT sensors on machinery and other equipment can help manufacturing companies examine their supply chains end to end and lower their costs
• Consumer IoT devices such as smart cars, thermostats and
motion sensors collect time and location data regarding sleep,
movement, work and activity among other everyday tasks
• This data can provide businesses such as e-commerce
companies and advertisers a more complete picture of
the lifestyle, habits and preferences of an individual
• Businesses can use this data for better targeted
advertising, dynamic pricing and promotions based on
variability in an individual consumer’s preferences and
demand over time-of-day or over longer periods
Social Data
Social Data of Individuals
• Advancements in text, speech and image analytics using natural
language processing and artificial intelligence provide
businesses with several tools to analyze social media data
• This can give businesses unique insights into a person's activities and personality, which is especially significant for recent graduates and lower-income individuals, about whom traditional channels have collected relatively little data
• Examples:
• Alternative lenders can evaluate credit risk by analyzing
one’s social media activity and immediate social network
as well as by using social finance apps like Venmo to get
a non-traditional view into a user’s expenditures
• Life and health insurers can use social data to adjust pricing based on a person's lifestyle and eating habits
Social Data of Businesses
• Social data is also gaining prominence as a barometer for
general sentiment surrounding businesses
• Key data sources include number of social media followers of a
company, online posts of customers as well as employees about
the company and direct online interactions with customers
• This data can be analyzed to gain insight into employee and customer satisfaction with a company and can potentially be used to evaluate its financial stability and the price of its equity
• Example: Buffalo Wild Wings’ Q3’15 decline in profitability was
closely matched by a decline in tweets related to the company
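The Buffalo Wild Wings example rests on how closely two time series move together. A minimal way to quantify that is the Pearson correlation coefficient, computed from scratch below. The quarterly tweet counts and profit figures in the demo are hypothetical placeholders, not actual Buffalo Wild Wings data. A high correlation on its own also does not show that tweet volume predicts profitability.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length series (e.g. tweets vs. profit per quarter)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two spread terms (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical quarterly figures, for illustration only.
    tweets_thousands = [120.0, 135.0, 128.0, 95.0]
    profit_millions = [30.0, 33.0, 31.5, 24.0]
    print(f"correlation: {pearson(tweets_thousands, profit_millions):.2f}")
```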
4. Advancements in Data Collection and Analysis
Source: Frost & Sullivan, Cisco, Wikibon
Global Big Data Market (billions of USD): 2011: 7.6 | 2013: 19.6 | 2015: 33.31 | 2017: 43.4 | 2019: 55.2
Data Analysis
Big Data Analytics
• Modern Big Data software distributes datasets and processing functions across many machines that work on the task in parallel, reducing inefficiencies and calculation time
• Recognition of patterns within the abundance of data
collected, often using machine learning algorithms, is key to
making the data actionable for businesses
• Example: Treato, a social health startup, utilizes machine
learning to identify drug side-effects and prescription patterns
using data from social networks and patient health forums
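The "split the data across many machines, process the pieces in parallel, then combine" pattern described above is often called map-reduce. The sketch below imitates it on a single machine. Threads stand in for cluster nodes: each one counts words in its own chunk (map), and the partial counts are then merged (reduce). This only illustrates the pattern. It is not how Hadoop or Spark are actually invoked, CPython threads will not speed up this CPU-bound work, and the sample posts are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines: list[str]) -> Counter:
    """Map step: count words within one chunk of the dataset."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines: list[str], n_chunks: int) -> list[list[str]]:
    """Partition the dataset so each worker gets a roughly equal share."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_word_count(lines: list[str], n_workers: int = 4) -> Counter:
    chunks = split_into_chunks(lines, n_workers)
    # Threads stand in for separate machines in this single-process sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    # Reduce step: merge the partial results into one total.
    return reduce(lambda a, b: a + b, partial_counts, Counter())


if __name__ == "__main__":
    posts = [
        "side effect headache after new prescription",
        "no side effect so far",
        "headache and nausea reported",
    ]
    print(parallel_word_count(posts, n_workers=2).most_common(3))
```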
Examples of Powerful Big Data Software
• Apache Hadoop – Software that uses parallel execution frameworks to process large persisted (stored) datasets
• Apache Spark – Similar to Apache Hadoop, but processes data in memory to reduce latency
• Apache Storm – Used for analysis and filtering of streaming data (rather than persisted datasets)
• HPCC Systems – Parallel-processing computing platform that supports flexible cloud deployment
• GridGain – Software specialized for transactional and analytical processing (the main uses of Big Data)
• Mesosphere DCOS – Software that consolidates resources across a distributed system for physical and virtual applications
• Concord.IO – Used for real-time data processing like Apache Storm, but with added speed improvements
Global Data Traffic (exabytes): 2011: 20.0 | 2013: 32.8 | 2015: 72.4 | 2017: 109.0 | 2019: 168.0
Global data traffic has doubled in the last two years alone and is forecast to double again by 2019
With rising demand for data analytics, the global big data market is expected to surpass $50B by 2019
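To check the growth claims against the chart figures on this slide, the sketch below computes growth multiples and a compound annual growth rate (CAGR). The dictionaries simply transcribe the charted values. The function names are illustrative.

```python
# Values transcribed from the slide's charts (source: Frost & Sullivan, Cisco, Wikibon).
BIG_DATA_MARKET_USD_B = {2011: 7.6, 2013: 19.6, 2015: 33.31, 2017: 43.4, 2019: 55.2}
DATA_TRAFFIC_EB = {2011: 20.0, 2013: 32.8, 2015: 72.4, 2017: 109.0, 2019: 168.0}


def growth_multiple(series: dict[int, float], start: int, end: int) -> float:
    """How many times larger the value in `end` is than in `start`."""
    return series[end] / series[start]


def cagr(series: dict[int, float], start: int, end: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    years = end - start
    return growth_multiple(series, start, end) ** (1 / years) - 1


if __name__ == "__main__":
    print(f"Traffic 2013->2015: {growth_multiple(DATA_TRAFFIC_EB, 2013, 2015):.2f}x")
    print(f"Traffic 2015->2019: {growth_multiple(DATA_TRAFFIC_EB, 2015, 2019):.2f}x")
    print(f"Big data market CAGR 2015-2019: {cagr(BIG_DATA_MARKET_USD_B, 2015, 2019):.1%}")
```

With these figures, traffic grows about 2.2x from 2013 to 2015 and about 2.3x from 2015 to 2019, consistent with the doubling claims. The big data market's implied CAGR from 2015 to 2019 is roughly 13.5%.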
5. Significance of Alternative Data Sets
Industry Application | Major Tasks Requiring Data | Traditional and Alternative Data Sets
Employee Evaluation, Compensation and Hiring | Employee Performance Evaluation, Evaluation and Hiring of Job Applicants, Wage Determination | Performance Data, Sales Data, Employee Survey Data, Social Media Data, Wage, Attrition & Revenue Analytics
Insurance | Evaluation of Financial Status of Applicant, Calculation of Probability of Claims, Matching Timing of Assets and Liabilities | Social Media Data, Medical Records, Wearable Device Data, Auto Records and Driver Tracking Data
Supply Chain | Planning and Scheduling, Purchase and Inventory Optimization, Demand Responsiveness | Real-time Inventory and Supplies Data, IoT Sensor Data from Machinery and Other Moving Equipment
Text Analytics | Customer Relationship Management, Competitive Business Intelligence, Brand Reputation Awareness | Customer Survey Data, Social Media Data for Individuals and Businesses
Alternative Lending | Identity Verification, Evaluation of Credit Risk, Determination of Ideal Lending Structure and Terms for Specific Borrowers | Social Media Data, Earnings & Spending Data, Personal Background Data, Expected Career Path Information
6. Emerging Uses of Alternative Data Sets
Industry Application | Example Use Cases | Emerging/Potential Use Cases
Employee Evaluation, Compensation and Hiring | Visier uses a cloud-based platform to aggregate employee data and provide predictive analytics on issues such as employee attrition | Speech and image recognition that analyzes qualitative metrics such as confidence, tone of voice, posture, and body language can help companies automate parts of the hiring process to reduce costs
Insurance | Metromile uses in-car hardware to monitor driving habits and evaluate the safety of its policyholders; premiums are adjusted based on driver performance and charged per mile driven | Health insurers can use wearable, sleep, and mobile data to get a more complete understanding of a policyholder's lifestyle and better anticipate the timing of their claims
Supply Chain | Sight Machine has developed tools specifically designed to aggregate and analyze data generated by factory sensors, machines, cameras, PLCs, and robots | Manufacturing equipment can be fitted with sensors that report on the quality of its own operation and of the employee operating it, to optimize task allocation and performance
Text Analytics | Clarabridge uses machine learning and natural language processing to aggregate and analyze customer survey responses, helping businesses process and act on feedback | Text analytics can be used to evaluate the content of social media posts, with uses in insurance, lending, employee evaluation and hiring, and several other areas
Alternative Lending | Earnest and SoFi use data on career prospects, earnings, and savings history to evaluate borrowers; Trustingsocial focuses on social data to determine rates in emerging markets | Lenders can use social media and location data to learn consumers' spending locations and habits and better evaluate credit risk based on estimated expenditures
7. Innovative Applications of Collected Data
Company | Funding | Business Focus | Innovative Use of Data
Earnest | $24.1 million | Alternative Lending | Evaluates credit risk using savings habits, educational background, and career path in addition to financial history and income
SoFi | $1.8 billion | Alternative Lending | Sets interest rates based on future earnings, evaluated using career experience, monthly income vs. expenses, and education
Trustingsocial | Undisclosed | Alternative Lending | Evaluates consumer credit risk in emerging markets by analyzing social, web, and mobile data using machine learning
CloverHealth | $100 million | Health Insurance | Health insurer focused on analyzing patient data to optimize preventative care measures, improving health outcomes and profitability
Affirm | $320 million | Online Purchase Financing | Instant credit for online purchases, with interest rates based on traditional metrics as well as social media data
8. Applications of Previously Unobserved Data
Company | Funding | Business Focus | Innovative Use of Data
ProducePay | Undisclosed | Agricultural Lending | Collects and uses agricultural inventory data to provide next-day loans to farmers, using the produce they ship as collateral
PlaceIQ | $27.0 million | Location Data Service | Uses location-tracking data to help companies obtain a spatial understanding of consumers' digital activity
Metromile | $14.0 million | Automobile Insurance | Pay-per-mile car insurance with pricing determined by an in-car device that tracks driver habits and safety
Feedzai | $26.1 million | Fraud Detection | Uses machine learning and behavioral analysis of consumer purchasing data to identify potentially fraudulent transactions
DataWallet | $320 million | Online Marketplace for Data | Helps match companies' specific data needs by compensating consumers for sharing their data
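Feedzai is described as applying machine learning and behavioral analysis to purchase data. The sketch below is a much simpler stand-in for that idea, not Feedzai's actual method. It flags a transaction whose amount sits far outside a customer's own purchase history, measured as a z-score. The 3-standard-deviation threshold and the minimum-history rule are assumptions.

```python
from statistics import mean, stdev


def flag_unusual(history: list[float], new_amount: float, threshold: float = 3.0) -> bool:
    """Flag a purchase that deviates sharply from this customer's own past behavior.

    Uses a z-score against the customer's history; `threshold` is an assumed cutoff.
    """
    if len(history) < 2:
        return False  # not enough behavior to judge; stdev needs two points
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # A perfectly constant history: any different amount counts as a deviation.
        return new_amount != mu
    return abs(new_amount - mu) / sigma > threshold


if __name__ == "__main__":
    past = [20.0, 25.0, 22.0, 18.0, 24.0]  # hypothetical purchase amounts
    for amount in (23.0, 500.0):
        print(amount, "flagged" if flag_unusual(past, amount) else "ok")
```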
10. Lending – Simplified Process Map
Key data buckets and metrics in the current lending landscape
Traditional loans
Business or Individual Seeks Traditional Loan
Traditional Credit Analysis
• Credit score based on past spending and borrowing habits
• More comprehensive reporting expectations for businesses' financial data
Bank or Other Lending Institution Analyzes Creditworthiness
• Historical spending and income data used to extrapolate future ability to make contractual payments for individuals and businesses
Feedback: Write-Offs Drive Refinement
• Feedback about a lender's credit analysis model is based on past losses
• Little analysis beyond changes in reported financials
'Tech' loans
Individual Seeks 'Tech' Loan
Aggregates Credit Data
• Existing tech-enabled lending platforms request a variety of financial, career-related, and personal data
• Data is collected in the application, with minimal monitoring afterward
Individual Lender or Market for 'Tech' Loans Analyzes Creditworthiness
• Individual or platform providing the loan assesses the data provided
• In many cases, personal data is used to verify creditworthiness
Feedback: Platform Performance History
• Some tech-enabled lending platforms provide historical data about loan performance based on their assigned ratings
11. Lending – New Datasets
Dataset | Description | Source of Data | Merits | Challenges
Social media connectivity and popularity | Social networks are used to hold individuals accountable to others and to judge the responsibility of a potential borrower; those with creditworthy friends may be more creditworthy | Social media data from sites like Facebook, Twitter, Instagram, and others | Publicly available data is easy to access and analyze | May be seen as invasive of personal privacy; inferences could be misleading
Smartphone usage and location data | Devices are used to track leisure habits and spending by location and product category, which could help determine a borrower's expenditures and thus creditworthiness | Smartphones, GPS devices, credit card spending data | The growing popularity and functionality of smartphones make data accessible | Developing a usable model based on location and leisure data is challenging; there could also be regulatory challenges
Social media and employment data | A better understanding of how individuals are linked socially as well as professionally could create opportunities to link people in a network for loans and potential partnerships | Cross-referencing social connectivity data from social media sites with employment data | Introduces a social aspect to business lending that strengthens the incentive to repay | Regulatory concerns; borrowers' desire to separate professional and social lives
Online data about a region's economic activity and cost of living | Social media indicators of regional employment, population, and cost of living provide immediate indicators of job security and expenditures of borrowers in the region | Social employment data, social media text analytics, and credit card company data used to determine macro indicators | Information is easily accessible and provides a more immediate regional view | Data may not be very in-depth, and there are no required reporting standards
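The first row's premise, that people with creditworthy friends may themselves be more creditworthy, can be illustrated by blending an applicant's own score with the average score of their direct connections. Everything in the sketch is hypothetical: the 0-to-1 score scale, the `network_weight` of 0.3, and the names. As the Challenges column warns, such inferences could be misleading, so this sketches the idea rather than a recommended underwriting model.

```python
from statistics import mean


def connections(person: str, friendships: list[tuple[str, str]]) -> set[str]:
    """Direct connections of `person` in an undirected friendship list."""
    return {b for a, b in friendships if a == person} | {a for a, b in friendships if b == person}


def network_adjusted_score(
    person: str,
    base_scores: dict[str, float],
    friendships: list[tuple[str, str]],
    network_weight: float = 0.3,
) -> float:
    """Blend a person's own score with the mean score of their scored connections.

    `network_weight` (an assumption) is the share of the result drawn from the network.
    """
    friend_scores = [base_scores[f] for f in connections(person, friendships) if f in base_scores]
    if not friend_scores:
        return base_scores[person]  # no usable network signal: keep the person's own score
    return (1 - network_weight) * base_scores[person] + network_weight * mean(friend_scores)


if __name__ == "__main__":
    scores = {"ana": 0.55, "ben": 0.90, "cai": 0.80, "dev": 0.40}  # hypothetical
    links = [("ana", "ben"), ("cai", "ana"), ("dev", "eve")]
    for name in scores:
        print(name, round(network_adjusted_score(name, scores, links), 3))
```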
12. Insurance – Simplified Process Map
Key data buckets and metrics in the current insurance landscape
Property & casualty insurance
Property & Casualty Applicant
Property-Linked Data
• Age, Location
• Property Condition Survey
• Owner Records
Driver-Linked Data
• Insurance records
• Make and model of car
• Primary car use reasons
Property & Casualty Insurer Collects Property/Driver-Linked Data
• Historical data used to set pricing for premiums
• Minimal thresholds determine eligibility for insurance coverage
Feedback: Data is Mostly Static
• Pricing is adjusted only in the case of an event/accident
• Adjustments are made only after a reported incident, creating a lag between dangerous behavior and adjustment
Life insurance
Life Insurance Applicant
RX Lookups, Personal Health through Fluids Testing
• Disjointed data from a mix of self-reported and poorly organized health records
• Time-consuming reporting process involving significant patient input and effort
Life Insurer Analyzes Prescription Data
• Algorithms based on historical data used to set premiums
• Regulations greatly restrict the type and amount of pricing differences
Feedback: Static, Regulated Feedback
• Prescription data is only updated when there is a recorded visit
• No optimization of (or immediate feedback on) lifestyle choices
13. Insurance – New Datasets
Dataset | Description | Source of Data | Merits | Challenges
Social media and text-based analytics data | Text-based analytics of content such as social media posts helps insurers determine riskiness, aggression, or other factors that could affect insurability | Social media websites and applications | Assess the underlying riskiness and aggressiveness of all types of policyholders | Invasive of applicants' privacy and may produce
In-vehicle real-time location and performance data | Real-time location and performance data allows more precise pricing based on specific driver behaviors and travel through especially dangerous areas or road sections | OBD-II sensors and, eventually, manufacturer-installed native vehicle devices | Real-time data and geographic overlays allow precise risk adjustments | Manufacturer-installed devices reduce the user input needed but raise privacy concerns
Quantified-self data about biological factors | Data from wearable devices, smart appliances, and purchase histories provides feedback about lifestyles and allows insurers to better understand their liability pools using predictive analytics | Wearable devices, IoT sensor-equipped devices (smart beds, etc.), financial records | Real-time data can help policyholders better understand lifestyle choices and allows pricing to be adjusted | Regulators and users may not be comfortable sharing and using personal data
Smart pills and medicinal intake data | Information about drug intake allows insurers to reward patients for sticking with prescribed medical regimens and to alert care providers when patients deviate from them | Sensor-equipped drug delivery units, smart pill boxes that track intake | Minimally intrusive monitoring allows insurers to reward those who stick to medication regimens | Synchronizing insurers with prescription and device data; data use requires explicit user consent
14. Insurance – New Datasets cont’d
Dataset | Description | Source of Data | Merits | Challenges
Active or passive monitoring of property and environment | Data collected from sources such as drones, satellite imaging, and weather probes could provide immediate feedback about the status or risks of insured properties | Camera-equipped drones, imaging satellites, weather satellites and probes | Real-time updates of property risks and analysis of potential losses | Active monitoring with drones or video may be seen as overly intrusive
Purchases and receipt history | Data about previous purchases from credit card receipts could be used to validate claims for lost property and the value of those claims | Credit card or mobile payment histories and receipts | Easily verifiable data with specific pricing information | Requires coordination with transaction service companies; consumer privacy concerns
15. Jobs & Hiring – Simplified Process Map
Key data buckets and metrics in the current jobs & hiring landscape
Internal candidates
Internal Job Applicant
Employee Data
• Sales record
• Client relationships
• Past performance evaluations
• Reputation amongst colleagues
Hiring Manager Makes Decision Based on Proprietary Data
• Employee data is analyzed to see if the employee is fit for promotion
• Proprietary data allows for more in-depth knowledge of the applicant
Feedback: Updated Regularly
• Employee metrics are often updated on fixed schedules, e.g., quarterly sales numbers and mid-year evaluations
• Some of this data is subjective
External candidates
External Job Applicant
External Data
• Resume
• Referrals
• Body language during in-person interview
• Performance on an assessment (if given)
Hiring Manager Makes Decision Based on External Data
• Must predict the applicant's aptitude based solely on external data
• Riskier, since the applicant has not previously worked at the company
Feedback: Inherently Static
• Resumes can be out of date by the time the applicant is interviewed
• Referrals offer only a glimpse into historical performance and may not predict future performance
16. Jobs & Hiring – New Datasets
Dataset | Description | Source of Data | Merits | Challenges
Social media and text-based analytics data | Text-based analytics of content such as social media posts allows employers to assess an applicant's personality and whether it suits the job | Social media websites and applications | Assess the personality of applicants and determine fit | Data quality varies significantly by user
Smartphone productivity data | Data on time spent in different apps, coupled with general organization patterns, helps determine whether an applicant will bring these skills (or lack thereof) to the job | Smartphone and specific app usage data | Ties into the key functions of many employees | Would be considered an invasion of privacy without permission
Algorithmic job tests | Pre-employment job tests that select candidates algorithmically based on their responses have been shown by NBER research to result in hires who stay with the company longer and are more productive | Generated by the job applicant when filling out the pre-employment test | More accurate than humans in predicting future tenure and productivity of employees | "Algorithm aversion" (trusting human instincts over computers)
Body language and voice | Cameras help recognize nuances in both body movements and vocal inflection, picking up on subtle cues of the limbic system that may be more honest than the words spoken by the applicant | Camera (via the applicant's computer or placed at the interview site) and software to analyze the audio/video | Data reveals a lot about the applicant in a standardized fashion | Candidates need to be comfortable with being recorded; requires specific technology
18. Case Study: SoFi and Even
SoFi
Established student loan refinancer
Background
Location & HQ: San Francisco, CA
Funding: $1.37B in 6 rounds from 19 investors
Business Description
Leading online lender and the #1 provider of student loan refinancing, with over $7 billion lent to date
Alternative Pricing Data Application
• Uses non-traditional information, including education and employer data, to look at 'where you are today' and 'where you're headed' and potentially offer lower rates to students
• Offering more products to existing customers, instead of widening the customer base by loosening credit standards, decreases acquisition costs and gives SoFi a reliable history of repayment data on borrowers
Predictive data: less risky student loans allow for lower-interest student financing
Even
Early-stage startup with many backers
Background
Location & HQ: Oakland, CA
Funding: $1.5M in 1 round from 13 investors
Business Description
Automatically manages your personal bank account by making interest-free loans when pay is below average and setting aside savings when pay is above average
Alternative Pricing Data Application
• Analyzes bank deposits to determine the average paycheck over the past 6 months
• The algorithm gives more recent paychecks greater weight and analyzes expenses to determine required weekly income
• Spending and income risk analysis allows Even to make short-term interest-free loans to make up for lower weekly paychecks
Income & spending data: low-risk interest-free loans to smooth personal income
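Even's approach, as described above, weights recent paychecks more heavily when estimating a baseline and then lends or saves the difference. Here is a minimal sketch under stated assumptions. It uses linearly increasing weights (oldest paycheck weight 1, newest weight n) and a 26-paycheck window standing in for roughly six months of weekly pay, and it does no expense analysis. The source does not describe Even's actual weighting, and the function names are hypothetical.

```python
def weighted_average_pay(paychecks: list[float], window: int = 26) -> float:
    """Recency-weighted average of the last `window` paychecks (oldest first).

    Assumed scheme: the i-th paycheck in the window gets weight i + 1, so the newest counts most.
    """
    recent = paychecks[-window:]
    if not recent:
        raise ValueError("need at least one paycheck")
    weights = range(1, len(recent) + 1)
    return sum(w * p for w, p in zip(weights, recent)) / sum(weights)


def smooth_paycheck(paychecks: list[float], new_pay: float) -> tuple[str, float]:
    """Return ('advance', amount) when pay falls short of the baseline, else ('save', amount)."""
    baseline = weighted_average_pay(paychecks)
    if new_pay < baseline:
        return "advance", baseline - new_pay  # interest-free top-up to the baseline
    return "save", new_pay - baseline  # surplus set aside to repay or save


if __name__ == "__main__":
    history = [620.0, 580.0, 700.0, 540.0, 660.0]  # hypothetical weekly paychecks
    print(round(weighted_average_pay(history), 2))
    print(smooth_paycheck(history, 450.0))
```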
19. Case Study: ProducePay & Mighty
ProducePay
Early stage agricultural finance startup
Background
Location & HQ: Glendale, CA
Funding: Undisclosed amount; 2 rounds, 7 investors
Business Description
Provides inventory management and cash flow solutions to farmers, allowing them to receive credit soon after shipment
Alternative Pricing Data Application
• Provides an online inventory management platform to buyers and sellers of produce that allows ProducePay to track farming, production, location, and inventory data
• ProducePay uses this platform to track when a non-US farmer's produce reaches the US, and thus arbitrages credit risk by lending to non-US farmers against their US assets (the US-based produce)
Production and consumption data helps de-risk international agricultural financing
Mighty
Early stage legal finance startup
Background
Location & HQ: New York, NY
Funding: $5.25 million Series A
Business Description
Online marketplace that enables plaintiffs to access a portion of their future settlement to alleviate legal costs
Alternative Pricing Data Application
• Analyzes historical financial performance, credit ratings, attorney's peer review rankings, and firm performance
• Provides an enhanced perspective on an applicant and potential settlement to reduce financing risk
• Allows plaintiffs to bring better-funded cases against defendants by using potential settlement gains immediately
Analysis of legal data allows for lower-risk, lower-interest litigation financing
20. Case Study: Square & Metromile
Square
Public transaction services company
Background
Location & HQ: San Francisco, CA
Funding: Public company (NYSE: SQ)
Business Description
Offers full POS hardware and software capable of credit card transactions and inventory accounting, with expansion into cash transaction services
Alternative Pricing Data Application
• Uses a proprietary database of transaction volume from its POS devices to develop inventory and sales management software
• The P2P payment service Square Cash and the short-term business loan service Square Capital use the proprietary database to manage risk
• Charges a percentage of the amount transacted across all services and products offered
Proprietary transaction database reduces the risk of making short-term business loans
Metromile
Early stage auto-insurance company
Background
Location & HQ: San Francisco, CA
Funding: $14M in 2 rounds from 5 investors
Business Description
Insures vehicles by charging a base rate premium plus a per-mile charge, and monitors vehicle health and local driving hazards using the vehicle's OBD-II port
Alternative Pricing Data Application
• Per-mile insurance plans are a new way of pricing auto insurance, allowing drivers who use their vehicles less to save dramatically
• Monitoring services allow Metromile to help keep drivers safe and reduce policy outlays
• As cars are used less and shared more, flexible pricing options like Metromile's become more important
Per-mile plans and vehicle monitoring make insurance flexible and preventative
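Metromile's base-plus-per-mile structure is easy to express directly, and doing so shows why low-mileage drivers save. The rates below ($40 base, $0.05 per mile, and a $100 flat comparison premium) are hypothetical placeholders, not Metromile's actual prices. The function names are also illustrative.

```python
def per_mile_premium(base_rate: float, per_mile_rate: float, miles: float) -> float:
    """Monthly premium = fixed base rate + per-mile charge (rates are hypothetical)."""
    if miles < 0:
        raise ValueError("miles cannot be negative")
    return round(base_rate + per_mile_rate * miles, 2)


def breakeven_miles(flat_premium: float, base_rate: float, per_mile_rate: float) -> float:
    """Monthly mileage at which the per-mile plan costs the same as a flat plan."""
    return (flat_premium - base_rate) / per_mile_rate


if __name__ == "__main__":
    BASE, PER_MILE, FLAT = 40.0, 0.05, 100.0  # hypothetical dollar amounts
    for miles in (200, 800, 1500):
        print(miles, "miles ->", per_mile_premium(BASE, PER_MILE, miles), "vs flat", FLAT)
    print("break-even at", round(breakeven_miles(FLAT, BASE, PER_MILE)), "miles/month")
```

With these placeholder rates, the break-even point is 1,200 miles per month. A driver below that mileage pays less than the flat premium.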
22. Incumbents vs. Startups
Discussion
Target Markets
• Incumbents may be less concerned with new startups and more concerned with existing competitors adopting new technologies
• Startups will tend to target new consumers or specific niches of bigger industries
• Competitive landscapes may be able to support both incumbents and startups if there isn't much direct competition
• However, consolidation through mergers and startup acquisitions may make the industry more competitive
Network Effects
• Incumbents can leverage large existing customer bases
• Startups can develop new product features with the explicit goal of achieving network effects, perhaps by trying to 'own' the customer by providing several additional services
• Networked markets demand high capital investment and tend to create winner-takes-all marketplaces
Ease of Integration
• Incumbents' customers may be unwilling to redefine how they engage with the company
• Startups can explicitly develop products that ease data collection and customer use and appeal to the millennial generation
• Ease of collection is critical for generating robust, unbiased datasets
Private Data Security
• Incumbents are already trusted with personal data, and many have established security systems
• Startups may struggle with the high fixed costs of implementing security measures
• Association with data security is crucial for brand image
23. Key Determinants of Success - Startups
Determinant | Description | Merits | Challenges
Novel Data | Must use data that was either previously unobservable and is valuable in analysis, or previously observable and valuable but unused | Using new datasets can provide more accurate risk measurement, which can translate to lower rates for customers | Identifying useful data is difficult, and developing analysis tools that yield new insights is costly
Customer Ownership | Providing additional services and creating high switching costs will help startups retain customers and get full value from customer acquisition expenses | Retaining customers builds a large network of data and optimizes acquisition costs | Building additional products is costly; switching costs reduce customer satisfaction
Competitive Pricing Capability | Startups can leverage new datasets to provide services similar to incumbents' at reduced rates | Startups can capture market share from incumbents through lower pricing | If replicable, creates a race to the bottom with continually decreasing prices over time
24. Key Determinants of Success - Incumbents
Determinant | Description | Merits | Challenges
Switching Costs | Incumbents with large customer bases may find it more economical to develop switching costs than to develop or acquire products to compete with new entrants | More economical than developing or acquiring a new product or service | Reduces customer satisfaction; yields fewer customer acquisitions than new products
Internal R&D Capabilities & Cost | The ability to integrate new datasets with existing products and customers reduces the development and integration risks associated with M&A | Using existing resources requires less capital investment | Internal development may not succeed; opportunity cost of not spending more on existing segments of the business
Acquisitions | Purchasing other companies is an easy and popular way for incumbents to achieve novel data gathering and analysis capabilities | Forgoes the risk of experimental internal development not succeeding | Expensive; integration issues; regulatory hurdles
26. New Entrants
Company | Description | Funding | Background
Blue Shift | Re-imagining how businesses engage users to make them frequent customers, automating segment-of-one marketing | Raised $10.6M in 2 rounds from 4 investors; backed by NEA, Nexus, Great Oaks | Silicon Valley, CA; founded in 2014; CEO: Mehul Shah
Node.io | Uses online data to understand relationships between people, companies, and keywords | Raised $8.3M in 2 rounds; investors include NEA, Avalon, Canaan Partners | San Francisco, CA; still in stealth mode; CEO: Falon Fatimi
Tamr | Enterprise data unification software that integrates data for business analytics | Raised $41.2M in 4 rounds from 7 investors; backed by Google Ventures and NEA | Cambridge, MA; founded in 2013; CEO: Andy Palmer
FiveTran | Zero-configuration data integration: data connectors that extract data from diverse cloud and database sources and load it into the Amazon Redshift data warehouse | Raised an undisclosed amount in 2 rounds from 2 investors; backed by Y Combinator | San Francisco, CA; founded in 2012; CEO: Taylor Brown
27. New Entrants
Company | Description | Funding | Background
DataHero | Cloud-based service that collects data from disparate sources and presents an easy-to-use dashboard for professionals with a range of backgrounds and expertise | Raised $10.3M in 3 rounds from 7 investors; backed by Foundry Group | San Francisco, CA; founded in 2011; acquired by Cloudability in 2016
Kyvos Insights | Developed online analytical processing software for interactive, multidimensional analysis of structured and unstructured Hadoop data | Raised an undisclosed amount from undisclosed investors | San Jose, CA; founded in 2012; exited stealth mode in June 2015
ThoughtSpot | Gives users access to a range of data analytics through a simple search interface | Raised $40.7M in 2 rounds from 6 investors; backed by Lightspeed, Khosla | Palo Alto, CA; founded in 2012; CEO: Ajeet Singh
Arcadia Data | Visual analytics software that overcomes traditional challenges with Hadoop data by using Hadoop as its operating system | Raised $11.5M in 1 round from 3 investors; backed by Intel, Mayfield, and Blumberg | San Mateo, CA; founded in 2012; CEO: Sushil Thomas
28. New Entrants
Company | Description | Funding | Background
Interana | Event-based software that analyzes streaming data to understand customers and product usage | Raised $28.2M in 2 rounds from 8 investors; backed by Index, Battery Ventures | Redwood City, CA; founded in 2013; CEO: Ann Johnson
Looker | SaaS company providing embeddable analytics software that unifies data from multiple sources | Raised $96M in 4 rounds from 6 investors; backed by Kleiner Perkins, First Round | Santa Cruz, CA; founded in 2011; CEO: Frank Bien
AtScale | Software that allows commonly used business intelligence tools to access data in Hadoop clusters | Raised $9M in 2 rounds from 4 investors; backed by XSeed, UMC, Storm, AME Cloud | San Mateo, CA; founded in 2013; CEO: Dave Mariani
Confluent | Technology and services to help companies adopt Apache Kafka, a critical and highly scalable tool for processing high-volume streaming data | Raised $30.9M in 2 rounds from 4 investors; backed by Index, Benchmark | Mountain View, CA; founded in 2014; CEO: Jay Kreps