1. FROM “BIG DATA” TO DATAWARE
SIM Technology Leadership Summit
May 20, 2015
2. MADRONA OVERVIEW
• Madrona is a leading venture capital firm focused on sourcing and
growing early-stage technology companies in the Pacific Northwest
• About $1 billion under management across five funds
– Investors include the University of Washington, University of Virginia, Irvine Foundation, University of North Carolina, and strategic individuals
• Investments in over 100 companies over the past 20 years, with over 50 active portfolio companies and over 40 positive exits
• Madrona team
– 7 Managing Directors
– Strategic Directors and Venture Partners include Sujal Patel, Steve Singh, John McAdam, Prof. Oren Etzioni, and Prof. Dan Weld
3. THE PNW TECH ECOSYSTEM IS STRONG AND GROWING
Anchor Tenants
Large Tech Satellite Offices
Mid-Cap Tech with Seattle HQ
World-Class Research
4. OUR FUTURE
COMMUNICATION
– 1995: Snail mail, fax, early email
– Today (2015): SMS, Facebook, Skype, Snapchat & Twitter
– 2035: Virtual reality rooms
DEVICES
– 1995: Desktop PCs
– Today (2015): Smart mobile devices
– 2035: Embedded on you & everything else (IoT)
SOFTWARE/DATAWARE
– 1995: Packaged/licensed
– Today (2015): SaaS subscriptions/apps
– 2035: Intelligent apps
INTERNET/CONNECTIVITY
– 1995: 56k dial-up modem
– Today (2015): “Ubiquitous” broadband, 100 Mbps to mobile
– 2035: “Always on” and IoT
COMPUTE/STORAGE
– 1995: Pentium processor, 100 MIPS, single-core, ~$1 million/TB
– Today (2015): Intel Xeon E7 processor, 4,000 MIPS, multi-core, $59/TB
– 2035: $5/petabyte
INFRASTRUCTURE
– 1995: Internet & dedicated servers
– Today (2015): Cloud
– 2035: Real-time hybrid marketplace
COMMERCE
– 1995: 1 book, 10 days, $5 delivery
– Today (2015): Anything in 2 days, free; 50,000 items in 2 hours, free delivery
– 2035: Drone or autonomous-car delivery & 3D printing
5. WHAT IS “DATAWARE”?
A framework for describing the combination of data, software, math formulas and “predictive” analytics that helps data-savvy teams turn information and insights into profitable actions.
Why Now?
• Cloud Enablement: The “cloud” abstracts hardware into software and enables unprecedented elasticity, scale and speed
• Big Data: The volume, velocity and variety of data types and stores have expanded rapidly, while the value of retaining and leveraging data often exceeds the cost of doing so
• Legacy “Datastores”: Highly structured and constrained systems (databases, data warehouses, BI tools) that are too rigid to unlock data’s full value, yet too ubiquitous and important NOT to leverage
• Emerging Solutions: A mix of point solutions, systematic approaches and “vertical” services is emerging to leverage these trends in an agile manner. These solutions require a structured framework for prioritizing market opportunities
7. MADRONA DATAWARE FRAMEWORK
INTELLIGENT APPS & SERVICES
DATA INTELLIGENCE
ENABLING INFRASTRUCTURE
AgileDataStack
Marc Benioff, Founder and CEO of Salesforce.com, when asked what he thinks is the major tech trend of the next five years, responded that we are in an “AI Spring.” (Fortune Term Sheet, 1/6/15)
8. WHAT MAKES THE DATA “BIG”?
• Value: More valuable to store than to throw away
• Variety: Different sources & structures create opportunities… & challenges
• Volume: Easy, plentiful & cheap data to collect & store
• Velocity: Speed of turning data into actionable insights – batch vs. real-time!
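The Velocity row contrasts batch and real-time processing. A minimal Python sketch, assuming hypothetical (key, amount) events and illustrative function names, computes the same per-key totals both ways: once after the whole batch has been collected, and incrementally so that a current answer exists after every event.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, float]  # hypothetical (key, amount) pair

def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch: wait for the full dataset, then aggregate it in one pass."""
    totals: Dict[str, float] = defaultdict(float)
    for key, amount in events:
        totals[key] += amount
    return dict(totals)

def streaming_totals(events: Iterable[Event]) -> Iterator[Dict[str, float]]:
    """Real-time: update running state per event and emit a fresh snapshot."""
    totals: Dict[str, float] = defaultdict(float)
    for key, amount in events:
        totals[key] += amount
        yield dict(totals)  # an up-to-date answer after every event

events = [("store_a", 10.0), ("store_b", 5.0), ("store_a", 2.5)]
print(batch_totals(events))        # {'store_a': 12.5, 'store_b': 5.0}
snapshots = list(streaming_totals(events))
print(snapshots[0])                # {'store_a': 10.0}
print(snapshots[-1] == batch_totals(events))  # True
```

Both paths reach the same final answer; the streaming version simply makes partial answers available as data arrives instead of after the batch closes.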
9. DATA INPUTS
• Legacy Databases: Highly structured, transaction-focused, generally rigid
– Databases with SQL queries (OLTP)
– Historic “Extract, Transform, Load” tools (ETL)
– Data warehouses and data cubes
– Business Intelligence (BI) and “Online Analytical Processing” (OLAP)
• “Big Data” Sources: Structural variety, high volume/velocity, agile
– “Not Only SQL” (NoSQL) data repositories
– Allow for “Extract, Load, Transform” (ELT) flexibility
– Continuous, online (streamed) data flows
– Relationship focus vs. Relational focus
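The ETL/ELT distinction above comes down to when records are fitted to a schema. A minimal Python sketch with hypothetical JSON records and function names: ETL conforms each record before loading and drops fields outside the schema, while ELT loads raw records and transforms only at query time, so a field like the hypothetical `coupon` is still available for questions nobody planned for.

```python
import json
from typing import Any, Dict, List

# Hypothetical raw source records with inconsistent structure.
RAW = [
    '{"id": 1, "amount": "19.99", "country": "us"}',
    '{"id": 2, "amount": "5", "country": "CA", "coupon": "SPRING"}',
]

def transform(record: Dict[str, Any]) -> Dict[str, Any]:
    """Fit a record to a fixed schema: int id, float amount, upper-case country."""
    return {"id": int(record["id"]),
            "amount": float(record["amount"]),
            "country": str(record["country"]).upper()}

def etl(raw: List[str]) -> List[Dict[str, Any]]:
    """ETL: transform first, then load; fields outside the schema are lost."""
    return [transform(json.loads(line)) for line in raw]

def elt_load(raw: List[str]) -> List[Dict[str, Any]]:
    """ELT step 1: load records as-is, keeping every field."""
    return [json.loads(line) for line in raw]

def elt_query(lake: List[Dict[str, Any]], field: str) -> List[Any]:
    """ELT step 2: transform at query time, only for the field needed."""
    return [rec.get(field) for rec in lake]

print(etl(RAW)[1])                        # {'id': 2, 'amount': 5.0, 'country': 'CA'}
print(elt_query(elt_load(RAW), "coupon")) # [None, 'SPRING']
```

The trade-off is the one the slide hints at: ETL gives clean, consistent tables up front, while ELT defers that work in exchange for flexibility.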
10. WHERE DOES DATA & METADATA COME FROM?
People
• Consumers
• Office Workers
• Field Workers
• Citizens
• Partners
• Customers
Places
• Home
• Work
• Stores
• Destinations
• Routes
Profiles
• Individuals
• Demographics
• Devices
• Locations
• Objects
• “Campaigns”
Things
• Biology
• “Networks”
• Devices
• Vehicles
• Machines
• Medical
• Homes
• Content
13. BIG COMPANY “LEADING INDICATORS”
• Microsoft: Azure ML, Revolution Analytics and much more
• HP reorganizes its software business around “Big Data”
• Salesforce.com buys RelateIQ for $390M for its “data cloud”
• Oracle builds a “data cloud” team, including BlueKai and Datalogix
• SAP promotes HANA, buys Concur
• IBM advertises Watson, Bluemix
• AWS: Amazon ML, Lambda, Kinesis
14. KEY QUESTIONS
• How do big, especially software-driven, companies unlock their “data silos”?
• How will traditional databases/warehouses, newer “big data” stores and integrated big data “lakes” complement or compete with one another?
• What models will emerge to capture value in “data intelligence”?
• To what extent can intelligent apps and services disrupt legacy
apps/services?
16. KEYS TO EMBRACING DATAWARE
1. Enabling infrastructure (Hadoop/Cloudera, NoSQL/MongoDB, Spark, legacy) is complex, hard and expensive, but it is getting simpler and cheaper
2. Data intelligence holds big promise, but the scarcity of “data scientists” requires professional services (Dato, Context Relevant, Atigeo, Palantir) and systematic, standardized approaches from emerging companies
3. Early “App Intelligence” that is real-time and agile already exists (ad serving, content recommendations, personalization, vertical markets). There is tremendous opportunity here to reinvent categories
4. Opportunities also exist in the data pipeline (Trifacta) and data management, but these tend to be deeper technical systems
17. APPLICATION INTELLIGENCE
1. What will an “application” look like in 5+ years?
2. What will make that application “intelligent”?
Apps + Algos + Data = App Intelligence
18. MADRONA DATAWARE INVESTMENTS
INTELLIGENT APPS & SERVICES
DATA INTELLIGENCE
ENABLING INFRASTRUCTURE
AGILEDATASTACK
YIELDEX, DATO, BOOMERANG, JOBALINE, HIGHSPOT, BIZIBLE, PLACED, MAXPOINT, APPTIO, SEEQ, QUMULO, CONTEXT RELEVANT, ALGORITHMIA, IGNEOUS, ICEBRG, EXTRAHOP
Fund III, Fund IV, Fund V
20. Dataware Case Study: Apptio
Category: “Full Stack”
Focus: Data-driven enterprise SaaS for the CIO & team to run the business of IT (Technology Business Management, TBM)
Revenue: $100M+
Lineage: Startups, HP, IBM/Rational
Keys: • Combine legacy General Ledger & modern usage data to
“cost” services and share with users
• Define industry data & metadata standard – ATUM
• Deliver a real-time enterprise SaaS solution
Investors: Madrona Venture Group, Greylock Partners, Shasta
Ventures, Andreessen Horowitz, T. Rowe Price
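One simple way to “cost” services from general-ledger data and usage data is proportional allocation: each GL cost pool is split across services by their share of measured usage. The sketch below assumes that rule and uses hypothetical pool names, services and figures; it is not Apptio's actual model or the ATUM standard.

```python
from typing import Dict

def allocate(gl_costs: Dict[str, float],
             usage: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Spread each general-ledger cost pool across services in proportion
    to each service's measured usage of that pool (assumed allocation rule)."""
    service_cost: Dict[str, float] = {}
    for pool, amount in gl_costs.items():
        pool_usage = usage.get(pool, {})
        total = sum(pool_usage.values())
        if total == 0:
            continue  # no usage data: this pool is left unallocated
        for service, units in pool_usage.items():
            share = amount * units / total
            service_cost[service] = service_cost.get(service, 0.0) + share
    return {s: round(c, 2) for s, c in service_cost.items()}

# Hypothetical inputs: monthly GL cost pools and per-service usage.
gl = {"storage": 30000.0, "servers": 50000.0}
usage = {
    "storage": {"email": 20, "crm": 80},  # TB used
    "servers": {"email": 10, "crm": 40},  # VMs used
}
print(allocate(gl, usage))  # {'email': 16000.0, 'crm': 64000.0}
```

Because every pool with usage data is fully distributed, the service costs add back up to the GL total, which is what lets the figures be shared with users as a reconciled bill.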
21. Dataware Case Study: Cloudera
Category: Enabling Infrastructure
Focus: Became the industry standard for extracting, storing and managing a variety of data types so that data intelligence and data-driven services can succeed
Revenue: $100M+
Lineage: Hadoop, Open Source, Google, UW
Keys: • Early player in being a diverse, indexed data store
• Helped define the “file system”, called HDFS, for
managing large-scale data stores
• Attempting to be the underlying platform for dataware
Investors: Accel Partners, Greylock Partners, Intel, T. Rowe Price
22. Dataware Case Study: Dato
Category: Data Intelligence
Focus: Leverage machine learning across various data types, from inspiration to insight, to build scalable predictive and recommendation systems
Revenue: < $10M
Lineage: UW, Carnegie Mellon
Keys: • Use SFrames to combine graph, table, text & image data types
• Build an “end-to-end” data intelligence system, from prototype to production
• Deliver predictive and recommender systems as services or standalone applications for business customers
Investors: Madrona Venture Group, NEA, Vulcan
23. Dataware Case Study: Placed.com
Category: App Intelligence
Focus: Combine location database & active panel data to analyze
and optimize advertising and marketing programs
Revenue: < $10M
Lineage: Farecast, Quantcast, aQuantive
Keys: • Leverage data science to build highly accurate place
database
• Create statistically significant panels to measure
physical world impact of digital advertising
• Embed the service into the mobile ad ecosystem to deliver
actionable insights
Investors: Madrona Venture Group, Two Sigma
24. Dataware Case Study: Trifacta
Category: Continuous Data Pipeline
Focus: Automate the process of cleaning, normalizing and
preparing data for “Data Intelligence” use cases
Revenue: Unknown
Lineage: Stanford (Jeff Heer), Cal (Joe Hellerstein)
Keys: • Focus on core “Data Wrangling” problem
• Use machine learning to recognize patterns & suggest
automated fixes
• Simple visualization/UI
Investors: Greylock Partners, Accel Partners, Ignition Partners
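A toy version of pattern recognition for data wrangling: reduce each value to a character-class shape, treat the most common shape as the expected format, flag the rest, and propose a fix. This uses a simple frequency heuristic and a hard-coded MM/DD/YYYY rule rather than machine learning, and all names are hypothetical; it is not Trifacta's method.

```python
import re
from collections import Counter
from datetime import datetime
from typing import List, Optional, Tuple

def shape(value: str) -> str:
    """Reduce a value to its character-class pattern: digits -> 9, letters -> A."""
    s = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", s)

def profile(column: List[str]) -> Tuple[str, List[str]]:
    """Return the dominant pattern and the values that do not match it."""
    counts = Counter(shape(v) for v in column)
    dominant, _ = counts.most_common(1)[0]
    outliers = [v for v in column if shape(v) != dominant]
    return dominant, outliers

def suggest_fix(value: str) -> Optional[str]:
    """Hypothetical fix rule: rewrite MM/DD/YYYY as YYYY-MM-DD, else no suggestion."""
    try:
        return datetime.strptime(value, "%m/%d/%Y").strftime("%Y-%m-%d")
    except ValueError:
        return None

dates = ["2015-05-20", "2015-05-21", "05/22/2015", "2015-05-23"]
pattern, bad = profile(dates)
print(pattern, bad)                    # 9999-99-99 ['05/22/2015']
print([suggest_fix(v) for v in bad])   # ['2015-05-22']
```

A production tool would learn candidate patterns and transformations rather than hard-code them, but the loop of profiling, flagging and suggesting is the same shape.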