The Future of Data Management: The Enterprise Data Hub

Dr. Amr Awadallah (@awadallah)
Cofounder & Chief Technology Officer
Cloudera, Inc.
©2014 Cloudera, Inc. All rights reserved.

Cloudera Snapshot
Founded 2008, by former employees of
Employees Today ~ 700
World Class Support 24x7 Global Staff
Pro-active & Predictive Support Programs
Mission Critical Thousands of Enterprise Users
Over 400 Paying Subscription Customers
The Largest Ecosystem 1,000+ Partners
Cloudera University 100,000+ Trained
Open Source Leaders Cloudera Employees are Leading Developers & Contributors
Total Capital Raised A lot! (from Intel, Google, Dell, T. Rowe Price, Accel, Greylock)
Mission Help Organizations Leverage the Power of
All Their Data to Ask Bigger Questions.

Why is Big Data Happening Now?

10TB to 10PB
IT’S ALL
(BIG)
DATA
(NOT)

MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Website optimization
HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
FINANCIAL SERVICES: Risk & portfolio analysis; New products
CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, customer service
TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
RETAIL: Consumer sentiment; Optimized marketing
EDUCATION & RESEARCH: Experiment sensor analysis
LIFE SCIENCES: Clinical trials; Genomics
AUTOMOTIVE: Auto sensors reporting location, problems
COMMUNICATIONS: Location-based advertising
HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
UTILITIES: Smart Meter analysis for network capacity
OIL & GAS: Drilling exploration sensor analysis
LAW ENFORCEMENT & DEFENSE: Threat analysis, Social media monitoring, Photo analysis
It Isn’t Just About Web 2.0 / Social

Customer Success Across Industries
Financial & Business Services
Telecom & Technology
Healthcare & Life Sciences
Media & Information
Retail & Consumer
Energy & Public Sector

Expanding Data Requires A New Approach
What we do: Copy Data to Applications
Process-centric businesses use:
• Structured data mainly
• Internal data only
• “Important” data only
• Multiple copies of data
What we should do: Bring Applications to Data
Information-centric businesses use all Data:
• Multi-structured
• Internal & external data of all types
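The contrast between copying data to applications and bringing applications to data can be illustrated with a small sketch. This is not taken from the talk: the node names, the records, and the two helper functions are hypothetical, and the "cluster" is just an in-memory dictionary. It only shows that running the computation next to each partition moves a few partial results instead of every record.

```python
from typing import Callable, Dict, List

# Hypothetical cluster: each node name maps to the records stored on that node.
PARTITIONS: Dict[str, List[int]] = {
    "node-a": [3, 8, 1],
    "node-b": [7, 2],
    "node-c": [5, 9, 4, 6],
}


def copy_data_to_app(partitions: Dict[str, List[int]]):
    """'What we do': move every record to the application, then compute there."""
    moved = [record for records in partitions.values() for record in records]
    return sum(moved), len(moved)  # (result, number of items moved)


def bring_app_to_data(partitions: Dict[str, List[int]],
                      local_fn: Callable[[List[int]], int]):
    """'What we should do': run local_fn beside each partition, move only results."""
    partials = [local_fn(records) for records in partitions.values()]
    return sum(partials), len(partials)  # (result, number of items moved)


total_copy, moved_copy = copy_data_to_app(PARTITIONS)
total_local, moved_local = bring_app_to_data(PARTITIONS, sum)
print("copy data to app:", total_copy, "items moved:", moved_copy)
print("bring app to data:", total_local, "items moved:", moved_local)
```

Both approaches return the same total; the difference is how much has to cross the network, which grows with the data in the first case and with the number of nodes in the second.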

The Power of the Enterprise Data Hub is …
THE OLD WAY EDH

Hadoop Changes the Game:
Storage and Compute on One Platform
The Old Way
Expensive, Special purpose, “Reliable” Servers
Expensive Licensed Software
• Hard to scale
• Network is a bottleneck
• Only handles relational data
• Difficult to add new fields & data types
Expensive & Unattainable: $30,000+ per TB
[Diagram: Compute (RDBMS, EDW) and Data Storage (SAN, NAS), connected by the Network]
The Hadoop Way
Commodity “Unreliable” Servers
Hybrid Open Source Software
• Scales out forever
• No bottlenecks
• Easy to ingest any data
• Agile data access
Affordable & Attainable: $300-$1,000 per TB
[Diagram: Compute (CPU), Memory, Storage (Disk)]
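As a rough worked example of the per-TB figures on this slide, the sketch below compares the two approaches at a given capacity. The prices are the slide's own ($30,000+ per TB versus $300-$1,000 per TB); the 1 PB capacity, the decimal conversion of 1 PB = 1,000 TB, and the function names are assumptions made for illustration. Because the old-way figure is quoted as "$30,000+", the ratios should be read as lower bounds.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1,000 TB

OLD_WAY_PER_TB = 30_000        # slide figure: "$30,000+ per TB" (a lower bound)
HADOOP_PER_TB = (300, 1_000)   # slide figure: "$300-$1,000 per TB"


def storage_cost(terabytes: float, dollars_per_tb: float) -> float:
    """Cost in dollars of holding `terabytes` at a flat per-TB price."""
    return terabytes * dollars_per_tb


def compare(terabytes: float) -> dict:
    """Compare the old-way cost against the low and high Hadoop estimates."""
    old = storage_cost(terabytes, OLD_WAY_PER_TB)
    low, high = (storage_cost(terabytes, price) for price in HADOOP_PER_TB)
    return {
        "old_way": old,
        "hadoop_low": low,
        "hadoop_high": high,
        "ratio_min": old / high,  # smallest advantage for the Hadoop way
        "ratio_max": old / low,   # largest advantage for the Hadoop way
    }


result = compare(1 * TB_PER_PB)
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

At 1 PB this works out to at least $30 million the old way against $300,000 to $1 million the Hadoop way, a 30x to 100x difference under these assumptions.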

The Old Way: Bringing Data to Compute
Complex Architecture
• Many special-purpose systems
• Moving data around
• No complete views
Cost of Analytics
• Existing systems strained
• No agility
• “BI backlog”
Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data
Missing Data
• Leaving data behind
• Risk and compliance
• High cost of storage
EDWS MARTS SERVERS DOCUMENTS STORAGE SEARCH ARCHIVE
ERP, CRM, RDBMS, MACHINES FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS EXTERNAL DATA SOURCES

The New Way: Bringing Applications to Data
SERVERS MARTS EDWS DOCUMENTS STORAGE SEARCH ARCHIVE
ERP, CRM, RDBMS, MACHINES FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS EXTERNAL DATA SOURCES
1. Active Compliance Archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage
2. Persistent Staging
• One source of data for all analytics
• Persist state of transformed data
• Significantly faster & cheaper
3. Self-Service Exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests
4. Diverse Analytic Platform
• Bring applications to data
• Combine different workloads on common data (e.g. SQL + Search)
• True analytic agility
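The “schema on read” idea mentioned on this slide can be sketched in a few lines. This is an illustration, not Cloudera's implementation: the raw events, the field names, and the read_with_schema helper are all hypothetical. The point is that the raw data is stored once at full fidelity, and each analysis applies its own schema when it reads, rather than modeling the data up front.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (full fidelity).
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/cart", "ms": 340, "coupon": "SPRING"}',
    '{"user": "u1", "page": "/cart"}',
]


def read_with_schema(raw_lines, schema):
    """Parse raw lines at query time, projecting each onto `schema`.

    `schema` maps a field name to the default used when the field is absent.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {field: record.get(field, default) for field, default in schema.items()}


# Two different "schemas" read the same stored data without reloading it.
latency_view = list(read_with_schema(RAW_EVENTS, {"page": None, "ms": 0}))
coupon_view = list(read_with_schema(RAW_EVENTS, {"user": None, "coupon": None}))

print(latency_view)
print(coupon_view)
```

A field that appears later, such as the coupon code here, needs no change to the stored data; only the reading side chooses to ask for it.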

Core Benefits of the Enterprise Data Hub (EDH)
• Full-Fidelity Active Compliance Archive
• Accelerate Time to Insight (Scale)
• Unlock Agility and Innovation
• Consolidate Silos for 360° View
• Enable Converged Analytics

A Look Inside The Enterprise Data Hub
CLOUDERA’S ENTERPRISE DATA HUB
✔ Open Source, Scalable, Flexible, and Cost-Effective
✔ Unified and Managed
✔ Open Architecture
✔ Secure and Governed
3RD PARTY APPS (Many)
STORAGE FOR ANY TYPE OF DATA: UNIFIED, ELASTIC, RESILIENT, SECURE (Sentry, Gazzang, Rhino)
BATCH PROCESSING (MR, Hive, Pig)
INTERACTIVE SQL (Impala)
SEARCH ENGINE (SOLR)
MACHINE LEARNING (SPARK)
STREAM PROCESSING (SPARK)
WORKLOAD MANAGEMENT (YARN)
FILESYSTEM (HDFS)
ONLINE NOSQL (HBASE)
DATA MANAGEMENT (Navigator)
SYSTEM MANAGEMENT (Cloudera Manager)
DATA COLLECTION (Flume, Sqoop, NFS)

Enabling The App Store of Big Data
BI and Analytics Partners
SI, Cloud, MSP Partners
Database Partners
Resellers
Data Integration Partners
Hardware Partners

2014 Gartner MQ for Data Warehouse DBMS
“A data warehouse DBMS is now expected
to coordinate data virtualization strategies,
and distributed file and/or processing
approaches, to address changes in data
management and access requirements.”

The Modern Information Architecture
Data Architects System Operators Engineers Data Scientists Analysts Business Users
BI / ANALYTICS REPORTING MACHINE
LEARNING
ENTERPRISE
ENTERPRISE DATA
WAREHOUSE
ONLINE SERVING
SYSTEM
WEB/MOBILE APPLICATIONS
CONVERGED
APPLICATIONS
CLOUDERA
MANAGER
META DATA /
ETL TOOLS
ENTERPRISE DATA HUB
Customers & End Users
SYS LOGS WEB LOGS FILES RDBMS

A High Level View of the Journey
Data Science
Agile Exploration
Operational Efficiency (Faster, Bigger, Cheaper)
ETL Acceleration
Transformative Applications (New Business Value)
Cheap Storage
EDW Optimization
Converged Analytics
IT Business

Core Benefits of the Enterprise Data Hub (EDH)
• Full-Fidelity Active Compliance Archive
• Accelerate Time to Insight (Scale)
• Unlock Agility and Innovation
• Consolidate Silos for 360° View
• Enable Converged Analytics

Thank You!
Amr Awadallah (@awadallah)
