1. The Future of Data Management:
The Enterprise Data Hub
Dr. Amr Awadallah (@awadallah)
Cofounder & Chief Technology Officer
Cloudera, Inc.
1 ©2014 Cloudera, Inc. All rights reserved.
2. Cloudera Snapshot
Founded: 2008, by former employees of
Employees Today: ~700
World-Class Support: 24x7 global staff; proactive & predictive support programs
Mission Critical: Thousands of enterprise users; over 400 paying subscription customers
The Largest Ecosystem: Over 1,000 partners
Cloudera University: Over 100,000 trained
Open Source Leaders: Cloudera employees are leading developers & contributors
Total Capital Raised: A lot! (from Intel, Google, Dell, T. Rowe Price, Accel, Greylock)
Mission: Help organizations leverage the power of all their data to ask bigger questions.
3. Why is Big Data Happening Now?
4. 10TB to 10PB
IT’S ALL
(BIG)
DATA
(NOT)
5. It Isn’t Just About Web 2.0 / Social
• MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
• ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Website optimization
• HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
• FINANCIAL SERVICES: Risk & portfolio analysis; New products
• CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, customer service
• TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
• RETAIL: Consumer sentiment; Optimized marketing
• EDUCATION & RESEARCH: Experiment sensor analysis
• LIFE SCIENCES: Clinical trials; Genomics
• AUTOMOTIVE: Auto sensors reporting location, problems
• COMMUNICATIONS: Location-based advertising
• HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
• UTILITIES: Smart Meter analysis for network capacity
• OIL & GAS: Drilling exploration sensor analysis
• LAW ENFORCEMENT & DEFENSE: Threat analysis, Social media monitoring, Photo analysis
6. Customer Success Across Industries
• Financial & Business Services
• Telecom & Technology
• Healthcare & Life Sciences
• Media & Information
• Retail & Consumer
• Energy & Public Sector
7. Expanding Data Requires A New Approach
What we do: Copy Data to Applications
Process-centric businesses use:
• Structured data mainly
• Internal data only
• “Important” data only
• Multiple copies of data
What we should do: Bring Applications to Data
Information-centric businesses use all data: multi-structured, internal & external data of all types
8. The Power of the Enterprise Data Hub is …
THE OLD WAY EDH
9. Hadoop Changes the Game: Storage and Compute on One Platform
The Old Way
Expensive, Special purpose, “Reliable” Servers
Expensive Licensed Software
Data Storage (SAN, NAS), Network, Compute (RDBMS, EDW)
• Hard to scale
• Network is a bottleneck
• Only handles relational data
• Difficult to add new fields & data types
Expensive & Unattainable: $30,000+ per TB
The Hadoop Way
Commodity “Unreliable” Servers
Hybrid Open Source Software
Compute (CPU), Memory, Storage (Disk)
• Scales out forever
• No bottlenecks
• Easy to ingest any data
• Agile data access
Affordable & Attainable: $300-$1,000 per TB
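To make the per-terabyte figures on slide 9 concrete, here is a rough arithmetic sketch in Python. Only the two price points ($30,000+ per TB and $300-$1,000 per TB) and the 10TB to 10PB range come from the slides. The function names, the linear scaling, and the convention 1 PB = 1,000 TB are illustrative assumptions, and the result ignores operations, staffing, and networking costs.

```python
# Price points quoted on the slide; everything else is an illustrative assumption.
OLD_WAY_USD_PER_TB = 30_000        # "$30,000+ per TB", treated as a lower bound
HADOOP_USD_PER_TB = (300, 1_000)   # "$300-$1,000 per TB", low and high ends


def storage_cost(terabytes, usd_per_tb):
    """Cost of storing `terabytes` at a flat per-TB price (assumed linear)."""
    return terabytes * usd_per_tb


def compare(terabytes):
    """Return the minimum old-way cost and the Hadoop-way cost range."""
    low, high = HADOOP_USD_PER_TB
    return {
        "old_way_min": storage_cost(terabytes, OLD_WAY_USD_PER_TB),
        "hadoop_low": storage_cost(terabytes, low),
        "hadoop_high": storage_cost(terabytes, high),
    }


if __name__ == "__main__":
    # 10 TB and 10 PB, assuming 1 PB = 1,000 TB.
    for tb in (10, 10_000):
        costs = compare(tb)
        print(f"{tb:>6} TB: old way >= ${costs['old_way_min']:,}; "
              f"Hadoop way ${costs['hadoop_low']:,} to ${costs['hadoop_high']:,}")
```

Under these assumptions the two approaches differ by a factor of roughly 30 to 100 at any volume, which is the point the slide makes with the words "Unattainable" and "Attainable".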
10. The Old Way: Bringing Data to Compute
ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES
EDWS, MARTS, SERVERS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE
Complex Architecture
• Many special-purpose systems
• Moving data around
• No complete views
Cost of Analytics
• Existing systems strained
• No agility
• “BI backlog”
Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data
Missing Data
• Leaving data behind
• Risk and compliance
• High cost of storage
11. The New Way: Bringing Applications to Data
ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES
SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE
1 Active Compliance Archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage
2 Persistent Staging
• One source of data for all analytics
• Persist state of transformed data
• Significantly faster & cheaper
3 Self-Service Exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests
4 Diverse Analytic Platform
• Bring applications to data
• Combine different workloads on common data (e.g. SQL + Search)
• True analytic agility
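The “schema on read” idea named under Self-Service Exploratory BI can be illustrated with a small Python sketch: raw records are stored untouched, and each consumer applies its own schema only when reading. This is a toy illustration, not how any Cloudera component implements it. The sample records, the `read_with_schema` helper, and both schemas are hypothetical.

```python
import json

# Raw events kept exactly as they arrived; records need not share the same fields.
RAW_EVENTS = [
    '{"user": "a1", "action": "click", "ts": 1400000000}',
    '{"user": "b2", "action": "view", "ts": 1400000050, "page": "/home"}',
    '{"user": "c3", "ts": 1400000100}',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto `schema` at read time.

    `schema` maps a field name to a (converter, default) pair. Fields missing
    from a record get the default; fields not in the schema are ignored.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else default
            for field, (convert, default) in schema.items()
        }


# Two consumers read the same raw data through different schemas.
clicks_schema = {"user": (str, None), "action": (str, "unknown")}
pages_schema = {"user": (str, None), "page": (str, None), "ts": (int, 0)}

clicks = list(read_with_schema(RAW_EVENTS, clicks_schema))
pages = list(read_with_schema(RAW_EVENTS, pages_schema))
```

Because no schema is enforced at write time, a new field such as `page` can appear in the data without any up-front modeling, and the stored records are never altered by a reader.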
12. Core Benefits of the Enterprise Data Hub (EDH)
• Full-Fidelity Active Compliance Archive
• Accelerate Time to Insight (Scale)
• Unlock Agility and Innovation
• Consolidate Silos for 360° View
• Enable Converged Analytics
13. A Look Inside The Enterprise Data Hub
CLOUDERA’S ENTERPRISE DATA HUB
• Open Source, Scalable, Flexible, and Cost-Effective
• Unified and Managed
• Open Architecture
• Secure and Governed
3RD PARTY APPS (Many)
BATCH PROCESSING (MR, Hive, Pig)
INTERACTIVE SQL (Impala)
SEARCH ENGINE (SOLR)
MACHINE LEARNING (SPARK)
STREAM PROCESSING (SPARK)
WORKLOAD MANAGEMENT (YARN)
STORAGE FOR ANY TYPE OF DATA: UNIFIED, ELASTIC, RESILIENT, SECURE (Sentry, Gazzang, Rhino)
FILESYSTEM (HDFS)
ONLINE NOSQL (HBASE)
DATA MANAGEMENT (Navigator)
SYSTEM MANAGEMENT (Cloudera Manager)
DATA COLLECTION (Flume, Sqoop, NFS)
14. Enabling The App Store of Big Data
• BI and Analytics Partners
• SI, Cloud, MSP Partners
• Database Partners
• Resellers
• Data Integration Partners
• Hardware Partners
15. 2014 Gartner MQ for Data Warehouse DBMS
“A data warehouse DBMS is now expected to coordinate data virtualization strategies, and distributed file and/or processing approaches, to address changes in data management and access requirements.”
16. The Modern Information Architecture
Data Architects, System Operators, Engineers, Data Scientists, Analysts, Business Users
BI / ANALYTICS, REPORTING, MACHINE LEARNING
ENTERPRISE DATA WAREHOUSE
ONLINE SERVING SYSTEM
WEB/MOBILE APPLICATIONS
CONVERGED APPLICATIONS
CLOUDERA MANAGER
META DATA / ETL TOOLS
ENTERPRISE DATA HUB
Customers & End Users
SYS LOGS, WEB LOGS, FILES, RDBMS
17. A High Level View of the Journey
Data Science
Agile Exploration
Operational Efficiency (Faster, Bigger, Cheaper)
ETL Acceleration
Transformative Applications (New Business Value)
Cheap Storage
EDW Optimization
Converged Analytics
IT, Business
18. Core Benefits of the Enterprise Data Hub (EDH)
• Full-Fidelity Active Compliance Archive
• Accelerate Time to Insight (Scale)
• Unlock Agility and Innovation
• Consolidate Silos for 360° View
• Enable Converged Analytics
Editor's Notes
Cloudera has built and maintains the industry’s most extensive partner ecosystem to ensure that our customers can leverage their existing investments in tools and skills as they adopt new Hadoop technology. Our goal is to help you minimize disruption while delivering maximum value.
• Over 800 partners across hardware, software and services, including the leaders in all major market segments
• Continue to leverage the technologies and solution providers you’ve already invested in
• Drive additional value through integrated solutions
Alright, we have a partner we’d like to now bring up to the stage. It is with great pleasure that I introduce Sanjay Gojija from Intel to give you some insight on Accelerating Enterprise Big Data Success!