Hadoop’s Impact on the Future of Data Management
Introduction of the Enterprise Data Hub

©2014 Cloudera, Inc. All rights reserved.
Expanding Data Requires A New Approach
1980s: Bring Data to Compute
Process-centric businesses use:
• Structured data mainly
• Internal data only
• “Important” data only

Now: Bring Compute to Data
Information-centric businesses use all data:
• Multi-structured, internal & external data of all types

[Diagram: compute and data blocks drawn by relative size & complexity]

The Old Way: Bringing Data to Compute
Complex Architecture
• Many special-purpose
systems
• Moving data around
• No complete views

Cost of Analytics
• Existing systems strained
• No agility
• BI backlog

Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data

[Diagram: EDWs, marts, servers, documents, storage, search, archive]
Visibility
• Leaving data behind
• Risk and compliance
• High cost of storage

[Data sources: ERP, CRM, RDBMS, machines; files, images, videos, logs, clickstreams; external data sources]
The New Way: Bringing Compute to Data
Multi-workload analytic platform
• Bring applications to data
• Combine different workloads on common data (e.g., SQL + Search)
• True BI agility
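
As a rough illustration of combining workloads on common data, the sketch below runs a SQL aggregation and a keyword search over the same small set of records. It is a toy under stated assumptions: the RECORDS list, the events table, and the helper names are hypothetical, and the in-memory SQLite database and Python dictionary only stand in for whatever SQL and search engines a real hub would run against shared storage.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample records: (id, source, message).
RECORDS = [
    (1, "web", "login failed for user alice"),
    (2, "web", "login ok for user bob"),
    (3, "batch", "nightly load failed"),
]

def sql_counts(records):
    """SQL workload: count events per source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, source TEXT, message TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", records)
    rows = conn.execute(
        "SELECT source, COUNT(*) FROM events GROUP BY source ORDER BY source"
    ).fetchall()
    conn.close()
    return rows

def build_index(records):
    """Search workload: map each lowercase word to the ids of records containing it."""
    index = defaultdict(set)
    for rec_id, _source, message in records:
        for word in message.lower().split():
            index[word].add(rec_id)
    return index

def search(index, *words):
    """Return sorted ids of records that contain every given word."""
    hits = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*hits)) if hits else []

index = build_index(RECORDS)
per_source = sql_counts(RECORDS)
failed_logins = search(index, "login", "failed")
```

Both functions read the same RECORDS input, which is the point of the bullet: neither workload needs its own extracted copy of the source data.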

Self-service exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests
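
The “schema on read” idea can be sketched in a few lines: raw records are kept as they arrived, and each consumer applies its own schema only when reading. This is a minimal illustration, not how any particular Hadoop component implements it; the RAW_LINES sample data and the read_with_schema helper are hypothetical names introduced here.

```python
import json

# Hypothetical raw records, stored exactly as they arrived.
# No schema is enforced at write time, so fields vary per record.
RAW_LINES = [
    '{"user": "alice", "action": "login", "ms": 120}',
    '{"user": "bob", "action": "search", "query": "hadoop"}',
    '{"user": "alice", "action": "search", "query": "edh", "ms": 340}',
]

def read_with_schema(lines, schema):
    """Apply a schema (field name -> converter) while reading raw lines."""
    for line in lines:
        record = json.loads(line)
        # Fields missing from a record become None instead of
        # causing the record to be rejected at load time.
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }

# Two different "views" over the same raw data, defined at read time.
latency_view = list(read_with_schema(RAW_LINES, {"user": str, "ms": int}))
search_view = list(read_with_schema(RAW_LINES, {"user": str, "query": str}))
```

A new question only needs a new schema dictionary; the stored data is not remodeled or reloaded, which is where the agility claimed in the bullet comes from.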

Data management, transform
• One source of data for all analytics
• Persisted state of transformed data
• Significantly faster & cheaper

[Diagram: servers, marts, EDWs, documents, storage, search, archive]

Active archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage

[Data sources: ERP, CRM, RDBMS, machines; files, images, videos, logs, clickstreams; external data sources]
EDH for Public Sector
APACHE HADOOP™

Improve Data Visibility and Analysis
• Too many, too much, too diverse, too rigid, too rapid
• Known unknowns and unknown unknowns

Ensure Compliance and Security
• Too constrained, too slow, too complicated, too unclear
• Who, what, where, when, and how

Maximize Infrastructure & Human Capital
• Too costly, too valuable, too complicated, too disruptive
• Familiar, consistent, flexible, open

Cloudera’s Enterprise Data Hub
Integration with Over 200 ISVs
• Self-Service BI
• Data Exploration
• Visualization

Flexible Deployment Options
• On-Premises or Cloud
• Appliances
• Engineered Systems

Powerful Security Solution
• Risk Analysis
• Fraud Prevention
• Compliance

Infinite Analytic Storage
• Multi-Structured Data
• In-place Analytics
• Active Archive

Advanced Analytics Engine
• 360° Customer View
• Recommendation Engines
• Processing & Analytics

Improve IT Operations
• ETL Acceleration
• EDW Rationalization
• Mainframe Offload


Cloudera Federal Forum 2014: Hadoop's Impact on the Future of Data Management
