The Future of Data Management: The Enterprise Data Hub

  • 1.
    The Future of Data Management: The Enterprise Data Hub
    Clarke Patterson | Sr. Director, Cloudera
    ©2014 Cloudera, Inc. All rights reserved.
  • 2.
    Data Potential is Out There
  • 3.
    An Environment of Change
    Consumption, Instrumentation, Value, Exploration
  • 9.
    IT’S ALL (BIG) DATA
    10TB to 10PB
  • 10.
    What Infrastructure Have You Augmented with Big Data Solutions?
    [Bar chart, axis 0% to 60%: Mainframe; Enterprise Data Warehouse; Storage; Analytic Databases; ETL Processing]
    Source: King Research, 3,922 Respondents
  • 11.
    Complications of Status Quo
    Structure, Storage, Network, Silos
    INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE
  • 12.
    How Important are These Capabilities in Your Selection of a Big Data Vendor?
    [Bar chart, axis 7 to 9.5: Open Source Software; Technically Superior Product; Cost; Integration with Other Systems; Secure Technology; Reliable / Trusted Vendor; Flexibility; Performance; Scalability]
    Source: King Research, 3,922 Respondents
  • 14.
    What are the Primary Benefits You’ve Seen Doing a Big Data Product with an EDH
    [Bar chart, axis 10% to 70%: Gain Competitive Advantage; Improve Efficiency; Increase Business Value from Data; Make Better Decisions, Faster; Improved Data Processing; Improved Data Analytics]
    Source: King Research, 3,922 Respondents
  • 15.
    What are Your Big Data Applications?
    [Bar chart, axis 15% to 45%: Operational Improvement; Customer Experience Analysis; Market Targeting; Customer Insights; Behavioral Analysis; Research / Innovation]
    Source: King Research, 3,922 Respondents
  • 16.
    Expanding Data Requires A New Approach
    Then: Bring Data to Compute
    Process-centric businesses use:
    •  Structured data mainly
    •  Internal data only
    •  “Important” data only
    Now: Bring Compute to Data
    Information-centric businesses use all data: multi-structured, internal & external data of all types
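The contrast between the two approaches on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `partitions` dictionary, the node names, and both function names are hypothetical and do not come from the deck or from any Hadoop API. The sketch counts how many values cross the network in each approach.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cluster: node name -> records stored on that node.
partitions: Dict[str, List[int]] = {
    "node-a": [3, 8, 1, 9],
    "node-b": [4, 4, 7],
    "node-c": [10, 2, 6, 5, 1],
}

def bring_data_to_compute(parts: Dict[str, List[int]]) -> Tuple[int, int]:
    """Ship every record to one central place, then compute there."""
    moved = [record for records in parts.values() for record in records]
    # Returns (result, number of values moved over the network).
    return sum(moved), len(moved)

def bring_compute_to_data(
    parts: Dict[str, List[int]],
    local_fn: Callable[[List[int]], int],
) -> Tuple[int, int]:
    """Run local_fn where each partition lives; ship only the partial results."""
    partials = [local_fn(records) for records in parts.values()]
    return sum(partials), len(partials)

total_old, moved_old = bring_data_to_compute(partitions)
total_new, moved_new = bring_compute_to_data(partitions, sum)

print(f"data to compute: total={total_old}, values moved={moved_old}")
print(f"compute to data: total={total_new}, values moved={moved_new}")
```

Both approaches give the same total, but the second moves one partial result per node instead of every record. The sketch only works this simply because a sum can be combined from partial sums.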
  • 17.
    Hadoop Changes the Game: Storage and Compute on One Platform
    The Old Way: Expensive & Unattainable
    $30,000+ per TB
    •  Hard to scale
    •  Network is a bottleneck
    •  Only handles relational data
    •  Difficult to add new fields & data types
    Expensive, Special purpose, “Reliable” Servers
    Expensive Licensed Software
    [Diagram: Data Storage (SAN, NAS); Network; Compute (RDBMS, EDW)]
    The Hadoop Way: Affordable & Attainable
    $300-$1,000 per TB
    •  Scales out forever
    •  No bottlenecks
    •  Easy to ingest any data
    •  Agile data access
    Commodity “Unreliable” Servers
    Hybrid Open Source Software
    [Diagram: Compute (CPU); Memory; Storage (Disk)]
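As a rough worked example, the per-TB figures on this slide can be applied to the "10TB to 10PB" range quoted earlier in the deck. The arithmetic below is a sketch, not a pricing model: it assumes decimal units (1 PB = 1,000 TB), treats "$30,000+" as a lower bound, and ignores everything except the quoted per-TB rates. The names `storage_cost`, `OLD_WAY_PER_TB`, and `HADOOP_PER_TB` are made up for the illustration.

```python
TB_PER_PB = 1_000  # assumption: decimal units

OLD_WAY_PER_TB = 30_000        # "$30,000+ per TB", used as a lower bound
HADOOP_PER_TB = (300, 1_000)   # "$300-$1,000 per TB", low and high ends

def storage_cost(terabytes: int, dollars_per_tb: int) -> int:
    """Cost in dollars for a given capacity at a flat per-TB rate."""
    return terabytes * dollars_per_tb

costs = {}
for label, terabytes in [("10 TB", 10), ("10 PB", 10 * TB_PER_PB)]:
    old = storage_cost(terabytes, OLD_WAY_PER_TB)
    hadoop_low = storage_cost(terabytes, HADOOP_PER_TB[0])
    hadoop_high = storage_cost(terabytes, HADOOP_PER_TB[1])
    costs[label] = (old, hadoop_low, hadoop_high)
    print(f"{label}: old way >= ${old:,}; Hadoop way ${hadoop_low:,} to ${hadoop_high:,}")

# Ratio of the old-way lower bound to the Hadoop range (independent of capacity).
ratio_vs_high = OLD_WAY_PER_TB // HADOOP_PER_TB[1]
ratio_vs_low = OLD_WAY_PER_TB // HADOOP_PER_TB[0]
print(f"old way costs at least {ratio_vs_high}x to {ratio_vs_low}x more per TB")
```

At the slide's own figures, the gap is a factor of 30 to 100 per TB regardless of capacity, which is the point of the "Affordable & Attainable" label.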
  • 18.
    Hadoop Changes the Game: Storage and Compute on One Platform
    The Old Way: Expensive & Unattainable
    The Hadoop Way: Affordable & Attainable
  • 19.
    The Old Way: Bringing Data to Compute
    Complex Architecture
    •  Many special-purpose systems
    •  Moving data around
    •  No complete views
    Missing Data
    •  Leaving data behind
    •  Risk and compliance
    •  High cost of storage
    Time to Data
    •  Up-front modeling
    •  Transforms slow
    •  Transforms lose data
    Cost of Analytics
    •  Existing systems strained
    •  No agility
    •  “BI backlog”
    [Diagram, systems: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE]
    [Diagram, sources: ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES]
  • 20.
    The New Way: Bringing Compute to Data
    1. Active Compliance Archive
    •  Full fidelity original data
    •  Indefinite time, any source
    •  Lowest cost storage
    2. Persistent Staging
    •  One source of data for all analytics
    •  Persist state of transformed data
    •  Significantly faster & cheaper
    3. Self-Service Exploratory BI
    •  Simple search + BI tools
    •  “Schema on read” agility
    •  Reduce BI user backlog requests
    4. Diverse Analytic Platform
    •  Bring applications to data
    •  Combine different workloads on common data (e.g. SQL + Search)
    •  True analytic agility
    [Diagram, systems: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE]
    [Diagram, sources: ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES]
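The "schema on read" idea mentioned on this slide can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the JSON events, the field names, and the `read_with_schema` helper are all hypothetical and are not taken from the deck or from any Hadoop component. The raw records are stored untouched, and each reader applies its own schema only when it reads them.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

# Raw events are stored exactly as they arrived (full fidelity, no up-front model).
raw_store: List[str] = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/cart", "ms": 340, "coupon": "X1"}',
    '{"user": "a", "page": "/cart"}',
]

def read_with_schema(
    lines: Iterable[str],
    schema: Dict[str, Callable],
) -> Iterator[dict]:
    """Apply a schema at read time: select fields, cast them, default missing ones to None."""
    for line in lines:
        record = json.loads(line)
        yield {
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        }

# Two readers project different views from the same stored data.
latency_view = list(read_with_schema(raw_store, {"page": str, "ms": int}))
coupon_view = list(read_with_schema(raw_store, {"user": str, "coupon": str}))

print(latency_view)
print(coupon_view)
```

Because nothing was dropped at load time, the second reader can ask about `coupon` even though no one planned for that field when the data was first stored.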
  • 21.
    Hadoop and The Enterprise Data Hub
    Open Source, Scalable, Flexible, Cost-Effective ✔
    Managed ✖
    Open Architecture ✖
    Secure and Governed ✖
    ✔ ✔ ✔
    CLOUDERA’S ENTERPRISE DATA HUB
    [Diagram: 3RD PARTY APPS; BATCH PROCESSING; ANALYTIC SQL; SEARCH ENGINE; MACHINE LEARNING; STREAM PROCESSING; WORKLOAD MANAGEMENT; STORAGE FOR ANY TYPE OF DATA (UNIFIED, ELASTIC, RESILIENT, SECURE); FILESYSTEM; ONLINE NOSQL; DATA MANAGEMENT; SYSTEM MANAGEMENT]
  • 22.
    The Power of the EDH
    [Comparison graphic: THE OLD WAY, EDH]
  • 23.
    Transformative Applications Drive Revenue
    What are your Big Data Applications?
    [Bar chart, axis 5% to 45%; labels as extracted, some truncated:]
    Research / innovation
    Behavioral analysis
    Customer insights
    Marketing targeting /
    Customer experience
    Operations improvement
    Fraud prevention and
    Pricing analytics and choice
    Risk Modeling /
    Network monitoring
    Service quality
    Customer lifecycle
    Capacity forecasting
    Inventory management
    eDiscovery / document
    Source: King Research survey, September 2013, 3,922 Respondents
  • 24.
    So How Do We Get There?
  • 25.
    The Typical Enterprise Data Analytics Stack
    [Stack, top to bottom: Business Intelligence / Applications; RDBMS; ETL Processing; Staging / Storage; Collection]
  • 26.
    Step 1: EDH for Storage/Staging/Active Archive
    [Stack, top to bottom: Business Intelligence / Applications; RDBMS; ETL Processing; EDH for Storage / Active Archive; Collection]
  • 27.
    Step 2: EDH for Data Collection (Sqoop/Flume)
    [Stack, top to bottom: Business Intelligence / Applications; RDBMS; ETL Processing; EDH for Collection & Storage]
  • 28.
    Step 3: EDH for ETL Processing Acceleration
    [Stack, top to bottom: Business Intelligence / Applications; RDBMS; ETL / Data Integration Tools; EDH for Collection, Storage & ETL Processing Acceleration]
  • 29.
    Step 4: EDH for EDW Optimization (Impala)
    [Stack, top to bottom: Business Intelligence / Applications; RDBMS; Rarely Used Data; EDH for Collection, Storage, ETL Processing Acceleration & Historical RDBMS Data/Queries]
  • 30.
    Step 5: EDH for Agile Exploration
    [Stack, top to bottom: BI / Applications; Agile Exploration; RDBMS; EDH for Collection, Storage, ETL Processing Acceleration, Historical RDBMS Data/Queries, and Agile Exploration]
  • 31.
    Step 6: EDH for Data Science (Not Only SQL)
    [Stack, top to bottom: BI / Applications; Agile Exploration; Data Science; RDBMS; EDH for Collection, Storage, ETL Processing Acceleration, Historical RDBMS Data/Queries & Generic Data Computation]
  • 32.
    Step 7: Converged Analytics - Apps Come to Data
    [Stack: BI; Explore; Data Science; RDBMS; EDH for Collection, Storage, ETL Processing Acceleration, Historical RDBMS Data/Queries, Generic Data Computation, and Multiple-Workloads]
    [Applications shown: SAS, R, Spark; Informatica; SyncSort, Pentaho; Hunk; ...]
  • 33.
    A High Level View of the Journey
    [Diagram labels: Cheap Storage; ETL Acceleration; EDW Optimization; Agile Exploration; Data Science; Converged Analytics]
    Operational Efficiency (Faster, Bigger, Cheaper)
    Transformative Applications (New Business Value)
    [Axis labels: Business; IT]
  • 34.
    The Modern Information Architecture
    [Diagram, systems: WEB/MOBILE APPLICATIONS; ONLINE SERVING SYSTEM; ENTERPRISE DATA WAREHOUSE; ENTERPRISE REPORTING; BI / ANALYTICS; MACHINE LEARNING; CONVERGED APPLICATIONS; CLOUDERA MANAGER; META DATA / ETL TOOLS; ENTERPRISE DATA HUB]
    [Diagram, people: Data Architects; System Operators; Engineers; Data Scientists; Analysts; Business Users; Customers & End Users]
    [Diagram, sources: SYS LOGS; WEB LOGS; FILES; RDBMS]
  • 35.
    Enabling The App Store of Big Data
    [Partner categories: Software (BI, Analytics, & Data Integration); System Integration; Cloud & MSP; Hardware; Database]
    Note: Display Cloudera Connect Platinum and Gold partners only
  • 36.
    Customer Success Across Industries
    [Industries: Financial & Business Services; Telecom & Technology; Healthcare & Life Sciences; Media & Information; Retail & Consumer; Energy & Public Sector]
  • 37.
    Enterprise Data Hub: A Complete Big Data Solution
    •  Efficient Data Management System
    •  Consolidated Silos for Truly Big Data
    •  Accelerated Time to Insight
    •  Diverse Business User Capabilities
    •  Full-Fidelity Active Archive
    •  Enterprise-Grade Data Security, Lineage, Auditing, Governance
    •  High Option Value for Exploration, Data Science, Consolidated 360° View
    •  Complete Platform for Converged Analytics
  • 38.
    Thank You!