Spark in the Hadoop Ecosystem
Mike Olson | co-founder and chief strategy officer
© Cloudera, Inc. All rights reserved.
  
Hadoop: From MapReduce to an Enterprise Data Hub

Hadoop delivers:
•  One place for unlimited data
•  Unified, multi-framework data access

Enterprises require:
•  Leading Performance
•  Open Source, Open Standards
•  Enterprise Security
•  Data Governance
•  Complete Management

Security and Administration
Unlimited Storage
Process | Discover | Model | Serve

Deployment Flexibility: On-Premises, Appliances, Engineered Systems, Public Cloud, Private Cloud, Hybrid Cloud

A modern data platform plus what the enterprise requires.
  
Where Spark Fits in the Hadoop Ecosystem

YARN: Shared resource management
HDFS and HBase: Shared storage
Impala, Hive, Pig, MapReduce2, Search, Spark, Spark Streaming, Hive (beta), Pig (beta), …

With common
•  Security
•  Data governance
•  Configuration, deployment and operations
across all components in the stack
  
Process millions of equity and bond market positions, and evaluate against future scenarios in minutes, versus days with MapReduce.
Major Global Financial Institution
  
Monitor on-line user activity and optimize content delivery and search results in real time.
Large Consumer Company
  
Ingest and analyze complex data from a variety of sources continually, building new risk and value models in real time
  
Combine genomic and phenotype data with other data sources to understand disease onset and progression
  
Spark extends the Hadoop ecosystem with new analytic and processing capabilities.
  
Thank you!
Mike Olson, chief strategy officer
mike.olson@cloudera.com
@mikeolson
  
