Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Presentation Transcript

  • Ask Bigger Questions with Cloudera and Apache Hadoop. Graham Gear, graham@cloudera.com, June 2013
  • Data Has Changed in the Last 30 Years
    • Chart: data growth, 1980 to 2012 (end-user applications, the Internet, mobile devices, sophisticated machines)
    • Structured data – 10%; unstructured data – 90%
  • Data Management Strategies Have Stayed the Same
    • Raw data on SAN, NAS and tape
    • Data moved from storage to compute
    • Relational models with predesigned schemas
  • Too Much Data, Too Many Sources
    • Can’t ingest fast enough
    • Costs too much to store
    • Exists in different places
    • Archived data is lost
  • Can’t Use It The Way You Want To
    • Analysis and processing takes too long
    • Data exists in silos
    • Can’t ask new questions
    • Can’t analyze unstructured data
  • Transform The Way You Think About Data: Cloudera
  • Ask Bigger Questions
    • When customer x visits my store what can I recommend based on their recent web behavior across our various brand websites?
    • What is the best location in North America to efficiently produce both tomato plants and corn?
    • What does every fraudulent activity in the last 2 years have in common that will help us identify and proactively prevent the next incident?
    • Are hotel room sales at Christmas slow because of inventory or competitive pricing?
    • What did customer x view on their last website visit?
    • What makes tomato plants more fruitful than others?
    • What incidents of fraud did we detect last year?
    • What search terms are used most often when looking for hotels in NYC?
  • The Cloudera Approach: Meet enterprise demands with a new way to think about data.
    • THE OLD WAY: Multiple platforms for multiple workloads. COMPLEX, FRAGMENTED, COSTLY
      • Data silos by department or LOB
      • Lots of data stored in expensive specialized systems
      • Analysts pull select data into EDW
      • No one has a complete view
    • THE CLOUDERA WAY: Single data platform to support BI, Reporting & App Serving. SIMPLIFIED, UNIFIED, EFFICIENT
      • Bulk of data stored on scalable low cost platform
      • Perform end-to-end workflows
      • Specialized systems reserved for specialized workloads
      • Provides data access across departments or LOB
  • Cloudera Enterprise: The Platform for Big Data. A Revolutionary Solution Built on Apache Hadoop
    • INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE
    • CDH, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, CLOUDERA SUPPORT
    • BRINGS STORAGE & COMPUTE TOGETHER
    • WORKS WITH EVERY TYPE OF DATA
    • CHANGES THE ECONOMICS OF DATA MANAGEMENT
  • Cloudera Enterprise: Includes Advanced System Management & Support for the Core CDH Projects
    • CDH: 100% OPEN SOURCE HADOOP DISTRIBUTION
      • CORE PROJECTS: HDFS, MAPREDUCE, FLUME, HCATALOG, HIVE, HUE, MAHOUT, OOZIE, PIG, SQOOP, WHIRR, ZOOKEEPER
      • PREMIUM PROJECTS: HBASE, IMPALA, SEARCH (BETA)
      • CONNECTORS: MICROSTRATEGY, NETEZZA, ORACLE, QLIKVIEW, TABLEAU, TERADATA
    • CLOUDERA MANAGER: END-TO-END SYSTEM MANAGEMENT
      • DEPLOYMENT, MONITORING, API, SNMP, CONFIG ROLLBACKS, PHONE HOME, SERVICE MGMT, DIAGNOSTICS, ROLLING UPGRADES, LDAP, REPORTING, BACKUP/DR
    • CLOUDERA SUPPORT: BEST-IN-CLASS TECHNICAL SUPPORT, COMMUNITY ADVOCACY & INDEMNIFICATION
    • CLOUDERA NAVIGATOR: END-TO-END DATA MANAGEMENT
      • ACCESS MGMT, DATA AUDIT
  • RTD Subscription: Includes Support & Indemnity for Apache HBase
  • RTQ Subscription: Includes Support & Indemnity for Cloudera Impala
  • RTS Subscription: Includes Support & Indemnity for Cloudera Search
  • BDR Subscription: Includes Centralized Management For Disaster Recovery Workflows
  • Navigator Subscription: Enables Cloudera Navigator for Automated Data Management
  • Customer Case Studies
  • A multinational bank saves millions by optimizing DW for analytics & reducing data storage costs by 99%. Ask Bigger Questions: How can we optimize our data warehouse investment?
  • Cloudera optimizes the EDW, saves millions. Multinational bank saves millions by optimizing existing DW for analytics & reducing data storage costs by 99%.
    • The Challenge:
      • Teradata EDW at capacity: ETL processes consume 7 days; takes 5 weeks to make historical data available for analysis
      • Performance issues in business critical apps; little room for discovery, analytics, ROI from opportunities
    • The Solution:
      • Cloudera Enterprise offloads data storage, processing & some analytics from EDW
      • Teradata can focus on operational functions & analytics
  • A Semiconductor Manufacturer uses predictive analytics to take preventative action on chips likely to fail. Ask Bigger Questions: Which semiconductor chips will fail?
  • Cloudera enables better predictions. Semiconductor manufacturer can prevent chip failure with more accurate predictive yield models.
    • The Challenge:
      • Want to capture more granular and historical data for more accurate predictive yield modeling
      • Storing 9 months’ data on Oracle is expensive
    • The Solution:
      • Dell | Cloudera solution for Apache Hadoop
      • 53 nodes; plan to store up to 10 years (~10PB)
      • Capturing & processing data from each phase of manufacturing process
  • The quant risk LOB within a multinational bank saves millions through better risk exposure analysis & fraud prevention. Ask Bigger Questions: How can we prevent fraud?
  • Cloudera delivers savings through fraud prevention. Quant risk LOB in multinational bank saves millions through better risk exposure analysis & fraud prevention.
    • The Challenge:
      • Fraud detection is a cumbersome, multi-step analytic process requiring data sampling
      • 2B transactions/month necessitate constant revisions to risk profiles
      • Highly tuned 100TB Teradata DW drives over-budget capital reserves & lower investment returns
    • The Solution:
      • Cloudera Enterprise data factory for fraud prevention, credit & operational risk analysis
      • Look at every incidence of fraud for 5 years for each person
      • Reduced costs; expensive CPU no longer consumed by data processing
  • BlackBerry eliminates data sampling & simplifies data processing for better, more comprehensive analysis. Ask Bigger Questions: How do we retain customers in a competitive market?
  • Cloudera delivers ROI through storage alone. BlackBerry can analyze all their data vs. relying on 1% sample for better network capacity trending & management.
    • The Challenge:
      • BlackBerry Services generates 0.5PB (50-60TB compressed) data per day
      • RDBMS is expensive – limited to 1% data sampling for analytics
    • The Solution:
      • Cloudera Enterprise manages global data set of ~100PB
      • Collecting device content, machine-generated log data, audit details
      • 90% ETL code base reduction
  • A global retailer’s customers benefit from more personalized communications and offers based on interactions across all channels. Ask Bigger Questions: How can we offer customers the best experience?
  • Cloudera optimizes the DW for improved ROI. Global retailer’s customers benefit from more personalized communications based on interactions across all channels.
    • The Challenge:
      • Need to correlate online/offline data across disparate, costly legacy DWs
      • Takes up to 4 weeks to get data from one group – inhibits productivity
    • The Solution:
      • Cloudera Enterprise with Impala: 1PB over 250 nodes
      • Consolidated platform for Big Data with single environment for query and machine learning
  • Any Questions, Big or Small?