Cmu-2011-09.pptx

  1. MapR, Implications for Integration
     CMU – September 2011
  2. Outline
     • MapR system overview
       • Map-reduce review
       • MapR architecture
       • Performance Results
       • Map-reduce on MapR
     • Architectural implications
       • Search indexing / deployment
       • EM algorithm for machine learning
       • ... and more ...
  3. Map-Reduce
     (diagram: input records flow through map tasks, a shuffle, and reduce tasks to produce output)
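     As a rough illustration of the map/shuffle/reduce flow sketched above, a minimal
     in-memory word count in Python (purely illustrative; none of these names come from the deck):

         from collections import defaultdict

         def map_phase(records):
             # map: turn each input record into (key, value) pairs, here (word, 1)
             for line in records:
                 for word in line.split():
                     yield word, 1

         def shuffle(pairs):
             # shuffle: group all values by key so each key lands at one reducer
             groups = defaultdict(list)
             for key, value in pairs:
                 groups[key].append(value)
             return groups

         def reduce_phase(groups):
             # reduce: combine the grouped values for each key
             return {key: sum(values) for key, values in groups.items()}

         counts = reduce_phase(shuffle(map_phase(["map reduce map", "reduce"])))
         # counts == {'map': 2, 'reduce': 2}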
  4. Bottlenecks and Issues
     • Read-only files
     • Many copies in I/O path
     • Shuffle based on HTTP
       • Can't use new technologies
       • Eats file descriptors
     • Spills go to local file space
       • Bad for skewed distribution of sizes
  5. MapR Areas of Development
     (diagram: HBase, Map-Reduce, Ecosystem, Storage, Management, Services)
  6. MapR Improvements
     • Faster file system
       • Fewer copies
       • Multiple NICs
       • No file descriptor or page-buf competition
     • Faster map-reduce
       • Uses distributed file system
       • Direct RPC to receiver
       • Very wide merges
  7. MapR Innovations
     • Volumes
       • Distributed management
       • Data placement
     • Read/write random access file system
       • Allows distributed meta-data
       • Improved scaling
       • Enables NFS access
     • Application-level NIC bonding
     • Transactionally correct snapshots and mirrors
  8. MapR's Containers
     Files/directories are sharded into blocks, which are placed into mini NNs (containers) on disks.
     • Each container contains
       • Directories & files
       • Data blocks
     • Replicated on servers
     • No need to manage directly
     Containers are 16-32 GB segments of disk, placed on nodes.
  9. MapR's Containers
     • Each container has a replication chain
     • Updates are transactional
     • Failures are handled by rearranging replication
 10. Container locations and replication
     (diagram: containers replicated across nodes N1, N2, N3)
     The container location database (CLDB) keeps track of the nodes hosting each container and the replication chain order.
 11. MapR Scaling
     Containers represent 16-32 GB of data
     • Each can hold up to 1 billion files and directories
     • 100 M containers = ~2 exabytes (a very large cluster)
     250 bytes of DRAM to cache a container
     • 25 GB to cache all containers for a 2 EB cluster (but not necessary; can page to disk)
     • A typical large 10 PB cluster needs 2 GB
     Container-reports are 100x to 1000x smaller than HDFS block-reports
     • Serve 100x more data-nodes
     • Increase container size to 64 GB to serve a 4 EB cluster
     • Map/reduce not affected
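     A back-of-the-envelope check of the cache numbers above, using the slide's own figures
     (container size is assumed here to be roughly 20 GB, the middle of the 16-32 GB range):

         # Rough check of the slide's container-cache arithmetic.
         container_size = 20e9           # bytes; ~16-32 GB per container, take ~20 GB
         cache_per_container = 250       # bytes of DRAM to cache one container entry
         containers = 100e6              # 100 M containers, per the slide

         data  = containers * container_size       # ~2e18 bytes  = ~2 EB of data
         cache = containers * cache_per_container  # ~2.5e10 bytes = ~25 GB of DRAM

         print(f"data: {data / 1e18:.1f} EB, container cache: {cache / 1e9:.0f} GB")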
 12. MapR's Streaming Performance
     (bar charts: read and write throughput in MB per sec, MapR vs. Hadoop, on two hardware configurations: 11 x 7200 rpm SATA and 11 x 15K rpm SAS; higher is better)
     Tests: i. 16 streams x 120 GB; ii. 2000 streams x 1 GB
 13. Terasort on MapR
     10+1 nodes: 8 core, 24 GB DRAM, 11 x 1 TB SATA 7200 rpm
     (bar charts: elapsed time in minutes for 1.0 TB and 3.5 TB sorts, MapR vs. Hadoop; lower is better)
 14. HBase on MapR
     YCSB random read with 1 billion 1K records
     10+1 node cluster: 8 core, 24 GB DRAM, 11 x 1 TB 7200 RPM
     (bar chart: records per second, MapR vs. Apache, for Zipfian and Uniform distributions; higher is better)
 15. Small Files (Apache Hadoop, 10 nodes)
     Op: create file, write 100 bytes, close
     (chart: rate in files/sec vs. number of files in millions, out-of-box vs. tuned)
     Notes:
     • NN not replicated
     • NN uses 20 GB DRAM
     • DN uses 2 GB DRAM
 16. MUCH faster for some operations
     Same 10 nodes ...
     (chart: create rate vs. number of files in millions)
 17. What MapR is not
     • Volumes != federation
       • MapR supports more than 10,000 volumes, all with independent placement and defaults
       • Volumes support snapshots and mirroring
     • NFS != FUSE
       • Checksum and compress at gateway
       • IP fail-over
       • Read/write/update semantics at full speed
     • MapR != maprfs
 18. New Capabilities
 19. Alternative NFS mounting models
     • Export to the world
       • NFS gateway runs on selected gateway hosts
     • Local server
       • NFS gateway runs on local host
       • Enables local compression and checksumming
     • Export to self
       • NFS gateway runs on all data nodes, mounted from localhost
 20. Export to the world
     (diagram: an external NFS client mounts the cluster through one of several NFS server gateways)
 21. Local server
     (diagram: the application host runs its own NFS server/client, which talks to the cluster nodes)
 22. Universal export to self
     (diagram: each cluster node runs a task and an NFS server, mounted from localhost)
 23. Nodes are identical
     (diagram: every cluster node runs the same stack, a task plus a local NFS server)
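     The practical consequence of these mounting models is that a task, or any local program,
     can touch cluster data with ordinary file I/O. A tiny Python sketch, assuming the cluster
     is NFS-mounted at /mapr as in the R example on slide 46 (the path is an assumption):

         # Ordinary file I/O against the NFS-mounted cluster; no HDFS API needed.
         path = "/mapr/my.cluster/home/ted/data/foo.out"   # assumed mount point + file
         with open(path) as f:
             lines = f.readlines()
         print(len(lines), "records read via NFS")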
 24. Application architecture
     • High performance map-reduce is nice
     • But algorithmic flexibility is even nicer
 25. Sharded text indexing
     (diagram: input documents → map (assign documents to shards) → reducer (index text to local disk, then copy the index to the distributed file store) → clustered index storage → search engine; a copy to local disk is typically required before the index can be loaded)
 26. Sharded text indexing
     • Mapper assigns document to shard
       • Shard is usually a hash of the document id
     • Reducer indexes all documents for a shard
       • Indexes created on local disk
       • On success, copy index to DFS
       • On failure, delete local files
     • Must avoid directory collisions
       • Can't use shard id!
     • Must manage and reclaim local disk space
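     A minimal sketch of the mapper's shard assignment described above; the shard count and
     function names are illustrative assumptions, not from the deck:

         import hashlib

         NUM_SHARDS = 64   # hypothetical shard count

         def assign_shard(doc_id: str) -> int:
             # shard = hash of the document id, so the assignment is
             # deterministic and documents spread evenly across shards
             digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
             return int(digest, 16) % NUM_SHARDS

         def map_document(doc_id: str, text: str):
             # emit (shard, document); the shuffle routes all documents for a
             # shard to one reducer, which builds that shard's index
             yield assign_shard(doc_id), (doc_id, text)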
 27. Conventional data flow
     (diagram: input documents → map → reducer with local disk → clustered index storage → search engine with local disk)
     • Failure of a reducer causes garbage to accumulate in the local disk
     • Failure of a search engine requires another download of the index from clustered storage
 28. Simplified NFS data flows
     (diagram: input documents → map → reducer → clustered index storage → search engine)
     • Index is written to the task work directory via NFS
     • The search engine reads the mirrored index directly
     • Failure of a reducer is cleaned up by the map-reduce framework
 29. Simplified NFS data flows
     (diagram: input documents → map → reducer → mirrors → multiple search engines)
     • Mirroring allows exact placement of index data
     • Arbitrary levels of replication are also possible
 30. How about another one?
 31. K-means
     • Classic E-M based algorithm
     • Given cluster centroids,
       • Assign each data point to nearest centroid
       • Accumulate new centroids
       • Rinse, lather, repeat
 32. K-means, the movie
     (diagram: input → assign to nearest centroid → aggregate new centroids → updated centroids feed the next pass)
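     A minimal NumPy sketch of one such pass (assign to nearest centroid, aggregate new
     centroids, repeat); the data and cluster count below are made up for illustration:

         import numpy as np

         def kmeans_step(points, centroids):
             # E step: index of the nearest centroid for every point
             dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
             assignment = dists.argmin(axis=1)
             # M step: each new centroid is the mean of the points assigned to it
             new_centroids = np.array([
                 points[assignment == k].mean(axis=0) if np.any(assignment == k)
                 else centroids[k]                     # keep empty clusters in place
                 for k in range(len(centroids))
             ])
             return new_centroids

         points = np.random.rand(1000, 2)
         centroids = points[np.random.choice(len(points), 5, replace=False)]
         for _ in range(100):                          # rinse, lather, repeat
             updated = kmeans_step(points, centroids)
             if np.allclose(updated, centroids):       # stop once centroids settle
                 break
             centroids = updated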
 33. But ...
 34. Parallel Stochastic Gradient Descent
     (diagram: input → train sub model → average models → model)
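     A small sketch of the train-sub-model / average-models pattern in the diagram, using a
     linear model with squared loss as a stand-in (the model, loss, and learning rate are
     assumptions for illustration, not from the deck):

         import numpy as np

         def train_sub_model(shard, w0, lr=0.01):
             # one worker: run SGD over its own shard of (x, y) pairs
             w = w0.copy()
             for x, y in shard:
                 grad = (w @ x - y) * x    # gradient of squared loss for a linear model
                 w -= lr * grad
             return w

         def parallel_sgd(shards, dim):
             # each shard trains its own copy of the model ("train sub model"),
             # then the copies are averaged into one model ("average models")
             w0 = np.zeros(dim)
             sub_models = [train_sub_model(shard, w0) for shard in shards]
             return np.mean(sub_models, axis=0)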
 35. Variational Dirichlet Assignment
     (diagram: input → gather sufficient statistics → update model → model)
 36. Old tricks, new dogs
     • Mapper (reads from local disk via the distributed cache, which copies from HDFS to local disk)
       • Assign point to cluster
       • Emit cluster id, (1, point)
     • Combiner and reducer
       • Sum counts, weighted sum of points
       • Emit cluster id, (n, sum/n)
     • Output to HDFS (written by map-reduce)
 37. Old tricks, new dogs
     • Mapper (reads directly from NFS)
       • Assign point to cluster
       • Emit cluster id, (1, point)
     • Combiner and reducer
       • Sum counts, weighted sum of points
       • Emit cluster id, (n, sum/n)
     • Output to HDFS (MapR FS; written by map-reduce)
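     A sketch of the mapper and combiner/reducer logic these bullets describe; the same
     function can serve as combiner and reducer because the weighted sum re-expands each
     partial mean. Function names are illustrative assumptions:

         import numpy as np

         def kmeans_mapper(point, centroids):
             # assign the point to its nearest cluster and emit cluster id, (1, point)
             cluster_id = int(np.argmin(np.linalg.norm(centroids - point, axis=1)))
             yield cluster_id, (1, np.asarray(point))

         def kmeans_reducer(cluster_id, values):
             # sum counts and the weighted sum of points, emit cluster id, (n, sum/n)
             n = sum(count for count, _ in values)
             total = sum(count * point for count, point in values)
             yield cluster_id, (n, total / n)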
 38. Poor man's Pregel
     • Mapper
         while not done:
             read and accumulate input models
             for each input:
                 accumulate model
             write model
             synchronize
             reset input format
         emit summary
     • Lines in bold can use conventional I/O via NFS
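     One way that conventional I/O could look in practice: a hypothetical worker loop that
     merges the models other tasks wrote to an NFS-visible directory using plain file
     operations (the file layout, JSON format, and barrier callable are all assumptions):

         import glob, json, os

         def poor_mans_pregel_worker(model_dir, out_path, barrier, iterations):
             model = {}
             for _ in range(iterations):                 # "while not done"
                 model = {}
                 # read and accumulate the input models other tasks wrote
                 for path in glob.glob(os.path.join(model_dir, "model-*.json")):
                     with open(path) as f:
                         for key, value in json.load(f).items():
                             model[key] = model.get(key, 0.0) + value
                 # write model with ordinary file I/O (NFS-visible path)
                 with open(out_path, "w") as f:
                     json.dump(model, f)
                 barrier()                               # synchronize with other workers
             return model                                # emit summary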
 39. Click modeling architecture
     (diagram: input → feature extraction and down sampling → data join → sequential SGD learning; the extraction/join stage runs as map-reduce, and the side-data now arrives via NFS)
 40. Click modeling architecture
     (diagram: input → feature extraction and down sampling → data join → several sequential SGD learners in parallel; both phases run as map-reduce, and map-reduce cooperates with NFS to share the side-data)
 41. And another ...
 42. Hybrid model flow
     (diagram: feature extraction and down sampling → SVD (PageRank) (spectral) → down stream modeling → deployed model; the first stages run as map-reduce, and the last link is marked "??")
  43. 43. 10/11/11   ©  MapR  Confiden0al   43  
 44. Hybrid model flow
     (diagram: the same pipeline, now with the stages labeled map-reduce and sequential: feature extraction and down sampling → SVD (PageRank) (spectral) → down stream modeling → deployed model)
 45. And visualization ...
 46. Trivial visualization interface
     • Map-reduce output is visible via NFS
         $ R
         > x <- read.csv("/mapr/my.cluster/home/ted/data/foo.out")
         > plot(error ~ t, x)
         > q(save='n')
     • Legacy visualization just works
 47. Conclusions
     • We used to know all this
     • Tab completion used to work
     • 5 years of work-arounds have clouded our memories
     • We just have to remember the future
