Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Published on

MapReduce Data Model ETL & BI Workload Benchmark The complexity of the functions used in Map and/or Reduce has a large impact on the job completion time and network traffic. Yahoo  TeraSort  –  ETL  Workload  –  Most  Network  Intensive   Reducers  Start   Map  Start   Map  Finish   Job  Finish  •  Input,  Shuffle  and  Output  data  size  is  the  same  –  e.g.  10  TB  data  set  in  all  phases  •  Yahoo  Terasort  has  a  more  balanced  Map  vs.  Reduce  funcEons  -­‐  linear  compute  and  IO   Shakespeare  WordCount  –  BI  Workload   Reducers  Start   Map  Finish   Map  Start   Job  Finish   •  Data  set  size  varies  in  various  phase  –  Varying  impact  on  the  network  e.g.  1TB  Input,   10MB  Shuffle,  1MB  Output   •  Most  of  the  processing  in  the  Map  FuncEons,  smaller  intermediate  and  even  smaller  final   Data     18

Published in: Technology, Education
  • Be the first to comment