Multilevel aggregation for Hadoop/MapReduce

Presented at the Pre-Strata/Hadoop World Meetup, Oct 23, 2012


  1. Multi-level aggregation for Hadoop MapReduce
     Tsuyoshi Ozawa, NTT
     © 2012 NTT Software Innovation Center
  2. Overview
     • Background
       • Shuffle cost
     • Approach
       • Multi-level aggregation
     • Progress
       • Discussion on MAPREDUCE-4502
       • A design note is available on this JIRA
       • Prototyped to launch a combiner per node
  3. MapReduce Architecture
     • MapReduce
       • Programming model for large-scale processing
       • Three processing phases: Map, Shuffle, Reduce
     [Diagram: map tasks feed reduce tasks through the shuffle phase]
  4. Shuffle Phase
     • What happens?
       • Reducers retrieve the outputs of mappers
       • Mapper-side read -> reducer-side write
     • Problem
       • Can be a bottleneck in jobs
       • Causes disk I/O
       • Causes network I/O
     • Current solution for aggregation processing: the combiner
       • Reduces I/O by mapper-side aggregation
       • Apps: WordCount, N-gram, word co-occurrence frequency
       • WordCount example: data is aggregated and gets smaller:
         (apple, 1,1,1,1) => (apple, 4)
         (banana, 1,1)    => (banana, 2)
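The WordCount aggregation above can be sketched outside Hadoop as plain Java. This is a minimal illustration of combiner semantics only; the class and method names are hypothetical, not part of the Hadoop API (a real combiner implements `Reducer` and is set via `Job.setCombinerClass`):

```java
import java.util.*;

// Minimal sketch of what a combiner does to a map task's output:
// sum each key's 1-counts locally, so (apple, 1,1,1,1) => (apple, 4)
// before anything is written out for the shuffle.
public class CombinerSketch {
    public static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // local, per-task aggregation
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> mapOutput = Arrays.asList(
            "apple", "apple", "banana", "apple", "apple", "banana");
        System.out.println(combine(mapOutput)); // e.g. {banana=2, apple=4}
    }
}
```

The point of the slide is that the shuffle then transfers the four aggregated pairs, not the six raw ones.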
  5. Limitation of combiners
     • Scope is limited to a single MapTask
  6. Limitation of combiners (1)
     • Scope is limited to a single MapTask
     1. Many-core environments
        • Xeon E5 series: 16 threads/CPU => 16 outputs are generated
        • All of these files must be transferred over the network
     [Diagram: per-map aggregation — each map task runs its own combiner and produces its own IFile, so the combined outputs sent to the reducer are still large]
  7. Limitation of combiners (2)
     • Scope is limited to a single MapTask
     1. Many-core environments
        • Xeon E5 series: 16 threads/CPU => 16 outputs are generated
     2. Processing middle-scale data (TB scale)
        • Processing larger data needs more network bandwidth and disk I/O
     [Diagram: with per-map aggregation, every raw IFile must be sent over the 1GbE node links and the 10GbE inter-rack link to the reducer]
  8. Multi-level aggregation
     • Aggregating the results of maps per node/rack
     [Diagram: node-level and rack-level aggregation merge IFiles before transfer, so a smaller IFile is sent over the 10GbE inter-rack link to the reducer]
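The node-level step can be sketched as merging the already-combined outputs of the map tasks on one node before anything crosses the rack switch. This is an illustrative simulation only (the class and method names are hypothetical; in the actual proposal the merge is done by running the job's combiner over co-located map outputs):

```java
import java.util.*;

// Sketch of node-level aggregation: merge the combined outputs of
// several map tasks running on the same node into a single result,
// so only one (smaller) output is transferred over the rack link.
public class NodeAggregation {
    public static Map<String, Integer> mergePerNode(
            List<Map<String, Integer>> mapOutputs) {
        Map<String, Integer> nodeResult = new HashMap<>();
        for (Map<String, Integer> out : mapOutputs) {
            // Re-apply the same associative combine (here: sum) across tasks.
            out.forEach((k, v) -> nodeResult.merge(k, v, Integer::sum));
        }
        return nodeResult;
    }

    public static void main(String[] args) {
        // Two map tasks on one node, each already combined locally.
        Map<String, Integer> task1 = Map.of("apple", 4, "banana", 2);
        Map<String, Integer> task2 = Map.of("apple", 3, "cherry", 1);
        System.out.println(mergePerNode(Arrays.asList(task1, task2)));
    }
}
```

Rack-level aggregation is the same operation applied one level up, over the per-node results. This works only because the combine function is associative and commutative, which is the same precondition Hadoop already places on combiners.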
  9. Design Concept
     • Minimize overhead
       • Adding a new task type causes lots of overhead
       • Modified the Mapper to aggregate at the end stage
     • Keep the current MapReduce design
       • Fault tolerance against a few machine failures
       • Each aggregation must run in containers for YARN
     • From the Hadoop user's point of view
       • Easy to switch the feature on/off (ideally by adding only one line):

         public static void main(String[] argv) {
           ...
           conf.setCombinerClass(Reducer.class);
           conf.enableNodeLevelAggregation();
           conf.enableRackLevelAggregation();
           ...
         }
  10. Progress
      • Prototype
        • Modified the Mapper to call the combiner function at the last stage
      • Benchmark
        • Environment: 40 nodes; Core 2 Duo 2.4GHz x2; 4GB memory; 1GbE
        • Configuration: 1 reducer
        • Input: texts generated by RandomTextWriter
        • Benchmark program: in-mapper combined WordCount
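The "in-mapper combined" WordCount used in the benchmark refers to the pattern of accumulating counts in memory inside the mapper and emitting them once at the end, instead of emitting (word, 1) for every token. A minimal sketch, with hypothetical class and method names (in real Hadoop code the buffer would be flushed from `Mapper#cleanup`):

```java
import java.util.*;

// Sketch of the in-mapper combining pattern: buffer counts in a
// HashMap across map() calls and emit them once, rather than
// emitting (word, 1) per token and relying on a separate combiner.
public class InMapperCombine {
    private final Map<String, Integer> buffer = new HashMap<>();

    // Called once per input line, as a Hadoop map() would be.
    public void map(String line) {
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                buffer.merge(token, 1, Integer::sum);
            }
        }
    }

    // Stands in for Mapper#cleanup(): emit the aggregated counts once.
    public Map<String, Integer> flush() {
        return buffer;
    }

    public static void main(String[] args) {
        InMapperCombine m = new InMapperCombine();
        m.map("apple banana apple");
        m.map("apple");
        System.out.println(m.flush());
    }
}
```

Using this pattern in the benchmark isolates the effect being measured: since per-record combining is already done inside the mapper, the speedup comes from the node-level aggregation, not from ordinary combiner behavior.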
  11. Prototype Benchmark - Job Time
      [Chart: job completion time with the feature ON vs. OFF]
      • About 2 times faster
      • Shuffle cost is reduced to at most 50% of the original
  12. TODOs
      • Node-level aggregation with fault tolerance (FT)
      • Rack-level aggregation with FT
        • The design note is available at MAPREDUCE-4502
        • Need to change the umbilical protocol to support FT
      • Support for high-level languages
        • Pig/Hive support when issuing a "GROUP BY" statement
        • In other cases, multi-level aggregation may be switched off
  13. Summary
      • Multi-level aggregation combines the results of maps per node/rack
        • Node-/rack-level combiner
        • Needs an extended umbilical protocol for FT
      • Benchmark with the prototype version
        • 1.7 times faster
        • Can reduce shuffle costs to at most 50%
      • TODOs
        • Fault tolerance
        • Pig/Hive support
      • Special thanks for the discussions: Chris, Karthik, Siddarsh, Robert, Bikas
      • Any feedback is welcome!
