Ordered Record Collection
Chris Douglas, Yahoo!
Speaker notes
  • Every presenter must include a slide like this one, and protocol demands that it contain no fewer than 5 inaccuracies
Transcript of "Ordered Record Collection"

    1. 1. Sort of Vinyl: Ordered Record Collection<br />Chris Douglas<br />01.18.2010<br />
    2. 2. Obligatory MapReduce Flow Slide<br />Split 2<br />Map 2<br />Combine*<br />Reduce 1<br />Split 1<br />Map 1<br />hdfs://host:8020/input/data<br />hdfs://host:8020/output/data<br />HDFS<br />HDFS<br />Combine*<br />Reduce 1<br />Split 0<br />Map 0<br />Combine*<br />
    3. 3. Obligatory MapReduce Flow Slide<br />Map Output Collection<br />Split 2<br />Map 2<br />Combine*<br />Reduce 1<br />Split 1<br />Map 1<br />hdfs://host:8020/input/data<br />hdfs://host:8020/output/data<br />HDFS<br />HDFS<br />Combine*<br />Reduce 1<br />Split 0<br />Map 0<br />Combine*<br />
    4. 4. Overview<br />Hadoop (∞, 0.10)<br />Hadoop [ 0.10, 0.17)<br />Hadoop [0.17, 0.22]<br />Lucene<br />HADOOP-331<br />HADOOP-2919<br />
    5. 5. Overview<br />Hadoop (∞, 0.10)<br />Hadoop [ 0.10, 0.17)<br />Hadoop [0.17, 0.22]<br />Lucene<br />HADOOP-331<br />HADOOP-2919<br />Cretaceous<br />Jurassic<br />Triassic<br />
    6. 6. Awesome!<br />
    7. 7. Problem Description<br />map(K1,V1)<br />*<br />collect(K2,V2)<br />
    8. 8. Problem Description<br />p0  partition(key0,val0)<br />map(K1,V1)<br />*<br />Serialization<br />collect(K2,V2)<br />*<br />K2.write(DataOutput)<br />write(byte[], int, int)<br />*<br />V2.write(DataOutput)<br />write(byte[], int, int)<br />
    9. 9. Problem Description<br />p0  partition(key0,val0)<br />map(K1,V1)<br />*<br />Serialization<br />collect(K2,V2)<br />*<br />K2.write(DataOutput)<br />write(byte[], int, int)<br />*<br />V2.write(DataOutput)<br />write(byte[], int, int)<br />
    10. 10. Problem Description<br />p0  partition(key0,val0)<br />map(K1,V1)<br />*<br />Serialization<br />collect(K2,V2)<br />*<br />K2.write(DataOutput)<br />write(byte[], int, int)<br />*<br />V2.write(DataOutput)<br />write(byte[], int, int)<br />
    11. 11. Problem Description<br />p0 partition(key0,val0)<br />map(K1,V1)<br />*<br />Serialization<br />collect(K2,V2)<br />*<br />K2.write(DataOutput)<br />write(byte[], int, int)<br />*<br />V2.write(DataOutput)<br />write(byte[], int, int)<br />key0<br />
    12. 12. Problem Description<br />p0 partition(key0,val0)<br />map(K1,V1)<br />*<br />Serialization<br />collect(K2,V2)<br />*<br />K2.write(DataOutput)<br />write(byte[], int, int)<br />*<br />V2.write(DataOutput)<br />write(byte[], int, int)<br />key0<br />
    13. 13. Problem Description<br />p0 partition(key0,val0)<br />map(K1,V1)<br />*<br />Serialization<br />collect(K2,V2)<br />*<br />K2.write(DataOutput)<br />write(byte[], int, int)<br />*<br />V2.write(DataOutput)<br />write(byte[], int, int)<br />key0<br />val0<br />
    14. 14. Problem Description<br />p0 partition(key0,val0)<br />map(K1,V1)<br />*<br />Serialization<br />collect(K2,V2)<br />*<br />K2.write(DataOutput)<br />write(byte[], int, int)<br />*<br />V2.write(DataOutput)<br />write(byte[], int, int)<br />key0<br />val0<br />
    15. 15. Problem Description<br />int<br />p0 partition(key0,val0)<br />map(K1,V1)<br />*<br />Serialization<br />collect(K2,V2)<br />*<br />K2.write(DataOutput)<br />write(byte[], int, int)<br />*<br />V2.write(DataOutput)<br />write(byte[], int, int)<br />key0<br />val0<br />byte[]<br />byte[]<br />
    16. 16. Problem Description<br />For all calls to collect(K2 keyn, V2 valn):<br /><ul><li>Store result of partition(K2 keyn, V2 valn)
    17. 17. Ordered set of write(byte[], int, int) for keyn
    18. 18. Ordered set of write(byte[], int, int) for valn</li></ul>Challenges:<br /><ul><li>Size of key/value unknown a priori
    19. 19. Records must be grouped for efficient fetch from reduce
    20. 20. Sort occurs after the records are serialized</li></li></ul><li>Overview<br />Hadoop (∞, 0.10)<br />Hadoop [ 0.10, 0.17)<br />Hadoop [0.17, 0.22]<br />Lucene<br />HADOOP-331<br />HADOOP-2919<br />Cretaceous<br />Jurassic<br />Triassic<br />
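The collection contract sketched on the preceding slides can be made concrete with a small example. The names below (WritableLike, CollectSketch, the ToIntBiFunction partitioner hook) are hypothetical stand-ins rather than the actual Hadoop interfaces; the point is only that every collect(K2, V2) must capture the partition plus the exact bytes produced by the key's and value's write(DataOutput) calls, even though neither size is known before serialization finishes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntBiFunction;

/** Stand-in for Hadoop's Writable: a type that serializes itself to a DataOutput. */
interface WritableLike {
  void write(DataOutput out) throws IOException;
}

/**
 * Hypothetical sketch of the map-side collection problem: for every
 * collect(key, val), remember the partition and the serialized bytes of the
 * key and value. Sizes are unknown a priori, the records must later be
 * grouped by partition for efficient fetch, and the sort happens over these
 * serialized bytes.
 */
class CollectSketch<K extends WritableLike, V extends WritableLike> {
  static final class Record {
    final int partition;
    final byte[] keyBytes;
    final byte[] valBytes;
    Record(int partition, byte[] keyBytes, byte[] valBytes) {
      this.partition = partition;
      this.keyBytes = keyBytes;
      this.valBytes = valBytes;
    }
  }

  private final ToIntBiFunction<K, V> partitioner;   // p = partition(key, val)
  private final List<Record> collected = new ArrayList<>();

  CollectSketch(ToIntBiFunction<K, V> partitioner) {
    this.partitioner = partitioner;
  }

  /** Called once per map output record. */
  void collect(K key, V val) throws IOException {
    int p = partitioner.applyAsInt(key, val);
    collected.add(new Record(p, toBytes(key), toBytes(val)));
  }

  // The ordered write(byte[], int, int) calls issued by write(DataOutput) all
  // land in this stream; only afterwards is the record's length known.
  private static byte[] toBytes(WritableLike w) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    w.write(new DataOutputStream(bos));
    return bos.toByteArray();
  }
}
```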
    21. 21. Hadoop (∞, 0.10)<br />p0 partition(key0,val0)<br />map(K1,V1)<br />*<br />collect(K2,V2)<br />collect(K2,V2)<br />SequenceFile::Writer[p0].append(key0, val0)<br />…<br />…<br />
    22. 22. Hadoop (∞, 0.10)<br />p0 partition(key0,val0)<br />map(K1,V1)<br />*<br />collect(K2,V2)<br />collect(K2,V2)<br />key0.write(localFS)<br />SequenceFile::Writer[p0].append(key0, val0)<br />val0.write(localFS)<br />…<br />…<br />
    23. 23. Hadoop (∞, 0.10)<br />p0 partition(key0,val0)<br />map(K1,V1)<br />*<br />collect(K2,V2)<br />collect(K2,V2)<br />key0.write(localFS)<br />SequenceFile::Writer[p0].append(key0, val0)<br />val0.write(localFS)<br />…<br />…<br />
    24. 24. Hadoop (∞, 0.10)<br />Not necessarily true. SeqFile may buffer a configurable amount of data to effect block compression, stream buffering, etc.<br />p0 partition(key0,val0)<br />map(K1,V1)<br />*<br />collect(K2,V2)<br />collect(K2,V2)<br />key0.write(localFS)<br />SequenceFile::Writer[p0].append(key0, val0)<br />val0.write(localFS)<br />…<br />…<br />
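A minimal sketch of the (∞, 0.10) collect path, assuming a hypothetical PartitionWriter in place of SequenceFile::Writer: collect() appends each record straight to the writer for its partition, and the map side does no sorting at all.

```java
import java.io.IOException;
import java.util.function.ToIntBiFunction;

/**
 * Hypothetical stand-in for SequenceFile::Writer. As the slide notes, the real
 * writer may buffer a configurable amount of data for block compression and
 * stream buffering, so an append is not necessarily an immediate disk write.
 */
interface PartitionWriter<K, V> {
  void append(K key, V val) throws IOException;
}

/** Sketch of the pre-0.10 scheme: one writer per reduce partition, no map-side sort. */
class Pre010Collector<K, V> {
  private final PartitionWriter<K, V>[] writers;     // writers[p] holds partition p
  private final ToIntBiFunction<K, V> partitioner;   // p0 = partition(key0, val0)

  Pre010Collector(PartitionWriter<K, V>[] writers, ToIntBiFunction<K, V> partitioner) {
    this.writers = writers;
    this.partitioner = partitioner;
  }

  void collect(K key, V val) throws IOException {
    int p = partitioner.applyAsInt(key, val);
    writers[p].append(key, val);   // key0.write(localFS); val0.write(localFS)
  }
}
```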
    25. 25. Hadoop (∞, 0.10)<br />key0<br />key1<br />clone(key0, val0)<br />map(K1,V1)<br />key2<br />*<br />flush()<br />collect(K2,V2)<br />collect(K2,V2)<br />reduce(keyn, val*)<br />SequenceFile::Writer[p0].append(keyn’, valn’)<br />…<br />p0 partition(key0,val0)<br />…<br />
    26. 26. Hadoop (∞, 0.10)<br />key0<br />key1<br />clone(key0, val0)<br />map(K1,V1)<br />key2<br />*<br />flush()<br />collect(K2,V2)<br />collect(K2,V2)<br />reduce(keyn, val*)<br />SequenceFile::Writer[p0].append(keyn’, valn’)<br />…<br />p0 partition(key0,val0)<br />…<br />
    27. 27. Hadoop (∞, 0.10)<br />key0<br />key1<br />clone(key0, val0)<br />map(K1,V1)<br />key2<br />*<br />flush()<br />collect(K2,V2)<br />collect(K2,V2)<br />reduce(keyn, val*)<br />SequenceFile::Writer[p0].append(keyn’, valn’)<br />…<br />p0 partition(key0,val0)<br />…<br />Combiner may change the partition and ordering of input records. This is no longer supported<br />
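A sketch of why this combiner path was expensive, again with hypothetical names: every collected record has to be cloned into a key-sorted, in-memory buffer whose footprint is hard to track, and only flush() runs reduce(keyn, val*) over each group before appending the combined output to the partition's writer.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch (hypothetical names) of the (∞, 0.10) combiner path. */
class Pre010CombineBuffer<K extends Comparable<K>, V> {
  interface Appender<K, V> { void append(K key, V val) throws IOException; }
  interface GroupReducer<K, V> { void reduce(K key, List<V> vals, Appender<K, V> out) throws IOException; }

  // Holds a clone of every collected record; memory use is difficult to bound.
  private final TreeMap<K, List<V>> buffered = new TreeMap<>();

  /** Caller must pass clones: the framework reuses the key/value objects it hands to map(). */
  void collect(K keyClone, V valClone) {
    buffered.computeIfAbsent(keyClone, k -> new ArrayList<>()).add(valClone);
  }

  /** flush(): run the combiner over each group, appending its output downstream. */
  void flush(GroupReducer<K, V> combiner, Appender<K, V> writer) throws IOException {
    for (Map.Entry<K, List<V>> e : buffered.entrySet()) {
      combiner.reduce(e.getKey(), e.getValue(), writer);   // reduce(keyn, val*)
    }
    buffered.clear();
  }
}
```

Because the combiner sees deserialized clones before anything is written, it was free to change the sort order or even the partition of its output, which is the versatility the pros/cons slide refers to.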
    28. 28. Hadoop (∞, 0.10)<br />Reduce k<br />Reduce 0<br />…<br />TaskTracker<br />…<br />
    29. 29. Hadoop (∞, 0.10)<br />Reduce k<br />Reduce 0<br />…<br />TaskTracker<br />…<br />
    30. 30. Hadoop (∞, 0.10)<br />Reduce 0<br />sort/merge  localFS<br />…<br />
    31. 31. Hadoop (∞, 0.10)<br />Pro:<br /><ul><li>Complexity of sort/merge encapsulated in SequenceFile, shared between MapTask and ReduceTask
    32. 32. Very versatile Combiner semantics (change sort order, partition)</li></ul>Con:<br /><ul><li>Copy/sort can take a long time for each reduce (lost opportunity to parallelize sort)
    33. 33. Job cleanup is expensive (e.g. 7k reducer job must delete 7k files per map on that TT)
    34. 34. Combiner is expensive to use and its memory usage is difficult to track
    35. 35. OOMExceptions from untracked memory in buffers, particularly when using compression (HADOOP-570)</li></li></ul><li>Overview<br />Hadoop (∞, 0.10)<br />Hadoop [ 0.10, 0.17)<br />Hadoop [0.17, 0.22]<br />Lucene<br />HADOOP-331<br />HADOOP-2919<br />Cretaceous<br />Jurassic<br />Triassic<br />
    36. 36. Hadoop [0.10, 0.17)<br />map(K1,V1)<br />p0 partition(key0,val0)<br />*<br />collect(K2,V2)<br />K2.write(DataOutput)<br />V2.write(DataOutput)<br />BufferSorter[p0].addKeyValue(recOff, keylen, vallen)<br />…<br />0<br />1<br />k-1<br />k<br />sortAndSpillToDisk()<br />
    37. 37. Hadoop [0.10, 0.17)<br />map(K1,V1)<br />p0 partition(key0,val0)<br />*<br />collect(K2,V2)<br />K2.write(DataOutput)<br />V2.write(DataOutput)<br />BufferSorter[p0].addKeyValue(recOff, keylen, vallen)<br />…<br />0<br />1<br />k-1<br />k<br />sortAndSpillToDisk()<br />
    38. 38. Hadoop [0.10, 0.17)<br />map(K1,V1)<br />p0 partition(key0,val0)<br />*<br />collect(K2,V2)<br />K2.write(DataOutput)<br />V2.write(DataOutput)<br />BufferSorter[p0].addKeyValue(recOff, keylen, vallen)<br />…<br />0<br />1<br />k-1<br />k<br />sortAndSpillToDisk()<br />
    39. 39. Hadoop [0.10, 0.17)<br />map(K1,V1)<br />p0 partition(key0,val0)<br />*<br />collect(K2,V2)<br />K2.write(DataOutput)<br />V2.write(DataOutput)<br />BufferSorter[p0].addKeyValue(recOff, keylen, vallen)<br />…<br />0<br />1<br />k-1<br />k<br />sortAndSpillToDisk()<br />
    40. 40. Hadoop [0.10, 0.17)<br />map(K1,V1)<br />p0 partition(key0,val0)<br />*<br />collect(K2,V2)<br />K2.write(DataOutput)<br />V2.write(DataOutput)<br />BufferSorter[p0].addKeyValue(recOff, keylen, vallen)<br />…<br />0<br />1<br />k-1<br />k<br />sortAndSpillToDisk()<br />
    41. 41. Hadoop [0.10, 0.17)<br />map(K1,V1)<br />p0 partition(key0,val0)<br />Add memory used by all BufferSorter implementations and keyValBuffer. If spill threshold exceeded, then spill contents to disk<br />*<br />collect(K2,V2)<br />K2.write(DataOutput)<br />V2.write(DataOutput)<br />BufferSorter[p0].addKeyValue(recOff, keylen, vallen)<br />…<br />0<br />1<br />k-1<br />k<br />sortAndSpillToDisk()<br />Keep offset into buffer, length of key, value.<br />
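The accounting described on this slide can be sketched as follows, with hypothetical stand-ins for BufferSorter and the shared keyValBuffer (not the original classes): keys and values are serialized into one shared buffer, each partition's sorter records only (recOff, keylen, vallen), and a spill is triggered when the serialized data plus the per-partition metadata exceed a configured threshold.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch of [0.10, 0.17) collection: shared keyValBuffer plus per-partition record metadata. */
class SpillAccountingSketch {
  interface WritableLike { void write(DataOutput out) throws IOException; }

  static final class RecordMeta {
    final int recOff, keyLen, valLen;
    RecordMeta(int recOff, int keyLen, int valLen) {
      this.recOff = recOff; this.keyLen = keyLen; this.valLen = valLen;
    }
  }

  private final ByteArrayOutputStream keyValBuffer = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(keyValBuffer);
  private final List<List<RecordMeta>> sorters = new ArrayList<>();  // one "BufferSorter" per partition
  private final long spillThresholdBytes;

  SpillAccountingSketch(int numPartitions, long spillThresholdBytes) {
    for (int i = 0; i < numPartitions; i++) sorters.add(new ArrayList<>());
    this.spillThresholdBytes = spillThresholdBytes;
  }

  void collect(int partition, WritableLike key, WritableLike val) throws IOException {
    int recOff = keyValBuffer.size();
    key.write(out);                                      // K2.write(DataOutput)
    int keyLen = keyValBuffer.size() - recOff;
    val.write(out);                                      // V2.write(DataOutput)
    int valLen = keyValBuffer.size() - recOff - keyLen;
    sorters.get(partition).add(new RecordMeta(recOff, keyLen, valLen));

    // Recomputed on every collect(): buffer bytes plus an estimate of metadata space.
    if (usedMemory() > spillThresholdBytes) {
      sortAndSpillToDisk();
    }
  }

  private long usedMemory() {
    long used = keyValBuffer.size();
    for (List<RecordMeta> sorter : sorters) used += 12L * sorter.size();  // 3 ints per record
    return used;
  }

  private void sortAndSpillToDisk() {
    // Sort each partition's metadata by serialized key, write a numbered spill
    // file with per-partition offsets, then reset keyValBuffer and the sorters.
  }
}
```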
    42. 42. Hadoop [0.10, 0.17)<br />map(K1,V1)<br />p0 partition(key0,val0)<br />*<br />collect(K2,V2)<br />K2.write(DataOutput)<br />V2.write(DataOutput)<br />BufferSorter[p0].addKeyValue(recOff, keylen, vallen)<br />…<br />*<br />0<br />1<br />k-1<br />k<br />sortAndSpillToDisk()<br />*Sort permutes offsets into (offset,keylen,vallen). Once ordered, each record is output into a SeqFile and the partition offsets recorded<br />0<br />
    43. 43. Hadoop [0.10, 0.17)<br />map(K1,V1)<br />p0 partition(key0,val0)<br />*<br />collect(K2,V2)<br />K2.write(DataOutput)<br />V2.write(DataOutput)<br />BufferSorter[p0].addKeyValue(recOff, keylen, vallen)<br />…<br />*<br />0<br />1<br />k-1<br />k<br />sortAndSpillToDisk()<br />*Sort permutes offsets into (offset,keylen,vallen). Once ordered, each record is output into a SeqFile and the partition offsets recorded<br />0<br />K2.readFields(DataInput)<br />V2.readFields(DataInput)<br />SequenceFile::append(K2,V2)<br />
    44. 44. Hadoop [0.10, 0.17)<br />map(K1,V1)<br />p0 partition(key0,val0)<br />*<br />collect(K2,V2)<br />K2.write(DataOutput)<br />V2.write(DataOutput)<br />BufferSorter[p0].addKeyValue(recOff, keylen, vallen)<br />…<br />0<br />1<br />k-1<br />k<br />sortAndSpillToDisk()<br />*If defined, the combiner is now run during the spill, separately over each partition. Values emitted from the combiner are written directly to the output partition.<br />0<br />K2.readFields(DataInput)<br />V2.readFields(DataInput)<br />*<br />&lt;&lt; Combiner &gt;&gt;<br />SequenceFile::append(K2,V2)<br />
    45. 45. Hadoop [0.10, 0.17)<br />map(K1,V1)<br />p0 partition(key0,val0)<br />*<br />collect(K2,V2)<br />K2.write(DataOutput)<br />V2.write(DataOutput)<br />BufferSorter[p0].addKeyValue(recOff, keylen, vallen)<br />…<br />*<br />0<br />1<br />k-1<br />k<br />sortAndSpillToDisk()<br />0<br />1<br />
    46. 46. Hadoop [0.10, 0.17)<br />map(K1,V1)<br />p0 partition(key0,val0)<br />*<br />collect(K2,V2)<br />K2.write(DataOutput)<br />V2.write(DataOutput)<br />BufferSorter[p0].addKeyValue(recOff, keylen, vallen)<br />…<br />0<br />1<br />k-1<br />k<br />sortAndSpillToDisk()<br />0<br />1<br />…<br />…<br />k<br />
    47. 47. Hadoop [0.10, 0.17)<br />mergeParts()<br />0<br />0<br />0<br />1<br />1<br />1<br />…<br />…<br />…<br />…<br />…<br />…<br />k<br />k<br />k<br />
    48. 48. Hadoop [0.10, 0.17)<br />mergeParts()<br />0<br />0<br />0<br />0<br />1<br />1<br />1<br />…<br />…<br />…<br />…<br />…<br />…<br />k<br />k<br />k<br />
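A sketch of what mergeParts() accomplishes, using in-memory lists in place of spill files: for each partition, the sorted runs from every spill are merged into a single run in the final map output, so each reduce can fetch one contiguous segment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch of mergeParts(): k-way merge of one partition's sorted spill segments. */
class MergePartsSketch {
  /** Each inner list models one spill's sorted segment for this partition (keys only, for brevity). */
  static List<byte[]> mergeSegments(List<List<byte[]>> sortedSegments, Comparator<byte[]> keyCmp) {
    // Heap entry: (segment index, position within that segment).
    final class Cursor { final int seg; int pos; Cursor(int seg) { this.seg = seg; } }
    PriorityQueue<Cursor> heap = new PriorityQueue<>(
        (a, b) -> keyCmp.compare(sortedSegments.get(a.seg).get(a.pos),
                                 sortedSegments.get(b.seg).get(b.pos)));
    for (int i = 0; i < sortedSegments.size(); i++) {
      if (!sortedSegments.get(i).isEmpty()) heap.add(new Cursor(i));
    }
    List<byte[]> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor c = heap.poll();
      merged.add(sortedSegments.get(c.seg).get(c.pos));
      if (++c.pos < sortedSegments.get(c.seg).size()) heap.add(c);  // advance this segment
    }
    return merged;
  }
  // The real mergeParts() repeats this per partition and records each partition's
  // offset so the TaskTracker can serve partition k to Reduce k.
}
```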
    49. 49. Hadoop [0.10, 0.17)<br />Reduce 0<br />0<br />1<br />…<br />TaskTracker<br />…<br />…<br />k<br />Reduce k<br />
    50. 50. Hadoop [0.10, 0.17)<br />Reduce 0<br />0<br />1<br />…<br />TaskTracker<br />…<br />…<br />k<br />Reduce k<br />
    51. 51. Hadoop [0.10, 0.17)<br />Pro:<br /><ul><li>Distributes the sort/merge across all maps; reducer need only merge its inputs
    52. 52. Much more predictable memory footprint
    53. 53. Shared, in-memory buffer across all partitions w/ efficient sort
    54. 54. Combines over each spill, defined by memory usage, instead of record count
    55. 55. Running the combiner doesn’t require storing a clone of each record (fewer serializations)
    56. 56. In 0.16, spill was made concurrent with collection (HADOOP-1965)</li></ul>Con:<br /><ul><li>Expanding buffers may impose a performance penalty; used memory calculated on every call to collect(K2,V2)
    57. 57. MergeSort copies indices on each level of recursion
    58. 58. Deserializing the key/value before appending to the SequenceFile is avoidable
    59. 59. Combiner weakened by requiring sort order and partition to remain consistent
    60. 60. Though tracked, BufferSorter instances take non-negligible space (HADOOP-1698)</li></li></ul><li>Overview<br />Hadoop (∞, 0.10)<br />Hadoop [ 0.10, 0.17)<br />Hadoop [0.17, 0.22]<br />Lucene<br />HADOOP-331<br />HADOOP-2919<br />Cretaceous<br />Jurassic<br />Triassic<br />
    61. 61. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />
    62. 62. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />io.sort.mb * io.sort.record.percent<br />…<br />io.sort.mb<br />
    63. 63. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />Instead of explicitly tracking space used by record metadata, allocate a configurable amount of space at the beginning of the task<br />io.sort.mb * io.sort.record.percent<br />…<br />io.sort.mb<br />
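The split of that upfront allocation is simple arithmetic; the defaults below (100 MB, 5%) are illustrative assumptions, and the 16 bytes per record comes from the accounting fields shown two slides later (one int in kvoffsets, three in kvindices).

```java
/**
 * Back-of-the-envelope sizing for the [0.17, 0.22) collection buffer.
 * The defaults below (100 MB, 5%) are illustrative assumptions.
 */
class SortBufferSizing {
  public static void main(String[] args) {
    int ioSortMb = 100;                // io.sort.mb
    double recordPercent = 0.05;       // io.sort.record.percent

    long totalBytes = ioSortMb * 1024L * 1024L;
    long metaBytes = (long) (totalBytes * recordPercent);  // accounting space, fixed at task start
    long dataBytes = totalBytes - metaBytes;               // kvbuffer: serialized key/value bytes

    // Per collected record: 1 int in kvoffsets + 3 ints in kvindices = 16 bytes,
    // so the metadata area caps how many records fit before a spill.
    long recordCapacity = metaBytes / 16;

    System.out.printf("serialization buffer: %,d bytes%n", dataBytes);
    System.out.printf("record metadata:      %,d bytes (~%,d records)%n", metaBytes, recordCapacity);
  }
}
```

This is why io.sort.record.percent is so critical: many small records exhaust the metadata area long before the serialization buffer fills, and vice versa for a few large records.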
    64. 64. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufstart<br />bufend<br />bufindex<br />bufmark<br />io.sort.mb * io.sort.record.percent<br />kvstart<br />kvend<br />kvindex<br />io.sort.mb<br />
    65. 65. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufstart<br />bufend<br />bufindex<br />bufmark<br />io.sort.mb * io.sort.record.percent<br />kvstart<br />kvend<br />kvindex<br />io.sort.mb<br />kvoffsets<br />kvindices<br />Partition no longer implicitly tracked. Store (partition, keystart,valstart) for every record collected<br />kvbuffer<br />
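A sketch of the metadata layout just described; the array names mirror the slide, the rest is illustrative.

```java
/**
 * Sketch of the [0.17, 0.22) accounting arrays: kvoffsets holds one int per
 * record (a pointer into kvindices), and kvindices holds three ints per
 * record: (partition, keystart, valstart) into the serialization buffer.
 * The sort permutes kvoffsets only; the serialized record bytes never move.
 */
class KvMetadataSketch {
  static final int PARTITION = 0, KEYSTART = 1, VALSTART = 2, ACCTSIZE = 3;

  final int[] kvoffsets;   // recordCapacity entries
  final int[] kvindices;   // recordCapacity * ACCTSIZE entries
  int kvindex = 0;         // next record slot

  KvMetadataSketch(int recordCapacity) {
    kvoffsets = new int[recordCapacity];
    kvindices = new int[recordCapacity * ACCTSIZE];
  }

  /** Record metadata for one collected record whose key/value were just serialized. */
  void addRecord(int partition, int keystart, int valstart) {
    int base = kvindex * ACCTSIZE;
    kvindices[base + PARTITION] = partition;
    kvindices[base + KEYSTART] = keystart;
    kvindices[base + VALSTART] = valstart;
    kvoffsets[kvindex] = base;   // the sort swaps these ints, not the record bytes
    kvindex++;
  }
}
```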
    66. 66. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufstart<br />bufend<br />kvstart<br />kvend<br />kvindex<br />bufindex<br />bufmark<br />
    67. 67. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufstart<br />bufend<br />kvstart<br />kvend<br />kvindex<br />bufmark<br />bufindex<br />
    68. 68. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufstart<br />bufend<br />kvstart<br />kvend<br />kvindex<br />bufmark<br />bufindex<br />
    69. 69. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufstart<br />bufend<br />kvstart<br />kvend<br />kvindex<br />p0<br />bufmark<br />bufindex<br />
    70. 70. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufstart<br />bufend<br />kvstart<br />kvend<br />kvindex<br />io.sort.spill.percent<br />bufindex<br />bufmark<br />
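A sketch of the spill trigger named on this slide: when either the serialized data or the record metadata passes the io.sort.spill.percent soft limit, a background spill starts over the marked region while collection continues into the remaining space. The structure below is an assumption-labeled simplification, not the real MapOutputBuffer code.

```java
/** Sketch of the [0.17, 0.22) soft-limit check that starts a spill concurrent with collection. */
class SpillTriggerSketch {
  private final long dataSoftLimit;    // bytes: dataBytes * io.sort.spill.percent
  private final int recordSoftLimit;   // records: recordCapacity * io.sort.spill.percent
  private boolean spillInProgress = false;

  SpillTriggerSketch(long dataBytes, int recordCapacity, double spillPercent) {
    this.dataSoftLimit = (long) (dataBytes * spillPercent);
    this.recordSoftLimit = (int) (recordCapacity * spillPercent);
  }

  /** Called after each record is collected. */
  void maybeStartSpill(long bytesCollected, int recordsCollected) {
    if (!spillInProgress
        && (bytesCollected >= dataSoftLimit || recordsCollected >= recordSoftLimit)) {
      spillInProgress = true;
      startSpillThread();   // sort + write the filled region while collect() keeps running
    }
  }

  private void startSpillThread() {
    // In the real task a dedicated thread sorts the marked region (kvstart..kvend,
    // bufstart..bufend) and writes it to disk; collection proceeds past kvindex/bufindex.
  }
}
```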
    71. 71. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufstart<br />kvstart<br />kvend<br />kvindex<br />bufend<br />bufindex<br />bufmark<br />
    72. 72. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufstart<br />kvstart<br />kvend<br />kvindex<br />bufindex<br />bufmark<br />bufend<br />
    73. 73. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufstart<br />bufindex<br />bufmark<br />kvstart<br />kvindex<br />kvend<br />bufend<br />
    74. 74. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufindex<br />bufmark<br />kvindex<br />kvstart<br />kvend<br />bufstart<br />bufend<br />
    75. 75. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />Invalid segments in the serialization buffer are marked by bufvoid<br />RawComparator interface requires that the key be contiguous in the byte[]<br />bufmark<br />bufvoid<br />bufindex<br />kvindex<br />kvstart<br />kvend<br />bufstart<br />bufend<br />
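The bufvoid mechanism can be sketched roughly as below; this is a simplification of what the real blocking buffer does, not its actual code. If a key's bytes hit the end of the circular buffer mid-serialization, the prefix already written is copied to the front so the key is contiguous for RawComparator, and bufvoid marks where valid data ends.

```java
/**
 * Rough sketch of keeping keys contiguous in a circular serialization buffer.
 * RawComparator compares keys as (byte[], offset, length), so a key that wraps
 * past the end of the array must be moved; bufvoid marks the end of valid data.
 */
class ContiguousKeySketch {
  final byte[] kvbuffer;
  int bufindex;   // next write position
  int bufmark;    // start of the key currently being written
  int bufvoid;    // end of valid data (normally kvbuffer.length)

  ContiguousKeySketch(int size) {
    kvbuffer = new byte[size];
    bufvoid = size;
  }

  /** Call when a key's serialization hit the end of the array before finishing. */
  void makeKeyContiguous() {
    int headLen = kvbuffer.length - bufmark;   // key bytes already written at the tail
    bufvoid = bufmark;                         // the tail segment becomes an invalid region
    // Copy the key's prefix to the front of the buffer and continue writing after it.
    System.arraycopy(kvbuffer, bufmark, kvbuffer, 0, headLen);
    bufmark = 0;
    bufindex = headLen;
  }
}
```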
    76. 76. Hadoop [0.17, 0.22)<br />map(K1,V1)<br />p0  partition(key0,val0)<br />*<br />Serialization<br />KS.serialize(K2)<br />collect(K2,V2)<br />VS.serialize(V2)<br />bufvoid<br />bufmark<br />bufindex<br />kvindex<br />kvstart<br />kvend<br />bufstart<br />bufend<br />
    77. 77. Hadoop [0.17, 0.22)<br />Pro:<br /><ul><li>Predictable memory footprint, collection (though not spill) agnostic to number of reducers. Most memory used for the sort allocated upfront and maintained for the full task duration.
    78. 78. No resizing of buffers, copying of serialized record data or metadata
    79. 79. Uses SequenceFile::appendRaw to avoid deserialization/serialization pass
    80. 80. Effects record compression in-place (removed in 0.18 with improvements to intermediate data format HADOOP-2095)</li></ul>Other Performance Improvements<br /><ul><li>Improved performance, no metadata copying using QuickSort (HADOOP-3308)
    81. 81. Caching of spill indices (HADOOP-3638)
    82. 82. Run combiner during the merge (HADOOP-3226)
    83. 83. Improved locking and synchronization (HADOOP-{5664,3617})</li></ul>Con:<br /><ul><li>Complexity and new code responsible for several bugs in 0.17
    84. 84. (HADOOP-{3442,3550,3475,3603})
    85. 85. io.sort.record.percent is obscure, critical to performance, and awkward
    86. 86. While predictable, memory usage is arguably too restricted
    87. 87. Really? io.sort.record.percent? (MAPREDUCE-64)</li></li></ul><li>Hadoop [0.22]<br />bufstart<br />bufend<br />bufindex<br />bufmark<br />equator<br />kvstart<br />kvend<br />kvindex<br />
    88. 88. Hadoop [0.22]<br />bufstart<br />bufend<br />equator<br />kvstart<br />kvend<br />kvindex<br />bufindex<br />bufmark<br />
    89. 89. Hadoop [0.22]<br />bufstart<br />bufend<br />equator<br />kvstart<br />kvend<br />kvindex<br />bufmark<br />bufindex<br />
    90. 90. Hadoop [0.22]<br />bufstart<br />bufend<br />equator<br />kvstart<br />kvend<br />kvindex<br />bufmark<br />bufindex<br />
    91. 91. Hadoop [0.22]<br />bufstart<br />bufend<br />equator<br />kvstart<br />kvend<br />kvindex<br />bufmark<br />bufindex<br />
    92. 92. Hadoop [0.22]<br />bufstart<br />bufend<br />equator<br />kvstart<br />kvend<br />kvindex<br />bufmark<br />bufindex<br />p0<br />kvoffsets and kvindices information interlaced into metadata blocks. The sort is effected in a manner identical to 0.17, but metadata is allocated per-record, rather than a priori<br />(kvoffsets)<br />(kvindices)<br />
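A simplified sketch of the 0.22 layout: serialized record bytes grow forward from the equator while 16-byte metadata blocks (the information the separate kvoffsets/kvindices arrays used to hold, now interlaced) grow backward from it, so the data/metadata split adapts per record instead of being fixed by io.sort.record.percent. The exact field order below is illustrative.

```java
import java.nio.ByteBuffer;

/**
 * Simplified sketch of the 0.22 single-buffer layout: record bytes advance
 * forward from the equator; per-record metadata (four ints) advances backward
 * from it. No separate kvoffsets/kvindices arrays, no a-priori split.
 */
class EquatorBufferSketch {
  static final int METASIZE = 16;   // four ints of metadata per record

  final byte[] kvbuffer;
  final ByteBuffer kvmeta;          // view over the same array for metadata writes
  int equator;                      // split point between data and metadata
  int bufindex;                     // next data byte, moves forward from the equator
  int kvindex;                      // next metadata slot, moves backward from the equator

  EquatorBufferSketch(int size) {
    kvbuffer = new byte[size];      // assumes size is a multiple of METASIZE
    kvmeta = ByteBuffer.wrap(kvbuffer);
    setEquator(0);
  }

  void setEquator(int pos) {
    equator = pos;
    bufindex = pos;
    // Metadata for the first record sits just "below" the equator, wrapping if needed.
    kvindex = (pos - METASIZE + kvbuffer.length) % kvbuffer.length;
  }

  /** Store one record's metadata behind the equator (field order illustrative). */
  void writeMetadata(int partition, int keystart, int valstart, int vallen) {
    kvmeta.putInt(kvindex, partition);
    kvmeta.putInt(kvindex + 4, keystart);
    kvmeta.putInt(kvindex + 8, valstart);
    kvmeta.putInt(kvindex + 12, vallen);
    kvindex = (kvindex - METASIZE + kvbuffer.length) % kvbuffer.length;  // grow away from equator
  }
}
```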
    93. 93. Hadoop [0.22]<br />bufstart<br />bufend<br />equator<br />kvstart<br />kvend<br />kvindex<br />bufindex<br />bufmark<br />
    94. 94. Hadoop [0.22]<br />bufstart<br />kvstart<br />kvend<br />bufend<br />kvindex<br />equator<br />bufindex<br />bufmark<br />
    95. 95. Hadoop [0.22]<br />bufstart<br />kvstart<br />kvend<br />bufend<br />bufindex<br />bufmark<br />kvindex<br />equator<br />
    96. 96. Hadoop [0.22]<br />kvstart<br />kvend<br />bufstart<br />bufend<br />bufindex<br />bufmark<br />kvindex<br />equator<br />
    97. 97. Hadoop [0.22]<br />bufindex<br />bufmark<br />kvindex<br />equator<br />bufstart<br />bufend<br />kvstart<br />kvend<br />
    98. 98. Hadoop [0.22]<br />bufstart<br />bufend<br />equator<br />kvstart<br />kvend<br />kvindex<br />bufindex<br />bufmark<br />
    99. 99. Hadoop [0.22]<br />bufstart<br />kvstart<br />kvend<br />kvindex<br />bufindex<br />bufmark<br />bufend<br />equator<br />
    100. 100. Hadoop [0.22]<br />bufindex<br />kvindex<br />kvstart<br />kvend<br />bufmark<br />bufstart<br />bufend<br />equator<br />
    101. 101. Questions?<br />