Nearly-Optimal Mergesorts
Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs
Sebastian Wild
wild@uwaterloo.ca
joint work with Ian Munro
ESA 2018
26th Annual European Symposium on Algorithms
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 0 / 15
Outline
1 Adaptive Sorting – Status Quo
2 Natural Mergesort
3 Peeksort
4 Powersort
5 Experiments
Adaptive Sorting
Adaptive algorithm: exploit “structure” of the input
adaptive sorting: exploit “presortedness”
few inversions
few runs
few outliers
...many more
optimal algorithms known for many measures of presortedness
(but only up to constant factors!)
Want:
optimal up to lower-order terms
practical methods
low overhead for detecting presortedness
competitive on inputs without presortedness
State of the art
1 “fat-pivot” quicksort
split < P, = P, > P
average adapts to duplicate elements
optimal up to small constant factor:
1.386 (plain quicksort)
1.188 with median-of-3
1.088 with ninther
low-overhead implementation
...
2 Timsort
adaptive mergesort variant
...
adapts to existing runs
but not optimally!
factor ≥ 1.5 worse (Buss & Knop 2018)
Timsort still broken!
“it is still possible to cause the Java implementation to fail: [...] causing an error at runtime in Java’s sorting method.”
Observation:
Timsort’s merge rules are quite intricate.
? Why these rules?
? Why are they so sensitive to small changes? (cf. the Java version!)
... and can’t we find simpler rules?
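The fat-pivot split into < P, = P, > P can be sketched as a Dutch-national-flag partition; a minimal illustration of the idea, not the tuned library code (pivot selection such as median-of-3 or ninther is left out):

```python
def fat_pivot_partition(a, pivot):
    """Three-way ('fat pivot') partition: reorder a into
    < pivot | == pivot | > pivot, returning the two split points.
    Elements equal to the pivot are never recursed into again,
    which is why the scheme adapts to duplicate elements."""
    lt, i, gt = 0, 0, len(a)
    while i < gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            gt -= 1
            a[gt], a[i] = a[i], a[gt]
        else:
            i += 1
    return lt, gt          # a[lt:gt] holds all copies of the pivot

a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
lt, gt = fat_pivot_partition(a, 5)
```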
Run-Length Entropy
Our measure of unsortedness: runs (maximal contiguous sorted ranges)
simple version: lg(#runs)
fine-grained version: entropy of run lengths
run lengths L1, ..., Lr ⇝ H(L1/n, ..., Lr/n) = Σ_{i=1..r} (Li/n) · lg(n/Li)
Comparison Lower Bound
n! permutations in total, but sorted within runs
⇝ n! / (L1! · · · Lr!) possible inputs
Need lg( n! / (L1! · · · Lr!) ) = H(L1/n, ..., Lr/n) · n − O(n) comparisons
(written just H in the following)
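The entropy H and the exact information-theoretic bound can be compared numerically; a small sketch (the run-length list is a made-up example):

```python
from math import log2, factorial

def run_length_entropy(run_lengths):
    """H(L1/n, ..., Lr/n) = sum over i of (Li/n) * lg(n/Li)."""
    n = sum(run_lengths)
    return sum(L / n * log2(n / L) for L in run_lengths)

def comparison_lower_bound(run_lengths):
    """lg( n! / (L1! ... Lr!) ): bits needed to distinguish all
    inputs that are sorted within runs of these lengths."""
    n = sum(run_lengths)
    multinomial = factorial(n)
    for L in run_lengths:
        multinomial //= factorial(L)
    return log2(multinomial)

runs = [6, 2, 2, 2, 3, 1]
n = sum(runs)
# the exact bound equals H * n up to an O(n) term:
print(run_length_entropy(runs) * n, comparison_lower_bound(runs))
```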
Natural Mergesort
“natural” mergesort = run-adaptive mergesort (Knuth 1973)
Conceptually two steps (interleaved in code):
1 Find runs in input.
2 Merge them in some order (Knuth: simple bottom-up).
Here:
only binary merges ⇝ step 2 becomes: merge 2 runs, repeat until a single run remains
only stable sorts ⇝ merge 2 adjacent runs
Merge trees:
[figure: merge tree over the runs of the example array 15 17 12 19 2 9 13 7 11 1 4 8 10 14 23 5 21 3 6 16 18 20 22]
Merge costs
cost of a merge := size of its output (≈ memory transfers ≈ #cmps)
total cost = total area of the merge rectangles in the figure
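The two conceptual steps can be sketched directly: one scan to find the maximal (weakly increasing) runs, and a stable binary merge of two adjacent runs. This is a simplified illustration, not Timsort's galloping merge:

```python
def find_runs(a):
    """One left-to-right scan; returns the maximal weakly increasing
    runs as (start, end) index pairs, end exclusive."""
    runs, start = [], 0
    for i in range(1, len(a)):
        if a[i] < a[i - 1]:        # current run ends before i
            runs.append((start, i))
            start = i
    runs.append((start, len(a)))
    return runs

def merge_adjacent(a, lo, mid, hi):
    """Stably merge the adjacent runs a[lo:mid] and a[mid:hi] in place;
    merge cost = hi - lo = size of the output."""
    out, i, j = [], lo, mid
    while i < mid and j < hi:
        if a[j] < a[i]:            # strict '<' keeps equal elements stable
            out.append(a[j]); j += 1
        else:
            out.append(a[i]); i += 1
    out.extend(a[i:mid])
    out.extend(a[j:hi])
    a[lo:hi] = out

a = [15, 17, 12, 19, 2, 9, 13]
runs = find_runs(a)                # [(0, 2), (2, 4), (4, 7)]
merge_adjacent(a, 0, 2, 4)         # merge the first two runs
```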
Mergesort meets search trees
Different merge trees yield different cost!
Example input: 2 4 6 8 10 12 14 16 3 5 1 9 7 17 11 13 15 0
[figure: two merge trees over the same runs]
merge costs: 42 vs. 71
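The 42 vs. 71 gap can be reproduced by summing output sizes along two different merge trees over the same run lengths (the example input's runs have lengths 8, 2, 2, 2, 3, 1; the two tree shapes below are chosen for illustration):

```python
def merge_cost(tree):
    """Leaves (ints) are run lengths; each internal node (a pair) is a
    merge costing the size of its output. Returns (size, total cost)."""
    if isinstance(tree, int):
        return tree, 0
    (ls, lc), (rs, rc) = merge_cost(tree[0]), merge_cost(tree[1])
    return ls + rs, lc + rc + ls + rs

# left-to-right merging vs. a better-balanced tree over the same
# adjacent runs of lengths 8, 2, 2, 2, 3, 1:
linear = (((((8, 2), 2), 2), 3), 1)
better = (8, ((2, 2), (2, (3, 1))))
print(merge_cost(linear)[1], merge_cost(better)[1])   # 71 42
```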
Mergesort meets search trees
Different merge trees yield different cost!
2 32 2 6 2 6
Merge cost = total area of
= total length of paths to all array entries
= weighted external path
w leaf
weight(w) · depth(w)
length
optimal merge tree
= optimal BST for leaf weights L1, . . . , Lr
How to compute good merge tree?
Huffman merge
merge shortest runs
indep. discovered
• Golin & Sedgewick 1993
• Takaoka 1998
• Barbay & Navarro 2009
• Chandramouli & Goldstein 2014
must sort lengths
not stable
Hu-Tucker merge
optimal alphabetic tree
have to store lengths
complicated algorithm
nearly-optimal
...70s are calling
BST merge
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 5 / 15
Mergesort meets search trees
Different merge trees yield different cost!
2 32 2 6 2 6
Merge cost = total area of
= total length of paths to all array entries
= weighted external path
w leaf
weight(w) · depth(w)
length
optimal merge tree
= optimal BST for leaf weights L1, . . . , Lr
How to compute good merge tree?
Huffman merge
merge shortest runs
indep. discovered
• Golin & Sedgewick 1993
• Takaoka 1998
• Barbay & Navarro 2009
• Chandramouli & Goldstein 2014
must sort lengths
not stable
Hu-Tucker merge
optimal alphabetic tree
have to store lengths
complicated algorithm
nearly-optimal
...70s are calling
BST merge
simple (greedy) linear-time methods!
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 5 / 15
Mergesort meets search trees
Different merge trees yield different cost!
2 32 2 6 2 6
Merge cost = total area of
= total length of paths to all array entries
= weighted external path
w leaf
weight(w) · depth(w)
length
optimal merge tree
= optimal BST for leaf weights L1, . . . , Lr
How to compute good merge tree?
Huffman merge
merge shortest runs
indep. discovered
• Golin & Sedgewick 1993
• Takaoka 1998
• Barbay & Navarro 2009
• Chandramouli & Goldstein 2014
must sort lengths
not stable
Hu-Tucker merge
optimal alphabetic tree
have to store lengths
complicated algorithm
nearly-optimal
...70s are calling
BST merge
simple (greedy) linear-time methods!
almost optimal ( H + 2)
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 5 / 15
Mergesort meets search trees
Different merge trees yield different cost!
2 32 2 6 2 6
Merge cost = total area of
= total length of paths to all array entries
= weighted external path
w leaf
weight(w) · depth(w)
length
optimal merge tree
= optimal BST for leaf weights L1, . . . , Lr
How to compute good merge tree?
Huffman merge
merge shortest runs
indep. discovered
• Golin & Sedgewick 1993
• Takaoka 1998
• Barbay & Navarro 2009
• Chandramouli & Goldstein 2014
must sort lengths
not stable
Hu-Tucker merge
optimal alphabetic tree
have to store lengths
complicated algorithm
nearly-optimal
...70s are calling
BST merge
simple (greedy) linear-time methods!
almost optimal ( H + 2)
ŏ have to store lengths
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 5 / 15
Mergesort meets search trees
Different merge trees yield different cost!
2 32 2 6 2 6
Merge cost = total area of
= total length of paths to all array entries
= weighted external path
w leaf
weight(w) · depth(w)
length
optimal merge tree
= optimal BST for leaf weights L1, . . . , Lr
How to compute good merge tree?
Huffman merge
merge shortest runs
indep. discovered
• Golin & Sedgewick 1993
• Takaoka 1998
• Barbay & Navarro 2009
• Chandramouli & Goldstein 2014
must sort lengths
not stable
Hu-Tucker merge
optimal alphabetic tree
have to store lengths
complicated algorithm
nearly-optimal
...70s are calling
BST merge
simple (greedy) linear-time methods!
almost optimal ( H + 2)
ŏ have to store lengths
ŏ extra scan to detect runs
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 5 / 15
Mergesort meets search trees
Different merge trees yield different cost!
(example run lengths: 2, 32, 2, 6, 2, 6)
Merge cost = total area of the merges in the picture
= total length of paths to all array entries
= weighted external path length = Σ (over leaves w) weight(w) · depth(w)
optimal merge tree = optimal BST for leaf weights L1, . . . , Lr
How to compute a good merge tree?
Huffman merge: merge shortest runs
indep. discovered by
• Golin & Sedgewick 1993
• Takaoka 1998
• Barbay & Navarro 2009
• Chandramouli & Goldstein 2014
✗ must sort lengths
✗ not stable
Hu-Tucker merge: optimal alphabetic tree
✗ have to store lengths
✗ complicated algorithm
nearly-optimal BST merge (...the 70s are calling)
✓ simple (greedy) linear-time methods!
✓ almost optimal (cost ≤ H + 2)
✗ have to store lengths
✗ extra scan to detect runs
avoidable?
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 5 / 15
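The Huffman-merge policy ("merge shortest runs") can be sketched in a few lines. This is an illustrative Python sketch, not code from the talk; the run lengths are taken from the slide's example:

```python
import heapq

def huffman_merge_cost(run_lengths):
    """Total merge cost of always merging the two shortest runs
    (Huffman merge): a merge of runs of lengths a and b costs a + b."""
    heap = list(run_lengths)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        cost += a + b
        heapq.heappush(heap, a + b)  # merged run goes back onto the heap
    return cost

print(huffman_merge_cost([2, 32, 2, 6, 2, 6]))  # run lengths from the slide -> 90
```

Note how the heap reorders runs by length regardless of their positions in the array, which is exactly the slide's two drawbacks: the lengths must be kept sorted, and the result is not stable.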
Outline
1 Adaptive Sorting – Status Quo
2 Natural Mergesort
3 Peeksort
4 Powersort
5 Experiments
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 5 / 15
Peeksort
Method 1: weight-balancing (Mehlhorn 1975, Bayer 1975)
choose root to balance subtree weights (split as close to 1⁄2 as possible)
recurse on subtrees
Peeksort can simulate weight-balancing without knowing/storing all runs!
“peek” at middle of array to find closest run boundary
split there and recurse
can avoid redundant work:
find full run straddling midpoint
4 parameters for recursive calls: ℓ, e, s, r
(A[ℓ..e] and A[s..r] store the outermost runs; empty if ℓ = e resp. s = r)
each run scanned only once
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 6 / 15
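A minimal sketch of the peeking idea, under two simplifying assumptions not in the slide: only weakly increasing runs are detected (no reversal of descending runs), and the ℓ, e, s, r bookkeeping is omitted, so boundary runs may be re-scanned:

```python
def peeksort(A, lo=0, hi=None):
    """Sort A[lo:hi] by peeking at the midpoint: find the existing run
    straddling it, split at the run boundary closer to the midpoint,
    recurse on both halves, then merge them."""
    if hi is None:
        hi = len(A)
    if hi - lo <= 1:
        return
    mid = (lo + hi) // 2
    i = mid                      # extend run containing mid to the left
    while i > lo and A[i - 1] <= A[i]:
        i -= 1
    j = mid                      # ... and to the right (j = exclusive end)
    while j + 1 < hi and A[j] <= A[j + 1]:
        j += 1
    j += 1
    if i == lo and j == hi:      # the whole range is one run: done
        return
    cut = i if mid - i < j - mid else j   # boundary closer to midpoint
    if cut == lo:                # never split off an empty part
        cut = j
    elif cut == hi:
        cut = i
    peeksort(A, lo, cut)
    peeksort(A, cut, hi)
    A[lo:hi] = _merge(A[lo:cut], A[cut:hi])

def _merge(L, R):                # standard stable two-way merge
    out, i, j = [], 0, 0
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            out.append(L[i]); i += 1
        else:
            out.append(R[j]); j += 1
    return out + L[i:] + R[j:]
```

Since the cut point is always an existing run boundary, the recursion tree is exactly a weight-balanced merge tree over the runs.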
Analysis of peeksort
Theorem (Horibe 1977, Bayer 1975)
Weight-balancing on leaf probabilities α1, . . . , αr yields a BST with search cost
C ≤ H(α1, . . . , αr) + 2.
(search cost = exp. #cmps to find a random leaf chosen with prob. αi)
immediate corollary:
Peeksort incurs merge cost M ≤ (H + 2)n.
Peeksort needs C ≤ n + (H + 2)n cmps (n to detect runs, (H + 2)n merge cost).
both are optimal up to O(n) terms
Peeksort exploits existing runs optimally up to lower order terms!
Are we done then? Recall the drawbacks:
✗ have to store lengths — peeksort avoids this
✗ extra scan to detect runs — still there:
one run at a time, we load runs (peeking) without putting memory transfers to good use
... can’t we do better?
Timsort does better: a newly detected run is usually merged soon after
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 7 / 15
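The bounds can be sanity-checked numerically. The sketch below (illustrative, not from the talk) computes the run-profile entropy H(L1/n, . . . , Lr/n) for the run lengths of the earlier example; any merge tree's cost lies between the entropy lower bound nH and the (H + 2)n upper bound:

```python
import math

def run_entropy(lengths):
    """H(L1/n, ..., Lr/n): entropy of the run-length profile in bits."""
    n = sum(lengths)
    return -sum(L / n * math.log2(L / n) for L in lengths)

runs = [2, 32, 2, 6, 2, 6]   # example run lengths from the earlier slide
n = sum(runs)
H = run_entropy(runs)
print(n * H, (H + 2) * n)    # merge cost of a nearly-optimal mergesort lies in between
```

For this profile, n = 50 and H ≈ 1.70, so the gap between nH ≈ 85 and (H + 2)n ≈ 185 is the O(n) slack the theorem allows.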
Outline
1 Adaptive Sorting – Status Quo
2 Natural Mergesort
3 Peeksort
4 Powersort
5 Experiments
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 7 / 15
The bisection heuristic
Timsort proceeds left to right:
detect the next run
push it onto stack of runs
merge some runs from the stack (devil is in the details)
cannot use weight-balancing
Method 2: bisection (Mehlhorn 1977)
always split at the midpoint 1⁄2; within the halves at 1⁄4 and 3⁄4; then at 1⁄8, 7⁄8, 15⁄16, . . .
(weight-balancing might have chosen a different split;
a split that falls out of range creates no node)
Alternative view: node powers
inner node ↦ interval between the midpoints of its two runs,
normalized from [1..n] to [0, 1]
power = min ℓ s.t. the node’s interval contains a point c · 2^(−ℓ)
depends only on the 2 adjacent runs
(figure: dyadic grid of powers 4 3 4 2 4 3 4 1 4 3 4 2 4 3 4 along the array;
the run boundaries in the example get powers 3 2 1 2 4)
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 8 / 15
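The node-power definition translates into code directly. This sketch assumes 0-based, end-exclusive run boundaries and uses floating point for brevity; the talk's O(1) version would use clz-style bit tricks instead:

```python
def node_power(n, begin1, end1, end2):
    """Power of the merge-tree node between adjacent runs
    A[begin1:end1] and A[end1:end2]: smallest l such that the two
    normalized run midpoints fall into different cells of the
    dyadic grid with spacing 2^(-l)."""
    a = (begin1 + end1) / 2 / n   # midpoint of left run, normalized to [0,1)
    b = (end1 + end2) / 2 / n     # midpoint of right run
    l = 0
    while int(a * (1 << l)) == int(b * (1 << l)):
        l += 1
    return l

# the boundary at the array midpoint gets the smallest power:
print(node_power(8, 0, 4, 8))  # -> 1
```

As the slide notes, the result depends only on the two adjacent runs (their endpoints), not on the rest of the array.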
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
a b c d e f
3 2
run1 run2
a – 3
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
a b c d e f
3 2
run1 run2
a – 3
b – 2
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
a b c d e f
3 2
run1 run2
a – 3
b – 2
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
a b c d e f
3 2
run1 run2
a – 3
b – 2
run stack
merge
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
ab c d e f
2
run2
ab – 2
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
ab c d e f
2 1
run1 run2
ab – 2
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
ab c d e f
2 1
run1 run2
ab – 2
c – 1
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
ab c d e f
2 1
run1 run2
ab – 2
c – 1
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
ab c d e f
2 1
run1 run2
ab – 2
c – 1
run stack
merge
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abc d e f
1
run2
abc – 1
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abc d e f
1 2
run1 run2
abc – 1
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abc d e f
1 2
run1 run2
abc – 1
d – 2
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abc d e f
1 2 4
run1
run2
abc – 1
d – 2
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abc d e f
1 2 4
run1
run2
abc – 1
d – 2
e – 4
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abc d e f
1 2 4
abc – 1
d – 2
e – 4
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs merge-down phase
abc d e f
1 2 4
abc – 1
d – 2
e – 4
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs merge-down phase
abc d e f
1 2 4
abc – 1
d – 2
e – 4
run stack
merge
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs merge-down phase
abc d ef
1 2
abc – 1
d – 2
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs merge-down phase
abc d ef
1 2
abc – 1
d – 2
run stack
merge
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs merge-down phase
abc def
1
abc – 1
run stack
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs merge-down phase
abc def
1
abc – 1
run stack
merge
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abcdef
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abcdef
Theorem (Mehlhorn 1977:)
The bisection heuristic yields a BST with
search cost C H(α1, . . . , αr) + 2.
same merge/cmps cost as Peeksort
exploit runs optimally up to lower
order terms!
but: detects runs lazily!
no extra scan!
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abcdef
Theorem (Mehlhorn 1977:)
The bisection heuristic yields a BST with
search cost C H(α1, . . . , αr) + 2.
same merge/cmps cost as Peeksort
exploit runs optimally up to lower
order terms!
but: detects runs lazily!
no extra scan!
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abcdef
Theorem (Mehlhorn 1977:)
The bisection heuristic yields a BST with
search cost C H(α1, . . . , αr) + 2.
same merge/cmps cost as Peeksort
exploit runs optimally up to lower
order terms!
but: detects runs lazily!
no extra scan!
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abcdef
Theorem (Mehlhorn 1977:)
The bisection heuristic yields a BST with
search cost C H(α1, . . . , αr) + 2.
same merge/cmps cost as Peeksort
exploit runs optimally up to lower
order terms!
but: detects runs lazily!
no extra scan!
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abcdef
Theorem (Mehlhorn 1977:)
The bisection heuristic yields a BST with
search cost C H(α1, . . . , αr) + 2.
same merge/cmps cost as Peeksort
exploit runs optimally up to lower
order terms!
but: detects runs lazily!
no extra scan!
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abcdef
Theorem (Mehlhorn 1977:)
The bisection heuristic yields a BST with
search cost C H(α1, . . . , αr) + 2.
same merge/cmps cost as Peeksort
exploit runs optimally up to lower
order terms!
but: detects runs lazily!
no extra scan!
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abcdef
Theorem (Mehlhorn 1977:)
The bisection heuristic yields a BST with
search cost C H(α1, . . . , αr) + 2.
same merge/cmps cost as Peeksort
exploit runs optimally up to lower
order terms!
but: detects runs lazily!
no extra scan!
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abcdef
Theorem (Mehlhorn 1977:)
The bisection heuristic yields a BST with
search cost C H(α1, . . . , αr) + 2.
same merge/cmps cost as Peeksort
exploit runs optimally up to lower
order terms!
but: detects runs lazily!
no extra scan!
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abcdef
Theorem (Mehlhorn 1977:)
The bisection heuristic yields a BST with
search cost C H(α1, . . . , αr) + 2.
same merge/cmps cost as Peeksort
exploit runs optimally up to lower
order terms!
but: detects runs lazily!
no extra scan!
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Powersort
Powersort
proceed left to right:
detect next run & compute power
push run onto stack of runs
while new node less powerful:
merge topmost runs
abcdef
Theorem (Mehlhorn 1977:)
The bisection heuristic yields a BST with
search cost C H(α1, . . . , αr) + 2.
same merge/cmps cost as Peeksort
exploit runs optimally up to lower
order terms!
but: detects runs lazily!
no extra scan!
More good properties:
power efficient to compute;
O(1) with bitwise tricks (clz
count leading zeros
)
never stores more than lg n runs:
powers on stack strictly monotonic
(highest on top)
stack height max power lg n + 1
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
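The powersort loop described above can be modeled end to end in a few lines. A simplified sketch (all helper names are mine; a real implementation would use a clz-based O(1) power computation and an in-place stable merge instead of the `sorted` stand-in):

```python
def node_power(s1, n1, s2, n2, n):
    """Min l s.t. the runs' normalized midpoints differ in binary digit l."""
    a, b, l = 2 * s1 + n1, 2 * s2 + n2, 0      # midpoints scaled by 2n
    while (a << l) // (2 * n) == (b << l) // (2 * n):
        l += 1
    return l

def extend_run(a, i):
    """End (exclusive) of the maximal run starting at a[i];
    strictly decreasing runs are reversed in place (stable, since strict)."""
    j = i + 1
    if j == len(a):
        return j
    if a[j] < a[i]:
        while j < len(a) and a[j] < a[j - 1]:
            j += 1
        a[i:j] = a[i:j][::-1]
    else:
        while j < len(a) and a[j] >= a[j - 1]:
            j += 1
    return j

def merge(a, lo, mid, hi):
    # Stand-in for a stable two-run merge (Python's sorted is stable).
    a[lo:hi] = sorted(a[lo:hi])

def powersort(a):
    n = len(a)
    if n <= 1:
        return a
    stack = []                        # (start, power); powers strictly increasing
    s1, e1 = 0, extend_run(a, 0)      # current run is a[s1:e1]
    while e1 < n:
        s2, e2 = e1, extend_run(a, e1)
        p = node_power(s1, e1 - s1, s2, e2 - s2, n)
        while stack and stack[-1][1] > p:   # new node less powerful -> merge topmost
            s0, _ = stack.pop()
            merge(a, s0, s1, e1)
            s1 = s0
        stack.append((s1, p))
        s1, e1 = s2, e2
    while stack:                       # merge-down phase
        s0, _ = stack.pop()
        merge(a, s0, s1, e1)
        s1 = s0
    return a
```

Because a new boundary pops every stack entry with a larger power before being pushed, the powers on the stack stay strictly increasing from bottom to top, which is what bounds the stack height by lg n + 1.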
Outline
1 Adaptive Sorting – Status Quo
2 Natural Mergesort
3 Peeksort
4 Powersort
5 Experiments
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
Experimental Evaluation
Hypotheses:
1 Negligible overhead:
Peek- and powersort are as fast as standard mergesort on inputs with high H.
2 Run-adaptiveness helps:
Adaptive methods are faster on inputs with low H.
3 Timsort’s weak point:
Timsort is much slower than peek-/powersort on certain inputs.
Setup:
Java implementations, reproduced in C++
mildly hand-tuned code
sorting int[]s, length around 10^7
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 10 / 15
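All methods in this comparison are driven by the same primitive, a stable merge of two adjacent sorted runs. A minimal sketch of that primitive (buffering only the left run, a common space-saving choice; `merge_runs` is my name, not from the talk's code):

```python
def merge_runs(a, lo, mid, hi):
    """Stably merge the adjacent sorted runs a[lo:mid] and a[mid:hi] in place,
    using an auxiliary buffer for the left run only."""
    left = a[lo:mid]                   # O(mid - lo) extra space
    i, j, k = 0, mid, lo
    while i < len(left) and j < hi:
        if a[j] < left[i]:             # strict <: ties keep left-run order -> stable
            a[k] = a[j]; j += 1
        else:
            a[k] = left[i]; i += 1
        k += 1
    a[k:k + len(left) - i] = left[i:]  # right-run leftovers are already in place

xs = [1, 4, 7, 2, 3, 9]
merge_runs(xs, 0, 3, 6)
print(xs)   # -> [1, 2, 3, 4, 7, 9]
```

The strict comparison on the right run is what makes equal elements keep their original order, so the surrounding sort is stable.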
Negligible Overhead
1 Negligible overhead:
Peek- and powersort are as good as standard mergesort on inputs with high H.
Study: random permutations, Java runtimes (also reproduced in C++)
[plot: time / (n lg n) over n = 10^5, 10^6, 10^7, 10^8 for top-down mergesort,
bottom-up mergesort, peeksort, powersort, Timsort, trotsort (Timsort w/o galloping),
and Arrays.sort(int[]) (quicksort, not stable); zoomed y-range 4.2–4.6]
⇝ galloping merge too slow
⇝ no significant difference to standard mergesort
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 11 / 15
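The y-axis of this plot is running time divided by n lg n, which makes a comparison-based sort's cost roughly flat across sizes. A tiny harness in that style, as an illustrative sketch only (the sizes, trial count, and use of Python's built-in sort are my choices, not the talk's setup):

```python
import math
import random
import time

def normalized_time(sort, n, trials=3):
    """Best-of-trials wall time of sort on a random input, divided by n*lg(n)."""
    rng = random.Random(42)
    best = float("inf")
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        t0 = time.perf_counter()
        sort(xs)
        best = min(best, time.perf_counter() - t0)
    return best / (n * math.log2(n))

for n in (10**4, 10**5):
    print(n, normalized_time(sorted, n))
```

On such a plot, a method whose curve stays flat scales like n lg n; overhead shows up as a constant vertical offset.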
Run-adaptiveness helps
2 Run-adaptiveness helps:
Adaptive methods are faster on inputs with low H.
Study: “random runs”: random permutation w/ ranges of Geo(1/√n) length sorted,
Java runtimes (also reproduced in C++), n = 10^7
≈ √n runs, avg length √n
10% of runs < 0.1 √n, 5% of runs > 5 √n
⇝ moderate presortedness
[plot: running time (500–700 ms) and merge cost for top-down mergesort, bottom-up mergesort,
peeksort, powersort, Timsort, trotsort (no galloping), and Arrays.sort(int[])]
⇝ adaptive methods beat quicksort by 20%
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 12 / 15
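The "random runs" inputs can be generated directly as described: start from a random permutation and sort consecutive blocks whose lengths are Geometric(1/√n). A sketch (the seed and the inversion-sampling of the geometric are my choices; the talk's exact generator may differ):

```python
import math
import random

def random_runs(n, seed=7):
    """Random permutation with consecutive ranges of Geo(1/sqrt n) length sorted,
    giving roughly sqrt(n) runs of average length sqrt(n)."""
    rng = random.Random(seed)
    a = list(range(n))
    rng.shuffle(a)
    p = 1 / math.sqrt(n)
    i = 0
    while i < n:
        u = 1 - rng.random()   # u in (0, 1]
        length = max(1, math.ceil(math.log(u) / math.log(1 - p)))  # Geometric(p)
        a[i:i + length] = sorted(a[i:i + length])
        i += length
    return a

a = random_runs(10**4)
runs = 1 + sum(a[i] > a[i + 1] for i in range(len(a) - 1))
print(runs)   # close to sqrt(n) = 100
```

Because block lengths are geometric, a few runs come out much shorter and a few much longer than √n, matching the 10%/5% tails quoted above.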
Timsort’s weak point
3 Timsort’s weak point:
Timsort is much slower than peek-/powersort on certain inputs.
Study: “Timsort-drags”: known family of bad-case run-length sequences L1, . . . , Lr
(Rtim(n), Buss & Knop 2018), Java runtimes (also reproduced in C++), n = 2^24 ≈ 1.6 · 10^7
[plot: running time (1,200–2,000 ms) and normalized merge costs (0.8–1.3) for td-mergesort,
bu-mergesort, peeksort, powersort, Timsort, trotsort, and Arrays.sort(int[])]
Timsort/trotsort has
40% higher merge cost
10% higher running time in Java
40% higher running time in C++
Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 13 / 15
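"Merge cost" on these slides is the machine-independent measure used throughout the talk: the total size of all merged areas. Counting it for a plain top-down mergesort, as a sketch (the function and its shape are mine, for illustration):

```python
def mergesort_cost(a):
    """Top-down mergesort returning (sorted list, merge cost), where
    merge cost = sum over all merges of the merged area's length."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, cl = mergesort_cost(a[:mid])
    right, cr = mergesort_cost(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            merged.append(right[j]); j += 1
        else:
            merged.append(left[i]); i += 1
    merged += left[i:] + right[j:]
    return merged, cl + cr + len(merged)

# For n = 2^k every level merges all n elements, so the cost is exactly n*lg(n).
out, cost = mergesort_cost(list(range(8)))
print(cost)   # -> 24  (= 8 * lg 8)
```

The normalized merge costs in the plot above divide such counts by a common baseline, which is why a 40% gap in merge cost can translate into different runtime gaps in Java and C++.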
Nearly-optimal mergesort: Fast, practical sorting methods that optimally adapt to existing runs
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
shubhijain836
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
PirithiRaju
 
Synopsis presentation VDR gene polymorphism and anemia (2).pptx
Synopsis presentation VDR gene polymorphism and anemia (2).pptxSynopsis presentation VDR gene polymorphism and anemia (2).pptx
Synopsis presentation VDR gene polymorphism and anemia (2).pptx
FarhanaHussain18
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
PirithiRaju
 
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Creative-Biolabs
 
Quality assurance B.pharm 6th semester BP606T UNIT 5
Quality assurance B.pharm 6th semester BP606T UNIT 5Quality assurance B.pharm 6th semester BP606T UNIT 5
Quality assurance B.pharm 6th semester BP606T UNIT 5
vimalveerammal
 
Introduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptxIntroduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptx
QusayMaghayerh
 
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptxLEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
yourprojectpartner05
 
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxBIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
goluk9330
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
Shekar Boddu
 
Post translation modification by Suyash Garg
Post translation modification by Suyash GargPost translation modification by Suyash Garg
Post translation modification by Suyash Garg
suyashempire
 
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
Sérgio Sacani
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
hozt8xgk
 
Signatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coastsSignatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coasts
Sérgio Sacani
 

Recently uploaded (20)

seed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdfseed production, Nursery & Gardening.pdf
seed production, Nursery & Gardening.pdf
 
Polycythemia vera_causes_disorders_treatment.pptx
Polycythemia vera_causes_disorders_treatment.pptxPolycythemia vera_causes_disorders_treatment.pptx
Polycythemia vera_causes_disorders_treatment.pptx
 
23PH301 - Optics - Unit 2 - Interference
23PH301 - Optics - Unit 2 - Interference23PH301 - Optics - Unit 2 - Interference
23PH301 - Optics - Unit 2 - Interference
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
 
Embracing Deep Variability For Reproducibility and Replicability
Embracing Deep Variability For Reproducibility and ReplicabilityEmbracing Deep Variability For Reproducibility and Replicability
Embracing Deep Variability For Reproducibility and Replicability
 
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptxTOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
 
Synopsis presentation VDR gene polymorphism and anemia (2).pptx
Synopsis presentation VDR gene polymorphism and anemia (2).pptxSynopsis presentation VDR gene polymorphism and anemia (2).pptx
Synopsis presentation VDR gene polymorphism and anemia (2).pptx
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
 
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...
 
Quality assurance B.pharm 6th semester BP606T UNIT 5
Quality assurance B.pharm 6th semester BP606T UNIT 5Quality assurance B.pharm 6th semester BP606T UNIT 5
Quality assurance B.pharm 6th semester BP606T UNIT 5
 
Introduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptxIntroduction_Ch_01_Biotech Biotechnology course .pptx
Introduction_Ch_01_Biotech Biotechnology course .pptx
 
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptxLEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
 
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxBIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
 
Post translation modification by Suyash Garg
Post translation modification by Suyash GargPost translation modification by Suyash Garg
Post translation modification by Suyash Garg
 
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
Signatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coastsSignatures of wave erosion in Titan’s coasts
Signatures of wave erosion in Titan’s coasts
 

Nearly-optimal mergesort: Fast, practical sorting methods that optimally adapt to existing runs

  • 1. Nearly-Optimal Mergesorts Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs Sebastian Wild wild@uwaterloo.ca joint work with Ian Munro ESA 2018 26th Annual European Symposium on Algorithms Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 0 / 15
  • 2. Outline 1 Adaptive Sorting – Status Quo 2 Natural Mergesort 3 Peeksort 4 Powersort 5 Experiments Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 0 / 15
  • 15. Adaptive Sorting Adaptive algorithm: exploit “structure” of input adaptive sorting: exploit “presortedness” few inversions few runs few outliers ...many more optimal algorithms known for many measures of presortedness (up to constant factors!) Want: Optimal up to lower order terms practical methods low overhead for detecting presortedness competitive on inputs without presortedness Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 1 / 15
  • 34. State of the art 1 “fat-pivot” quicksort split < P, = P, > P adapts to duplicate elements optimal up to small constant factor (on average): 1.386 (plain quicksort) 1.188 with median-of-3 1.088 with ninther low overhead implementation ... 2 Timsort adaptive mergesort variant ... adapts to existing runs but not optimally! factor ≥ 1.5 worse (Buss & Knop 2018) Timsort still broken! “it is still possible to cause the Java implementation to fail: [...] causing an error at runtime in Java’s sorting method.” Observation: Timsort’s merge rules are quite intricate. ? Why these rules? ? Why are they so sensitive to small changes? (cf. Java version!) ... and can’t we find simpler rules? Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 2 / 15
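The fat-pivot split mentioned on this slide can be sketched as a three-way (Dutch-national-flag style) partition; this is a minimal illustrative version, not the tuned implementation the slide refers to (no median-of-3/ninther pivot sampling):

```python
def fat_pivot_quicksort(a, lo=0, hi=None):
    """In-place quicksort with a three-way ("fat-pivot") partition:
    a[lo:hi] is rearranged into  < P | == P | > P  and only the outer
    two blocks are recursed on, so blocks of duplicate keys are
    handled in linear time."""
    if hi is None:
        hi = len(a)
    if hi - lo <= 1:
        return
    pivot = a[(lo + hi) // 2]
    # Invariant: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt:hi] > pivot.
    lt, i, gt = lo, lo, hi
    while i < gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            gt -= 1
            a[i], a[gt] = a[gt], a[i]
        else:
            i += 1
    fat_pivot_quicksort(a, lo, lt)  # recurse left of the == block
    fat_pivot_quicksort(a, gt, hi)  # recurse right of the == block
```

Because the `== P` block is never touched again, inputs with few distinct keys need far fewer comparisons, which is the adaptivity to duplicates claimed above.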
  • 44. Run-Length Entropy Our measure of unsortedness: runs (maximal contiguous sorted ranges) simple version: lg(#runs) fine-grained version: entropy of run lengths — for runs of lengths L1, . . . , Lr: H(L1/n, . . . , Lr/n) = Σ_{i=1..r} (Li/n) · lg(n/Li) Comparison Lower Bound: n! permutations in total, but sorted within runs ⇒ n! / (L1! · · · Lr!) possible inputs ⇒ need lg( n! / (L1! · · · Lr!) ) = H(L1/n, . . . , Lr/n) · n − O(n) comparisons (written only H in the following) Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 3 / 15
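The entropy measure on this slide is easy to compute directly; a small sketch (run detection by weak increase, a common convention, is an assumption here):

```python
from math import log2

def run_lengths(a):
    """Lengths L1, ..., Lr of the maximal weakly-increasing runs of a."""
    lengths, start = [], 0
    for i in range(1, len(a)):
        if a[i] < a[i - 1]:          # strict descent = run boundary
            lengths.append(i - start)
            start = i
    lengths.append(len(a) - start)
    return lengths

def run_length_entropy(a):
    """H(L1/n, ..., Lr/n) = sum_i (Li/n) * lg(n/Li).
    The comparison lower bound from the slide is H * n - O(n)."""
    n = len(a)
    return sum(L / n * log2(n / L) for L in run_lengths(a))
```

For an already sorted input there is one run of length n, so H = lg(n/n) = 0 and the lower bound degenerates to O(n), matching the intuition that presorted inputs are cheap.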
  • 45. Outline 1 Adaptive Sorting – Status Quo 2 Natural Mergesort 3 Peeksort 4 Powersort 5 Experiments Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 3 / 15
  • 71. Natural Mergesort
    “natural” mergesort = run-adaptive mergesort
    Conceptually two steps:
      1 Find runs in the input.
      2 Merge them in some order.
    Here: only binary merges, so step 2 becomes: merge 2 runs, repeat until a single run remains.
    Only stable sorts, so merge 2 adjacent runs.
    Merge trees: [figure: merge tree over the example array 15 17 12 19 2 9 13 7 11 1 4 8 10 14 23 5 21 3 6 16 18 20 22]
    Merge costs: cost of a merge := size of its output (≈ memory transfers, ≈ #cmps);
    total cost = total area of the merge rectangles in the figure.
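The two conceptual steps above can be sketched directly. A minimal, illustrative Python sketch (function names are mine, not the paper's): detect maximal weakly ascending runs, then repeatedly merge adjacent runs pairwise, which is a binary, stable merge policy.

```python
# Natural mergesort sketch: 1) find runs, 2) merge adjacent runs pairwise.

def find_runs(a):
    """Return (start, end) index pairs of maximal weakly ascending runs."""
    runs, i, n = [], 0, len(a)
    while i < n:
        j = i + 1
        while j < n and a[j - 1] <= a[j]:
            j += 1
        runs.append((i, j))
        i = j
    return runs

def merge(left, right):
    """Stable two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def natural_mergesort(a):
    runs = [a[i:j] for i, j in find_runs(a)]
    while len(runs) > 1:
        # merge adjacent runs pairwise (stability needs adjacency)
        runs = [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []
```

This particular pairwise policy ignores run lengths; the following slides are about choosing the merge order (the merge tree) more cleverly.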
  • 77. Mergesort meets search trees
    Different merge trees yield different cost!
    [figure: two merge trees over the same array 2 4 6 8 10 12 14 16 3 5 1 9 7 17 11 13 15 0;
    one has merge costs 42, the other merge costs 71]
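The cost difference can be computed bottom-up: every binary merge costs the size of its output, i.e. the sum of the run lengths below it. A small sketch, with merge trees encoded as nested tuples of leaf run lengths (the run lengths here are made-up examples, not the slide's):

```python
# Merge cost of a merge tree: each internal node (a merge) costs the total
# weight of the runs beneath it.

def merge_cost(tree):
    """Return (total_weight, total_merge_cost) of a merge tree."""
    if isinstance(tree, int):          # a leaf: a run of this length
        return tree, 0
    wl, cl = merge_cost(tree[0])
    wr, cr = merge_cost(tree[1])
    return wl + wr, cl + cr + wl + wr  # this merge outputs wl + wr elements

# The same four runs, two different tree shapes, two different costs:
_, balanced = merge_cost(((4, 4), (4, 4)))   # -> 32
_, skewed   = merge_cost((((4, 4), 4), 4))   # -> 36
```

With skewed run lengths the gap grows much larger, which is exactly why the choice of merge tree matters.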
  • 101. Mergesort meets search trees
    Different merge trees yield different cost!
    Merge cost = total area of the merge rectangles
               = total length of paths to all array entries
               = weighted external path length, Σ over leaves w of weight(w) · depth(w)
    Hence: optimal merge tree = optimal BST for leaf weights L1, . . . , Lr
    How to compute a good merge tree?
      Huffman merge (merge shortest runs; independently discovered):
        • Golin & Sedgewick 1993
        • Takaoka 1998
        • Barbay & Navarro 2009
        • Chandramouli & Goldstein 2014
        ✗ must sort lengths, hence not stable
      Hu–Tucker merge (optimal alphabetic tree):
        ✗ have to store lengths
        ✗ complicated algorithm
      Nearly-optimal BST merge (. . . the 70s are calling):
        ✓ simple (greedy) linear-time methods!
        ✓ almost optimal (≤ H + 2)
        ✗ have to store lengths
        ✗ extra scan to detect runs (avoidable?)
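The Huffman merge policy can be sketched on run lengths alone with a heap: always merge the two shortest runs and add the merge's output size to the cost. As the slide notes, the two shortest runs need not be adjacent, which is why this policy is not stable. An illustrative sketch:

```python
import heapq

# Huffman-style merge policy: repeatedly merge the two shortest runs.
# Operating on lengths only, this returns the total merge cost incurred.

def huffman_merge_cost(run_lengths):
    heap = list(run_lengths)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        cost += a + b                 # one binary merge of sizes a and b
        heapq.heappush(heap, a + b)   # the merged run goes back on the heap
    return cost
```

For run lengths [1, 1, 2, 4] this gives cost 2 + 4 + 8 = 14, the minimum over all (non-alphabetic) binary merge trees.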
  • 127. Peeksort
    Method 1: weight-balancing (Mehlhorn 1975, Bayer 1975)
      choose the root to balance subtree weights (split weight as close to 1⁄2 : 1⁄2 as possible)
      recurse on subtrees
    Peeksort can simulate weight-balancing without knowing/storing all runs!
      “peek” at the middle of the array to find the closest run boundary
      split there and recurse
    Can avoid redundant work:
      find the full run straddling the midpoint
      4 parameters for recursive calls: ℓ, e, s, r, where A[ℓ..e] and A[s..r] store the
      outermost runs (empty if ℓ = e resp. s = r)
      hence each run is scanned only once
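The peeking idea can be sketched as follows. This is a simplified, illustrative version: the real algorithm carries the four indices described above and merges in place, whereas this sketch re-detects runs on every call and uses Python's stable `sorted` as a stand-in for the final stable merge of the two sorted halves.

```python
# Simplified peeksort sketch: find the run straddling the midpoint, split at
# whichever of its endpoints is closer to the middle, recurse, then merge.

def peeksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    if hi - lo <= 1:
        return
    mid = lo + (hi - lo) // 2
    # extend the run that straddles the midpoint to a[i:j]
    i = mid
    while i > lo and a[i - 1] <= a[i]:
        i -= 1
    j = mid
    while j + 1 < hi and a[j] <= a[j + 1]:
        j += 1
    j += 1
    if i <= lo and j >= hi:             # the whole range is one run
        return
    # split at the run boundary closer to the midpoint (weight-balancing)
    cut = i if mid - i <= j - mid else j
    if cut <= lo or cut >= hi:          # never split off an empty half
        cut = j if cut <= lo else i
    peeksort(a, lo, cut)
    peeksort(a, cut, hi)
    a[lo:hi] = sorted(a[lo:hi])         # stand-in for a stable two-way merge
```

Since both halves are sorted before the final line, the stable `sorted` call produces exactly what a stable merge would.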
  • 143. Analysis of peeksort Theorem (Horibe 1977, Bayer 1975:) Weight-balancing on leaf probabilities α1, . . . , αr yields a BST with search cost exp. #cmps to find random leaf chosen with prob. αi. C H(α1, . . . , αr) + 2. immediate corollary: Peeksort incurs merge cost M (H + 2)n. Peeksort needs C n detect runs + (H + merge cost 2)n cmps. both are optimal up to O(n) terms Peeksort exploits existing runs optimally up to lower order terms! Are we done then? ŏ have to store lengths ŏ extra scan to detect runs one run at a time we load runs (peeking) without putting memory transfers to good use ... can’t we do better? Timsort does better: newly detected run usually merged soon after Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 7 / 15
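To make the bound on the slide concrete, here is a minimal Python sketch (my own illustration, not from the talk) of the run-length entropy H(α1, …, αr) and the resulting merge-cost bound M ≤ (H + 2)n:

```python
import math

def run_entropy(run_lengths):
    """H(alpha_1, ..., alpha_r) where alpha_i = L_i / n."""
    n = sum(run_lengths)
    return -sum((L / n) * math.log2(L / n) for L in run_lengths)

def peeksort_merge_cost_bound(run_lengths):
    """Upper bound (H + 2) * n on peeksort's merge cost."""
    n = sum(run_lengths)
    return (run_entropy(run_lengths) + 2) * n

# two equal runs: H = 1 bit, so the merge-cost bound is 3n
print(run_entropy([8, 8]))                # 1.0
print(peeksort_merge_cost_bound([8, 8]))  # 48.0
```

A single run has H = 0, so the bound degenerates to 2n, matching the linear cost of merely scanning an already-sorted input.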
  • 144. Outline
    1 Adaptive Sorting – Status Quo
    2 Natural Mergesort
    3 Peeksort
    4 Powersort
    5 Experiments
    Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 7 / 15
  • 145. The bisection heuristic
    Timsort proceeds left to right: detect the next run, push it onto a stack of runs, merge some runs from the stack (the devil is in the details) ⟹ cannot use weight-balancing.
    Method 2: bisection (Mehlhorn 1977)
    Split at 1⁄2; recurse with splits at 1⁄4 and 3⁄4 (here weight-balancing would have chosen the same split!), then 1⁄8 and 7⁄8, then 15⁄16, …
    If a split point falls out of range, no node is created.
    Alternative view: node powers
    inner node ↔ midpoint interval = normalized interval [1..n] → [0, 1]
    power = min ℓ s.t. the interval contains some c · 2^−ℓ
    ⟹ depends only on the 2 adjacent runs
    (Figure: dyadic split tree over 16 unit intervals; the boundary powers read 4 3 4 2 4 3 4 1 4 3 4 2 4 3 4, and the example’s run boundaries have powers 3 2 1 2 4.)
    Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 8 / 15
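The node power of a run boundary can be computed from the two adjacent runs alone. A minimal integer-arithmetic sketch (my own illustration, assuming 0-based half-open runs A[s1:e1] and A[e1:e2]): find the smallest ℓ at which the two normalized run midpoints fall into different dyadic cells. C implementations can replace the loop by a clz (count-leading-zeros) instruction.

```python
def node_power(s1, e1, e2, n):
    """Power of the boundary between runs A[s1:e1] and A[e1:e2], 0-based.

    a and b are twice the run midpoints (so both lie in [0, 2n)); the power
    is the smallest l with floor(a * 2^l / 2n) != floor(b * 2^l / 2n),
    i.e. the first dyadic level that separates the two midpoints.
    """
    a = s1 + e1          # 2 * midpoint of the left run
    b = e1 + e2          # 2 * midpoint of the right run
    l = 0
    while (a << l) // (2 * n) == (b << l) // (2 * n):
        l += 1
    return l

# splitting the whole array in the middle has power 1,
# a boundary at the 1/4 mark has power 2
print(node_power(0, 8, 16, 16))  # 1
print(node_power(0, 4, 8, 16))   # 2
```

Applied to all 15 boundaries between unit-length runs in an array of length 16, this reproduces the ruler sequence 4 3 4 2 4 3 4 1 4 3 4 2 4 3 4 from the slide's figure.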
  • 165. Powersort
    Powersort proceeds left to right:
    detect next run & compute power
    push run onto stack of runs
    while new node less powerful: merge topmost runs
    (Animation: runs a b c d e f with boundary powers 3, 2, 1, 2, 4. The stack evolves a–3; a–3, b–2; merge → ab–2; ab–2, c–1; merge → abc–1; abc–1, d–2; abc–1, d–2, e–4; the final merge-down phase merges e with f, then d with ef, then abc with def, yielding abcdef.)
    Theorem (Mehlhorn 1977): The bisection heuristic yields a BST with search cost C ≤ H(α1, …, αr) + 2.
    ⟹ same merge/cmps cost as Peeksort: exploits runs optimally up to lower-order terms — but detects runs lazily, with no extra scan!
    More good properties:
    power efficient to compute; O(1) with bitwise tricks (clz = count leading zeros)
    never stores more than lg n runs: powers on stack strictly monotonic (highest on top), so stack height ≤ max power ≤ lg n + 1
    Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
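Putting the pieces together, here is a compact Python sketch of powersort — my own illustration of the idea on the slides, not the authors' tuned Java/C++ code. It detects runs lazily left to right, computes each boundary's power, and merges stack runs whose power is at least the new boundary's power before pushing; on the slide's example (boundary powers 3, 2, 1, 2, 4) it produces exactly the merges ab, abc, ef, def, abcdef.

```python
def node_power(s1, e1, e2, n):
    # power of the boundary between runs A[s1:e1] and A[e1:e2]
    a, b = s1 + e1, e1 + e2           # doubled run midpoints in [0, 2n)
    l = 0
    while (a << l) // (2 * n) == (b << l) // (2 * n):
        l += 1
    return l

def extend_run(a, s):
    # maximal run starting at s; a strictly decreasing run is reversed
    n, e = len(a), s + 1
    if e == n:
        return e
    if a[e] < a[s]:
        while e < n and a[e] < a[e - 1]:
            e += 1
        a[s:e] = a[s:e][::-1]
    else:
        while e < n and a[e] >= a[e - 1]:
            e += 1
    return e

def merge(a, s, m, e):
    # stable merge of the sorted blocks a[s:m] and a[m:e]
    left, right = a[s:m], a[m:e]
    i = j = 0
    for k in range(s, e):
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            a[k] = left[i]; i += 1
        else:
            a[k] = right[j]; j += 1

def powersort(a):
    n = len(a)
    if n <= 1:
        return
    stack = []                        # (start index, boundary power) per run
    s1, e1 = 0, extend_run(a, 0)
    while e1 < n:
        s2, e2 = e1, extend_run(a, e1)
        p = node_power(s1, e1, e2, n)
        while stack and stack[-1][1] >= p:
            s0, _ = stack.pop()       # merge stack top into current left run
            merge(a, s0, s1, e1)
            s1 = s0
        stack.append((s1, p))         # powers now strictly increase upward
        s1, e1 = s2, e2
    while stack:                      # merge-down phase
        s0, _ = stack.pop()
        merge(a, s0, s1, e1)
        s1 = s0
```

Because every run below the new top has strictly smaller power, the stack never holds more than lg n + 1 runs, matching the slide's claim.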
  • 203. Outline
    1 Adaptive Sorting – Status Quo
    2 Natural Mergesort
    3 Peeksort
    4 Powersort
    5 Experiments
    Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 9 / 15
  • 204. Experimental Evaluation
    Hypotheses:
    1 Negligible overhead: Peek- and powersort are as fast as standard mergesort on inputs with high H.
    2 Run-adaptiveness helps: Adaptive methods are faster on inputs with low H.
    3 Timsort’s weak point: Timsort is much slower than peek-/powersort on certain inputs.
    Setup: Java implementations, reproduced in C++; mildly hand-tuned code; sorting int[]s, length around 10^7.
    Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 10 / 15
  • 210. Negligible Overhead
    1 Negligible overhead: Peek- and powersort are as good as standard mergesort on inputs with high H.
    Study: random permutations, Java runtimes (reproduced in C++).
    (Figure: time / (n lg n) for n = 10^5 … 10^8; methods: top-down mergesort, bottom-up mergesort, peeksort, powersort, Timsort, trotsort (Timsort w/o galloping), Arrays.sort(int[]) (quicksort, not stable).)
    Observations: galloping merge too slow; no significant difference between peek-/powersort and standard mergesort.
    Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 11 / 15
  • 216–221. Run-adaptiveness helps 2 Run-adaptiveness helps: Adaptive methods are faster on inputs with low H. Study: “random runs”: random permutation with ranges of Geo(1/√n) length sorted; Java runtimes (also reproduced in C++), n = 10^7. Yields ≈ √n runs of average length √n; 10% of runs < 0.1·√n, 5% of runs > 5·√n, i.e. moderate presortedness. [Plot: running times (500–700 ms) and merge costs for top-down mergesort, bottom-up mergesort, peeksort, powersort, Timsort, trotsort (no galloping), and Arrays.sort(int[]).] Observation: the adaptive mergesorts beat quicksort by 20%. Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 12 / 15
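The “random runs” input model described above can be sketched as follows (the class and method names `RandomRuns`/`randomRunsInput` are mine, and sampling the geometric distribution by inversion is one of several equivalent choices; the talk does not specify these details):

```java
import java.util.Arrays;
import java.util.Random;

// Sketch of the "random runs" model: start from a random permutation,
// then sort consecutive ranges whose lengths are drawn from Geo(1/sqrt(n)),
// so the expected run length is about sqrt(n).
class RandomRuns {
    static int[] randomRunsInput(int n, long seed) {
        Random rnd = new Random(seed);
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        // Fisher-Yates shuffle -> uniformly random permutation
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
        double p = 1.0 / Math.sqrt(n);  // Geo(p) has mean 1/p = sqrt(n)
        int i = 0;
        while (i < n) {
            // geometric sample >= 1 via inversion of the CDF
            int len = 1 + (int) Math.floor(Math.log(1 - rnd.nextDouble()) / Math.log(1 - p));
            int end = Math.min(n, i + len);
            Arrays.sort(a, i, end);     // turn a[i..end) into an ascending run
            i = end;
        }
        return a;
    }
}
```

The output is a permutation of 0..n-1 consisting of roughly √n sorted runs, matching the run-length statistics quoted on the slide.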
  • 222–227. Timsort’s weak point 3 Timsort’s weak point: Timsort is much slower than peek-/powersort on certain inputs. Study: “Timsort-drags”: known family of bad-case sequences R_tim(n) (run lengths L1, . . . , Lr) by Buss & Knop 2018; Java runtimes (also reproduced in C++), n = 2^24 ≈ 1.6 · 10^7. [Plot: running times (1,200–2,000 ms) and normalized merge costs (0.8–1.3) for td-mergesort, bu-mergesort, peeksort, powersort, Timsort, trotsort, and Arrays.sort(int[]).] Observations: Timsort/trotsort has 40% higher merge cost; this translates into 10% higher running time in Java and 40% higher running time in C++. Sebastian Wild Nearly-Optimal Mergesorts 2018-08-20 13 / 15
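The “merge cost” used to normalize the second axis charges each merge the size of its output. As a hedged sketch (my own helper `mergeCostTopDown`, not from the talk), the cost of non-adaptive top-down mergesort, which depends only on n, provides the natural baseline against which the adaptive methods are compared:

```java
// Merge cost of plain top-down mergesort: each merge of the two sorted
// halves touches all n elements of the subproblem, so
// cost(n) = cost(floor(n/2)) + cost(ceil(n/2)) + n, with cost(1) = 0.
class MergeCost {
    static long mergeCostTopDown(int n) {
        if (n <= 1) return 0;   // a single element needs no merging
        int m = n / 2;
        return mergeCostTopDown(m) + mergeCostTopDown(n - m) + n;
    }
}
```

For powers of two this gives exactly n lg n; a run-adaptive method that merges according to the existing runs can fall well below this baseline, which is what the normalized merge-cost axis of the plot measures.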