Analysis of Branch Misses in Quicksort
Sebastian Wild
wild@cs.uni-kl.de
based on joint work with Conrado Martínez and Markus E. Nebel
04 January 2015
Meeting on Analytic Algorithmics and Combinatorics
Sebastian Wild Branch Misses in Quicksort 2015-01-04 1 / 15
Instruction Pipelines
Computers do not execute instructions fully sequentially.
Instead they use an “assembly line”.
Example (instruction addresses on the left, scanning loops of a partitioning step):
41  ...
42  i := i + 1
43  a := A[i]
44  IF a < p GOTO 42
45  j := j - 1
46  a := A[j]
47  IF a > p GOTO 45
48  ...
each instruction is broken into 4 stages
simpler steps ⇝ shorter CPU cycles
⇝ one instruction per cycle finished ...
... except for branches! After a wrongly anticipated branch, the CPU must
1. undo wrong instructions
2. fill pipeline anew
Pipeline stalls are costly ... can we avoid (some of) them?
Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
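The cost of refilling can be sketched with a toy model (my simplification, not from the talk: a 4-stage pipeline that completes one instruction per cycle but wastes stages − 1 cycles per wrongly anticipated branch; the instruction and miss counts are illustrative):

```python
def pipeline_cycles(n_instructions, n_misses, stages=4):
    """Toy pipeline cost model: after an initial fill of (stages - 1)
    cycles, one instruction completes per cycle, but each mispredicted
    branch discards the in-flight work and refills the pipeline,
    wasting another (stages - 1) cycles."""
    fill = stages - 1
    return fill + n_instructions + fill * n_misses

# 1000 instructions with no misses vs. 62 mispredicted branches
print(pipeline_cycles(1000, 0))   # -> 1003
print(pipeline_cycles(1000, 62))  # -> 1189, ~19% slower from stalls alone
```

Even a modest miss count inflates the cycle count noticeably, which is why the rest of the talk counts branch misses as a cost measure.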
Branch Prediction
We could avoid stalls if we knew whether a branch will be taken or not.
in general not possible ⇝ prediction with heuristics:
Predict same outcome as last time (1-bit predictor):
  states 1 “predict taken” and 2 “predict not taken”;
  a taken branch moves to state 1, a not-taken branch to state 2.
Predict most frequent outcome with finite memory (2-bit saturating counter):
  states 1–4; states 1, 2 predict taken, states 3, 4 predict not taken;
  each taken branch moves one state towards 1, each not-taken branch one state towards 4.
Flip prediction only after two consecutive errors (2-bit flip-consecutive):
  states 1–4 as above, but the prediction changes only after two errors in a row.
wilder heuristics exist out there ... not considered here
prediction can be wrong ⇝ branch miss (BM)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
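The first two heuristics can be sketched as tiny state machines (a minimal sketch following the slide's automata; the sample branch history is my own):

```python
class OneBitPredictor:
    """Predict the same outcome as last time."""
    def __init__(self):
        self.last = True
    def predict(self):
        return self.last
    def update(self, taken):
        self.last = taken

class TwoBitSaturating:
    """2-bit saturating counter: states 0,1 predict taken; 2,3 not taken.
    Taken moves one state towards 0, not taken one towards 3."""
    def __init__(self):
        self.state = 0
    def predict(self):
        return self.state <= 1
    def update(self, taken):
        self.state = max(self.state - 1, 0) if taken else min(self.state + 1, 3)

def count_misses(predictor, history):
    misses = 0
    for taken in history:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses

history = [True, True, False, True, True, False, True, True]
print(count_misses(OneBitPredictor(), history))   # -> 4
print(count_misses(TwoBitSaturating(), history))  # -> 2
```

On this history the 1-bit predictor misses twice per isolated flip (once on the flip, once flipping back), while the saturating counter tolerates isolated deviations.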
Why Should We Care?
misprediction rates of “typical” programs < 10%
(Comparison-based) sorting is different!
Branch based on comparison result
Comparisons reduce entropy (uncertainty about input)
The fewer comparisons we use, the less predictable they become
for classic Quicksort: misprediction rate 25 %
with median-of-3: 31.25 %
Practical Importance (KALIGOSI & SANDERS, ESA 2006):
on Pentium 4 Prescott: very skewed pivot faster than median
branch misses dominated running time
Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
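The 25% and 31.25% figures can be reproduced numerically: assuming the optimal predictor's miss rate min(p, 1 − p) for a branch taken with probability p, average it over the pivot density — 1 for a random pivot, 6p(1 − p) for the median of three i.i.d. uniforms:

```python
def expected_opt_miss_rate(pivot_density, m=100_000):
    """E[min(P, 1 - P)] under the given pivot density on [0, 1],
    computed by the midpoint rule: the expected miss rate of an
    optimal predictor, averaged over the pivot value."""
    total = 0.0
    for i in range(m):
        x = (i + 0.5) / m
        total += min(x, 1 - x) * pivot_density(x)
    return total / m

print(expected_opt_miss_rate(lambda x: 1.0))          # ≈ 0.25   (random pivot)
print(expected_opt_miss_rate(lambda x: 6*x*(1 - x)))  # ≈ 0.3125 (median-of-3)
```

Note the counterintuitive direction: a better pivot makes the comparison branch *less* predictable, which is exactly the Kaligosi–Sanders effect.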
Track Record of Dual-Pivot Quicksort
Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS)
faster than previously used classic Quicksort (CQS) in practice
traditional cost measures do not explain this!

                                         CQS    YQS    Relative
Running Time (from various experiments)                −10 ± 2%
Comparisons                              2      1.9    −5%
Swaps                                    0.3    0.6    +80%
Bytecode Instructions                    18     21.7   +20.6%
MMIX oops υ                              11     13.1   +19.1%
MMIX mems µ                              2.6    2.8    +5%
Scanned elements¹ (≈ cache misses)       2      1.6    −20%

(all counts: coefficient of n ln n, up to O(n); average-case results)

What about branch misses? Can they explain YQS’s success? ... stay tuned.
¹ KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014
Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
Random Model
n i. i. d. elements chosen uniformly in [0, 1]
(picture: U1, ..., U8 drawn on the unit interval)
pairwise distinct almost surely
relative ranking is a random permutation ⇝ equivalent to classic model
Consider pivot value P fixed:
  Pr[U < P] = P = D1
  Pr[U > P] = 1 − P = D2
Similarly for dual-pivot Quicksort with pivots P ≤ Q:
  Pr[U < P] = D1
  Pr[P < U < Q] = D2
  Pr[U > Q] = D3
These probabilities hold for all elements U,
independent of all other elements!
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
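A quick Monte Carlo check of the dual-pivot case (the pivot values 0.3 and 0.8 are an arbitrary example of mine):

```python
import random

# With pivots P < Q fixed, each element independently falls into one of
# the three classes with probabilities D1 = P, D2 = Q - P, D3 = 1 - Q.
rng = random.Random(0)
P, Q = 0.3, 0.8          # example pivot values
n = 100_000
counts = [0, 0, 0]
for _ in range(n):
    u = rng.random()
    counts[0 if u < P else (1 if u < Q else 2)] += 1
print([c / n for c in counts])   # close to [0.3, 0.5, 0.2]
```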
Branches in CQS
How many branches in first partitioning step of CQS?
Consider pivot value P fixed, i.e., D = (D1, D2) = (P, 1 − P) fixed.
one comparison branch per element U:
  U < P ⇝ left partition
  U > P ⇝ right partition
  branch taken with prob. P, i. i. d. for all elements U! ⇝ memoryless source
other branches (loop logic etc.):
  easy to predict ⇝ only constant number of mispredictions
  ⇝ can be ignored (for leading-term asymptotics)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
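A sketch of such a partitioning pass that counts executions of the comparison branch (a generic crossing-pointer variant of my own, not necessarily the exact code analyzed in the talk); the count comes out as n + O(1), i.e. one comparison branch per element in the leading term:

```python
import random

def hoare_partition(A, lo, hi):
    """Crossing-pointer partitioning around pivot p = A[lo].
    Returns the final pivot position and the number of element-vs-pivot
    comparisons, i.e. executions of the comparison branch."""
    p = A[lo]
    i, j = lo + 1, hi
    comparisons = 0
    while True:
        while i <= hi:               # scan from the left: A[i] < p?
            comparisons += 1
            if A[i] >= p:
                break
            i += 1
        while A[j] > p:              # scan from the right: A[j] > p?
            comparisons += 1         # (A[lo] == p acts as a sentinel)
            j -= 1
        comparisons += 1             # the failing test that ends the j-scan
        if i >= j:
            break
        A[i], A[j] = A[j], A[i]
        i += 1
        j -= 1
    A[lo], A[j] = A[j], A[lo]
    return j, comparisons

A = random.Random(7).sample(range(100_000), 1000)
pos, comps = hoare_partition(A, 0, len(A) - 1)
print(comps)   # n + O(1) comparison branches for n = 1000 elements
```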
Misprediction Rate for Memoryless Sources
Branches taken i. i. d. with probability p.
Information-theoretic lower bound:
  Miss rate: fOPT(p) = min{p, 1 − p}
  Can approach lower bound by estimating p:
  predict taken if p̂ ≥ 1/2, predict not taken if p̂ < 1/2.
But: Actual predictors have very little memory!
1-bit Predictor
  Wrong prediction whenever the value changes
  Miss rate: f1-bit(p) = 2p(1 − p)
  (states 1 “predict taken” and 2 “predict not taken”; move to state 1 with prob. p, to state 2 with prob. 1 − p)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 8 / 15
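The formula f1-bit(p) = 2p(1 − p) is easy to cross-check by simulation:

```python
import random

def f_1bit(p):
    """Steady-state miss rate of the 1-bit predictor on an i.i.d.
    Bernoulli(p) branch stream: a miss whenever the outcome changes."""
    return 2 * p * (1 - p)

def simulate_1bit(p, n, seed=42):
    rng = random.Random(seed)
    last = True                   # predictor state = last observed outcome
    misses = 0
    for _ in range(n):
        taken = rng.random() < p
        misses += taken != last   # the prediction was the previous outcome
        last = taken
    return misses / n

print(f_1bit(0.7))                  # ≈ 0.42
print(simulate_1bit(0.7, 200_000))  # close to 0.42
```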
Misprediction Rate for Memoryless Sources [2]
2-bit Saturating Counter
Miss rate? ... depends on state!
(states 1–4 as before; a taken branch, prob. p, moves one state towards 1; a not-taken branch, prob. 1 − p, one state towards 4)
But: Very fast convergence to steady state
  (plot: convergence from different initial state distributions, ≈ 20 iterations for p = 2/3)
⇝ use steady-state miss rate:
  expected miss rate over states in stationary distribution
here: f2-bit-sc(p) = q / (1 − 2q) with q = p(1 − p).
similarly for 2-bit Flip-Consecutive: f2-bit-fc(p) = q(1 + 2q) / (1 − q).
Sebastian Wild Branch Misses in Quicksort 2015-01-04 9 / 15
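Both steady-state formulas can be verified mechanically by power-iterating the predictors' Markov chains; the flip-consecutive transition structure below is my reconstruction of the slide's diagram:

```python
def steady_state_miss_rate(step, predicts_taken, p, iters=5000):
    """Steady-state miss rate of a finite-state branch predictor on an
    i.i.d. Bernoulli(p) stream: power-iterate the state distribution,
    then average the per-state miss probability."""
    k = len(predicts_taken)
    pi = [1.0 / k] * k
    for _ in range(iters):
        nxt = [0.0] * k
        for s in range(k):
            nxt[step(s, True)] += pi[s] * p          # outcome: taken
            nxt[step(s, False)] += pi[s] * (1 - p)   # outcome: not taken
        pi = nxt
    return sum(pi[s] * ((1 - p) if predicts_taken[s] else p) for s in range(k))

# 2-bit saturating counter: states 0,1 predict taken; 2,3 predict not taken
sc = lambda s, taken: max(s - 1, 0) if taken else min(s + 1, 3)

# 2-bit flip-consecutive: flip the prediction only after two consecutive misses
def fc(s, taken):
    if s <= 1:                       # currently predicting taken
        return 0 if taken else (1 if s == 0 else 3)
    else:                            # currently predicting not taken
        return 3 if not taken else (2 if s == 3 else 0)

p = 2 / 3
q = p * (1 - p)
print(steady_state_miss_rate(sc, [True, True, False, False], p))  # ≈ q/(1-2q) = 0.4
print(steady_state_miss_rate(fc, [True, True, False, False], p))  # ≈ q(1+2q)/(1-q)
```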
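The steady-state rate f_2-bit-sc(p) = q/(1 − 2q) follows from the counter's stationary distribution, which is geometric because the four states form a birth-death chain. A minimal sketch (state numbering is mine; assumes 0 < p < 1):

```python
def sc2_steady_miss_rate(p):
    """Stationary miss rate of a 2-bit saturating counter on i.i.d. branches.

    States 0,1 predict 'taken'; states 2,3 predict 'not taken'.
    A taken branch (prob. p) moves one state left, a not-taken branch one
    state right (saturating), so the stationary distribution is geometric
    with ratio r = (1-p)/p.
    """
    r = (1 - p) / p
    w = [1, r, r**2, r**3]            # unnormalized stationary weights
    total = sum(w)
    pi = [x / total for x in w]
    # miss: predicted taken but not taken, or predicted not taken but taken
    return (pi[0] + pi[1]) * (1 - p) + (pi[2] + pi[3]) * p

p = 0.7
q = p * (1 - p)
print(sc2_steady_miss_rate(p), q / (1 - 2 * q))   # the two values agree
```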
Distribution of Pivot Values

In (classic) Quicksort the branch probability is itself random;
expected miss rate: E[f(P)] (expectation over the pivot value P).
What is the distribution of P?
Without sampling: P ~ Uniform(0, 1).
Typical pivot choice: median of k (in practice k = 3), or the pseudomedian of 9 ("ninther").
Here: a more general scheme with parameter t = (t1, t2):
choose the pivot from a sample of k = t1 + t2 + 1 elements so that t1 sample elements are smaller and t2 are larger.
Example: k = 6 and t = (3, 2).
t = (0, 0): no sampling
t = (t, t): gives median-of-(2t + 1)
Can also sample skewed pivots.
Distribution of the pivot value: P ~ Beta(t1 + 1, t2 + 1)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 10 / 15
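The Beta(t1 + 1, t2 + 1) law can be sanity-checked by sampling: the pivot is the (t1 + 1)-st smallest of k = t1 + t2 + 1 uniforms, so its mean should be the Beta mean (t1 + 1)/(k + 1). A small sketch (helper names are mine):

```python
import random

def sample_pivot(t1, t2, rng):
    """Pivot from a sample of k = t1 + t2 + 1 uniforms:
    the (t1+1)-st smallest, so t1 sample elements lie below it."""
    k = t1 + t2 + 1
    sample = sorted(rng.random() for _ in range(k))
    return sample[t1]

rng = random.Random(7)
t1, t2 = 3, 2
pivots = [sample_pivot(t1, t2, rng) for _ in range(100_000)]
mean = sum(pivots) / len(pivots)
print(mean)   # close to (t1+1)/(t1+t2+2) = 4/7, the Beta(4, 3) mean
```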
Miss Rates for Quicksort Branch

Expected miss rate given by an integral:
E[f(P)] = ∫₀¹ f(p) · p^t1 (1 − p)^t2 / B(t + 1) dp

e.g. for the 1-bit predictor:
E[f_1-bit(P)] = ∫₀¹ 2p(1 − p) · p^t1 (1 − p)^t2 / B(t + 1) dp = 2 (t1 + 1)(t2 + 1) / ((k + 2)(k + 1))

No concise representation for the other integrals ... (see paper),
but: exact values for fixed t.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 11 / 15
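The closed form for the 1-bit predictor can be cross-checked by Monte Carlo over the pivot distribution, sampling P as an order statistic of uniforms as above. A sketch (helper names are mine):

```python
import random

def expected_1bit_miss(t1, t2, n=200_000, seed=3):
    """Monte Carlo estimate of E[2P(1-P)] for P ~ Beta(t1+1, t2+1),
    sampling P as the (t1+1)-st smallest of k = t1+t2+1 uniforms."""
    rng = random.Random(seed)
    k = t1 + t2 + 1
    total = 0.0
    for _ in range(n):
        p = sorted(rng.random() for _ in range(k))[t1]
        total += 2 * p * (1 - p)
    return total / n

t1, t2 = 3, 2
k = t1 + t2 + 1
closed = 2 * (t1 + 1) * (t2 + 1) / ((k + 2) * (k + 1))   # = 3/7 ≈ 0.4286
print(expected_1bit_miss(t1, t2), closed)   # the two values agree closely
```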
Miss Rate and Branch Misses

Miss rate for CQS with median-of-(2t+1):
[plot: miss rate vs. t for OPT, 1-bit, 2-bit sc, and 2-bit fc; all curves rise from ≈ 0.3 toward the limit 0.5]
Miss rates quickly get bad (close to guessing!),
but: fewer comparisons in total!
[plot: comparison coefficient of n ln n + O(n) vs. t, falling from 2 toward 1/ln 2 ≈ 1.44]

Consider the number of branch misses:
#BM = #comparisons · miss rate
Overall #BM still grows with t.
[plot: #BM coefficient of n ln n + O(n) vs. t, growing toward 0.5/ln 2 ≈ 0.72]
Sebastian Wild Branch Misses in Quicksort 2015-01-04 12 / 15
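Setting t1 = t2 = t in the closed form above gives the per-comparison 1-bit miss rate 2(t+1)²/((2t+3)(2t+2)); multiplying by the comparison coefficient reproduces the growth of the #BM coefficient. A sketch, assuming the textbook comparison coefficient 1/(H_{2t+2} − H_{t+1}) for median-of-(2t+1) Quicksort (not stated on the slide):

```python
from fractions import Fraction

def harmonic(n):
    """Exact harmonic number H_n."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

def bm_coefficient_1bit(t):
    """Coefficient of n ln n in E[#BM] for classic Quicksort with
    median-of-(2t+1) and a 1-bit predictor: comparison coefficient
    1/(H_{2t+2} - H_{t+1}) times the per-comparison miss rate
    2(t+1)^2 / ((2t+3)(2t+2)) (closed form with t1 = t2 = t)."""
    cmp_coeff = 1 / float(harmonic(2 * t + 2) - harmonic(t + 1))
    miss_rate = 2 * (t + 1) ** 2 / ((2 * t + 3) * (2 * t + 2))
    return cmp_coeff * miss_rate

for t in range(5):
    print(t, round(bm_coefficient_1bit(t), 3))
# t = 0 gives 2 * 1/3 ≈ 0.667, and the product grows with t
# toward the asymptote 0.5/ln 2 ≈ 0.721
```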
Branch Misses in YQS

Original question: Does YQS do better than CQS w.r.t. branch misses?
Complication for the analysis:
4 branch locations; how often they are executed depends on the input.
[diagram: Yaroslavskiy's dual-pivot partitioning with its four comparison branches ("< P?", "< Q?") and swap/skip actions at pointers k and g; element classes: < P, P ≤ ◦ ≤ Q, ≥ Q]
Example: C(y1) is executed (D1 + D2)·n + O(1) times (in expectation, conditional on D);
the branch is taken i.i.d. with prob. D1 (conditional on D).
Expected #BM at C(y1) in the first partitioning step:
E[(D1 + D2) · f(D1)] · n + O(1)
The integrals are even more "fun" ... but doable.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 13 / 15
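The per-step quantity E[(D1 + D2) · f(D1)] is easy to estimate by Monte Carlo: without pivot sampling, (D1, D2, D3) are the spacings induced by two i.i.d. Uniform(0, 1) pivots. A sketch for the 1-bit curve (the 7/30 reference value is my own direct integration, not from the slides):

```python
import random

def expected_bm_per_element(f, n=200_000, seed=5):
    """Monte Carlo estimate of E[(D1 + D2) * f(D1)] where (D1, D2, D3)
    are the spacings of two i.i.d. Uniform(0,1) pivots (no sampling)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u, v = sorted((rng.random(), rng.random()))
        d1, d2 = u, v - u          # D1 = small class, D2 = medium class
        total += (d1 + d2) * f(d1)
    return total / n

f_1bit = lambda p: 2 * p * (1 - p)
print(expected_bm_per_element(f_1bit))   # ≈ 7/30 ≈ 0.233 (direct integration)
```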
Results CQS vs. YQS

Original question: Does YQS do better than CQS w.r.t. branch misses?
Expected number of branch misses (coefficients of n ln n + O(n)):

Without pivot sampling:
            CQS     YQS     relative
OPT         0.5     0.513   +2.6%
1-bit       0.667   0.673   +1.0%
2-bit sc    0.571   0.585   +2.5%
2-bit fc    0.589   0.602   +2.2%

CQS median-of-3 vs. YQS tertiles-of-5:
            CQS     YQS     relative
OPT         0.536   0.538   +0.4%
1-bit       0.686   0.687   +0.1%
2-bit sc    0.611   0.613   +0.3%
2-bit fc    0.627   0.629   +0.3%

Essentially the same number of BM.
Branch misses are not a plausible explanation for YQS's success.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 14 / 15
Conclusion

Precise analysis of branch misses in Quicksort (CQS and YQS),
including pivot sampling.
Lower bounds on branch miss rates.
CQS and YQS cause a very similar number of BM.
Strengthened evidence for the hypothesis that
YQS is faster because of better usage of the memory hierarchy.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 15 / 15
Miss Rate for Branches in Quicksort
without sampling: P ∼ Uniform(0, 1)

E[f_OPT(P)]      = ∫₀¹ min{p, 1 − p} dp = 0.25
E[f_1-bit(P)]    = ∫₀¹ 2p(1 − p) dp = 1/3 ≈ 0.333
E[f_2-bit-sc(P)] = ∫₀¹ p(1 − p) / (1 − 2p(1 − p)) dp = π/4 − 1/2 ≈ 0.285
E[f_2-bit-fc(P)] = ∫₀¹ (2p²(1 − p)² + p(1 − p)) / (1 − 2p(1 − p)) dp = 2π/√3 − 10/3 ≈ 0.294

Sebastian Wild Branch Misses in Quicksort 2015-01-04 16 / 15

Analysis of branch misses in Quicksort

  • 1. Analysis of Branch Misses in Quicksort Sebastian Wild wild@cs.uni-kl.de based on joint work with Conrado Martínez and Markus E. Nebel 04 January 2015 Meeting on Analytic Algorithmics and Combinatorics Sebastian Wild Branch Misses in Quicksort 2015-01-04 1 / 15
  • 2. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 3. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 4. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 5. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 6. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 7. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 8. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 9. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 10. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 11. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 12. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 13. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 14. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 15. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 16. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 17. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 18. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 19. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 25. Why Should We Care?
    Misprediction rates of “typical” programs are < 10 %.
    (Comparison-based) sorting is different!
        Branches are taken based on comparison results.
        Comparisons reduce entropy (uncertainty about the input).
        The fewer comparisons we use, the less predictable they become:
        for classic Quicksort the misprediction rate is 25 %,
        with median-of-3 pivots even 31.25 %.
    Practical importance (KALIGOSI & SANDERS, ESA 2006):
        on a Pentium 4 Prescott, a very skewed pivot was faster than the median;
        branch misses dominated the running time.
    Sebastian Wild   Branch Misses in Quicksort   2015-01-04   4 / 15
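The 25 % and 31.25 % figures match the information-theoretic bound fOPT(D) = min{D, 1 − D} averaged over the pivot's relative rank D (uniform density for a random pivot, density 6D(1 − D) for median-of-3). A quick numerical sanity check, my own sketch rather than part of the talk:

```python
# Average min(D, 1-D) over the pivot rank's density via the midpoint rule.
def avg_opt_rate(density, steps=100_000):
    total = 0.0
    for i in range(steps):
        d = (i + 0.5) / steps               # midpoint of the i-th subinterval
        total += min(d, 1 - d) * density(d)
    return total / steps

uniform = avg_opt_rate(lambda d: 1.0)               # random pivot
med3 = avg_opt_rate(lambda d: 6 * d * (1 - d))      # median-of-3 pivot
print(uniform, med3)   # 0.25 and 0.3125 (up to tiny integration error)
```

So even a predictor with unlimited memory cannot get below these rates; the skewed-pivot effect observed by Kaligosi and Sanders follows the same trade-off.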
  • 29. Track Record of Dual-Pivot Quicksort
    Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS),
    faster in practice than the previously used classic Quicksort (CQS),
    yet traditional cost measures do not explain this!
                                              CQS    YQS    relative
    Running time (from various experiments)                 −10 ± 2 %
    Comparisons                               2      1.9    −5 %
    Swaps                                     0.3    0.6    +80 %
    Bytecode instructions                     18     21.7   +20.6 %
    MMIX oops υ                               11     13.1   +19.1 %
    MMIX mems µ                               2.6    2.8    +5 %
    Scanned elements¹ (≈ cache misses)        2      1.6    −20 %
    (all entries · n ln n + O(n), average-case results)
    What about branch misses? Can they explain YQS’s success? ... stay tuned.
    ¹ KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014
    Sebastian Wild   Branch Misses in Quicksort   2015-01-04   5 / 15
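One of the table's columns can be reproduced empirically. The sketch below (my own, using first-element pivots and an out-of-place formulation for clarity) counts key comparisons of classic Quicksort and compares the empirical mean with the exact average 2(n + 1)H_n − 4n, whose leading term is the table's 2 · n ln n:

```python
import math
import random

def quicksort(A, cnt):
    # Classic Quicksort with the first element as pivot; cnt[0] counts
    # key comparisons (one per non-pivot element per partitioning step).
    if len(A) <= 1:
        return A
    p, rest = A[0], A[1:]
    cnt[0] += len(rest)
    small = [x for x in rest if x < p]
    large = [x for x in rest if x > p]   # elements are distinct almost surely
    return quicksort(small, cnt) + [p] + quicksort(large, cnt)

random.seed(42)
n, trials, total = 4000, 50, 0
for _ in range(trials):
    cnt = [0]
    A = [random.random() for _ in range(n)]
    assert quicksort(A, cnt) == sorted(A)
    total += cnt[0]
avg = total / trials

H = sum(1 / k for k in range(1, n + 1))      # harmonic number H_n
exact = 2 * (n + 1) * H - 4 * n              # exact average comparison count
print(avg, exact)
```

The empirical mean lands within a few percent of the exact average. Reproducing the YQS column would additionally require Yaroslavskiy's asymmetric partitioning loop, omitted here for brevity.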
  • 44. Random Model
    n i. i. d. elements chosen uniformly in [0, 1]:
        pairwise distinct almost surely;
        the relative ranking is a random permutation, equivalent to the classic model.
    Consider the pivot value P fixed:
        Pr[U < P] = P     = D1
        Pr[U > P] = 1 − P = D2
    Similarly for dual-pivot Quicksort with pivots P ≤ Q:
        Pr[U < P]     = D1
        Pr[P < U < Q] = D2
        Pr[U > Q]     = D3
    These probabilities hold for all elements U, independent of all other elements!
    Sebastian Wild   Branch Misses in Quicksort   2015-01-04   6 / 15
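The fixed-pivot probabilities are easy to check empirically (a trivial simulation of my own, assuming U ~ Uniform(0, 1)):

```python
import random

# With the pivot value P fixed, Pr[U < P] = P, independently per element.
random.seed(1)
P = 0.7
n = 200_000
hits = sum(random.random() < P for _ in range(n))
print(hits / n)   # close to P = 0.7
```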
  • 50. Branches in CQS
    How many branches occur in the first partitioning step of CQS?
    Consider the pivot value P fixed, i.e. D = (D1, D2) = (P, 1 − P) fixed.
        One comparison branch per element U:
        U < P: left partition, U > P: right partition.
        This branch is taken with probability P, i. i. d. for all elements U:
        a memoryless source.
        Other branches (loop logic etc.) are easy to predict;
        they contribute only a constant number of mispredictions per partitioning
        step and can be ignored (for leading-term asymptotics).
    Sebastian Wild   Branch Misses in Quicksort   2015-01-04   7 / 15
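That the comparison branch is a memoryless source can also be checked empirically: if the outcomes b_i = [U_i < P] are i. i. d. Bernoulli(P), then consecutive outcomes must satisfy Pr[b_i = 1 and b_{i+1} = 1] = P². A small simulation of my own:

```python
import random

# Consecutive comparison-branch outcomes within one partitioning step
# should be independent: joint "taken, taken" frequency close to P*P.
random.seed(7)
P, n = 0.3, 400_000
b = [random.random() < P for _ in range(n)]
both = sum(b[i] and b[i + 1] for i in range(n - 1)) / (n - 1)
print(both)   # close to P*P = 0.09
```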
  • 52. Misprediction Rate for Memoryless Sources
    Branches are taken i. i. d. with probability p.
    Information-theoretic lower bound: always predict the more likely outcome.
        Miss rate: fOPT(p) = min{p, 1 − p}.
        One can approach this bound by estimating p:
        predict taken if p̂ ≥ 1/2, not taken if p̂ < 1/2.
    But: actual predictors have very little memory!
    1-bit predictor: wrong prediction whenever the outcome changes.
        Miss rate: f1bit(p) = 2p(1 − p).
    Sebastian Wild   Branch Misses in Quicksort   2015-01-04   8 / 15
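The formula f1bit(p) = 2p(1 − p) is easy to confirm by simulation (my own sketch):

```python
import random

# Simulate the 1-bit predictor on an i.i.d. branch stream and compare
# its empirical miss rate with f1bit(p) = 2p(1-p).
random.seed(3)
p, n = 0.3, 200_000
pred, misses = True, 0
for _ in range(n):
    taken = random.random() < p
    misses += (pred != taken)
    pred = taken                     # always predict the last outcome
print(misses / n)                    # formula: 2 * 0.3 * 0.7 = 0.42
```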
  • 60. Misprediction Rate for Memoryless Sources [2]
    2-bit saturating counter: the miss rate depends on the current state!
    But: very fast convergence to the steady state;
    different initial state distributions agree after about 20 iterations for p = 2/3.
    So we use the steady-state miss rate: the expected miss rate over the states,
    taken in the stationary distribution of the predictor’s Markov chain.
    Here: f2-bit-sc(p) = q / (1 − 2q)   with q = p(1 − p);
    similarly for the 2-bit flip-consecutive predictor:
    f2-bit-fc(p) = q(1 + 2q) / (1 − q).
    Sebastian Wild   Branch Misses in Quicksort   2015-01-04   9 / 15
  • 85. Misprediction Rate for Memoryless Sources [2] 2-bit Saturating Counter Miss rate? ... depends on state! 1 2 3 4 predict taken predict not taken p 1 − p 1 − p 1 − p 1 − p ppp But: Very fast convergence to steady state different initial state distributions 20 iterations for p = 2 3 use steady-state miss-rate: expected miss rate over states in stationary distribution here: f2-bit-sc(p) = q 1 − 2q with q = p(1 − p). similarly for 2-bit Flip-Consecutive f2-bit-fc(p) = q(1 + 2q) 1 − q . Sebastian Wild Branch Misses in Quicksort 2015-01-04 9 / 15
Misprediction Rate for Memoryless Sources [2]
2-bit Saturating Counter
Miss rate? ... depends on the state!
[state diagram: states 1–4; states 1–2 predict "taken", states 3–4 predict "not taken";
a taken branch (probability p) moves one state towards "taken", a not-taken branch
(probability 1 − p) one state towards "not taken"]
But: very fast convergence to the steady state
(different initial state distributions agree after ~20 iterations for p = 2/3)
use the steady-state miss rate: the expected miss rate over the states
in the stationary distribution
here: f_2-bit-sc(p) = q / (1 − 2q) with q = p(1 − p)
similarly for the 2-bit Flip-Consecutive predictor:
f_2-bit-fc(p) = q(1 + 2q) / (1 − q)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 9 / 15
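As a sanity check (not part of the talk), a short simulation of the state diagram above: running a 2-bit saturating counter on i.i.d. Bernoulli(p) branches, the empirical miss rate should approach the steady-state value q/(1 − 2q). The state encoding (0–1 predict "taken", 2–3 predict "not taken") is one concrete choice consistent with the slide.

```python
import random

def two_bit_sc_miss_rate(p, n=200_000, seed=1):
    """Empirical miss rate of a 2-bit saturating counter on i.i.d. Bernoulli(p) branches."""
    rng = random.Random(seed)
    state = 0            # 0,1 predict "taken"; 2,3 predict "not taken"
    misses = 0
    for _ in range(n):
        taken = rng.random() < p
        predicted_taken = state <= 1
        if predicted_taken != taken:
            misses += 1
        # saturating update: step towards the "taken" end on a taken branch,
        # towards the "not taken" end otherwise
        state = max(state - 1, 0) if taken else min(state + 1, 3)
    return misses / n

def f_2bit_sc(p):
    """Steady-state miss rate from the slide: q/(1 - 2q) with q = p(1 - p)."""
    q = p * (1 - p)
    return q / (1 - 2 * q)

p = 2 / 3
print(two_bit_sc_miss_rate(p), f_2bit_sc(p))   # both ≈ 0.4 for p = 2/3
```

For p = 2/3 the formula gives q = 2/9 and a miss rate of (2/9)/(5/9) = 0.4, which the simulation reproduces closely.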
Distribution of Pivot Values
In (classic) Quicksort the branch probability is random
expected miss rate: E[f(P)] (expectation over pivot values P)
What is the distribution of P?
without sampling: P ~ Uniform(0, 1)
Typical pivot choice: median of k (in practice: k = 3)
or pseudomedian of 9 ("ninther")
Here: a more general scheme with parameter t = (t1, t2):
the pivot is chosen from a sample of k = t1 + t2 + 1 elements so that
t1 sample elements lie below it and t2 above it
Example: k = 6 and t = (3, 2)
t = (0, 0): no sampling
t = (t, t): gives median-of-(2t + 1)
skewed pivots can also be sampled
Distribution of the pivot value: P ~ Beta(t1 + 1, t2 + 1)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 10 / 15
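A quick empirical check of the Beta claim (a sketch, not from the talk): drawing the (t1 + 1)-st smallest of k = t1 + t2 + 1 i.i.d. uniforms and comparing the sample mean to the Beta(t1 + 1, t2 + 1) mean (t1 + 1)/(k + 1).

```python
import random

def sample_pivot(t1, t2, rng):
    """Pivot under the t = (t1, t2) scheme: the (t1+1)-st smallest
    of k = t1 + t2 + 1 i.i.d. Uniform(0, 1) values."""
    k = t1 + t2 + 1
    sample = sorted(rng.random() for _ in range(k))
    return sample[t1]

rng = random.Random(42)
t1, t2 = 3, 2                      # the slide's k = 6 example
pivots = [sample_pivot(t1, t2, rng) for _ in range(100_000)]
mean = sum(pivots) / len(pivots)
# Beta(t1+1, t2+1) = Beta(4, 3) has mean (t1+1)/(k+1) = 4/7 ≈ 0.571
print(mean)
```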
Miss Rates for the Quicksort Branch
expected miss rate given by the integral
E[f(P)] = ∫_0^1 f(p) · p^t1 (1 − p)^t2 / B(t + 1) dp
e.g. for the 1-bit predictor:
E[f_1-bit(P)] = ∫_0^1 2p(1 − p) · p^t1 (1 − p)^t2 / B(t + 1) dp
             = 2 (t1 + 1)(t2 + 1) / ((k + 2)(k + 1))
no concise representation for the other integrals ... (see paper)
but: exact values for fixed t
Sebastian Wild Branch Misses in Quicksort 2015-01-04 11 / 15
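The closed form for the 1-bit predictor can be verified numerically (a sketch, assuming B(t + 1) = t1! t2! / k! for the normalizing Beta function):

```python
from math import factorial

def expected_1bit_miss_rate(t1, t2, steps=100_000):
    """Midpoint-rule approximation of
    E[f_1bit(P)] = ∫_0^1 2p(1-p) · p^t1 (1-p)^t2 / B(t1+1, t2+1) dp."""
    k = t1 + t2 + 1
    beta = factorial(t1) * factorial(t2) / factorial(k)   # B(t1+1, t2+1)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        total += 2 * p * (1 - p) * p**t1 * (1 - p)**t2 / beta * h
    return total

def closed_form(t1, t2):
    """The slide's result: 2 (t1+1)(t2+1) / ((k+2)(k+1))."""
    k = t1 + t2 + 1
    return 2 * (t1 + 1) * (t2 + 1) / ((k + 2) * (k + 1))

for t in [(0, 0), (1, 1), (3, 2)]:
    print(t, expected_1bit_miss_rate(*t), closed_form(*t))
```

For t = (0, 0) both give 1/3, the no-sampling value from the appendix slide.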
Miss Rate and Branch Misses
Miss rate for CQS with median-of-(2t + 1):
[plot: miss rate vs. t for OPT, 1-bit, 2-bit sc, 2-bit fc; all curves rise towards 0.5]
miss rates quickly get bad (close to guessing!)
but: fewer comparisons in total
[plot: #cmps vs. t, falling towards 1/ln 2 · n ln n + O(n)]
Consider the number of branch misses: #BM = #comparisons · miss rate
Overall #BM still grows with t.
[plot: #BM vs. t, growing towards 0.5/ln 2 · n ln n + O(n)]
Sebastian Wild Branch Misses in Quicksort 2015-01-04 12 / 15
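The trade-off can be tabulated directly for the 1-bit predictor (a sketch; it uses the standard median-of-(2t + 1) comparison coefficient 1/(H_{2t+2} − H_{t+1}), which is not spelled out on the slide, and the miss rate (t + 1)/(2t + 3) obtained from the previous slide's formula with t1 = t2 = t):

```python
def harmonic(n):
    return sum(1 / i for i in range(1, n + 1))

def cmps_coeff(t):
    """Coefficient of n ln n in E[#comparisons] for median-of-(2t+1) Quicksort."""
    return 1 / (harmonic(2 * t + 2) - harmonic(t + 1))

def miss_rate_1bit(t):
    """E[f_1bit(P)] for P ~ Beta(t+1, t+1):
    2(t+1)^2 / ((2t+3)(2t+2)) = (t+1)/(2t+3)."""
    return (t + 1) / (2 * t + 3)

print(" t   #cmps  miss   #BM")
for t in range(9):
    bm = cmps_coeff(t) * miss_rate_1bit(t)
    print(f"{t:2d}  {cmps_coeff(t):.3f}  {miss_rate_1bit(t):.3f}  {bm:.3f}")
```

Comparisons fall towards 1/ln 2 ≈ 1.443 and the miss rate climbs towards 1/2, but their product (the branch-miss coefficient) still grows with t, e.g. from 2/3 at t = 0 to 24/35 ≈ 0.686 at t = 1, matching the plot.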
Branch Misses in YQS
Original question: does YQS do better than CQS w.r.t. branch misses?
Complication for the analysis: 4 branch locations,
and how often each is executed depends on the input
[diagram: Yaroslavskiy's partitioning with pointers k and g; comparisons
"< P ?" and "< Q ?" route elements into the classes ≤ P, P ≤ ◦ ≤ Q, ≥ Q
via swap/skip]
Example: C(y1) is executed (D1 + D2) n + O(1) times
(in expectation, conditional on D)
the branch is taken i.i.d. with probability D1 (conditional on D)
expected #BM at C(y1) in the first partitioning step:
E[(D1 + D2) · f(D1)] · n + O(1)
The integrals are even more "fun" ... but doable
Sebastian Wild Branch Misses in Quicksort 2015-01-04 13 / 15
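To make the conditional expectation concrete, here is a Monte Carlo estimate of E[(D1 + D2) · f(D1)] for an illustrative special case not spelled out on the slide: the 1-bit predictor f(p) = 2p(1 − p) and no pivot sampling, so D = (D1, D2, D3) are the spacings of two i.i.d. uniforms (Dirichlet(1, 1, 1)). Under these assumptions the integral evaluates to 7/30 ≈ 0.233.

```python
import random

rng = random.Random(7)

def f_1bit(p):
    """i.i.d. miss rate of a 1-bit predictor with branch probability p."""
    return 2 * p * (1 - p)

n = 200_000
acc = 0.0
for _ in range(n):
    u, v = sorted((rng.random(), rng.random()))
    d1, d2 = u, v - u        # spacings of two uniforms: class probabilities D1, D2
    acc += (d1 + d2) * f_1bit(d1)
mc = acc / n
print(mc)                    # ≈ 7/30 ≈ 0.2333 under the stated assumptions
```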
Results: CQS vs. YQS
Original question: does YQS do better than CQS w.r.t. branch misses?

Expected number of branch misses, coefficient of n ln n (+ O(n)):

without pivot sampling:
              CQS      YQS      relative
  OPT         0.5      0.513    +2.6%
  1-bit       0.667    0.673    +1.0%
  2-bit sc    0.571    0.585    +2.5%
  2-bit fc    0.589    0.602    +2.2%

CQS median-of-3 vs. YQS tertiles-of-5:
              CQS      YQS      relative
  OPT         0.536    0.538    +0.4%
  1-bit       0.686    0.687    +0.1%
  2-bit sc    0.611    0.613    +0.3%
  2-bit fc    0.627    0.629    +0.3%

essentially the same number of BM.
Branch misses are not a plausible explanation for YQS's success.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 14 / 15
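The CQS column of the first table can be reproduced from the slides' ingredients alone: without sampling CQS makes 2 n ln n + O(n) comparisons with pivot P ~ Uniform(0, 1), so each coefficient is 2 · ∫_0^1 f(p) dp for the respective predictor (numeric quadrature sketch):

```python
def integrate(f, steps=100_000):
    """Midpoint-rule quadrature of f over [0, 1]."""
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

q = lambda p: p * (1 - p)
predictors = {
    "OPT":      lambda p: min(p, 1 - p),
    "1-bit":    lambda p: 2 * q(p),
    "2-bit sc": lambda p: q(p) / (1 - 2 * q(p)),
    "2-bit fc": lambda p: q(p) * (1 + 2 * q(p)) / (1 - q(p)),
}
# CQS without sampling: 2 n ln n comparisons, pivot P ~ Uniform(0, 1)
for name, f in predictors.items():
    print(name, round(2 * integrate(f), 3))
# → OPT 0.5, 1-bit 0.667, 2-bit sc 0.571, 2-bit fc 0.589
```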
Conclusion
Precise analysis of branch misses in Quicksort (CQS and YQS),
including pivot sampling
lower bounds on branch-miss rates
CQS and YQS cause a very similar number of BM
Strengthened evidence for the hypothesis that YQS is faster because of
better usage of the memory hierarchy.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 15 / 15
Miss Rate for Branches in Quicksort
without sampling: P ~ Uniform(0, 1)
E[f_OPT(P)]      = ∫_0^1 min{p, 1 − p} dp = 1/4
E[f_1-bit(P)]    = ∫_0^1 2p(1 − p) dp = 1/3
E[f_2-bit-sc(P)] = ∫_0^1 p(1 − p) / (1 − 2p(1 − p)) dp = π/4 − 1/2 ≈ 0.285
E[f_2-bit-fc(P)] = ∫_0^1 (2p²(1 − p)² + p(1 − p)) / (1 − p(1 − p)) dp
                 = 2π/√3 − 10/3 ≈ 0.294
Sebastian Wild Branch Misses in Quicksort 2015-01-04 16 / 15