Analysis of Branch Misses in Quicksort
Sebastian Wild
wild@cs.uni-kl.de
based on joint work with Conrado Martínez and Markus E. Nebel
04 January 2015
Meeting on Analytic Algorithmics and Combinatorics
Sebastian Wild Branch Misses in Quicksort 2015-01-04 1 / 15
Instruction Pipelines
Computers do not execute instructions fully sequentially;
instead they use an “assembly line” (instruction pipeline).
Example (instruction addresses on the left):
  41  ...
  42  i := i + 1
  43  a := A[i]
  44  IF a < p GOTO 42
  45  j := j - 1
  46  a := A[j]
  47  IF a > p GOTO 45
  48  ...
each instruction is broken into 4 stages
simpler steps ⇒ shorter CPU cycles
⇒ one instruction finished per cycle ...
... except for branches!
  1 undo the wrongly started instructions
  2 fill the pipeline anew
Pipeline stalls are costly ... can we avoid (some of) them?
Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
Branch Prediction
We could avoid stalls if we knew whether a branch will be taken or not.
In general this is not possible ⇒ prediction with heuristics:
Predict the same outcome as last time (1-bit predictor).
  [state diagram: state 1 “predict taken”, state 2 “predict not taken”; a taken branch moves to state 1, a not-taken branch to state 2]
Predict the most frequent outcome with finite memory (2-bit saturating counter).
  [state diagram: states 1–4; states 1–2 predict taken, states 3–4 predict not taken; a taken branch moves one state towards 1, a not-taken branch one state towards 4, saturating at the ends]
Flip the prediction only after two consecutive errors (2-bit flip-consecutive).
  [state diagram: states 1–4; states 1–2 predict taken, states 3–4 predict not taken; the prediction flips only after two consecutive mispredictions]
Wilder heuristics exist out there ... not considered here.
Prediction can be wrong ⇒ branch miss (BM)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
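To make these heuristics concrete, here is a small simulation sketch of my own (not part of the talk): it runs the 1-bit predictor and the 2-bit saturating counter on an i.i.d. stream of branch outcomes and measures their empirical miss rates; the state layout follows the diagrams above, and the flip-consecutive variant is checked later via its stationary distribution.

import random

def simulate_1bit(outcomes):
    # 1-bit predictor: always predict the previous outcome
    pred, misses = True, 0
    for taken in outcomes:
        misses += (pred != taken)
        pred = taken
    return misses / len(outcomes)

def simulate_2bit_sc(outcomes):
    # 2-bit saturating counter: states 0,1 predict taken; states 2,3 predict not taken
    state, misses = 0, 0
    for taken in outcomes:
        misses += ((state <= 1) != taken)
        state = max(state - 1, 0) if taken else min(state + 1, 3)
    return misses / len(outcomes)

random.seed(0)
p = 0.75                                       # probability that the branch is taken
outcomes = [random.random() < p for _ in range(1_000_000)]

print("lower bound min{p, 1-p} :", min(p, 1 - p))
print("1-bit predictor         :", simulate_1bit(outcomes))
print("2-bit saturating counter:", simulate_2bit_sc(outcomes))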
Why Should We Care?
misprediction rates of “typical” programs are < 10 %
(Comparison-based) sorting is different!
  branches are based on comparison results
  comparisons reduce entropy (uncertainty about the input)
  the fewer comparisons we use, the less predictable they become
  for classic Quicksort: misprediction rate 25 %
  with median-of-3: 31.25 %
Practical importance (KALIGOSI & SANDERS, ESA 2006):
  on a Pentium 4 Prescott, a very skewed pivot was faster than the median
  branch misses dominated the running time
Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
Track Record of Dual-Pivot Quicksort
Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS)
faster than the previously used classic Quicksort (CQS) in practice
traditional cost measures do not explain this!
                                              CQS     YQS     relative
  Running time (from various experiments)                     −10 ± 2 %
  Comparisons                                 2       1.9     −5 %
  Swaps                                       0.3     0.6     +80 %
  Bytecode instructions                       18      21.7    +20.6 %
  MMIX oops υ                                 11      13.1    +19.1 %
  MMIX mems µ                                 2.6     2.8     +5 %
  Scanned elements¹ (≈ cache misses)          2       1.6     −20 %
  (all counts · n ln n + O(n), average-case results)
What about branch misses? Can they explain YQS’s success? ... stay tuned.
¹ KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014
Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
Random Model
n i.i.d. elements chosen uniformly in [0, 1]
  [figure: unit interval with sample points U1, ..., U8]
pairwise distinct almost surely
relative ranking is a random permutation
⇒ equivalent to the classic random-permutation model
Consider the pivot value P fixed:
  Pr[U < P] = P = D1
  Pr[U > P] = 1 − P = D2
  [figure: unit interval split at P into segments of lengths D1 and D2]
Similarly for dual-pivot Quicksort with pivots P ≤ Q:
  Pr[U < P] = D1
  Pr[P < U < Q] = D2
  Pr[U > Q] = D3
  [figure: unit interval split at P and Q into segments of lengths D1, D2, D3]
These probabilities hold for every element U, independently of all other elements!
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
Branches in CQS
How many branches are there in the first partitioning step of CQS?
Consider the pivot value P fixed, i.e., D = (D1, D2) = (P, 1 − P) fixed.
one comparison branch per element U:
  U < P ⇒ left partition
  U > P ⇒ right partition
  branch taken with prob. P, i.i.d. for all elements U ⇒ memoryless source
other branches (loop logic etc.):
  easy to predict, only a constant number of mispredictions
  ⇒ can be ignored (for leading-term asymptotics)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
Misprediction Rate for Memoryless Sources
Branches taken i.i.d. with probability p.
Information-theoretic lower bound:
  miss rate: fOPT(p) = min{p, 1 − p}
  can approach the lower bound by estimating p:
  p̂ ≥ 1/2 ⇒ predict taken,  p̂ < 1/2 ⇒ predict not taken
But: actual predictors have very little memory!
1-bit Predictor
  wrong prediction whenever the outcome changes
  miss rate: f1-bit(p) = 2p(1 − p)
  [state diagram: state 1 “predict taken”, state 2 “predict not taken”; taken (prob. p) moves to state 1, not taken (prob. 1 − p) to state 2]
Sebastian Wild Branch Misses in Quicksort 2015-01-04 8 / 15
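The 2p(1 − p) follows from a one-line calculation (a sanity check, not spelled out on the slide): the 1-bit predictor always predicts the previous outcome, so it misses exactly when two consecutive i.i.d. outcomes differ:

  f1-bit(p) = Pr[taken now, not taken before] + Pr[not taken now, taken before]
            = p(1 − p) + (1 − p)p = 2p(1 − p).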
Misprediction Rate for Memoryless Sources [2]
2-bit Saturating Counter
Miss rate? ... depends on the state!
  [state diagram: states 1–4; states 1–2 predict taken, states 3–4 predict not taken; a taken branch (prob. p) moves one state towards 1, a not-taken branch (prob. 1 − p) one state towards 4]
But: very fast convergence to the steady state
  [plot: expected miss rate over ~20 iterations for p = 2/3, started from different initial state distributions]
⇒ use the steady-state miss rate:
  the expected miss rate over the states in the stationary distribution,
  here: f2-bit-sc(p) = q / (1 − 2q)  with q = p(1 − p),
  similarly for the 2-bit Flip-Consecutive predictor: f2-bit-fc(p) = q(1 + 2q) / (1 − q).
Sebastian Wild Branch Misses in Quicksort 2015-01-04 9 / 15
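As a cross-check of both closed forms, here is a small numerical sketch of my own (not from the talk): it builds the two four-state Markov chains with the transition structure read off the state diagrams above, computes their stationary distributions by power iteration, and compares the resulting steady-state miss rates with q/(1 − 2q) and q(1 + 2q)/(1 − q).

import numpy as np

def stationary(T, iters=5000):
    # power iteration: repeatedly multiply a probability row vector by the
    # row-stochastic transition matrix T until it settles on the stationary distribution
    pi = np.full(T.shape[0], 1.0 / T.shape[0])
    for _ in range(iters):
        pi = pi @ T
    return pi

def sc_chain(p):
    # 2-bit saturating counter: states 0,1 predict taken; states 2,3 predict not taken
    T = np.zeros((4, 4))
    for i in range(4):
        T[i, max(i - 1, 0)] += p        # taken: move towards state 0
        T[i, min(i + 1, 3)] += 1 - p    # not taken: move towards state 3
    return T

def fc_chain(p):
    # 2-bit flip-consecutive: the prediction flips only after two consecutive misses
    T = np.zeros((4, 4))
    T[0, 0], T[0, 1] = p, 1 - p         # predict taken, confident
    T[1, 0], T[1, 2] = p, 1 - p         # predict taken, one recent miss
    T[2, 3], T[2, 2] = p, 1 - p         # predict not taken, confident
    T[3, 0], T[3, 2] = p, 1 - p         # predict not taken, one recent miss
    return T

def miss_rate(T, p):
    pi = stationary(T)
    per_state = np.array([1 - p, 1 - p, p, p])   # miss probability in each state
    return float(pi @ per_state)

p = 2 / 3
q = p * (1 - p)
print("2-bit sc:", miss_rate(sc_chain(p), p), " formula q/(1-2q)     :", q / (1 - 2 * q))
print("2-bit fc:", miss_rate(fc_chain(p), p), " formula q(1+2q)/(1-q):", q * (1 + 2 * q) / (1 - q))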
Distribution of Pivot Values
In (classic) Quicksort the branch probability is itself random
⇒ expected miss rate: E[f(P)]  (expectation over pivot values P)
What is the distribution of P?
without sampling: P =ᴰ Uniform(0, 1)
Typical pivot choice: median of k (in practice: k = 3), or pseudomedian of 9 (“ninther”)
Here: a more general scheme with parameter t = (t1, t2):
the pivot is chosen from a sample of k = t1 + t2 + 1 elements such that
t1 sample elements are smaller and t2 are larger than P.
  [figure: example with k = 6 and t = (3, 2); three sample elements lie left of P, two lie right of P]
  t = (0, 0): no sampling
  t = (t, t): gives median-of-(2t + 1)
  can also sample skewed pivots
Distribution of the pivot value: P =ᴰ Beta(t1 + 1, t2 + 1)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 10 / 15
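A quick way to see the Beta(t1 + 1, t2 + 1) claim numerically (my own sketch, not from the talk): draw k = t1 + t2 + 1 i.i.d. Uniform(0, 1) values, take the (t1 + 1)-st smallest as the pivot, and compare the empirical mean and variance with the Beta moments.

import numpy as np

rng = np.random.default_rng(42)

t1, t2 = 3, 2                      # the k = 6 example from the slide
k = t1 + t2 + 1

# pivot = (t1 + 1)-st smallest element of a sample of k i.i.d. Uniform(0, 1) values
samples = np.sort(rng.random((100_000, k)), axis=1)
pivots = samples[:, t1]

print("empirical mean of P  :", pivots.mean())
print("Beta(t1+1, t2+1) mean:", (t1 + 1) / (k + 1))
print("empirical var of P   :", pivots.var())
print("Beta(t1+1, t2+1) var :", (t1 + 1) * (t2 + 1) / ((k + 1) ** 2 * (k + 2)))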
Miss Rates for Quicksort Branch
expected miss rate given by an integral against the pivot density:
  E[f(P)] = ∫₀¹ f(p) · p^t1 (1 − p)^t2 / B(t + 1) dp
e.g. for the 1-bit predictor:
  E[f1-bit(P)] = ∫₀¹ 2p(1 − p) · p^t1 (1 − p)^t2 / B(t + 1) dp = 2 (t1 + 1)(t2 + 1) / ((k + 2)(k + 1))
no concise representation for the other integrals ... (see paper)
but: exact values for fixed t
Sebastian Wild Branch Misses in Quicksort 2015-01-04 11 / 15
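The closed form for the 1-bit predictor is a direct Beta-integral computation; spelled out (this intermediate step is not on the slide, it only uses the definition of the Beta function and k = t1 + t2 + 1):

  E[f1-bit(P)] = 2/B(t1 + 1, t2 + 1) · ∫₀¹ p^(t1+1) (1 − p)^(t2+1) dp
               = 2 B(t1 + 2, t2 + 2) / B(t1 + 1, t2 + 1)
               = 2 (t1 + 1)(t2 + 1) / ((t1 + t2 + 3)(t1 + t2 + 2))
               = 2 (t1 + 1)(t2 + 1) / ((k + 2)(k + 1)).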
Miss Rate and Branch Misses
Miss rate for CQS with median-of-(2t + 1):
  [plot: miss rate vs. t = 0..8 for the OPT, 1-bit, 2-bit sc and 2-bit fc predictors; all curves approach 0.5]
miss rates quickly get bad (close to guessing!)
but: fewer comparisons in total!
  [plot: leading coefficient of #comparisons (· n ln n + O(n)) vs. t, falling from 2 towards 1/ln 2]
Consider the number of branch misses:
  #BM = #comparisons · miss rate
Overall #BM still grows with t.
  [plot: leading coefficient of #BM (· n ln n + O(n)) vs. t, rising towards 0.5/ln 2]
Sebastian Wild Branch Misses in Quicksort 2015-01-04 12 / 15
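As a concrete data point (not spelled out on the slide, but it only combines quantities already introduced): for t = 0, i.e. no pivot sampling, classic Quicksort uses 2 n ln n + O(n) comparisons, and the optimal predictor's expected miss rate is E[fOPT(P)] = E[min{P, 1 − P}] = 1/4 for P uniform on [0, 1], so

  #BM = 2 n ln n · 1/4 + O(n) = 0.5 n ln n + O(n)   (optimal predictor, t = 0).

As t grows, the comparison count only drops towards (1/ln 2) n ln n while the miss rate climbs towards 1/2, so the product grows towards (0.5/ln 2) n ln n, the limits marked in the plots.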
Branch Misses in YQS
Original question: does YQS do better than CQS w.r.t. branch misses?
Complication for the analysis:
  4 branch locations; how often each is executed depends on the input
  [diagram: the comparison branches of YAROSLAVSKIY’s partitioning at the two scanning indices k and g, each asking “< P ?” resp. “< Q ?” and deciding between swap and skip, for the three classes < P, P ≤ ◦ ≤ Q and ≥ Q]
Example: the comparison branch C(y1)
  executed (D1 + D2) n + O(1) times (in expectation, conditional on D)
  branch taken i.i.d. with prob. D1 (conditional on D)
⇒ expected #BM at C(y1) in the first partitioning step:
  E[(D1 + D2) · f(D1)] · n + O(1)
Integrals even more “fun” ... but doable
Sebastian Wild Branch Misses in Quicksort 2015-01-04 13 / 15
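To illustrate what such an expectation looks like concretely, here is a Monte Carlo sketch of my own (not from the paper): without pivot sampling, the two pivots split [0, 1] into spacings (D1, D2, D3) distributed as Dirichlet(1, 1, 1), so E[(D1 + D2) · f(D1)] can be estimated by sampling; with the 1-bit predictor's f(p) = 2p(1 − p), a direct moment calculation gives 7/30 ≈ 0.233 for this particular coefficient, which the estimate should reproduce.

import numpy as np

rng = np.random.default_rng(1)

def f_1bit(p):
    # steady-state miss rate of the 1-bit predictor for an i.i.d. branch taken with prob. p
    return 2 * p * (1 - p)

# no pivot sampling: the spacings induced by two i.i.d. uniform pivots are Dirichlet(1, 1, 1)
D = rng.dirichlet([1, 1, 1], size=1_000_000)
D1, D2 = D[:, 0], D[:, 1]

coeff = np.mean((D1 + D2) * f_1bit(D1))
print("Monte Carlo estimate of E[(D1 + D2) * f(D1)]:", coeff)
print("exact value for this f (moment calculation) :", 7 / 30)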
Results CQS vs. YQS
Original question: Is YQS better than CQS w.r.t. branch misses?
Expected number of branch misses (coefficients of n ln n + O(n)):

without pivot sampling
              CQS      YQS      Relative
OPT           0.5      0.513    +2.6%
1-bit         0.667    0.673    +1.0%
2-bit sc      0.571    0.585    +2.5%
2-bit fc      0.589    0.602    +2.2%

CQS median-of-3 vs. YQS tertiles-of-5
              CQS      YQS      Relative
OPT           0.536    0.538    +0.4%
1-bit         0.686    0.687    +0.1%
2-bit sc      0.611    0.613    +0.3%
2-bit fc      0.627    0.629    +0.3%

essentially the same number of branch misses
Branch misses are not a plausible explanation for YQS’s success.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 14 / 15
Conclusion
Precise analysis of branch misses in Quicksort (CQS and YQS), including pivot sampling
lower bounds on branch miss rates
CQS and YQS cause a very similar number of branch misses
Strengthened evidence for the hypothesis that YQS is faster because it makes better use of the memory hierarchy.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 15 / 15
Miss Rate for Branches in Quicksort
without sampling: P ~ Uniform(0, 1)
E[fOPT(P)] = ∫₀¹ min{p, 1 − p} dp = 1/4 = 0.25
E[f1-bit(P)] = ∫₀¹ 2p(1 − p) dp = 1/3 ≈ 0.333
E[f2-bit-sc(P)] = ∫₀¹ p(1 − p) / (1 − 2p(1 − p)) dp = π/4 − 1/2 ≈ 0.285
E[f2-bit-fc(P)] = ∫₀¹ (2p²(1 − p)² + p(1 − p)) / (1 − p(1 − p)) dp = 2π/√3 − 10/3 ≈ 0.294
Sebastian Wild Branch Misses in Quicksort 2015-01-04 16 / 15
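These closed forms are easy to cross-check numerically. The short SciPy sketch below (an editorial addition, not from the talk) integrates the four miss-rate functions over the uniform pivot distribution and compares the results with the constants above; the function and variable names are mine.

import math
from scipy import integrate

def f_opt(p):
    return min(p, 1.0 - p)

def f_1bit(p):
    return 2.0 * p * (1.0 - p)

def f_2bit_sc(p):
    q = p * (1.0 - p)
    return q / (1.0 - 2.0 * q)

def f_2bit_fc(p):
    q = p * (1.0 - p)
    return q * (1.0 + 2.0 * q) / (1.0 - q)

closed_forms = {
    "OPT":      (f_opt,     0.25),
    "1-bit":    (f_1bit,    1.0 / 3.0),
    "2-bit sc": (f_2bit_sc, math.pi / 4.0 - 0.5),
    "2-bit fc": (f_2bit_fc, 2.0 * math.pi / math.sqrt(3.0) - 10.0 / 3.0),
}

for name, (f, exact) in closed_forms.items():
    numeric, _ = integrate.quad(f, 0.0, 1.0)  # E[f(P)] for P ~ Uniform(0, 1)
    print(f"{name:9s} numeric = {numeric:.6f}   closed form = {exact:.6f}")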
Analysis of branch misses in Quicksort

  • 1. Analysis of Branch Misses in Quicksort Sebastian Wild wild@cs.uni-kl.de based on joint work with Conrado Martínez and Markus E. Nebel 04 January 2015 Meeting on Analytic Algorithmics and Combinatorics Sebastian Wild Branch Misses in Quicksort 2015-01-04 1 / 15
  • 2. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 3. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 4. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 5. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 6. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 7. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 8. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 9. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 10. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 11. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 12. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 13. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 14. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 15. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 16. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 17. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 18. Instruction Pipelines Computers do not execute instructions fully sequentially Instead they use an “assembly line” Example: 42 43 44 45 46 47 41 48 ... i := i + 1 a := A[i] IF a < p GOTO 42 j := j - 1 a := A[j] IF a > p GOTO 45 ... each instruction broken in 4 stages simpler steps shorter CPU cycles one instruction per cycle finished ... ... except for branches! 1 undo wrong instructions 2 fill pipeline anew Pipeline stalls are costly ... can we avoid (some of) them? Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
  • 19. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 20. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 21. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 22. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 23. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 24. Branch Prediction We could avoid stalls if we knew whether a branch will be taken or not in general not possible prediction with heuristics: Predict same outcome as last time. (1-bit predictor) 1 2 predict taken predict not taken taken not t. not t. taken Predict most frequent outcome with finite memory (2-bit saturating counter) 1 2 3 4 predict taken predict not taken taken not t. not t. not t. not t. takentakentaken Flip prediction only after two consecutive errors (2-bit flip-consecutive) predicttaken predictnottaken 1 2 3 4 taken not t. taken not t. not t. taken not t. taken wilder heuristics exist out there ... not considered here prediction can be wrong branch miss (BM) Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
  • 25. Why Should We Care? misprediction rates of “typical” programs < 10% (Comparison-based) sorting is different! Branch based on comparison result Comparisons reduce entropy (uncertainty about input) The less comparisons we use, the less predictable they become for classic Quicksort: misprediction rate 25 % with median-of-3: 31.25 % Practical Importance (KALIGOSI & SANDERS, ESA 2006): on Pentium 4 Prescott: very skewed pivot faster than median branch misses dominated running time Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
  • 26. Why Should We Care? misprediction rates of “typical” programs < 10% (Comparison-based) sorting is different! Branch based on comparison result Comparisons reduce entropy (uncertainty about input) The less comparisons we use, the less predictable they become for classic Quicksort: misprediction rate 25 % with median-of-3: 31.25 % Practical Importance (KALIGOSI & SANDERS, ESA 2006): on Pentium 4 Prescott: very skewed pivot faster than median branch misses dominated running time Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
  • 27. Why Should We Care? misprediction rates of “typical” programs < 10% (Comparison-based) sorting is different! Branch based on comparison result Comparisons reduce entropy (uncertainty about input) The less comparisons we use, the less predictable they become for classic Quicksort: misprediction rate 25 % with median-of-3: 31.25 % Practical Importance (KALIGOSI & SANDERS, ESA 2006): on Pentium 4 Prescott: very skewed pivot faster than median branch misses dominated running time Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
  • 28. Why Should We Care? misprediction rates of “typical” programs < 10% (Comparison-based) sorting is different! Branch based on comparison result Comparisons reduce entropy (uncertainty about input) The less comparisons we use, the less predictable they become for classic Quicksort: misprediction rate 25 % with median-of-3: 31.25 % Practical Importance (KALIGOSI & SANDERS, ESA 2006): on Pentium 4 Prescott: very skewed pivot faster than median branch misses dominated running time Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
  • 29. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 30. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 31. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 32. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 33. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 34. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 35. Track Record of Dual-Pivot Quicksort Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS) faster than previously used classic Quicksort (CQS) in practice traditional cost measures do not explain this! CQS YQS Relative Running Time (from various experiments) −10±2% Comparisons 2 1.9 −5% Swaps 0.3 0.6 +80% Bytecode Instructions 18 21.7 +20.6% MMIX oops υ 11 13.1 +19.1% MMIX mems µ 2.6 2.8 +5% scanned elements1 (≈ cache misses) 2 1.6 −20% ·n ln n + O(n) , average case results What about branch misses? Can they explain YQS’s success? ... stay tuned. 1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014 Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
  • 36. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 37. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 38. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 39. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 40. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 41. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P Pr U > P = 1 − P 0 1P Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 42. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P = D1 Pr U > P = 1 − P = D2 0 1P D1 D2 Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q D1 D2 D3 Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 43. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P = D1 Pr U > P = 1 − P = D2 0 1P D1 D2 Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q D1 D2 D3 Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 44. Random Model n i. i. d. elements chosen uniformly in [0, 1] 0 1 U1 U2U3 U4U5U6 U7U8 pairwise distinct almost surely relative ranking is a random permutation equivalent to classic model Consider pivot value P fixed: Pr U < P = P = D1 Pr U > P = 1 − P = D2 0 1P D1 D2 Similarly for dual-pivot Quicksort with pivots P Q Pr U < P = D1 Pr P < U < Q = D2 Pr U > Q = D3 0 1P Q D1 D2 D3 These probabilities hold for all elements U, independent of all other elements! Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
  • 45. Branches in CQS How many branches in first partitioning step of CQS? one comparison branch per element U: U < P left partition U > P right partition other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics) Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 46. Branches in CQS How many branches in first partitioning step of CQS? one comparison branch per element U: U < P left partition U > P right partition other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics) Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 47. Branches in CQS How many branches in first partitioning step of CQS? one comparison branch per element U: U < P left partition U > P right partition other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics) Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 48. Branches in CQS How many branches in first partitioning step of CQS? one comparison branch per element U: U < P left partition U > P right partition     other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics)     Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 49. Branches in CQS How many branches in first partitioning step of CQS? Consider pivot value P fixed. D = (D1, D2) = (P, 1 − P) fixed. one comparison branch per element U: U < P left partition U > P right partition     other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics)     Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 50. Branches in CQS How many branches in first partitioning step of CQS? Consider pivot value P fixed. D = (D1, D2) = (P, 1 − P) fixed. one comparison branch per element U: U < P left partition U > P right partition branch taken with prob. P i. i. d. for all elements U! memoryless source     other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics)     Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 51. Branches in CQS How many branches in first partitioning step of CQS? Consider pivot value P fixed. D = (D1, D2) = (P, 1 − P) fixed. one comparison branch per element U: U < P left partition U > P right partition branch taken with prob. P i. i. d. for all elements U! memoryless source     other branches (loop logic etc.) easy to predict only constant number of mispredictions can be ignored (for leading term asymptotics)     Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
  • 59. Misprediction Rate for Memoryless Sources. Branches are taken i.i.d. with probability p. Information-theoretic lower bound: miss rate f_OPT(p) = min{p, 1 − p}; it can be approached by estimating p and predicting "taken" if p̂ ≥ 1/2 and "not taken" if p̂ < 1/2. But actual predictors have very little memory. 1-bit predictor: wrong prediction whenever the outcome changes; miss rate f_1bit(p) = 2p(1 − p). [Diagram: two-state automaton with states "predict taken" and "predict not taken", transition probabilities p and 1 − p.]
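A minimal Python sketch (not from the slides) that feeds i.i.d. Bernoulli(p) branch outcomes to a 1-bit predictor and compares the empirical miss rate with the formulas quoted above; the function name and parameters are illustrative only.

import random

def one_bit_miss_rate(p, n=1_000_000, seed=1):
    rng = random.Random(seed)
    prediction = True          # arbitrary initial prediction ("taken")
    misses = 0
    for _ in range(n):
        taken = rng.random() < p
        if taken != prediction:
            misses += 1
        prediction = taken     # 1-bit predictor: always predict the last outcome
    return misses / n

for p in (0.1, 0.3, 0.5, 2 / 3):
    print(f"p={p:.3f}  empirical={one_bit_miss_rate(p):.4f}  "
          f"f_1bit={2 * p * (1 - p):.4f}  f_OPT={min(p, 1 - p):.4f}")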
  • 86. Misprediction Rate for Memoryless Sources [2]. 2-bit saturating counter: the miss rate depends on the current state. [Diagram: four-state automaton, states 1 and 2 predict "taken", states 3 and 4 predict "not taken", with transition probabilities p and 1 − p.] But there is very fast convergence to the steady state [plot: different initial state distributions, about 20 iterations, p = 2/3], so we use the steady-state miss rate, i.e. the expected miss rate over states drawn from the stationary distribution. Here f_2-bit-sc(p) = q / (1 − 2q) with q = p(1 − p), and similarly for the 2-bit flip-consecutive predictor, f_2-bit-fc(p) = q(1 + 2q) / (1 − q).
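A minimal Python sketch (not from the slides) of a 2-bit saturating counter fed with i.i.d. Bernoulli(p) outcomes: the counter moves up on "taken", down on "not taken", saturating at the ends; in this sketch states 0 and 1 predict "not taken" and states 2 and 3 predict "taken" (orientation does not affect the rate). The empirical miss rate should approach the steady-state formula q / (1 − 2q) with q = p(1 − p).

import random

def two_bit_sc_miss_rate(p, n=1_000_000, seed=1):
    rng = random.Random(seed)
    state = 2                       # arbitrary initial state
    misses = 0
    for _ in range(n):
        taken = rng.random() < p
        predicted_taken = state >= 2
        if taken != predicted_taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses / n

for p in (0.3, 0.5, 2 / 3, 0.9):
    q = p * (1 - p)
    print(f"p={p:.3f}  empirical={two_bit_sc_miss_rate(p):.4f}  "
          f"steady-state={q / (1 - 2 * q):.4f}")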
  • 96. Distribution of Pivot Values. In (classic) Quicksort the branch probability is itself random; the expected miss rate is E[f(P)], the expectation taken over the pivot value P. What is the distribution of P? Without sampling, P ~ Uniform(0, 1). Typical pivot choices are the median of k (in practice k = 3) or the pseudomedian of 9 ("ninther"). Here we use a more general scheme with parameter t = (t1, t2): from a sample of k = t1 + t2 + 1 elements, the pivot is the (t1 + 1)-st smallest, leaving t1 smaller and t2 larger sample elements (example: k = 6 and t = (3, 2)). t = (0, 0) means no sampling, t = (t, t) gives median-of-(2t + 1), and skewed pivots can be sampled as well. The resulting pivot value has distribution P ~ Beta(t1 + 1, t2 + 1).
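A short Python sketch (not from the slides) of this sampling scheme, assuming i.i.d. Uniform(0,1) input: draw k = t1 + t2 + 1 values, take the (t1 + 1)-st smallest as the pivot, and compare the empirical mean with the Beta(t1 + 1, t2 + 1) mean (t1 + 1)/(k + 1).

import random

def sample_pivot(t1, t2, rng):
    k = t1 + t2 + 1
    sample = sorted(rng.random() for _ in range(k))
    return sample[t1]                      # (t1 + 1)-st smallest

rng = random.Random(1)
for t1, t2 in [(0, 0), (1, 1), (3, 2), (5, 0)]:
    pivots = [sample_pivot(t1, t2, rng) for _ in range(200_000)]
    emp_mean = sum(pivots) / len(pivots)
    beta_mean = (t1 + 1) / (t1 + t2 + 2)
    print(f"t=({t1},{t2})  empirical mean={emp_mean:.4f}  Beta mean={beta_mean:.4f}")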
  • 100. Miss Rates for the Quicksort Branch. The expected miss rate is given by the integral E[f(P)] = ∫_0^1 f(p) · p^t1 (1 − p)^t2 / B(t + 1) dp, e.g. for the 1-bit predictor E[f_1-bit(P)] = ∫_0^1 2p(1 − p) · p^t1 (1 − p)^t2 / B(t + 1) dp = 2 (t1 + 1)(t2 + 1) / ((k + 2)(k + 1)). There is no concise representation for the other integrals (see the paper), but exact values can be computed for fixed t.
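A Python sketch (not the paper's derivation) that evaluates E[f(P)] numerically for P ~ Beta(t1 + 1, t2 + 1) with a simple midpoint rule and checks the closed form quoted for the 1-bit predictor, 2(t1 + 1)(t2 + 1) / ((k + 2)(k + 1)) with k = t1 + t2 + 1.

import math

def beta_expectation(f, t1, t2, steps=200_000):
    a, b = t1 + 1, t2 + 1
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)   # Beta(t1+1, t2+1)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):                                  # midpoint rule on (0,1)
        p = (i + 0.5) * h
        total += f(p) * p ** (a - 1) * (1 - p) ** (b - 1) / B
    return total * h

f_opt  = lambda p: min(p, 1 - p)
f_1bit = lambda p: 2 * p * (1 - p)
f_2bsc = lambda p: p * (1 - p) / (1 - 2 * p * (1 - p))

for t1, t2 in [(0, 0), (1, 1), (3, 2)]:
    k = t1 + t2 + 1
    closed = 2 * (t1 + 1) * (t2 + 1) / ((k + 2) * (k + 1))
    print(f"t=({t1},{t2}): OPT={beta_expectation(f_opt, t1, t2):.4f}  "
          f"1-bit={beta_expectation(f_1bit, t1, t2):.4f} (closed form {closed:.4f})  "
          f"2-bit sc={beta_expectation(f_2bsc, t1, t2):.4f}")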
  • 106. Miss Rate and Branch Misses. Miss rate for CQS with median-of-(2t + 1): [plot: miss rate vs. t for the OPT, 1-bit, 2-bit sc and 2-bit fc predictors]. The miss rates quickly get bad (close to guessing!), but there are fewer comparisons in total [plot: #cmps coefficient of n ln n vs. t, decreasing towards 1/ln 2]. Now consider the number of branch misses, #BM = #comparisons · miss rate: the overall #BM still grows with t [plot: #BM coefficient of n ln n vs. t, growing towards 0.5/ln 2]. See the numeric sketch below.
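The following Python sketch is not from the slides; it combines the expected miss rate with the standard leading term for the comparison count of classic Quicksort with median-of-(2t + 1) pivots, n ln n / (H_{2t+2} − H_{t+1}) (an assumption stated here, not derived on these slides). The #BM coefficient is then E[f(P)] divided by the same harmonic-number difference; it keeps growing with t even though the comparison coefficient shrinks, e.g. about 0.5 for t = 0 and about 0.536 for t = 1 with the OPT predictor.

import math

def harmonic(n):
    return sum(1.0 / i for i in range(1, n + 1))

def beta_expectation(f, t, steps=200_000):      # E[f(P)] for P ~ Beta(t+1, t+1)
    B = math.gamma(t + 1) ** 2 / math.gamma(2 * t + 2)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):                      # midpoint rule on (0,1)
        p = (i + 0.5) * h
        total += f(p) * p ** t * (1 - p) ** t / B
    return total * h

f_opt = lambda p: min(p, 1 - p)
for t in range(5):
    denom = harmonic(2 * t + 2) - harmonic(t + 1)
    cmp_coeff = 1.0 / denom                     # #cmps ~ cmp_coeff * n ln n
    bm_coeff = beta_expectation(f_opt, t) / denom
    print(f"t={t}: #cmps ≈ {cmp_coeff:.3f} n ln n,  #BM(OPT) ≈ {bm_coeff:.3f} n ln n")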
  • 113. Branch Misses in YQS. Original question: does YQS do better than CQS w.r.t. branch misses? Complication for the analysis: there are 4 branch locations, and how often each is executed depends on the input. [Diagram of the dual-pivot partitioning step: comparisons against P and Q at the scanning indices k and g, with the corresponding swap/skip actions and the three classes ≤ P, P ≤ ◦ ≤ Q, and ≥ Q.] Example: the comparison branch C(y1) is executed (D1 + D2) n + O(1) times (in expectation, conditional on D) and is taken i.i.d. with probability D1 (conditional on D). So the expected #BM at C(y1) in the first partitioning step is E[(D1 + D2) · f(D1)] · n + O(1). The integrals are even more "fun" ... but doable; a Monte-Carlo sketch follows below.
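A Python sketch (not from the paper) that Monte-Carlo-estimates the quantity on this slide, E[(D1 + D2) · f(D1)], assuming no pivot sampling, so that D = (D1, D2, D3) are the spacings induced by two i.i.d. Uniform(0,1) pivots. This is only the first-partitioning-step contribution of one of YQS's four branch locations, not the full n ln n coefficient from the result tables.

import random

def estimate(f, trials=1_000_000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        p, q = sorted((rng.random(), rng.random()))   # two pivot values P < Q
        d1, d2 = p, q - p                             # spacings D1 and D2
        total += (d1 + d2) * f(d1)
    return total / trials

f_opt  = lambda p: min(p, 1 - p)
f_1bit = lambda p: 2 * p * (1 - p)
print("E[(D1+D2) f_OPT(D1)]  ≈", round(estimate(f_opt), 4))
print("E[(D1+D2) f_1bit(D1)] ≈", round(estimate(f_1bit), 4))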
  • 118. Results CQS vs. YQS. Original question: does YQS do better than CQS w.r.t. branch misses? Expected number of branch misses, as coefficients of n ln n + O(n):

Without pivot sampling:
              CQS      YQS      Relative
  OPT         0.5      0.513    +2.6%
  1-bit       0.667    0.673    +1.0%
  2-bit sc    0.571    0.585    +2.5%
  2-bit fc    0.589    0.602    +2.2%

CQS median-of-3 vs. YQS tertiles-of-5:
              CQS      YQS      Relative
  OPT         0.536    0.538    +0.4%
  1-bit       0.686    0.687    +0.1%
  2-bit sc    0.611    0.613    +0.3%
  2-bit fc    0.627    0.629    +0.3%

Essentially the same number of BM. Branch misses are not a plausible explanation for YQS's success.
  • 121. Conclusion. Precise analysis of branch misses in Quicksort (CQS and YQS), including pivot sampling and lower bounds on branch-miss rates. CQS and YQS cause very similar numbers of BM. This strengthens the evidence for the hypothesis that YQS is faster because of better usage of the memory hierarchy.
  • 128. Miss Rate for Branches in Quicksort. Without sampling, P ~ Uniform(0, 1):
E[f_OPT(P)]      = ∫_0^1 min{p, 1 − p} dp = 0.25
E[f_1-bit(P)]    = ∫_0^1 2p(1 − p) dp = 1/3
E[f_2-bit-sc(P)] = ∫_0^1 p(1 − p) / (1 − 2p(1 − p)) dp = π/4 − 1/2 ≈ 0.285
E[f_2-bit-fc(P)] = ∫_0^1 (2p²(1 − p)² + p(1 − p)) / (1 − p(1 − p)) dp = 2π/√3 − 10/3 ≈ 0.294
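A quick numerical check of these closed forms (not from the slides), again with P ~ Uniform(0, 1) and a simple midpoint rule in Python.

import math

def integrate(f, steps=200_000):
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

q = lambda p: p * (1 - p)
print("OPT      :", integrate(lambda p: min(p, 1 - p)), "  exact 1/4")
print("1-bit    :", integrate(lambda p: 2 * q(p)), "  exact 1/3")
print("2-bit sc :", integrate(lambda p: q(p) / (1 - 2 * q(p))),
      "  exact", math.pi / 4 - 0.5)
print("2-bit fc :", integrate(lambda p: q(p) * (1 + 2 * q(p)) / (1 - q(p))),
      "  exact", 2 * math.pi / math.sqrt(3) - 10 / 3)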