These are the slides from my talk at the Meeting on Analytic Algorithmics and Combinatorics 2015 (ANALCO15) on branch mispredictions in classic Quicksort and in Yaroslavskiy's dual-pivot Quicksort, which is used in Java 7.
The talk is based on joint work with Conrado Martínez and Markus E. Nebel.
Find more information and the corresponding paper on my website: http://wwwagak.cs.uni-kl.de/sebastian-wild.html
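To give a feel for the talk's subject, here is a small Python sketch (my illustration, not taken from the slides) that counts how often a one-bit branch predictor would miss the "element < pivot" branch during a single partitioning pass; the array and pivot values are made up for the example.

```python
import random

def partition_mispredictions(a, pivot):
    """Count misses of the 'x < pivot' branch under a 1-bit predictor
    (always predict the outcome the branch had last time)."""
    prediction = True
    misses = 0
    for x in a:
        outcome = x < pivot
        if outcome != prediction:
            misses += 1
        prediction = outcome  # 1-bit predictor: remember the last outcome
    return misses

random.seed(1)
a = [random.random() for _ in range(100_000)]
# A median-like pivot makes the branch a fair coin: roughly half of the
# comparisons mispredict, the worst case for the predictor.
print(partition_mispredictions(a, 0.5) / len(a))
# A skewed pivot makes the branch lopsided and much easier to predict.
print(partition_mispredictions(a, 0.1) / len(a))
```

This is the tension the talk explores: pivots that are good for comparison counts (medians) are the worst possible for branch prediction.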
On codes, machines, and environments: reflections and experiences – Vincenzo De Florio
Code explicitly refers to a reference machine and, implicitly, to a set of conditions often called the system model and the fault model.
If one wants to guarantee an agreed-upon quality of service, one needs to either make assumptions about those conditions or adapt to them.
In this lecture I present this problem and a number of solutions, both practical and theoretical, that I have devised in the course of my career.
Although the main accent is on programming languages, I also provide links and references to other approaches that operate at the algorithmic and system levels.
Quickselect Under Yaroslavskiy's Dual Pivoting Algorithm – Sebastian Wild
I gave this talk at the 24th International Meeting on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2013) on Menorca (Spain).
A paper covering the analyses of this talk (and some more!) has been submitted.
Also, in the talk I refer to the previous speaker at the conference, my advisor Markus Nebel; the corresponding results can be found in an earlier talk of mine:
slideshare.net/sebawild/average-case-analysis-of-java-7s-dual-pivot-quicksort
Check my website for preprints of papers and my other talks:
wwwagak.cs.uni-kl.de/sebastian-wild.html
Engineering Java 7's Dual Pivot Quicksort Using MaLiJAn – Sebastian Wild
I gave this talk at the Meeting on Algorithm Engineering and Experiments (ALENEX) 2013.
Find my other talks and the corresponding papers on my web page:
http://wwwagak.cs.uni-kl.de/sebastian-wild.html
Average Case Analysis of Java 7’s Dual Pivot Quicksort – Sebastian Wild
I gave this talk at the European Symposium on Algorithms 2012 in Ljubljana (Slovenia).
The corresponding paper won the best paper award.
Find my other talks and all corresponding papers on my web page:
http://wwwagak.cs.uni-kl.de/sebastian-wild.html
The document describes dual-pivot quicksort, which uses two pivot elements rather than one. Previous research found that dual-pivot quicksort often improves on classic quicksort by reducing the number of element comparisons and cache misses, though it increases the number of swaps. The document then focuses on Yaroslavskiy's dual-pivot partitioning algorithm, which arranges the elements into three groups through in-place swapping: those less than the first pivot, those between the two pivots, and those greater than the second pivot.
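As a rough illustration of the partitioning scheme just described, here is a Python sketch of Yaroslavskiy-style dual-pivot partitioning. This is a simplified reconstruction, not the Java 7 source, which additionally uses insertion-sort cutoffs and pivot sampling.

```python
def dual_pivot_partition(a, lo, hi):
    """Partition a[lo..hi] around two pivots p <= q taken from the ends.
    Afterwards: a[lo..i-1] < p, a[i] = p, p <= a[i+1..j-1] <= q, a[j] = q,
    and a[j+1..hi] > q. Returns (i, j)."""
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]
    l, g, k = lo + 1, hi - 1, lo + 1
    while k <= g:
        if a[k] < p:                      # belongs in the left part
            a[k], a[l] = a[l], a[k]
            l += 1
        elif a[k] > q:                    # belongs in the right part
            while a[g] > q and k < g:
                g -= 1
            a[k], a[g] = a[g], a[k]
            g -= 1
            if a[k] < p:                  # the swapped-in element may be small
                a[k], a[l] = a[l], a[k]
                l += 1
        k += 1
    l -= 1
    g += 1
    a[lo], a[l] = a[l], a[lo]             # move the pivots to their final spots
    a[hi], a[g] = a[g], a[hi]
    return l, g

def dual_pivot_quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        i, j = dual_pivot_partition(a, lo, hi)
        dual_pivot_quicksort(a, lo, i - 1)
        dual_pivot_quicksort(a, i + 1, j - 1)
        dual_pivot_quicksort(a, j + 1, hi)

xs = [3, 7, 1, 9, 4, 4, 8, 2, 6, 5]
dual_pivot_quicksort(xs)
print(xs)
```

Note how elements destined for the middle group are never moved at all; this asymmetry is part of why the scheme saves comparisons on average.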
CPU Pipelining and Hazards - An Introduction – Dilum Bandara
Pipelining is a technique used in computer architecture to overlap the execution of instructions to increase throughput. It works by breaking down instruction execution into a series of steps and allowing subsequent instructions to begin execution before previous ones complete. This allows multiple instructions to be in various stages of completion simultaneously. Pipelining improves performance but introduces hazards such as structural, data, and control hazards that can reduce the ideal speedup if not addressed properly. Control hazards due to branches are particularly challenging to handle efficiently.
i) Introducing a new instruction that replaces existing instructions will decrease the number of instructions (N) but likely increase the clock period (T) and cycles per instruction (C) as the new instruction has a more complex task. The maximum possible performance improvement is 4x.
ii) Pipelining will decrease the cycles per instruction (C) but may increase it above 1 due to hazards. The clock period (T) and number of instructions (N) will remain unchanged.
iii) Splitting the stage with the maximum delay will decrease the clock period (T) and increase the cycles per instruction (C), as it introduces an additional cycle for affected instructions, but the number of instructions (N) remains unchanged.
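The reasoning in (i)–(iii) rests on the classic performance equation, execution time = N × C × T. A quick back-of-the-envelope check in Python; the instruction count, CPI values, and clock period below are invented illustration numbers, not figures from the document.

```python
def exec_time(n_instructions, cpi, clock_period):
    """Classic performance equation: time = N * C * T."""
    return n_instructions * cpi * clock_period

# Baseline: 1e9 instructions, CPI of 4 (one instruction per 4 cycles), 1 ns clock.
base = exec_time(1e9, 4.0, 1e-9)

# Ideal 4-stage pipelining drives CPI toward 1 with the same clock period,
# which is where the "4x maximum" bound comes from; hazards push CPI back up.
ideal = exec_time(1e9, 1.0, 1e-9)
with_hazards = exec_time(1e9, 1.3, 1e-9)

print(base / ideal)          # ideal speedup
print(base / with_hazards)   # a more realistic speedup once hazards stall the pipe
```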
This document discusses different methods for emulating one instruction set architecture (ISA) on a system with a different ISA, including interpretation and binary translation. It describes the basic decoding-and-dispatch interpreter and how threaded interpretation improves performance by removing branches. Threaded interpretation can be indirect through a dispatch table or direct by predecoding instructions. Emulation of complex ISAs requires specialized decoding and dispatch routines to improve performance over a general approach.
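A minimal Python sketch of the dispatch-table idea described above. The toy opcode set and accumulator machine are invented for illustration; real threaded interpreters rely on computed gotos so each handler jumps straight to the next, which Python cannot express, so one table lookup per instruction stands in for that control flow.

```python
def run_predecoded(program):
    """Interpret a predecoded instruction stream for a toy accumulator
    machine via a dispatch table: one table lookup per instruction instead
    of a chain of compare-and-branch tests in a central decode loop."""
    acc = 0

    def op_load(arg):
        nonlocal acc
        acc = arg

    def op_add(arg):
        nonlocal acc
        acc += arg

    def op_mul(arg):
        nonlocal acc
        acc *= arg

    table = {"LOAD": op_load, "ADD": op_add, "MUL": op_mul}
    for opcode, arg in program:      # stream already decoded into (opcode, arg) pairs
        table[opcode](arg)           # indirect dispatch through the table
    return acc

prog = [("LOAD", 2), ("ADD", 3), ("MUL", 4)]
print(run_predecoded(prog))  # (2 + 3) * 4 = 20
```

Predecoding pays off because the expensive bit-field extraction happens once, before execution, rather than on every dynamic execution of an instruction.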
The document discusses pipelining in computer processors. It describes how pipelining can increase throughput by overlapping the execution of multiple instructions. It discusses the basic pipeline stages for a RISC instruction set, including fetch, decode, execute, memory access, and writeback. It also describes several types of pipeline hazards that can occur, such as structural hazards caused by resource conflicts, data hazards when instructions depend on previous results, and control hazards with branches. Forwarding techniques are presented to help address data hazards.
This document provides an overview of implementing a simplified MIPS processor with memory-reference instructions, arithmetic-logical instructions, and control-flow instructions. It discusses:
1. Using a program counter to fetch instructions from memory and reading register operands.
2. Executing most instructions via fetching, operand fetching, execution, and storing in a single cycle.
3. Building a datapath with functional units for instruction fetching, ALU operations, memory references, and branches/jumps.
4. Implementing control using a finite state machine that sets multiplexers and control lines based on the instruction.
Georgios Markomanolis presented his PhD thesis on performance evaluation and prediction of parallel applications through trace-based simulation. He developed a trace acquisition framework that decouples trace collection from the execution environment. This allows acquiring traces from large application runs in a scalable way. He also created a trace replay tool built on a fast simulation kernel that accurately replays execution traces on different system configurations. The framework was experimentally evaluated using NAS benchmarks, demonstrating scalable trace acquisition and accurate simulation results.
Pipelining is a technique where a microprocessor can begin executing the next instruction before finishing the previous one. It works by dividing instruction processing into discrete stages - fetch, decode, execute, memory, and write back. When an instruction enters one stage, the next instruction can enter the following stage so that multiple instructions are in different stages at the same time, improving efficiency. The pipeline allows for faster overall processing but hazards can occur if instructions depend on previous ones, disrupting the smooth flow.
This presentation provides an overview of instruction pipelining in computer processors. It begins with defining pipelining as a process that allows storing, prioritizing, managing and executing tasks and instructions in an orderly process within a single processor. This allows faster throughput than processing instructions sequentially. The presentation then discusses how pipelining improves performance by overlapping the fetch, decode, execute, and write stages of instruction processing. It also identifies potential problems like data hazards that can occur and techniques like forwarding to handle hazards. In the end, the presentation demonstrates pipelined instruction processing and encourages questions.
The document discusses parallel processing and pipelining techniques in computer organization. It covers topics like parallel processing concepts and classifications, pipelining concepts and how it increases computational speed, arithmetic and instruction pipelining, handling pipeline hazards like data dependencies and branches. The key advantages of pipelining include decomposing tasks into sequential sub-operations that can complete concurrently, improving throughput and achieving speedup close to the number of pipeline stages when the number of tasks is large.
The document discusses pruning code by removing unnecessary branching and conditional statements. It provides examples of how to "prune" code by refactoring conditional logic using polymorphism instead of if/else statements or switch cases. This avoids deep nesting and duplication of conditions. It also moves the condition check to a single location rather than having it scattered throughout the code. The benefits mentioned are that pruned code is easier to read, understand, test, maintain and extend, and can provide performance gains by reducing branches.
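A small before/after in Python showing the refactoring pattern the document describes; the shape classes are a hypothetical example of my own, not taken from the document.

```python
# Branchy version: the conditional chain is re-evaluated on every call,
# and tends to get duplicated at every call site that needs the area.
def area_branchy(shape):
    if shape["kind"] == "circle":
        return 3.14159 * shape["r"] ** 2
    elif shape["kind"] == "rect":
        return shape["w"] * shape["h"]
    raise ValueError(shape["kind"])

# "Pruned" version: the decision is made once, when the object is created;
# afterwards every call dispatches polymorphically with no if/else chain.
class Circle:
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.14159 * self.r ** 2

class Rect:
    def __init__(self, w, h):
        self.w, self.h = w, h
    def area(self):
        return self.w * self.h

shapes = [Circle(1.0), Rect(2.0, 3.0)]
print(sum(s.area() for s in shapes))
```

The condition check now lives in exactly one place (object construction), which is the "single location" benefit the document mentions.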
This document provides an overview of advanced PL/SQL concepts such as flow control, bulk processing, Oracle hints, and resources. It discusses techniques for optimizing PL/SQL code through improved loop and conditional logic. Bulk processing using FORALL is described as enabling set-based operations. Oracle hints are introduced as a way to suggest execution plans to the optimizer. Parallel query is explained as a way to improve performance on multi-processor systems. Finally, resources for further reading are listed.
This document provides instructions for exercises on a computer simulation of closed-loop control systems. The exercises are designed to teach students about dynamic processes and help them understand relationships between dynamic processes. The training equipment uses a process control system called Freelance, which works with a single engineering tool called Control Builder F. The document outlines the hardware structure of the simulation, how to operate it using DigiVis for monitoring, and how to use the Freelance controller emulator. It then provides specific exercises for students to complete, including recording step responses of control loops with different time constants to observe their behavior.
Pipelining is a technique used in microprocessors to overlap the execution of multiple instructions by dividing instruction execution into discrete stages. It allows the next instruction to begin executing before the previous one has finished. The pipeline is divided into segments that perform discrete operations concurrently. This improves processor throughput by allowing new instructions to enter the pipeline every clock cycle.
This document discusses different types of instruction hazards in pipelines including structural hazards, data hazards, and control hazards. It focuses on control hazards caused by branches, where the destination of the branch is unknown until it is evaluated. To resolve this, it discusses different branch prediction strategies like stalling, deciding the branch in the ID stage, delayed branches using compiler reordering, and branch prediction. Branch prediction involves using a branch history table (BHT) to predict if the branch will be taken or not based on its past behavior. The document provides statistics on typical branch behavior and analyzes the accuracy of 1-bit branch prediction. It also discusses scheduling instructions into the delay slot of delayed branches.
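The accuracy analysis can be reproduced in a few lines of Python. The loop pattern below (nine taken branches, then one not-taken on loop exit) is a made-up example of typical loop-branch behavior, not data from the document.

```python
def mispredict_rate_1bit(outcomes):
    """1-bit predictor: predict the outcome the branch had last time."""
    pred, misses = True, 0
    for taken in outcomes:
        misses += (taken != pred)
        pred = taken
    return misses / len(outcomes)

def mispredict_rate_2bit(outcomes):
    """2-bit saturating counter: needs two wrong outcomes in a row to flip."""
    state, misses = 3, 0              # states 0..3; state >= 2 means predict taken
    for taken in outcomes:
        misses += (taken != (state >= 2))
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses / len(outcomes)

# A loop branch: taken 9 times, then not taken on exit, repeated 1000 times.
pattern = ([True] * 9 + [False]) * 1000
print(mispredict_rate_1bit(pattern))   # ~0.2: misses on exit AND again on re-entry
print(mispredict_rate_2bit(pattern))   # 0.1: misses only once, on the loop exit
```

This is the standard argument for 2-bit counters in the BHT: a 1-bit scheme mispredicts a loop branch twice per loop execution, a 2-bit scheme only once.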
Design pipeline architecture for various stage pipelines – Mahmudul Hasan
This document discusses the concepts of single-cycle control, multi-cycle control, and pipelining in processors. It explains that single-cycle control has a low CPI but a long clock period, while multi-cycle control has a short clock period but high CPI. Pipelining allows overlapping the execution of instructions to improve throughput. The document presents diagrams of 5-stage instruction pipelines and describes the fetch, decode, execute, memory, and write-back stages. It also discusses pipeline hazards and performance improvements from pipelining over single-cycle and multi-cycle designs.
In this unit we introduce interrupts in processors and microcontrollers, and explain how the UoS processor (which currently does not support interrupts) could be extended to support them.
Unit duration: 50 min.
License: LGPL 2.1
Succinct Data Structures for Range Minimum Problems – Sebastian Wild
This was an invited talk I gave at Purdue University. It introduces some concepts and techniques of succinct data structures along the example of the range-minimum query problem, and presents my new, average-case space optimal solution.
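For context, here is the textbook sparse-table solution to range-minimum queries in Python, using O(n log n) words of space for O(1) queries. This is only the classical baseline that succinct structures improve upon, not the space-optimal structure from the talk.

```python
def build_sparse_table(a):
    """RMQ preprocessing: table[j][i] holds the index of the minimum
    of the length-2**j window a[i : i + 2**j]."""
    n = len(a)
    table = [list(range(n))]          # windows of length 1: each index is its own min
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        row = []
        for i in range(n - (1 << j) + 1):
            # a window of length 2**j is two overlapping halves of length 2**(j-1)
            l, r = prev[i], prev[i + (1 << (j - 1))]
            row.append(l if a[l] <= a[r] else r)
        table.append(row)
        j += 1
    return table

def rmq(a, table, i, j):
    """Index of the minimum of a[i..j] (inclusive), in O(1) time."""
    k = (j - i + 1).bit_length() - 1  # largest power of two fitting in the range
    l, r = table[k][i], table[k][j - (1 << k) + 1]
    return l if a[l] <= a[r] else r

a = [5, 2, 4, 7, 1, 3]
t = build_sparse_table(a)
print(rmq(a, t, 0, 3))  # index of the minimum of [5, 2, 4, 7]
print(rmq(a, t, 2, 5))  # index of the minimum of [4, 7, 1, 3]
```

The succinct approaches from the talk answer the same queries without storing the input at all, in space close to the information-theoretic lower bound.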
Entropy Trees & Range-Minimum Queries in Optimal Average-Case Space – Sebastian Wild
I gave this talk at the Dagstuhl Seminar 19051 on Data Structures for the Cloud and External Memory Data (https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19051).
Similar to Analysis of branch misses in Quicksort:
Sesquickselect: One and a half pivots for cache-efficient selection – Sebastian Wild
These are the slides for my ANALCO talk about Sesquickselect, a novel Quickselect variant. The paper and further details are here: https://www.wild-inter.net/publications/martinez-nebel-wild-2019
Average cost of QuickXsort with pivot sampling – Sebastian Wild
The document discusses QuickXsort, a variant of Quicksort that uses a sorting algorithm X to sort one subproblem of the recursion. With Mergesort as X, QuickXsort achieves near-optimal comparison counts while sorting in place: merging without extra space is possible by swapping elements between the runs and the other, not-yet-sorted segment, which serves as a buffer.
Nearly-optimal mergesort: Fast, practical sorting methods that optimally adap...Sebastian Wild
Mergesort can make use of existing order in the input by picking up existing runs, i.e., sorted segments. Since the lengths of these runs can be arbitrary, simply merging them as they arrive can be wasteful—merging can degenerate to inserting a single elements into long run.
In this talk, I show that we can find an optimal merging order (up to lower order terms of costs) with negligible overhead and thereby get the same worst-case guarantee as for standard mergesort (up to lower order terms), while exploiting existing runs if present. I present two new mergesort variants, peeksort and powersort, that are simple, stable, optimally adaptive and fast in practice (never slower than standard mergesort and Timsort, but significantly faster on certain inputs).
This talk was given at ESA 2018 and is based on joint work with Ian Munro. ItThe paper and further information can be found on my website:
https://www.wild-inter.net/publications/munro-wild-2018
The document describes the process of quicksort and building a binary search tree on the same data. It shows quicksort sorting an array from 7 4 2 9 1 3 8 5 6 to its sorted order, and building a corresponding binary search tree from the sorted array. It notes that the recursion tree of quicksort is equivalent to the built binary search tree, and that the number of comparisons in quicksort equals building and searching the tree. It questions how median-of-three quicksort and other variants relate to building fringe-balanced search trees.
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative B...Creative-Biolabs
Neutralizing antibodies, pivotal in immune defense, specifically bind and inhibit viral pathogens, thereby playing a crucial role in protecting against and mitigating infectious diseases. In this slide, we will introduce what antibodies and neutralizing antibodies are, the production and regulation of neutralizing antibodies, their mechanisms of action, classification and applications, as well as the challenges they face.
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆Sérgio Sacani
Analysis of branch misses in Quicksort
1. Analysis of Branch Misses in Quicksort
Sebastian Wild
wild@cs.uni-kl.de
based on joint work with Conrado Martínez and Markus E. Nebel
04 January 2015
Meeting on Analytic Algorithmics and Combinatorics
Sebastian Wild Branch Misses in Quicksort 2015-01-04 1 / 15
2. Instruction Pipelines
Computers do not execute instructions fully sequentially;
instead they use an “assembly line”. Example:

41  ...
42  i := i + 1
43  a := A[i]
44  IF a < p GOTO 42
45  j := j - 1
46  a := A[j]
47  IF a > p GOTO 45
48  ...

Each instruction is broken into 4 stages:
simpler steps ⇒ shorter CPU cycles,
one instruction finished per cycle ...
... except for branches! After a mispredicted branch, the CPU must
1. undo the wrong instructions,
2. fill the pipeline anew.
Pipeline stalls are costly ... can we avoid (some of) them?
Sebastian Wild Branch Misses in Quicksort 2015-01-04 2 / 15
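The six numbered instructions above are the inner loop of classic Quicksort's crossing-pointers partitioning. A minimal runnable sketch of that loop (my own reconstruction, not code from the talk; the names `partition` and `quicksort` are mine):

```python
import random


def partition(A, lo, hi):
    """Crossing-pointers partitioning of A[lo..hi] around pivot p = A[lo].

    The two inner while-conditions are exactly the branches
    'IF a < p' and 'IF a > p' from the instruction listing above.
    """
    p = A[lo]
    i, j = lo, hi + 1
    while True:
        i += 1
        while i <= hi and A[i] < p:   # branch: a < p
            i += 1
        j -= 1
        while A[j] > p:               # branch: a > p (pivot acts as sentinel)
            j -= 1
        if i >= j:                    # pointers crossed: partitioning done
            break
        A[i], A[j] = A[j], A[i]       # both pointers stopped: swap
    A[lo], A[j] = A[j], A[lo]         # move pivot between the partitions
    return j


def quicksort(A, lo=0, hi=None):
    if hi is None:
        hi = len(A) - 1
    if lo < hi:
        m = partition(A, lo, hi)
        quicksort(A, lo, m - 1)
        quicksort(A, m + 1, hi)


# usage: sort shuffled lists in place
random.seed(0)
xs = list(range(300))
random.shuffle(xs)
quicksort(xs)
ys = [random.randrange(10) for _ in range(200)]
quicksort(ys)
```

Every iteration of the two inner `while` loops executes one comparison branch of the kind whose mispredictions the talk analyses.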
19. Branch Prediction
We could avoid stalls if we knew whether a branch will be taken or not;
in general that is not possible ⇒ prediction with heuristics:

1-bit predictor: predict the same outcome as last time.
(state diagram: state 1 “predict taken”, state 2 “predict not taken”; each outcome moves to the matching state)

2-bit saturating counter: predict the most frequent outcome with finite memory.
(state diagram: states 1–2 predict taken, states 3–4 predict not taken; each “taken” moves one state towards 1, each “not taken” towards 4)

2-bit flip-consecutive: flip the prediction only after two consecutive errors.
(state diagram: states 1–2 predict taken, states 3–4 predict not taken)

Wilder heuristics exist out there ... not considered here.
Prediction can be wrong ⇒ branch miss (BM)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
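The first two heuristics above can be sketched in a few lines (my own illustration; the class and function names are mine, not from the talk):

```python
class OneBitPredictor:
    """Predict the same outcome as last time."""

    def __init__(self):
        self.last = True  # arbitrary initial prediction: "taken"

    def predict(self):
        return self.last

    def update(self, taken):
        self.last = taken


class TwoBitSaturatingCounter:
    """States 1-2 predict taken, states 3-4 predict not taken;
    'taken' moves one state towards 1, 'not taken' towards 4."""

    def __init__(self):
        self.state = 2

    def predict(self):
        return self.state <= 2

    def update(self, taken):
        if taken:
            self.state = max(1, self.state - 1)
        else:
            self.state = min(4, self.state + 1)


def miss_count(predictor, outcomes):
    """Run a predictor over a branch-outcome sequence and count misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


alternating = [True, False] * 10          # worst case for the 1-bit predictor
biased = [True] * 50 + [False] + [True] * 50   # one rare outcome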
20. Branch Prediction
We could avoid stalls if we knew
whether a branch will be taken or not
in general not possible prediction with heuristics:
Predict same outcome as last time.
(1-bit predictor) 1 2
predict taken predict not taken
taken
not t. not t.
taken
Predict most frequent outcome with
finite memory (2-bit saturating counter) 1 2 3 4
predict taken predict not taken
taken
not t. not t. not t. not t.
takentakentaken
Flip prediction only after two
consecutive errors (2-bit flip-consecutive)
predicttaken
predictnottaken
1
2
3
4
taken
not t.
taken
not t.
not t.
taken
not t.
taken
wilder heuristics exist out there ...
not considered here
prediction can be wrong branch miss (BM)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
21. Branch Prediction
We could avoid stalls if we knew
whether a branch will be taken or not
in general not possible prediction with heuristics:
Predict same outcome as last time.
(1-bit predictor) 1 2
predict taken predict not taken
taken
not t. not t.
taken
Predict most frequent outcome with
finite memory (2-bit saturating counter) 1 2 3 4
predict taken predict not taken
taken
not t. not t. not t. not t.
takentakentaken
Flip prediction only after two
consecutive errors (2-bit flip-consecutive)
predicttaken
predictnottaken
1
2
3
4
taken
not t.
taken
not t.
not t.
taken
not t.
taken
wilder heuristics exist out there ...
not considered here
prediction can be wrong branch miss (BM)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
22. Branch Prediction
We could avoid stalls if we knew
whether a branch will be taken or not
in general not possible prediction with heuristics:
Predict same outcome as last time.
(1-bit predictor) 1 2
predict taken predict not taken
taken
not t. not t.
taken
Predict most frequent outcome with
finite memory (2-bit saturating counter) 1 2 3 4
predict taken predict not taken
taken
not t. not t. not t. not t.
takentakentaken
Flip prediction only after two
consecutive errors (2-bit flip-consecutive)
predicttaken
predictnottaken
1
2
3
4
taken
not t.
taken
not t.
not t.
taken
not t.
taken
wilder heuristics exist out there ...
not considered here
prediction can be wrong branch miss (BM)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
23. Branch Prediction
We could avoid stalls if we knew
whether a branch will be taken or not
in general not possible prediction with heuristics:
Predict same outcome as last time.
(1-bit predictor) 1 2
predict taken predict not taken
taken
not t. not t.
taken
Predict most frequent outcome with
finite memory (2-bit saturating counter) 1 2 3 4
predict taken predict not taken
taken
not t. not t. not t. not t.
takentakentaken
Flip prediction only after two
consecutive errors (2-bit flip-consecutive)
predicttaken
predictnottaken
1
2
3
4
taken
not t.
taken
not t.
not t.
taken
not t.
taken
wilder heuristics exist out there ...
not considered here
prediction can be wrong branch miss (BM)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
24. Branch Prediction
We could avoid stalls if we knew
whether a branch will be taken or not
in general not possible prediction with heuristics:
Predict same outcome as last time.
(1-bit predictor) 1 2
predict taken predict not taken
taken
not t. not t.
taken
Predict most frequent outcome with
finite memory (2-bit saturating counter) 1 2 3 4
predict taken predict not taken
taken
not t. not t. not t. not t.
takentakentaken
Flip prediction only after two
consecutive errors (2-bit flip-consecutive)
predicttaken
predictnottaken
1
2
3
4
taken
not t.
taken
not t.
not t.
taken
not t.
taken
wilder heuristics exist out there ...
not considered here
prediction can be wrong branch miss (BM)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 3 / 15
25. Why Should We Care?
misprediction rates of “typical” programs < 10%
(Comparison-based) sorting is different!
Branch based on comparison result
Comparisons reduce entropy (uncertainty about input)
The less comparisons we use, the less predictable they become
for classic Quicksort: misprediction rate 25 %
with median-of-3: 31.25 %
Practical Importance (KALIGOSI & SANDERS, ESA 2006):
on Pentium 4 Prescott: very skewed pivot faster than median
branch misses dominated running time
Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
26. Why Should We Care?
misprediction rates of “typical” programs < 10%
(Comparison-based) sorting is different!
Branch based on comparison result
Comparisons reduce entropy (uncertainty about input)
The less comparisons we use, the less predictable they become
for classic Quicksort: misprediction rate 25 %
with median-of-3: 31.25 %
Practical Importance (KALIGOSI & SANDERS, ESA 2006):
on Pentium 4 Prescott: very skewed pivot faster than median
branch misses dominated running time
Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
27. Why Should We Care?
misprediction rates of “typical” programs < 10%
(Comparison-based) sorting is different!
Branch based on comparison result
Comparisons reduce entropy (uncertainty about input)
The less comparisons we use, the less predictable they become
for classic Quicksort: misprediction rate 25 %
with median-of-3: 31.25 %
Practical Importance (KALIGOSI & SANDERS, ESA 2006):
on Pentium 4 Prescott: very skewed pivot faster than median
branch misses dominated running time
Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
28. Why Should We Care?
misprediction rates of “typical” programs < 10%
(Comparison-based) sorting is different!
Branch based on comparison result
Comparisons reduce entropy (uncertainty about input)
The less comparisons we use, the less predictable they become
for classic Quicksort: misprediction rate 25 %
with median-of-3: 31.25 %
Practical Importance (KALIGOSI & SANDERS, ESA 2006):
on Pentium 4 Prescott: very skewed pivot faster than median
branch misses dominated running time
Sebastian Wild Branch Misses in Quicksort 2015-01-04 4 / 15
29. Track Record of Dual-Pivot Quicksort
Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS)
faster than previously used classic Quicksort (CQS) in practice
traditional cost measures do not explain this!
CQS YQS Relative
Running Time (from various experiments) −10±2%
Comparisons 2 1.9 −5%
Swaps 0.3 0.6 +80%
Bytecode Instructions 18 21.7 +20.6%
MMIX oops υ 11 13.1 +19.1%
MMIX mems µ 2.6 2.8 +5%
scanned elements1
(≈ cache misses)
2 1.6 −20%
·n ln n + O(n) , average case results
What about branch misses? Can they explain YQS’s success? ... stay tuned.
1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014
Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
30. Track Record of Dual-Pivot Quicksort
Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS)
faster than previously used classic Quicksort (CQS) in practice
traditional cost measures do not explain this!
CQS YQS Relative
Running Time (from various experiments) −10±2%
Comparisons 2 1.9 −5%
Swaps 0.3 0.6 +80%
Bytecode Instructions 18 21.7 +20.6%
MMIX oops υ 11 13.1 +19.1%
MMIX mems µ 2.6 2.8 +5%
scanned elements1
(≈ cache misses)
2 1.6 −20%
·n ln n + O(n) , average case results
What about branch misses? Can they explain YQS’s success? ... stay tuned.
1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014
Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
31. Track Record of Dual-Pivot Quicksort
Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS)
faster than previously used classic Quicksort (CQS) in practice
traditional cost measures do not explain this!
CQS YQS Relative
Running Time (from various experiments) −10±2%
Comparisons 2 1.9 −5%
Swaps 0.3 0.6 +80%
Bytecode Instructions 18 21.7 +20.6%
MMIX oops υ 11 13.1 +19.1%
MMIX mems µ 2.6 2.8 +5%
scanned elements1
(≈ cache misses)
2 1.6 −20%
·n ln n + O(n) , average case results
What about branch misses? Can they explain YQS’s success? ... stay tuned.
1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014
Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
32. Track Record of Dual-Pivot Quicksort
Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS)
faster than previously used classic Quicksort (CQS) in practice
traditional cost measures do not explain this!
CQS YQS Relative
Running Time (from various experiments) −10±2%
Comparisons 2 1.9 −5%
Swaps 0.3 0.6 +80%
Bytecode Instructions 18 21.7 +20.6%
MMIX oops υ 11 13.1 +19.1%
MMIX mems µ 2.6 2.8 +5%
scanned elements1
(≈ cache misses)
2 1.6 −20%
·n ln n + O(n) , average case results
What about branch misses? Can they explain YQS’s success? ... stay tuned.
1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014
Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
33. Track Record of Dual-Pivot Quicksort
Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS)
faster than previously used classic Quicksort (CQS) in practice
traditional cost measures do not explain this!
CQS YQS Relative
Running Time (from various experiments) −10±2%
Comparisons 2 1.9 −5%
Swaps 0.3 0.6 +80%
Bytecode Instructions 18 21.7 +20.6%
MMIX oops υ 11 13.1 +19.1%
MMIX mems µ 2.6 2.8 +5%
scanned elements1
(≈ cache misses)
2 1.6 −20%
·n ln n + O(n) , average case results
What about branch misses? Can they explain YQS’s success? ... stay tuned.
1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014
Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
34. Track Record of Dual-Pivot Quicksort
Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS)
faster than previously used classic Quicksort (CQS) in practice
traditional cost measures do not explain this!
CQS YQS Relative
Running Time (from various experiments) −10±2%
Comparisons 2 1.9 −5%
Swaps 0.3 0.6 +80%
Bytecode Instructions 18 21.7 +20.6%
MMIX oops υ 11 13.1 +19.1%
MMIX mems µ 2.6 2.8 +5%
scanned elements1
(≈ cache misses)
2 1.6 −20%
·n ln n + O(n) , average case results
What about branch misses? Can they explain YQS’s success? ... stay tuned.
1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014
Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
35. Track Record of Dual-Pivot Quicksort
Since 2009, Java uses YAROSLAVSKIY’s dual-pivot Quicksort (YQS)
faster than previously used classic Quicksort (CQS) in practice
traditional cost measures do not explain this!
CQS YQS Relative
Running Time (from various experiments) −10±2%
Comparisons 2 1.9 −5%
Swaps 0.3 0.6 +80%
Bytecode Instructions 18 21.7 +20.6%
MMIX oops υ 11 13.1 +19.1%
MMIX mems µ 2.6 2.8 +5%
scanned elements1
(≈ cache misses)
2 1.6 −20%
·n ln n + O(n) , average case results
What about branch misses? Can they explain YQS’s success? ... stay tuned.
1KUSHAGRA, LÓPEZ-ORTIZ, MUNRO, QIAO; ALENEX 2014
Sebastian Wild Branch Misses in Quicksort 2015-01-04 5 / 15
36. Random Model
n i. i. d. elements chosen uniformly in [0, 1]
0 1
U1 U2U3 U4U5U6 U7U8
pairwise distinct almost surely
relative ranking is a random permutation
equivalent to classic model
Consider pivot value P fixed:
Pr U < P = P
Pr U > P = 1 − P
0 1P
Similarly for dual-pivot Quicksort with pivots P Q
Pr U < P = D1
Pr P < U < Q = D2
Pr U > Q = D3
0 1P Q
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
37. Random Model
n i. i. d. elements chosen uniformly in [0, 1]
0 1
U1 U2U3 U4U5U6 U7U8
pairwise distinct almost surely
relative ranking is a random permutation
equivalent to classic model
Consider pivot value P fixed:
Pr U < P = P
Pr U > P = 1 − P
0 1P
Similarly for dual-pivot Quicksort with pivots P Q
Pr U < P = D1
Pr P < U < Q = D2
Pr U > Q = D3
0 1P Q
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
38. Random Model
n i. i. d. elements chosen uniformly in [0, 1]
0 1
U1 U2U3 U4U5U6 U7U8
pairwise distinct almost surely
relative ranking is a random permutation
equivalent to classic model
Consider pivot value P fixed:
Pr U < P = P
Pr U > P = 1 − P
0 1P
Similarly for dual-pivot Quicksort with pivots P Q
Pr U < P = D1
Pr P < U < Q = D2
Pr U > Q = D3
0 1P Q
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
39. Random Model
n i. i. d. elements chosen uniformly in [0, 1]
0 1
U1 U2U3 U4U5U6 U7U8
pairwise distinct almost surely
relative ranking is a random permutation
equivalent to classic model
Consider pivot value P fixed:
Pr U < P = P
Pr U > P = 1 − P
0 1P
Similarly for dual-pivot Quicksort with pivots P Q
Pr U < P = D1
Pr P < U < Q = D2
Pr U > Q = D3
0 1P Q
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
40. Random Model
n i. i. d. elements chosen uniformly in [0, 1]
0 1
U1 U2U3 U4U5U6 U7U8
pairwise distinct almost surely
relative ranking is a random permutation
equivalent to classic model
Consider pivot value P fixed:
Pr U < P = P
Pr U > P = 1 − P
0 1P
Similarly for dual-pivot Quicksort with pivots P Q
Pr U < P = D1
Pr P < U < Q = D2
Pr U > Q = D3
0 1P Q
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
41. Random Model
n i. i. d. elements chosen uniformly in [0, 1]
0 1
U1 U2U3 U4U5U6 U7U8
pairwise distinct almost surely
relative ranking is a random permutation
equivalent to classic model
Consider pivot value P fixed:
Pr U < P = P
Pr U > P = 1 − P
0 1P
Similarly for dual-pivot Quicksort with pivots P Q
Pr U < P = D1
Pr P < U < Q = D2
Pr U > Q = D3
0 1P Q
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
42. Random Model
n i. i. d. elements chosen uniformly in [0, 1]
0 1
U1 U2U3 U4U5U6 U7U8
pairwise distinct almost surely
relative ranking is a random permutation
equivalent to classic model
Consider pivot value P fixed:
Pr U < P = P = D1
Pr U > P = 1 − P = D2
0 1P
D1 D2
Similarly for dual-pivot Quicksort with pivots P Q
Pr U < P = D1
Pr P < U < Q = D2
Pr U > Q = D3
0 1P Q
D1 D2 D3
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
43. Random Model
n i. i. d. elements chosen uniformly in [0, 1]
0 1
U1 U2U3 U4U5U6 U7U8
pairwise distinct almost surely
relative ranking is a random permutation
equivalent to classic model
Consider pivot value P fixed:
Pr U < P = P = D1
Pr U > P = 1 − P = D2
0 1P
D1 D2
Similarly for dual-pivot Quicksort with pivots P Q
Pr U < P = D1
Pr P < U < Q = D2
Pr U > Q = D3
0 1P Q
D1 D2 D3
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
44. Random Model
n i. i. d. elements chosen uniformly in [0, 1]
0 1
U1 U2U3 U4U5U6 U7U8
pairwise distinct almost surely
relative ranking is a random permutation
equivalent to classic model
Consider pivot value P fixed:
Pr U < P = P = D1
Pr U > P = 1 − P = D2
0 1P
D1 D2
Similarly for dual-pivot Quicksort with pivots P Q
Pr U < P = D1
Pr P < U < Q = D2
Pr U > Q = D3
0 1P Q
D1 D2 D3
These probabilities hold for all elements U,
independent of all other elements!
Sebastian Wild Branch Misses in Quicksort 2015-01-04 6 / 15
45. Branches in CQS
How many branches in the first partitioning step of CQS?
Consider the pivot value P fixed. D = (D1, D2) = (P, 1 − P) fixed.
one comparison branch per element U:
U < P: left partition
U > P: right partition
branch taken with prob. P
i. i. d. for all elements U! — a memoryless source
other branches (loop logic etc.):
easy to predict
only a constant number of mispredictions
can be ignored (for leading-term asymptotics)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 7 / 15
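The "one comparison branch per element" claim can be made concrete with a Lomuto-style partitioning step (my stand-in for the talk's classic Quicksort loop, which is not shown in the slides): every non-pivot element triggers the U < P branch exactly once, so one step executes it n − 1 times.

```python
def partition(a, lo, hi):
    """Lomuto-style partitioning around pivot a[hi].
    Returns the final pivot index and the number of
    comparison-branch executions."""
    pivot, i, cmps = a[hi], lo, 0
    for j in range(lo, hi):
        cmps += 1                     # the one comparison branch per element U
        if a[j] < pivot:              # U < P: goes to the left partition
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i, cmps

a = [5, 3, 8, 1, 9, 2, 7]
mid, cmps = partition(a, 0, len(a) - 1)
print(cmps)  # n - 1 = 6: one comparison branch per non-pivot element
```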
52. Misprediction Rate for Memoryless Sources
Branches taken i. i. d. with probability p.
Information-theoretic lower bound:
miss rate fOPT(p) = min{p, 1 − p}
Can approach the lower bound by estimating p:
predict taken if p̂ ≥ 1/2, predict not taken if p̂ < 1/2.
But: actual predictors have very little memory!
1-bit Predictor
wrong prediction whenever the outcome changes
miss rate f1-bit(p) = 2p(1 − p)
[figure: two-state automaton with states "predict taken" and "predict not taken"; a taken branch (prob. p) moves to the first state, a not-taken branch (prob. 1 − p) to the second]
Sebastian Wild Branch Misses in Quicksort 2015-01-04 8 / 15
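The 1-bit miss rate f1-bit(p) = 2p(1 − p) is easy to check empirically (a simulation sketch of mine, not from the talk): the predictor repeats the previous outcome, so it misses exactly when two consecutive outcomes differ, which happens with probability p(1 − p) + (1 − p)p.

```python
import random

def one_bit_miss_rate(p, n, rng):
    """Simulate a 1-bit predictor on n i.i.d. branch outcomes
    taken with probability p. The predictor always predicts the
    previous outcome, so it misses whenever the outcome changes."""
    pred, misses = True, 0
    for _ in range(n):
        taken = rng.random() < p
        if taken != pred:
            misses += 1
        pred = taken                  # 1 bit of state: remember last outcome
    return misses / n

rng = random.Random(42)
p = 2 / 3
rate = one_bit_miss_rate(p, 200_000, rng)
print(abs(rate - 2 * p * (1 - p)) < 0.01)   # f1-bit(2/3) = 4/9
```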
60. Misprediction Rate for Memoryless Sources [2]
2-bit Saturating Counter
Miss rate? ... depends on the state!
[figure: four-state automaton; states 1 and 2 predict taken, states 3 and 4 predict not taken; a taken branch (prob. p) moves one state left, a not-taken branch (prob. 1 − p) one state right]
But: very fast convergence to the steady state
[plot: miss rate for different initial state distributions, essentially stationary after about 20 iterations for p = 2/3]
use the steady-state miss rate:
expected miss rate over the states in the stationary distribution
here: f2-bit-sc(p) = q / (1 − 2q) with q = p(1 − p)
similarly for the 2-bit Flip-Consecutive predictor:
f2-bit-fc(p) = q(1 + 2q) / (1 − q)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 9 / 15
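The steady-state formula q/(1 − 2q) can be reproduced by iterating the four-state Markov chain to its stationary distribution (a verification sketch of mine, not from the talk):

```python
def two_bit_sc_miss_rate(p, iters=200):
    """Steady-state miss rate of the 2-bit saturating counter.
    States 0,1 predict 'taken', states 2,3 predict 'not taken';
    a taken branch (prob. p) moves one state left, a not-taken
    branch one state right."""
    dist = [0.25, 0.25, 0.25, 0.25]   # arbitrary initial state distribution
    for _ in range(iters):            # converges very fast, as the slide notes
        nxt = [0.0] * 4
        for i, w in enumerate(dist):
            nxt[max(i - 1, 0)] += w * p          # branch taken
            nxt[min(i + 1, 3)] += w * (1 - p)    # branch not taken
        dist = nxt
    # miss: predicted taken but not taken, or vice versa
    return (dist[0] + dist[1]) * (1 - p) + (dist[2] + dist[3]) * p

p = 2 / 3
q = p * (1 - p)
print(abs(two_bit_sc_miss_rate(p) - q / (1 - 2 * q)) < 1e-6)  # q/(1-2q) = 0.4
```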
87. Distribution of Pivot Values
In (classic) Quicksort the branch probability is random
expected miss rate: E[f(P)] (expectation over pivot values P)
What is the distribution of P?
without sampling: P =D Uniform(0, 1)
Typical pivot choice: median of k (in practice: k = 3), or pseudomedian of 9 ("ninther")
Here: a more general scheme with parameter t = (t1, t2): the pivot is the (t1 + 1)-st smallest of a sample of k = t1 + t2 + 1 elements, i.e. t1 sample elements lie below P and t2 above
Example: k = 6 and t = (3, 2)
[figure: sample of six elements with pivot P, t1 = 3 elements below and t2 = 2 above]
t = (0, 0): no sampling
t = (t, t): gives median-of-(2t + 1)
can also sample skewed pivots
Distribution of the pivot value: P =D Beta(t1 + 1, t2 + 1)
Sebastian Wild Branch Misses in Quicksort 2015-01-04 10 / 15
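The Beta claim is the standard order-statistic fact, which a short simulation makes tangible (a sketch of mine, not from the talk): the (t1 + 1)-st smallest of k uniforms has mean (t1 + 1)/(k + 1), the mean of Beta(t1 + 1, t2 + 1).

```python
import random

def sampled_pivot(t1, t2, rng):
    """Pivot under the generalized sampling scheme: the
    (t1+1)-st smallest of k = t1 + t2 + 1 i.i.d. Uniform(0,1)
    values, so t1 sample elements lie below the pivot, t2 above."""
    sample = sorted(rng.random() for _ in range(t1 + t2 + 1))
    return sample[t1]

rng = random.Random(7)
t1, t2 = 3, 2                         # the slide's example: k = 6, t = (3, 2)
mean = sum(sampled_pivot(t1, t2, rng) for _ in range(100_000)) / 100_000
# P has a Beta(t1+1, t2+1) = Beta(4, 3) distribution, mean 4/7
print(abs(mean - 4 / 7) < 0.005)
```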
97. Miss Rates for Quicksort Branch
expected miss rate given by an integral:
E[f(P)] = ∫₀¹ f(p) · p^t1 (1 − p)^t2 / B(t + 1) dp
e. g. for the 1-bit predictor:
E[f1-bit(P)] = ∫₀¹ 2p(1 − p) · p^t1 (1 − p)^t2 / B(t + 1) dp = 2 (t1 + 1)(t2 + 1) / ((k + 2)(k + 1))
no concise representation for the other integrals ... (see paper)
but: exact values for fixed t
Sebastian Wild Branch Misses in Quicksort 2015-01-04 11 / 15
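For the 1-bit case the integral is just a ratio of Beta functions, since multiplying by 2p(1 − p) shifts both Beta parameters by one. A quick sketch of mine (not from the talk) confirms the closed form:

```python
from math import gamma

def beta(a, b):
    """Euler Beta function B(a, b) = Γ(a)Γ(b) / Γ(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def expected_1bit_miss_rate(t1, t2):
    """E[f1-bit(P)] for P ~ Beta(t1+1, t2+1): the integrand
    2p(1-p) shifts both Beta parameters by one, so the integral
    is 2 B(t1+2, t2+2) / B(t1+1, t2+1)."""
    return 2 * beta(t1 + 2, t2 + 2) / beta(t1 + 1, t2 + 1)

# matches the closed form 2(t1+1)(t2+1)/((k+2)(k+1)) with k = t1 + t2 + 1
for t1, t2 in [(0, 0), (1, 1), (3, 2)]:
    k = t1 + t2 + 1
    closed = 2 * (t1 + 1) * (t2 + 1) / ((k + 2) * (k + 1))
    print(abs(expected_1bit_miss_rate(t1, t2) - closed) < 1e-12)
```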
101. Miss Rate and Branch Misses
Miss rate for CQS with median-of-(2t + 1):
[plot: miss rate vs. t = 0, ..., 8 for OPT, 1-bit, 2-bit sc and 2-bit fc; all curves approach 0.5]
miss rates quickly get bad (close to guessing!)
but: fewer comparisons in total!
[plot: leading coefficient of #cmps = c · n ln n + O(n) vs. t, falling from 2 toward the reference line 1/ln 2]
Consider the number of branch misses:
#BM = #comparisons · miss rate
Overall #BM still grows with t.
[plot: leading coefficient of #BM = c · n ln n + O(n) vs. t, with reference line 0.5/ln 2]
Sebastian Wild Branch Misses in Quicksort 2015-01-04 12 / 15
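The trade-off can be reproduced numerically (my own combination, not from the slides: the standard median-of-(2t+1) comparison coefficient n ln n / (H(2t+2) − H(t+1)) multiplied by the expected 1-bit miss rate E[f1-bit(P)] = (t+1)/(2t+3) for t1 = t2 = t):

```python
def harmonic(n):
    """n-th harmonic number H(n)."""
    return sum(1 / i for i in range(1, n + 1))

def bm_coefficient(t):
    """Leading coefficient c in #BM ~ c * n ln n for classic
    Quicksort with median-of-(2t+1) pivots and a 1-bit predictor."""
    cmps = 1 / (harmonic(2 * t + 2) - harmonic(t + 1))  # #cmps coefficient
    miss = (t + 1) / (2 * t + 3)      # E[f1-bit(P)] with t1 = t2 = t
    return cmps * miss

# fewer comparisons per element, but worse miss rates:
# the product grows with t (t = 0 gives 2 * 1/3 = 2/3)
coeffs = [bm_coefficient(t) for t in range(5)]
print(all(coeffs[i] < coeffs[i + 1] for i in range(4)))
```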
107. Branch Misses in YQS
Original question: Does YQS better than CQS w. r. t. branch misses?
Complication for analysis:
4 branch locations
how often they are
executed depends on
input
< P ?
swap < Q ?
skip swap g
Q ?
P ? skip
swap swap k
P P ≤ ◦ ≤ Q ≥ QP Q
Example: C(y1)
executed ( D1 + D2 )n + O(1) times. (in expectation, conditional on D)
branch taken i. i. d. with prob D1 . (conditional on D)
expected #BM at C(y1)
in first partitioning step:
E[(D1 + D2) · f(D1)] · n + O(1)
Integrals even more “fun” ... but doable
Sebastian Wild Branch Misses in Quicksort 2015-01-04 13 / 15
114. Results CQS vs. YQS
Original question: Does YQS do better than CQS w.r.t. branch misses?
Expected number of branch misses, as coefficients of n ln n + O(n):

Without pivot sampling:
           CQS     YQS     Relative
  OPT      0.5     0.513   +2.6%
  1-bit    0.667   0.673   +1.0%
  2-bit sc 0.571   0.585   +2.5%
  2-bit fc 0.589   0.602   +2.2%

CQS median-of-3 vs. YQS tertiles-of-5:
           CQS     YQS     Relative
  OPT      0.536   0.538   +0.4%
  1-bit    0.686   0.687   +0.1%
  2-bit sc 0.611   0.613   +0.3%
  2-bit fc 0.627   0.629   +0.3%

Essentially the same number of BM.
Branch misses are not a plausible explanation for YQS's success.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 14 / 15
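The CQS column without sampling can be sanity-checked numerically. This is a sketch under assumptions not spelled out on the slide: a toll of a·n per partitioning step contributes 2a·n ln n + O(n) overall (the standard quicksort recurrence), the pivot rank D is uniform on (0,1), and the per-comparison miss rates are min(D, 1−D) for OPT and 2D(1−D) for the 1-bit predictor.

```python
# Numeric check of the CQS coefficients (no pivot sampling) via the
# composite midpoint rule: coefficient = 2 * E[miss rate(D)], D ~ U(0,1).
N = 1_000_000
opt = sum(min(d, 1 - d) for d in ((i + 0.5) / N for i in range(N))) / N
one_bit = sum(2 * d * (1 - d) for d in ((i + 0.5) / N for i in range(N))) / N
print(f"OPT coefficient   ~ {2 * opt:.3f}")      # -> 0.500
print(f"1-bit coefficient ~ {2 * one_bit:.3f}")  # -> 0.667
```

This reproduces the 0.5 (OPT) and 0.667 = 2/3 (1-bit) entries of the table; the 2-bit predictors need the stationary distribution of a small Markov chain and are omitted here.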
119. Conclusion
Precise analysis of branch misses in Quicksort (CQS and YQS):
including pivot sampling
lower bounds on branch miss rates
CQS and YQS cause a very similar number of BM.
Strengthened evidence for the hypothesis that YQS is faster because of better usage of the memory hierarchy.
Sebastian Wild Branch Misses in Quicksort 2015-01-04 15 / 15