2. In this presentation we will illustrate the pros, cons,
and latency characteristics of several Linux kernel
preemption models, and provide some guidance in
selecting an appropriate preemption model for a given
category of application.
Overview of Topics Presented
3. Questions we will address include:
• Which preemption model provides the best throughput?
• Which model offers the lowest average latencies?
• Which model offers the lowest maximum latencies?
• Which model offers the most predictable latencies?
• How do load conditions impact the respective latency
performance of the various models?
• What impact does CPU Frequency Scaling or CPU
sleep states have on latency performance?
• What is the best model for a given application type?
Overview of Topics - continued
4. Our intent is to show relative trends among the
preemption models under identical conditions, so the
data presented were gathered as follows:
• Each preemption model configuration was tested using
identical tests running on the same InSignal Arndale board.
• Cyclictest was used for a run duration of two hours with
a single thread executing at a SCHED_FIFO priority of
80 to realistically represent scheduling latency for a real-
time process.
• A cyclictest run was done with no system load, then
another with an externally-applied ping flood, and
another with back-to-back executions of hackbench
running to represent maximum system loading.
Test Rationale and Methodology
6. Only three Linux preemption models are of practical
interest for anything other than desktop use:
• the Server preemption model provides optimal
throughput for applications where latencies are not an
issue
• the Low Latency Desktop preemption model provides
low average latencies for interactive and ‘soft real-time’
applications
• the Full RT preemption model provides the highest level
of latency determinism for ‘hard real-time’ applications
Tested Linux Preemption Models
7. Server Preemption Model Latencies
Cyclictest with no system load
CPU frequency scaling disabled
Minimum Latency: 16 usec
Average Latency: 24 usec
Most Frequent Latency: 24 usec
Maximum Latency: 572 usec
Standard Deviation: 1.211041
Almost all latencies between 20 usec and
28 usec
However, even at light loads, latencies
out to 572 usec were observed. This is a
consequence of all code paths through
the kernel being non-preemptible.
8. Server Preemption Model Latencies
Cyclictest with ping flood load
CPU frequency scaling disabled
Minimum Latency: 15 usec
Average Latency: 23 usec
Most Frequent Latency: 24 usec
Maximum Latency: 592 usec
Standard Deviation: 1.580778
Almost all latencies between 20 usec and
28 usec
Note, however, that much longer
latencies continue to be observed, since
this model makes no design effort to avoid them.
Also note that maximum latency is
already beginning to creep upwards.
9. Server Preemption Model Latencies
Cyclictest with hackbench load
CPU frequency scaling disabled
Minimum Latency: 17 usec
Average Latency: 150655 usec
Most Frequent Latency: 22 usec
Maximum Latency: 2587753 usec
Standard Deviation: 493977.9
The majority of latencies were between
21 usec and 25 usec, gradually tapering
off to single digit frequencies at 204
usec. Note that the maximum latency is
over 4,000 times longer than under
no load!
Note also the much lower frequency
percentage for the peak occurrence.
This means a larger percentage of the
higher latencies were observed, and
illustrates the serious degradation of
latency determinism under load in a non-
preemptible kernel where latency was
not a primary design consideration.
10. Low Latency Desktop Model Latencies
Cyclictest with no system load
CPU frequency scaling disabled
Minimum Latency: 19 usec
Average Latency: 28 usec
Most Frequent Latency: 29 usec
Maximum Latency: 57 usec
Standard Deviation: 0.8698308
The majority of latencies were between
28 usec and 31 usec, quickly tapering off
to single digit frequencies at 42 usec.
Maximum latency was reduced tenfold
under light loads vs. the Server model.
This illustrates the significant
improvements in latency performance
under light loads with kernel preemption
enabled.
11. Low Latency Desktop Model Latencies
Cyclictest with ping flood
CPU frequency scaling disabled
Minimum Latency: 18 usec
Average Latency: 29 usec
Most Frequent Latency: 29 usec
Maximum Latency: 131 usec
Standard Deviation: 1.79573
The majority of latencies were between
28 usec and 32 usec, quickly tapering off
to single digit frequencies at 80 usec.
The reduced range of observed latencies
indicates improved latency performance
and predictability at moderate loads
versus the Server model. However, as
the next slide will show, latency
performance in this model degrades
seriously under heavy load, making Full
RT a better choice for latency
performance under heavy load
conditions.
12. Low Latency Desktop Model Latencies
Cyclictest with hackbench
CPU frequency scaling disabled
Minimum Latency: 19 usec
Average Latency: 370606 usec
Most Frequent Latency: 25 usec
Maximum Latency: 4122148 usec
Standard Deviation: 826092
The majority of latencies were between
24 usec and 26 usec, gradually tapering
off to single digit frequencies at 105
usec. Note that the max latency was
70,000 times longer than with this model
under no load!
Maximum latencies were nearly double
those of the Server model under heavy
load, and latency predictability is low.
This illustrates the combined impact
under heavy load of increased context
switching with no mitigation of priority
inversion or FIFO queueing disciplines.
13. Full RT Preemption Model Latencies
Cyclictest with no system load
CPU frequency scaling disabled
Minimum Latency: 19 usec
Average Latency: 29 usec
Most Frequent Latency: 29 usec
Maximum Latency: 53 usec
Standard Deviation: 1.031893
The majority of latencies were between
29 usec and 31 usec, quickly tapering off
to single digit frequencies at 50 usec.
Maximum latency was reduced tenfold
under light loads vs. the Server model.
This illustrates the significant
improvements in latency performance
under light loads with kernel preemption
enabled.
Under light load performance is very
similar to that of the Low Latency
Desktop model.
14. Full RT Preemption Model Latencies
Cyclictest with ping flood
CPU frequency scaling disabled
Minimum Latency: 19 usec
Average Latency: 29 usec
Most Frequent Latency: 30 usec
Maximum Latency: 59 usec
Standard Deviation: 2.698587
The majority of latencies were between
29 usec and 31 usec, quickly tapering off
to single digit frequencies at 53 usec.
The reduced range of observed latencies
indicates improved latency performance
and predictability at moderate loads
versus the Server model.
Note that even at moderate loads the
maximum latencies are less than half the
duration of those seen in the Low
Latency Desktop model.
15. Full RT Preemption Model Latencies
Cyclictest with hackbench load
CPU frequency scaling disabled
Minimum Latency: 21 usec
Average Latency: 29 usec
Most Frequent Latency: 25 usec
Maximum Latency: 156 usec
Standard Deviation: 7.69571
The majority of latencies were between
24 usec and 26 usec, with a second
group peaking between 43 and 44 usec,
and quickly tapering off to single digit
frequencies at 134 usec.
Latency performance under heavy load is
much better than in any of the other
preemption models. With threaded
interrupt handlers, priority inheritance
and priority-based queueing disciplines,
the real-time process is still able to meet
much tighter scheduling deadlines
despite heavy activity of other lower-
priority threads.
19. • For applications in which throughput, not latency,
is the primary consideration, opt for the Server model
• If quality of service is important but missed latency
deadlines will not result in catastrophic failures, opt for
the Low Latency Desktop model and size the hardware
capacity to keep loading moderate
• For host environments for ‘zero overhead Linux’ (ODP
for example), Low Latency Desktop is a good choice
• If latencies must be consistent even under high-load
conditions, Full RT may be required
• For applications based on POSIX real-time scheduling
and priority-based preemption, use Full RT for best
results
What Preemption Model Is Best for Me?
20. The test scripts, data files, and graphs used to provide
reference data for this presentation may be accessed
online at the following URL:
http://people.linaro.org/~gary.robertson/LCA14
Data References
21. More about Linaro Connect: http://connect.linaro.org
More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
Linaro members: www.linaro.org/members
23. The Server preemption model lies at one extreme of the
latency vs. throughput continuum.
Pros include:
• Simplicity, maturity and robustness make this a very
reliable platform
• With no preemption, the reduced number of context
switches minimizes system overhead and maximizes
overall throughput
Server Model Characteristics
24. Cons include:
• The lack of preemption results in low average
latencies under low loads but much higher latencies
when the system is heavily loaded
• The latencies imposed by different execution paths
through the kernel result in a wide range of latency
durations and low latency determinism
Server Model Characteristics
25. The Low Latency Desktop preemption model holds the
middle ground in the latency vs. throughput continuum.
Pros include:
• Under low to moderate load, latency range and
predictability are significantly improved vs. the Server
model
• This preemption model is supported as part of the
mainstream kernel and tends to be less trouble-prone
than Full RT preemption
Low Latency Desktop Model Characteristics
26. Cons include:
• The preemption of kernel operations and increased
number of context switches create increased
overhead and reduced performance relative to the
Server model
• The preemption of kernel operations results in higher
average latencies vs. the Server preemption model
• This preemption model does not perform as well
under heavy system loads as other models
Low Latency Desktop Model Characteristics
27. The following software-induced latency sources remain
problematic in the Low Latency Desktop preemption
model:
• Exceptions, software interrupts, and device service
request interrupts execute outside of scheduler control
• Most mutual exclusion locking primitives are subject to
priority inversion
• Shared resources use FIFO-based queueing disciplines,
meaning high-priority threads may have to wait behind
lower-priority threads for access to the resources
These factors result in lower levels of latency determinism.
Low Latency Desktop Model - continued
28. The Full RT preemption model represents the latency-
centric end of the latency vs. throughput continuum. It
attempts to mitigate all the remaining software-induced
sources of latency.
• Handlers for exceptions, software interrupts, and
device service request interrupts are encapsulated
inside threads which are under scheduler control
• Priority inheritance is added for most mutual
exclusion locking primitives to prevent priority
inversions
• Shared resources use priority-based queueing
disciplines so that the highest-priority thread always
gets first access to the resources
Full RT Model Characteristics
29. The Full RT preemption model inevitably suffers from
reduced overall throughput as a consequence of its
efforts to maximize latency determinism:
• Schedulable ‘threaded’ ISRs result in the highest
levels of preemption and context switch overhead
• Priority inheritance involves iterative logic to
temporarily boost the priorities of all lock holders to
equal that of the highest-priority lock waiters. This
adds significant overhead to locking primitive code.
• Priority-based queueing requires sorting the queue
each time a new waiting thread is added
Full RT Model Characteristics - continued
30. Pros include:
• The most consistent and predictable latency
performance available in any preemption model
• The best support environment for creating priority-
based multi-layered applications
• The best hard real-time support available in a Linux
environment
Full RT Model Characteristics - continued
31. Cons include:
• Full RT preemption is supported only with a
separately maintained kernel patch set
• The latest supported RT kernel version always lags
behind mainstream development
• Mainstream drivers, libraries, and applications may
not always function properly in the Full RT
environment
• Poorly designed or written real-time threads may
starve out threaded interrupt handlers
Full RT Model Characteristics - continued