Linux PREEMPT_RT
Internals
George Kang ( 康哲豪 )
<georgekang03@gmail.com>
May,30 2015/MOSUT
● Deadline
● Hard Real-time
– Meet deadline Deterministically
– Guaranteed worst case
● Soft Real-time
– Best Effort
Real-time System
Real-time System
● Assumption
– critical task has highest priority
– task workload is fixed
● Deadline miss
– Undetermined latency
– Unpredictable event
Critical TaskLatency?
Critical Event Deadline
Events?
Latency in Linux
Latency
● Time interval from event trigger till handler task react this
event
Critical TaskLatency?
Critical Event Deadline
Events?
From Event to Received
Interrupt
Taken
Interrupt
Received in
User/Thread
Context
Critical section
with interrupts
disabled
HW
Excepti
on
“Top Half” / ISR Exit from IRQ Reschedule Context SwitchSignal/
Wakeup
Resource Conflicts
Latency results from...
● Preemption
● Critical Section
● Interrupt
Preemption
● Re-schedule when high priority task is ready
● Increase responsibility
● Decrease throughput
Preemption in Linux
● Preempt configurations
– NONE, Voluntary, Basic RT, RT_FULL
Toward full preemption
● Most important aspects of Real-time
– Controlling latency by allowing kernel to be preemptible
everywhere
Non-Preemptive
● CONFIG_PREEMPT_NONE
● Preemption is not allowed in Kernel Mode
● Preemption could happen upon returning to user space
Non-Preemptive Issue
● Latency of Non-Preemptive configuration
Task A
(user mode)
Interrupt handler
Wakes up Task B
Task B
(user mode)
System call
Task A
(kernel mode)
Task A
(kernel mode)
Interrupt
?
Return from syscall
Unbound
Latency
Preemption Point
● CONFIG_PREEMPT_VOLUNTARY
● Insert explicit preemption point in Kernel
– might_sleep → __cond_resched
● Kernel can be preempted only at preemption point
Preemptible Kernel
● CONFIG_PREEMPT
● Implicit preemption in Kernel
● preempt_count
– Member of thread_info
– Preemption could happen when preempt_count == 0
Preemptible Kernel
● Kernel preemption could happen when
– return from irq handler
__irq_svc:
svc_entry
irq_handler
#ifdef CONFIG_PREEMPT
get_thread_info tsk
ldr r8, [tsk, #TI_PREEMPT] @ get preempt count
teq r8, #0 @ if preempt count != 0
bne 1f @ return from exeption
ldr r0, [tsk, #TI_FLAGS] @ get flags
tst r0, #_TIF_NEED_RESCHED @ if NEED_RESCHED is set
blne svc_preempt @ preempt!
...
#endif
Full Preemptive
● Goal: Preempt Everywhere except
– Preempt disable
– Interrupt disable
● Reduce non-preemptible cases in kernel
– spin_lock
– Interrupt
Latency Issue
● Preemption
● Critical Section
● Interrupt
Spinlock
● Task will be busy waiting until it acquires the lock
● Spinlock in Preemptible Linux (CONFIG_PREEMPT)
– Uniprocessor
● preempt_disable
– SMP
● preempt_disable
● Lock acquire, busy waiting
– No sleeping
● Convert spinlock to rt-mutex
– Sleeping
Priority Inversion
Acquires
a lock
Priority
Time
preempted
Tries to get
the same
lock waits
● High priority task is preempted by medium priority task
● Unbound waiting for high priority task
Priority Inheritance
Acquires
a lock
Priority
Ti
me
preempted
Tries to get
the lock
waits
Executes with the priority
of the waiting process
Releases
the lock
Acquires
the lock
preempted
Latency Issue
● Preemption
● Critical Section
● Interrupt
Task triggered by IRQ
interrupt
latency
Interrupt
handler Scheduler
Interrupt
handler
duration
scheduler
latency
scheduler
duration
Process
context
Interrupt
context
Makes the
task runnable
Total Latency
Task Sleep Task wakeup
Task interrupted by
IRQ
Task Run
Interrupt
handler
Task Resume
Interrupt
Process
context
Interrupt
context
Interrupt
handler ...
...
Interrupt Preempt Latency
Unbound??
Original Linux Kernel
User Space
User Context
Interrupt Handlers
Kernel
Space
Interrupt
Context
SoftIRQsHiprio
tasklets
Network
Stack
Timers
Regular
tasklets
...
Scheduling
Points
Process
Thread
Kernel
Thread
Priority of interrupt context is always higher than others
Non-determined Issue
● Interrupt context can always preempt others
● Interrupt as an external event
– Interrupt number of a time interval is non-determinated
– Nature of interrupt, can not be avoided
● Behavior of interrupt handler is not well defined
– Non-determined interrupt handler
IRQ in PREEMPT_RT
● Threaded IRQ
– IRQ handler is actually a kernel thread by default
– Hard IRQ handler only wakes up IRQ handler thread
● Behavior of hard IRQ handler is well defined
– Original IRQ handler (No Delayed) is reserved by
IRQF_NO_THREAD flag.
● Remove softirq
– ksoftirqd as a normal kernel thread, handles all softirqs
PREEMPT_RT
User Space
NODELAY Interrupt Handlers
Kernel
SpaceNetworkStack
Timers
Tasklets
Scheduling
Points
Process
Thread
Kernel
Threads
Experiments on ARM
● OMAP-L138 (ARM11; 375/456 MHz)
● Cyclictest
– measuring accuracy of sleep and wake operations of highly
prioritized realtime threads
● cyclictest -t1 -p 80 -n -i 1000 -l 600000 -h 1000
● Ping target with inverval 0.004s
No Preempt (Interrupt)
Preempt RT-Basic (Interrupt)
Preempt RT-Full (Interrupt)
RT-Basic vs. RT-Full (syscall)
Basic
Full
Practical Issues in
PREEMPT_RT
Tool for Observation
● Linux Kernel Tracing Tool
● Event tracing
– Tracing kernel events
mount ­t debugfs debugfs 
/sys/kernel/debug
– Event: irq_handler_entry, irq_handler_exit, sched:*
(/sys/kernel/debug/tracing/events/)
trace­cmd record ­e irq_handler_entry 
                 ­e irq_handler_exit 
                 ­e sched:*
trace­cmd report 
Trace Log Format
● trace­cmd­1061  [000]   142.334403: irq_handler_entry:    
irq=21 name=critical_irq
Current
Task
CPU# Time
Stamp
Event
Name
Message
trace-cmd-1061 [000] 142.334403 irq_handler_entry Irq=21
name=critical_irq
Trace Log
interrupt
latency
Interrupt
handler Scheduler
Interrupt
handler
duration
scheduler
latency
scheduler
duration
Process
context
Interrupt
context
Makes the
task runnable
trace-cmd critical_task
irq_handler_entry irq_handler_exit
sched_wakeup*
sched_wakeup**
sched_switch
sched_wakeup*: wakeup critical_task
sched_wakeup**: wakeup ksoftirqd
Trace Log
trace­cmd­1061  [000]   142.334403: irq_handler_entry:    irq=29 
name=critical_irq
trace­cmd­1061  [000]   142.334437: sched_wakeup:         
critical_task:1056 [9] success=1 CPU:000
trace­cmd­1061  [000]   142.334456: irq_handler_exit:     irq=29 
ret=handled
trace­cmd­1061  [000]   142.334480: sched_wakeup:         
ksoftirqd/0:3 [98] success=1 CPU:000
trace­cmd­1061  [000]   142.334526: sched_switch:         trace­
cmd:1061 [120] R ==> critical_task:1056 [9]
Practical Issues
● Interrupt
– Non-determined external events
– Collision: Several Interrupt happens almost at the same
time
– Preemption: Interrupt takes place when task is woke up
but not starts yet.
IRQ IRQ IRQ
Interrupt Issue
Critical
Handler
Critical TaskHandler1
Critical
IRQ
IRQ1 IRQ2 ...
Handler2 ...
Critical Task Latency = Critical IRQ Latency + Critical Handler Cost +
IRQ1 Latency + Handler1 Cost + … +
+ Schedule Latency
Interrupt Preemption
● Trace Log
trace­cmd­1079  [000]  1074.488916: irq_handler_entry:    irq=29 
name=critical_irq
trace­cmd­1079  [000]  1074.488970: sched_wakeup:         critical_task:1056 
[9] success=1 CPU:000
trace­cmd­1079  [000]  1074.488992: irq_handler_exit:     irq=29 ret=handled
trace­cmd­1079  [000]  1074.489018: sched_wakeup:         ksoftirqd/0:3 [98] 
success=1 CPU:000
trace­cmd­1079  [000]  1074.489046: irq_handler_entry:    irq=59 
name=ohci_hcd:usb1
trace­cmd­1079  [000]  1074.489056: irq_handler_exit:     irq=59 ret=handled
trace­cmd­1079  [000]  1074.489075: sched_wakeup:         irq/59­ohci_hcd:979 
[49] success=1 CPU:000
trace­cmd­1079  [000]  1074.489113: sched_stat_runtime:   comm=trace­cmd 
pid=1079 runtime=694958 [ns] vruntime=30984068262 [ns]
trace­cmd­1079  [000]  1074.489139: sched_switch:         trace­cmd:1079 [120] 
R ==> critical_task:1056 [9]
critical_task­1056  [000]  1074.489233: irq_handler_entry:    irq=21 
name=clockevent
critical_task­1056  [000]  1074.489361: irq_handler_exit:     irq=21 
ret=handled
critical_task­1056  [000]  1074.489424: sched_switch:         
critical_task:1056 [9] D ==> irq/59­ohci_hcd:979 [49]
Interrupt Preemption
● Defer Other Interrupt
Critical
Handler
Critical TaskHandler1
Critical
IRQ
IRQ1 IRQ2 ...
Handler2
...
Critical
Handler
Critical Task Handler1
Critical
IRQ
IRQ1 IRQ2 ...
Handler2 ...
Interrupt Preemption
● Disable other interrupts when critical interrupt occurs
● Schedule a tasklet to re-enable other interrupts.
– Critical Task's priority could be higher than tasklet.
– Ensure determination of critical task.
Critical
Handler
Critical Task Handler1
IRQ1 IRQ2 ...
Handler2 ...
Critical
IRQ
tasklet
Interrupt Preemption
Critical
Handler
Critical TaskHandler1
IRQ1 Critical
IRQ
● Several Interrupt happens almost at the same time
● Set Critical IRQ with highest priority
● Depended on hardware
Interrupt Collision
Handler2
IRQ2
Almost at the same time !!!
● High Resolution Timer
– Non threaded Interrupt
● Non-determined IRQ handler
– Heavy and variable jitter
● 100us ~ 160us (375MHZ ARM926EJS)
– Important and complex component of Linux Kernel
● Tick timer
● Timer for RR scheduler
● nanosleep, clock_nanosleep
● Futex, rtmux
● Others
– Can not be disabled
Many Unpredicted
Timer IRQ sources !!
HRT Impact
Simplify Timer Interrupt
● Move to Periodic Timer
– Tick is the only one timer source
– However, the behavior of timer interrupt is still non-
determined
– Timer is not precise enough
● Periodic task could not be Real-time task
Conclusion
● PREEMPT_RT patch significantly improves the
preemptiveness of Linux kernel.
– However we should keep in mind that these results are
statistical in nature rather than being conclusive.
● Using PREEMPT_RT will not guarantee that your system is
hard real-time.
– Your system and applications will also have to be designed
properly including correct priorities, use of deterministic APIs,
allocation of critical resources.
Reference
● Inside The RT Patch, Steven Rostedt, Darren V Hart
● A real time preemption overview, Paul Mckenny
● Making Linux do Hard Real Time, Jim Hung

Linux Preempt-RT Internals