IRQs: the Hard, the Soft, the Threaded and the
Preemptible
Alison Chaiken
Latest version of these slides
alison@she-devel.com
Embedded Linux Conference Europe
Oct 11, 2016
Example code
Version 2, actually presented live
Thursday October 13, 2016 15:30:
Debugging Methodologies for Realtime Issues
Joel Fernandes, Google
this same room
Knocking at Your Back Door (or How Dealing with
Modern Interrupt Architectures can Affect Your Sanity)
Marc Zyngier, ARM Ltd
Hall Berlin A
Agenda
● Why do IRQs exist?
● About kinds of hard-IRQ handlers
● About softirqs and tasklets
● Differences in IRQ handling between RT and non-RT kernels
● Studying IRQ behavior via kprobes, event tracing, mpstat and
eBPF
● Detailed example: when does NAPI take over for eth IRQs?
“Art cannot be taught. They must be absorbed back into the
workshop.” -- Walter Gropius
Sample questions to be answered
● What's all the stuff in /proc/interrupts anyway?
● What are IPIs and NMIs?
● Why are atomic operations expensive for ARM?
● What are the differences between mainline and RT for softirqs?
● What is the 'current' task while in a softirq?
● What function is running inside the threaded IRQs?
● When do we switch from individual hard IRQ processing to
NAPI?
Interrupt handling: a brief pictorial summary
Top half: the hard IRQ. Bottom half: the soft IRQ.
Image credits: Dennis Jarvis, http://tinyurl.com/jmkw23h; onefulllife, http://tinyurl.com/j25lal5
Why do we need interrupts at all?
● IRQs allow devices to notify the kernel that they require
maintenance.
● Alternatives include
– polling (servicing devices at a pre-configured
interval);
– traditional IPC to user-space drivers.
● Even a single-threaded RTOS or a bootloader needs a
system timer.
Interrupts in Das U-Boot
● For ARM, minimal IRQ support:
– clear exceptions and reset timer (e.g., arch/arm/lib/interrupts_64.c
or arch/arm/cpu/armv8/exceptions.S)
● For x86, interrupts are serviced via a stack-push followed by a
jump (arch/x86/cpu/interrupts.c)
– PCI has full-service interrupt handling (arch/x86/cpu/irq.c)
Interrupts in RTOS: Xenomai/ADEOS IPIPE
From Adeos website, covered by GFDL
Zoology of IRQs
● Hard versus soft
● Level- vs. edge-triggered, simple, fast EOI or per-CPU
● Local vs. global; System vs. device
● Maskable vs. non-maskable
● Shared or not; chained or not
● Multiple interrupt controllers per SoC
'cat /proc/interrupts' or 'mpstat -A'
Image: BirdBeaksA.svg by L. Shyamal, derivative work by Leptictidium, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=6626434
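ARM IPIs, from arch/arm/kernel/smp.c
$ cat /proc/interrupts   # IPIs are listed at the bottom
void handle_IPI(int ipinr, struct pt_regs *regs)
{
	unsigned int cpu = smp_processor_id();
	[ . . . ]
	switch (ipinr) {
	case IPI_TIMER:
		tick_receive_broadcast();
		break;
	case IPI_RESCHEDULE:
		scheduler_ipi();
		break;
	case IPI_CALL_FUNC:
		generic_smp_call_function_interrupt();
		break;
	case IPI_CPU_STOP:
		ipi_cpu_stop(cpu);
		break;
	case IPI_IRQ_WORK:
		irq_work_run();
		break;
	case IPI_COMPLETION:
		ipi_complete(cpu);
		break;
	[ . . . ]
	}
	[ . . . ]
}
(Excerpt: irq_enter()/irq_exit() calls and #ifdefs omitted.)
scheduler_ipi() is in kernel/sched/core.c; the other handlers live
elsewhere in kernel/ (e.g., kernel/smp.c, kernel/irq_work.c).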
What is an NMI?
● A 'non-maskable' interrupt is typically raised by:
– a HW problem: parity error, bus error, watchdog timer expiration . . .
– perf, which also uses NMIs
/* non-maskable interrupt control */
#define NMICR_NMIF 0x0001 /* NMI pin interrupt flag */
#define NMICR_WDIF 0x0002 /* watchdog timer overflow */
#define NMICR_ABUSERR 0x0008 /* async bus error flag */
From arch/mn10300/include/asm/intctl-regs.h
Image by John Jewell (“Fenix”), CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=49332041
SKIP
How IRQ masking works
arch/arm/include/asm/irqflags.h:
#define arch_local_irq_enable arch_local_irq_enable
static inline void arch_local_irq_enable(void)
{ asm volatile(
"cpsie i @ arch_local_irq_enable"
::: "memory", "cc"); }
arch/arm64/include/asm/irqflags.h:
static inline void arch_local_irq_enable(void)
{ asm volatile(
"msr daifclr, #2 // arch_local_irq_enable"
::: "memory"); }
arch/x86/include/asm/irqflags.h:
static inline notrace void arch_local_irq_enable(void)
{ native_irq_enable(); }
static inline void native_irq_enable(void)
{ asm volatile("sti": : :"memory"); }
cpsie i: “change processor state”, enable IRQs
All three variants act on the current core only.
SKIP
x86's Infamous System Management Interrupt
● SMI jumps out of kernel into System Management Mode
– controlled by System Management Engine (Skochinsky)
● Identified as a security vulnerability by Invisible Things Lab
● Not directly visible to Linux
● Traceable via hw_lat detector (sort of)
[RFC][PATCH 1/3] tracing: Added hardware latency tracer, Aug 4
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
The hardware latency tracer has been in the PREEMPT_RT patch for some
time. It is used to detect possible SMIs or any other hardware interruptions that
the kernel is unaware of. Note, NMIs may also be detected, but that may be
good to note as well.
ARM's Fast Interrupt reQuest
● A high-priority interrupt with optimized handling due to dedicated
(banked) registers; NMI-like, but it can be masked via the CPSR F bit.
● Underutilized by Linux drivers.
● Serves as the basis for Android's fiq_debugger.
IRQ 'Domains' Correspond to Different INTC's
CONFIG_IRQ_DOMAIN_DEBUG:
This option will show the mapping relationship between hardware irq
numbers and Linux irq numbers. The mapping is exposed via debugfs
in the file "irq_domain_mapping".
Note:
● There are a lot more IRQs than in /proc/interrupts.
● There are more IRQs in /proc/interrupts than in 'ps axl | grep irq'.
● Some IRQs are not used.
● Some are processor-reserved and not kernel-managed.
SKIP
Example: i.MX6 General Power Controller
Unmasked IRQs can wake up sleeping power domains.
Threaded IRQs in RT kernel
Output of 'ps axl | grep irq'
with both RT and non-RT kernels.
Handling IRQs as kernel threads allows priority and
CPU affinity to be managed individually.
IRQ handlers running in threads can themselves be
interrupted.
Quiz: What will we see
with 'ps axl | grep irq'
for non-RT kernels?
Why?
What function do threaded IRQs run?
/* request_threaded_irq - allocate an interrupt line
* @handler: Function to be called when the IRQ occurs.
* Primary handler for threaded interrupts
* If NULL and thread_fn != NULL the default
* primary handler is installed
*
* @thread_fn: Function called from the irq handler thread
* If NULL, no irq thread is created
*/
Even in mainline, request_irq() = request_threaded_irq()
with NULL thread_fn.
EXAMPLE
CASE 0: indirect invocation of request_threaded_irq(), with forced threading (RT or 'threadirqs'):
request_irq(handler) -> request_threaded_irq(handler, NULL) -> irq_setup_forced_threading()
Result:
-- irq_default_primary_handler() runs in interrupt context.
-- All it does is wake up the thread.
-- Then handler runs in irq/<name> thread.
CASE 1: direct invocation of request_threaded_irq(handler, thread_fn):
Result:
-- handler runs in interrupt context.
-- thread_fn runs in irq/<name> thread.
Threaded IRQs in RT, mainline and mainline with
“threadirqs” boot param
● RT: all hard-IRQ handlers that don't set IRQF_NOTHREAD run
in threads.
● Mainline: only handlers registered with a non-NULL thread_fn via
request_threaded_irq() get an IRQ thread; their primary handler still
runs in hard-IRQ context.
● Mainline with threadirqs kernel cmdline: like RT, but CPU affinity
of IRQ threads cannot be set.
genirq: Force interrupt thread on RT
genirq: Do not invoke the affinity callback via a workqueue on RT
Shared interrupts: mmc driver
● Check 'ps axl | grep irq | grep mmc':
1 0 122 2 -51 0 - S ? 0:00 [irq/16-mmc0]
1 0 123 2 -50 0 - S ? 0:00 [irq/16-s-mmc0]
● 'cat /proc/interrupts': mmc and ehci-hcd share an IRQ line
16: 204 IR-IO-APIC 16-fasteoi mmc0,ehci_hcd:usb3
● drivers/mmc/host/sdhci.c:
ret = request_threaded_irq(host->irq, sdhci_irq /* handler */,
			   sdhci_thread_irq /* thread_fn */,
			   IRQF_SHARED, mmc_hostname(mmc), host);
Why are atomic operations more expensive (ARM)?
arch/arm/include/asm/atomic.h (pre-ARMv6 fallback; ARMv6 and later use ldrex/strex):
#define ATOMIC_OP(op, c_op, asm_op)				\
static inline void atomic_##op(int i, atomic_t *v)		\
{								\
	unsigned long flags;					\
								\
	raw_local_irq_save(flags);				\
	v->counter c_op i;					\
	raw_local_irq_restore(flags);				\
}
include/linux/irqflags.h:
#define raw_local_irq_save(flags)				\
	do { flags = arch_local_irq_save(); } while (0)
arch/arm/include/asm/irqflags.h:
/* Save the current interrupt enable state & disable IRQs */
static inline unsigned long arch_local_irq_save(void) { . . . }
Introduction to softirqs
In kernel/softirq.c:
const char * const softirq_to_name[NR_SOFTIRQS] = {
"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
"TASKLET", "SCHED", "HRTIMER", "RCU"
};
Groupings on the slide: tasklet interface; raised by devices; kernel housekeeping.
In ksoftirqd, softirqs are serviced in the listed order.
BLOCK_IOPOLL: IRQ_POLL since 4.4.
HRTIMER: gone since 4.1.
What are tasklets?
● Tasklets perform deferred work not handled by other softirqs.
● Examples: crypto, USB, DMA, keyboard . . .
● More latency-sensitive drivers (sound, PCI) are part of
tasklet_hi_vec.
● Any driver can create a tasklet.
● tasklet_hi_schedule() or tasklet_schedule() is called directly by
the ISR.
[alison@sid ~]$ sudo mpstat -I SCPU
Linux 4.1.0-rt17+ (sid) 05/29/2016 _x86_64_ (4 CPU)
CPU HI/s TIMER/s NET_TX/s NET_RX/s BLOCK/s TASKLET/s SCHED/s HRTIMER/s RCU/s
0 0.03 249.84 0.00 0.11 19.96 0.43 238.75 0.68 0.00
1 0.01 249.81 0.38 1.00 38.25 1.98 236.69 0.53 0.00
2 0.02 249.72 0.19 0.11 53.34 3.83 233.94 1.44 0.00
3 0.59 249.72 0.01 2.05 19.34 2.63 234.04 1.72 0.00
Linux 4.6.0+ (sid) 05/29/2016 _x86_64_ (4 CPU)
CPU HI/s TIMER/s NET_TX/s NET_RX/s BLOCK/s TASKLET/s SCHED/s HRTIMER/s RCU/s
0 0.26 16.13 0.20 0.33 40.90 0.73 9.18 0.00 19.04
1 0.00 9.45 0.00 1.31 14.38 0.61 7.85 0.00 17.88
2 0.01 15.38 0.00 0.20 0.08 0.29 13.21 0.00 16.24
3 0.00 9.77 0.00 0.05 0.15 0.00 8.50 0.00 15.32
Linux 4.1.18-rt17-00028-g8da2a20 (vpc23) 06/04/16 _armv7l_ (2 CPU)
CPU HI/s TIMER/s NET_TX/s NET_RX/s BLOCK/s TASKLET/s SCHED/s HRTIMER/s RCU/s
0 0.00 999.72 0.18 9.54 0.00 89.29 191.69 261.06 0.00
1 0.00 999.35 0.00 16.81 0.00 15.13 126.75 260.89 0.00
Linux 4.7.0 (nitrogen6x) 07/31/16 _armv7l_ (4 CPU)
CPU HI/s TIMER/s NET_TX/s NET_RX/s BLOCK/s TASKLET/s SCHED/s HRTIMER/s RCU/s
0 0.00 2.84 0.50 40.69 0.00 0.38 2.78 0.00 3.03
1 0.00 89.00 0.00 0.00 0.00 0.00 0.64 0.00 46.22
2 0.00 16.59 0.00 0.00 0.00 0.00 0.23 0.00 3.05
3 0.00 10.22 0.00 0.00 0.00 0.00 0.25 0.00 1.45
SKIP
Two paths by which softirqs run
Related demo and sample code
[Diagram: two paths by which softirqs run]
CASE 0 (left): a hard-IRQ handler or system management thread raises a softirq;
local_bh_enable() runs it via do_current_softirqs() (RT) or __do_softirq().
CASE 1 (right): when the thread exhausts its timeslice, run_ksoftirqd()
runs the remaining softirqs via __do_softirq().
Case 0: Run softirqs at exit of a hard-IRQ handler
RT (4.6.2-rt5):
local_bh_enable();
__local_bh_enable();
do_current_softirqs();
while (current->softirqs_raised) {
	i = __ffs(current->softirqs_raised);
	current->softirqs_raised &= ~(1U << i);	/* [ . . . ] */
	do_single_softirq(i);			/* -> handle_softirq(); */
}
Run softirqs raised in the current context.

non-RT (4.6.2):
local_bh_enable();
do_softirq();
__do_softirq();
handle_pending_softirqs();
while ((softirq_bit = ffs(pending)))
	handle_softirq();
Run all pending softirqs, restarting up to MAX_SOFTIRQ_RESTART times.
EXAMPLE
Case 1: Scheduler runs the rest from ksoftirqd
RT (4.6.2-rt5):
run_ksoftirqd();
do_current_softirqs()
[ where current == ksoftirqd ]

non-RT (4.6.2):
do_softirq();
__do_softirq();
h = softirq_vec;
while ((softirq_bit = ffs(pending))) {
	h += softirq_bit - 1;
	h->action(h);
	h++;			/* [ . . . ] */
	pending >>= softirq_bit;
}
run_ksoftirqd();
RT vs Mainline: entering softirq handler
4.6.2-rt5:
[ 6937.393805] e1000e_poll+0x126/0xa70 [e1000e]
[ 6937.393808] check_preemption_disabled+0xab/0x240
[ 6937.393815] net_rx_action+0x53e/0xc90
[ 6937.393824] do_current_softirqs+0x488/0xc30
[ 6937.393831] do_current_softirqs+0x5/0xc30
[ 6937.393836] __local_bh_enable+0xf2/0x1a0
[ 6937.393840] irq_forced_thread_fn+0x91/0x140
[ 6937.393845] irq_thread+0x170/0x310
[ 6937.393848] irq_finalize_oneshot.part.6+0x4f0/0x4f0
[ 6937.393853] irq_forced_thread_fn+0x140/0x140
[ 6937.393857] irq_thread_check_affinity+0xa0/0xa0
[ 6937.393862] kthread+0x12b/0x1b0
(__local_bh_enable() and do_current_softirqs() kick off the softirq;
irq_forced_thread_fn() and the frames below it are the threaded hard-IRQ handler.)
4.7 mainline:
[11661.191187] e1000e_poll+0x126/0xa70 [e1000e]
[11661.191197] net_rx_action+0x52e/0xcd0
[11661.191206] __do_softirq+0x15c/0x5ce
[11661.191215] irq_exit+0xa3/0xd0
[11661.191222] do_IRQ+0x62/0x110
[11661.191230] common_interrupt+0x82/0x82
(irq_exit() and __do_softirq() kick off the softirq;
do_IRQ() and common_interrupt() are the hard-IRQ handler.)
SKIP
Summary of softirq execution paths
Case 0: Behavior of local_bh_enable() differs
significantly between RT and mainline kernel.
Case 1: Behavior of ksoftirqd itself is mostly the
same (note discussion of ktimersoftd below).
What is 'current'?
include/asm-generic/current.h:
#define get_current() (current_thread_info()->task)
#define current get_current()
arch/arm/include/asm/thread_info.h:
static inline struct thread_info *current_thread_info(void)
{ return (struct thread_info *) (current_stack_pointer &
~(THREAD_SIZE - 1));
}
arch/x86/include/asm/thread_info.h:
static inline struct thread_info *current_thread_info(void)
{ return (struct thread_info *)(current_top_of_stack() -
THREAD_SIZE);}
When do_current_softirqs() is reached from a threaded IRQ handler, current is that IRQ thread's task.
What is 'current'? part 2
arch/arm/include/asm/thread_info.h:
/*
* how to get the current stack pointer in C
*/
register unsigned long current_stack_pointer asm ("sp");
arch/x86/include/asm/thread_info.h:
static inline unsigned long current_stack_pointer(void)
{
unsigned long sp;
#ifdef CONFIG_X86_64
asm("mov %%rsp,%0" : "=g" (sp));
#else
asm("mov %%esp,%0" : "=g" (sp));
#endif
return sp;
}
SKIP
Q.: When do
system-management
softirqs get to run?
Introducing systemd-irqd!!†
† As suggested by Dave Anders
Do timers, scheduler, RCU ever run as part of
do_current_softirqs?
Examples:
-- every jiffy,
raise_softirq_irqoff(HRTIMER_SOFTIRQ);
-- scheduler_ipi() for NOHZ calls
raise_softirq_irqoff(SCHED_SOFTIRQ);
-- rcu_bh_qs() calls
raise_softirq(RCU_SOFTIRQ);
These run when ksoftirqd is current.
Demo: kprobe on do_current_softirqs() for RT kernel
● At Github
● Counts calls to do_current_softirqs() from ksoftirqd and from a
hard-IRQ handler.
● Tested on 4.4.4-rt11 with Boundary Devices' Nitrogen i.MX6.
Output showing which task 'current' is:
[ 52.841425] task->comm is ksoftirqd/1
[ 70.051424] task->comm is ksoftirqd/1
[ 70.171421] task->comm is ksoftirqd/1
[ 105.981424] task->comm is ksoftirqd/1
[ 165.260476] task->comm is irq/43-2188000.
[ 165.261406] task->comm is ksoftirqd/1
[ 225.321529] task->comm is irq/43-2188000.
Softirqs can be pre-empted with PREEMPT_RT
include/linux/sched.h:
struct task_struct {
	[ . . . ]
#ifdef CONFIG_PREEMPT_RT_BASE
	struct rcu_head put_rcu;
	int softirq_nestcnt;
	unsigned int softirqs_raised;
#endif
	[ . . . ]
};
RT-Linux headache: 'softirq starvation'
● ksoftirqd scarcely gets to run.
● Events that are triggered by timer interrupt won't happen.
● Example: main event loop in userspace did not run due to
missed timer ticks.
Reference: “Understanding a Real-Time System” by Rostedt,
slides and video
Typical log messages: “sched: RT throttling activated” or
“INFO: rcu_sched detected stalls on CPUs”
(partial) RT solution: ktimersoftd
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Wed Jan 20 2016 +0100
softirq: split timer softirqs out of ksoftirqd
With enough networking load it is possible that the system
never goes idle and schedules ksoftirqd and everything else
with a higher priority. One of the tasks left behind is one of
RCU's threads and so we see stalls and eventually run out of
memory. This patch moves the TIMER and HRTIMER
softirqs out of the `ksoftirqd` thread into its own `ktimersoftd`.
The former can now run SCHED_OTHER (same as mainline)
and the latter at SCHED_FIFO due to the wakeups. [ . . . ]
ftrace produces a copious amount of output
Investigating IRQs with eBPF: bcc
● BCC - Tools for BPF-based Linux analysis
● tools/ and examples/ illustrate interfaces to kprobes and
uprobes.
● BCC tools are:
– a convenient way to study arbitrary infrequent events
dynamically;
– based on dynamic code insertion using Clang Rewriter JIT;
– lightweight due to in-kernel data storage.
eBPF, IOvisor and IRQs: limitations
● JIT compiler is currently available for the x86-64, arm64, and
s390 architectures.
● No stack traces unless CONFIG_FRAME_POINTER=y
● Requires recent kernel, LLVM and Clang
● bcc/src/cc/export/helpers.h:
#ifdef __powerpc__
[ . . . ]
#elif defined(__x86_64__)
[ . . . ]
#else
#error "bcc does not support this platform yet"
#endif
bcc tips
● Kernel source must be present on the host where the probe
runs.
● /lib/modules/$(uname -r)/build/include/generated must exist.
● To switch between kernel branches and continue quickly using
bcc:
– run 'make mrproper; make config; make'
– 'make' need only run long enough to populate include/generated in the
kernel source before bcc becomes available again.
– 'make headers_install' as non-root user
SKIP
Get latest version of clang by compiling from source
(or from Debian Sid)
$ git clone http://llvm.org/git/llvm.git
$ cd llvm/tools
$ git clone --depth 1 http://llvm.org/git/clang.git
$ cd ..; mkdir build; cd build
$ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86"
$ make -j $(getconf _NPROCESSORS_ONLN)
SKIP
from samples/bpf/README.rst
Example: NAPI: changing the bottom half
Images: Di O. Quincel, own work, CC BY-SA 4.0; McSmit, own work, CC BY-SA 3.0
Quick NAPI refresher
The problem:
“High-speed networking can create thousands of interrupts per
second, all of which tell the system something it already knew: it has
lots of packets to process.”
The solution:
“Interrupt mitigation . . . NAPI allows drivers to run with (some)
interrupts disabled during times of high traffic, with a corresponding
decrease in system load.”
The implementation:
Poll the driver; if packets arrive faster than they are polled, the NIC
drops the excess before the kernel processes it.
net/core/dev.c in RT
Example: i.MX6 FEC RGMII NAPI turn-on
static irqreturn_t fec_enet_interrupt(int irq, void *dev_id)
{
	[ . . . ]
	if ((fep->work_tx || fep->work_rx) && fep->link) {
		if (napi_schedule_prep(&fep->napi)) {
			/* Disable the NAPI interrupts */
			writel(FEC_ENET_MII, fep->hwp + FEC_IMASK);
			__napi_schedule(&fep->napi);
		}
	}
	[ . . . ]
}
On RT, this handler runs via irq_forced_thread_fn() in the irq/43 thread.
Back to threaded IRQs
Example: i.MX6 FEC RGMII NAPI turn-off
static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
{
	[ . . . ]
	pkts = fec_enet_rx(ndev, budget);
	[ . . . ]
	if (pkts < budget) {
		napi_complete(napi);
		writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
	}
	return pkts;
}
netif_napi_add(ndev, &fep->napi, fec_enet_rx_napi,
NAPI_POLL_WEIGHT);
Interrupts are re-enabled when budget is not consumed.
Using existing tracepoints
● function_graph tracing causes a lot of overhead.
● How about the napi_poll tracepoint in /sys/kernel/debug/tracing/events/napi?
– Fires constantly with any network traffic.
– Displays no obvious change in behavior when the eth IRQ is
disabled and polling starts.
The Much Easier Way:
BCC on x86_64 with
4.6.2-rt5 and Clang-3.8
Handling Eth IRQs in ksoftirqd on x86_64, but NAPI?
root $ ./stackcount.py e1000_receive_skb
Tracing 1 functions for "e1000_receive_skb"
^C
e1000_receive_skb
e1000e_poll
net_rx_action
do_current_softirqs
run_ksoftirqd
smpboot_thread_fn
kthread
ret_from_fork
1
e1000_receive_skb
e1000e_poll
net_rx_action
do_current_softirqs
__local_bh_enable
irq_forced_thread_fn
irq_thread
kthread
ret_from_fork
26469
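Annotations (4.6.2-rt5; the number after each stack is its count):
-- First stack (count 1): running from ksoftirqd, not from the hard-IRQ handler.
-- Second stack (count 26469), normal behavior: the packet handler runs
immediately after the eth IRQ, in its context.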
Switch to NAPI on x86_64
[alison@sid]$ sudo modprobe kp_ksoft eth_irq_procid=1
[ ] __raise_softirq_irqoff_ksoft: 582 hits
[ ] kprobe at ffffffff81100920 unregistered
[alison@sid]$ sudo ./stacksnoop.py __raise_softirq_irqoff_ksoft
144.803096056 __raise_softirq_irqoff_ksoft
ffffffff81100921 __raise_softirq_irqoff_ksoft
ffffffff810feda9 do_current_softirqs
ffffffff810ffeae run_ksoftirqd
ffffffff8114d255 smpboot_thread_fn
ffffffff81144a99 kthread
ffffffff8205ed82 ret_from_fork
Same Experiment, but non-RT 4.6.2
Most frequent:
e1000_receive_skb
e1000e_poll
net_rx_action
__softirqentry_text_start
irq_exit
do_IRQ
ret_from_intr
cpuidle_enter
call_cpuidle
cpu_startup_entry
start_secondary
1016045
Run in ksoftirqd:
e1000_receive_skb
e1000e_poll
net_rx_action
__softirqentry_text_start
run_ksoftirqd
smpboot_thread_fn
kthread
ret_from_fork
1162
At least 70 other call stacks observed in a few seconds.
SKIP
Due to handle_pending_softirqs(), any hard IRQ can run before a
given softirq (non-RT 4.6.2)
e1000_receive_skb
e1000e_poll
net_rx_action
__softirqentry_text_start
irq_exit
do_IRQ
ret_from_intr
pipe_write
__vfs_write
vfs_write
sys_write
entry_SYSCALL_64_fastpath
357
e1000_receive_skb
e1000e_poll
net_rx_action
__softirqentry_text_start
irq_exit
do_IRQ
ret_from_intr
__alloc_pages_nodemask
alloc_pages_vma
handle_pte_fault
handle_mm_fault
__do_page_fault
do_page_fault
page_fault
366
Same Experiment, but 4.6.2 with 'threadirqs' boot param
e1000_receive_skb
e1000e_poll
net_rx_action
__softirqentry_text_start
do_softirq_own_stack
do_softirq.part.16
__local_bh_enable_ip
irq_forced_thread_fn
irq_thread
kthread
ret_from_fork
569174
With the 'threadirqs' cmdline parameter at boot.
Note: no do_current_softirqs().
Investigation on ARM:
kprobe with 4.6.2-rt5
Documentation/kprobes.txt
“In general, you can install a probe
anywhere in the kernel.
In particular, you can probe interrupt handlers.”
Takeaway: not limited to existing tracepoints!
root@nitrogen6x:~# insmod 4.6.2/kp_raise_softirq_irqoff.ko
[ 1749.935955] Planted kprobe at 8012c1b4
[ 1749.936088] Internal error: Oops - undefined instruction: 0 [#1]
PREEMPT SMP ARM
[ 1749.936109] Modules linked in: kp_raise_softirq_irqoff(+)
[ 1749.936116] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.2
[ 1749.936119] Hardware name: Freescale i.MX6 Quad/DualLite
[ 1749.936131] PC is at __raise_softirq_irqoff+0x0/0xf0
[ 1749.936144] LR is at __napi_schedule+0x5c/0x7c
[ 1749.936766] Kernel panic - not syncing: Fatal exception in
interrupt
Not quite anywhere: on mainline stable 4.6.2 (i.MX6), this kprobe on
__raise_softirq_irqoff() crashed the kernel.
Adapt samples/kprobes/kprobe_example.c
#include <linux/kprobes.h>
#include <linux/module.h>
#include <linux/smp.h>

/* For each probe you need to allocate a kprobe structure */
static struct kprobe kp = {
	.symbol_name = "__raise_softirq_irqoff_ksoft",
};

/* kprobe post_handler: called after the probed instruction is executed */
static void handler_post(struct kprobe *p, struct pt_regs *regs,
			 unsigned long flags)
{
	unsigned int id = smp_processor_id();

	/* change id to that where the eth IRQ is pinned */
	if (id == 0) {
		pr_info("Switched to ethernet NAPI.\n");
		pr_info("post_handler: p->addr = 0x%p, pc = 0x%lx,"
			" lr = 0x%lx, cpsr = 0x%lx\n",
			p->addr, regs->ARM_pc, regs->ARM_lr, regs->ARM_cpsr);
	}
}
/* Set kp.post_handler = handler_post, then call register_kprobe(&kp) in
 * module init and unregister_kprobe(&kp) in module exit, as in
 * kprobe_example.c. */
code at Github
(__raise_softirq_irqoff_ksoft() is called from net_rx_action() in net/core/dev.c.)
Watching net_rx_action() switch to NAPI
alison@laptop:~# make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- samples/kprobes/ modules
root@nitrogen6x:~# modprobe kp_ksoft.ko eth_proc_id=1
root@nitrogen6x:~# dmesg | tail
[ 6548.644584] Planted kprobe at 80033440
root@nitrogen6x:~# dmesg | grep post_handler
root@nitrogen6x:~#
. . . . . Start DOS attack . . . Wait 15 seconds . . . .
root@nitrogen6x:~# dmesg | tail
[ 6548.644584] Planted kprobe at 80033440
[ 6617.858101] pre_handler: p->addr = 0x80033440, pc = 0x80033444,
lr = 0x80605ff0, cpsr = 0x20070193
[ 6617.858104] Switched to ethernet NAPI.
Another example of output
Insert/remove two probes during packet storm:
root@nitrogen6x:~# modprobe -r kp_ksoft
[ 232.471922] __raise_softirq_irqoff_ksoft: 14 hits
[ 232.471922] kprobe at 80033440 unregistered
root@nitrogen6x:~# modprobe -r kp_napi_complete
[ 287.225318] napi_complete_done: 1893005 hits
[ 287.262011] kprobe at 80605cc0 unregistered
Counting activation of two softirq execution paths
show you the codez
static struct kprobe kp = {
	.symbol_name = "do_current_softirqs",
};
[ . . . ]
if (raised == NET_RX_SOFTIRQ) {
	ti = current_thread_info();
	task = ti->task;
	if (chatty)
		pr_debug("task->comm is %s\n", task->comm);
	if (strstr(task->comm, "ksoftirq"))
		p->ksoftirqd_count++;
	if (strstr(task->comm, "irq/"))
		p->local_bh_enable_count++;
}
Counters are stored in struct kprobe{}.
modprobe kp_do_current_softirqs chatty=1
(Results were included previously.)
Summary
● IRQ handling involves a 'hard', fast part or 'top half' and a 'soft',
slower part or 'bottom half.'
● Hard IRQs include arch-dependent system features plus
software-generated IPIs.
● Soft IRQs may run directly after the hard IRQ that raises them,
or at a later time in ksoftirqd.
● Threaded, preemptible IRQs are a salient feature of RT Linux.
● The management of IRQs, as illustrated by NAPI's response to
DOS, remains challenging.
● If you can use bcc and eBPF, you should!
Acknowledgements
Thanks to Sebastian Siewior, Brenden Blanco, Brendan Gregg,
Steven Rostedt and Dave Anders for advice and inspiration.
Special thanks to Joel Fernandes and Sarah Newman for detailed
feedback on an earlier version.
Useful Resources
● NAPI docs
● Documentation/kernel-per-CPU-kthreads
● Documentation/DocBook/genericirq.pdf
● Brendan Gregg's blog
● Tasklets and softirqs discussion at KLDP wiki
● #iovisor at OFTC IRC
● Alexei Starovoitov's 2015 LLVM Microconf slides
ARMv7 Core Registers
Softirqs that don't run in context of hard-IRQ handlers
run “on behalf of ksoftirqd”
static inline void ksoftirqd_set_sched_params(unsigned int cpu)
{
/* Take over all but timer pending softirqs when starting */
local_irq_disable();
current->softirqs_raised = local_softirq_pending() & ~TIMER_SOFTIRQS;
local_irq_enable();
}
static struct smp_hotplug_thread softirq_threads = {
.store = &ksoftirqd,
.setup = ksoftirqd_set_sched_params,
.thread_should_run = ksoftirqd_should_run,
.thread_fn = run_ksoftirqd,
.thread_comm = "ksoftirqd/%u",
};
Compare output to source with GDB
[alison@hildesheim linux-4.4.4 (trace_napi)]$ arm-linux-gnueabihf-gdb vmlinux
(gdb) p *(__raise_softirq_irqoff_ksoft)
$1 = {void (unsigned int)} 0x80033440 <__raise_softirq_irqoff_ksoft>
(gdb) l *(0x80605ff0)
0x80605ff0 is in net_rx_action (net/core/dev.c:4968).
4963 list_splice_tail(&repoll, &list);
4964 list_splice(&list, &sd->poll_list);
4965 if (!list_empty(&sd->poll_list))
4966 __raise_softirq_irqoff_ksoft(NET_RX_SOFTIRQ);
4967
4968 net_rps_action_and_irq_enable(sd);
4969 }

Container Storage Best Practices in 2017Keith Resar
 

Viewers also liked (20)

Tuning systemd for embedded
Tuning systemd for embeddedTuning systemd for embedded
Tuning systemd for embedded
 
LISA15: systemd, the Next-Generation Linux System Manager
LISA15: systemd, the Next-Generation Linux System Manager LISA15: systemd, the Next-Generation Linux System Manager
LISA15: systemd, the Next-Generation Linux System Manager
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems
 
Comparing file system performance: Red Hat Enterprise Linux 6 vs. Microsoft W...
Comparing file system performance: Red Hat Enterprise Linux 6 vs. Microsoft W...Comparing file system performance: Red Hat Enterprise Linux 6 vs. Microsoft W...
Comparing file system performance: Red Hat Enterprise Linux 6 vs. Microsoft W...
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
Docker, LinuX Container
Docker, LinuX ContainerDocker, LinuX Container
Docker, LinuX Container
 
Boost UDP Transaction Performance
Boost UDP Transaction PerformanceBoost UDP Transaction Performance
Boost UDP Transaction Performance
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
 
2 Linux Container and Docker
2 Linux Container and Docker2 Linux Container and Docker
2 Linux Container and Docker
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Docker in the Oracle Universe / WebLogic 12c / OFM 12c
Docker in the Oracle Universe / WebLogic 12c / OFM 12cDocker in the Oracle Universe / WebLogic 12c / OFM 12c
Docker in the Oracle Universe / WebLogic 12c / OFM 12c
 
NVMe Over Fabrics Support in Linux
NVMe Over Fabrics Support in LinuxNVMe Over Fabrics Support in Linux
NVMe Over Fabrics Support in Linux
 
Linux architecture
Linux architectureLinux architecture
Linux architecture
 
SR-IOV ixgbe Driver Limitations and Improvement
SR-IOV ixgbe Driver Limitations and ImprovementSR-IOV ixgbe Driver Limitations and Improvement
SR-IOV ixgbe Driver Limitations and Improvement
 
WebLogic im Docker Container
WebLogic im Docker ContainerWebLogic im Docker Container
WebLogic im Docker Container
 
Container Landscape in 2017
Container Landscape in 2017Container Landscape in 2017
Container Landscape in 2017
 
Advanced troubleshooting linux performance
Advanced troubleshooting linux performanceAdvanced troubleshooting linux performance
Advanced troubleshooting linux performance
 
Feature rich BTRFS is Getting Richer with Encryption
Feature rich BTRFS is Getting Richer with EncryptionFeature rich BTRFS is Getting Richer with Encryption
Feature rich BTRFS is Getting Richer with Encryption
 
Container Storage Best Practices in 2017
Container Storage Best Practices in 2017Container Storage Best Practices in 2017
Container Storage Best Practices in 2017
 

Similar to IRQs: the Hard, the Soft, the Threaded and the Preemptible

Beneath the Linux Interrupt handling
Beneath the Linux Interrupt handlingBeneath the Linux Interrupt handling
Beneath the Linux Interrupt handlingBhoomil Chavda
 
Introduction to FreeRTOS
Introduction to FreeRTOSIntroduction to FreeRTOS
Introduction to FreeRTOSICS
 
NIOS II Processor.ppt
NIOS II Processor.pptNIOS II Processor.ppt
NIOS II Processor.pptAtef46
 
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...PROIDEA
 
Softcore processor.pptxSoftcore processor.pptxSoftcore processor.pptx
Softcore processor.pptxSoftcore processor.pptxSoftcore processor.pptxSoftcore processor.pptxSoftcore processor.pptxSoftcore processor.pptx
Softcore processor.pptxSoftcore processor.pptxSoftcore processor.pptxSnehaLatha68
 
unit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptxunit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptxKandavelEee
 
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOSBuilding a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOSFernando Luiz Cola
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1Hajime Tazaki
 
Talk 160920 @ Cat System Workshop
Talk 160920 @ Cat System WorkshopTalk 160920 @ Cat System Workshop
Talk 160920 @ Cat System WorkshopQuey-Liang Kao
 
HKG15-300: Art's Quick Compiler: An unofficial overview
HKG15-300: Art's Quick Compiler: An unofficial overviewHKG15-300: Art's Quick Compiler: An unofficial overview
HKG15-300: Art's Quick Compiler: An unofficial overviewLinaro
 
Bottom halves on Linux
Bottom halves on LinuxBottom halves on Linux
Bottom halves on LinuxChinmay V S
 
Challenges in GPU compilers
Challenges in GPU compilersChallenges in GPU compilers
Challenges in GPU compilersAnastasiaStulova
 
An Enhanced FPGA Based Asynchronous Microprocessor Design Using VIVADO and ISIM
An Enhanced FPGA Based Asynchronous Microprocessor Design Using VIVADO and ISIMAn Enhanced FPGA Based Asynchronous Microprocessor Design Using VIVADO and ISIM
An Enhanced FPGA Based Asynchronous Microprocessor Design Using VIVADO and ISIMjournalBEEI
 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAAiman Hud
 
Nodes and Networks for HPC computing
Nodes and Networks for HPC computingNodes and Networks for HPC computing
Nodes and Networks for HPC computingrinnocente
 
AMP Kynetics - ELC 2018 Portland
AMP  Kynetics - ELC 2018 PortlandAMP  Kynetics - ELC 2018 Portland
AMP Kynetics - ELC 2018 PortlandKynetics
 
Asymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portlandAsymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portlandNicola La Gloria
 

Similar to IRQs: the Hard, the Soft, the Threaded and the Preemptible (20)

Beneath the Linux Interrupt handling
Beneath the Linux Interrupt handlingBeneath the Linux Interrupt handling
Beneath the Linux Interrupt handling
 
Introduction to FreeRTOS
Introduction to FreeRTOSIntroduction to FreeRTOS
Introduction to FreeRTOS
 
NIOS II Processor.ppt
NIOS II Processor.pptNIOS II Processor.ppt
NIOS II Processor.ppt
 
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
Processor types
Processor typesProcessor types
Processor types
 
Softcore processor.pptxSoftcore processor.pptxSoftcore processor.pptx
Softcore processor.pptxSoftcore processor.pptxSoftcore processor.pptxSoftcore processor.pptxSoftcore processor.pptxSoftcore processor.pptx
Softcore processor.pptxSoftcore processor.pptxSoftcore processor.pptx
 
unit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptxunit 1ARM INTRODUCTION.pptx
unit 1ARM INTRODUCTION.pptx
 
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOSBuilding a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1
 
Talk 160920 @ Cat System Workshop
Talk 160920 @ Cat System WorkshopTalk 160920 @ Cat System Workshop
Talk 160920 @ Cat System Workshop
 
HKG15-300: Art's Quick Compiler: An unofficial overview
HKG15-300: Art's Quick Compiler: An unofficial overviewHKG15-300: Art's Quick Compiler: An unofficial overview
HKG15-300: Art's Quick Compiler: An unofficial overview
 
Bottom halves on Linux
Bottom halves on LinuxBottom halves on Linux
Bottom halves on Linux
 
Challenges in GPU compilers
Challenges in GPU compilersChallenges in GPU compilers
Challenges in GPU compilers
 
An Enhanced FPGA Based Asynchronous Microprocessor Design Using VIVADO and ISIM
An Enhanced FPGA Based Asynchronous Microprocessor Design Using VIVADO and ISIMAn Enhanced FPGA Based Asynchronous Microprocessor Design Using VIVADO and ISIM
An Enhanced FPGA Based Asynchronous Microprocessor Design Using VIVADO and ISIM
 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIA
 
Nodes and Networks for HPC computing
Nodes and Networks for HPC computingNodes and Networks for HPC computing
Nodes and Networks for HPC computing
 
AMP Kynetics - ELC 2018 Portland
AMP  Kynetics - ELC 2018 PortlandAMP  Kynetics - ELC 2018 Portland
AMP Kynetics - ELC 2018 Portland
 
Asymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portlandAsymmetric Multiprocessing - Kynetics ELC 2018 portland
Asymmetric Multiprocessing - Kynetics ELC 2018 portland
 

More from Alison Chaiken

Not breaking userspace: the evolving Linux ABI
Not breaking userspace: the evolving Linux ABINot breaking userspace: the evolving Linux ABI
Not breaking userspace: the evolving Linux ABIAlison Chaiken
 
Supporting SW Update via u-boot and GPT/EFI
Supporting SW Update via u-boot and GPT/EFISupporting SW Update via u-boot and GPT/EFI
Supporting SW Update via u-boot and GPT/EFIAlison Chaiken
 
Two C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp InsightsTwo C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp InsightsAlison Chaiken
 
V2X Communications: Getting our Cars Talking
V2X Communications: Getting our Cars TalkingV2X Communications: Getting our Cars Talking
V2X Communications: Getting our Cars TalkingAlison Chaiken
 
Practical Challenges to Deploying Highly Automated Vehicles
Practical Challenges to Deploying Highly Automated VehiclesPractical Challenges to Deploying Highly Automated Vehicles
Practical Challenges to Deploying Highly Automated VehiclesAlison Chaiken
 
Linux: the first second
Linux: the first secondLinux: the first second
Linux: the first secondAlison Chaiken
 
Functional AI and Pervasive Networking in Automotive
 Functional AI and Pervasive Networking in Automotive Functional AI and Pervasive Networking in Automotive
Functional AI and Pervasive Networking in AutomotiveAlison Chaiken
 
Flash in Vehicles: an End-User's Perspective
Flash in Vehicles: an End-User's PerspectiveFlash in Vehicles: an End-User's Perspective
Flash in Vehicles: an End-User's PerspectiveAlison Chaiken
 
Linux: the first second
Linux: the first secondLinux: the first second
Linux: the first secondAlison Chaiken
 
Automotive Linux, Cybersecurity and Transparency
Automotive Linux, Cybersecurity and TransparencyAutomotive Linux, Cybersecurity and Transparency
Automotive Linux, Cybersecurity and TransparencyAlison Chaiken
 
Automotive Grade Linux and systemd
Automotive Grade Linux and systemdAutomotive Grade Linux and systemd
Automotive Grade Linux and systemdAlison Chaiken
 
Systemd for developers
Systemd for developersSystemd for developers
Systemd for developersAlison Chaiken
 
Developing Automotive Linux
Developing Automotive LinuxDeveloping Automotive Linux
Developing Automotive LinuxAlison Chaiken
 
Systemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to loveSystemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to loveAlison Chaiken
 
Technology, Business and Regulation of the Connected Car
Technology, Business and Regulation of the Connected CarTechnology, Business and Regulation of the Connected Car
Technology, Business and Regulation of the Connected CarAlison Chaiken
 
Best practices for long-term support and security of the device-tree
Best practices for long-term support and security of the device-treeBest practices for long-term support and security of the device-tree
Best practices for long-term support and security of the device-treeAlison Chaiken
 
The “Telematics Horizon” V2V and V2I Networking
The “Telematics Horizon” V2V and V2I NetworkingThe “Telematics Horizon” V2V and V2I Networking
The “Telematics Horizon” V2V and V2I NetworkingAlison Chaiken
 
Developing automotive Linux
Developing automotive LinuxDeveloping automotive Linux
Developing automotive LinuxAlison Chaiken
 
Automotive Free Software 2013: "Right to Repair" and Privacy
Automotive Free Software 2013: "Right to Repair" and PrivacyAutomotive Free Software 2013: "Right to Repair" and Privacy
Automotive Free Software 2013: "Right to Repair" and PrivacyAlison Chaiken
 
Addressing the hard problems of automotive Linux: networking and IPC
Addressing the hard problems of automotive Linux: networking and IPCAddressing the hard problems of automotive Linux: networking and IPC
Addressing the hard problems of automotive Linux: networking and IPCAlison Chaiken
 

More from Alison Chaiken (20)

Not breaking userspace: the evolving Linux ABI
Not breaking userspace: the evolving Linux ABINot breaking userspace: the evolving Linux ABI
Not breaking userspace: the evolving Linux ABI
 
Supporting SW Update via u-boot and GPT/EFI
Supporting SW Update via u-boot and GPT/EFISupporting SW Update via u-boot and GPT/EFI
Supporting SW Update via u-boot and GPT/EFI
 
Two C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp InsightsTwo C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp Insights
 
V2X Communications: Getting our Cars Talking
V2X Communications: Getting our Cars TalkingV2X Communications: Getting our Cars Talking
V2X Communications: Getting our Cars Talking
 
Practical Challenges to Deploying Highly Automated Vehicles
Practical Challenges to Deploying Highly Automated VehiclesPractical Challenges to Deploying Highly Automated Vehicles
Practical Challenges to Deploying Highly Automated Vehicles
 
Linux: the first second
Linux: the first secondLinux: the first second
Linux: the first second
 
Functional AI and Pervasive Networking in Automotive
 Functional AI and Pervasive Networking in Automotive Functional AI and Pervasive Networking in Automotive
Functional AI and Pervasive Networking in Automotive
 
Flash in Vehicles: an End-User's Perspective
Flash in Vehicles: an End-User's PerspectiveFlash in Vehicles: an End-User's Perspective
Flash in Vehicles: an End-User's Perspective
 
Linux: the first second
Linux: the first secondLinux: the first second
Linux: the first second
 
Automotive Linux, Cybersecurity and Transparency
Automotive Linux, Cybersecurity and TransparencyAutomotive Linux, Cybersecurity and Transparency
Automotive Linux, Cybersecurity and Transparency
 
Automotive Grade Linux and systemd
Automotive Grade Linux and systemdAutomotive Grade Linux and systemd
Automotive Grade Linux and systemd
 
Systemd for developers
Systemd for developersSystemd for developers
Systemd for developers
 
Developing Automotive Linux
Developing Automotive LinuxDeveloping Automotive Linux
Developing Automotive Linux
 
Systemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to loveSystemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to love
 
Technology, Business and Regulation of the Connected Car
Technology, Business and Regulation of the Connected CarTechnology, Business and Regulation of the Connected Car
Technology, Business and Regulation of the Connected Car
 
Best practices for long-term support and security of the device-tree
Best practices for long-term support and security of the device-treeBest practices for long-term support and security of the device-tree
Best practices for long-term support and security of the device-tree
 
The “Telematics Horizon” V2V and V2I Networking
The “Telematics Horizon” V2V and V2I NetworkingThe “Telematics Horizon” V2V and V2I Networking
The “Telematics Horizon” V2V and V2I Networking
 
Developing automotive Linux
Developing automotive LinuxDeveloping automotive Linux
Developing automotive Linux
 
Automotive Free Software 2013: "Right to Repair" and Privacy
Automotive Free Software 2013: "Right to Repair" and PrivacyAutomotive Free Software 2013: "Right to Repair" and Privacy
Automotive Free Software 2013: "Right to Repair" and Privacy
 
Addressing the hard problems of automotive Linux: networking and IPC
Addressing the hard problems of automotive Linux: networking and IPCAddressing the hard problems of automotive Linux: networking and IPC
Addressing the hard problems of automotive Linux: networking and IPC
 

Recently uploaded

Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIkoyaldeepu123
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 

Recently uploaded (20)

Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 

IRQs: the Hard, the Soft, the Threaded and the Preemptible

  • 1. IRQs: the Hard, the Soft, the Threaded and the Preemptible
    Alison Chaiken
    Latest version of these slides
    alison@she-devel.com
    Embedded Linux Conference Europe
    Oct 11, 2016
    Example code
    Version 2, actually presented live
  • 2. Thursday October 13, 2016, 15:30:
    Debugging Methodologies for Realtime Issues
    Joel Fernandes, Google (this same room)
    Knocking at Your Back Door (or How Dealing with Modern Interrupt Architectures can Affect Your Sanity)
    Marc Zyngier, ARM Ltd (Hall Berlin A)
  • 3. Agenda
    ● Why do IRQs exist?
    ● Kinds of hard-IRQ handlers
    ● Softirqs and tasklets
    ● Differences in IRQ handling between RT and non-RT kernels
    ● Studying IRQ behavior via kprobes, event tracing, mpstat and eBPF
    ● Detailed example: when does NAPI take over for eth IRQs?
    “Art cannot be taught. [The art schools] must be absorbed back into the workshop.” -- Walter Gropius
  • 4. Sample questions to be answered
    ● What's all the stuff in /proc/interrupts anyway?
    ● What are IPIs and NMIs?
    ● Why are atomic operations expensive for ARM?
    ● What are the differences between mainline and RT kernels in softirq handling?
    ● What is the 'current' task while in a softirq?
    ● What function is running inside the threaded IRQs?
    ● When do we switch from individual hard-IRQ processing to NAPI?
  • 5. Interrupt handling: a brief pictorial summary
    Top half: the hard IRQ
    Bottom half: the soft IRQ
    (Images: Dennis Jarvis, http://tinyurl.com/jmkw23h; onefulllife, http://tinyurl.com/j25lal5)
  • 6. Why do we need interrupts at all?
    ● IRQs allow devices to notify the kernel that they require service.
    ● Alternatives include
      – polling (servicing devices at a pre-configured interval);
      – traditional IPC to user-space drivers.
    ● Even a single-threaded RTOS or a bootloader needs a system timer.
  • 7. Interrupts in Das U-Boot
    ● For ARM, minimal IRQ support:
      – clear exceptions and reset timer (e.g., arch/arm/lib/interrupts_64.c or arch/arm/cpu/armv8/exceptions.S)
    ● For x86, interrupts are serviced via a stack push followed by a jump (arch/x86/cpu/interrupts.c)
      – PCI has full-service interrupt handling (arch/x86/cpu/irq.c)
  • 8. 8 Interrupts in an RTOS: Xenomai/ADEOS I-pipe (figure from the Adeos website, covered by GFDL)
  • 9. 9 Zoology of IRQs ● Hard versus soft ● Level- vs. edge-triggered, simple, fast EOI or per-CPU ● Local vs. global; system vs. device ● Maskable vs. non-maskable ● Shared or not; chained or not ● Multiple interrupt controllers per SoC ● See 'cat /proc/interrupts' or 'mpstat -A' (Image: BirdBeaksA.svg by L. Shyamal, derivative work by Leptictidium, CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=6626434)
  • 10. 10 ARM IPIs, from arch/arm/kernel/smp.c
$ cat /proc/interrupts    # IPIs are listed at the bottom
void handle_IPI(int ipinr, struct pt_regs *regs)
{
        unsigned int cpu = smp_processor_id();
        [ . . . ]
        switch (ipinr) {
        case IPI_TIMER:
                tick_receive_broadcast();
                break;
        case IPI_RESCHEDULE:
                scheduler_ipi();
                break;
        case IPI_CALL_FUNC:
                generic_smp_call_function_interrupt();
                break;
        case IPI_CPU_STOP:
                ipi_cpu_stop(cpu);
                break;
        case IPI_IRQ_WORK:
                irq_work_run();
                break;
        case IPI_COMPLETION:
                ipi_complete(cpu);
                break;
        }
        [ . . . ]
}
Handlers live in several files, e.g. scheduler_ipi() is in kernel/sched/core.c.
  • 11. 11 What is an NMI? ● A 'non-maskable' interrupt is related to: – HW problems: parity error, bus error, watchdog timer expiration . . . – also used by perf
/* non-maskable interrupt control */
#define NMICR_NMIF      0x0001  /* NMI pin interrupt flag */
#define NMICR_WDIF      0x0002  /* watchdog timer overflow */
#define NMICR_ABUSERR   0x0008  /* async bus error flag */
From arch/mn10300/include/asm/intctl-regs.h (Image: John Jewell (Fenix), CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=49332041) SKIP
  • 12. 12 How IRQ masking works
arch/arm/include/asm/irqflags.h:
#define arch_local_irq_enable arch_local_irq_enable
static inline void arch_local_irq_enable(void)
{
        asm volatile(
                "cpsie i        @ arch_local_irq_enable"
                :
                :
                : "memory", "cc");
}
arch/arm64/include/asm/irqflags.h:
static inline void arch_local_irq_enable(void)
{
        asm volatile(
                "msr    daifclr, #2     // arch_local_irq_enable"
                :
                :
                : "memory");
}
arch/x86/include/asm/irqflags.h:
static inline notrace void arch_local_irq_enable(void)
{
        native_irq_enable();
}
static inline void native_irq_enable(void)
{
        asm volatile("sti": : :"memory");
}
“cpsie” = “change processor state, interrupt enable”. Each of these affects only the current core. SKIP
  • 13. 13 x86's Infamous System Management Interrupt ● SMI jumps out of kernel into System Management Mode – controlled by System Management Engine (Skochinsky) ● Identified as security vulnerability by Invisible Things Lab ● Not directly visible to Linux ● Traceable via hw_lat detector (sort of) [RFC][PATCH 1/3] tracing: Added hardware latency tracer, Aug 4 From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org> The hardware latency tracer has been in the PREEMPT_RT patch for some time. It is used to detect possible SMIs or any other hardware interruptions that the kernel is unaware of. Note, NMIs may also be detected, but that may be good to note as well.
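The idea behind the hardware latency tracer can be sketched in user space: read a clock back to back in a tight loop and treat any unusually large gap between consecutive reads as time the CPU spent elsewhere. This is a minimal sketch under stated assumptions, not the hwlat tracer itself. The real tracer runs in the kernel with interrupts disabled, so its gaps point at SMIs, NMIs or other hardware activity; this user-space loop cannot mask interrupts and will also count ordinary IRQs and preemption. The names `now_ns()` and `sample_gaps()` and the threshold values are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read CLOCK_MONOTONIC as nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin for about window_ns, reading the clock back to back. A gap between
 * consecutive reads larger than threshold_ns is counted as a "hole" during
 * which this loop was not running. Returns the largest gap observed.
 * (In user space, holes include IRQs and preemption, not just SMIs.)
 */
static uint64_t sample_gaps(uint64_t window_ns, uint64_t threshold_ns,
                            unsigned int *holes)
{
    uint64_t start = now_ns();
    uint64_t prev = start;
    uint64_t max_gap = 0;

    assert(holes != NULL);
    *holes = 0;
    for (;;) {
        uint64_t t = now_ns();
        uint64_t gap = t - prev;

        if (gap > max_gap)
            max_gap = gap;
        if (gap > threshold_ns)
            (*holes)++;
        prev = t;
        if (t - start >= window_ns)
            break;
    }
    return max_gap;
}
```

For example, `sample_gaps(1000000000ull, 10000, &holes)` spins for about one second and counts gaps longer than 10 µs. Pinning the process to one CPU (for instance with taskset) makes the numbers easier to interpret.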
  • 14. 14 ARM's Fast Interrupt reQuest ● A high-priority, NMI-like interrupt with optimized handling due to dedicated (banked) registers. ● Underutilized by Linux drivers. ● Serves as the basis for Android's fiq_debugger.
  • 15. 15 IRQ 'Domains' Correspond to Different INTCs CONFIG_IRQ_DOMAIN_DEBUG: “This option will show the mapping relationship between hardware irq numbers and Linux irq numbers. The mapping is exposed via debugfs in the file "irq_domain_mapping".” Note: ● There are a lot more IRQs than in /proc/interrupts. ● There are more IRQs in /proc/interrupts than in 'ps axl | grep irq'. ● Some IRQs are not used. ● Some are processor-reserved and not kernel-managed. SKIP
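The hwirq-to-Linux-IRQ mapping that CONFIG_IRQ_DOMAIN_DEBUG exposes can be pictured as a per-controller lookup table. The sketch below is a simplified, hypothetical model of a linear IRQ domain, loosely following what irq_create_mapping() and irq_find_mapping() do; the struct, the 16-entry size and the global virq counter are assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_HWIRQ 16   /* hypothetical controller size */

/*
 * Each interrupt controller numbers its inputs from 0 (hwirq); the domain
 * maps them into the single global Linux IRQ space (virq).
 * 0 means "not mapped".
 */
struct model_irq_domain {
    const char *name;
    unsigned int linear_revmap[MODEL_NR_HWIRQ];  /* hwirq -> virq */
};

static unsigned int next_virq = 1;   /* Linux IRQ 0 is never handed out */

/* Create (or return the existing) mapping for hwirq. */
static unsigned int model_create_mapping(struct model_irq_domain *d,
                                         unsigned int hwirq)
{
    assert(hwirq < MODEL_NR_HWIRQ);
    if (d->linear_revmap[hwirq] == 0)
        d->linear_revmap[hwirq] = next_virq++;
    return d->linear_revmap[hwirq];
}

/* Translate an incoming hwirq to its Linux IRQ, or 0 if unmapped. */
static unsigned int model_find_mapping(const struct model_irq_domain *d,
                                       unsigned int hwirq)
{
    return hwirq < MODEL_NR_HWIRQ ? d->linear_revmap[hwirq] : 0;
}
```

Here hwirq 5 on two different controllers gets two different Linux IRQ numbers, and a hwirq that was never mapped has no Linux IRQ at all, which is one way the hardware can have more interrupt lines than /proc/interrupts shows.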
  • 16. Example: i.MX6 General Power Controller: unmasked IRQs can wake up sleeping power domains.
  • 17. Threaded IRQs in RT kernel ● Compare 'ps axl | grep irq' output on RT and non-RT kernels. ● Handling IRQs as kernel threads allows priority and CPU affinity to be managed individually. ● IRQ handlers running in threads can themselves be interrupted.
  • 19. What function do threaded IRQs run?
/* request_threaded_irq - allocate an interrupt line
 * @handler: Function to be called when the IRQ occurs.
 *           Primary handler for threaded interrupts
 *           If NULL and thread_fn != NULL the default
 *           primary handler is installed
 *
 * @thread_fn: Function called from the irq handler thread
 *             If NULL, no irq thread is created
 */
Even in mainline, request_irq() = request_threaded_irq() with NULL thread_fn. EXAMPLE
  • 20. 20 ● CASE 0, indirect invocation of request_threaded_irq(): request_irq(handler) → request_threaded_irq(handler, NULL) → irq_setup_forced_threading(). Result: -- irq_default_primary_handler() runs in interrupt context. -- All it does is wake up the thread. -- Then handler runs in the irq/<name> thread. ● CASE 1, direct invocation of request_threaded_irq(handler, thread_fn). Result: -- handler runs in interrupt context. -- thread_fn runs in the irq/<name> thread.
  • 21. 21 Threaded IRQs in RT, mainline and mainline with “threadirqs” boot param ● RT: all hard-IRQ handlers that don't set IRQF_NO_THREAD run in threads. ● Mainline: only drivers that explicitly call request_threaded_irq() with a non-NULL thread_fn get a thread, and only that thread_fn runs in it. ● Mainline with the threadirqs kernel cmdline: like RT, but CPU affinity of IRQ threads cannot be set. (Related RT patches: “genirq: Force interrupt thread on RT”; “genirq: Do not invoke the affinity callback via a workqueue on RT”)
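The registration rules on the last three slides can be modeled in a few lines of user-space C. This is a sketch under simplifying assumptions, not the genirq code: `irqreturn_t` and its values mirror the kernel's, but `struct model_action`, `model_request()`, `MODEL_IRQF_NO_THREAD` and the single-threaded stand-in for the irq thread are hypothetical. The `force_threading` flag plays the role of RT or the threadirqs boot parameter, and the model ignores secondary threads (such as irq/16-s-mmc0), per-CPU and oneshot details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same values as the kernel's irqreturn_t. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1, IRQ_WAKE_THREAD = 2 } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

#define MODEL_IRQF_NO_THREAD 0x1u  /* hypothetical stand-in for IRQF_NO_THREAD */

struct model_action {
    irq_handler_t handler;     /* runs in (simulated) hard-IRQ context */
    irq_handler_t thread_fn;   /* runs in the (simulated) irq/<name> thread */
    unsigned int flags;
    bool thread_pending;       /* the thread has been woken */
};

/* Like irq_default_primary_handler(): only wake the thread. */
static irqreturn_t default_primary(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return IRQ_WAKE_THREAD;
}

/* Simplified request_threaded_irq() plus forced threading. */
static void model_request(struct model_action *a, irq_handler_t handler,
                          irq_handler_t thread_fn, unsigned int flags,
                          bool force_threading)
{
    assert(handler != NULL || thread_fn != NULL);
    a->handler = handler;
    a->thread_fn = thread_fn;
    a->flags = flags;
    a->thread_pending = false;
    if (handler == NULL) {
        a->handler = default_primary;          /* default primary handler */
    } else if (force_threading && thread_fn == NULL &&
               !(flags & MODEL_IRQF_NO_THREAD)) {
        a->thread_fn = handler;                /* CASE 0: handler moves to thread */
        a->handler = default_primary;
    }
}

/* Simulated hard-IRQ entry: run the primary handler, maybe wake the thread. */
static irqreturn_t model_hardirq(struct model_action *a, int irq, void *dev_id)
{
    irqreturn_t ret = a->handler(irq, dev_id);

    if (ret == IRQ_WAKE_THREAD && a->thread_fn != NULL)
        a->thread_pending = true;
    return ret;
}

/* Simulated irq thread: run thread_fn once per wakeup. */
static bool model_irq_thread(struct model_action *a, int irq, void *dev_id)
{
    if (!a->thread_pending)
        return false;
    a->thread_pending = false;
    a->thread_fn(irq, dev_id);
    return true;
}

/* Hypothetical driver handler that counts its invocations. */
static int driver_calls;

static irqreturn_t driver_handler(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    driver_calls++;
    return IRQ_HANDLED;
}
```

Registering `driver_handler` with `force_threading == false` runs it directly in the simulated hard-IRQ path; with `force_threading == true` the hard-IRQ path only returns IRQ_WAKE_THREAD and the driver code runs later in the simulated thread, unless MODEL_IRQF_NO_THREAD is set.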
  • 22. 22 Shared interrupts: mmc driver
● Check 'ps axl | grep irq | grep mmc':
1 0 122 2 -51 0 - S ? 0:00 [irq/16-mmc0]
1 0 123 2 -50 0 - S ? 0:00 [irq/16-s-mmc0]
● 'cat /proc/interrupts': mmc and ehci_hcd share IRQ line 16:
16: 204 IR-IO-APIC 16-fasteoi mmc0,ehci_hcd:usb3
● drivers/mmc/host/sdhci.c (sdhci_irq is the handler, sdhci_thread_irq the thread_fn):
ret = request_threaded_irq(host->irq, sdhci_irq, sdhci_thread_irq,
                           IRQF_SHARED, mmc_hostname(mmc), host);
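Reading a /proc/interrupts row like the one above programmatically makes the sharing explicit: everything after the chip name and the hwirq/flow-type field is a comma-separated list of the drivers on that line. The parser below is a hedged sketch that assumes the x86-style layout shown on this slide (`<irq>: <one count per CPU> <chip> <hwirq>-<type> <devices>`); other architectures and kernel versions print extra or differently split columns, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 8
#define MAX_DEVS 8
#define NAME_LEN 32

struct irq_row {
    int irq;
    unsigned long count[MAX_CPUS];   /* one column per CPU */
    char chip[NAME_LEN];             /* e.g. "IR-IO-APIC" */
    char hwirq[NAME_LEN];            /* e.g. "16-fasteoi": hwirq and flow type */
    char devs[MAX_DEVS][NAME_LEN];   /* drivers sharing the line */
    int ndevs;
};

/* Parse one numbered row. Returns 0 on success, -1 for rows such as "NMI:". */
static int parse_irq_row(const char *line, int ncpus, struct irq_row *r)
{
    const char *p = line;
    char *end;
    int used = 0;

    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    memset(r, 0, sizeof(*r));

    r->irq = (int)strtol(p, &end, 10);
    if (end == p || *end != ':')
        return -1;
    p = end + 1;

    for (int i = 0; i < ncpus; i++) {
        r->count[i] = strtoul(p, &end, 10);
        if (end == p)
            return -1;
        p = end;
    }

    if (sscanf(p, "%31s %31s%n", r->chip, r->hwirq, &used) != 2)
        return -1;
    p += used;

    /* Remaining text: device names separated by commas (and maybe spaces). */
    while (*p != '\0' && r->ndevs < MAX_DEVS) {
        size_t n, len;

        while (*p == ' ' || *p == ',')
            p++;
        n = strcspn(p, ",\n");
        len = n;
        while (len > 0 && isspace((unsigned char)p[len - 1]))
            len--;
        if (len == 0)
            break;
        if (len > NAME_LEN - 1)
            len = NAME_LEN - 1;
        memcpy(r->devs[r->ndevs], p, len);
        r->devs[r->ndevs][len] = '\0';
        r->ndevs++;
        p += n;
    }
    return 0;
}
```

Parsing the slide's line 16 with one CPU column yields chip "IR-IO-APIC", field "16-fasteoi", and two devices, mmc0 and ehci_hcd:usb3, confirming that the two drivers share one line.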
  • 23. Why are atomic operations more expensive (ARM)?
arch/arm/include/asm/atomic.h (pre-ARMv6 implementation; ARMv6 and later use ldrex/strex loops instead):
static inline void atomic_##op(int i, atomic_t *v)
{
        unsigned long flags;

        raw_local_irq_save(flags);
        v->counter c_op i;
        raw_local_irq_restore(flags);
}
include/linux/irqflags.h:
#define raw_local_irq_save(flags)                       \
        do {                                            \
                flags = arch_local_irq_save();          \
        } while (0)
arch/arm/include/asm/irqflags.h:
/* Save the current interrupt enable state & disable IRQs */
static inline unsigned long arch_local_irq_save(void)
{
        . . .
}
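Why save/restore rather than a plain disable/enable? Because atomic_##op() may be called from code that already has IRQs off, it must put back whatever state it found. The model below illustrates that under stated assumptions: a global boolean stands in for the CPSR I bit, and `model_irq_save()`, `model_irq_restore()` and `model_atomic_add()` are hypothetical user-space stand-ins for arch_local_irq_save(), arch_local_irq_restore() and the pre-ARMv6 atomic_add() on a uniprocessor.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the CPU's "IRQs enabled" state (the I bit in the ARM CPSR). */
static bool irqs_enabled = true;

/* Like arch_local_irq_save(): report the old state, then disable. */
static unsigned long model_irq_save(void)
{
    unsigned long flags = irqs_enabled ? 1UL : 0UL;

    irqs_enabled = false;
    return flags;
}

/* Like arch_local_irq_restore(): put back exactly what was saved. */
static void model_irq_restore(unsigned long flags)
{
    irqs_enabled = (flags != 0);
}

typedef struct { int counter; } model_atomic_t;

/* Pre-ARMv6 style atomic_add() on a uniprocessor: nothing can interleave
   with the read-modify-write while IRQs are off. */
static void model_atomic_add(int i, model_atomic_t *v)
{
    unsigned long flags = model_irq_save();

    v->counter += i;
    model_irq_restore(flags);
}

/* A caller that already runs with IRQs disabled, e.g. inside a handler. */
static void nested_caller(model_atomic_t *v)
{
    unsigned long flags = model_irq_save();

    model_atomic_add(1, v);
    assert(!irqs_enabled);   /* a plain "enable" inside atomic_add would break this */
    model_irq_restore(flags);
}
```

Every atomic operation on such a CPU therefore pays for two writes to the interrupt mask, which is part of the cost the slide title refers to.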
  • 24. 24 Introduction to softirqs
In kernel/softirq.c:
const char * const softirq_to_name[NR_SOFTIRQS] = {
        "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
        "TASKLET", "SCHED", "HRTIMER", "RCU"
};
● Tasklet interface: HI, TASKLET ● Raised by devices: NET_TX, NET_RX, BLOCK, BLOCK_IOPOLL ● Kernel housekeeping: TIMER, SCHED, HRTIMER, RCU ● BLOCK_IOPOLL: IRQ_POLL since 4.4 ● HRTIMER: gone since 4.1
In ksoftirqd, softirqs are serviced in the listed order.
  • 25. 25 What are tasklets? ● Tasklets perform deferred work not handled by other softirqs. ● Examples: crypto, USB, DMA, keyboard . . . ● More latency-sensitive drivers (sound, PCI) are part of tasklet_hi_vec. ● Any driver can create a tasklet. ● tasklet_hi_schedule() or tasklet_schedule() are called directly by the ISR.
const char * const softirq_to_name[NR_SOFTIRQS] = {
        "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
        "TASKLET", "SCHED", "HRTIMER", "RCU"
};
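One property worth knowing about tasklet_schedule() is that a tasklet already queued is not queued again, so several ISR invocations before the softirq runs collapse into one call of the tasklet function. The sketch below models that in user space. It is a simplification: a `scheduled` flag stands in for the kernel's TASKLET_STATE_SCHED bit, a single FIFO list stands in for the per-CPU tasklet_vec, locking and the disable count are omitted, and all `model_*` names plus `rx_tasklet_fn()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_tasklet {
    struct model_tasklet *next;
    bool scheduled;                    /* stands in for TASKLET_STATE_SCHED */
    void (*func)(unsigned long data);
    unsigned long data;
};

/* One FIFO queue standing in for the per-CPU tasklet_vec list. */
static struct model_tasklet *tl_head;
static struct model_tasklet **tl_tail = &tl_head;
static unsigned int softirq_raises;    /* times TASKLET_SOFTIRQ would be raised */

/* Like tasklet_schedule(): an already-queued tasklet is not queued again. */
static void model_tasklet_schedule(struct model_tasklet *t)
{
    if (t->scheduled)
        return;
    t->scheduled = true;
    t->next = NULL;
    *tl_tail = t;
    tl_tail = &t->next;
    softirq_raises++;
}

/* Like tasklet_action(): detach the queue, then run each entry once.
   'scheduled' is cleared before the call so the function may reschedule. */
static int model_tasklet_action(void)
{
    struct model_tasklet *list = tl_head;
    int ran = 0;

    tl_head = NULL;
    tl_tail = &tl_head;
    while (list != NULL) {
        struct model_tasklet *t = list;

        list = list->next;
        t->scheduled = false;
        t->func(t->data);
        ran++;
    }
    return ran;
}

/* Hypothetical driver work: accumulate 'data' each time the tasklet runs. */
static unsigned long rx_work;

static void rx_tasklet_fn(unsigned long data)
{
    rx_work += data;
}
```

Three schedule calls before the softirq runs produce one raise and one execution; scheduling again after it has run queues it afresh.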
  • 26. 26 [alison@sid ~]$ sudo mpstat -I SCPU
Linux 4.1.0-rt17+ (sid)  05/29/2016  _x86_64_  (4 CPU)
CPU  HI/s  TIMER/s  NET_TX/s  NET_RX/s  BLOCK/s  TASKLET/s  SCHED/s  HRTIMER/s  RCU/s
  0  0.03   249.84      0.00      0.11    19.96       0.43   238.75       0.68   0.00
  1  0.01   249.81      0.38      1.00    38.25       1.98   236.69       0.53   0.00
  2  0.02   249.72      0.19      0.11    53.34       3.83   233.94       1.44   0.00
  3  0.59   249.72      0.01      2.05    19.34       2.63   234.04       1.72   0.00
Linux 4.6.0+ (sid)  05/29/2016  _x86_64_  (4 CPU)
CPU  HI/s  TIMER/s  NET_TX/s  NET_RX/s  BLOCK/s  TASKLET/s  SCHED/s  HRTIMER/s  RCU/s
  0  0.26    16.13      0.20      0.33    40.90       0.73     9.18       0.00  19.04
  1  0.00     9.45      0.00      1.31    14.38       0.61     7.85       0.00  17.88
  2  0.01    15.38      0.00      0.20     0.08       0.29    13.21       0.00  16.24
  3  0.00     9.77      0.00      0.05     0.15       0.00     8.50       0.00  15.32
Linux 4.1.18-rt17-00028-g8da2a20 (vpc23)  06/04/16  _armv7l_  (2 CPU)
CPU  HI/s  TIMER/s  NET_TX/s  NET_RX/s  BLOCK/s  TASKLET/s  SCHED/s  HRTIMER/s  RCU/s
  0  0.00   999.72      0.18      9.54     0.00      89.29   191.69     261.06   0.00
  1  0.00   999.35      0.00     16.81     0.00      15.13   126.75     260.89   0.00
Linux 4.7.0 (nitrogen6x)  07/31/16  _armv7l_  (4 CPU)
CPU  HI/s  TIMER/s  NET_TX/s  NET_RX/s  BLOCK/s  TASKLET/s  SCHED/s  HRTIMER/s  RCU/s
  0  0.00     2.84      0.50     40.69     0.00       0.38     2.78       0.00   3.03
  1  0.00    89.00      0.00      0.00     0.00       0.00     0.64       0.00  46.22
  2  0.00    16.59      0.00      0.00     0.00       0.00     0.23       0.00   3.05
  3  0.00    10.22      0.00      0.00     0.00       0.00     0.25       0.00   1.45
SKIP
  • 27. 27 Two paths by which softirqs run (Related demo and sample code) ● CASE 0 (left): hard-IRQ handler raises softirq → local_bh_enable() → do_current_softirqs() (RT) or __do_softirq() ● CASE 1 (right): system management thread raises softirq → system management thread exhausts timeslice? → run_ksoftirqd() → __do_softirq()
  • 28. 28 Case 0: Run softirqs at exit of a hard-IRQ handler
RT (4.6.2-rt5): run softirqs raised in the current context.
local_bh_enable();
  __local_bh_enable();
    do_current_softirqs();
      while (current->softirqs_raised) {
              i = __ffs(current->softirqs_raised);
              do_single_softirq(i);
      }
        handle_softirq();
non-RT (4.6.2): run all pending softirqs, up to MAX_SOFTIRQ_RESTART.
local_bh_enable();
  do_softirq();
    __do_softirq();
      handle_pending_softirqs();
        while ((softirq_bit = ffs(pending)))
                handle_softirq();
EXAMPLE
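The key difference in Case 0 is whose softirqs get run: mainline drains everything pending on the CPU, while RT's do_current_softirqs() drains only the bits in current->softirqs_raised. A hedged user-space model of just that difference follows. The enum mirrors the softirq order from slide 24, but the functions, the `ran_mask` bookkeeping and `struct model_task` are hypothetical, and the restart limit, time budget and locking are left out.

```c
#include <assert.h>

enum { HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ, BLOCK_SOFTIRQ,
       BLOCK_IOPOLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ, HRTIMER_SOFTIRQ,
       RCU_SOFTIRQ, NR_SOFTIRQS };

static unsigned int cpu_pending;  /* mainline: one pending mask per CPU */
static unsigned int ran_mask;     /* which vectors the model has "run" */

struct model_task {               /* RT: each task tracks what it raised */
    unsigned int softirqs_raised;
};

/* Run every vector in mask, lowest (highest-priority) number first. */
static void run_mask(unsigned int mask)
{
    for (int nr = 0; nr < NR_SOFTIRQS; nr++)
        if (mask & (1u << nr))
            ran_mask |= 1u << nr;
}

static void mainline_raise(int nr)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    cpu_pending |= 1u << nr;
}

/* Mainline local_bh_enable() -> __do_softirq(): all pending work on this CPU. */
static void mainline_local_bh_enable(void)
{
    unsigned int pending = cpu_pending;

    cpu_pending = 0;
    run_mask(pending);
}

static void rt_raise(struct model_task *t, int nr)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    t->softirqs_raised |= 1u << nr;
}

/* RT local_bh_enable() -> do_current_softirqs(): only what 'curr' raised. */
static void rt_local_bh_enable(struct model_task *curr)
{
    unsigned int mine = curr->softirqs_raised;

    curr->softirqs_raised = 0;
    run_mask(mine);
}
```

In the mainline model a TIMER softirq raised elsewhere rides along with the eth IRQ's NET_RX; in the RT model the irq/43 thread runs only its own NET_RX and leaves TIMER with the task that raised it.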
  • 29. 29 Case 1: Scheduler runs the rest from ksoftirqd
RT (4.6.2-rt5):
run_ksoftirqd();
  do_current_softirqs()   [ where current == ksoftirqd ]
non-RT (4.6.2):
run_ksoftirqd();   (do_softirq() reaches the same function)
  __do_softirq();
    h = softirq_vec;
    while ((softirq_bit = ffs(pending))) {
            h += softirq_bit - 1;
            [ . . . ]
            h->action(h);
            [ . . . ]
            h++;
            pending >>= softirq_bit;
    }
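The pointer walk in __do_softirq() is compact enough to misread, so here is a runnable user-space copy of just that loop. It is a sketch: `softirq_vec`, `struct softirq_action` and `softirq_to_name[]` follow the kernel's names and the order from slide 24, but `model_ffs()`, `model_open_softirq()`, `record()` and `model_do_softirq()` are hypothetical, and the accounting, tracing and restart logic of the real function are omitted.

```c
#include <assert.h>
#include <string.h>

#define NR_SOFTIRQS 10

struct softirq_action {
    void (*action)(struct softirq_action *);
};

static const char *const softirq_to_name[NR_SOFTIRQS] = {
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
    "TASKLET", "SCHED", "HRTIMER", "RCU"
};

static struct softirq_action softirq_vec[NR_SOFTIRQS];
static char trace[128];   /* names of the vectors run, in order */

/* Portable stand-in for ffs(): 1-based index of the lowest set bit, 0 if none. */
static int model_ffs(unsigned int x)
{
    int bit = 1;

    if (x == 0)
        return 0;
    while (!(x & 1u)) {
        x >>= 1;
        bit++;
    }
    return bit;
}

/* Like open_softirq(): register the handler for vector nr. */
static void model_open_softirq(int nr, void (*action)(struct softirq_action *))
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    softirq_vec[nr].action = action;
}

/* Example handler: recover the vector number from the pointer and log it. */
static void record(struct softirq_action *h)
{
    int vec_nr = (int)(h - softirq_vec);

    strcat(trace, softirq_to_name[vec_nr]);
    strcat(trace, " ");
}

/* The dispatch loop from __do_softirq(), minus accounting and restarts. */
static void model_do_softirq(unsigned int pending)
{
    struct softirq_action *h = softirq_vec;
    int softirq_bit;

    while ((softirq_bit = model_ffs(pending))) {
        h += softirq_bit - 1;     /* jump to the next pending vector */
        h->action(h);
        h++;                      /* move past the vector just handled */
        pending >>= softirq_bit;  /* drop the bits already examined */
    }
}
```

With TIMER, NET_RX and RCU pending, the loop visits them in the listed order, lowest vector number first, and `h` never moves backwards within one pass.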
  • 30. 30 RT vs Mainline: entering softirq handler
4.6.2-rt5:
[ 6937.393805] e1000e_poll+0x126/0xa70 [e1000e]
[ 6937.393808] check_preemption_disabled+0xab/0x240
[ 6937.393815] net_rx_action+0x53e/0xc90
[ 6937.393824] do_current_softirqs+0x488/0xc30
[ 6937.393831] do_current_softirqs+0x5/0xc30
[ 6937.393836] __local_bh_enable+0xf2/0x1a0
[ 6937.393840] irq_forced_thread_fn+0x91/0x140
[ 6937.393845] irq_thread+0x170/0x310
[ 6937.393848] irq_finalize_oneshot.part.6+0x4f0/0x4f0
[ 6937.393853] irq_forced_thread_fn+0x140/0x140
[ 6937.393857] irq_thread_check_affinity+0xa0/0xa0
[ 6937.393862] kthread+0x12b/0x1b0
(kick off softIRQ: do_current_softirqs, __local_bh_enable; hard-IRQ handler: irq_forced_thread_fn, irq_thread, kthread)
4.7 mainline:
[11661.191187] e1000e_poll+0x126/0xa70 [e1000e]
[11661.191197] net_rx_action+0x52e/0xcd0
[11661.191206] __do_softirq+0x15c/0x5ce
[11661.191215] irq_exit+0xa3/0xd0
[11661.191222] do_IRQ+0x62/0x110
[11661.191230] common_interrupt+0x82/0x82
(kick off soft IRQ: __do_softirq, irq_exit; hard-IRQ handler: do_IRQ, common_interrupt)
SKIP
  • 31. 31 Summary of softirq execution paths ● Case 0: Behavior of local_bh_enable() differs significantly between RT and mainline kernels. ● Case 1: Behavior of ksoftirqd itself is mostly the same (note discussion of ktimersoftd below).
  • 32. 32 What is 'current'?
include/asm-generic/current.h:
#define get_current() (current_thread_info()->task)
#define current get_current()
arch/arm/include/asm/thread_info.h:
static inline struct thread_info *current_thread_info(void)
{
        return (struct thread_info *)
                (current_stack_pointer & ~(THREAD_SIZE - 1));
}
arch/x86/include/asm/thread_info.h:
static inline struct thread_info *current_thread_info(void)
{
        return (struct thread_info *)(current_top_of_stack() - THREAD_SIZE);
}
When do_current_softirqs() is called from a threaded IRQ handler, current is the threaded IRQ task.
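The ARM and x86 versions of current_thread_info() are plain address arithmetic, which a small worked example makes concrete. In this sketch the stack sizes are assumptions rather than values from the slide (8 KiB for 32-bit ARM, 16 KiB for x86_64 in 4.x kernels), the sample addresses are made up, and the helper names are hypothetical. The ARM mask trick only works because kernel stacks are allocated THREAD_SIZE-aligned.

```c
#include <assert.h>
#include <stdint.h>

/* ARM style: stacks are THREAD_SIZE-aligned, so clearing the low bits of any
   address inside the stack yields the stack base, where thread_info lives. */
static uintptr_t arm_thread_info_addr(uintptr_t sp, uintptr_t thread_size)
{
    assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
    return sp & ~(thread_size - 1);
}

/* x86 style (4.x): thread_info sits THREAD_SIZE below the top of the stack. */
static uintptr_t x86_thread_info_addr(uintptr_t top_of_stack,
                                      uintptr_t thread_size)
{
    return top_of_stack - thread_size;
}
```

With an 8 KiB stack, sp = 0xc7a0bd3c masks down to 0xc7a0a000 (0xc7a0bd3c & ~0x1fff), and so does every other sp from 0xc7a0a000 through 0xc7a0bfff, which is why any code running on that stack finds the same task.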
  • 33. 33 What is 'current'? part 2
arch/arm/include/asm/thread_info.h:
/*
 * how to get the current stack pointer in C
 */
register unsigned long current_stack_pointer asm ("sp");
arch/x86/include/asm/thread_info.h:
static inline unsigned long current_stack_pointer(void)
{
        unsigned long sp;
#ifdef CONFIG_X86_64
        asm("mov %%rsp,%0" : "=g" (sp));
#else
        asm("mov %%esp,%0" : "=g" (sp));
#endif
        return sp;
}
SKIP
  • 36. 36 Do timers, the scheduler and RCU ever run as part of do_current_softirqs()? Examples: -- every jiffy, raise_softirq_irqoff(HRTIMER_SOFTIRQ); -- scheduler_ipi() for NOHZ calls raise_softirq_irqoff(SCHED_SOFTIRQ); -- rcu_bh_qs() calls raise_softirq(RCU_SOFTIRQ). These run when ksoftirqd is current.
  • 37. 37 Demo: kprobe on do_current_softirqs() for RT kernel ● At GitHub ● Counts calls to do_current_softirqs() from ksoftirqd and from a hard-IRQ handler. ● Tested on 4.4.4-rt11 with Boundary Devices' Nitrogen i.MX6. Output showing which task 'current' is:
[ 52.841425] task->comm is ksoftirqd/1
[ 70.051424] task->comm is ksoftirqd/1
[ 70.171421] task->comm is ksoftirqd/1
[ 105.981424] task->comm is ksoftirqd/1
[ 165.260476] task->comm is irq/43-2188000.
[ 165.261406] task->comm is ksoftirqd/1
[ 225.321529] task->comm is irq/43-2188000.
  • 38. 38 Softirqs can be preempted with PREEMPT_RT
include/linux/sched.h:
struct task_struct {
        [ . . . ]
#ifdef CONFIG_PREEMPT_RT_BASE
        struct rcu_head put_rcu;
        int softirq_nestcnt;
        unsigned int softirqs_raised;
#endif
        [ . . . ]
};
  • 39. 39 RT-Linux headache: 'softirq starvation' ● ksoftirqd scarcely gets to run. ● Events that are triggered by the timer interrupt won't happen. ● Example: main event loop in userspace did not run due to missed timer ticks. ● Typical log messages: “sched: RT throttling activated” or “INFO: rcu_sched detected stalls on CPUs” Reference: “Understanding a Real-Time System” by Rostedt, slides and video
  • 40. 40 (partial) RT solution: ktimersoftd Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Date: Wed Jan 20 2016 +0100 softirq: split timer softirqs out of ksoftirqd With enough networking load it is possible that the system never goes idle and schedules ksoftirqd and everything else with a higher priority. One of the tasks left behind is one of RCU's threads and so we see stalls and eventually run out of memory. This patch moves the TIMER and HRTIMER softirqs out of the `ksoftirqd` thread into its own `ktimersoftd`. The former can now run SCHED_OTHER (same as mainline) and the latter at SCHED_FIFO due to the wakeups. [ . . . ]
  • 42. 42 ftrace produces a copious amount of output
  • 43. 43 Investigating IRQs with eBPF: bcc ● BCC - Tools for BPF-based Linux analysis ● tools/ and examples/ illustrate interfaces to kprobes and uprobes. ● BCC tools are: – a convenient way to study arbitrary infrequent events dynamically; – based on dynamic code insertion using Clang Rewriter JIT; – lightweight due to in-kernel data storage.
  • 44. 44 eBPF, IOvisor and IRQs: limitations ● JIT compiler is currently available for the x86-64, arm64, and s390 architectures. ● No stack traces unless CONFIG_FRAME_POINTER=y ● Requires recent kernel, LLVM and Clang ● bcc/src/cc/export/helpers.h:
#ifdef __powerpc__
[ . . . ]
#elif defined(__x86_64__)
[ . . . ]
#else
#error "bcc does not support this platform yet"
#endif
  • 45. 45 bcc tips ● Kernel source must be present on the host where the probe runs. ● /lib/modules/$(uname -r)/build/include/generated must exist. ● To switch between kernel branches and continue quickly using bcc: – run 'make mrproper; make config; make' – 'make' only needs to populate include/generated in the kernel source before bcc becomes usable again. – run 'make headers_install' as a non-root user SKIP
  • 46. 46 Get latest version of clang by compiling from source (or from Debian Sid)
$ git clone http://llvm.org/git/llvm.git
$ cd llvm/tools
$ git clone --depth 1 http://llvm.org/git/clang.git
$ cd ..; mkdir build; cd build
$ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86"
$ make -j $(getconf _NPROCESSORS_ONLN)
(from samples/bpf/README.rst) SKIP
  • 47. 47 Example: NAPI: changing the bottom half (Images: O. Quincel, own work, CC BY-SA 4.0; McSmit, own work, CC BY-SA 3.0)
  • 48. 48 Quick NAPI refresher The problem: “High-speed networking can create thousands of interrupts per second, all of which tell the system something it already knew: it has lots of packets to process.” The solution: “Interrupt mitigation . . . NAPI allows drivers to run with (some) interrupts disabled during times of high traffic, with a corresponding decrease in system load.” The implementation: Poll the driver and drop packets without processing in the NIC if the polling frequency necessitates. net/core/dev.c in RT
  • 49. 49 Example: i.MX6 FEC RGMII NAPI turn-on
static irqreturn_t fec_enet_interrupt(int irq, void *dev_id)
{
        [ . . . ]
        if ((fep->work_tx || fep->work_rx) && fep->link) {
                if (napi_schedule_prep(&fep->napi)) {
                        /* Disable the NAPI interrupts */
                        writel(FEC_ENET_MII, fep->hwp + FEC_IMASK);
                        __napi_schedule(&fep->napi);
                }
        }
        [ . . . ]
}
(This handler is what irq_forced_thread_fn() runs for irq/43. Back to threaded IRQs)
  • 50. 50 Example: i.MX6 FEC RGMII NAPI turn-off
static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
{
        [ . . . ]
        pkts = fec_enet_rx(ndev, budget);
        [ . . . ]
        if (pkts < budget) {
                napi_complete(napi);
                writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
        }
        return pkts;
}
netif_napi_add(ndev, &fep->napi, fec_enet_rx_napi, NAPI_POLL_WEIGHT);
Interrupts are re-enabled when the budget is not consumed.
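The turn-on/turn-off handshake on these two slides can be simulated to show why a packet flood produces only a handful of hard IRQs. This is a hedged, single-threaded sketch, not the FEC driver: `struct model_nic` and the `model_*` functions are hypothetical, a boolean stands in for the RX bits of FEC_IMASK, and the budget of 64 assumes NAPI_POLL_WEIGHT's usual value.

```c
#include <assert.h>
#include <stdbool.h>

struct model_nic {
    int rx_queue;          /* packets waiting in the RX ring */
    bool rx_irq_enabled;   /* stands in for the RX bits of FEC_IMASK */
    bool napi_scheduled;   /* stands in for NAPI_STATE_SCHED */
    int hard_irqs;         /* hard interrupts actually delivered */
};

/* A packet arrives; an interrupt fires only if RX interrupts are unmasked. */
static void model_packet_arrives(struct model_nic *n)
{
    n->rx_queue++;
    if (!n->rx_irq_enabled)
        return;                      /* masked: the packet waits for the poll */
    n->hard_irqs++;
    /* like fec_enet_interrupt(): napi_schedule_prep(), mask, __napi_schedule() */
    if (!n->napi_scheduled) {
        n->napi_scheduled = true;
        n->rx_irq_enabled = false;
    }
}

/* Like fec_enet_rx_napi(): process up to budget packets, return the count. */
static int model_poll(struct model_nic *n, int budget)
{
    int pkts;

    assert(budget > 0);
    pkts = n->rx_queue < budget ? n->rx_queue : budget;
    n->rx_queue -= pkts;
    if (pkts < budget) {
        /* like napi_complete() + writel(FEC_DEFAULT_IMASK, ...) */
        n->napi_scheduled = false;
        n->rx_irq_enabled = true;
    }
    return pkts;
}
```

A budget-exhausting poll keeps the device in polling mode; the first poll that finds less work than its budget re-enables interrupts, mirroring the `pkts < budget` test in fec_enet_rx_napi().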
  • 51. Using existing tracepoints ● function_graph tracing causes a lot of overhead. ● How about the napi_poll tracepoint in /sys/kernel/debug/tracing/events/napi? – Fires constantly with any network traffic. – Displays no obvious change in behavior when the eth IRQ is disabled and polling starts.
  • 52. 52 The Much Easier Way: BCC on x86_64 with 4.6.2-rt5 and Clang-3.8
  • 53. 53 Handling Eth IRQs in ksoftirqd on x86_64, but NAPI? (counts, 4.6.2-rt5)
root $ ./stackcount.py e1000_receive_skb
Tracing 1 functions for "e1000_receive_skb"
^C
  e1000_receive_skb
  e1000e_poll
  net_rx_action
  do_current_softirqs
  run_ksoftirqd
  smpboot_thread_fn
  kthread
  ret_from_fork
    1        (running from ksoftirqd, not from the hard-IRQ handler)
  e1000_receive_skb
  e1000e_poll
  net_rx_action
  do_current_softirqs
  __local_bh_enable
  irq_forced_thread_fn
  irq_thread
  kthread
  ret_from_fork
    26469    (normal behavior: the packet handler runs immediately after the eth IRQ, in its context)
  • 54. 54 Switch to NAPI on x86_64
[alison@sid]$ sudo modprobe kp_ksoft eth_irq_procid=1
[ ] __raise_softirq_irqoff_ksoft: 582 hits
[ ] kprobe at ffffffff81100920 unregistered
[alison@sid]$ sudo ./stacksnoop.py __raise_softirq_irqoff_ksoft
144.803096056 __raise_softirq_irqoff_ksoft
  ffffffff81100921 __raise_softirq_irqoff_ksoft
  ffffffff810feda9 do_current_softirqs
  ffffffff810ffeae run_ksoftirqd
  ffffffff8114d255 smpboot_thread_fn
  ffffffff81144a99 kthread
  ffffffff8205ed82 ret_from_fork
  • 55. 55 Same Experiment, but non-RT 4.6.2
Most frequent:
  e1000_receive_skb
  e1000e_poll
  net_rx_action
  __softirqentry_text_start
  irq_exit
  do_IRQ
  ret_from_intr
  cpuidle_enter
  call_cpuidle
  cpu_startup_entry
  start_secondary
    1016045
Run in ksoftirqd:
  e1000_receive_skb
  e1000e_poll
  net_rx_action
  __softirqentry_text_start
  run_ksoftirqd
  smpboot_thread_fn
  kthread
  ret_from_fork
    1162
At least 70 other call stacks observed in a few seconds. SKIP
  • 56. 56 Due to handle_pending_softirqs(), any hard IRQ can run before a given softirq (non-RT 4.6.2)
  e1000_receive_skb
  e1000e_poll
  net_rx_action
  __softirqentry_text_start
  irq_exit
  do_IRQ
  ret_from_intr
  pipe_write
  __vfs_write
  vfs_write
  sys_write
  entry_SYSCALL_64_fastpath
    357
  e1000_receive_skb
  e1000e_poll
  net_rx_action
  __softirqentry_text_start
  irq_exit
  do_IRQ
  ret_from_intr
  __alloc_pages_nodemask
  alloc_pages_vma
  handle_pte_fault
  handle_mm_fault
  __do_page_fault
  do_page_fault
  page_fault
    366
  • 57. 57 Same Experiment, but 4.6.2 with 'threadirqs' boot param
  e1000_receive_skb
  e1000e_poll
  net_rx_action
  __softirqentry_text_start
  do_softirq_own_stack
  do_softirq.part.16
  __local_bh_enable_ip
  irq_forced_thread_fn
  irq_thread
  kthread
  ret_from_fork
    569174
With 'threadirqs' cmdline parameter at boot. Note: no do_current_softirqs()
  • 59. 59 Documentation/kprobes.txt “In general, you can install a probe anywhere in the kernel. In particular, you can probe interrupt handlers.” Takeaway: not limited to existing tracepoints!
  • 60. 60 Not quite anywhere (mainline stable 4.6.2)
root@nitrogen6x:~# insmod 4.6.2/kp_raise_softirq_irqoff.ko
[ 1749.935955] Planted kprobe at 8012c1b4
[ 1749.936088] Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP ARM
[ 1749.936109] Modules linked in: kp_raise_softirq_irqoff(+)
[ 1749.936116] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.2
[ 1749.936119] Hardware name: Freescale i.MX6 Quad/DualLite
[ 1749.936131] PC is at __raise_softirq_irqoff+0x0/0xf0
[ 1749.936144] LR is at __napi_schedule+0x5c/0x7c
[ 1749.936766] Kernel panic - not syncing: Fatal exception in interrupt
  • 61. 61 Adapt samples/kprobes/kprobe_example.c (code at Github)
/* For each probe you need to allocate a kprobe structure */
static struct kprobe kp = {
        .symbol_name = "__raise_softirq_irqoff_ksoft",  /* called in net/core/dev.c */
};

/* kprobe post_handler: called after the probed instruction is executed */
static void handler_post(struct kprobe *p, struct pt_regs *regs,
                         unsigned long flags)
{
        unsigned int id = smp_processor_id();

        /* change id to that where the eth IRQ is pinned */
        if (id == 0) {
                pr_info("Switched to ethernet NAPI.\n");
                pr_info("post_handler: p->addr = 0x%p, pc = 0x%lx,"
                        " lr = 0x%lx, cpsr = 0x%lx\n",
                        p->addr, regs->ARM_pc, regs->ARM_lr, regs->ARM_cpsr);
        }
}
  • 62. 62 Watching net_rx_action() switch to NAPI
alison@laptop:~# make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- samples/kprobes/ modules
root@nitrogen6x:~# modprobe kp_ksoft.ko eth_proc_id=1
root@nitrogen6x:~# dmesg | tail
[ 6548.644584] Planted kprobe at 80033440
root@nitrogen6x:~# dmesg | grep post_handler
root@nitrogen6x:~#
. . . . . Start DOS attack . . . Wait 15 seconds . . . .
root@nitrogen6x:~# dmesg | tail
[ 6548.644584] Planted kprobe at 80033440
[ 6617.858101] pre_handler: p->addr = 0x80033440, pc = 0x80033444, lr = 0x80605ff0, cpsr = 0x20070193
[ 6617.858104] Switched to ethernet NAPI.
  • 63. 63 Another example of output
Insert/remove two probes during packet storm:
root@nitrogen6x:~# modprobe -r kp_ksoft
[ 232.471922] __raise_softirq_irqoff_ksoft: 14 hits
[ 232.471922] kprobe at 80033440 unregistered
root@nitrogen6x:~# modprobe -r kp_napi_complete
[ 287.225318] napi_complete_done: 1893005 hits
[ 287.262011] kprobe at 80605cc0 unregistered
  • 64. 64 Counting activation of two softirq execution paths (show you the codez)
static struct kprobe kp = {
        .symbol_name = "do_current_softirqs",
};
[ . . . ]
if (raised == NET_RX_SOFTIRQ) {
        ti = current_thread_info();
        task = ti->task;
        if (chatty)
                pr_debug("task->comm is %s\n", task->comm);
        if (strstr(task->comm, "ksoftirq"))
                p->ksoftirqd_count++;
        if (strstr(task->comm, "irq/"))
                p->local_bh_enable_count++;
}
● 'modprobe kp_do_current_softirqs chatty=1' produced the results included previously. ● Counters are stored in struct kprobe{}.
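The classification step in the probe is plain string matching on current->comm, so it can be checked offline against the task names from the earlier demo output. A minimal sketch, assuming the same two substrings the probe uses; `struct softirq_path_counts` and `count_path()` are hypothetical names. The test relies only on kernel-thread naming conventions (ksoftirqd/N, irq/<nr>-<name>), not on any kernel API.

```c
#include <assert.h>
#include <string.h>

struct softirq_path_counts {
    unsigned long ksoftirqd_count;        /* Case 1: run from ksoftirqd */
    unsigned long local_bh_enable_count;  /* Case 0: run from an irq/<n> thread */
};

/* Classify one NET_RX do_current_softirqs() call by current->comm. */
static void count_path(struct softirq_path_counts *c, const char *comm)
{
    assert(c != NULL && comm != NULL);
    if (strstr(comm, "ksoftirq"))
        c->ksoftirqd_count++;
    if (strstr(comm, "irq/"))             /* "ksoftirqd/1" does not match */
        c->local_bh_enable_count++;
}
```

Feeding it the seven comm values from the demo output gives five ksoftirqd hits and two irq/43 hits.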
  • 65. 65 Summary ● IRQ handling involves a 'hard', fast part or 'top half' and a 'soft', slower part or 'bottom half.' ● Hard IRQs include arch-dependent system features plus software-generated IPIs. ● Soft IRQs may run directly after the hard IRQ that raises them, or at a later time in ksoftirqd. ● Threaded, preemptible IRQs are a salient feature of RT Linux. ● The management of IRQs, as illustrated by NAPI's response to DOS, remains challenging. ● If you can use bcc and eBPF, you should be!
  • 66. 66 Acknowledgements Thanks to Sebastian Siewior, Brenden Blanco, Brendan Gregg, Steven Rostedt and Dave Anders for advice and inspiration. Special thanks to Joel Fernandes and Sarah Newman for detailed feedback on an earlier version.
  • 67. 67 Useful Resources ● NAPI docs ● Documentation/kernel-per-CPU-kthreads ● Documentation/DocBook/genericirq.pdf ● Brendan Gregg's blog ● Tasklets and softirqs discussion at KLDP wiki ● #iovisor at OFTC IRC ● Alexei Starovoitov's 2015 LLVM Microconf slides
  • 69. 69 Softirqs that don't run in the context of hard-IRQ handlers run “on behalf of ksoftirqd”
static inline void ksoftirqd_set_sched_params(unsigned int cpu)
{
        /* Take over all but timer pending softirqs when starting */
        local_irq_disable();
        current->softirqs_raised = local_softirq_pending() & ~TIMER_SOFTIRQS;
        local_irq_enable();
}

static struct smp_hotplug_thread softirq_threads = {
        .store                  = &ksoftirqd,
        .setup                  = ksoftirqd_set_sched_params,
        .thread_should_run      = ksoftirqd_should_run,
        .thread_fn              = run_ksoftirqd,
        .thread_comm            = "ksoftirqd/%u",
};
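The `& ~TIMER_SOFTIRQS` in ksoftirqd_set_sched_params() is a bitmask split between ksoftirqd and ktimersoftd. The sketch below works it through with the softirq numbering from slide 24. Defining TIMER_SOFTIRQS as the TIMER and HRTIMER bits is an assumption based on the ktimersoftd commit message quoted earlier, and the helper names are hypothetical.

```c
#include <assert.h>

enum { HI_SOFTIRQ = 0, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
       BLOCK_SOFTIRQ, BLOCK_IOPOLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
       HRTIMER_SOFTIRQ, RCU_SOFTIRQ, NR_SOFTIRQS };

/* Assumption: the timer softirqs that ktimersoftd takes over. */
#define TIMER_SOFTIRQS ((1u << TIMER_SOFTIRQ) | (1u << HRTIMER_SOFTIRQ))

/* What ksoftirqd adopts at startup: everything pending except timers. */
static unsigned int ksoftirqd_takeover(unsigned int pending)
{
    return pending & ~TIMER_SOFTIRQS;
}

/* The complement, left for ktimersoftd. */
static unsigned int ktimersoftd_share(unsigned int pending)
{
    return pending & TIMER_SOFTIRQS;
}
```

With TIMER, NET_RX and HRTIMER pending (0x10a), ksoftirqd adopts only NET_RX (0x008) and leaves 0x102 for ktimersoftd; the two shares never overlap and together cover everything pending.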
  • 70. 70 Compare output to source with GDB
[alison@hildesheim linux-4.4.4 (trace_napi)]$ arm-linux-gnueabihf-gdb vmlinux
(gdb) p *(__raise_softirq_irqoff_ksoft)
$1 = {void (unsigned int)} 0x80033440 <__raise_softirq_irqoff_ksoft>
(gdb) l *(0x80605ff0)
0x80605ff0 is in net_rx_action (net/core/dev.c:4968).
4963            list_splice_tail(&repoll, &list);
4964            list_splice(&list, &sd->poll_list);
4965            if (!list_empty(&sd->poll_list))
4966                    __raise_softirq_irqoff_ksoft(NET_RX_SOFTIRQ);
4967
4968            net_rps_action_and_irq_enable(sd);
4969    }