©2019 VMware, Inc.
ftrace
Where modifying a running
kernel all started!
Steven Rostedt
Open Source Engineer
rostedt@goodmis.org / srostedt@vmware.com
Ftrace Function hooks
●
Allows attaching to a function in the kernel
– Function Tracer
– Function Graph Tracer
– Perf
– Stack Tracer
– Kprobes
– SystemTap
– Pstore
Function Tracing
# cd /sys/kernel/tracing
# echo function > current_tracer
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 159693/4101675 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# ||||/ delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
cat-3432 [002] d...... 60071.538270: __rcu_read_unlock <-__is_insn_slot_addr
cat-3432 [002] d...... 60071.538270: is_bpf_text_address <-kernel_text_address
cat-3432 [002] d...... 60071.538270: __rcu_read_lock <-is_bpf_text_address
cat-3432 [002] d...... 60071.538271: bpf_prog_kallsyms_find <-is_bpf_text_address
cat-3432 [002] d...... 60071.538271: __rcu_read_unlock <-is_bpf_text_address
cat-3432 [002] d...... 60071.538271: init_object <-alloc_debug_processing
cat-3432 [002] d...... 60071.538271: deactivate_slab.isra.74 <-___slab_alloc
cat-3432 [002] d...... 60071.538272: preempt_count_add <-deactivate_slab.isra.74
cat-3432 [002] d...1.. 60071.538272: preempt_count_sub <-deactivate_slab.isra.74
cat-3432 [002] d...... 60071.538272: preempt_count_add <-deactivate_slab.isra.74
cat-3432 [002] d...1.. 60071.538272: preempt_count_sub <-deactivate_slab.isra.74
cat-3432 [002] d...... 60071.538273: preempt_count_add <-deactivate_slab.isra.74
cat-3432 [002] d...1.. 60071.538273: preempt_count_sub <-deactivate_slab.isra.74
cat-3432 [002] d...... 60071.538273: _raw_spin_lock <-deactivate_slab.isra.74
cat-3432 [002] d...... 60071.538273: preempt_count_add <-_raw_spin_lock
cat-3432 [002] d...1.. 60071.538273: do_raw_spin_trylock <-_raw_spin_lock
Function Graph Tracing
# cd /sys/kernel/tracing
# echo function_graph > current_tracer
# cat trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
3) 0.868 us | } /* rt_spin_lock_slowlock_locked */
3) | _raw_spin_unlock_irqrestore() {
3) 0.294 us | do_raw_spin_unlock();
3) 0.374 us | preempt_count_sub();
3) 1.542 us | }
3) 0.198 us | put_pid();
3) 5.727 us | } /* rt_spin_lock_slowlock */
3) + 18.867 us | } /* rt_spin_lock */
3) | rt_spin_unlock() {
3) | rt_mutex_futex_unlock() {
3) | _raw_spin_lock_irqsave() {
3) 0.224 us | preempt_count_add();
3) 0.376 us | do_raw_spin_trylock();
3) 1.767 us | }
3) 0.264 us | __rt_mutex_unlock_common();
3) | _raw_spin_unlock_irqrestore() {
3) 0.278 us | do_raw_spin_unlock();
3) 0.249 us | preempt_count_sub();
3) 1.421 us | }
3) 4.565 us | }
3) | migrate_enable() {
3) 0.275 us | preempt_count_add();
Dynamic Function Tracing
# cd /sys/kernel/tracing
# echo '*sched*' > set_ftrace_filter
# echo function > current_tracer
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 35104/35104 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# ||||/ delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
bash-1294 [000] d..h... 60276.948739: tick_sched_timer <-__hrtimer_run_queues
bash-1294 [000] d..h... 60276.948741: tick_sched_do_timer <-tick_sched_timer
bash-1294 [000] d..h... 60276.948743: tick_sched_handle <-tick_sched_timer
bash-1294 [000] d..h... 60276.948745: rcu_sched_clock_irq <-update_process_times
bash-1294 [000] d..h... 60276.948745: scheduler_tick <-update_process_times
bash-1294 [000] d...2.. 60276.948754: resched_curr_lazy <-check_preempt_wakeup
bash-1294 [000] d.L.... 60276.948756: preempt_schedule_irq <-restore_regs_and_return_to_kernel
ksoftirqd/0-9 [000] ....... 60276.948769: schedule <-smpboot_thread_fn
bash-1294 [000] d...311 60276.948908: resched_curr <-check_preempt_curr
bash-1294 [000] d...311 60276.948908: native_smp_send_reschedule <-check_preempt_curr
<idle>-0 [003] dn..1.. 60276.948922: smp_reschedule_interrupt <-reschedule_interrupt
<idle>-0 [003] dn..1.. 60276.948923: scheduler_ipi <-reschedule_interrupt
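The '*sched*' filter used above is a glob over kernel function names. A minimal user-space sketch of that selection, assuming POSIX fnmatch() as a stand-in for the kernel's own pattern matcher (filter_matches and count_matches are hypothetical helper names, not ftrace APIs):

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `name` is selected by the glob `pattern`. */
static int filter_matches(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);
    return fnmatch(pattern, name, 0) == 0;
}

/* Counts how many of the `n` names the pattern selects. */
static size_t count_matches(const char *pattern, const char *const *names, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        hits += (size_t)filter_matches(pattern, names[i]);
    return hits;
}
```

With the pattern '*sched*', names such as tick_sched_timer and resched_curr are selected, while vfs_llseek is not.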
How does it work?
●
gcc’s profiler option: -pg
– Adds a special “mcount” call to all non-inlined functions
– mcount is a trampoline to jump to C code
– All non-inlined functions call mcount near the beginning (after frame setup)
– Requires frame pointers
●
x86 now only uses: -pg -mfentry
– Adds a special “__fentry__” call to all non-inlined functions
– __fentry__ is also a trampoline to jump to C code
– All non-inlined functions call __fentry__ at the beginning of the function
– No need to have frame pointers
A Function Call
asmlinkage __visible void __sched schedule(void)
{
struct task_struct *tsk = current;
sched_submit_work(tsk);
do {
preempt_disable();
__schedule(false);
sched_preempt_enable_no_resched();
} while (need_resched());
sched_update_worker(tsk);
}
WARNING!
The following slides
may not be suitable for
some audiences
WARNING!
The next slide contains
ASSEMBLY!
Disassembled Function Call
<schedule>:
53 push %rbx
65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx
01 00
ffffffff819dbce6: R_X86_64_32S current_task
48 8b 43 10 mov 0x10(%rbx),%rax
48 85 c0 test %rax,%rax
74 10 je ffffffff819dbd03 <schedule+0x23>
f6 43 24 20 testb $0x20,0x24(%rbx)
75 49 jne ffffffff819dbd42 <schedule+0x62>
48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx)
00
74 1f je ffffffff819dbd22 <schedule+0x42>
31 ff xor %edi,%edi
e8 a6 f8 ff ff callq ffffffff819db5b0 <__schedule>
65 48 8b 04 25 00 61 mov %gs:0x16100,%rax
01 00
Disassembled Function Call
<schedule>:
e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__>
ffffffff81a14911: R_X86_64_PLT32 __fentry__-0x4
53 push %rbx
65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx
01 00
ffffffff81a1491b: R_X86_64_32S current_task
48 8b 43 10 mov 0x10(%rbx),%rax
48 85 c0 test %rax,%rax
74 10 je ffffffff81a14938 <schedule+0x28>
f6 43 24 20 testb $0x20,0x24(%rbx)
75 49 jne ffffffff81a14977 <schedule+0x67>
48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx)
00
74 1f je ffffffff81a14957 <schedule+0x47>
31 ff xor %edi,%edi
e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule>
65 48 8b 04 25 00 61 mov %gs:0x16100,%rax
01 00
With -pg -mfentry options
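The call bytes in the listing can be checked by hand: opcode e8 is followed by a 32-bit little-endian displacement measured from the end of the 5-byte instruction. A small sketch of that encoding, assuming schedule starts at 0xffffffff81a14910 (one byte before the relocation address shown above); encode_call is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CALL_INSN_LEN = 5 };

/*
 * Encode "call target" for an instruction located at address ip:
 * opcode e8, then a little-endian rel32 relative to the next instruction.
 * Returns 0 on success, -1 if the target is out of rel32 range.
 */
static int encode_call(uint64_t ip, uint64_t target, uint8_t out[CALL_INSN_LEN])
{
    int64_t rel = (int64_t)(target - (ip + CALL_INSN_LEN));

    assert(out != NULL);
    if (rel < INT32_MIN || rel > INT32_MAX)
        return -1;

    uint32_t disp = (uint32_t)(int32_t)rel;

    out[0] = 0xe8;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(disp >> (8 * i));
    return 0;
}
```

Encoding a call from 0xffffffff81a14910 to 0xffffffff81c01930 gives a displacement of 0x1ed01b, which is the byte sequence e8 1b d0 1e 00 shown in the listing.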
At Kernel Boot Up
<schedule>:
callq <__fentry__>
push %rbx
<__fentry__>:
retq
Where are all the __fentry__ callers?
Can’t just leave them there

Too much overhead

Just calling and doing a return adds 13% overhead!
Need to convert them to nops at boot up
Need to know where they are
Best to find them at compile time!
recordmcount
scripts/recordmcount.c (and there’s a perl version too!)
Reads the object files one at a time
Reads the relocation tables

Finds all the calls to __fentry__

Creates a table (array)

Links them back into the object file

New section called __mcount_loc
– Even for __fentry__ locations
gcc 5 added -mrecord-mcount (to do this for us)
recordmcount (kernel/sched/core.o)
<__mcount_loc>:
&schedule
&yield
&preempt_schedule_common
&_cond_resched
&schedule_idle
<schedule>:
callq <__fentry__>
[..]
<yield>:
callq <__fentry__>
[..]
<preempt_schedule_common>:
callq <__fentry__>
[..]
<_cond_resched>:
callq <__fentry__>
[..]
<schedule_idle>:
callq <__fentry__>
[..]
Linker Magic!
vmlinux.lds

include/asm-generic/vmlinux.lds.h
Magic Variables

__start_mcount_loc

__stop_mcount_loc
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
#ifdef CC_USING_PATCHABLE_FUNCTION_ENTRY
#define MCOUNT_REC() . = ALIGN(8); \
__start_mcount_loc = .; \
KEEP(*(__patchable_function_entries)) \
__stop_mcount_loc = .;
#else
#define MCOUNT_REC() . = ALIGN(8); \
__start_mcount_loc = .; \
KEEP(*(__mcount_loc)) \
__stop_mcount_loc = .;
#endif
#else
#define MCOUNT_REC()
#endif
The CC_USING_PATCHABLE_FUNCTION_ENTRY variant is used by the parisc architecture.
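The same start/stop idea can be tried in user space. The sketch below assumes a GNU-style ELF toolchain (GCC or Clang with GNU ld or lld), where the linker provides __start_NAME and __stop_NAME symbols for any section whose name is a valid C identifier. The section name sketch_mcount_loc, the RECORD_SITE macro, and the stub functions are all hypothetical, and the table stores function addresses rather than real call-site addresses:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*site_fn)(void);

/* Hypothetical macro: record `fn` in the "sketch_mcount_loc" section. */
#define RECORD_SITE(fn)                                              \
    static const site_fn sketch_loc_##fn                             \
        __attribute__((used, section("sketch_mcount_loc"))) = fn

/* Stand-ins for traced functions; the bodies differ so they stay distinct. */
static int schedule_stub(void)      { return 1; }
static int yield_stub(void)         { return 2; }
static int schedule_idle_stub(void) { return 3; }
static int untraced_stub(void)      { return 4; }   /* never recorded */

RECORD_SITE(schedule_stub);
RECORD_SITE(yield_stub);
RECORD_SITE(schedule_idle_stub);

/* Provided by the linker; they bracket everything placed in the section. */
extern const site_fn __start_sketch_mcount_loc[];
extern const site_fn __stop_sketch_mcount_loc[];

/* Number of recorded entries between the two linker symbols. */
static size_t recorded_sites(void)
{
    return (size_t)(__stop_sketch_mcount_loc - __start_sketch_mcount_loc);
}

/* Linear scan of the table, as boot code could do before sorting it. */
static int site_is_recorded(site_fn fn)
{
    assert(fn != NULL);
    for (const site_fn *p = __start_sketch_mcount_loc;
         p < __stop_sketch_mcount_loc; p++)
        if (*p == fn)
            return 1;
    return 0;
}
```

The linker concatenates the entries from every object file into one array, so code can walk from the start symbol to the stop symbol without knowing how many entries exist.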
Linker Magic
vmlinux:
<__mcount_loc>:
&schedule
&yield
&preempt_schedule_common
&_cond_resched
&schedule_idle
<__mcount_loc>:
&__put_page
&put_pages_list
&__activate_page
&activate_page
&lru_cache_add
<__mcount_loc>:
&vfs_llseek
&default_llseek
&new_sync_read
&new_sync_write
&__vfs_write
kernel/sched/core.o:
mm/swap.o:
fs/read_write.o:
Linker Magic
<__start_mcount_loc>:
&schedule
&yield
&preempt_schedule_common
&_cond_resched
&schedule_idle
&__put_page
&put_pages_list
&__activate_page
&activate_page
&lru_cache_add
&vfs_llseek
&default_llseek
&new_sync_read
&new_sync_write
&__vfs_write
[...]
<__stop_mcount_loc>:
Linker Magic
<__start_mcount_loc>:
0xffffffff81a14910
0xffffffff81a149b0
0xffffffff81a14c00
0xffffffff81a14c20
0xffffffff81a14c50
0xffffffff8126f7b0
0xffffffff8126f8f0
0xffffffff8126fcc0
0xffffffff81270440
0xffffffff81270690
0xffffffff8131f0f0
0xffffffff8131f120
0xffffffff8131fb40
0xffffffff8131fd00
0xffffffff8131fed0
[...]
<__stop_mcount_loc>:
Finding __fentry__
<schedule>:
callq <__fentry__>
[..]
<yield>:
callq <__fentry__>
[..]
<preempt_schedule_common>:
callq <__fentry__>
[..]
<_cond_resched>:
callq <__fentry__>
[..]
<schedule_idle>:
callq <__fentry__>
[..]
<__start_mcount_loc>:
[...]
<__stop_mcount_loc>:
vmlinux:
Finding __fentry__
<schedule>:
nop
[..]
<yield>:
nop
[..]
<preempt_schedule_common>:
nop
[..]
<_cond_resched>:
nop
[..]
<schedule_idle>:
nop
[..]
<__start_mcount_loc>:
[...]
<__stop_mcount_loc>:
vmlinux:
gcc 5 also added -mnop-mcount
Finding __fentry__
vmlinux:
<schedule>:
nop
[..]
<yield>:
nop
[..]
<preempt_schedule_common>:
nop
[..]
<_cond_resched>:
nop
[..]
<schedule_idle>:
nop
[..]
What about Tracing?
Need to know where to enable tracing
We threw away the __mcount_loc section
The __mcount_loc section isn’t enough for us

Tracing requires saving state
struct dyn_ftrace
struct dyn_ftrace {
unsigned long ip; /* address of mcount call-site */
unsigned long flags;
struct dyn_arch_ftrace arch;
};
arch/x86/include/asm/ftrace.h:
struct dyn_arch_ftrace {
/* No extra data needed for x86 */
};
arch/powerpc/include/asm/ftrace.h:
struct dyn_arch_ftrace {
struct module *mod;
};
Tracing data
Copy from __mcount_loc before deleting that section
Sorted for quick lookup
Allocated in groups of pages

details out of scope for this talk
Data reported at boot up
– Allocated 39,317 dyn_ftrace structures
– Used up 154 (4K) pages
– Total of 630,784 bytes of memory
$ dmesg |grep ftrace
[ 0.528844] ftrace: allocating 39317 entries in 154 pages
$ uname -r
5.1.11-200.fc29.x86_64
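The numbers above can be sanity-checked with simple arithmetic. The sketch assumes each record is 16 bytes on x86-64 (two 8-byte fields, with an empty dyn_arch_ftrace) and 4K pages; sketch_dyn_ftrace and pages_needed are hypothetical names. The real allocator hands out groups of pages, which the talk leaves out of scope, so this is only a back-of-the-envelope check:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the slide's struct on x86-64, where dyn_arch_ftrace is empty. */
struct sketch_dyn_ftrace {
    uint64_t ip;     /* address of the call site */
    uint64_t flags;
};

/* Smallest number of pages that holds `entries` records of `entry_size` bytes. */
static size_t pages_needed(size_t entries, size_t entry_size, size_t page_size)
{
    assert(page_size > 0);
    return (entries * entry_size + page_size - 1) / page_size;
}
```

39,317 records of 16 bytes come to 629,072 bytes, which rounds up to 154 pages of 4,096 bytes, or 630,784 bytes in total, in line with the boot message.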
Finding __fentry__
<schedule>:
nop
[..]
<yield>:
nop
[..]
<preempt_schedule_common>:
nop
[..]
<_cond_resched>:
nop
[..]
<schedule_idle>:
nop
[..]
<__start_mcount_loc>:
[...]
<__stop_mcount_loc>:
vmlinux: ip = 0xffffffff81a14910
flags = 0
ip = 0xffffffff81a149b0
flags = 0
ip = 0xffffffff81a14c00
flags = 0
ip = 0xffffffff81a14c20
flags = 0
ip = 0xffffffff81a14c50
flags = 0
ip = 0xffffffff8126f7b0
flags = 0
ip = 0xffffffff8126f8f0
flags = 0
ip = 0xffffffff8126fcc0
flags = 0
ip = 0xffffffff81270440
flags = 0
ip = 0xffffffff81270690
flags = 0
ip = 0xffffffff8131f0f0
flags = 0
ip = 0xffffffff8131f120
flags = 0
ip = 0xffffffff8131fb40
flags = 0
ip = 0xffffffff8131fd00
flags = 0
ip = 0xffffffff8131fed0
flags = 0
<ftrace_pages>
Finding __fentry__
ip = 0xffffffff81a14910
flags = 0
ip = 0xffffffff81a149b0
flags = 0
ip = 0xffffffff81a14c00
flags = 0
ip = 0xffffffff81a14c20
flags = 0
ip = 0xffffffff81a14c50
flags = 0
ip = 0xffffffff8126f7b0
flags = 0
ip = 0xffffffff8126f8f0
flags = 0
ip = 0xffffffff8126fcc0
flags = 0
ip = 0xffffffff81270440
flags = 0
ip = 0xffffffff81270690
flags = 0
ip = 0xffffffff8131f0f0
flags = 0
ip = 0xffffffff8131f120
flags = 0
ip = 0xffffffff8131fb40
flags = 0
ip = 0xffffffff8131fd00
flags = 0
<ftrace_pages>
# cat available_filter_functions
schedule
yield
preempt_schedule_common
_cond_resched
schedule_idle
__put_page
put_pages_list
__activate_page
activate_page
lru_cache_add
vfs_llseek
default_llseek
new_sync_read
new_sync_write
Finding __fentry__
ip = 0xffffffff81a14910
flags = 0
ip = 0xffffffff81a149b0
flags = 0
ip = 0xffffffff81a14c00
flags = 0
ip = 0xffffffff81a14c20
flags = 0
ip = 0xffffffff81a14c50
flags = 0
ip = 0xffffffff8126f7b0
flags = 0
ip = 0xffffffff8126f8f0
flags = 0
ip = 0xffffffff8126fcc0
flags = 0
ip = 0xffffffff81270440
flags = 0
ip = 0xffffffff81270690
flags = 0
ip = 0xffffffff8131f0f0
flags = 0
ip = 0xffffffff8131f120
flags = 0
ip = 0xffffffff8131fb40
flags = 0
ip = 0xffffffff8131fd00
flags = 0
<ftrace_pages>
# echo default_llseek > set_ftrace_filter
# echo schedule_idle >> set_ftrace_filter
# cat set_ftrace_filter
schedule_idle
default_llseek
dyn_ftrace.flags
Bits 0-24: Counter for number of callbacks registered to function
Bit 25: Function is being initialized and not ready to touch

module init
Bit 26: Return from callback may modify IP address

kprobe or live patching
Bit 27: Has unique trampoline and it is enabled
Bit 28: Has unique trampoline
Bit 29: Saves regs is enabled (see bit 30)
Bit 30: Needs to call ftrace_regs_caller (to save all regs like int3 does)
Bit 31: The function is being traced
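The bit layout above can be decoded with a few masks. This is a sketch only: the SKETCH_FL_* names and the two helpers are illustrative and are not taken from the kernel headers; only the bit positions follow the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names; the bit positions follow the list above. */
#define SKETCH_FL_COUNT_MASK ((UINT32_C(1) << 25) - 1)  /* bits 0-24: callback count */
#define SKETCH_FL_DISABLED   (UINT32_C(1) << 25)        /* still being initialized */
#define SKETCH_FL_IPMODIFY   (UINT32_C(1) << 26)        /* callback may modify IP */
#define SKETCH_FL_TRAMP_EN   (UINT32_C(1) << 27)        /* unique trampoline enabled */
#define SKETCH_FL_TRAMP      (UINT32_C(1) << 28)        /* has unique trampoline */
#define SKETCH_FL_REGS_EN    (UINT32_C(1) << 29)        /* saving regs is enabled */
#define SKETCH_FL_REGS       (UINT32_C(1) << 30)        /* needs ftrace_regs_caller */
#define SKETCH_FL_ENABLED    (UINT32_C(1) << 31)        /* function is being traced */

/* Number of callbacks registered to the function. */
static uint32_t flags_count(uint32_t flags)
{
    return flags & SKETCH_FL_COUNT_MASK;
}

/* Returns 1 if the given flag bit is set. */
static int flags_has(uint32_t flags, uint32_t bit)
{
    assert(bit != 0);
    return (flags & bit) != 0;
}
```

For example, a flags value of 0xe0000001 decodes to a count of 1 with bits 29, 30 and 31 set, while 0x80000001 is a count of 1 with only bit 31 set.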
Finding __fentry__
<schedule>:
nop
[..]
<yield>:
nop
[..]
<preempt_schedule_common>:
nop
[..]
<_cond_resched>:
nop
[..]
<schedule_idle>:
nop
[..]
vmlinux: ip = 0xffffffff81a14910
flags = 0
ip = 0xffffffff81a149b0
flags = 0
ip = 0xffffffff81a14c00
flags = 0
ip = 0xffffffff81a14c20
flags = 0
ip = 0xffffffff81a14c50
flags = 0
ip = 0xffffffff8126f7b0
flags = 0
ip = 0xffffffff8126f8f0
flags = 0
ip = 0xffffffff8126fcc0
flags = 0
ip = 0xffffffff81270440
flags = 0
ip = 0xffffffff81270690
flags = 0
ip = 0xffffffff8131f0f0
flags = 0
ip = 0xffffffff8131f120
flags = 0
ip = 0xffffffff8131fb40
flags = 0
ip = 0xffffffff8131fd00
flags = 0
ip = 0xffffffff8131fed0
flags = 0
<ftrace_pages>
Finding __fentry__
vmlinux: ip = 0xffffffff81a14910
flags = 0x40000001
ip = 0xffffffff81a149b0
flags = 0
ip = 0xffffffff81a14c00
flags = 0
ip = 0xffffffff81a14c20
flags = 0
ip = 0xffffffff81a14c50
flags = 0
ip = 0xffffffff8126f7b0
flags = 0x00000001
ip = 0xffffffff8126f8f0
flags = 0
ip = 0xffffffff8126fcc0
flags = 0
ip = 0xffffffff81270440
flags = 0
ip = 0xffffffff81270690
flags = 0
ip = 0xffffffff8131f0f0
flags = 0
ip = 0xffffffff8131f120
flags = 0
ip = 0xffffffff8131fb40
flags = 0
ip = 0xffffffff8131fd00
flags = 0
ip = 0xffffffff8131fed0
flags = 0
<ftrace_pages>
<schedule>:
nop
[..]
<yield>:
nop
[..]
<preempt_schedule_common>:
nop
[..]
<_cond_resched>:
nop
[..]
<schedule_idle>:
nop
[..]
bit 30
count = 1
count = 1
Finding __fentry__
<schedule>:
call ftrace_regs_caller
[..]
<yield>:
nop
[..]
<preempt_schedule_common>:
nop
[..]
<_cond_resched>:
nop
[..]
<schedule_idle>:
call ftrace_caller
[..]
vmlinux: ip = 0xffffffff81a14910
flags = 0xe0000001
ip = 0xffffffff81a149b0
flags = 0
ip = 0xffffffff81a14c00
flags = 0
ip = 0xffffffff81a14c20
flags = 0
ip = 0xffffffff81a14c50
flags = 0
ip = 0xffffffff8126f7b0
flags = 0x80000001
ip = 0xffffffff8126f8f0
flags = 0
ip = 0xffffffff8126fcc0
flags = 0
ip = 0xffffffff81270440
flags = 0
ip = 0xffffffff81270690
flags = 0
ip = 0xffffffff8131f0f0
flags = 0
ip = 0xffffffff8131f120
flags = 0
ip = 0xffffffff8131fb40
flags = 0
ip = 0xffffffff8131fd00
flags = 0
ip = 0xffffffff8131fed0
flags = 0
<ftrace_pages>
bit 29,30,31
count = 1
bit 31
count = 1
Modifying code at runtime!
Not the same as at boot up
SMP boxes need to take extra care
Other CPUs may be executing the code you change
x86 has non-uniform instructions (different sizes)
Instructions may cross cache and page boundaries
Modifying code at runtime!
<schedule>:
0f 1f 44 00 00 nop
53 push %rbx
65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx
01 00
ffffffff81a1491b: R_X86_64_32S current_task
48 8b 43 10 mov 0x10(%rbx),%rax
48 85 c0 test %rax,%rax
74 10 je ffffffff81a14938 <schedule+0x28>
f6 43 24 20 testb $0x20,0x24(%rbx)
75 49 jne ffffffff81a14977 <schedule+0x67>
48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx)
00
74 1f je ffffffff81a14957 <schedule+0x47>
31 ff xor %edi,%edi
e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule>
65 48 8b 04 25 00 61 mov %gs:0x16100,%rax
01 00
Modifying code at runtime!
<schedule>:
e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__>
53 push %rbx
65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx
01 00
ffffffff81a1491b: R_X86_64_32S current_task
48 8b 43 10 mov 0x10(%rbx),%rax
48 85 c0 test %rax,%rax
74 10 je ffffffff81a14938 <schedule+0x28>
f6 43 24 20 testb $0x20,0x24(%rbx)
75 49 jne ffffffff81a14977 <schedule+0x67>
48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx)
00
74 1f je ffffffff81a14957 <schedule+0x47>
31 ff xor %edi,%edi
e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule>
65 48 8b 04 25 00 61 mov %gs:0x16100,%rax
01 00
Modifying code at runtime!
<schedule>:
0f 1f 44 00 00
53
65 48 8b 1c 25 00 61
01 00
48 8b 43 10
48 85 c0
<schedule>:
0f 1f 44 00 00
53
65 48 8b 1c 25 00 61
01 00
48 8b 43 10
48 85 c0
CPU 0 CPU 1
Modifying code at runtime!
<schedule>:
0f 1f 44 00 00
53
65 48 8b 1c 25 00 61
01 00
48 8b 43 10
48 85 c0
<schedule>:
e8 1b d0 1e 00
53
65 48 8b 1c 25 00 61
01 00
48 8b 43 10
48 85 c0
CPU 0 CPU 1
Modifying code at runtime!
<schedule>:
0f 1f d0 1e 00
53
65 48 8b 1c 25 00 61
01 00
48 8b 43 10
48 85 c0
<schedule>:
e8 1b d0 1e 00
53
65 48 8b 1c 25 00 61
01 00
48 8b 43 10
48 85 c0
CPU 0 CPU 1
0f 1f d0 1e 00 ???
0f 1f d0 1e 00
BOOM!
CRASH!
General Protection Fault!
REBOOT!
How to go from this!
<schedule>:
0f 1f 44 00 00 nop
53 push %rbx
65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx
01 00
ffffffff81a1491b: R_X86_64_32S current_task
48 8b 43 10 mov 0x10(%rbx),%rax
48 85 c0 test %rax,%rax
74 10 je ffffffff81a14938 <schedule+0x28>
f6 43 24 20 testb $0x20,0x24(%rbx)
75 49 jne ffffffff81a14977 <schedule+0x67>
48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx)
00
74 1f je ffffffff81a14957 <schedule+0x47>
31 ff xor %edi,%edi
e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule>
65 48 8b 04 25 00 61 mov %gs:0x16100,%rax
01 00
To this?
<schedule>:
e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__>
53 push %rbx
65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx
01 00
ffffffff81a1491b: R_X86_64_32S current_task
48 8b 43 10 mov 0x10(%rbx),%rax
48 85 c0 test %rax,%rax
74 10 je ffffffff81a14938 <schedule+0x28>
f6 43 24 20 testb $0x20,0x24(%rbx)
75 49 jne ffffffff81a14977 <schedule+0x67>
48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx)
00
74 1f je ffffffff81a14957 <schedule+0x47>
31 ff xor %edi,%edi
e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule>
65 48 8b 04 25 00 61 mov %gs:0x16100,%rax
01 00
Breakpoints!
<schedule>:
0f 1f 44 00 00 nop
53 push %rbx
65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx
01 00
ffffffff81a1491b: R_X86_64_32S current_task
48 8b 43 10 mov 0x10(%rbx),%rax
48 85 c0 test %rax,%rax
74 10 je ffffffff81a14938 <schedule+0x28>
f6 43 24 20 testb $0x20,0x24(%rbx)
75 49 jne ffffffff81a14977 <schedule+0x67>
48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx)
00
74 1f je ffffffff81a14957 <schedule+0x47>
31 ff xor %edi,%edi
e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule>
65 48 8b 04 25 00 61 mov %gs:0x16100,%rax
01 00
Breakpoints!
<schedule>:
<cc> 1f 44 00 00 <int3>nop
53 push %rbx
65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx
01 00
ffffffff81a1491b: R_X86_64_32S current_task
48 8b 43 10 mov 0x10(%rbx),%rax
48 85 c0 test %rax,%rax
74 10 je ffffffff81a14938 <schedule+0x28>
f6 43 24 20 testb $0x20,0x24(%rbx)
75 49 jne ffffffff81a14977 <schedule+0x67>
48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx)
00
74 1f je ffffffff81a14957 <schedule+0x47>
31 ff xor %edi,%edi
e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule>
65 48 8b 04 25 00 61 mov %gs:0x16100,%rax
01 00
How this works
<schedule>:
<int3>nop
push %rbx
mov %gs:0x16100,%rbx
mov 0x10(%rbx),%rax
test %rax,%rax
void do_int3(struct pt_regs *regs) {
regs->ip += 5; /* skip the 5-byte call site */
return;
}
Breakpoints!
<schedule>:
<cc>1b d0 1e 00 <int3>callq ffffffff81c01930 <__fentry__>
53 push %rbx
65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx
01 00
ffffffff81a1491b: R_X86_64_32S current_task
48 8b 43 10 mov 0x10(%rbx),%rax
48 85 c0 test %rax,%rax
74 10 je ffffffff81a14938 <schedule+0x28>
f6 43 24 20 testb $0x20,0x24(%rbx)
75 49 jne ffffffff81a14977 <schedule+0x67>
48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx)
00
74 1f je ffffffff81a14957 <schedule+0x47>
31 ff xor %edi,%edi
e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule>
65 48 8b 04 25 00 61 mov %gs:0x16100,%rax
01 00
Breakpoints!
<schedule>:
e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__>
53 push %rbx
65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx
01 00
ffffffff81a1491b: R_X86_64_32S current_task
48 8b 43 10 mov 0x10(%rbx),%rax
48 85 c0 test %rax,%rax
74 10 je ffffffff81a14938 <schedule+0x28>
f6 43 24 20 testb $0x20,0x24(%rbx)
75 49 jne ffffffff81a14977 <schedule+0x67>
48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx)
00
74 1f je ffffffff81a14957 <schedule+0x47>
31 ff xor %edi,%edi
e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule>
65 48 8b 04 25 00 61 mov %gs:0x16100,%rax
01 00
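The three-step breakpoint sequence above can be modelled on a plain byte array. This is a user-space simulation only, using hypothetical names (classify, patch_with_breakpoint, patch_naive); it ignores the cross-CPU synchronization a real kernel needs between steps and only checks which byte patterns another CPU could observe:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { INSN_LEN = 5, INT3_OPCODE = 0xcc };

enum site_state { SITE_OLD, SITE_NEW, SITE_TRAP, SITE_TORN };

/* What another CPU would see if it fetched the call site right now. */
static enum site_state classify(const uint8_t *site, const uint8_t *old_insn,
                                const uint8_t *new_insn)
{
    if (site[0] == INT3_OPCODE)
        return SITE_TRAP;   /* the int3 handler skips the whole site */
    if (memcmp(site, old_insn, INSN_LEN) == 0)
        return SITE_OLD;
    if (memcmp(site, new_insn, INSN_LEN) == 0)
        return SITE_NEW;
    return SITE_TORN;       /* mix of old and new bytes */
}

/* Breakpoint sequence; returns how many intermediate states were torn. */
static int patch_with_breakpoint(uint8_t *site, const uint8_t *old_insn,
                                 const uint8_t *new_insn)
{
    int torn = 0;

    assert(classify(site, old_insn, new_insn) == SITE_OLD);

    site[0] = INT3_OPCODE;                          /* 1: trap on the first byte */
    torn += classify(site, old_insn, new_insn) == SITE_TORN;

    memcpy(site + 1, new_insn + 1, INSN_LEN - 1);   /* 2: write the new tail */
    torn += classify(site, old_insn, new_insn) == SITE_TORN;

    site[0] = new_insn[0];                          /* 3: replace the trap */
    torn += classify(site, old_insn, new_insn) == SITE_TORN;
    return torn;
}

/* Naive patch that updates the last three bytes before the first two. */
static int patch_naive(uint8_t *site, const uint8_t *old_insn,
                       const uint8_t *new_insn)
{
    int torn = 0;

    memcpy(site + 2, new_insn + 2, INSN_LEN - 2);   /* e.g. 0f 1f d0 1e 00 */
    torn += classify(site, old_insn, new_insn) == SITE_TORN;

    memcpy(site, new_insn, 2);
    torn += classify(site, old_insn, new_insn) == SITE_TORN;
    return torn;
}
```

In this model every intermediate state of the breakpoint sequence is either the old instruction, a trap, or the new instruction, whereas the naive write passes through the mixed pattern 0f 1f d0 1e 00 from the crash example.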
Registering a callback with ftrace
Call register_ftrace_function()
Takes a ftrace_ops descriptor
Static ftrace_ops (allocated at build time)

Top level ftrace tracers
– function
– function_graph
– stack tracer
– latency tracers
Dynamic ftrace_ops (allocated via kmalloc() )

perf

kprobes

ftrace instances (sub buffers)
ftrace_ops structure
struct ftrace_ops {
ftrace_func_t func;
struct ftrace_ops __rcu *next;
unsigned long flags;
void *private;
ftrace_func_t saved_func;
#ifdef CONFIG_DYNAMIC_FTRACE
struct ftrace_ops_hash local_hash;
struct ftrace_ops_hash *func_hash;
struct ftrace_ops_hash old_hash;
unsigned long trampoline;
unsigned long trampoline_size;
#endif
};
ftrace_caller trampoline
<schedule>:
callq ftrace_caller
[..]
<yield>:
nop
[..]
<preempt_schedule_common>:
nop
[..]
<_cond_resched>:
nop
[..]
<schedule_idle>:
nop
[..]
vmlinux:
<ftrace_caller>:
save_regs
load_regs
ftrace_call:
call ftrace_stub
restore_regs
ftrace_stub:
retq
ftrace_caller trampoline
<schedule>:
callq ftrace_caller
[..]
<yield>:
nop
[..]
<preempt_schedule_common>:
nop
[..]
<_cond_resched>:
nop
[..]
<schedule_idle>:
nop
[..]
vmlinux:
<ftrace_caller>:
save_regs
load_regs
ftrace_call:
call func_trace
restore_regs
ftrace_stub:
retq
void func_trace() {
/* trace */
}
ftrace_ops.func
Calling more than one callback on a function?
Direct calls to a single function are easy
Handling more than one requires a list operation
But then all functions being traced will go through a list!
ftrace_caller trampoline
<schedule>:
callq ftrace_caller
[..]
<yield>:
nop
[..]
<preempt_schedule_common>:
nop
[..]
<_cond_resched>:
nop
[..]
<schedule_idle>:
nop
[..]
vmlinux:
<ftrace_caller>:
save_regs
load_regs
ftrace_call:
call list_func
restore_regs
ftrace_stub:
retq
void list_func() {
/* iterate */
}
void func1_func() {
/* trace */
}
void func2_func() {
/* trace */
}
Multiple function callback example
Run function tracer on all functions
Run perf on just the scheduler
Multiple function callback example
Want to trace
schedule_idle()?
NO
list_func()
perf
Yes!
function tracer
Multiple function callback example
Want to trace
_cond_resched()?
NO
list_func()
perf
Yes!
function tracer
Multiple function callback example
Want to trace
yield()?
NO
list_func()
perf
Yes!
function tracer
Multiple function callback example
Want to trace
schedule()?
Yes!
list_func()
perf
Yes!
function tracer
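The decision made for each traced function in the example above can be sketched as a walk over a list of registered callbacks. This is a user-space model, not the kernel's code: struct sketch_ops, ops_wants and count_hit are hypothetical, and the filter is simplified to either "everything" or one exact function name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a registered callback descriptor. */
struct sketch_ops {
    void (*func)(const char *fn_name, struct sketch_ops *op);
    const char *filter;        /* NULL traces everything, else one exact name */
    struct sketch_ops *next;
    unsigned long hits;
};

/* Example callback: just counts how often it was invoked. */
static void count_hit(const char *fn_name, struct sketch_ops *op)
{
    (void)fn_name;
    op->hits++;
}

/* "Want to trace fn_name()?" for one registered callback. */
static int ops_wants(const struct sketch_ops *op, const char *fn_name)
{
    return op->filter == NULL || strcmp(op->filter, fn_name) == 0;
}

/* Iterate over all callbacks and call the ones whose filter matches. */
static void list_func(struct sketch_ops *head, const char *fn_name)
{
    assert(fn_name != NULL);
    for (struct sketch_ops *op = head; op != NULL; op = op->next)
        if (ops_wants(op, fn_name))
            op->func(fn_name, op);
}
```

With a function tracer that has no filter and a perf callback filtered to schedule, calls from schedule_idle, _cond_resched and yield reach only the function tracer, while a call from schedule reaches both. Every traced function still pays for the list walk, which is the cost the slides point out.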
ftrace_caller trampoline
<schedule>:
callq ftrace_caller
[..]
<yield>:
callq ftrace_caller
[..]
<preempt_schedule_common>:
callq ftrace_caller
[..]
<_cond_resched>:
callq ftrace_caller
[..]
<schedule_idle>:
callq ftrace_caller
[..]
vmlinux:
<ftrace_caller>:
save_regs
load_regs
ftrace_call:
call list_func
restore_regs
ftrace_stub:
retq
void list_func() {
/* iterate */
}
void function_trace()
{
/* function tracing */
}
void perf_func()
{
/* function
profiling */
}
ftrace_caller trampoline
<schedule>:
callq ftrace_caller
[..]
<yield>:
callq dynamic_trampoline
[..]
<preempt_schedule_common>:
callq dynamic_trampoline
[..]
<_cond_resched>:
callq dynamic_trampoline
[..]
<schedule_idle>:
callq dynamic_trampoline
[..]
vmlinux:
<ftrace_caller>:
save_regs
load_regs
ftrace_call:
call list_func
restore_regs
ftrace_stub:
retq
void list_func() {
/* iterate */
}
void function_trace()
{
/* function tracing */
}
void perf_func()
{
/* function
profiling */
}
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
Problems with dynamic trampolines
When can you free them?
How do you know they are no longer in use?
Dynamic Trampoline Problem
<schedule>:
callq dynamic_trampoline
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
Preempted!
Dynamic Trampoline Problem
<schedule>:
nop
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
Preempted!
kfree(dynamic_trampoline)
86©2019 VMware, Inc.
Dynamic Trampoline Problem
<schedule>:
nop
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
Scheduled
87©2019 VMware, Inc.
Dynamic Trampoline Problem
<schedule>:
nop
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
88©2019 VMware, Inc.
Dynamic Trampoline Problem
<schedule>:
nop
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
CRASH!
89©2019 VMware, Inc.
Problems with dynamic trampolines
When can you free them?
How do you know they are no longer in use?
90©2019 VMware, Inc.
Problems with dynamic trampolines
When can you free them?
How do you know they are no longer in use?
Use RCU!
91©2019 VMware, Inc.
call_rcu_tasks()
Added in Linux v3.18

Commit 8315f42295d2667 by Paul E. McKenney
synchronize_rcu_tasks()

Waits for all tasks to voluntarily schedule

We do not allow ftrace callbacks to schedule

The trampoline will not schedule
92©2019 VMware, Inc.
call_rcu_tasks()
Added in Linux v3.18

Commit 8315f42295d2667 by Paul E. McKenney
synchronize_rcu_tasks()

Waits for all tasks to voluntarily schedule

We do not allow ftrace callbacks to schedule

The trampoline will not schedule
Used by ftrace in v4.12
93©2019 VMware, Inc.
call_rcu_tasks()
Added in Linux v3.18

Commit 8315f42295d2667 by Paul E. McKenney
synchronize_rcu_tasks()

Waits for all tasks to voluntarily schedule

We do not allow ftrace callbacks to schedule

The trampoline will not schedule
Used by ftrace in v4.12

Yes, Steven was lazy

Added under the threat that Paul was going to remove it
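The rule "wait for all tasks to voluntarily schedule" can be sketched as a small userspace model. Everything here is an assumption made for illustration: struct task, nvcsw, defer_free() and poll_grace_period() are hypothetical names, and the real RCU-tasks machinery is more involved (for instance, this model makes every task switch once, even one that was already asleep). The point it shows: preemption does not count, only a voluntary schedule does, and a task that schedules voluntarily cannot be sitting inside a trampoline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TASKS 3

/* Hypothetical model of a task: only what the grace period needs. */
struct task {
    unsigned long nvcsw;      /* voluntary context switches so far */
    unsigned long snapshot;   /* nvcsw when the grace period started */
};

static struct task tasks[NR_TASKS];
static bool free_pending;
static bool trampoline_freed;

/* A task that calls schedule() on its own is not inside a trampoline. */
static void voluntary_schedule(struct task *t)
{
    t->nvcsw++;
}

/* Stands in for call_rcu_tasks(): record where every task is, free later. */
static void defer_free(void)
{
    assert(!free_pending);    /* the model handles one request at a time */
    for (size_t i = 0; i < NR_TASKS; i++)
        tasks[i].snapshot = tasks[i].nvcsw;
    free_pending = true;
}

/* True once every task has voluntarily scheduled since the snapshot. */
static bool grace_period_elapsed(void)
{
    for (size_t i = 0; i < NR_TASKS; i++)
        if (tasks[i].nvcsw == tasks[i].snapshot)
            return false;
    return true;
}

/* Polls for the end of the grace period and frees only when it is safe. */
static void poll_grace_period(void)
{
    if (free_pending && grace_period_elapsed()) {
        trampoline_freed = true;   /* stands in for kfree(dynamic_trampoline) */
        free_pending = false;
    }
}
```

A preempted task never bumps nvcsw, so the free stays pending until that task has run again, left the trampoline, and later called schedule() on its own.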
94©2019 VMware, Inc.
Dynamic Trampoline Solution
<schedule>:
nop
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
Preempted!
95©2019 VMware, Inc.
Dynamic Trampoline Solution
<schedule>:
nop
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
Preempted!
call_rcu_tasks(dynamic_trampoline)
96©2019 VMware, Inc.
Dynamic Trampoline Solution
<schedule>:
nop
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
Preempted!
call_rcu_tasks(dynamic_trampoline)
Waits for all tasks to voluntarily schedule
97©2019 VMware, Inc.
Dynamic Trampoline Solution
<schedule>:
nop
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
Scheduled
call_rcu_tasks(dynamic_trampoline)
Waits for all tasks to voluntarily schedule
98©2019 VMware, Inc.
Dynamic Trampoline Solution
<schedule>:
nop
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
call_rcu_tasks(dynamic_trampoline)
Waits for all tasks to voluntarily schedule
100©2019 VMware, Inc.
Dynamic Trampoline Solution
<schedule>:
nop
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
call_rcu_tasks(dynamic_trampoline)
All tasks have scheduled
101©2019 VMware, Inc.
Dynamic Trampoline Solution
<schedule>:
nop
push %rbx
mov %gs:0x16100,%rbx
vmlinux:
<dynamic_trampoline>:
save_regs
load_regs
ftrace_call:
call function_trace
restore_regs
ftrace_stub:
retq
kfree(dynamic_trampoline)
102©2019 VMware, Inc.
More uses of the function callback code
ftrace_regs_caller() gives all registers
A callback can modify any register

Needs a flag in ftrace_ops to modify the instruction pointer (ip)
103©2019 VMware, Inc.
Live Kernel Patching!
<schedule>:
callq ftrace_caller
[..]
Buggy schedule() function
<ftrace_caller>:
save_regs
load_regs
call kernel_patch
restore_regs
retq
void kernel_patch()
{
regs.ip = schedule_fix;
}
<schedule_fix>:
nop
[..]
Fixed schedule() function
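The redirect on this slide can be modeled in userspace C. This is a sketch under stated assumptions, not kernel code: struct regs_model, attached_callback, call_schedule() and schedule_buggy() are hypothetical names, and the arithmetic in the two function bodies is made up so the two versions can be told apart. The callback rewrites the saved instruction pointer, and the trampoline continues wherever that pointer says.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the saved register state; only ip is modeled. */
struct regs_model {
    int (*ip)(int);     /* where execution continues after the trampoline */
};

typedef void (*regs_callback)(struct regs_model *regs);

/* Made-up bodies so the buggy and fixed versions give different results. */
static int schedule_buggy(int x) { return x - 1; }
static int schedule_fix(int x)   { return x + 1; }

/* Models kernel_patch() on the slide: redirect the instruction pointer. */
static void kernel_patch(struct regs_model *regs)
{
    regs->ip = schedule_fix;
}

/* NULL models a call site that is still a nop. */
static regs_callback attached_callback;

/* Models entering schedule() through a register-saving trampoline. */
static int call_schedule(int x)
{
    struct regs_model regs = { .ip = schedule_buggy };  /* save_regs */

    if (attached_callback)
        attached_callback(&regs);                       /* call kernel_patch */

    assert(regs.ip != NULL);    /* a callback must leave a valid target */
    return regs.ip(x);          /* restore_regs; retq lands on regs.ip */
}
```

With no callback attached, call_schedule() runs the buggy body. After setting attached_callback = kernel_patch, the same call runs schedule_fix() instead, and the caller is unchanged.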
Thank You

More Related Content

What's hot

malloc & vmalloc in Linux
malloc & vmalloc in Linuxmalloc & vmalloc in Linux
malloc & vmalloc in LinuxAdrian Huang
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux KernelAdrian Huang
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux SystemJian-Hong Pan
 
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...Adrian Huang
 
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Anne Nicolas
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBshimosawa
 
Function Level Analysis of Linux NVMe Driver
Function Level Analysis of Linux NVMe DriverFunction Level Analysis of Linux NVMe Driver
Function Level Analysis of Linux NVMe Driver인구 강
 
COSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem portingCOSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem portingEric Lin
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixBrendan Gregg
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)Brendan Gregg
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingScyllaDB
 
Qemu device prototyping
Qemu device prototypingQemu device prototyping
Qemu device prototypingYan Vugenfirer
 
U Boot or Universal Bootloader
U Boot or Universal BootloaderU Boot or Universal Bootloader
U Boot or Universal BootloaderSatpal Parmar
 
Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021Jian-Hong Pan
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Adrian Huang
 
semaphore & mutex.pdf
semaphore & mutex.pdfsemaphore & mutex.pdf
semaphore & mutex.pdfAdrian Huang
 
Linux Preempt-RT Internals
Linux Preempt-RT InternalsLinux Preempt-RT Internals
Linux Preempt-RT Internals哲豪 康哲豪
 
Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingViller Hsiao
 
Understanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicUnderstanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicJoseph Lu
 

What's hot (20)

malloc & vmalloc in Linux
malloc & vmalloc in Linuxmalloc & vmalloc in Linux
malloc & vmalloc in Linux
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux Kernel
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux System
 
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel...
 
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
 
Function Level Analysis of Linux NVMe Driver
Function Level Analysis of Linux NVMe DriverFunction Level Analysis of Linux NVMe Driver
Function Level Analysis of Linux NVMe Driver
 
COSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem portingCOSCUP 2020 RISC-V 32 bit linux highmem porting
COSCUP 2020 RISC-V 32 bit linux highmem porting
 
Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using Tracing
 
Qemu device prototyping
Qemu device prototypingQemu device prototyping
Qemu device prototyping
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
U Boot or Universal Bootloader
U Boot or Universal BootloaderU Boot or Universal Bootloader
U Boot or Universal Bootloader
 
Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
 
semaphore & mutex.pdf
semaphore & mutex.pdfsemaphore & mutex.pdf
semaphore & mutex.pdf
 
Linux Preempt-RT Internals
Linux Preempt-RT InternalsLinux Preempt-RT Internals
Linux Preempt-RT Internals
 
Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracing
 
Understanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicUnderstanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panic
 

Similar to Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started

Post Exploitation Bliss: Loading Meterpreter on a Factory iPhone, Black Hat U...
Post Exploitation Bliss: Loading Meterpreter on a Factory iPhone, Black Hat U...Post Exploitation Bliss: Loading Meterpreter on a Factory iPhone, Black Hat U...
Post Exploitation Bliss: Loading Meterpreter on a Factory iPhone, Black Hat U...Vincenzo Iozzo
 
Re-Design with Elixir/OTP
Re-Design with Elixir/OTPRe-Design with Elixir/OTP
Re-Design with Elixir/OTPMustafa TURAN
 
Real World Lessons on the Pain Points of Node.js Applications
Real World Lessons on the Pain Points of Node.js ApplicationsReal World Lessons on the Pain Points of Node.js Applications
Real World Lessons on the Pain Points of Node.js ApplicationsBen Hall
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalabilityWim Godden
 
SaltConf14 - Ben Cane - Using SaltStack in High Availability Environments
SaltConf14 - Ben Cane - Using SaltStack in High Availability EnvironmentsSaltConf14 - Ben Cane - Using SaltStack in High Availability Environments
SaltConf14 - Ben Cane - Using SaltStack in High Availability EnvironmentsSaltStack
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
Infrastructure as Code in your CD pipelines - London Microsoft DevOps 0423
Infrastructure as Code in your CD pipelines - London Microsoft DevOps 0423Infrastructure as Code in your CD pipelines - London Microsoft DevOps 0423
Infrastructure as Code in your CD pipelines - London Microsoft DevOps 0423Giulio Vian
 
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner FischerNETWAYS
 
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner FischerNETWAYS
 
Practical Operation Automation with StackStorm
Practical Operation Automation with StackStormPractical Operation Automation with StackStorm
Practical Operation Automation with StackStormShu Sugimoto
 
Migrating KSM page causes the VM lock up as the KSM page merging list is too ...
Migrating KSM page causes the VM lock up as the KSM page merging list is too ...Migrating KSM page causes the VM lock up as the KSM page merging list is too ...
Migrating KSM page causes the VM lock up as the KSM page merging list is too ...Gavin Guo
 
Embedded Recipes 2019 - RT is about to make it to mainline. Now what?
Embedded Recipes 2019 - RT is about to make it to mainline. Now what?Embedded Recipes 2019 - RT is about to make it to mainline. Now what?
Embedded Recipes 2019 - RT is about to make it to mainline. Now what?Anne Nicolas
 
Crash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_TizenCrash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_TizenLex Yu
 
Joanna Rutkowska Subverting Vista Kernel
Joanna Rutkowska   Subverting Vista KernelJoanna Rutkowska   Subverting Vista Kernel
Joanna Rutkowska Subverting Vista Kernelguestf1a032
 
Performance Profiling in Rust
Performance Profiling in RustPerformance Profiling in Rust
Performance Profiling in RustInfluxData
 
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoringOSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoringNETWAYS
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightLinaro
 
Deployment with Fabric
Deployment with FabricDeployment with Fabric
Deployment with Fabricandymccurdy
 

Similar to Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started (20)

Post Exploitation Bliss: Loading Meterpreter on a Factory iPhone, Black Hat U...
Post Exploitation Bliss: Loading Meterpreter on a Factory iPhone, Black Hat U...Post Exploitation Bliss: Loading Meterpreter on a Factory iPhone, Black Hat U...
Post Exploitation Bliss: Loading Meterpreter on a Factory iPhone, Black Hat U...
 
Re-Design with Elixir/OTP
Re-Design with Elixir/OTPRe-Design with Elixir/OTP
Re-Design with Elixir/OTP
 
4 Sessions
4 Sessions4 Sessions
4 Sessions
 
Real World Lessons on the Pain Points of Node.js Applications
Real World Lessons on the Pain Points of Node.js ApplicationsReal World Lessons on the Pain Points of Node.js Applications
Real World Lessons on the Pain Points of Node.js Applications
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalability
 
SaltConf14 - Ben Cane - Using SaltStack in High Availability Environments
SaltConf14 - Ben Cane - Using SaltStack in High Availability EnvironmentsSaltConf14 - Ben Cane - Using SaltStack in High Availability Environments
SaltConf14 - Ben Cane - Using SaltStack in High Availability Environments
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
Cooking pies with Celery
Cooking pies with CeleryCooking pies with Celery
Cooking pies with Celery
 
Infrastructure as Code in your CD pipelines - London Microsoft DevOps 0423
Infrastructure as Code in your CD pipelines - London Microsoft DevOps 0423Infrastructure as Code in your CD pipelines - London Microsoft DevOps 0423
Infrastructure as Code in your CD pipelines - London Microsoft DevOps 0423
 
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
 
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
 
Practical Operation Automation with StackStorm
Practical Operation Automation with StackStormPractical Operation Automation with StackStorm
Practical Operation Automation with StackStorm
 
Migrating KSM page causes the VM lock up as the KSM page merging list is too ...
Migrating KSM page causes the VM lock up as the KSM page merging list is too ...Migrating KSM page causes the VM lock up as the KSM page merging list is too ...
Migrating KSM page causes the VM lock up as the KSM page merging list is too ...
 
Embedded Recipes 2019 - RT is about to make it to mainline. Now what?
Embedded Recipes 2019 - RT is about to make it to mainline. Now what?Embedded Recipes 2019 - RT is about to make it to mainline. Now what?
Embedded Recipes 2019 - RT is about to make it to mainline. Now what?
 
Crash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_TizenCrash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_Tizen
 
Joanna Rutkowska Subverting Vista Kernel
Joanna Rutkowska   Subverting Vista KernelJoanna Rutkowska   Subverting Vista Kernel
Joanna Rutkowska Subverting Vista Kernel
 
Performance Profiling in Rust
Performance Profiling in RustPerformance Profiling in Rust
Performance Profiling in Rust
 
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoringOSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with Coresight
 
Deployment with Fabric
Deployment with FabricDeployment with Fabric
Deployment with Fabric
 

More from Anne Nicolas

Kernel Recipes 2019 - Driving the industry toward upstream first
Kernel Recipes 2019 - Driving the industry toward upstream firstKernel Recipes 2019 - Driving the industry toward upstream first
Kernel Recipes 2019 - Driving the industry toward upstream firstAnne Nicolas
 
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMIKernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMIAnne Nicolas
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelKernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelAnne Nicolas
 
Kernel Recipes 2019 - Metrics are money
Kernel Recipes 2019 - Metrics are moneyKernel Recipes 2019 - Metrics are money
Kernel Recipes 2019 - Metrics are moneyAnne Nicolas
 
Kernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and futureKernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and futureAnne Nicolas
 
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...Anne Nicolas
 
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataKernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataAnne Nicolas
 
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...Anne Nicolas
 
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and BareboxEmbedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and BareboxAnne Nicolas
 
Embedded Recipes 2019 - Making embedded graphics less special
Embedded Recipes 2019 - Making embedded graphics less specialEmbedded Recipes 2019 - Making embedded graphics less special
Embedded Recipes 2019 - Making embedded graphics less specialAnne Nicolas
 
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre SiliconEmbedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre SiliconAnne Nicolas
 
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) pictureEmbedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) pictureAnne Nicolas
 
Embedded Recipes 2019 - Testing firmware the devops way
Embedded Recipes 2019 - Testing firmware the devops wayEmbedded Recipes 2019 - Testing firmware the devops way
Embedded Recipes 2019 - Testing firmware the devops wayAnne Nicolas
 
Embedded Recipes 2019 - Herd your socs become a matchmaker
Embedded Recipes 2019 - Herd your socs become a matchmakerEmbedded Recipes 2019 - Herd your socs become a matchmaker
Embedded Recipes 2019 - Herd your socs become a matchmakerAnne Nicolas
 
Embedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationEmbedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationAnne Nicolas
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingAnne Nicolas
 
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimediaEmbedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimediaAnne Nicolas
 
Kernel Recipes 2019 - Suricata and XDP
Kernel Recipes 2019 - Suricata and XDPKernel Recipes 2019 - Suricata and XDP
Kernel Recipes 2019 - Suricata and XDPAnne Nicolas
 
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Anne Nicolas
 
Kernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easyKernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easyAnne Nicolas
 

More from Anne Nicolas (20)

Kernel Recipes 2019 - Driving the industry toward upstream first
Kernel Recipes 2019 - Driving the industry toward upstream firstKernel Recipes 2019 - Driving the industry toward upstream first
Kernel Recipes 2019 - Driving the industry toward upstream first
 
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMIKernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelKernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
 
Kernel Recipes 2019 - Metrics are money
Kernel Recipes 2019 - Metrics are moneyKernel Recipes 2019 - Metrics are money
Kernel Recipes 2019 - Metrics are money
 
Kernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and futureKernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and future
 
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
 
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataKernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
 
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
 
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and BareboxEmbedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
 
Embedded Recipes 2019 - Making embedded graphics less special
Embedded Recipes 2019 - Making embedded graphics less specialEmbedded Recipes 2019 - Making embedded graphics less special
Embedded Recipes 2019 - Making embedded graphics less special
 
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre SiliconEmbedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
 
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) pictureEmbedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
 
Embedded Recipes 2019 - Testing firmware the devops way
Embedded Recipes 2019 - Testing firmware the devops wayEmbedded Recipes 2019 - Testing firmware the devops way
Embedded Recipes 2019 - Testing firmware the devops way
 
Embedded Recipes 2019 - Herd your socs become a matchmaker
Embedded Recipes 2019 - Herd your socs become a matchmakerEmbedded Recipes 2019 - Herd your socs become a matchmaker
Embedded Recipes 2019 - Herd your socs become a matchmaker
 
Embedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationEmbedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integration
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debugging
 
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimediaEmbedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
 
Kernel Recipes 2019 - Suricata and XDP
Kernel Recipes 2019 - Suricata and XDPKernel Recipes 2019 - Suricata and XDP
Kernel Recipes 2019 - Suricata and XDP
 
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
 
Kernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easyKernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easy
 

Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started

  • 1. ©2019 VMware, Inc. ftrace Where modifying a running kernel all started! Steven Rostedt Open Source Engineer rostedt@goodmis.org / srostedt@vmware.com
  • 2. Ftrace Function hooks
      ● Allows attaching to a function in the kernel
        – Function Tracer
        – Function Graph Tracer
        – Perf
        – Stack Tracer
        – Kprobes
        – SystemTap
        – Pstore
  • 3. 3©2019 VMware, Inc. Function Tracing # cd /sys/kernel/tracing # echo function > current_tracer # cat trace # tracer: function # # entries-in-buffer/entries-written: 159693/4101675 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # ||||/ delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | cat-3432 [002] d...... 60071.538270: __rcu_read_unlock <-__is_insn_slot_addr cat-3432 [002] d...... 60071.538270: is_bpf_text_address <-kernel_text_address cat-3432 [002] d...... 60071.538270: __rcu_read_lock <-is_bpf_text_address cat-3432 [002] d...... 60071.538271: bpf_prog_kallsyms_find <-is_bpf_text_address cat-3432 [002] d...... 60071.538271: __rcu_read_unlock <-is_bpf_text_address cat-3432 [002] d...... 60071.538271: init_object <-alloc_debug_processing cat-3432 [002] d...... 60071.538271: deactivate_slab.isra.74 <-___slab_alloc cat-3432 [002] d...... 60071.538272: preempt_count_add <-deactivate_slab.isra.74 cat-3432 [002] d...1.. 60071.538272: preempt_count_sub <-deactivate_slab.isra.74 cat-3432 [002] d...... 60071.538272: preempt_count_add <-deactivate_slab.isra.74 cat-3432 [002] d...1.. 60071.538272: preempt_count_sub <-deactivate_slab.isra.74 cat-3432 [002] d...... 60071.538273: preempt_count_add <-deactivate_slab.isra.74 cat-3432 [002] d...1.. 60071.538273: preempt_count_sub <-deactivate_slab.isra.74 cat-3432 [002] d...... 60071.538273: _raw_spin_lock <-deactivate_slab.isra.74 cat-3432 [002] d...... 60071.538273: preempt_count_add <-_raw_spin_lock cat-3432 [002] d...1.. 60071.538273: do_raw_spin_trylock <-_raw_spin_lock
  • 4. 4©2019 VMware, Inc. Function Graph Tracing # cd /sys/kernel/tracing # echo function_graph > current_tracer # cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 3) 0.868 us | } /* rt_spin_lock_slowlock_locked */ 3) | _raw_spin_unlock_irqrestore() { 3) 0.294 us | do_raw_spin_unlock(); 3) 0.374 us | preempt_count_sub(); 3) 1.542 us | } 3) 0.198 us | put_pid(); 3) 5.727 us | } /* rt_spin_lock_slowlock */ 3) + 18.867 us | } /* rt_spin_lock */ 3) | rt_spin_unlock() { 3) | rt_mutex_futex_unlock() { 3) | _raw_spin_lock_irqsave() { 3) 0.224 us | preempt_count_add(); 3) 0.376 us | do_raw_spin_trylock(); 3) 1.767 us | } 3) 0.264 us | __rt_mutex_unlock_common(); 3) | _raw_spin_unlock_irqrestore() { 3) 0.278 us | do_raw_spin_unlock(); 3) 0.249 us | preempt_count_sub(); 3) 1.421 us | } 3) 4.565 us | } 3) | migrate_enable() { 3) 0.275 us | preempt_count_add();
  • 5. 5©2019 VMware, Inc. Dynamic Function Tracing # cd /sys/kernel/tracing # echo '*sched*' > set_ftrace_filter # echo function > current_tracer # cat trace # tracer: function # # entries-in-buffer/entries-written: 35104/35104 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # ||||/ delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | bash-1294 [000] d..h... 60276.948739: tick_sched_timer <-__hrtimer_run_queues bash-1294 [000] d..h... 60276.948741: tick_sched_do_timer <-tick_sched_timer bash-1294 [000] d..h... 60276.948743: tick_sched_handle <-tick_sched_timer bash-1294 [000] d..h... 60276.948745: rcu_sched_clock_irq <-update_process_times bash-1294 [000] d..h... 60276.948745: scheduler_tick <-update_process_times bash-1294 [000] d...2.. 60276.948754: resched_curr_lazy <-check_preempt_wakeup bash-1294 [000] d.L.... 60276.948756: preempt_schedule_irq <-restore_regs_and_return_to_kernel ksoftirqd/0-9 [000] ....... 60276.948769: schedule <-smpboot_thread_fn bash-1294 [000] d...311 60276.948908: resched_curr <-check_preempt_curr bash-1294 [000] d...311 60276.948908: native_smp_send_reschedule <-check_preempt_curr <idle>-0 [003] dn..1.. 60276.948922: smp_reschedule_interrupt <-reschedule_interrupt <idle>-0 [003] dn..1.. 60276.948923: scheduler_ipi <-reschedule_interrupt
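The `*sched*` pattern written to set_ftrace_filter keeps only functions whose names contain "sched". The selection can be sketched in userspace as below. This is only an approximation: it assumes Python's `fnmatchcase` behaves like the kernel's own glob matcher for this simple pattern, and `apply_filter` is a hypothetical helper, not a kernel interface. The names come from the trace on this slide, plus `vfs_llseek` (which appears later in the talk) as a non-matching example.

```python
from fnmatch import fnmatchcase

# A few function names from the trace on this slide, plus one without "sched".
available = [
    "tick_sched_timer",
    "scheduler_tick",
    "resched_curr",
    "schedule",
    "vfs_llseek",
]

def apply_filter(pattern, names):
    """Return the names a glob pattern selects (rough model of set_ftrace_filter)."""
    return [name for name in names if fnmatchcase(name, pattern)]

selected = apply_filter("*sched*", available)
print(selected)
```

Only the selected functions have their call sites enabled, which is why the filtered trace reports far fewer entries written than the unfiltered one.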
  • 6. How does it work?
      ● gcc’s profiler option: -pg
        – Adds a special “mcount” call to all non-inlined functions
        – mcount is a trampoline to jump to C code
        – All non-inlined functions call mcount near the beginning (after frame setup)
        – Requires frame pointers
  • 7. How does it work?
      ● gcc’s profiler option: -pg
        – Adds a special “mcount” call to all non-inlined functions
        – mcount is a trampoline to jump to C code
        – All non-inlined functions call mcount near the beginning (after frame setup)
        – Requires frame pointers
      ● x86 now only uses: -pg -mfentry
        – Adds a special “__fentry__” call to all non-inlined functions
        – __fentry__ is also a trampoline to jump to C code
        – All non-inlined functions call __fentry__ at the beginning of the function
        – No need to have frame pointers
  • 8. 8©2019 VMware, Inc. A Function Call asmlinkage __visible void __sched schedule(void) { struct task_struct *tsk = current; sched_submit_work(tsk); do { preempt_disable(); __schedule(false); sched_preempt_enable_no_resched(); } while (need_resched()); sched_update_worker(tsk); }
  • 9. 9©2019 VMware, Inc. WARNING! The following slides may not be suitable for some audiences
  • 10. 10©2019 VMware, Inc. WARNING! The next slide contains ASSEMBLY!
  • 11. 11©2019 VMware, Inc. Disassembled Function Call <schedule>: 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff819dbce6: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff819dbd03 <schedule+0x23> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff819dbd42 <schedule+0x62> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff819dbd22 <schedule+0x42> 31 ff xor %edi,%edi e8 a6 f8 ff ff callq ffffffff819db5b0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  • 12. 12©2019 VMware, Inc. Disassembled Function Call <schedule>: e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__> ffffffff81a14911: R_X86_64_PLT32 __fentry__-0x4 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00 With -pg -mfentry options
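The bytes `e8 1b d0 1e 00` on this slide encode a relative call, so the target address can be recomputed from them. The sketch below assumes the standard x86-64 `call rel32` encoding (opcode `e8` followed by a signed little-endian 32-bit displacement, relative to the next instruction). The start address of `schedule` is inferred from the relocation at `ffffffff81a14911`, which sits one byte into the call. The address of the second call is derived by summing the instruction lengths in the listing. `call_target` is a hypothetical helper name.

```python
import struct

def call_target(insn_addr, insn_bytes):
    """Decode a 5-byte x86-64 `call rel32` and return its absolute target."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("not a 5-byte call rel32")
    (rel32,) = struct.unpack("<i", insn_bytes[1:])  # signed, little-endian
    # The displacement is relative to the address of the next instruction.
    return (insn_addr + 5 + rel32) & 0xFFFFFFFFFFFFFFFF

# callq __fentry__ at the very start of schedule
fentry = call_target(0xFFFFFFFF81A14910, bytes.fromhex("e81bd01e00"))
# callq __schedule, 0x2a bytes into schedule (derived from the listing)
inner = call_target(0xFFFFFFFF81A1493A, bytes.fromhex("e8a1f8ffff"))
print(hex(fentry), hex(inner))
```

Both results agree with the targets printed in the disassembly. The displacement is what has to be rewritten when the call site is later pointed at a different trampoline.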
  • 13. 13©2019 VMware, Inc. At Kernel Boot Up <schedule>: callq <__fentry__> push %rbx <__fentry__>: retq
  • 14. Where are all the __fentry__ callers?
      ● Can’t just leave them there
        – Too much overhead
        – Just calling and doing a return adds 13% overhead!
      ● Need to convert them to nops at boot up
      ● Need to know where they are
      ● Best to find them at compile time!
  • 15. recordmcount
      ● scripts/recordmcount.c (and there’s a perl version too!)
      ● Reads the object files one at a time
      ● Reads the relocation tables
        – Finds all the calls to __fentry__
        – Creates a table (array)
        – Links them back into the object file
        – New section called __mcount_loc
          – Even for __fentry__ locations
  • 16. recordmcount
      ● scripts/recordmcount.c (and there’s a perl version too!)
      ● Reads the object files one at a time
      ● Reads the relocation tables
        – Finds all the calls to __fentry__
        – Creates a table (array)
        – Links them back into the object file
        – New section called __mcount_loc
          – Even for __fentry__ locations
        – gcc 5 added -mrecord-mcount (to do this for us)
  • 17. 17©2019 VMware, Inc. recordmcount (kernel/sched/core.o) <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle <schedule>: callq <__fentry__> [..] <yield>: callq <__fentry__> [..] <preempt_schedule_common>: callq <__fentry__> [..] <_cond_resched>: callq <__fentry__> [..] <schedule_idle>: callq <__fentry__> [..]
  • 18. 18©2019 VMware, Inc. recordmcount (kernel/sched/core.o) <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle <schedule>: callq <__fentry__> [..] <yield>: callq <__fentry__> [..] <preempt_schedule_common>: callq <__fentry__> [..] <_cond_resched>: callq <__fentry__> [..] <schedule_idle>: callq <__fentry__> [..] <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle
  • 19. Linker Magic!
      ● vmlinux.lds
        – include/asm-generic/vmlinux.lds.h
      ● Magic Variables
        – __start_mcount_loc
        – __stop_mcount_loc

      #ifdef CONFIG_FTRACE_MCOUNT_RECORD
      #ifdef CC_USING_PATCHABLE_FUNCTION_ENTRY
      #define MCOUNT_REC() . = ALIGN(8); \
              __start_mcount_loc = .; \
              KEEP(*(__patchable_function_entries)) \
              __stop_mcount_loc = .;
      #else
      #define MCOUNT_REC() . = ALIGN(8); \
              __start_mcount_loc = .; \
              KEEP(*(__mcount_loc)) \
              __stop_mcount_loc = .;
      #endif
      #else
      #define MCOUNT_REC()
      #endif
  • 20. Linker Magic!
      ● vmlinux.lds
        – include/asm-generic/vmlinux.lds.h
      ● Magic Variables
        – __start_mcount_loc
        – __stop_mcount_loc

      parisc architecture

      #ifdef CONFIG_FTRACE_MCOUNT_RECORD
      #ifdef CC_USING_PATCHABLE_FUNCTION_ENTRY
      #define MCOUNT_REC() . = ALIGN(8); \
              __start_mcount_loc = .; \
              KEEP(*(__patchable_function_entries)) \
              __stop_mcount_loc = .;
      #else
      #define MCOUNT_REC() . = ALIGN(8); \
              __start_mcount_loc = .; \
              KEEP(*(__mcount_loc)) \
              __stop_mcount_loc = .;
      #endif
      #else
      #define MCOUNT_REC()
      #endif
  • 21. 21©2019 VMware, Inc. Linker Magic vmlinux: <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle <__mcount_loc>: &__put_page &put_pages_list &__activate_page &activate_page &lru_cache_add <__mcount_loc>: &vfs_llseek &default_llseek &new_sync_read &new_sync_write &__vfs_write kernel/sched/core.o: mm/swap.o: fs/read_write.o:
  • 22. 22©2019 VMware, Inc. Linker Magic <__start_mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle &__put_page &put_pages_list &__activate_page &activate_page &lru_cache_add &vfs_llseek &default_llseek &new_sync_read &new_sync_write &__vfs_write [...] <__stop_mcount_loc>: vmlinux: <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle <__mcount_loc>: &__put_page &put_pages_list &__activate_page &activate_page &lru_cache_add <__mcount_loc>: &vfs_llseek &default_llseek &new_sync_read &new_sync_write &__vfs_write kernel/sched/core.o: mm/swap.o: fs/read_write.o:
  • 23. 23©2019 VMware, Inc. Linker Magic <__start_mcount_loc>: 0xffffffff81a14910 0xffffffff81a149b0 0xffffffff81a14c00 0xffffffff81a14c20 0xffffffff81a14c50 0xffffffff8126f7b0 0xffffffff8126f8f0 0xffffffff8126fcc0 0xffffffff81270440 0xffffffff81270690 0xffffffff8131f0f0 0xffffffff8131f120 0xffffffff8131fb40 0xffffffff8131fd00 0xffffffff8131fed0 [...] <__stop_mcount_loc>: vmlinux: <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle <__mcount_loc>: &__put_page &put_pages_list &__activate_page &activate_page &lru_cache_add <__mcount_loc>: &vfs_llseek &default_llseek &new_sync_read &new_sync_write &__vfs_write kernel/sched/core.o: mm/swap.o: fs/read_write.o:
  • 24. 24©2019 VMware, Inc. Finding __fentry__ <schedule>: callq <__fentry__> [..] <yield>: callq <__fentry__> [..] <preempt_schedule_common>: callq <__fentry__> [..] <_cond_resched>: callq <__fentry__> [..] <schedule_idle>: callq <__fentry__> [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>: vmlinux:
  • 25. 25©2019 VMware, Inc. <schedule>: callq <__fentry__> [..] <yield>: callq <__fentry__> [..] <preempt_schedule_common>: callq <__fentry__> [..] <_cond_resched>: callq <__fentry__> [..] <schedule_idle>: callq <__fentry__> [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>: Finding __fentry__ vmlinux:
  • 26. 26©2019 VMware, Inc. Finding __fentry__ <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>: vmlinux:
  • 27. 27©2019 VMware, Inc. Finding __fentry__ <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>: vmlinux: gcc 5 also added -mnop-mcount
  • 28. 28©2019 VMware, Inc. Finding __fentry__ vmlinux: <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>:
  • 29. 29©2019 VMware, Inc. Finding __fentry__ vmlinux: <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..]
  • 30. What about Tracing?
      ● Need to know where to enable tracing
      ● We threw away the __mcount_loc section
  • 31. What about Tracing?
      ● Need to know where to enable tracing
      ● We threw away the __mcount_loc section
        – The __mcount_loc section isn’t enough for us
        – Tracing requires saving state
  • 32. struct dyn_ftrace
      struct dyn_ftrace {
              unsigned long ip; /* address of mcount call-site */
              unsigned long flags;
              struct dyn_arch_ftrace arch;
      };
  • 33. struct dyn_ftrace
      struct dyn_ftrace {
              unsigned long ip; /* address of mcount call-site */
              unsigned long flags;
              struct dyn_arch_ftrace arch;
      };

      arch/x86/include/asm/ftrace.h:
      struct dyn_arch_ftrace {
              /* No extra data needed for x86 */
      };
  • 34. struct dyn_ftrace
      struct dyn_ftrace {
              unsigned long ip; /* address of mcount call-site */
              unsigned long flags;
              struct dyn_arch_ftrace arch;
      };

      arch/powerpc/include/asm/ftrace.h:
      struct dyn_arch_ftrace {
              struct module *mod;
      };
  • 35. Tracing data
      ● Copy from __mcount_loc before deleting that section
      ● Sorted for quick lookup
      ● Allocated in groups of pages
        – details out of scope for this talk
      ● Data reported at boot up
        – Allocated 39,317 dyn_ftrace structures
        – Used up 154 (4K) pages
        – Total of 630,784 bytes of memory

      $ dmesg | grep ftrace
      [ 0.528844] ftrace: allocating 39317 entries in 154 pages
      $ uname -r
      5.1.11-200.fc29.x86_64
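The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes each dyn_ftrace record takes 16 bytes on x86_64 (two 8-byte unsigned longs plus an empty arch struct). That size is an inference from the struct shown earlier, not a number stated in the talk. The real allocator hands out groups of pages, so a plain ceiling division is only a simplification that happens to match here.

```python
import math

PAGE_SIZE = 4096    # "4K" pages, as stated on the slide
ENTRY_SIZE = 16     # assumption: sizeof(struct dyn_ftrace) on x86_64
entries = 39317     # from the dmesg line

entries_per_page = PAGE_SIZE // ENTRY_SIZE
pages = math.ceil(entries / entries_per_page)
total_bytes = pages * PAGE_SIZE

print(entries_per_page, pages, total_bytes)
```

Under that assumption the result reproduces both the 154 pages and the 630,784 bytes reported at boot.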
  • 36. 36©2019 VMware, Inc. Finding __fentry__ <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>: vmlinux: ip = 0xffffffff81a14910 flags = 0 ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0 ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 ip = 0xffffffff8131fed0 flags = 0 <ftrace_pages>
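"Sorted for quick lookup" can be illustrated with a small model: copy the call-site addresses out of the linker-built table, sort them, and binary-search by instruction pointer. The sketch is a userspace stand-in only. Plain dicts replace struct dyn_ftrace, `lookup` is a hypothetical helper, and the kernel's real page-grouped layout is not modelled. The addresses are the first ten from the table in the slides.

```python
from bisect import bisect_left

# Call-site addresses in link order, as listed in the slides.
mcount_loc = [
    0xFFFFFFFF81A14910, 0xFFFFFFFF81A149B0, 0xFFFFFFFF81A14C00,
    0xFFFFFFFF81A14C20, 0xFFFFFFFF81A14C50, 0xFFFFFFFF8126F7B0,
    0xFFFFFFFF8126F8F0, 0xFFFFFFFF8126FCC0, 0xFFFFFFFF81270440,
    0xFFFFFFFF81270690,
]

# One record per call site, sorted by address; dicts stand in for dyn_ftrace.
records = sorted(({"ip": ip, "flags": 0} for ip in mcount_loc),
                 key=lambda rec: rec["ip"])
ips = [rec["ip"] for rec in records]

def lookup(ip):
    """Binary-search the sorted records; return the matching record or None."""
    i = bisect_left(ips, ip)
    return records[i] if i < len(ips) and ips[i] == ip else None

print(lookup(0xFFFFFFFF81A14910))
```

Link order is not address order (the mm/swap.o entries sort below the kernel/sched/core.o ones), which is why a sort step is needed before any binary search.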
  • 37. 37©2019 VMware, Inc. Finding __fentry__ ip = 0xffffffff81a14910 flags = 0 ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0 ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 <ftrace_pages> # cat available_filter_functions schedule yield preempt_schedule_common _cond_resched schedule_idle __put_page put_pages_list __activate_page activate_page lru_cache_add vfs_llseek default_llseek new_sync_read new_sync_write
  • 38. 38©2019 VMware, Inc. Finding __fentry__ ip = 0xffffffff81a14910 flags = 0 ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0 ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 <ftrace_pages> # echo default_llseek > set_ftrace_filter # echo schedule_idle >> set_ftrace_filter # cat set_ftrace_filter schedule_idle default_llseek
  • 39. dyn_ftrace.flags
      ● Bits 0-24: Counter for number of callbacks registered to function
      ● Bit 25: Function is being initialized and not ready to touch
        – module init
      ● Bit 26: Return from callback may modify IP address
        – kprobe or live patching
      ● Bit 27: Has unique trampoline and it's enabled
      ● Bit 28: Has unique trampoline
      ● Bit 29: Saves regs is enabled (see bit 30)
      ● Bit 30: Needs to call ftrace_regs_caller (to save all regs like int3 does)
      ● Bit 31: The function is being traced
  • 40. dyn_ftrace.flags
      ● Bits 0-24: Counter for number of callbacks registered to function
      ● Bit 25: Function is being initialized and not ready to touch
        – module init
      ● Bit 26: Return from callback may modify IP address
        – kprobe or live patching
      ● Bit 27: Has unique trampoline and it's enabled
      ● Bit 28: Has unique trampoline
      ● Bit 29: Saves regs is enabled (see bit 30)
      ● Bit 30: Needs to call ftrace_regs_caller (to save all regs like int3 does)
      ● Bit 31: The function is being traced
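The bit layout above can be turned into a small decoder, which helps when reading the flag values on the following slides (0x40000001, 0x80000001, 0xe0000001). This is a sketch built only from the bit descriptions on this slide. `decode_flags`, `COUNT_MASK` and the label strings are names chosen for illustration, not the kernel's own macros.

```python
COUNT_MASK = (1 << 25) - 1  # bits 0-24: number of registered callbacks

FLAG_BITS = {
    25: "being initialized (module init)",
    26: "callback may modify IP",
    27: "unique trampoline enabled",
    28: "has unique trampoline",
    29: "save-regs enabled",
    30: "needs ftrace_regs_caller",
    31: "being traced",
}

def decode_flags(flags):
    """Split a dyn_ftrace.flags value into (callback count, list of set flag bits)."""
    count = flags & COUNT_MASK
    bits = [bit for bit in sorted(FLAG_BITS) if flags & (1 << bit)]
    return count, bits

for value in (0x40000001, 0x80000001, 0xE0000001):
    count, bits = decode_flags(value)
    print(hex(value), count, [FLAG_BITS[b] for b in bits])
```

Read this way, 0x40000001 is one callback that wants full registers but is not yet enabled, and 0xe0000001 is the same record once the call site has been patched.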
  • 41. 41©2019 VMware, Inc. Finding __fentry__ <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] vmlinux: ip = 0xffffffff81a14910 flags = 0 ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0 ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 ip = 0xffffffff8131fed0 flags = 0 <ftrace_pages>
  • 42. 42©2019 VMware, Inc. Finding __fentry__ vmlinux: ip = 0xffffffff81a14910 flags = 0x40000001 (bit 30, count = 1) ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0x00000001 (count = 1) ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 ip = 0xffffffff8131fed0 flags = 0 <ftrace_pages> <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..]
  • 43. 43©2019 VMware, Inc. Finding __fentry__ <schedule>: call ftrace_regs_caller [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: call ftrace_caller [..] vmlinux: ip = 0xffffffff81a14910 flags = 0xe0000001 (bits 29, 30, 31, count = 1) ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0x80000001 (bit 31, count = 1) ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 ip = 0xffffffff8131fed0 flags = 0 <ftrace_pages>
  • 44. Modifying code at runtime!
      ● Not the same as at boot up
      ● SMP boxes need to take extra care
      ● Other CPUs may be executing the code you change
      ● x86 has non-uniform instructions (different sizes)
      ● Instructions may cross cache and page boundaries
  • 45. 45©2019 VMware, Inc. Modifying code at runtime! <schedule>: 0f 1f 44 00 00 nop 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  • 46. 46©2019 VMware, Inc. Modifying code at runtime! <schedule>: e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__> 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  • 47. 47©2019 VMware, Inc. Modifying code at runtime! <schedule>: 0f 1f 44 00 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 <schedule>: 0f 1f 44 00 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 CPU 0 CPU 1
  • 48. 48©2019 VMware, Inc. Modifying code at runtime! <schedule>: 0f 1f 44 00 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 <schedule>: e8 1b d0 1e 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 CPU 0 CPU 1
  • 49. 49©2019 VMware, Inc. Modifying code at runtime! <schedule>: 0f 1f d0 1e 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 <schedule>: e8 1b d0 1e 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 CPU 0 CPU 1
  • 50. 50©2019 VMware, Inc. 0f 1f d0 1e 00 ??? 0f 1f d0 1e 00
  • 51. 51©2019 VMware, Inc. 0f 1f d0 1e 00 ??? BOOM! CRASH! General Protection Fault! REBOOT!
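The crash on the preceding slides comes from another CPU fetching a mix of old and new bytes. A minimal user-space sketch of that torn fetch follows. The function names are hypothetical, and the model only splices byte arrays; it does not reproduce real instruction-fetch behavior.

```c
#include <stdint.h>
#include <string.h>

#define INSN_SIZE 5

static const uint8_t nop5[INSN_SIZE]  = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const uint8_t call5[INSN_SIZE] = { 0xe8, 0x1b, 0xd0, 0x1e, 0x00 };

/*
 * Model a CPU that fetched the first `stale` bytes before the write
 * landed and the remaining bytes after it.
 */
static void torn_fetch(unsigned stale, uint8_t out[INSN_SIZE])
{
    for (unsigned i = 0; i < INSN_SIZE; i++)
        out[i] = (i < stale) ? nop5[i] : call5[i];
}

/* Returns 1 if the fetched bytes are one of the two intended instructions. */
static int is_expected(const uint8_t insn[INSN_SIZE])
{
    return memcmp(insn, nop5, INSN_SIZE) == 0 ||
           memcmp(insn, call5, INSN_SIZE) == 0;
}
```

With `stale` set to 2 the result is `0f 1f d0 1e 00`, the sequence from the slides, which is neither the nop nor the call.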
  • 52. 52©2019 VMware, Inc. How to go from this! <schedule>: 0f 1f 44 00 00 nop 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  • 53. 53©2019 VMware, Inc. To this? <schedule>: e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__> 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  • 55. 55©2019 VMware, Inc. Breakpoints! <schedule>: 0f 1f 44 00 00 nop 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  • 56. 56©2019 VMware, Inc. Breakpoints! <schedule>: <cc> 1f 44 00 00 <int3>nop 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  • 57. 57©2019 VMware, Inc. How this works <schedule>: <int3>nop push %rbx mov %gs:0x16100,%rbx mov 0x10(%rbx),%rax test %rax,%rax
  • 58. 58©2019 VMware, Inc. How this works <schedule>: <int3>nop push %rbx mov %gs:0x16100,%rbx mov 0x10(%rbx),%rax test %rax,%rax do_int3(struct pt_regs *regs) { regs->ip += 5; return; }
  • 59. 59©2019 VMware, Inc. How this works <schedule>: <int3>nop push %rbx mov %gs:0x16100,%rbx mov 0x10(%rbx),%rax test %rax,%rax do_int3(struct pt_regs *regs) { regs->ip += 5; return; }
  • 60. 60©2019 VMware, Inc. How this works <schedule>: <int3>nop push %rbx mov %gs:0x16100,%rbx mov 0x10(%rbx),%rax test %rax,%rax do_int3(struct pt_regs *regs) { regs->ip += 5; return; }
  • 61. 61©2019 VMware, Inc. How this works <schedule>: <int3>nop push %rbx mov %gs:0x16100,%rbx mov 0x10(%rbx),%rax test %rax,%rax do_int3(struct pt_regs *regs) { regs->ip += 5; return; }
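The handler on these slides can be sketched in user space as follows. `struct fake_regs` and `fake_int3_handler` are hypothetical stand-ins, not the kernel's types or functions. The sketch assumes that, when the trap is delivered, the saved instruction pointer already points one byte past the `int3`, so the handler resumes at the start of the patched slot plus 5 bytes, which is the effect the slide abbreviates as `regs->ip += 5`.

```c
#include <stdint.h>

#define MCOUNT_INSN_SIZE 5   /* size of the nop/call slot at function entry */
#define INT3_INSN_SIZE   1   /* int3 is the single byte 0xcc */

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct fake_regs {
    uint64_t ip;
};

/*
 * Returns 1 if the trap came from the slot being patched and was handled,
 * 0 if the breakpoint belongs to someone else.
 */
static int fake_int3_handler(struct fake_regs *regs, uint64_t patch_site)
{
    /* The saved ip points just past the int3 byte. */
    if (regs->ip - INT3_INSN_SIZE != patch_site)
        return 0;

    /* Skip the whole 5-byte slot, as if it were still a nop. */
    regs->ip = patch_site + MCOUNT_INSN_SIZE;
    return 1;
}
```

A CPU that hits the breakpoint therefore behaves as though the slot still held the 5-byte nop, regardless of what the remaining four bytes currently contain.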
  • 62. 62©2019 VMware, Inc. Breakpoints! <schedule>: <cc> 1f 44 00 00 <int3>nop 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  • 63. 63©2019 VMware, Inc. Breakpoints! <schedule>: <cc>1b d0 1e 00 <int3>callq ffffffff81c01930 <__fentry__> 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  • 64. 64©2019 VMware, Inc. Breakpoints! <schedule>: e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__> 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
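The three breakpoint slides describe a fixed order of writes: first byte to `int3`, then the tail bytes, then the first byte again. A user-space sketch of that order is shown below. `patch_with_breakpoint` and `sync_all_cpus` are hypothetical names, and the synchronization step is an empty placeholder standing in for whatever the kernel does to make every CPU observe each step before the next one starts.

```c
#include <stdint.h>
#include <string.h>

#define INSN_SIZE 5
#define INT3      0xcc

/* Placeholder: the real kernel synchronizes all CPUs between steps. */
static void sync_all_cpus(void)
{
}

/*
 * Patch site[] to new_insn[] in three steps and record the bytes
 * visible after each step in log[0..2].
 */
static void patch_with_breakpoint(uint8_t site[INSN_SIZE],
                                  const uint8_t new_insn[INSN_SIZE],
                                  uint8_t log[3][INSN_SIZE])
{
    /* Step 1: only the first byte changes, so the write cannot be torn. */
    site[0] = INT3;
    sync_all_cpus();
    memcpy(log[0], site, INSN_SIZE);

    /* Step 2: rewrite the tail while the breakpoint guards the slot. */
    memcpy(site + 1, new_insn + 1, INSN_SIZE - 1);
    sync_all_cpus();
    memcpy(log[1], site, INSN_SIZE);

    /* Step 3: replace the breakpoint with the new first byte. */
    site[0] = new_insn[0];
    sync_all_cpus();
    memcpy(log[2], site, INSN_SIZE);
}
```

Starting from the nop and patching in the call, the three recorded states match the byte sequences on slides 62, 63, and 64. At no point does the slot hold a mix that starts with a stale first byte.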
  • 65. 65©2019 VMware, Inc. Registering a callback with ftrace: call register_ftrace_function(), which takes an ftrace_ops descriptor. Static ftrace_ops (allocated at build time): top-level ftrace tracers (function, function_graph, stack tracer, latency tracers). Dynamic ftrace_ops (allocated via kmalloc()): perf, kprobes, ftrace instances (sub-buffers).
  • 66. 66©2019 VMware, Inc. ftrace_ops structure struct ftrace_ops { ftrace_func_t func; struct ftrace_ops __rcu *next; unsigned long flags; void *private; ftrace_func_t saved_func; #ifdef CONFIG_DYNAMIC_FTRACE struct ftrace_ops_hash local_hash; struct ftrace_ops_hash *func_hash; struct ftrace_ops_hash old_hash; unsigned long trampoline; unsigned long trampoline_size; #endif };
  • 67. 67©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call ftrace_stub restore_regs ftrace_stub: retq
  • 68. 68©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call func_trace restore_regs ftrace_stub: retq void func_trace() { /* trace */ }
  • 69. 69©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call func_trace restore_regs ftrace_stub: retq void func_trace() { /* trace */ } ftrace_ops.func
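The trampoline slides show a single patchable call inside `ftrace_caller` that starts out pointing at `ftrace_stub` and is redirected to the registered callback. A user-space model of that idea, using a function pointer in place of the patched call instruction, might look like this. All names here (`mini_ops`, `mini_register`, and so on) are hypothetical and much simpler than the kernel's `struct ftrace_ops` and `register_ftrace_function()`.

```c
#include <stddef.h>

typedef void (*trace_func_t)(unsigned long ip, unsigned long parent_ip);

/* Simplified, hypothetical version of struct ftrace_ops: callback only. */
struct mini_ops {
    trace_func_t func;
};

/* Plays the role of ftrace_stub: does nothing and returns. */
static void stub(unsigned long ip, unsigned long parent_ip)
{
    (void)ip;
    (void)parent_ip;
}

/* Stands in for the patchable "call" at the ftrace_call label. */
static trace_func_t ftrace_call_site = stub;

static void mini_register(struct mini_ops *ops)
{
    ftrace_call_site = ops->func;
}

static void mini_unregister(void)
{
    ftrace_call_site = stub;
}

/* What a traced function reaches through the call at its entry. */
static void mini_ftrace_caller(unsigned long ip, unsigned long parent_ip)
{
    /* save_regs and load_regs would happen here */
    ftrace_call_site(ip, parent_ip);
    /* restore_regs would happen here */
}

/* Example callback that counts hits and remembers the traced address. */
static unsigned long hits;
static unsigned long last_ip;

static void func_trace(unsigned long ip, unsigned long parent_ip)
{
    (void)parent_ip;
    hits++;
    last_ip = ip;
}
```

Before registration the caller falls through the stub. After `mini_register()` every entry into a traced function reaches `func_trace` directly, with no list walk.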
  • 70. 70©2019 VMware, Inc. Calling more than one callback on a function? Direct calls to a single function are easy. Handling more than one requires a list operation. But then all functions being traced will go through a list!
  • 71. 71©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call list_func restore_regs ftrace_stub: retq void list_func() { /* iterate */ } void func1_func() { /* trace */ } void func2_func() { /* trace */ }
  • 72. 72©2019 VMware, Inc. Multiple function callback example Run function tracer on all functions Run perf on just the scheduler
  • 73. 73©2019 VMware, Inc. Multiple function callback example Want to trace schedule_idle()? NO list_func() perf Yes! function tracer
  • 74. 74©2019 VMware, Inc. Multiple function callback example Want to trace _cond_resched()? NO list_func() perf Yes! function tracer
  • 75. 75©2019 VMware, Inc. Multiple function callback example Want to trace yield()? NO list_func() perf Yes! function tracer
  • 76. 76©2019 VMware, Inc. Multiple function callback example Want to trace schedule()? Yes! list_func() perf Yes! function tracer
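The example on these slides (function tracer on everything, perf on `schedule()` only) can be modeled as a list walk in which each registered ops is asked whether it wants the current function. The sketch below is a simplification under stated assumptions: the filter is a single function name rather than the per-ops hash of addresses the kernel keeps, and a hit counter stands in for invoking the callback. All names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ops: one optional filter and a hit counter. */
struct mini_ops {
    const char *filter;      /* NULL means "trace every function" */
    unsigned long hits;      /* stands in for calling ops->func() */
    struct mini_ops *next;
};

static struct mini_ops *ops_list;

static void mini_register(struct mini_ops *ops)
{
    ops->next = ops_list;
    ops_list = ops;
}

static int ops_wants(const struct mini_ops *ops, const char *func)
{
    return ops->filter == NULL || strcmp(ops->filter, func) == 0;
}

/* Model of list_func(): every traced function pays for the full walk. */
static void list_func(const char *func)
{
    for (struct mini_ops *ops = ops_list; ops != NULL; ops = ops->next) {
        if (ops_wants(ops, func))
            ops->hits++;
    }
}
```

Registering one ops with no filter and one filtered on `"schedule"`, then calling `list_func()` for `schedule_idle`, `yield`, and `schedule`, gives three hits for the first ops and one for the second. The cost the slides point out is visible here: the perf entry is tested on every call even though it matches only one function.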
  • 77. 77©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: callq ftrace_caller [..] <preempt_schedule_common>: callq ftrace_caller [..] <_cond_resched>: callq ftrace_caller [..] <schedule_idle>: callq ftrace_caller [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call list_func restore_regs ftrace_stub: retq void list_func() { /* iterate */ } void function_trace() { /* function tracing */ } void perf_func() { /* function profiling */ }
  • 78. 78©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: callq dynamic_trampoline [..] <preempt_schedule_common>: callq dynamic_trampoline [..] <_cond_resched>: callq dynamic_trampoline [..] <schedule_idle>: callq dynamic_trampoline [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call list_func restore_regs ftrace_stub: retq void list_func() { /* iterate */ } void function_trace() { /* function tracing */ } void perf_func() { /* function profiling */ } <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq
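The choice shown on this slide can be expressed as a small decision function: a function traced by exactly one ops calls that ops' own trampoline, and only functions with several users go through `ftrace_caller` and the list walk. The sketch below is a user-space illustration under the same simplifying assumption as before (a single name as the filter). `pick_call_target` and the trampoline name `perf_trampoline` used in the usage note are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ops: its trampoline's name and one filter. */
struct mini_ops {
    const char *trampoline;  /* name of this ops' own trampoline */
    const char *filter;      /* NULL means "trace every function" */
};

static int ops_wants(const struct mini_ops *ops, const char *func)
{
    return ops->filter == NULL || strcmp(ops->filter, func) == 0;
}

/*
 * Decide what the 5-byte slot at the start of func should contain:
 * a nop, a call to a single ops' trampoline, or a call to the shared
 * ftrace_caller that iterates over all ops.
 */
static const char *pick_call_target(const struct mini_ops *ops, size_t n,
                                    const char *func)
{
    const struct mini_ops *only = NULL;
    size_t users = 0;

    for (size_t i = 0; i < n; i++) {
        if (ops_wants(&ops[i], func)) {
            only = &ops[i];
            users++;
        }
    }

    if (users == 0)
        return "nop";
    if (users == 1)
        return only->trampoline;
    return "ftrace_caller";
}
```

With a function tracer ops whose trampoline is `dynamic_trampoline` and a perf ops (trampoline `perf_trampoline`) filtered on `schedule`, the function returns `ftrace_caller` for `schedule` and `dynamic_trampoline` for `yield`, matching the slide.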
  • 79. 79©2019 VMware, Inc. Problems with dynamic trampolines When can you free them? How do you know they are no longer in use?
  • 80. 80©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: callq dynamic_trampoline push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq
  • 81. 81©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: callq dynamic_trampoline push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq
  • 82. 82©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: callq dynamic_trampoline push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq
  • 83. 83©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: callq dynamic_trampoline push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted!
  • 84. 84©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted!
  • 85. 85©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted! kfree(dynamic_trampoline)
  • 86. 86©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Scheduled
  • 87. 87©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq
  • 88. 88©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq CRASH!
  • 89. 89©2019 VMware, Inc. Problems with dynamic trampolines When can you free them? How do you know they are no longer in use?
  • 90. 90©2019 VMware, Inc. Problems with dynamic trampolines When can you free them? How do you know they are no longer in use? Use RCU!
  • 91. 91©2019 VMware, Inc. call_rcu_tasks() Added in Linux v3.18 (commit 8315f42295d2667 by Paul E. McKenney). synchronize_rcu_tasks(): waits for all tasks to voluntarily schedule. We do not allow ftrace callbacks to schedule. The trampoline will not schedule.
  • 92. 92©2019 VMware, Inc. call_rcu_tasks() Added in Linux v3.18 (commit 8315f42295d2667 by Paul E. McKenney). synchronize_rcu_tasks(): waits for all tasks to voluntarily schedule. We do not allow ftrace callbacks to schedule. The trampoline will not schedule. Used by ftrace in v4.12.
  • 93. 93©2019 VMware, Inc. call_rcu_tasks() Added in Linux v3.18 (commit 8315f42295d2667 by Paul E. McKenney). synchronize_rcu_tasks(): waits for all tasks to voluntarily schedule. We do not allow ftrace callbacks to schedule. The trampoline will not schedule. Used by ftrace in v4.12. Yes, Steven was lazy. Added with the threat that Paul was going to remove it.
  • 94. 94©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted!
  • 95. 95©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted! call_rcu_tasks(dynamic_trampoline)
  • 96. 96©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted! call_rcu_tasks(dynamic_trampoline) Waits for all tasks to voluntarily schedule
  • 97. 97©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Scheduled call_rcu_tasks(dynamic_trampoline) Waits for all tasks to voluntarily schedule
  • 98. 98©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq call_rcu_tasks(dynamic_trampoline) Waits for all tasks to voluntarily schedule
  • 99. 99©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq call_rcu_tasks(dynamic_trampoline) Waits for all tasks to voluntarily schedule
  • 100. 100©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq call_rcu_tasks(dynamic_trampoline) All tasks have scheduled
  • 101. 101©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq kfree(dynamic_trampoline)
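The solution slides rely on one property: neither the trampoline nor an ftrace callback ever schedules voluntarily, so once every task has voluntarily scheduled, none of them can still be inside the old trampoline. The toy model below illustrates only that bookkeeping. It is not RCU, every name in it is hypothetical, and the real `call_rcu_tasks()` takes an `rcu_head` and a callback rather than working on a global flag.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_TASKS 3

struct task {
    bool in_trampoline;     /* preempted while executing trampoline code */
    bool passed_quiescent;  /* voluntarily scheduled since the request */
};

static struct task tasks[NR_TASKS];
static bool free_requested;
static bool trampoline_freed;

/* Models call_rcu_tasks(): ask for the free, but do not free yet. */
static void request_free(void)
{
    free_requested = true;
    for (size_t i = 0; i < NR_TASKS; i++)
        tasks[i].passed_quiescent = false;
}

/* Free only after every task has voluntarily scheduled. */
static void try_free(void)
{
    if (!free_requested || trampoline_freed)
        return;
    for (size_t i = 0; i < NR_TASKS; i++) {
        if (!tasks[i].passed_quiescent)
            return;
    }
    trampoline_freed = true;   /* the kfree() step on the last slide */
}

/*
 * A task voluntarily schedules. Because the trampoline never schedules,
 * a task that reaches this point has already left the trampoline.
 */
static void voluntary_schedule(size_t i)
{
    tasks[i].in_trampoline = false;
    tasks[i].passed_quiescent = true;
    try_free();
}
```

If task 1 is marked as preempted inside the trampoline before `request_free()`, the free stays pending while tasks 0 and 2 schedule and happens only after task 1 has run again and scheduled voluntarily.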
  • 102. 102©2019 VMware, Inc. More uses of the function callback code: ftrace_regs_caller() gives all registers. A callback can modify any register. Needs a flag in ftrace_ops to modify the instruction pointer (ip).
  • 103. 103©2019 VMware, Inc. Live Kernel Patching! <schedule>: callq ftrace_caller [..] Buggy schedule() function <ftrace_caller>: save_regs load_regs call kernel_patch restore_regs retq void kernel_patch() { regs.ip = schedule_fix; } <schedule_fix>: nop [..] Fixed schedule() function