Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started

27,400 views

Published on

Ftrace’s most powerful feature is the function tracer (and function graph tracer which is built from it). But to have this enabled on production systems, it had to have its overhead be negligible when disabled. As the function tracer uses gcc’s profiling mechanism, which adds a call to “mcount” (or more recently fentry, don’t worry if you don’t know what this is, it will all be explained) at the start of almost all functions, it had to do something about the overhead that causes. The solution was to turn those calls into “nops” (an instruction that the CPU simply ignores). But this was no easy feat. It took a lot to come up with a solution (and also turning a few network cards into bricks). This talk will explain the history of how ftrace came about implementing the function tracer, and brought with it the possibility of static branches and soon static calls!

Steven Rostedt

Published in: Software
  • Free Healing Soundscape - Just click "Play" to immerse yourself in a beautiful "Sound Bath" that clears negativity, awakens abundance, and brings miracles into your life! Listen for yourself right now. ♥♥♥ http://scamcb.com/manifmagic/pdf
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • DOWNLOAD THIS BOOKS INTO AVAILABLE FORMAT (2019 Update) ......................................................................................................................... ......................................................................................................................... Download Full PDF EBOOK here { https://soo.gd/irt2 } ......................................................................................................................... Download Full EPUB Ebook here { https://soo.gd/irt2 } ......................................................................................................................... Download Full doc Ebook here { https://soo.gd/irt2 } ......................................................................................................................... Download PDF EBOOK here { https://soo.gd/irt2 } ......................................................................................................................... Download EPUB Ebook here { https://soo.gd/irt2 } ......................................................................................................................... Download doc Ebook here { https://soo.gd/irt2 } ......................................................................................................................... ......................................................................................................................... ................................................................................................................................... eBook is an electronic version of a traditional print book THIS can be read by using a personal computer or by using an eBook reader. (An eBook reader can be a software application for use on a computer such as Microsoft's free Reader application, or a book-sized computer THIS is used solely as a reading device such as Nuvomedia's Rocket eBook.) Users can purchase an eBook on diskette or CD, but the most popular method of getting an eBook is to purchase a downloadable file of the eBook (or other reading material) from a Web site (such as Barnes and Noble) to be read from the user's computer or reading device. Generally, an eBook can be downloaded in five minutes or less ......................................................................................................................... .............. Browse by Genre Available eBooks .............................................................................................................................. Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, ......................................................................................................................... ......................................................................................................................... .....BEST SELLER FOR EBOOK RECOMMEND............................................................. ......................................................................................................................... Blowout: Corrupted Democracy, Rogue State Russia, and the Richest, Most Destructive Industry on Earth,-- The Ride of a Lifetime: Lessons Learned from 15 Years as CEO of the Walt Disney Company,-- Call Sign Chaos: Learning to Lead,-- StrengthsFinder 2.0,-- Stillness Is the Key,-- She Said: Breaking the Sexual Harassment Story THIS Helped Ignite a Movement,-- Atomic Habits: An Easy & Proven Way to Build Good Habits & Break Bad Ones,-- Everything Is Figureoutable,-- What It Takes: Lessons in the Pursuit of Excellence,-- Rich Dad Poor Dad: What the Rich Teach Their Kids About Money THIS the Poor and Middle Class Do Not!,-- The Total Money Makeover: Classic Edition: A Proven Plan for Financial Fitness,-- Shut Up and Listen!: Hard Business Truths THIS Will Help You Succeed, ......................................................................................................................... .........................................................................................................................
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • 7 Sacred "Sign Posts" From The Universe Revealed. Discover the "secret language" the Universes uses to send us guided messages and watch as your greatest desires manifest before your eyes. Claim your free report. ★★★ https://bit.ly/30Ju5r6
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started

  1. 1. ©2019 VMware, Inc. ftrace Where modifying a running kernel all started! Steven Rostedt Open Source Engineer rostedt@goodmis.org / srostedt@vmware.com
  2. 2. 2©2019 VMware, Inc. Ftrace Function hooks ● Allows attaching to a function in the kernel – Function Tracer – Function Graph Tracer – Perf – Stack Tracer – Kprobes – SystemTap – Pstore
  3. 3. 3©2019 VMware, Inc. Function Tracing # cd /sys/kernel/tracing # echo function > current_tracer # cat trace # tracer: function # # entries-in-buffer/entries-written: 159693/4101675 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # ||||/ delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | cat-3432 [002] d...... 60071.538270: __rcu_read_unlock <-__is_insn_slot_addr cat-3432 [002] d...... 60071.538270: is_bpf_text_address <-kernel_text_address cat-3432 [002] d...... 60071.538270: __rcu_read_lock <-is_bpf_text_address cat-3432 [002] d...... 60071.538271: bpf_prog_kallsyms_find <-is_bpf_text_address cat-3432 [002] d...... 60071.538271: __rcu_read_unlock <-is_bpf_text_address cat-3432 [002] d...... 60071.538271: init_object <-alloc_debug_processing cat-3432 [002] d...... 60071.538271: deactivate_slab.isra.74 <-___slab_alloc cat-3432 [002] d...... 60071.538272: preempt_count_add <-deactivate_slab.isra.74 cat-3432 [002] d...1.. 60071.538272: preempt_count_sub <-deactivate_slab.isra.74 cat-3432 [002] d...... 60071.538272: preempt_count_add <-deactivate_slab.isra.74 cat-3432 [002] d...1.. 60071.538272: preempt_count_sub <-deactivate_slab.isra.74 cat-3432 [002] d...... 60071.538273: preempt_count_add <-deactivate_slab.isra.74 cat-3432 [002] d...1.. 60071.538273: preempt_count_sub <-deactivate_slab.isra.74 cat-3432 [002] d...... 60071.538273: _raw_spin_lock <-deactivate_slab.isra.74 cat-3432 [002] d...... 60071.538273: preempt_count_add <-_raw_spin_lock cat-3432 [002] d...1.. 60071.538273: do_raw_spin_trylock <-_raw_spin_lock
  4. 4. 4©2019 VMware, Inc. Function Graph Tracing # cd /sys/kernel/tracing # echo function_graph > current_tracer # cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 3) 0.868 us | } /* rt_spin_lock_slowlock_locked */ 3) | _raw_spin_unlock_irqrestore() { 3) 0.294 us | do_raw_spin_unlock(); 3) 0.374 us | preempt_count_sub(); 3) 1.542 us | } 3) 0.198 us | put_pid(); 3) 5.727 us | } /* rt_spin_lock_slowlock */ 3) + 18.867 us | } /* rt_spin_lock */ 3) | rt_spin_unlock() { 3) | rt_mutex_futex_unlock() { 3) | _raw_spin_lock_irqsave() { 3) 0.224 us | preempt_count_add(); 3) 0.376 us | do_raw_spin_trylock(); 3) 1.767 us | } 3) 0.264 us | __rt_mutex_unlock_common(); 3) | _raw_spin_unlock_irqrestore() { 3) 0.278 us | do_raw_spin_unlock(); 3) 0.249 us | preempt_count_sub(); 3) 1.421 us | } 3) 4.565 us | } 3) | migrate_enable() { 3) 0.275 us | preempt_count_add();
  5. 5. 5©2019 VMware, Inc. Dynamic Function Tracing # cd /sys/kernel/tracing # echo ‘*sched*’ > set_ftrace_filter # echo function > current_tracer # cat trace # tracer: function # # entries-in-buffer/entries-written: 35104/35104 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # ||||/ delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | bash-1294 [000] d..h... 60276.948739: tick_sched_timer <-__hrtimer_run_queues bash-1294 [000] d..h... 60276.948741: tick_sched_do_timer <-tick_sched_timer bash-1294 [000] d..h... 60276.948743: tick_sched_handle <-tick_sched_timer bash-1294 [000] d..h... 60276.948745: rcu_sched_clock_irq <-update_process_times bash-1294 [000] d..h... 60276.948745: scheduler_tick <-update_process_times bash-1294 [000] d...2.. 60276.948754: resched_curr_lazy <-check_preempt_wakeup bash-1294 [000] d.L.... 60276.948756: preempt_schedule_irq <- restore_regs_and_return_to_kernel ksoftirqd/0-9 [000] ....... 60276.948769: schedule <-smpboot_thread_fn bash-1294 [000] d...311 60276.948908: resched_curr <-check_preempt_curr bash-1294 [000] d...311 60276.948908: native_smp_send_reschedule <-check_preempt_curr <idle>-0 [003] dn..1.. 60276.948922: smp_reschedule_interrupt <-reschedule_interrupt <idle>-0 [003] dn..1.. 60276.948923: scheduler_ipi <-reschedule_interrupt
  6. 6. 6©2019 VMware, Inc. How does it work? ● gcc’s profiler option: -pg – Adds a special “mcount” call to all non-inlined functions – mcount is a trampoline to jump to C code – All non-inlined functions call mcount near the beginning (after frame setup) – Requires frame pointers
  7. 7. 7©2019 VMware, Inc. How does it work? ● gcc’s profiler option: -pg – Adds a special “mcount” call to all non-inlined functions – mcount is a trampoline to jump to C code – All non-inlined functions call mcount near the beginning (after frame setup) – Requires frame pointers ● x86 now only uses: -pg -mfentry – Adds a special “__fentry__” call to all non-inlined functions – __fentry__ is also a trampoline to jump to C code – All non-inlined function call __fentry__ at the beginning of the function – No need to have frame pointers
  8. 8. 8©2019 VMware, Inc. A Function Call asmlinkage __visible void __sched schedule(void) { struct task_struct *tsk = current; sched_submit_work(tsk); do { preempt_disable(); __schedule(false); sched_preempt_enable_no_resched(); } while (need_resched()); sched_update_worker(tsk); }
  9. 9. 9©2019 VMware, Inc. WARNING! The following slides may not be suitable for some audiences
  10. 10. 10©2019 VMware, Inc. WARNING! The next slide contains ASSEMBLY!
  11. 11. 11©2019 VMware, Inc. Disassembled Function Call <schedule>: 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff819dbce6: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff819dbd03 <schedule+0x23> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff819dbd42 <schedule+0x62> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff819dbd22 <schedule+0x42> 31 ff xor %edi,%edi e8 a6 f8 ff ff callq ffffffff819db5b0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  12. 12. 12©2019 VMware, Inc. Disassembled Function Call <schedule>: e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__> ffffffff81a14911: R_X86_64_PLT32 __fentry__-0x4 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00 With -pg -mfentry options
  13. 13. 13©2019 VMware, Inc. At Kernel Boot Up <schedule>: callq <__fentry__> push %rbx <__fentry__>: retq
  14. 14. 14©2019 VMware, Inc. Where are all the __fentry__ callers? Can’t just leave them there  Too much overhead  Just calling and doing a return adds 13% overhead! Need to convert them to nops at boot up Need to know where they are Best to find them at compile time!
  15. 15. 15©2019 VMware, Inc. recordmcount scripts/recordmcount.c (and there’s a perl version too!) Reads the object files one at a time Reads the relocation tables  Finds all the calls to __fentry__  Creates a table (array)  Links them back into the object file  New section called __mcount_loc – Even for __fentry__ locations
  16. 16. 16©2019 VMware, Inc. recordmcount scripts/recordmcount.c (and there’s a perl version too!) Reads the object files one at a time Reads the relocation tables  Finds all the calls to __fentry__  Creates a table (array)  Links them back into the object file  New section called __mcount_loc – Even for __fentry__ locations  gcc 5 added -mrecord-mcount (to do this for us)
  17. 17. 17©2019 VMware, Inc. recordmcount (kernel/sched/core.o) <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle <schedule>: callq <__fentry__> [..] <yield>: callq <__fentry__> [..] <preempt_schedule_common>: callq <__fentry__> [..] <_cond_resched>: callq <__fentry__> [..] <schedule_idle>: callq <__fentry__> [..]
  18. 18. 18©2019 VMware, Inc. recordmcount (kernel/sched/core.o) <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle <schedule>: callq <__fentry__> [..] <yield>: callq <__fentry__> [..] <preempt_schedule_common>: callq <__fentry__> [..] <_cond_resched>: callq <__fentry__> [..] <schedule_idle>: callq <__fentry__> [..] <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle
  19. 19. 19©2019 VMware, Inc. Linker Magic! vmlinux.lds  include/asm-generic/vmlinux.lds.h Magic Variables  __start_mcount_loc  __stop_mcount_loc #ifdef CONFIG_FTRACE_MCOUNT_RECORD #ifdef CC_USING_PATCHABLE_FUNCTION_ENTRY #define MCOUNT_REC() . = ALIGN(8); __start_mcount_loc = .; KEEP(*(__patchable_function_entries)) __stop_mcount_loc = .; #else #define MCOUNT_REC() . = ALIGN(8); __start_mcount_loc = .; KEEP(*(__mcount_loc)) __stop_mcount_loc = .; #endif #else #define MCOUNT_REC() #endif
  20. 20. 20©2019 VMware, Inc. Linker Magic! vmlinux.lds  include/asm-generic/vmlinux.lds.h Magic Variables  __start_mcount_loc  __stop_mcount_loc parisc architecture #ifdef CONFIG_FTRACE_MCOUNT_RECORD #ifdef CC_USING_PATCHABLE_FUNCTION_ENTRY #define MCOUNT_REC() . = ALIGN(8); __start_mcount_loc = .; KEEP(*(__patchable_function_entries)) __stop_mcount_loc = .; #else #define MCOUNT_REC() . = ALIGN(8); __start_mcount_loc = .; KEEP(*(__mcount_loc)) __stop_mcount_loc = .; #endif #else #define MCOUNT_REC() #endif
  21. 21. 21©2019 VMware, Inc. Linker Magic vmlinux: <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle <__mcount_loc>: &__put_page &put_pages_list &__activate_page &activate_page &lru_cache_add <__mcount_loc>: &vfs_llseek &default_llseek &new_sync_read &new_sync_write &__vfs_write kernel/sched/core.o: mm/swap.o: fs/read_write.o:
  22. 22. 22©2019 VMware, Inc. Linker Magic <__start_mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle &__put_page &put_pages_list &__activate_page &activate_page &lru_cache_add &vfs_llseek &default_llseek &new_sync_read &new_sync_write &__vfs_write [...] <__stop_mcount_loc>: vmlinux: <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle <__mcount_loc>: &__put_page &put_pages_list &__activate_page &activate_page &lru_cache_add <__mcount_loc>: &vfs_llseek &default_llseek &new_sync_read &new_sync_write &__vfs_write kernel/sched/core.o: mm/swap.o: fs/read_write.o:
  23. 23. 23©2019 VMware, Inc. Linker Magic <__start_mcount_loc>: 0xffffffff81a14910 0xffffffff81a149b0 0xffffffff81a14c00 0xffffffff81a14c20 0xffffffff81a14c50 0xffffffff8126f7b0 0xffffffff8126f8f0 0xffffffff8126fcc0 0xffffffff81270440 0xffffffff81270690 0xffffffff8131f0f0 0xffffffff8131f120 0xffffffff8131fb40 0xffffffff8131fd00 0xffffffff8131fed0 [...] <__stop_mcount_loc>: vmlinux: <__mcount_loc>: &schedule &yield &preempt_schedule_common &_cond_resched &schedule_idle <__mcount_loc>: &__put_page &put_pages_list &__activate_page &activate_page &lru_cache_add <__mcount_loc>: &vfs_llseek &default_llseek &new_sync_read &new_sync_write &__vfs_write kernel/sched/core.o: mm/swap.o: fs/read_write.o:
  24. 24. 24©2019 VMware, Inc. Finding __fentry__ <schedule>: callq <__fentry__> [..] <yield>: callq <__fentry__> [..] <preempt_schedule_common>: callq <__fentry__> [..] <_cond_resched>: callq <__fentry__> [..] <schedule_idle>: callq <__fentry__> [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>: vmlinux:
  25. 25. 25©2019 VMware, Inc. <schedule>: callq <__fentry__> [..] <yield>: callq <__fentry__> [..] <preempt_schedule_common>: callq <__fentry__> [..] <_cond_resched>: callq <__fentry__> [..] <schedule_idle>: callq <__fentry__> [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>: Finding __fentry__ vmlinux:
  26. 26. 26©2019 VMware, Inc. Finding __fentry__ <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>: vmlinux:
  27. 27. 27©2019 VMware, Inc. Finding __fentry__ <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>: vmlinux: gcc 5 also added -mnop-mcount
  28. 28. 28©2019 VMware, Inc. Finding __fentry__ vmlinux: <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>:
  29. 29. 29©2019 VMware, Inc. Finding __fentry__ vmlinux: <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..]
  30. 30. 30©2019 VMware, Inc. What about Tracing? Need to know where to enable tracing We threw away the __mcount_loc section
  31. 31. 31©2019 VMware, Inc. What about Tracing? Need to know where to enable tracing We threw away the __mcount_loc section  The __mcount_loc section isn’t enough for us  Tracing requires saving state
  32. 32. 32©2019 VMware, Inc. struct dyn_ftrace struct dyn_ftrace { unsigned long ip; /* address of mcount call-site */ unsigned long flags; struct dyn_arch_ftrace arch; };
  33. 33. 33©2019 VMware, Inc. struct dyn_ftrace struct dyn_ftrace { unsigned long ip; /* address of mcount call-site */ unsigned long flags; struct dyn_arch_ftrace arch; }; arch/x86/include/asm/ftrace.h: struct dyn_arch_ftrace { /* No extra data needed for x86 */ };
  34. 34. 34©2019 VMware, Inc. struct dyn_ftrace struct dyn_ftrace { unsigned long ip; /* address of mcount call-site */ unsigned long flags; struct dyn_arch_ftrace arch; }; arch/powerpc/include/asm/ftrace.h: struct dyn_arch_ftrace { struct module *mod; };
  35. 35. 35©2019 VMware, Inc. Tracing data Copy from __mcount_loc before deleting that section Sorted for quick lookup Allocated in groups of pages  details out of scope for this talk Data reported at boot up – Allocated 39,317 dyn_ftrace structures – Used up 154 (4K) pages – Total of 630,784 bytes of memory $ dmesg |grep ftrace [ 0.528844] ftrace: allocating 39317 entries in 154 pages $ uname -r 5.1.11-200.fc29.x86_64
  36. 36. 36©2019 VMware, Inc. Finding __fentry__ <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] <__start_mcount_loc>: [...] <__stop_mcount_loc>: vmlinux: ip = 0xffffffff81a14910 flags = 0 ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0 ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 ip = 0xffffffff8131fed0 flags = 0 <ftrace_pages>
  37. 37. 37©2019 VMware, Inc. Finding __fentry__ ip = 0xffffffff81a14910 flags = 0 ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0 ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 <ftrace_pages> # cat available_filter_functions schedule yield preempt_schedule_common _cond_resched schedule_idle __put_page put_pages_list __activate_page activate_page lru_cache_add vfs_llseek default_llseek new_sync_read new_sync_write
  38. 38. 38©2019 VMware, Inc. Finding __fentry__ ip = 0xffffffff81a14910 flags = 0 ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0 ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 <ftrace_pages> # echo default_llseek > set_ftrace_filter # echo sched_idle >> set_ftrace_filtre # cat set_ftrace_filter schedule_idle default_llseek
  39. 39. 39©2019 VMware, Inc. dyn_ftrace.flags Bits 0-24: Counter for number of callbacks registered to function Bit 25: Function is being initialized and not ready to touch  module init Bit 26: Return from callback may modify IP address  kprobe or live patching Bit 27: Has unique trampoline and its enabled Bit 28: Has unique trampoline Bit 29: Saves regs is enabled (see bit 30) Bit 30: Needs to call ftrace_regs_caller (to save all regs like int3 does) Bit 31: The function is being traced
  40. 40. 40©2019 VMware, Inc. dyn_ftrace.flags Bits 0-24: Counter for number of callbacks registered to function Bit 25: Function is being initialized and not ready to touch  module init Bit 26: Return from callback may modify IP address  kprobe or live patching Bit 27: Has unique trampoline and its enabled Bit 28: Has unique trampoline Bit 29: Saves regs is enabled (see bit 30) Bit 30: Needs to call ftrace_regs_caller (to save all regs like int3 does) Bit 31: The function is being traced
  41. 41. 41©2019 VMware, Inc. Finding __fentry__ <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] vmlinux: ip = 0xffffffff81a14910 flags = 0 ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0 ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 ip = 0xffffffff8131fed0 flags = 0 <ftrace_pages>
  42. 42. 42©2019 VMware, Inc. Finding __fentry__ vmlinux: ip = 0xffffffff81a14910 flags = 0x40000001 ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0x00000001 ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 ip = 0xffffffff8131fed0 flags = 0 <ftrace_pages> <schedule>: nop [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] bit 30 count = 1 count = 1
  43. 43. 43©2019 VMware, Inc. Finding __fentry__ <schedule>: call ftrace_regs_caller [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: call ftrace_caller [..] vmlinux: ip = 0xffffffff81a14910 flags = 0xe0000001 ip = 0xffffffff81a149b0 flags = 0 ip = 0xffffffff81a14c00 flags = 0 ip = 0xffffffff81a14c20 flags = 0 ip = 0xffffffff81a14c50 flags = 0 ip = 0xffffffff8126f7b0 flags = 0x80000001 ip = 0xffffffff8126f8f0 flags = 0 ip = 0xffffffff8126fcc0 flags = 0 ip = 0xffffffff81270440 flags = 0 ip = 0xffffffff81270690 flags = 0 ip = 0xffffffff8131f0f0 flags = 0 ip = 0xffffffff8131f120 flags = 0 ip = 0xffffffff8131fb40 flags = 0 ip = 0xffffffff8131fd00 flags = 0 ip = 0xffffffff8131fed0 flags = 0 <ftrace_pages> bit 29,30,31 count = 1 bit 31 count = 1
  44. 44. 44©2019 VMware, Inc. Modifying code at runtime! Not the same as at boot up SMP boxes need to take extra care Other CPUs may be executing the code you change x86 has non uniform instruction (different sizes) Instructions may cross cache and page boundaries
  45. 45. 45©2019 VMware, Inc. Modifying code at runtime! <schedule>: 0f 1f 44 00 00 nop 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  46. 46. 46©2019 VMware, Inc. Modifying code at runtime! <schedule>: e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__> 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  47. 47. 47©2019 VMware, Inc. Modifying code at runtime! <schedule>: 0f 1f 44 00 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 <schedule>: 0f 1f 44 00 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 CPU 0 CPU 1
  48. 48. 48©2019 VMware, Inc. Modifying code at runtime! <schedule>: 0f 1f 44 00 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 <schedule>: e8 1b d0 1e 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 CPU 0 CPU 1
  49. 49. 49©2019 VMware, Inc. Modifying code at runtime! <schedule>: 0f 1f d0 1e 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 <schedule>: e8 1b d0 1e 00 53 65 48 8b 1c 25 00 61 01 00 48 8b 43 10 48 85 c0 CPU 0 CPU 1
  50. 50. 50©2019 VMware, Inc. 0f 1f d0 1e 00 ??? 0f 1f d0 1e 00
  51. 51. 51©2019 VMware, Inc. 0f 1f d0 1e 00 ??? BOOM! CRASH! General Protection Fault! REBOOT!
  52. 52. 52©2019 VMware, Inc. How to go from this! <schedule>: 0f 1f 44 00 00 nop 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  53. 53. 53©2019 VMware, Inc. To this? <schedule>: e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__> 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  54. 54. 54©2019 VMware, Inc. Breakpoints!
  55. 55. 55©2019 VMware, Inc. Breakpoints! <schedule>: 0f 1f 44 00 00 nop 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  56. 56. 56©2019 VMware, Inc. Breakpoints! <schedule>: <cc> 1f 44 00 00 <int3>nop 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  57. 57. 57©2019 VMware, Inc. How this works <schedule>: <int3>nop push %rbx mov %gs:0x16100,%rbx mov 0x10(%rbx),%rax test %rax,%rax
  58. 58. 58©2019 VMware, Inc. How this works <schedule>: <int3>nop push %rbx mov %gs:0x16100,%rbx mov 0x10(%rbx),%rax test %rax,%rax do_int3(struct pt_regs *regs) { regs->ip += 5; return }
  59. 59. 59©2019 VMware, Inc. How this works <schedule>: <int3>nop push %rbx mov %gs:0x16100,%rbx mov 0x10(%rbx),%rax test %rax,%rax do_int3(struct pt_regs *regs) { regs->ip += 5; return }
  60. 60. 60©2019 VMware, Inc. How this works <schedule>: <int3>nop push %rbx mov %gs:0x16100,%rbx mov 0x10(%rbx),%rax test %rax,%rax do_int3(struct pt_regs *regs) { regs->ip += 5; return }
  61. 61. 61©2019 VMware, Inc. How this works <schedule>: <int3>nop push %rbx mov %gs:0x16100,%rbx mov 0x10(%rbx),%rax test %rax,%rax do_int3(struct pt_regs *regs) { regs->ip += 5; return }
  62. 62. 62©2019 VMware, Inc. Breakpoints! <schedule>: <cc> 1f 44 00 00 <int3>nop 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  63. 63. 63©2019 VMware, Inc. Breakpoints! <schedule>: <cc>1b d0 1e 00 <int3>callq ffffffff81c01930 <__fentry__> 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  64. 64. 64©2019 VMware, Inc. Breakpoints! <schedule>: e8 1b d0 1e 00 callq ffffffff81c01930 <__fentry__> 53 push %rbx 65 48 8b 1c 25 00 61 mov %gs:0x16100,%rbx 01 00 ffffffff81a1491b: R_X86_64_32S current_task 48 8b 43 10 mov 0x10(%rbx),%rax 48 85 c0 test %rax,%rax 74 10 je ffffffff81a14938 <schedule+0x28> f6 43 24 20 testb $0x20,0x24(%rbx) 75 49 jne ffffffff81a14977 <schedule+0x67> 48 83 bb 20 0c 00 00 cmpq $0x0,0xc20(%rbx) 00 74 1f je ffffffff81a14957 <schedule+0x47> 31 ff xor %edi,%edi e8 a1 f8 ff ff callq ffffffff81a141e0 <__schedule> 65 48 8b 04 25 00 61 mov %gs:0x16100,%rax 01 00
  65. 65. 65©2019 VMware, Inc. Registering a callback with ftrace Call register_ftrace_function() Takes a ftrace_ops descriptor Static ftrace_ops (allocated at build time)  Top level ftrace tracers – function – function_graph – stack tracer – latency tracers Dynamic ftrace_ops (allocated via kmalloc() )  perf  kprobes  ftrace instances (sub buffers)
  66. 66. 66©2019 VMware, Inc. ftrace_ops structure struct ftrace_ops { ftrace_func_t func; struct ftrace_ops __rcu *next; unsigned long flags; void *private; ftrace_func_t saved_func; #ifdef CONFIG_DYNAMIC_FTRACE struct ftrace_ops_hash local_hash; struct ftrace_ops_hash *func_hash; struct ftrace_ops_hash old_hash; unsigned long trampoline; unsigned long trampoline_size; #endif };
  67. 67. 67©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call ftrace_stub restore_regs ftrace_stub: retq
  68. 68. 68©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call func_trace restore_regs ftrace_stub: retq void func_trace() { /* trace */ }
  69. 69. 69©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call func_trace restore_regs ftrace_stub: retq void func_trace() { /* trace */ } ftrace_ops.func
  70. 70. 70©2019 VMware, Inc. Calling more that one callback on a function? Direct calls to a single function are easy Handling more than one, requires a list operation But then all functions being traced will go through a list!
  71. 71. 71©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: nop [..] <preempt_schedule_common>: nop [..] <_cond_resched>: nop [..] <schedule_idle>: nop [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call list_func restore_regs ftrace_stub: retq void list_func() { /* iterate */ } void func1_func() { /* trace */ } void func2_func() { /* trace */ }
  72. 72. 72©2019 VMware, Inc. Multiple function callback example Run function tracer on all functions Run perf on just the scheduler
  73. 73. 73©2019 VMware, Inc. Multiple function callback example Want to trace schedule_idle()? NO list_func() perf Yes! function tracer
  74. 74. 74©2019 VMware, Inc. Multiple function callback example Want to trace __cond_resched()? NO list_func() perf Yes! function tracer
  75. 75. 75©2019 VMware, Inc. Multiple function callback example Want to trace yield()? NO list_func() perf Yes! function tracer
  76. 76. 76©2019 VMware, Inc. Multiple function callback example Want to trace schedule()? Yes! list_func() perf Yes! function tracer
  77. 77. 77©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: callq ftrace_caller [..] <preempt_schedule_common>: callq ftrace_caller [..] <_cond_resched>: callq ftrace_caller [..] <schedule_idle>: callq ftrace_caller [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call list_func restore_regs ftrace_stub: retq void list_func() { /* iterate */ } void function_trace() { /* function tracing */ } void perf_func() { /* function profiling */ }
  78. 78. 78©2019 VMware, Inc. ftrace_caller trampoline <schedule>: callq ftrace_caller [..] <yield>: callq dynamic_trampoline [..] <preempt_schedule_common>: callq dynamic_trampoline [..] <_cond_resched>: callq dynamic_trampoline [..] <schedule_idle>: callq dynamic_trampoline [..] vmlinux: <ftrace_caller>: save_regs load_regs ftrace_call: call list_func restore_regs ftrace_stub: retq void list_func() { /* iterate */ } void function_trace() { /* function tracing */ } void perf_func() { /* function profiling */ } <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq
  79. 79. 79©2019 VMware, Inc. Problems with dynamic trampolines When can you free them? How do you know they are still not in use?
  80. 80. 80©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: callq dynamic_trampoline push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq
  81. 81. 81©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: callq dynamic_trampoline push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq
  82. 82. 82©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: callq dynamic_trampoline push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq
  83. 83. 83©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: callq dynamic_trampoline push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted!
  84. 84. 84©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted!
  85. 85. 85©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted! kfree(dynamic_trampoline)
  86. 86. 86©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Scheduled
  87. 87. 87©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq
  88. 88. 88©2019 VMware, Inc. Dynamic Trampoline Problem <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq CRASH!
  89. 89. 89©2019 VMware, Inc. Problems with dynamic trampolines When can you free them? How do you know they are still not in use?
  90. 90. 90©2019 VMware, Inc. Problems with dynamic trampolines When can you free them? How do you know they are still not in use? Use RCU!
  91. 91. 91©2019 VMware, Inc. call_rcu_tasks() Added in Linux v3.18  Commit 8315f42295d2667 by Paul E. McKenney synchronize_rcu_tasks()  Waits for all tasks to voluntary schedule  We do not allow ftrace callbacks to schedule  The trampoline will not schedule
  92. 92. 92©2019 VMware, Inc. call_rcu_tasks() Added in Linux v3.18  Commit 8315f42295d2667 by Paul E. McKenney synchronize_rcu_tasks()  Waits for all tasks to voluntary schedule  We do not allow ftrace callbacks to schedule  The trampoline will not schedule Used by ftrace in v4.12
  93. 93. 93©2019 VMware, Inc. call_rcu_tasks() Added in Linux v3.18  Commit 8315f42295d2667 by Paul E. McKenney synchronize_rcu_tasks()  Waits for all tasks to voluntary schedule  We do not allow ftrace callbacks to schedule  The trampoline will not schedule Used by ftrace in v4.12  Yes Steven was lazy  Added with the threat that Paul was going to remove it
  94. 94. 94©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted!
  95. 95. 95©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted! call_rcu_tasks(dynamic_trampoline)
  96. 96. 96©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Preempted! call_rcu_tasks(dynamic_trampoline) Wait’s for all tasks to voluntarily schedule
  97. 97. 97©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq Scheduled call_rcu_tasks(dynamic_trampoline) Wait’s for all tasks to voluntarily schedule
  98. 98. 98©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq call_rcu_tasks(dynamic_trampoline) Wait’s for all tasks to voluntarily schedule
  99. 99. 99©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq call_rcu_tasks(dynamic_trampoline) Wait’s for all tasks to voluntarily schedule
  100. 100. 100©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq call_rcu_tasks(dynamic_trampoline) All tasks have scheduled
  101. 101. 101©2019 VMware, Inc. Dynamic Trampoline Solution <schedule>: nop push %rbx mov %gs:0x16100,%rbx vmlinux: <dynamic_trampoline>: save_regs load_regs ftrace_call: call function_trace restore_regs ftrace_stub: retq kfree(dynamic_trampoline)
  102. 102. 102©2019 VMware, Inc. More uses of the function callback code ftrace_regs_caller() gives all registers A callback can modify any register  Needs a flag in ftrace_ops to modify the instruction pointer (ip)
  103. 103. 103©2019 VMware, Inc. Live Kernel Patching! <schedule>: callq ftrace_caller [..] Buggy schedule() function <ftrace_caller>: save_regs load_regs call kernel_patch restore_regs retq void kernel_patch() { regs.ip = schedule_fix; } <schedule_fix>: nop [..] Fixed schedule() function
  104. 104. Thank You

×