Kernel bug hunting
Modena, 27 November 2019
Andrea Righi
andrea.righi@canonical.com
www.canonical.com
twitter: @arighi
Linux kernel is complex
●
25,590,567 lines of code right now
find -type f -name '*.[chS]' -exec wc -l {} \; | awk 'BEGIN{sum=0}{sum+=$1}END{print sum}'
●
229 patches last week
git log --oneline v5.4-rc7..v5.4-rc8 | wc -l
●
195 files changed, 3398 insertions(+), 4081 deletions(-)
git diff --stat v5.4-rc7..v5.4-rc8 | tail -1
https://www.linuxcounter.net/statistics/kernel
Kernel bugs
●
Kernel panic
●
Fatal error, system becomes unusable
●
Kernel oops
●
Non-fatal error; some functionality may be compromised
●
Wrong result
●
Fatal error from user’s perspective
●
Security vulnerability
●
Side-channel attack, data leakage, …
●
Performance regression
●
Everything is correct, but slower...
Debugging techniques
●
blinking LED
●
printk() / dump_stack()
●
procfs
●
SysRq key (Documentation/admin-guide/sysrq.rst); see the example below
●
debugger (e.g., kgdb, …)
●
virtualization
●
profiling / tracing
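For example, the SysRq interface can also be triggered from software through /proc/sysrq-trigger (as root); 't' dumps the state and stack trace of every task to the kernel log:
# echo t > /proc/sysrq-trigger
# dmesg | tail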
Kernel debugging hands-on
●
Virtualization can help to track down kernel bugs
●
virtme
●
Run the kernel inside a qemu/kvm instance, virtualizing the running
system (see the example below)
●
Generate crash dump
●
Analyze system data offline (after the crash)
●
crash test kernel module
●
https://github.com/arighi/crashtest
●
Simple scripts to speed up kernel development
(wrappers around virtme/qemu/crash):
●
https://github.com/arighi/kernel-crash-tools
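A minimal virtme workflow might look like this (the kernel tree path is illustrative); it boots the kernel you just built inside qemu/kvm on top of the running system:
$ cd ~/linux && make -j$(nproc)
$ virtme-run --kdir ~/linux --mods=auto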
Profiling vs tracing
●
Profiling
●
Set up a periodic timer interrupt that samples the current
program counter, function address and the entire stack
back trace
●
Tracing
●
Record times and invocations of specific events
Profiling example: perf top
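A typical sampling session with perf might look like this (99 Hz, all CPUs, with call graphs; these are standard perf options):
$ sudo perf top                              # live, per-function CPU profile
$ sudo perf record -F 99 -a -g -- sleep 10   # sample for 10 seconds
$ sudo perf report                           # browse the collected profile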
Tracing example: strace
●
strace(1): system call tracer in Linux
●
It uses the ptrace() system call, which pauses the target
process at each syscall so that the tracer can read
its state
●
And it’s doing this twice: when the syscall begins and when
it ends!
strace overhead
### Regular execution ###
$ dd if=/dev/zero of=/dev/null bs=1 count=500k
512000+0 records in
512000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 0.501455 s, 1.0 MB/s
### Strace execution (tracing a syscall that is never called) ###
$ strace -e trace=accept dd if=/dev/zero of=/dev/null bs=1 count=500k
512000+0 records in
512000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 44.0216 s, 11.6 kB/s
+++ exited with 0 +++
Tracing kernel functions: kprobe
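Without writing any code, a kprobe can be placed through the ftrace kprobe_events interface (see Documentation/trace/kprobetrace.rst); do_sys_open below is just an example target, run as root:
# echo 'p:myopen do_sys_open' > /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/myopen/enable
# cat /sys/kernel/debug/tracing/trace_pipe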
eBPF
eBPF history
●
Initially it was BPF: Berkeley Packet Filter
●
It has its roots in BSD in the very early 1990s
●
Originally designed as a mechanism for fast filtering network packets
●
3.15: Linux introduced eBPF: extended Berkeley Packet Filter
●
More efficient / more generic than the original BPF
●
3.18: eBPF VM exposed to user-space
●
4.9: eBPF programs can be attached to perf_events
●
4.10: eBPF programs can be attached to cgroups
●
4.15: eBPF LSM hooks
eBPF features
●
Highly efficient VM that lives in the kernel
●
Inject safe, sandboxed bytecode into the kernel
●
Attach code to kernel functions / events
●
In-kernel JIT compiler
●
Dynamically translate eBPF bytecode into native opcodes
●
eBPF makes kernel programmable without having to
cross kernel/user-space boundaries
●
Access in-kernel data structures directly without the risk
of crashing, hanging or breaking the kernel in any way
eBPF as a VM
●
Example assembly of a simple
eBPF filter
●
Load 16-bit quantity from offset
12 in the packet to the
accumulator (ethernet type)
●
Compare the value to see if the
packet is an IP packet
●
If the packet is IP, return TRUE
(packet is accepted)
●
otherwise return 0 (packet is
rejected)
●
Only 4 VM instructions to filter
IP packets!
ldh [12]
jeq #ETHERTYPE_IP, l1, l2
l1: ret #TRUE
l2: ret #0
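Classic BPF like this is easy to see in practice: tcpdump compiles its filter expressions to BPF and can dump the resulting instructions with -d, producing a listing very similar to the example above:
$ sudo tcpdump -d ip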
eBPF use cases
How many eBPF programs are running on your laptop?
●
Run this:
ls -la /proc/*/fd | grep bpf-prog | wc -l
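On recent kernels, bpftool (if installed) can also list the loaded programs directly:
$ sudo bpftool prog list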
Flame graphs
●
CPU flame graphs
●
x-axis: sample population
●
y-axis: stack depth
●
Wider boxes =
More samples =
More CPU time =
More overhead!
[Figure: interactive CPU flame graph of a Java (vert.x/netty) web server under load; user-space Java frames (org.mozilla.javascript, io.netty, org.vertx) sit on top of kernel network paths such as tcp_sendmsg, ip_queue_xmit, tcp_write_xmit and tcp_v4_rcv]
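A graph like this can be built from perf samples with Brendan Gregg's FlameGraph scripts (https://github.com/brendangregg/FlameGraph, assumed here to be cloned into ./FlameGraph):
$ sudo perf record -F 99 -a -g -- sleep 30
$ sudo perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > flame.svg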
BCC tracing tools
●
BPF Compiler Collection https://github.com/iovisor/bcc
●
Front-end to eBPF
●
BCC makes eBPF programs easier to write
●
Includes a C wrapper around LLVM
●
Python
●
Lua
●
C++
●
C helper libs
●
Lots of pre-defined tools available
Example #1: trace exec()
●
Intercept all the processes executed in the system
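This demo can be reproduced with the pre-built BCC tool execsnoop (the install path varies by distribution; on Ubuntu the bpfcc-tools package ships it as execsnoop-bpfcc):
$ sudo /usr/share/bcc/tools/execsnoop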
Example #2: keylogger
●
Identify where and how keyboard characters are received
and processed by the kernel
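A rough sketch of the idea with BCC's generic trace tool: hook input_event(), which input events (including keystrokes) pass through; the choice of probe point is an assumption, since the exact path depends on the input driver:
$ sudo /usr/share/bcc/tools/trace 'input_event "type=%d code=%d value=%d", arg2, arg3, arg4'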
Example #3: ping
●
Identify where ICMP packets (ECHO_REQUEST /
ECHO_REPLY) are received and processed by the kernel
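Similarly, incoming ICMP traffic can be observed by probing icmp_rcv() and printing the kernel stack that led to it (-K); ping the machine while this runs:
$ sudo /usr/share/bcc/tools/trace -K 'icmp_rcv'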
Example #4: task wait / wakeup
●
Determine the stack trace
of a sleeping process and
the stack trace of the
process that wakes up a
sleeping process
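BCC ships a ready-made tool for exactly this, offwaketime, which pairs the off-CPU (sleeping) stack of a process with the stack of its waker; the trailing argument is the trace duration in seconds:
$ sudo /usr/share/bcc/tools/offwaketime 5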
Is tracing safe?
eBPF: performance overhead - use case #1
●
user-space daemon using an eBPF program attached to a
function (via kprobe)
●
kernel is updated, function doesn’t exist anymore
●
daemon starts to use an older/slower non-BPF method
●
5% performance regression
eBPF: performance overhead - use case #2
●
kernel function mapped to a 2MB huge page
●
eBPF program attached to that function (via kprobe)
●
setting the kprobe causes the function to be remapped to a
regular 4KB page
●
increased TLB misses
●
2% performance regression
eBPF: compile once, run everywhere?
●
… not exactly! :-(
●
eBPF programs are compiled on the target system
immediately before they are loaded
●
Linux headers are needed to understand kernel data
structures
●
structure randomization is a problem
●
BTF (BPF Type Format) was created to address this
●
kernel data description embedded in the kernel (no longer
any need to ship kernel headers around!)
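On kernels built with CONFIG_DEBUG_INFO_BTF (exposed around v5.4), the type information lives at /sys/kernel/btf/vmlinux and can be inspected with bpftool:
$ sudo bpftool btf dump file /sys/kernel/btf/vmlinux | head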
Conclusion
●
Virtualization is your friend to speed up kernel
development
●
Real-time tracing can be an effective way to study and
understand how the kernel works
●
Kernel development can be challenging... but fun! :)
References
●
Brendan Gregg blog
●
http://brendangregg.com/blog/
●
BCC tools
●
https://github.com/iovisor/bcc
●
virtme
●
https://github.com/amluto/virtme
●
crashtest
●
https://github.com/arighi/crashtest
●
kernel-crash-tools
●
https://github.com/arighi/kernel-crash-tools
Thank you
Andrea Righi
andrea.righi@canonical.com
www.canonical.com
twitter: @arighi
