© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Without Stop Machine
The Next Step of Kernel Live Patching
Masami Hiramatsu
<masami.hiramatsu.pt@hitachi.com>
Linux Technology Research Center
Yokohama Research Lab. Hitachi Ltd.,
LinuxCon NA 2014
(Aug. 2014)
1
© Hitachi, Ltd. 2014. All rights reserved.
Speaker
• Masami Hiramatsu
– A researcher, working for Hitachi
• Researching many RAS features
– A linux kprobes-related maintainer
• Ftrace dynamic kernel event (a.k.a. kprobe-tracer)
• Perf probe (a tool to set up the dynamic events)
• X86 instruction decoder (in kernel)
2
© Hitachi, Ltd. 2014. All rights reserved.
Agenda: Kpatch Without Stop Machine
Background Story
Kpatch internal
Kpatch without stop_machine
Conclusion and Discussion
Note: this presentation is only focusing on the kernel-module side of kpatch.
More generic design and implementation, please attend to -
kpatch: Have Your Security And Eat It Too! – Josh Poimboeuf
Aug. 22. pm2:30
3
© Hitachi, Ltd. 2014. All rights reserved.
Agenda: Kpatch Without Stop Machine
Background Story
What is kpatch?
Live patching requirements
Major updates and Minor updates
Kpatch internal
Kpatch Overview
Active Safeness check
Stop_machine
Kpatch without stop_machine
Live Patcing Rules
Kpatch Reference Counter
Safeness check without stop_machine
Conclusion and Discussion 4
© Hitachi, Ltd. 2014. All rights reserved.
What’s Kpatch?
• Kpatch is a LIVE patching function for kernel
– This applys a binary patch to kernel on-line
– Patching is done without shutdown
• Only for a small and critical issues
– Not for major kernel update
5
© Hitachi, Ltd. 2014. All rights reserved.
Why Live Patching Required?
• Live patching is important for appliances for
mission critical systems
– Some embedded appliances are hard to maintain
frequently
• Those are distributed widely in country side
• Not in the big data center!
– Some appliances can’t accept 10ms downtime
• Factory control system etc.
6
© Hitachi, Ltd. 2014. All rights reserved.
But for Major Updates?
• M.C. systems have periodic maintenance
– Major fixes can be applied and rebooted
– In between the maintenance, live patching will be used
• Live patching and major update are complement
each other
– Live patching temporarily fixes small critical incidents
– Major update permanently fixes all bugs
7
Planned
Maintenance
Major UpdateLive Patching
criticalminor
© Hitachi, Ltd. 2014. All rights reserved.
The History of Live patching on Linux
• Live patching is not new
8
Pannus Live Patch (2004-2006)
http://pannus.sourceforge.net/
Livepatch (2005)
http://ukai.jp/Slides/2005/1202-b2con/mop/livepatch.html
ksplice (2007-)
https://www.ksplice.com/
kGraft (2014)
kpatch (2014)
2004
For
applic
ation
For
kernel
2014
Use ftrace to replace function
Will be supported by major distributors
Build from scratch
Acquired and supported by Oracle
Developed for CGL
No distribution support
Ptrace and mmap based one
© Hitachi, Ltd. 2014. All rights reserved.
Agenda: Kpatch Without Stop Machine
Background Story
What is kpatch?
Live patching requirements
Major updates and Minor updates
Kpatch internal
Kpatch Overview
Active Safeness check
Stop_machine
Kpatch without stop_machine
Live Patcing Rules
Kpatch Reference Counter
Safeness check without stop_machine
Conclusion and Discussion 9
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch: Overview
• Kpatch has 2 components
– Kpatch build: Build a binary patch module
– Kpatch.ko: The kernel module of Kpatch
10
Original
src
Patched
src
Patch
Build Current
vmlinux
Build Patched
vmlinux
Per-function
diff
Patch
module
A module which
contains new
functions
Done by Kpatch build
Running
linux
Kpatch.ko
Done by kpatch.ko
Do patch
New functions
On old functions
Today’s talk
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch: How to Patch
• Kpatch uses Ftrace to patch
– Hook the target function entry with registers
– Change regs->ip to new function (change the flow)
11
Call foo()Call foo()
Call fentryCall fentry
Save RegsSave Regs
Call ftrace_opsCall ftrace_ops Get new IP from
Hash table
Get new IP from
Hash table
Restore RegsRestore Regs
Change regs->ipChange regs->ip
ReturnReturn
……
ReturnReturn
……
foo()
New foo()
Ret to regs->ipRet to regs->ip
……
Ftrace hooks
Foo() entry
Find the new
IP from hash
table
Ftrace returns
To new foo()
ReturnReturn
Foo() is called even
After patching.
Function pointer is
Available 
ftrace
kpatch.ko
© Hitachi, Ltd. 2014. All rights reserved.
Conflict of Old and New Functions
• Kpatch will update the execution path of
a function
– Q: What happen if the patched function is under
executed?
– A: Old and new functions are executed at the
same time
!!This should not happen!!
• Kpatch ensures the old functions are not
executed when patching
– “Active Safeness Check”
12
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness Check
• Executing functions are on the stack
• And IP register points current function too
• Active Safeness Check
– Do stack dump to check the target functions are
not executed, for each thread.
– Need to be done when the process is stopped.
– stop_machine is used 13
Func1Func1
Func2Func2
Func3Func3
Func1+XX
Func2+YY
Stack at here
Func1+XX
Stack at here
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness Check With Stop_machine
• Kpatch uses stop_machine to check stacks
14
Time
Patching
Process
Process1
Process2
Process3
Add
Ftrace entry
Call old
func1
Call old
func2
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness Check With Stop_machine
• Kpatch uses stop_machine to check stacks
15
Time
Patching
Process
Process1
Process2
Process3
Add
Ftrace entry
Call old
func1
Call old
func2
All running processes and
interrupts are stopped
Call Stop Machine
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness Check With Stop_machine
• Kpatch uses stop_machine to check stacks
16
Time
Patching
Process
Process1
Process2
Process3
Add
Ftrace entry
Safeness
Check
Call old
func1
Call old
func2
Walk through the all
Thread and check
Old funcs on stacks
All running processes and
interrupts are stopped
Call Stop Machine
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness Check With Stop_machine
• Kpatch uses stop_machine to check stacks
17
Time
Patching
Process
Process1
Process2
Process3
Add
Ftrace entry
Update hash
table
Safeness
Check
Call old
func1
Call old
func2
Call New
func2
Call New
func2
Call New
func1
Call New
func1
Walk through the all
Thread and check
Old funcs on stacks
Now switch to
New functionsAll running processes and
interrupts are stopped
Call Stop Machine Return
© Hitachi, Ltd. 2014. All rights reserved.
Stop_machine: Pros and Cons
• Pros
– Safe, simple and easy to review, Good for the 1st
version
• Cons
– Stop_machine stops all processes a while
• It is critical for control/network appliances
– In virtual environment, this takes longer time
• We need to wait all VCPUs are scheduled on the host
machine
18
© Hitachi, Ltd. 2014. All rights reserved.
Agenda: Kpatch Without Stop Machine
Background Story
What is kpatch?
Live patching requirements
Major updates and Minor updates
Kpatch internal
Kpatch and Ftrace
Kpatch and Safeness check
Stop_machine
Kpatch without stop_machine
Live Patching Rules
Kpatch Reference Counter
Safeness check without stop_machine
Conclusion and Discussion 19
© Hitachi, Ltd. 2014. All rights reserved.
Live Patching Rules
• Live patching must follow the rules
1. All the new functions in a patch must be applied
at once
●
We need an atomic operation
2. After switching new function, the old function
must not be executed
●
We have to ensure no threads runs on old
functions
●
And no threads sleeps on them
20
© Hitachi, Ltd. 2014. All rights reserved.
Two actions for the solution
1. Introduce an atomic reference counter
2. Active safeness check at the context switch
21
© Hitachi, Ltd. 2014. All rights reserved.
Atomic Function Reference Counter
1. Introduce an atomic reference counter
– Without stop_machine, functions can be called
while patching
• Ensure no one actually runs functions -> refcounter
• Increment the refcounter at entry
• Decrement the refcounter at exit
– If refcounter is 0, update ALL function paths
• We are sure there is no users
22
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter
• Patching(switching) controlled by refcount
Time
Patching
Process
Process1
Process2
Process3
Add
Ftrace entry+1
Start patcing
Refcnt = 1
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter
• Patching(switching) controlled by refcount
Time
Patching
Process
Process1
Process2
Process3
Add
Ftrace entry
Prepare new
Hash table
Safeness
Check
Call old
func1
Call old
func2
Call old
func1
+1 +1
+1
-1-1
-1
+1
Start patcing
Refcnt = 1
Each function call
Inc/dec refcnt
While patching
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter
• Patching(switching) controlled by refcount
25
Time
Patching
Process
Process1
Process2
Process3
Add
Ftrace entry
Prepare new
Hash table
Safeness
Check
Call old
func1
Call old
func2
Call old
func3
Call old
func3
Call old
func1
+1 +1
+1
+1
+1
-1
-1-1
-1
-1
+1
Patching process
Is over, but refcnt
Is not 0
Start patcing
Refcnt = 1
Each function call
Inc/dec refcnt
While patching
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter
• Patching(switching) controlled by refcount
26
Time
Patching
Process
Process1
Process2
Process3
Add
Ftrace entry
Prepare new
Hash table
Safeness
Check
Call old
func1
Call old
func2
Call old
func3
Call old
func3
Call old
func1
Call New
func2
Call New
func2
Call New
func1
Call New
func1
+1 +1
+1
+1
+1
-1
-1-1 -1
-1
-1
+1
Patching process
Is over, but refcnt
Is not 0
Refcnt is 0
Patch enabled on
all funcs
Start patcing
Refcnt = 1
Don’t count
refcnt anymore
Each function call
Inc/dec refcnt
While patching
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter (cont.)
• Control the reference counter
– Need to stop counting before and after patching
– Use atomic_inc_not_zero/dec_if_positive
• These are stopped automatically if counter == 0
27
func call
00
Do not hook function
Entry/exit before patching
refcnt
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter (cont.)
• Control the reference counter
– Need to stop counting before and after patching
– Use atomic_inc_not_zero/dec_if_positive
• These are stopped automatically if counter == 0
28
1 2 1 0
atomic_dec_if_positiveatomic_inc_not_zero
func call
atomic_inc
forcibly inc refcnt
Patching
atomic_dec
func call
00
refcnt
Do not hook function
Entry/exit before patching
© Hitachi, Ltd. 2014. All rights reserved.
Kpatch Reference Counter (cont.)
• Control the reference counter
– Need to stop counting before and after patching
– Use atomic_inc_not_zero/dec_if_positive
• These are stopped automatically if counter == 0
29
1 2 1 0 0 0
atomic_dec_if_
positive
->do nothing
atomic_inc_not_zero
-> do nothing
func call
atomic_dec_if_positiveatomic_inc_not_zero
func call
atomic_inc
Patching
atomic_dec
func call
0
atomic_dec_if_
positive
->do nothing
0
refcnt
Do not hook function
Entry/exit before patching
© Hitachi, Ltd. 2014. All rights reserved.
Active Safeness check without stop_machine
2. Active safeness check at the context switch
– To find threads sleeping(or going to sleep) on the
functions
– For all running processes, hook the context
switch and check stack entries safely.
– For the sleeping tasks, we can check it safely.
30
© Hitachi, Ltd. 2014. All rights reserved.
Safeness Check without Stop_machine
• 2 stages safeness checking
31
Patching
Process
stack check if task
is not running
rculock
rcuunlock
check over all threads
List up
Running
threads
Time
Stage1: Check sleeping tasks
Running
process A
Running
process B
wait on runqRun
Run
context switch
wait on runq
© Hitachi, Ltd. 2014. All rights reserved.
Safeness Check without Stop_machine
• 2 stages safeness checking
32
Patching
Process
stack check if task
is not running
rculock
rcuunlock
check over all threads
running pid
running pid
running pid
List up
Running
threads
hook
stack check
for current process
Update
Wait for all running
task is checked
Time
…
Stage1: Check sleeping tasks Stage2: Check running tasks
Running
process A
Running
process B
wait on runqRun
Run
context switch
wait on runq
© Hitachi, Ltd. 2014. All rights reserved.
How to add refcnt and context switch hook?
• To hook the function entry/return
– Use kretprobe to hook it
– For each function entry/return, inc/dec refcount
• To hook the context switch
– Use kprobe to hook it
– Do safeness check (on stack) and update running pid list
• Both are dynamic probe
– After checking the safeness, all kretprobes/kprobes are r
emoved from the target functions
– We have minimal overhead
33
© Hitachi, Ltd. 2014. All rights reserved.
Demo
• Demonstrating kpatching with/without stop_machine
– Using ftrace to trace stop_machine()
34
(Setup ftrace)
# echo stop_machine > /sys/kernel/debug/tracing/set_ftrace_filter
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
(Run the kpatch)
# kpatch load kpatch-test-patch.ko
(Check the result)
# echo 0 > /sys/kernel/debug/tracing/tracing_on
# cat /sys/kernel/debug/tracing/trace
(Setup ftrace)
# echo stop_machine > /sys/kernel/debug/tracing/set_ftrace_filter
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
(Run the kpatch)
# kpatch load kpatch-test-patch.ko
(Check the result)
# echo 0 > /sys/kernel/debug/tracing/tracing_on
# cat /sys/kernel/debug/tracing/trace
© Hitachi, Ltd. 2014. All rights reserved.
Demo result
• With stop_machine
• Without stop_machine
35
cat trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) ! 6410.455 us | stop_machine();
cat trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) ! 6410.455 us | stop_machine();
cat trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
cat trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
No stop_machine is executed
© Hitachi, Ltd. 2014. All rights reserved.
Agenda: Kpatch Without Stop Machine
Background Story
What is kpatch?
Live patching requirements
Major updates and Minor updates
Kpatch internal
Kpatch Overview
Kpatch and Safeness check
Stop_machine
Kpatch without stop_machine
Live Patcing Rules
Kpatch Reference Counter
Safeness check without stop_machine
Conclusion and Discussion 36
© Hitachi, Ltd. 2014. All rights reserved.
Conclusion
• Succeed to get rid of stop_machine() from
kpatch
– This is a proof of concept of stop_machine-free
kpatch
• This means kpatch CAN BE ready for mission critical
systems
• But still under discussion stage
• Upstreaming could be a long way
– At first, push current stop_machine-based kpatch
to upstream
– Stop_machine-free will be the next step
37
© Hitachi, Ltd. 2014. All rights reserved.
Current Issues
• Possible to miscount the function reference
– Kretprobe has no error notification
• Kretprobe can be failed to handle the function return because of return-
address buffer shortage
• Possible to fail patching with big patch
– We have to monitor all the functions are safe in the patch
– Big patch has many patched functions
• Some of them can be always used in the system
• Incremental patching could be better
• Module unloading using stop_machine
– This will happen if we replace old patch with new one
– Incremental patching can avoid this.
38
© Hitachi, Ltd. 2014. All rights reserved.
Future work
• Use a generic return-hook mechanism
– Kretprobe is for PoC, not for general use
• It can’t detect the failure of hooking
– Should be more safe (e.g. miss-hook handler)
• Context switch hook can be more general
– Tracepoint/traceevent makes it better.
– This requires kpatch as embedded feature
• Upstreaming
39
© Hitachi, Ltd. 2014. All rights reserved.
Resources and Links
• Get rid of the stop_machine from kpatch
• https://github.com/dynup/kpatch/issues/138
• My no-stopmachine branch
• https://github.com/mhiramathitachi/kpatch/tree/no-stop
machine-v1
– This requires IPMODIFY flag patchset for kernel
• http://thread.gmane.org/gmane.linux.kernel/1757201
40
© Hitachi, Ltd. 2014. All rights reserved.
Questions?
© Hitachi, Ltd. 2014. All rights reserved.
Trademarks
43
• Linux is a trademark of Linus Torvalds in the
United States, other countries, or both.
• Other company, product, or service names m
ay be trademarks or service marks of others.

Linux conna kpatch-without-stopmachine-fixed

  • 1.
    © Hitachi, Ltd.2014. All rights reserved. Kpatch Without Stop Machine The Next Step of Kernel Live Patching Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Linux Technology Research Center Yokohama Research Lab. Hitachi Ltd., LinuxCon NA 2014 (Aug. 2014) 1
  • 2.
    © Hitachi, Ltd.2014. All rights reserved. Speaker • Masami Hiramatsu – A researcher, working for Hitachi • Researching many RAS features – A linux kprobes-related maintainer • Ftrace dynamic kernel event (a.k.a. kprobe-tracer) • Perf probe (a tool to set up the dynamic events) • X86 instruction decoder (in kernel) 2
  • 3.
    © Hitachi, Ltd.2014. All rights reserved. Agenda: Kpatch Without Stop Machine Background Story Kpatch internal Kpatch without stop_machine Conclusion and Discussion Note: this presentation is only focusing on the kernel-module side of kpatch. More generic design and implementation, please attend to - kpatch: Have Your Security And Eat It Too! – Josh Poimboeuf Aug. 22. pm2:30 3
  • 4.
    © Hitachi, Ltd.2014. All rights reserved. Agenda: Kpatch Without Stop Machine Background Story What is kpatch? Live patching requirements Major updates and Minor updates Kpatch internal Kpatch Overview Active Safeness check Stop_machine Kpatch without stop_machine Live Patcing Rules Kpatch Reference Counter Safeness check without stop_machine Conclusion and Discussion 4
  • 5.
    © Hitachi, Ltd.2014. All rights reserved. What’s Kpatch? • Kpatch is a LIVE patching function for kernel – This applys a binary patch to kernel on-line – Patching is done without shutdown • Only for a small and critical issues – Not for major kernel update 5
  • 6.
    © Hitachi, Ltd.2014. All rights reserved. Why Live Patching Required? • Live patching is important for appliances for mission critical systems – Some embedded appliances are hard to maintain frequently • Those are distributed widely in country side • Not in the big data center! – Some appliances can’t accept 10ms downtime • Factory control system etc. 6
  • 7.
    © Hitachi, Ltd.2014. All rights reserved. But for Major Updates? • M.C. systems have periodic maintenance – Major fixes can be applied and rebooted – In between the maintenance, live patching will be used • Live patching and major update are complement each other – Live patching temporarily fixes small critical incidents – Major update permanently fixes all bugs 7 Planned Maintenance Major UpdateLive Patching criticalminor
  • 8.
    © Hitachi, Ltd.2014. All rights reserved. The History of Live patching on Linux • Live patching is not new 8 Pannus Live Patch (2004-2006) http://pannus.sourceforge.net/ Livepatch (2005) http://ukai.jp/Slides/2005/1202-b2con/mop/livepatch.html ksplice (2007-) https://www.ksplice.com/ kGraft (2014) kpatch (2014) 2004 For applic ation For kernel 2014 Use ftrace to replace function Will be supported by major distributors Build from scratch Acquired and supported by Oracle Developed for CGL No distribution support Ptrace and mmap based one
  • 9.
    © Hitachi, Ltd.2014. All rights reserved. Agenda: Kpatch Without Stop Machine Background Story What is kpatch? Live patching requirements Major updates and Minor updates Kpatch internal Kpatch Overview Active Safeness check Stop_machine Kpatch without stop_machine Live Patcing Rules Kpatch Reference Counter Safeness check without stop_machine Conclusion and Discussion 9
  • 10.
    © Hitachi, Ltd.2014. All rights reserved. Kpatch: Overview • Kpatch has 2 components – Kpatch build: Build a binary patch module – Kpatch.ko: The kernel module of Kpatch 10 Original src Patched src Patch Build Current vmlinux Build Patched vmlinux Per-function diff Patch module A module which contains new functions Done by Kpatch build Running linux Kpatch.ko Done by kpatch.ko Do patch New functions On old functions Today’s talk
  • 11.
    © Hitachi, Ltd.2014. All rights reserved. Kpatch: How to Patch • Kpatch uses Ftrace to patch – Hook the target function entry with registers – Change regs->ip to new function (change the flow) 11 Call foo()Call foo() Call fentryCall fentry Save RegsSave Regs Call ftrace_opsCall ftrace_ops Get new IP from Hash table Get new IP from Hash table Restore RegsRestore Regs Change regs->ipChange regs->ip ReturnReturn …… ReturnReturn …… foo() New foo() Ret to regs->ipRet to regs->ip …… Ftrace hooks Foo() entry Find the new IP from hash table Ftrace returns To new foo() ReturnReturn Foo() is called even After patching. Function pointer is Available  ftrace kpatch.ko
  • 12.
    © Hitachi, Ltd.2014. All rights reserved. Conflict of Old and New Functions • Kpatch will update the execution path of a function – Q: What happen if the patched function is under executed? – A: Old and new functions are executed at the same time !!This should not happen!! • Kpatch ensures the old functions are not executed when patching – “Active Safeness Check” 12
  • 13.
    © Hitachi, Ltd.2014. All rights reserved. Active Safeness Check • Executing functions are on the stack • And IP register points current function too • Active Safeness Check – Do stack dump to check the target functions are not executed, for each thread. – Need to be done when the process is stopped. – stop_machine is used 13 Func1Func1 Func2Func2 Func3Func3 Func1+XX Func2+YY Stack at here Func1+XX Stack at here
  • 14.
    © Hitachi, Ltd.2014. All rights reserved. Active Safeness Check With Stop_machine • Kpatch uses stop_machine to check stacks 14 Time Patching Process Process1 Process2 Process3 Add Ftrace entry Call old func1 Call old func2
  • 15.
    © Hitachi, Ltd.2014. All rights reserved. Active Safeness Check With Stop_machine • Kpatch uses stop_machine to check stacks 15 Time Patching Process Process1 Process2 Process3 Add Ftrace entry Call old func1 Call old func2 All running processes and interrupts are stopped Call Stop Machine
  • 16.
    © Hitachi, Ltd.2014. All rights reserved. Active Safeness Check With Stop_machine • Kpatch uses stop_machine to check stacks 16 Time Patching Process Process1 Process2 Process3 Add Ftrace entry Safeness Check Call old func1 Call old func2 Walk through the all Thread and check Old funcs on stacks All running processes and interrupts are stopped Call Stop Machine
  • 17.
    © Hitachi, Ltd.2014. All rights reserved. Active Safeness Check With Stop_machine • Kpatch uses stop_machine to check stacks 17 Time Patching Process Process1 Process2 Process3 Add Ftrace entry Update hash table Safeness Check Call old func1 Call old func2 Call New func2 Call New func2 Call New func1 Call New func1 Walk through the all Thread and check Old funcs on stacks Now switch to New functionsAll running processes and interrupts are stopped Call Stop Machine Return
  • 18.
    © Hitachi, Ltd.2014. All rights reserved. Stop_machine: Pros and Cons • Pros – Safe, simple and easy to review, Good for the 1st version • Cons – Stop_machine stops all processes a while • It is critical for control/network appliances – In virtual environment, this takes longer time • We need to wait all VCPUs are scheduled on the host machine 18
  • 19.
    © Hitachi, Ltd.2014. All rights reserved. Agenda: Kpatch Without Stop Machine Background Story What is kpatch? Live patching requirements Major updates and Minor updates Kpatch internal Kpatch and Ftrace Kpatch and Safeness check Stop_machine Kpatch without stop_machine Live Patching Rules Kpatch Reference Counter Safeness check without stop_machine Conclusion and Discussion 19
  • 20.
    © Hitachi, Ltd.2014. All rights reserved. Live Patching Rules • Live patching must follow the rules 1. All the new functions in a patch must be applied at once ● We need an atomic operation 2. After switching new function, the old function must not be executed ● We have to ensure no threads runs on old functions ● And no threads sleeps on them 20
  • 21.
    © Hitachi, Ltd.2014. All rights reserved. Two actions for the solution 1. Introduce an atomic reference counter 2. Active safeness check at the context switch 21
  • 22.
    © Hitachi, Ltd.2014. All rights reserved. Atomic Function Reference Counter 1. Introduce an atomic reference counter – Without stop_machine, functions can be called while patching • Ensure no one actually runs functions -> refcounter • Increment the refcounter at entry • Decrement the refcounter at exit – If refcounter is 0, update ALL function paths • We are sure there is no users 22
  • 23.
    © Hitachi, Ltd.2014. All rights reserved. Kpatch Reference Counter • Patching(switching) controlled by refcount Time Patching Process Process1 Process2 Process3 Add Ftrace entry+1 Start patcing Refcnt = 1
  • 24.
    © Hitachi, Ltd.2014. All rights reserved. Kpatch Reference Counter • Patching(switching) controlled by refcount Time Patching Process Process1 Process2 Process3 Add Ftrace entry Prepare new Hash table Safeness Check Call old func1 Call old func2 Call old func1 +1 +1 +1 -1-1 -1 +1 Start patcing Refcnt = 1 Each function call Inc/dec refcnt While patching
  • 25.
    © Hitachi, Ltd.2014. All rights reserved. Kpatch Reference Counter • Patching(switching) controlled by refcount 25 Time Patching Process Process1 Process2 Process3 Add Ftrace entry Prepare new Hash table Safeness Check Call old func1 Call old func2 Call old func3 Call old func3 Call old func1 +1 +1 +1 +1 +1 -1 -1-1 -1 -1 +1 Patching process Is over, but refcnt Is not 0 Start patcing Refcnt = 1 Each function call Inc/dec refcnt While patching
  • 26.
    © Hitachi, Ltd.2014. All rights reserved. Kpatch Reference Counter • Patching(switching) controlled by refcount 26 Time Patching Process Process1 Process2 Process3 Add Ftrace entry Prepare new Hash table Safeness Check Call old func1 Call old func2 Call old func3 Call old func3 Call old func1 Call New func2 Call New func2 Call New func1 Call New func1 +1 +1 +1 +1 +1 -1 -1-1 -1 -1 -1 +1 Patching process Is over, but refcnt Is not 0 Refcnt is 0 Patch enabled on all funcs Start patcing Refcnt = 1 Don’t count refcnt anymore Each function call Inc/dec refcnt While patching
  • 27.
    © Hitachi, Ltd.2014. All rights reserved. Kpatch Reference Counter (cont.) • Control the reference counter – Need to stop counting before and after patching – Use atomic_inc_not_zero/dec_if_positive • These are stopped automatically if counter == 0 27 func call 00 Do not hook function Entry/exit before patching refcnt
  • 28.
    © Hitachi, Ltd.2014. All rights reserved. Kpatch Reference Counter (cont.) • Control the reference counter – Need to stop counting before and after patching – Use atomic_inc_not_zero/dec_if_positive • These are stopped automatically if counter == 0 28 1 2 1 0 atomic_dec_if_positiveatomic_inc_not_zero func call atomic_inc forcibly inc refcnt Patching atomic_dec func call 00 refcnt Do not hook function Entry/exit before patching
  • 29.
    © Hitachi, Ltd.2014. All rights reserved. Kpatch Reference Counter (cont.) • Control the reference counter – Need to stop counting before and after patching – Use atomic_inc_not_zero/dec_if_positive • These are stopped automatically if counter == 0 29 1 2 1 0 0 0 atomic_dec_if_ positive ->do nothing atomic_inc_not_zero -> do nothing func call atomic_dec_if_positiveatomic_inc_not_zero func call atomic_inc Patching atomic_dec func call 0 atomic_dec_if_ positive ->do nothing 0 refcnt Do not hook function Entry/exit before patching
  • 30.
    © Hitachi, Ltd.2014. All rights reserved. Active Safeness check without stop_machine 2. Active safeness check at the context switch – To find threads sleeping(or going to sleep) on the functions – For all running processes, hook the context switch and check stack entries safely. – For the sleeping tasks, we can check it safely. 30
  • 31.
    © Hitachi, Ltd.2014. All rights reserved. Safeness Check without Stop_machine • 2 stages safeness checking 31 Patching Process stack check if task is not running rculock rcuunlock check over all threads List up Running threads Time Stage1: Check sleeping tasks Running process A Running process B wait on runqRun Run context switch wait on runq
  • 32.
    © Hitachi, Ltd.2014. All rights reserved. Safeness Check without Stop_machine • 2 stages safeness checking 32 Patching Process stack check if task is not running rculock rcuunlock check over all threads running pid running pid running pid List up Running threads hook stack check for current process Update Wait for all running task is checked Time … Stage1: Check sleeping tasks Stage2: Check running tasks Running process A Running process B wait on runqRun Run context switch wait on runq
  • 33.
    © Hitachi, Ltd.2014. All rights reserved. How to add refcnt and context switch hook? • To hook the function entry/return – Use kretprobe to hook it – For each function entry/return, inc/dec refcount • To hook the context switch – Use kprobe to hook it – Do safeness check (on stack) and update running pid list • Both are dynamic probe – After checking the safeness, all kretprobes/kprobes are r emoved from the target functions – We have minimal overhead 33
  • 34.
    © Hitachi, Ltd.2014. All rights reserved. Demo • Demonstrating kpatching with/without stop_machine – Using ftrace to trace stop_machine() 34 (Setup ftrace) # echo stop_machine > /sys/kernel/debug/tracing/set_ftrace_filter # echo function_graph > /sys/kernel/debug/tracing/current_tracer (Run the kpatch) # kpatch load kpatch-test-patch.ko (Check the result) # echo 0 > /sys/kernel/debug/tracing/tracing_on # cat /sys/kernel/debug/tracing/trace (Setup ftrace) # echo stop_machine > /sys/kernel/debug/tracing/set_ftrace_filter # echo function_graph > /sys/kernel/debug/tracing/current_tracer (Run the kpatch) # kpatch load kpatch-test-patch.ko (Check the result) # echo 0 > /sys/kernel/debug/tracing/tracing_on # cat /sys/kernel/debug/tracing/trace
  • 35.
    © Hitachi, Ltd.2014. All rights reserved. Demo result • With stop_machine • Without stop_machine 35 cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) ! 6410.455 us | stop_machine(); cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) ! 6410.455 us | stop_machine(); cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | No stop_machine is executed
  • 36.
    © Hitachi, Ltd.2014. All rights reserved. Agenda: Kpatch Without Stop Machine Background Story What is kpatch? Live patching requirements Major updates and Minor updates Kpatch internal Kpatch Overview Kpatch and Safeness check Stop_machine Kpatch without stop_machine Live Patcing Rules Kpatch Reference Counter Safeness check without stop_machine Conclusion and Discussion 36
  • 37.
    © Hitachi, Ltd.2014. All rights reserved. Conclusion • Succeed to get rid of stop_machine() from kpatch – This is a proof of concept of stop_machine-free kpatch • This means kpatch CAN BE ready for mission critical systems • But still under discussion stage • Upstreaming could be a long way – At first, push current stop_machine-based kpatch to upstream – Stop_machine-free will be the next step 37
  • 38.
    © Hitachi, Ltd.2014. All rights reserved. Current Issues • Possible to miscount the function reference – Kretprobe has no error notification • Kretprobe can be failed to handle the function return because of return- address buffer shortage • Possible to fail patching with big patch – We have to monitor all the functions are safe in the patch – Big patch has many patched functions • Some of them can be always used in the system • Incremental patching could be better • Module unloading using stop_machine – This will happen if we replace old patch with new one – Incremental patching can avoid this. 38
  • 39.
    © Hitachi, Ltd.2014. All rights reserved. Future work • Use a generic return-hook mechanism – Kretprobe is for PoC, not for general use • It can’t detect the failure of hooking – Should be more safe (e.g. miss-hook handler) • Context switch hook can be more general – Tracepoint/traceevent makes it better. – This requires kpatch as embedded feature • Upstreaming 39
  • 40.
    © Hitachi, Ltd.2014. All rights reserved. Resources and Links • Get rid of the stop_machine from kpatch • https://github.com/dynup/kpatch/issues/138 • My no-stopmachine branch • https://github.com/mhiramathitachi/kpatch/tree/no-stop machine-v1 – This requires IPMODIFY flag patchset for kernel • http://thread.gmane.org/gmane.linux.kernel/1757201 40
  • 41.
    © Hitachi, Ltd.2014. All rights reserved. Questions?
  • 43.
    © Hitachi, Ltd.2014. All rights reserved. Trademarks 43 • Linux is a trademark of Linus Torvalds in the United States, other countries, or both. • Other company, product, or service names m ay be trademarks or service marks of others.