SlideShare a Scribd company logo
1 of 71
Download to read offline
uniprof: Transparent Unikernel
Performance Profiling & Debugging
Florian Schmidt, Research Scientist, NEC Europe Ltd.
2
Unikernels?
▌Faster, smaller, better!
3
Unikernels?
▌Faster, smaller, better!
▌But ever heard this? Unikernels are hard to debug.
Kernel debugging is horrible!
cliparts:clipproject.info
4
Unikernels?
▌Faster, smaller, better!
▌But ever heard this?
▌Then you might say
Unikernels are hard to debug.
Kernel debugging is horrible!
But that’s not really true!
Unikernels are a single linked binary.
They have a shared address space.
You can just use gdb!
cliparts:clipproject.info
5
Unikernels?
▌Faster, smaller, better!
▌But ever heard this?
▌Then you might say
▌And while that is true…
▌… we are admittedly lacking tools
Unikernels are hard to debug.
Kernel debugging is horrible!
But that’s not really true!
Unikernels are a single linked binary.
They have a shared address space.
You can just use gdb!
cliparts:clipproject.info
6
Unikernels?
▌Faster, smaller, better!
▌But ever heard this?
▌Then you might say
▌And while that is true…
▌… we are admittedly lacking tools
▌Such as effective profilers
Unikernels are hard to debug.
Kernel debugging is horrible!
But that’s not really true!
Unikernels are a single linked binary.
They have a shared address space.
You can just use gdb!
cliparts:clipproject.info
7
Enter uniprof
▌Goals:
Performance profiler
No changes to profiled code necessary
Minimal overhead
8
Enter uniprof
▌Goals:
Performance profiler
No changes to profiled code necessary
Minimal overhead
 Useful in production environments
9
Enter uniprof
▌Goals:
Performance profiler
No changes to profiled code necessary
Minimal overhead
 Useful in production environments
▌So, a stack profiler
Collect stack traces at regular intervals
call_main+0x278
main+0x1c
schedule+0x3a
monotonic_clock+0x1a
10
Enter uniprof
▌Goals:
Performance profiler
No changes to profiled code necessary
Minimal overhead
 Useful in production environments
▌So, a stack profiler
Collect stack traces at regular intervals
Many of them
call_main+0x278
main+0x1c
schedule+0x3a
monotonic_clock+0x1a
call_main+0x278
main+0x1c
netfront_rx+0xa
call_main+0x278
main+0x1c
netfront_rx+0xa
netfront_get_responses+0x1c
netfrontif_rx_handler+0x20
netfrontif_transmit+0x1a0
netfront_xmit_pbuf+0xa4
call_main+0x278
main+0x1c
blkfront_aio_poll+0x32
11
Enter uniprof
▌Goals:
Performance profiler
No changes to profiled code necessary
Minimal overhead
 Useful in production environments
▌So, a stack profiler
Collect stack traces at regular intervals
Many of them
Analyze which code paths show up often
• Either because they take a long time
• Or because they are hit often
Point towards potential bottlenecks
call_main+0x278
main+0x1c
schedule+0x3a
monotonic_clock+0x1a
call_main+0x278
main+0x1c
netfront_rx+0xa
call_main+0x278
main+0x1c
netfront_rx+0xa
netfront_get_responses+0x1c
netfrontif_rx_handler+0x20
netfrontif_transmit+0x1a0
netfront_xmit_pbuf+0xa4
call_main+0x278
main+0x1c
blkfront_aio_poll+0x32
12
xenctx
▌Turns out, a stack profiler for Xen already exists
Well, kinda
13
xenctx
▌Turns out, a stack profiler for Xen already exists
Well, kinda
▌xenctx is bundled with Xen
Introspection tool
Option to print call stack
$ xenctx -f -s <symbol table file> <DOMID>
[...]
Call Trace:
[<0000000000004868>] three+0x58 <--
00000000000ffea0: [<00000000000044f2>] two+0x52
00000000000ffef0: [<00000000000046a6>] one+0x12
00000000000fff40: [<000000000002ff66>]
00000000000fff80: [<0000000000012018>] call_main+0x278
14
xenctx
▌Turns out, a stack profiler for Xen already exists
Well, kinda
▌xenctx is bundled with Xen
Introspection tool
Option to print call stack
▌So if we run this over and over, we have a stack profiler
Well, kinda
$ xenctx -f -s <symbol table file> <DOMID>
[...]
Call Trace:
[<0000000000004868>] three+0x58 <--
00000000000ffea0: [<00000000000044f2>] two+0x52
00000000000ffef0: [<00000000000046a6>] one+0x12
00000000000fff40: [<000000000002ff66>]
00000000000fff80: [<0000000000012018>] call_main+0x278
15
xenctx
▌Downside: xenctx is slow
Very slow: 3ms+ per trace
Doesn’t sound like much, but really adds up (e.g., 100 samples/s = 300ms/s)
Can’t really blame it, not designed as a fast stack profiler
16
xenctx
▌Downside: xenctx is slow
Very slow: 3ms+ per trace
Doesn’t sound like much, but really adds up (e.g., 100 samples/s = 300ms/s)
Can’t really blame it, not designed as a fast stack profiler
▌Performance isn’t just a nice-to-have
We interrupt the guest all the time
Can’t walk stack while guest is running: race conditions
High overhead can influence results!
Low overhead is imperative for use on production unikernels
17
xenctx
▌Downside: xenctx is slow
Very slow: 3ms+ per trace
Doesn’t sound like much, but really adds up (e.g., 100 samples/s = 300ms/s)
Can’t really blame it, not designed as a fast stack profiler
▌Performance isn’t just a nice-to-have
We interrupt the guest all the time
Can’t walk stack while guest is running: race conditions
High overhead can influence results!
Low overhead is imperative for use on production unikernels
▌First question: extend xenctx or write something from scratch?
Spoiler: look at the talk title
More insight when I come to the evaluation
18
What do we need?
19
What do we need?
▌Registers (for FP, IP)
This is pretty easy: getvcpucontext() hypercall
20
What do we need?
▌Registers (for FP, IP)
This is pretty easy: getvcpucontext() hypercall
▌Access to stack memory (to read return addresses and next FPs)
This is the complicated step
We need to do address resolution
21
What do we need?
▌Registers (for FP, IP)
This is pretty easy: getvcpucontext() hypercall
▌Access to stack memory (to read return addresses and next FPs)
This is the complicated step
We need to do address resolution
• Memory introspection requires mapping memory over
• We’re looking at (uni)kernel code
• But there’s still a virtual  (guest) physical resolution
22
What do we need?
▌Registers (for FP, IP)
This is pretty easy: getvcpucontext() hypercall
▌Access to stack memory (to read return addresses and next FPs)
This is the complicated step
We need to do address resolution
• Memory introspection requires mapping memory over
• We’re looking at (uni)kernel code
• But there’s still a virtual  (guest) physical resolution
• We can’t use hardware support (PVH), because we’re looking in from outside
• So we need to manually walk page tables
23
What do we need?
▌Registers (for FP, IP)
This is pretty easy: getvcpucontext() hypercall
▌Access to stack memory (to read return addresses and next FPs)
This is the complicated step
We need to do address resolution
• Memory introspection requires mapping memory over
• We’re looking at (uni)kernel code
• But there’s still a virtual  (guest) physical resolution
• We can’t use hardware support (PVH), because we’re looking in from outside
• So we need to manually walk page tables
▌Symbol table (to resolve function names)
Thankfully, this is easy again: extract symbols from ELF with nm
24
Stack
Local variables
RegistersIP
FP
…
… …
Frame pointer
NULL
Return address
Other registers
Local variables
Frame pointer
Return address
Other registers
Local variables
Stack trace:
25
Stack
Local variables
RegistersIP
FP
function three() {
[…]
}
…
… …
Frame pointer
NULL
Return address
Other registers
Local variables
Frame pointer
Return address
Other registers
Local variables
Stack trace:
26
Stack
Local variables
RegistersIP
FP
function three() {
[…]
}
…
… …
Frame pointer
NULL
Return address
Other registers
Local variables
Frame pointer
Return address
Other registers
Local variables
Stack trace:
three +0xcaIP
27
Stack
Local variables
RegistersIP
FP
function three() {
[…]
}
…
… …
Frame pointer
NULL
Return address
Other registers
Local variables
Frame pointer
Return address
Other registers
Local variables
Stack trace:
three +0xcaIP
28
Stack
Local variables
RegistersIP
FP
function two() {
[…]
three();
}
function three() {
[…]
}
…
… …
Frame pointer
NULL
Return address
Other registers
Local variables
Frame pointer
Return address
Other registers
Local variables
Stack trace:
three +0xcaIP
29
Stack
Local variables
RegistersIP
FP
function two() {
[…]
three();
}
function three() {
[…]
}
…
… …
Frame pointer
NULL
Return address
Other registers
Local variables
Frame pointer
Return address
Other registers
Local variables
Stack trace:
three +0xca
two +0xc1
IP
FP+1word
30
Stack
Local variables
RegistersIP
FP
function one() {
[…]
two();
[…]
}
function two() {
[…]
three();
}
function three() {
[…]
}
…
… …
Frame pointer
NULL
Return address
Other registers
Local variables
Frame pointer
Return address
Other registers
Local variables
Stack trace:
three +0xca
two +0xc1
IP
FP+1word
31
Stack
Local variables
RegistersIP
FP
function one() {
[…]
two();
[…]
}
function two() {
[…]
three();
}
function three() {
[…]
}
…
… …
Frame pointer
NULL
Return address
Other registers
Local variables
Frame pointer
Return address
Other registers
Local variables
Stack trace:
three +0xca
two +0xc1
one +0x0d
IP
FP+1word
*FP+1word
32
Stack
Local variables
RegistersIP
FP
function one() {
[…]
two();
[…]
}
function two() {
[…]
three();
}
function three() {
[…]
}
…
… …
Frame pointer
NULL
Return address
Other registers
Local variables
Frame pointer
Return address
Other registers
Local variables
Stack trace:
three +0xca
two +0xc1
one +0x0d
IP
FP+1word
*FP+1word
33
Stack
Local variables
RegistersIP
FP
function one() {
[…]
two();
[…]
}
function two() {
[…]
three();
}
function three() {
[…]
}
…
… …
Frame pointer
NULL
Return address
Other registers
Local variables
Frame pointer
Return address
Other registers
Local variables
Stack trace:
three +0xca
two +0xc1
one +0x0d
[done]
IP
FP+1word
*FP+1word
**FP==NULL
34
Walking the page tables (x86-64)
CR3
virtual address
35
Walking the page tables (x86-64)
CR3
virtual address
36
Walking the page tables (x86-64)
CR3
virtual address
37
Walking the page tables (x86-64)
CR3
L1
virtual address
38
Walking the page tables (x86-64)
CR3
L1
virtual address
39
Walking the page tables (x86-64)
CR3
L1 L2
virtual address
40
Walking the page tables (x86-64)
CR3
L1 L2 L3
virtual address
41
Walking the page tables (x86-64)
CR3
L1 L2 L3 L4
virtual address
42
Walking the page tables (x86-64)
CR3
L1 L2 L3 L4
(guest) physical address
virtual address
43
Walking the page tables (x86-64)
CR3
L1 L2 L3 L4
(guest) physical address
▌So many maps:
5 per entry * stack depth
map! map! map! map!
map!
virtual address
44
Walking the page tables (x86-64)
CR3
L1 L2 L3 L4
(guest) physical address
▌So many maps:
5 per entry * stack depth
▌Then again, page table locations don’t change…
Neither do stack locations (exception: lots of thread spawning)
Effective caching
map! map! map! map!
map!
virtual address
45
Walking the page tables (x86-64)
CR3
L1 L2 L3 L4
(guest) physical address
▌So many maps:
5 per entry * stack depth
▌Then again, page table locations don’t change…
Neither do stack locations (exception: lots of thread spawning)
Effective caching
virtual address
map! map! map! map!
map!
46
Create Symbol Table
▌Stack only contains addresses
▌Symbol resolution necessary
47
Create Symbol Table
▌Stack only contains addresses
▌Symbol resolution necessary
▌Trivial
Virtual addresses mapped 1:1 into unikernel address space
nm is your friend $ nm -n <ELF> > symtab
$ head symtab
0000000000000000 T _start
0000000000000000 T _text
0000000000000008 a RSP_OFFSET
0000000000000017 t stack_start
00000000000000fc a KERNEL_CS_MASK
0000000000001000 t shared_info
0000000000002000 t hypercall_page
0000000000003000 t error_entry
000000000000304f t error_call_handler
0000000000003069 t hypervisor_callback
48
Create Symbol Table
▌Stack only contains addresses
▌Symbol resolution necessary
▌Trivial
Virtual addresses mapped 1:1 into unikernel address space
nm is your friend
▌Needs unstripped binary
You’re welcome to strip it afterwards
$ nm -n <ELF> > symtab
$ head symtab
0000000000000000 T _start
0000000000000000 T _text
0000000000000008 a RSP_OFFSET
0000000000000017 t stack_start
00000000000000fc a KERNEL_CS_MASK
0000000000001000 t shared_info
0000000000002000 t hypercall_page
0000000000003000 t error_entry
000000000000304f t error_call_handler
0000000000003069 t hypervisor_callback
49
What do we get?
50
What do we get? Flamegraphs!
▌ Y Axis: call trace
 Bottom: main function, each layer: one call depth
▌ X Axis: relative run time
 Call paths are aggregated, no same call path twice in graph
https://github.com/brendangregg/flamegraph
51
What do we get? Flamegraphs!
▌ Y Axis: call trace
 Bottom: main function, each layer: one call depth
▌ X Axis: relative run time
 Call paths are aggregated, no same call path twice in graph
▌ In this example: netfront functions “heavy hitters”
 netfront_xmit_pbuf
 netfront_rx
 but also blkfront_aio_poll
1
3 2
1
3 2
2
1
3
https://github.com/brendangregg/flamegraph
52
What do we get? Flamegraphs!
▌ Y Axis: call trace
 Bottom: main function, each layer: one call depth
▌ X Axis: relative run time
 Call paths are aggregated, no same call path twice in graph
▌ In this example: netfront functions “heavy hitters”
 netfront_xmit_pbuf
 netfront_rx
 but also blkfront_aio_poll
1
3 2
1
3 2
2
1
3
Yep, it’s a MiniOS*
doing network
communication
*with lwip for TCP/IP
https://github.com/brendangregg/flamegraph
53
Try 1: improving xenctx performance
▌xenctx translates and maps memory addresses every stack walk
 Huge overhead
 Solution: cache mapped memory and virtualmachine translations
54
Try 1: improving xenctx performance
▌xenctx translates and maps memory addresses every stack walk
 Huge overhead
 Solution: cache mapped memory and virtualmachine translations
▌xenctx resolves symbols via linear search
 Solution: use binary search
55
Try 1: improving xenctx performance
▌xenctx translates and maps memory addresses every stack walk
 Huge overhead
 Solution: cache mapped memory and virtualmachine translations
▌xenctx resolves symbols via linear search
 Solution: use binary search
 (Or, even better, do resolutions offline after tracing)
56
Try 1: improving xenctx performance
▌xenctx translates and maps memory addresses every stack walk
 Huge overhead
 Solution: cache mapped memory and virtualmachine translations
▌xenctx resolves symbols via linear search
 Solution: use binary search
 (Or, even better, do resolutions offline after tracing)
57
Try 1: improving xenctx performance
▌xenctx translates and maps memory addresses every stack walk
 Huge overhead
 Solution: cache mapped memory and virtualmachine translations
▌xenctx resolves symbols via linear search
 Solution: use binary search
 (Or, even better, do resolutions offline after tracing)
At this point, I abandoned xenctx and (re)wrote uniprof from scratch.
58
Try 2: uniprof
▌100-fold improvement is nice! But we can do better:
Xen 4.7 introduced low-level libraries (libxencall, libxenforeigmemory)
Another significant reduction by ~ factor of 3
59
Try 2: uniprof
▌100-fold improvement is nice! But we can do better:
Xen 4.7 introduced low-level libraries (libxencall, libxenforeigmemory)
Another significant reduction by ~ factor of 3
▌End result: overhead of ~0.1% @101 samples/s
60
Performance on ARM
▌uniprof supports ARM (xenctx doesn’t)
Main challenge: different page table design
61
Performance on ARM
▌uniprof supports ARM (xenctx doesn’t)
Main challenge: different page table design
▌ARM: much slower, overhead higher
But the CPU is much slower, too (Intel Xeon @3.7GHz vs. Cortex A20 @1GHz)
So fewer samples/s needed for same effective resolution
62
No Frame Pointer? No Problem!
▌Stack walking relies on frame pointer
Optimizations can reuse FP as general-purpose register (-fomit-frame-pointer)
63
No Frame Pointer? No Problem!
▌Stack walking relies on frame pointer
Optimizations can reuse FP as general-purpose register (-fomit-frame-pointer)
▌But we can do without FPs
Use stack unwinding information
• It’s already included if you use C++ (for exception handling)
• It doesn’t change performance
• Only binary size
DWARF standard
$ readelf –S <ELF>
There are 13 section headers, starting at offset 0x40d58:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[...]
[ 4] .eh_frame PROGBITS 0000000000035860 00036860
00000000000066f8 0000000000000000 A 0 0 8
[ 5] .eh_frame_hdr PROGBITS 000000000003bf58 0003cf58
000000000000128c 0000000000000000 A 0 0 4
[...]
64
Unwinding without Frame Pointers
▌How does it work?
65
Unwinding without Frame Pointers
▌How does it work?
66
Unwinding without Frame Pointers
▌How does it work?
▌Lookup table
For every program address
• The current frame size
• Locations of registers to restore (GP and IP)
67
Unwinding without Frame Pointers
▌How does it work?
▌Lookup table
For every program address
• The current frame size
• Locations of registers to restore (GP and IP)
Important for exception handling
• Exit functions immediately until handler is found
68
Unwinding without Frame Pointers
▌How does it work?
▌Lookup table
For every program address
• The current frame size
• Locations of registers to restore (GP and IP)
Important for exception handling
• Exit functions immediately until handler is found
▌Index to quickly find table entry
69
Unwinding without Frame Pointers
▌How does it work?
▌Lookup table
For every program address
• The current frame size
• Locations of registers to restore (GP and IP)
Important for exception handling
• Exit functions immediately until handler is found
▌Index to quickly find table entry
▌Several library implementations
uniprof uses libunwind
Actually, a libunwind patched for Xen guest introspection support
Might be useful for other tools?
70
Performance: uniprof w/ libunwind
▌Performance lower than with frame pointer
Reason: libunwind does more than we need (full register reconstruction etc.)
▌Different library or own implementation promising
But “good enough” for many cases
And a good area for future work
71
Enter the title.
Thank you!
Questions?
uniprof: https://github.com/cnplab/uniprof
libunwind-xen: https://github.com/cnplab/libunwind
FlameGraphs: https://github.com/brendangregg/flamegraph

More Related Content

What's hot

05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR matters05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR mattersAlexandre Moneger
 
Dive into ROP - a quick introduction to Return Oriented Programming
Dive into ROP - a quick introduction to Return Oriented ProgrammingDive into ROP - a quick introduction to Return Oriented Programming
Dive into ROP - a quick introduction to Return Oriented ProgrammingSaumil Shah
 
07 - Bypassing ASLR, or why X^W matters
07 - Bypassing ASLR, or why X^W matters07 - Bypassing ASLR, or why X^W matters
07 - Bypassing ASLR, or why X^W mattersAlexandre Moneger
 
Course lecture - An introduction to the Return Oriented Programming
Course lecture - An introduction to the Return Oriented ProgrammingCourse lecture - An introduction to the Return Oriented Programming
Course lecture - An introduction to the Return Oriented ProgrammingJonathan Salwan
 
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...CODE BLUE
 
1300 david oswald id and ip theft with side-channel attacks
1300 david oswald   id and ip theft with side-channel attacks1300 david oswald   id and ip theft with side-channel attacks
1300 david oswald id and ip theft with side-channel attacksPositive Hack Days
 
ROP 輕鬆談
ROP 輕鬆談ROP 輕鬆談
ROP 輕鬆談hackstuff
 
Introduction to Debuggers
Introduction to DebuggersIntroduction to Debuggers
Introduction to DebuggersSaumil Shah
 
Design and implementation_of_shellcodes
Design and implementation_of_shellcodesDesign and implementation_of_shellcodes
Design and implementation_of_shellcodesAmr Ali
 
Linux Shellcode disassembling
Linux Shellcode disassemblingLinux Shellcode disassembling
Linux Shellcode disassemblingHarsh Daftary
 
Scale17x buffer overflows
Scale17x buffer overflowsScale17x buffer overflows
Scale17x buffer overflowsjohseg
 
深入淺出C語言
深入淺出C語言深入淺出C語言
深入淺出C語言Simen Li
 
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例National Cheng Kung University
 
Ilfak Guilfanov - Decompiler internals: Microcode [rooted2018]
Ilfak Guilfanov - Decompiler internals: Microcode [rooted2018]Ilfak Guilfanov - Decompiler internals: Microcode [rooted2018]
Ilfak Guilfanov - Decompiler internals: Microcode [rooted2018]RootedCON
 
Eloi Sanfelix - Hardware security: Side Channel Attacks [RootedCON 2011]
Eloi Sanfelix - Hardware security: Side Channel Attacks [RootedCON 2011]Eloi Sanfelix - Hardware security: Side Channel Attacks [RootedCON 2011]
Eloi Sanfelix - Hardware security: Side Channel Attacks [RootedCON 2011]RootedCON
 
Return oriented programming
Return oriented programmingReturn oriented programming
Return oriented programminghybr1s
 
Power of linked list
Power of linked listPower of linked list
Power of linked listPeter Hlavaty
 
04 - I love my OS, he protects me (sometimes, in specific circumstances)
04 - I love my OS, he protects me (sometimes, in specific circumstances)04 - I love my OS, he protects me (sometimes, in specific circumstances)
04 - I love my OS, he protects me (sometimes, in specific circumstances)Alexandre Moneger
 

What's hot (20)

05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR matters05 - Bypassing DEP, or why ASLR matters
05 - Bypassing DEP, or why ASLR matters
 
Dive into ROP - a quick introduction to Return Oriented Programming
Dive into ROP - a quick introduction to Return Oriented ProgrammingDive into ROP - a quick introduction to Return Oriented Programming
Dive into ROP - a quick introduction to Return Oriented Programming
 
07 - Bypassing ASLR, or why X^W matters
07 - Bypassing ASLR, or why X^W matters07 - Bypassing ASLR, or why X^W matters
07 - Bypassing ASLR, or why X^W matters
 
Course lecture - An introduction to the Return Oriented Programming
Course lecture - An introduction to the Return Oriented ProgrammingCourse lecture - An introduction to the Return Oriented Programming
Course lecture - An introduction to the Return Oriented Programming
 
Unix executable buffer overflow
Unix executable buffer overflowUnix executable buffer overflow
Unix executable buffer overflow
 
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
For the Greater Good: Leveraging VMware's RPC Interface for fun and profit by...
 
1300 david oswald id and ip theft with side-channel attacks
1300 david oswald   id and ip theft with side-channel attacks1300 david oswald   id and ip theft with side-channel attacks
1300 david oswald id and ip theft with side-channel attacks
 
ROP 輕鬆談
ROP 輕鬆談ROP 輕鬆談
ROP 輕鬆談
 
Introduction to Debuggers
Introduction to DebuggersIntroduction to Debuggers
Introduction to Debuggers
 
Design and implementation_of_shellcodes
Design and implementation_of_shellcodesDesign and implementation_of_shellcodes
Design and implementation_of_shellcodes
 
Linux Shellcode disassembling
Linux Shellcode disassemblingLinux Shellcode disassembling
Linux Shellcode disassembling
 
Return oriented programming (ROP)
Return oriented programming (ROP)Return oriented programming (ROP)
Return oriented programming (ROP)
 
Scale17x buffer overflows
Scale17x buffer overflowsScale17x buffer overflows
Scale17x buffer overflows
 
深入淺出C語言
深入淺出C語言深入淺出C語言
深入淺出C語言
 
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
LLVM 總是打開你的心:從電玩模擬器看編譯器應用實例
 
Ilfak Guilfanov - Decompiler internals: Microcode [rooted2018]
Ilfak Guilfanov - Decompiler internals: Microcode [rooted2018]Ilfak Guilfanov - Decompiler internals: Microcode [rooted2018]
Ilfak Guilfanov - Decompiler internals: Microcode [rooted2018]
 
Eloi Sanfelix - Hardware security: Side Channel Attacks [RootedCON 2011]
Eloi Sanfelix - Hardware security: Side Channel Attacks [RootedCON 2011]Eloi Sanfelix - Hardware security: Side Channel Attacks [RootedCON 2011]
Eloi Sanfelix - Hardware security: Side Channel Attacks [RootedCON 2011]
 
Return oriented programming
Return oriented programmingReturn oriented programming
Return oriented programming
 
Power of linked list
Power of linked listPower of linked list
Power of linked list
 
04 - I love my OS, he protects me (sometimes, in specific circumstances)
04 - I love my OS, he protects me (sometimes, in specific circumstances)04 - I love my OS, he protects me (sometimes, in specific circumstances)
04 - I love my OS, he protects me (sometimes, in specific circumstances)
 

Similar to XPDDS17: uniprof: Transparent Unikernel Performance Profiling and Debugging - Florian Schmidt, NEC

What’s Slowing Down Your Kafka Pipeline? With Ruizhe Cheng and Pete Stevenson...
What’s Slowing Down Your Kafka Pipeline? With Ruizhe Cheng and Pete Stevenson...What’s Slowing Down Your Kafka Pipeline? With Ruizhe Cheng and Pete Stevenson...
What’s Slowing Down Your Kafka Pipeline? With Ruizhe Cheng and Pete Stevenson...HostedbyConfluent
 
Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Kernel TLV
 
Linux Perf Tools
Linux Perf ToolsLinux Perf Tools
Linux Perf ToolsRaj Pandey
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
Nikita Abdullin - Reverse-engineering of embedded MIPS devices. Case Study - ...
Nikita Abdullin - Reverse-engineering of embedded MIPS devices. Case Study - ...Nikita Abdullin - Reverse-engineering of embedded MIPS devices. Case Study - ...
Nikita Abdullin - Reverse-engineering of embedded MIPS devices. Case Study - ...DefconRussia
 
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe ShockwaveHES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe ShockwaveHackito Ergo Sum
 
Finding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated DisassemblyFinding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated DisassemblyPriyanka Aash
 
Finding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated DisassemblyFinding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated DisassemblyPriyanka Aash
 
An introduction to ROP
An introduction to ROPAn introduction to ROP
An introduction to ROPSaumil Shah
 
iptables 101- bottom-up
iptables 101- bottom-upiptables 101- bottom-up
iptables 101- bottom-upHungWei Chiu
 
AllBits presentation - Lower Level SW Security
AllBits presentation - Lower Level SW SecurityAllBits presentation - Lower Level SW Security
AllBits presentation - Lower Level SW SecurityAllBits BVBA (freelancer)
 
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...Anne Nicolas
 
XS Boston 2008 Paravirt Ops in Linux IA64
XS Boston 2008 Paravirt Ops in Linux IA64XS Boston 2008 Paravirt Ops in Linux IA64
XS Boston 2008 Paravirt Ops in Linux IA64The Linux Foundation
 
The trials and tribulations of providing engineering infrastructure
 The trials and tribulations of providing engineering infrastructure  The trials and tribulations of providing engineering infrastructure
The trials and tribulations of providing engineering infrastructure TechExeter
 
Efficient Bytecode Analysis: Linespeed Shellcode Detection
Efficient Bytecode Analysis: Linespeed Shellcode DetectionEfficient Bytecode Analysis: Linespeed Shellcode Detection
Efficient Bytecode Analysis: Linespeed Shellcode DetectionGeorg Wicherski
 
Potapenko, vyukov forewarned is forearmed. a san and tsan
Potapenko, vyukov   forewarned is forearmed. a san and tsanPotapenko, vyukov   forewarned is forearmed. a san and tsan
Potapenko, vyukov forewarned is forearmed. a san and tsanDefconRussia
 
Gdc gameplay replication in acu with videos
Gdc   gameplay replication in acu with videosGdc   gameplay replication in acu with videos
Gdc gameplay replication in acu with videosCharles Lefebvre
 

Similar to XPDDS17: uniprof: Transparent Unikernel Performance Profiling and Debugging - Florian Schmidt, NEC (20)

What’s Slowing Down Your Kafka Pipeline? With Ruizhe Cheng and Pete Stevenson...
What’s Slowing Down Your Kafka Pipeline? With Ruizhe Cheng and Pete Stevenson...What’s Slowing Down Your Kafka Pipeline? With Ruizhe Cheng and Pete Stevenson...
What’s Slowing Down Your Kafka Pipeline? With Ruizhe Cheng and Pete Stevenson...
 
Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545
 
Linux Perf Tools
Linux Perf ToolsLinux Perf Tools
Linux Perf Tools
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Nikita Abdullin - Reverse-engineering of embedded MIPS devices. Case Study - ...
Nikita Abdullin - Reverse-engineering of embedded MIPS devices. Case Study - ...Nikita Abdullin - Reverse-engineering of embedded MIPS devices. Case Study - ...
Nikita Abdullin - Reverse-engineering of embedded MIPS devices. Case Study - ...
 
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe ShockwaveHES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
HES2011 - Aaron Portnoy and Logan Brown - Black Box Auditing Adobe Shockwave
 
Finding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated DisassemblyFinding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated Disassembly
 
Finding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated DisassemblyFinding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated Disassembly
 
An introduction to ROP
An introduction to ROPAn introduction to ROP
An introduction to ROP
 
iptables 101- bottom-up
iptables 101- bottom-upiptables 101- bottom-up
iptables 101- bottom-up
 
opt-mem-trx
opt-mem-trxopt-mem-trx
opt-mem-trx
 
Exploitation Crash Course
Exploitation Crash CourseExploitation Crash Course
Exploitation Crash Course
 
AllBits presentation - Lower Level SW Security
AllBits presentation - Lower Level SW SecurityAllBits presentation - Lower Level SW Security
AllBits presentation - Lower Level SW Security
 
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...
Kernel Recipes 2018 - 10 years of automated evolution in the Linux kernel - J...
 
XS Boston 2008 Paravirt Ops in Linux IA64
XS Boston 2008 Paravirt Ops in Linux IA64XS Boston 2008 Paravirt Ops in Linux IA64
XS Boston 2008 Paravirt Ops in Linux IA64
 
The trials and tribulations of providing engineering infrastructure
 The trials and tribulations of providing engineering infrastructure  The trials and tribulations of providing engineering infrastructure
The trials and tribulations of providing engineering infrastructure
 
Efficient Bytecode Analysis: Linespeed Shellcode Detection
Efficient Bytecode Analysis: Linespeed Shellcode DetectionEfficient Bytecode Analysis: Linespeed Shellcode Detection
Efficient Bytecode Analysis: Linespeed Shellcode Detection
 
Potapenko, vyukov forewarned is forearmed. a san and tsan
Potapenko, vyukov   forewarned is forearmed. a san and tsanPotapenko, vyukov   forewarned is forearmed. a san and tsan
Potapenko, vyukov forewarned is forearmed. a san and tsan
 
Eusecwest
EusecwestEusecwest
Eusecwest
 
Gdc gameplay replication in acu with videos
Gdc   gameplay replication in acu with videosGdc   gameplay replication in acu with videos
Gdc gameplay replication in acu with videos
 

More from The Linux Foundation

ELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made SimpleELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made SimpleThe Linux Foundation
 
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...The Linux Foundation
 
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...The Linux Foundation
 
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...The Linux Foundation
 
XPDDS19 Keynote: Unikraft Weather Report
XPDDS19 Keynote:  Unikraft Weather ReportXPDDS19 Keynote:  Unikraft Weather Report
XPDDS19 Keynote: Unikraft Weather ReportThe Linux Foundation
 
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...The Linux Foundation
 
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, XilinxXPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, XilinxThe Linux Foundation
 
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...The Linux Foundation
 
XPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, BitdefenderXPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, BitdefenderThe Linux Foundation
 
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...The Linux Foundation
 
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
 OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making... OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...The Linux Foundation
 
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, CitrixXPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, CitrixThe Linux Foundation
 
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltdXPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltdThe Linux Foundation
 
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...The Linux Foundation
 
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&DXPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&DThe Linux Foundation
 
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM SystemsXPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM SystemsThe Linux Foundation
 
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...The Linux Foundation
 
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...The Linux Foundation
 
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...The Linux Foundation
 
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSEXPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSEThe Linux Foundation
 

More from The Linux Foundation (20)

ELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made SimpleELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made Simple
 
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
 
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
 
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
 
XPDDS19 Keynote: Unikraft Weather Report
XPDDS19 Keynote:  Unikraft Weather ReportXPDDS19 Keynote:  Unikraft Weather Report
XPDDS19 Keynote: Unikraft Weather Report
 
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
 
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, XilinxXPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
 
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
 
XPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, BitdefenderXPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
 
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
 
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
 OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making... OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
 
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, CitrixXPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
 
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltdXPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
 
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
 
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&DXPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
 
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM SystemsXPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
 
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
 
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
 
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
 
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSEXPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
 

Recently uploaded

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Recently uploaded (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

XPDDS17: uniprof: Transparent Unikernel Performance Profiling and Debugging - Florian Schmidt, NEC

  • 1. uniprof: Transparent Unikernel Performance Profiling & Debugging Florian Schmidt, Research Scientist, NEC Europe Ltd.
  • 3. 3 Unikernels? ▌Faster, smaller, better! ▌But ever heard this? Unikernels are hard to debug. Kernel debugging is horrible! cliparts:clipproject.info
  • 4. 4 Unikernels? ▌Faster, smaller, better! ▌But ever heard this? ▌Then you might say Unikernels are hard to debug. Kernel debugging is horrible! But that’s not really true! Unikernels are a single linked binary. They have a shared address space. You can just use gdb! cliparts:clipproject.info
  • 5. 5 Unikernels? ▌Faster, smaller, better! ▌But ever heard this? ▌Then you might say ▌And while that is true… ▌… we are admittedly lacking tools Unikernels are hard to debug. Kernel debugging is horrible! But that’s not really true! Unikernels are a single linked binary. They have a shared address space. You can just use gdb! cliparts:clipproject.info
  • 6. 6 Unikernels? ▌Faster, smaller, better! ▌But ever heard this? ▌Then you might say ▌And while that is true… ▌… we are admittedly lacking tools ▌Such as effective profilers Unikernels are hard to debug. Kernel debugging is horrible! But that’s not really true! Unikernels are a single linked binary. They have a shared address space. You can just use gdb! cliparts:clipproject.info
  • 7. 7 Enter uniprof ▌Goals: Performance profiler No changes to profiled code necessary Minimal overhead
  • 8. 8 Enter uniprof ▌Goals: Performance profiler No changes to profiled code necessary Minimal overhead  Useful in production environments
  • 9. 9 Enter uniprof ▌Goals: Performance profiler No changes to profiled code necessary Minimal overhead  Useful in production environments ▌So, a stack profiler Collect stack traces at regular intervals call_main+0x278 main+0x1c schedule+0x3a monotonic_clock+0x1a
  • 10. 10 Enter uniprof ▌Goals: Performance profiler No changes to profiled code necessary Minimal overhead  Useful in production environments ▌So, a stack profiler Collect stack traces at regular intervals Many of them call_main+0x278 main+0x1c schedule+0x3a monotonic_clock+0x1a call_main+0x278 main+0x1c netfront_rx+0xa call_main+0x278 main+0x1c netfront_rx+0xa netfront_get_responses+0x1c netfrontif_rx_handler+0x20 netfrontif_transmit+0x1a0 netfront_xmit_pbuf+0xa4 call_main+0x278 main+0x1c blkfront_aio_poll+0x32
  • 11. 11 Enter uniprof ▌Goals: Performance profiler No changes to profiled code necessary Minimal overhead  Useful in production environments ▌So, a stack profiler Collect stack traces at regular intervals Many of them Analyze which code paths show up often • Either because they take a long time • Or because they are hit often Point towards potential bottlenecks call_main+0x278 main+0x1c schedule+0x3a monotonic_clock+0x1a call_main+0x278 main+0x1c netfront_rx+0xa call_main+0x278 main+0x1c netfront_rx+0xa netfront_get_responses+0x1c netfrontif_rx_handler+0x20 netfrontif_transmit+0x1a0 netfront_xmit_pbuf+0xa4 call_main+0x278 main+0x1c blkfront_aio_poll+0x32
  • 12. 12 xenctx ▌Turns out, a stack profiler for Xen already exists Well, kinda
  • 13. 13 xenctx ▌Turns out, a stack profiler for Xen already exists Well, kinda ▌xenctx is bundled with Xen Introspection tool Option to print call stack $ xenctx -f -s <symbol table file> <DOMID> [...] Call Trace: [<0000000000004868>] three+0x58 <-- 00000000000ffea0: [<00000000000044f2>] two+0x52 00000000000ffef0: [<00000000000046a6>] one+0x12 00000000000fff40: [<000000000002ff66>] 00000000000fff80: [<0000000000012018>] call_main+0x278
  • 14. 14 xenctx ▌Turns out, a stack profiler for Xen already exists Well, kinda ▌xenctx is bundled with Xen Introspection tool Option to print call stack ▌So if we run this over and over, we have a stack profiler Well, kinda $ xenctx -f -s <symbol table file> <DOMID> [...] Call Trace: [<0000000000004868>] three+0x58 <-- 00000000000ffea0: [<00000000000044f2>] two+0x52 00000000000ffef0: [<00000000000046a6>] one+0x12 00000000000fff40: [<000000000002ff66>] 00000000000fff80: [<0000000000012018>] call_main+0x278
  • 15. 15 xenctx ▌Downside: xenctx is slow Very slow: 3ms+ per trace Doesn’t sound like much, but really adds up (e.g., 100 samples/s = 300ms/s) Can’t really blame it, not designed as a fast stack profiler
  • 16. 16 xenctx ▌Downside: xenctx is slow Very slow: 3ms+ per trace Doesn’t sound like much, but really adds up (e.g., 100 samples/s = 300ms/s) Can’t really blame it, not designed as a fast stack profiler ▌Performance isn’t just a nice-to-have We interrupt the guest all the time Can’t walk stack while guest is running: race conditions High overhead can influence results! Low overhead is imperative for use on production unikernels
  • 17. 17 xenctx ▌Downside: xenctx is slow Very slow: 3ms+ per trace Doesn’t sound like much, but really adds up (e.g., 100 samples/s = 300ms/s) Can’t really blame it, not designed as a fast stack profiler ▌Performance isn’t just a nice-to-have We interrupt the guest all the time Can’t walk stack while guest is running: race conditions High overhead can influence results! Low overhead is imperative for use on production unikernels ▌First question: extend xenctx or write something from scratch? Spoiler: look at the talk title More insight when I come to the evaluation
  • 18. 18 What do we need?
  • 19. 19 What do we need? ▌Registers (for FP, IP) This is pretty easy: getvcpucontext() hypercall
  • 20. 20 What do we need? ▌Registers (for FP, IP) This is pretty easy: getvcpucontext() hypercall ▌Access to stack memory (to read return addresses and next FPs) This is the complicated step We need to do address resolution
  • 21. 21 What do we need? ▌Registers (for FP, IP) This is pretty easy: getvcpucontext() hypercall ▌Access to stack memory (to read return addresses and next FPs) This is the complicated step We need to do address resolution • Memory introspection requires mapping memory over • We’re looking at (uni)kernel code • But there’s still a virtual  (guest) physical resolution
  • 22. 22 What do we need? ▌Registers (for FP, IP) This is pretty easy: getvcpucontext() hypercall ▌Access to stack memory (to read return addresses and next FPs) This is the complicated step We need to do address resolution • Memory introspection requires mapping memory over • We’re looking at (uni)kernel code • But there’s still a virtual  (guest) physical resolution • We can’t use hardware support (PVH), because we’re looking in from outside • So we need to manually walk page tables
  • 23. 23 What do we need? ▌Registers (for FP, IP) This is pretty easy: getvcpucontext() hypercall ▌Access to stack memory (to read return addresses and next FPs) This is the complicated step We need to do address resolution • Memory introspection requires mapping memory over • We’re looking at (uni)kernel code • But there’s still a virtual  (guest) physical resolution • We can’t use hardware support (PVH), because we’re looking in from outside • So we need to manually walk page tables ▌Symbol table (to resolve function names) Thankfully, this is easy again: extract symbols from ELF with nm
  • 24. 24 Stack Local variables RegistersIP FP … … … Frame pointer NULL Return address Other registers Local variables Frame pointer Return address Other registers Local variables Stack trace:
  • 25. 25 Stack Local variables RegistersIP FP function three() { […] } … … … Frame pointer NULL Return address Other registers Local variables Frame pointer Return address Other registers Local variables Stack trace:
  • 26. 26 Stack Local variables RegistersIP FP function three() { […] } … … … Frame pointer NULL Return address Other registers Local variables Frame pointer Return address Other registers Local variables Stack trace: three +0xcaIP
  • 27. 27 Stack Local variables RegistersIP FP function three() { […] } … … … Frame pointer NULL Return address Other registers Local variables Frame pointer Return address Other registers Local variables Stack trace: three +0xcaIP
  • 28. 28 Stack Local variables RegistersIP FP function two() { […] three(); } function three() { […] } … … … Frame pointer NULL Return address Other registers Local variables Frame pointer Return address Other registers Local variables Stack trace: three +0xcaIP
  • 29. 29 Stack Local variables RegistersIP FP function two() { […] three(); } function three() { […] } … … … Frame pointer NULL Return address Other registers Local variables Frame pointer Return address Other registers Local variables Stack trace: three +0xca two +0xc1 IP FP+1word
  • 30. 30 Stack Local variables RegistersIP FP function one() { […] two(); […] } function two() { […] three(); } function three() { […] } … … … Frame pointer NULL Return address Other registers Local variables Frame pointer Return address Other registers Local variables Stack trace: three +0xca two +0xc1 IP FP+1word
  • 31. 31 Stack Local variables RegistersIP FP function one() { […] two(); […] } function two() { […] three(); } function three() { […] } … … … Frame pointer NULL Return address Other registers Local variables Frame pointer Return address Other registers Local variables Stack trace: three +0xca two +0xc1 one +0x0d IP FP+1word *FP+1word
  • 32. 32 Stack Local variables RegistersIP FP function one() { […] two(); […] } function two() { […] three(); } function three() { […] } … … … Frame pointer NULL Return address Other registers Local variables Frame pointer Return address Other registers Local variables Stack trace: three +0xca two +0xc1 one +0x0d IP FP+1word *FP+1word
  • 33. 33 Stack Local variables RegistersIP FP function one() { […] two(); […] } function two() { […] three(); } function three() { […] } … … … Frame pointer NULL Return address Other registers Local variables Frame pointer Return address Other registers Local variables Stack trace: three +0xca two +0xc1 one +0x0d [done] IP FP+1word *FP+1word **FP==NULL
  • 34. 34 Walking the page tables (x86-64) CR3 virtual address
  • 35. 35 Walking the page tables (x86-64) CR3 virtual address
  • 36. 36 Walking the page tables (x86-64) CR3 virtual address
  • 37. 37 Walking the page tables (x86-64) CR3 L1 virtual address
  • 38. 38 Walking the page tables (x86-64) CR3 L1 virtual address
  • 39. 39 Walking the page tables (x86-64) CR3 L1 L2 virtual address
  • 40. 40 Walking the page tables (x86-64) CR3 L1 L2 L3 virtual address
  • 41. 41 Walking the page tables (x86-64) CR3 L1 L2 L3 L4 virtual address
  • 42. 42 Walking the page tables (x86-64) CR3 L1 L2 L3 L4 (guest) physical address virtual address
  • 43. 43 Walking the page tables (x86-64) CR3 L1 L2 L3 L4 (guest) physical address ▌So many maps: 5 per entry * stack depth map! map! map! map! map! virtual address
  • 44. 44 Walking the page tables (x86-64) CR3 L1 L2 L3 L4 (guest) physical address ▌So many maps: 5 per entry * stack depth ▌Then again, page table locations don’t change… Neither do stack locations (exception: lots of thread spawning) Effective caching map! map! map! map! map! virtual address
  • 45. 45 Walking the page tables (x86-64) CR3 L1 L2 L3 L4 (guest) physical address ▌So many maps: 5 per entry * stack depth ▌Then again, page table locations don’t change… Neither do stack locations (exception: lots of thread spawning) Effective caching virtual address map! map! map! map! map!
  • 46. 46 Create Symbol Table ▌Stack only contains addresses ▌Symbol resolution necessary
  • 47. 47 Create Symbol Table ▌Stack only contains addresses ▌Symbol resolution necessary ▌Trivial Virtual addresses mapped 1:1 into unikernel address space nm is your friend $ nm -n <ELF> > symtab $ head symtab 0000000000000000 T _start 0000000000000000 T _text 0000000000000008 a RSP_OFFSET 0000000000000017 t stack_start 00000000000000fc a KERNEL_CS_MASK 0000000000001000 t shared_info 0000000000002000 t hypercall_page 0000000000003000 t error_entry 000000000000304f t error_call_handler 0000000000003069 t hypervisor_callback
  • 48. 48 Create Symbol Table ▌Stack only contains addresses ▌Symbol resolution necessary ▌Trivial Virtual addresses mapped 1:1 into unikernel address space nm is your friend ▌Needs unstripped binary You’re welcome to strip it afterwards $ nm -n <ELF> > symtab $ head symtab 0000000000000000 T _start 0000000000000000 T _text 0000000000000008 a RSP_OFFSET 0000000000000017 t stack_start 00000000000000fc a KERNEL_CS_MASK 0000000000001000 t shared_info 0000000000002000 t hypercall_page 0000000000003000 t error_entry 000000000000304f t error_call_handler 0000000000003069 t hypervisor_callback
  • 50. 50 What do we get? Flamegraphs! ▌ Y Axis: call trace  Bottom: main function, each layer: one call depth ▌ X Axis: relative run time  Call paths are aggregated, no same call path twice in graph https://github.com/brendangregg/flamegraph
  • 51. 51 What do we get? Flamegraphs! ▌ Y Axis: call trace  Bottom: main function, each layer: one call depth ▌ X Axis: relative run time  Call paths are aggregated, no same call path twice in graph ▌ In this example: netfront functions “heavy hitters”  netfront_xmit_pbuf  netfront_rx  but also blkfront_aio_poll 1 3 2 1 3 2 2 1 3 https://github.com/brendangregg/flamegraph
  • 52. 52 What do we get? Flamegraphs! ▌ Y Axis: call trace  Bottom: main function, each layer: one call depth ▌ X Axis: relative run time  Call paths are aggregated, no same call path twice in graph ▌ In this example: netfront functions “heavy hitters”  netfront_xmit_pbuf  netfront_rx  but also blkfront_aio_poll 1 3 2 1 3 2 2 1 3 Yep, it’s a MiniOS* doing network communication *with lwip for TCP/IP https://github.com/brendangregg/flamegraph
  • 53. 53 Try 1: improving xenctx performance ▌xenctx translates and maps memory addresses every stack walk  Huge overhead  Solution: cache mapped memory and virtualmachine translations
  • 54. 54 Try 1: improving xenctx performance ▌xenctx translates and maps memory addresses every stack walk  Huge overhead  Solution: cache mapped memory and virtualmachine translations ▌xenctx resolves symbols via linear search  Solution: use binary search
  • 55. 55 Try 1: improving xenctx performance ▌xenctx translates and maps memory addresses every stack walk  Huge overhead  Solution: cache mapped memory and virtualmachine translations ▌xenctx resolves symbols via linear search  Solution: use binary search  (Or, even better, do resolutions offline after tracing)
  • 56. 56 Try 1: improving xenctx performance ▌xenctx translates and maps memory addresses every stack walk  Huge overhead  Solution: cache mapped memory and virtualmachine translations ▌xenctx resolves symbols via linear search  Solution: use binary search  (Or, even better, do resolutions offline after tracing)
  • 57. 57 Try 1: improving xenctx performance ▌xenctx translates and maps memory addresses every stack walk  Huge overhead  Solution: cache mapped memory and virtualmachine translations ▌xenctx resolves symbols via linear search  Solution: use binary search  (Or, even better, do resolutions offline after tracing) At this point, I abandoned xenctx and (re)wrote uniprof from scratch.
  • 58. 58 Try 2: uniprof ▌100-fold improvement is nice! But we can do better: Xen 4.7 introduced low-level libraries (libxencall, libxenforeigmemory) Another significant reduction by ~ factor of 3
  • 59. 59 Try 2: uniprof ▌100-fold improvement is nice! But we can do better: Xen 4.7 introduced low-level libraries (libxencall, libxenforeigmemory) Another significant reduction by ~ factor of 3 ▌End result: overhead of ~0.1% @101 samples/s
  • 60. 60 Performance on ARM ▌uniprof supports ARM (xenctx doesn’t) Main challenge: different page table design
  • 61. 61 Performance on ARM ▌uniprof supports ARM (xenctx doesn’t) Main challenge: different page table design ▌ARM: much slower, overhead higher But the CPU is much slower, too (Intel Xeon @3.7GHz vs. Cortex A20 @1GHz) So fewer samples/s needed for same effective resolution
  • 62. 62 No Frame Pointer? No Problem! ▌Stack walking relies on frame pointer Optimizations can reuse FP as general-purpose register (-fomit-frame-pointer)
  • 63. 63 No Frame Pointer? No Problem! ▌Stack walking relies on frame pointer Optimizations can reuse FP as general-purpose register (-fomit-frame-pointer) ▌But we can do without FPs Use stack unwinding information • It’s already included if you use C++ (for exception handling) • It doesn’t change performance • Only binary size DWARF standard $ readelf –S <ELF> There are 13 section headers, starting at offset 0x40d58: Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align [...] [ 4] .eh_frame PROGBITS 0000000000035860 00036860 00000000000066f8 0000000000000000 A 0 0 8 [ 5] .eh_frame_hdr PROGBITS 000000000003bf58 0003cf58 000000000000128c 0000000000000000 A 0 0 4 [...]
  • 64. 64 Unwinding without Frame Pointers ▌How does it work?
  • 65. 65 Unwinding without Frame Pointers ▌How does it work?
  • 66. 66 Unwinding without Frame Pointers ▌How does it work? ▌Lookup table For every program address • The current frame size • Locations of registers to restore (GP and IP)
  • 67. 67 Unwinding without Frame Pointers ▌How does it work? ▌Lookup table For every program address • The current frame size • Locations of registers to restore (GP and IP) Important for exception handling • Exit functions immediately until handler is found
  • 68. 68 Unwinding without Frame Pointers ▌How does it work? ▌Lookup table For every program address • The current frame size • Locations of registers to restore (GP and IP) Important for exception handling • Exit functions immediately until handler is found ▌Index to quickly find table entry
  • 69. 69 Unwinding without Frame Pointers ▌How does it work? ▌Lookup table For every program address • The current frame size • Locations of registers to restore (GP and IP) Important for exception handling • Exit functions immediately until handler is found ▌Index to quickly find table entry ▌Several library implementations uniprof uses libunwind Actually, a libunwind patched for Xen guest introspection support Might be useful for other tools?
  • 70. 70 Performance: uniprof w/ libunwind ▌Performance lower than with frame pointer Reason: libunwind does more than we need (full register reconstruction etc.) ▌Different library or own implementation promising But “good enough” for many cases And a good area for future work
  • 71. 71 Enter the title. Thank you! Questions? uniprof: https://github.com/cnplab/uniprof libunwind-xen: https://github.com/cnplab/libunwind FlameGraphs: https://github.com/brendangregg/flamegraph