SlideShare a Scribd company logo
1 of 37
Download to read offline
ProfilingyourApplications
usingtheLinuxPerfTools
/Kevin Funk KDAB
emBO++ Conf 2017, Bochum
Agenda
Perf setup
Benchmarking
Profiling
More Topics?
Intermission:WhoamI?
Software Engineering Consultant at KDAB since 2010
FOSS enthusiast working on Qt/C++ at KDE since 2006
Lead developer of the KDevelop IDE
mainly on the C/C++ support backed by Clang
as well as cross-platform support
Setup
Hardware
Linux Kernel prerequisites
Building user-space perf
Cross-compiling
Permissions
Hardware
Hardware performance counters
Working PMU
LinuxKernelPrerequisites
$ uname -r # should be at least 3.7
4.7.1-1-ARCH
$ zgrep PERF /proc/config.gz
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_EVENTS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_PERF_REGS=y
...
BuildingUser-spaceperf
git clone https://github.com/torvalds/linux.git
cd linux/tools/perf
export CC=gcc # clang is not supported
make
Dependencies
Auto-detecting system features:
... dwarf: [ on ] # for symbol resolution
... dwarf_getlocations: [ on ] # for symbol resolution
... glibc: [ on ]
... gtk2: [ on ]
... libaudit: [ on ] # for syscall tracing
... libbfd: [ on ] # for symbol resolution
... libelf: [ on ] # for symbol resolution
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libperl: [ on ] # for perl bindings
... libpython: [ on ] # for python bindings
... libslang: [ on ] # for TUI
... libcrypto: [ on ] # for JITed probe points
... libunwind: [ on ] # for unwinding
... libdw-dwarf-unwind: [ on ] # for unwinding
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ on ]
... bpf: [ on ]
Cross-compiling
make prefix=somepath ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
Common pitfalls:
CC must not contain any flags
CFLAGS is ignored, use EXTRA_CFLAGS
prefix path ignored for include and library paths
Dependency issues:
linux/tools/build/feature/test-$FEATURE.make.output
Permissions
#!/bin/bash
sudo mount -o remount,mode=755 /sys/kernel/debug
sudo mount -o remount,mode=755 /sys/kernel/debug/tracing
echo "0" | sudo tee /proc/sys/kernel/kptr_restrict
echo "-1" | sudo tee /proc/sys/kernel/perf_event_paranoid
sudo chown root:tracing /sys/kernel/debug/tracing/uprobe_events
sudo chmod g+rw /sys/kernel/debug/tracing/uprobe_events
Benchmarking
Be scientific!
Take variance into account
Compare before/after measurements
perf stat
$ perf stat -r 5 -o baseline.txt -- ./ex_branches
$ cat baseline.txt
Performance counter stats for './ex_branches' (5 runs):
807.951072 task-clock:u (msec) # 0.999 CPUs utilized ( +- 1.97% )
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
520 page-faults:u # 0.643 K/sec ( +- 0.15% )
2,487,366,239 cycles:u # 3.079 GHz ( +- 1.97% )
1,484,737,283 instructions:u # 0.60 insn per cycle ( +- 0.00% )
329,602,843 branches:u # 407.949 M/sec ( +- 0.00% )
80,476,858 branch-misses:u # 24.42% of all branches ( +- 0.06% )
0.808952447 seconds time elapsed ( +- 1.97% )
Kernelvs.Userspace
Use event modifiers to separate domains:
$ perf stat -r 5 --event=cycles:{k,u} -- ./ex_qdatetime
Performance counter stats for './ex_qdatetime' (5 runs):
13,337,722 cycles:k ( +- 3.82% )
9,745,474 cycles:u ( +- 1.58% )
0.008018321 seconds time elapsed ( +- 4.02% )
See man perf list for more.
perf list
$ perf list
List of pre-defined events (to be used in -e):
branch-misses [Hardware event]
cache-misses [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
...
alignment-faults [Software event]
context-switches OR cs [Software event]
page-faults OR faults [Software event]
...
sched:sched_stat_sleep [Tracepoint event]
sched:sched_stat_iowait [Tracepoint event]
sched:sched_stat_runtime [Tracepoint event]
...
syscalls:sys_enter_futex [Tracepoint event]
syscalls:sys_exit_futex [Tracepoint event]
...
Profiling
CPU profiling
Sleep-time profiling
perf top
System-wide live profiling:
$ perf top
Samples: 12K of event 'cycles:ppp', Event count (approx.): 5456372201
Overhead Shared Object Symbol
13.11% libQt5Core.so.5.7.0 [.] QHashData::nextNode
5.08% libQt5Core.so.5.7.0 [.] operator==
2.90% libQt5Core.so.5.7.0 [.] 0x000000000012f0d1
2.33% libQt5DBus.so.5.7.0 [.] 0x000000000002281f
1.62% libQt5DBus.so.5.7.0 [.] 0x0000000000022810
...
StatisticalProfiling
Sampling the call stack is crucial!
UnwindingandCallStacks
frame pointers (fp)
debug information (dwarf)
Last Branch Record (lbr)
Recommendation
On embedded: enable frame pointers
On the desktop: rely on DWARF
On Intel: play with LBR
perf record
Profile new application and its children:
$ perf record --call-graph dwarf -- ./lab_mandelbrot -b 5
[ perf record: Woken up 256 times to write data ]
[ perf record: Captured and wrote 64.174 MB perf.data (7963 samples) ]
perf record
Attach to running process:
$ perf record --call-graph dwarf --pid $(pidof ...)
# wait for some time, then quit with CTRL + C
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.904 MB perf.data (70 samples) ]
perf record
Profile whole system for some time:
$ perf record -a -- sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.498 MB perf.data (2731 samples) ]
perf report
perf report
Top-down inclusive cost report:
$ perf report
Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769
Children Self Command Shared Object Symbol
- 93.67% 31.76% lab_mandelbrot lab_mandelbrot [.] main
- 72.22% main
+ 28.42% hypot
__hypot_finite
19.87% __muldc3
3.45% __muldc3@plt
2.19% cabs@plt
+ 1.85% QColor::rgb
1.61% QImage::width@plt
1.26% QImage::height@plt
0.97% QColor::fromHsvF
+ 0.90% QApplicationPrivate::init
0.66% QImage::setPixel
+ 21.44% _start
+ 83.34% 0.00% lab_mandelbrot libc-2.24.so [.] __libc_start_main
+ 83.33% 0.00% lab_mandelbrot lab_mandelbrot [.] _start
...
perf report
Bottom-up self cost report:
$ perf report --no-children
Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769
Overhead Command Shared Object Symbol
- 31.76% lab_mandelbrot lab_mandelbrot [.] main
- main
- __libc_start_main
_start
- 23.31% lab_mandelbrot libm-2.24.so [.] __hypot_finite
- __hypot_finite
- 22.56% hypot
main
__libc_start_main
_start
- 23.04% lab_mandelbrot libgcc_s.so.1 [.] __muldc3
- __muldc3
+ main
- 5.90% lab_mandelbrot libm-2.24.so [.] hypot
+ hypot
...
perf report
Show file and line numbers:
$ perf report --no-children -s dso,sym,srcline
Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769
Overhead Shared Object Symbol Source:Line
- 7.82% lab_mandelbrot [.] main mandelbrot.h:41
+ main
- 7.79% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1945
__muldc3
main
__libc_start_main
_start
- 7.46% lab_mandelbrot [.] main complex:1326
- main
+ __libc_start_main
- 6.94% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1944
__muldc3
main
__libc_start_main
_start
...
perf report
Show file and line numbers in backtraces:
$ perf report --no-children -s dso,sym,srcline -g address
Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769
Overhead Shared Object Symbol Source:Line
- 7.82% lab_mandelbrot [.] main mandelbrot.h:41
- 2.84% main mandelbrot.h:41
__libc_start_main +241
_start +4194346
2.58% main mandelbrot.h:41
- 2.01% main mandelbrot.h:41
__libc_start_main +241
_start +4194346
- 7.79% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1945
+ 3.93% __muldc3 libgcc2.c:1945
+ 3.72% __muldc3 libgcc2.c:1945
- 7.46% lab_mandelbrot [.] main complex:1326
- 4.65% main complex:1326
__libc_start_main +241
_start +4194346
2.81% main complex:1326
...
perf config
Configure default output format:
[report]
children = false
sort_order = dso,sym,srcline
[call-graph]
record-mode = dwarf
print-type = graph
order = caller
sort-key = address
man perf config
FlameGraphs
perf script report stackcollapse | flamegraph.pl > graph.svg
Flame Graph Search
_start
__hypot_finite
__muldc3
__libc_start_main
hypot
main
c..
lab_mandelbrot
Moretopics?
Cross-machineReporting
When recording machine has symbols available:
# on first machine:
$ perf record ...
$ perf archive
Now please run:
$ tar xvf perf.data.tar.bz2 -C ~/.debug
wherever you need to run 'perf report' on.
# on second machine:
$ rsync machine1:path/to/perf.data{,tar.bz2} .
$ tar xf perf.data.tar.bz2 -C ~/.debug
$ perf report
Cross-machineReporting
When reporting machine has symbols available:
# on first machine:
$ perf record ...
# on second machine:
$ rsync machine1:path/to/perf.data .
$ perf report --symfs /path/to/sysroot
Sleep-timeProfiling
#!/bin/bash
echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
perf record 
--event sched:sched_stat_sleep/call-graph=fp/ 
--event sched:sched_process_exit/call-graph=fp/ 
--event sched:sched_switch/call-graph=dwarf/ 
--output perf.data.raw $@
echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
perf inject --sched-stat --input perf.data.raw --output perf.data
Sleep-timeProfiling
$ perf-sleep-record ./ex_sleep
$ perf report
Samples: 24 of event 'sched:sched_switch', Event count (approx.): 8883195296
Overhead Trace output
- 100.00% ex_sleep:24938 [120] S ==> swapper/7:0 [120]
- 90.07% main main.cpp:10
QThread::sleep +11
0x1521ed
__nanosleep .:0
entry_SYSCALL_64_fastpath entry_64.o:0
sys_nanosleep +18446744071576748154
hrtimer_nanosleep +18446744071576748225
do_nanosleep hrtimer.c:0
schedule +18446744071576748092
__schedule core.c:0
+ 9.02% main main.cpp:11
+ 0.91% main main.cpp:6
perf script
Convert perf.data to callgrind format:
$ perf record --call-graph dwarf ...
$ perf script report callgrind > perf.callgrind
$ kcachegrind perf.callgrind
github.com/milianw/linux/.../callgrind.py
perf script
Convert perf.data to callgrind format:
Questions?
kevin.funk@kdab.com
https://www.kdab.com/
We offer trainings and workshops!Debugging and Profiling
More perf work from my colleague:
github.com/milianw/linux/tree/milian/perf
git clone -b milian/perf https://github.com/milianw/linux.git

More Related Content

What's hot

Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceBrendan Gregg
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)shimosawa
 
eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceSUSE Labs Taipei
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageKernel TLV
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux SystemJian-Hong Pan
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Brendan Gregg
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringGeorg Schönberger
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)shimosawa
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance AnalysisBrendan Gregg
 
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedAdrian Huang
 
Linux Internals - Kernel/Core
Linux Internals - Kernel/CoreLinux Internals - Kernel/Core
Linux Internals - Kernel/CoreShay Cohen
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File SystemAdrian Huang
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBshimosawa
 
Linux Instrumentation
Linux InstrumentationLinux Instrumentation
Linux InstrumentationDarkStarSword
 
Kdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisKdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisBuland Singh
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Brendan Gregg
 

What's hot (20)

Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)
 
eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux System
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 
kdump: usage and_internals
kdump: usage and_internalskdump: usage and_internals
kdump: usage and_internals
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
 
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
 
Linux Internals - Kernel/Core
Linux Internals - Kernel/CoreLinux Internals - Kernel/Core
Linux Internals - Kernel/Core
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
 
Linux Instrumentation
Linux InstrumentationLinux Instrumentation
Linux Instrumentation
 
Embedded linux network device driver development
Embedded linux network device driver developmentEmbedded linux network device driver development
Embedded linux network device driver development
 
Kdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisKdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysis
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)
 

Viewers also liked

A possible future of resource constrained software development
A possible future of resource constrained software developmentA possible future of resource constrained software development
A possible future of resource constrained software developmentemBO_Conference
 
Programming at Compile Time
Programming at Compile TimeProgramming at Compile Time
Programming at Compile TimeemBO_Conference
 
standardese - a WIP next-gen Doxygen
standardese - a WIP next-gen Doxygenstandardese - a WIP next-gen Doxygen
standardese - a WIP next-gen DoxygenemBO_Conference
 
'Embedding' a meta state machine
'Embedding' a meta state machine'Embedding' a meta state machine
'Embedding' a meta state machineemBO_Conference
 
Device-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded SystemsDevice-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded SystemsemBO_Conference
 
Data-driven HAL generation
Data-driven HAL generationData-driven HAL generation
Data-driven HAL generationemBO_Conference
 

Viewers also liked (6)

A possible future of resource constrained software development
A possible future of resource constrained software developmentA possible future of resource constrained software development
A possible future of resource constrained software development
 
Programming at Compile Time
Programming at Compile TimeProgramming at Compile Time
Programming at Compile Time
 
standardese - a WIP next-gen Doxygen
standardese - a WIP next-gen Doxygenstandardese - a WIP next-gen Doxygen
standardese - a WIP next-gen Doxygen
 
'Embedding' a meta state machine
'Embedding' a meta state machine'Embedding' a meta state machine
'Embedding' a meta state machine
 
Device-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded SystemsDevice-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded Systems
 
Data-driven HAL generation
Data-driven HAL generationData-driven HAL generation
Data-driven HAL generation
 

Similar to Profiling your Applications using the Linux Perf Tools

Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...confluent
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Ontico
 
Deep learning - the conf br 2018
Deep learning - the conf br 2018Deep learning - the conf br 2018
Deep learning - the conf br 2018Fabio Janiszevski
 
Reproducible Computational Pipelines with Docker and Nextflow
Reproducible Computational Pipelines with Docker and NextflowReproducible Computational Pipelines with Docker and Nextflow
Reproducible Computational Pipelines with Docker and Nextflowinside-BigData.com
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018Brendan Gregg
 
Android Boot Time Optimization
Android Boot Time OptimizationAndroid Boot Time Optimization
Android Boot Time OptimizationKan-Ru Chen
 
Modern Linux Tracing Landscape
Modern Linux Tracing LandscapeModern Linux Tracing Landscape
Modern Linux Tracing LandscapeSasha Goldshtein
 
Crash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_TizenCrash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_TizenLex Yu
 
Node Interactive Debugging Node.js In Production
Node Interactive Debugging Node.js In ProductionNode Interactive Debugging Node.js In Production
Node Interactive Debugging Node.js In ProductionYunong Xiao
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightLinaro
 
Swift profiling middleware and tools
Swift profiling middleware and toolsSwift profiling middleware and tools
Swift profiling middleware and toolszhang hua
 
Debugging linux issues with eBPF
Debugging linux issues with eBPFDebugging linux issues with eBPF
Debugging linux issues with eBPFIvan Babrou
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging RubyAman Gupta
 
DevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDinakar Guniguntala
 
Spark streaming with kafka
Spark streaming with kafkaSpark streaming with kafka
Spark streaming with kafkaDori Waldman
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka Dori Waldman
 
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Ontico
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerMarina Kolpakova
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby SystemsEngine Yard
 

Similar to Profiling your Applications using the Linux Perf Tools (20)

Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)
 
Deep learning - the conf br 2018
Deep learning - the conf br 2018Deep learning - the conf br 2018
Deep learning - the conf br 2018
 
Reproducible Computational Pipelines with Docker and Nextflow
Reproducible Computational Pipelines with Docker and NextflowReproducible Computational Pipelines with Docker and Nextflow
Reproducible Computational Pipelines with Docker and Nextflow
 
ATO Linux Performance 2018
ATO Linux Performance 2018ATO Linux Performance 2018
ATO Linux Performance 2018
 
Android Boot Time Optimization
Android Boot Time OptimizationAndroid Boot Time Optimization
Android Boot Time Optimization
 
Modern Linux Tracing Landscape
Modern Linux Tracing LandscapeModern Linux Tracing Landscape
Modern Linux Tracing Landscape
 
Crash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_TizenCrash_Report_Mechanism_In_Tizen
Crash_Report_Mechanism_In_Tizen
 
Node Interactive Debugging Node.js In Production
Node Interactive Debugging Node.js In ProductionNode Interactive Debugging Node.js In Production
Node Interactive Debugging Node.js In Production
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with Coresight
 
Swift profiling middleware and tools
Swift profiling middleware and toolsSwift profiling middleware and tools
Swift profiling middleware and tools
 
Debugging linux issues with eBPF
Debugging linux issues with eBPFDebugging linux issues with eBPF
Debugging linux issues with eBPF
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging Ruby
 
DevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on Kubernetes
 
Spark streaming with kafka
Spark streaming with kafkaSpark streaming with kafka
Spark streaming with kafka
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka
 
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
 
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the CompilerPragmatic Optimization in Modern Programming - Demystifying the Compiler
Pragmatic Optimization in Modern Programming - Demystifying the Compiler
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby Systems
 

Recently uploaded

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Recently uploaded (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Profiling your Applications using the Linux Perf Tools

  • 3. Intermission:WhoamI? Software Engineering Consultant at KDAB since 2010 FOSS enthusiast working on Qt/C++ at KDE since 2006 Lead developer of the KDevelop IDE mainly on the C/C++ support backed by Clang as well as cross-platform support
  • 4. Setup Hardware Linux Kernel prerequisites Building user-space perf Cross-compiling Permissions
  • 6. LinuxKernelPrerequisites $ uname -r # should be at least 3.7 4.7.1-1-ARCH $ zgrep PERF /proc/config.gz CONFIG_HAVE_PERF_EVENTS=y CONFIG_PERF_EVENTS=y CONFIG_HAVE_PERF_USER_STACK_DUMP=y CONFIG_HAVE_PERF_REGS=y ...
  • 7. BuildingUser-spaceperf git clone https://github.com/torvalds/linux.git cd linux/tools/perf export CC=gcc # clang is not supported make
  • 8. Dependencies Auto-detecting system features: ... dwarf: [ on ] # for symbol resolution ... dwarf_getlocations: [ on ] # for symbol resolution ... glibc: [ on ] ... gtk2: [ on ] ... libaudit: [ on ] # for syscall tracing ... libbfd: [ on ] # for symbol resolution ... libelf: [ on ] # for symbol resolution ... libnuma: [ on ] ... numa_num_possible_cpus: [ on ] ... libperl: [ on ] # for perl bindings ... libpython: [ on ] # for python bindings ... libslang: [ on ] # for TUI ... libcrypto: [ on ] # for JITed probe points ... libunwind: [ on ] # for unwinding ... libdw-dwarf-unwind: [ on ] # for unwinding ... zlib: [ on ] ... lzma: [ on ] ... get_cpuid: [ on ] ... bpf: [ on ]
  • 9. Cross-compiling make prefix=somepath ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- Common pitfalls: CC must not contain any flags CFLAGS is ignored, use EXTRA_CFLAGS prefix path ignored for include and library paths Dependency issues: linux/tools/build/feature/test-$FEATURE.make.output
  • 10. Permissions #!/bin/bash sudo mount -o remount,mode=755 /sys/kernel/debug sudo mount -o remount,mode=755 /sys/kernel/debug/tracing echo "0" | sudo tee /proc/sys/kernel/kptr_restrict echo "-1" | sudo tee /proc/sys/kernel/perf_event_paranoid sudo chown root:tracing /sys/kernel/debug/tracing/uprobe_events sudo chmod g+rw /sys/kernel/debug/tracing/uprobe_events
  • 11. Benchmarking Be scientific! Take variance into account Compare before/after measurements
  • 12. perf stat $ perf stat -r 5 -o baseline.txt -- ./ex_branches $ cat baseline.txt Performance counter stats for './ex_branches' (5 runs): 807.951072 task-clock:u (msec) # 0.999 CPUs utilized ( +- 1.97% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 520 page-faults:u # 0.643 K/sec ( +- 0.15% ) 2,487,366,239 cycles:u # 3.079 GHz ( +- 1.97% ) 1,484,737,283 instructions:u # 0.60 insn per cycle ( +- 0.00% ) 329,602,843 branches:u # 407.949 M/sec ( +- 0.00% ) 80,476,858 branch-misses:u # 24.42% of all branches ( +- 0.06% ) 0.808952447 seconds time elapsed ( +- 1.97% )
  • 13. Kernelvs.Userspace Use event modifiers to separate domains: $ perf stat -r 5 --event=cycles:{k,u} -- ./ex_qdatetime Performance counter stats for './ex_qdatetime' (5 runs): 13,337,722 cycles:k ( +- 3.82% ) 9,745,474 cycles:u ( +- 1.58% ) 0.008018321 seconds time elapsed ( +- 4.02% ) See man perf list for more.
  • 14. perf list $ perf list List of pre-defined events (to be used in -e): branch-misses [Hardware event] cache-misses [Hardware event] cpu-cycles OR cycles [Hardware event] instructions [Hardware event] ref-cycles [Hardware event] ... alignment-faults [Software event] context-switches OR cs [Software event] page-faults OR faults [Software event] ... sched:sched_stat_sleep [Tracepoint event] sched:sched_stat_iowait [Tracepoint event] sched:sched_stat_runtime [Tracepoint event] ... syscalls:sys_enter_futex [Tracepoint event] syscalls:sys_exit_futex [Tracepoint event] ...
  • 16. perf top System-wide live profiling: $ perf top Samples: 12K of event 'cycles:ppp', Event count (approx.): 5456372201 Overhead Shared Object Symbol 13.11% libQt5Core.so.5.7.0 [.] QHashData::nextNode 5.08% libQt5Core.so.5.7.0 [.] operator== 2.90% libQt5Core.so.5.7.0 [.] 0x000000000012f0d1 2.33% libQt5DBus.so.5.7.0 [.] 0x000000000002281f 1.62% libQt5DBus.so.5.7.0 [.] 0x0000000000022810 ...
  • 18. UnwindingandCallStacks frame pointers (fp) debug information (dwarf) Last Branch Record (lbr)
  • 19. Recommendation On embedded: enable frame pointers On the desktop: rely on DWARF On Intel: play with LBR
  • 20. perf record Profile new application and its children: $ perf record --call-graph dwarf -- ./lab_mandelbrot -b 5 [ perf record: Woken up 256 times to write data ] [ perf record: Captured and wrote 64.174 MB perf.data (7963 samples) ]
  • 21. perf record Attach to running process: $ perf record --call-graph dwarf --pid $(pidof ...) # wait for some time, then quit with CTRL + C [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 3.904 MB perf.data (70 samples) ]
  • 22. perf record Profile whole system for some time: $ perf record -a -- sleep 5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.498 MB perf.data (2731 samples) ]
  • 24. perf report Top-down inclusive cost report: $ perf report Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769 Children Self Command Shared Object Symbol - 93.67% 31.76% lab_mandelbrot lab_mandelbrot [.] main - 72.22% main + 28.42% hypot __hypot_finite 19.87% __muldc3 3.45% __muldc3@plt 2.19% cabs@plt + 1.85% QColor::rgb 1.61% QImage::width@plt 1.26% QImage::height@plt 0.97% QColor::fromHsvF + 0.90% QApplicationPrivate::init 0.66% QImage::setPixel + 21.44% _start + 83.34% 0.00% lab_mandelbrot libc-2.24.so [.] __libc_start_main + 83.33% 0.00% lab_mandelbrot lab_mandelbrot [.] _start ...
  • 25. perf report Bottom-up self cost report: $ perf report --no-children Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769 Overhead Command Shared Object Symbol - 31.76% lab_mandelbrot lab_mandelbrot [.] main - main - __libc_start_main _start - 23.31% lab_mandelbrot libm-2.24.so [.] __hypot_finite - __hypot_finite - 22.56% hypot main __libc_start_main _start - 23.04% lab_mandelbrot libgcc_s.so.1 [.] __muldc3 - __muldc3 + main - 5.90% lab_mandelbrot libm-2.24.so [.] hypot + hypot ...
  • 26. perf report Show file and line numbers: $ perf report --no-children -s dso,sym,srcline Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769 Overhead Shared Object Symbol Source:Line - 7.82% lab_mandelbrot [.] main mandelbrot.h:41 + main - 7.79% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1945 __muldc3 main __libc_start_main _start - 7.46% lab_mandelbrot [.] main complex:1326 - main + __libc_start_main - 6.94% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1944 __muldc3 main __libc_start_main _start ...
  • 27. perf report Show file and line numbers in backtraces: $ perf report --no-children -s dso,sym,srcline -g address Samples: 8K of event 'cycles:ppp', Event count (approx.): 8164367769 Overhead Shared Object Symbol Source:Line - 7.82% lab_mandelbrot [.] main mandelbrot.h:41 - 2.84% main mandelbrot.h:41 __libc_start_main +241 _start +4194346 2.58% main mandelbrot.h:41 - 2.01% main mandelbrot.h:41 __libc_start_main +241 _start +4194346 - 7.79% libgcc_s.so.1 [.] __muldc3 libgcc2.c:1945 + 3.93% __muldc3 libgcc2.c:1945 + 3.72% __muldc3 libgcc2.c:1945 - 7.46% lab_mandelbrot [.] main complex:1326 - 4.65% main complex:1326 __libc_start_main +241 _start +4194346 2.81% main complex:1326 ...
  • 28. perf config Configure default output format: [report] children = false sort_order = dso,sym,srcline [call-graph] record-mode = dwarf print-type = graph order = caller sort-key = address man perf config
  • 29. FlameGraphs perf script report stackcollapse | flamegraph.pl > graph.svg Flame Graph Search _start __hypot_finite __muldc3 __libc_start_main hypot main c.. lab_mandelbrot
  • 31. Cross-machineReporting When recording machine has symbols available: # on first machine: $ perf record ... $ perf archive Now please run: $ tar xvf perf.data.tar.bz2 -C ~/.debug wherever you need to run 'perf report' on. # on second machine: $ rsync machine1:path/to/perf.data{,tar.bz2} . $ tar xf perf.data.tar.bz2 -C ~/.debug $ perf report
  • 32. Cross-machineReporting When reporting machine has symbols available: # on first machine: $ perf record ... # on second machine: $ rsync machine1:path/to/perf.data . $ perf report --symfs /path/to/sysroot
  • 33. Sleep-timeProfiling #!/bin/bash echo 1 | sudo tee /proc/sys/kernel/sched_schedstats perf record --event sched:sched_stat_sleep/call-graph=fp/ --event sched:sched_process_exit/call-graph=fp/ --event sched:sched_switch/call-graph=dwarf/ --output perf.data.raw $@ echo 0 | sudo tee /proc/sys/kernel/sched_schedstats perf inject --sched-stat --input perf.data.raw --output perf.data
  • 34. Sleep-timeProfiling $ perf-sleep-record ./ex_sleep $ perf report Samples: 24 of event 'sched:sched_switch', Event count (approx.): 8883195296 Overhead Trace output - 100.00% ex_sleep:24938 [120] S ==> swapper/7:0 [120] - 90.07% main main.cpp:10 QThread::sleep +11 0x1521ed __nanosleep .:0 entry_SYSCALL_64_fastpath entry_64.o:0 sys_nanosleep +18446744071576748154 hrtimer_nanosleep +18446744071576748225 do_nanosleep hrtimer.c:0 schedule +18446744071576748092 __schedule core.c:0 + 9.02% main main.cpp:11 + 0.91% main main.cpp:6
  • 35. perf script Convert perf.data to callgrind format: $ perf record --call-graph dwarf ... $ perf script report callgrind > perf.callgrind $ kcachegrind perf.callgrind github.com/milianw/linux/.../callgrind.py
  • 36. perf script Convert perf.data to callgrind format:
  • 37. Questions? kevin.funk@kdab.com https://www.kdab.com/ We offer trainings and workshops!Debugging and Profiling More perf work from my colleague: github.com/milianw/linux/tree/milian/perf git clone -b milian/perf https://github.com/milianw/linux.git