Speeding up development by setting up a kernel build farm
Willy Tarreau
HAProxy Technologies
Kernel Recipes 2016
Outline
● Target
● Context
● Solutions
● Pitfalls
● Improvements
Target
● Developers who build a lot of code
● Maintainers who backport lots of patches
● People who have to debug and bisect
● Developers having to use very slow laptops (“ultrabooks”)
● Those who like to have fun with clusters
● Others ?
Observations
● Backporting fixes into old kernels is not trivial
● It often causes build failures
● Need to build a lot to validate backports
● Build time dominates in a backport session
● Not always building from the same place
● I spend a lot of time building distros as well...
How is time spent
● Lots of time spent between keyboard and chair
● Few short build sessions (< 10s)
● Many medium build sessions (~ 1mn)
● Few long build sessions (>15mn)
● You never know how long it takes
The goal is in fact to reduce the wait time!
Ideas on how to reduce build time
● Stop testing backports and rely on previews :-)
● Release more often with fewer patches
● Buy a bigger machine
● Use a compiler cache
● Distribute the build over several machines
Distributing the build ?
For this, you need :
● A distributable workload (not kidding)
● Multiple machines
● The exact same compiler everywhere
● Some way to submit the job to these machines
● A low enough latency
Distributable workloads
Quite a few requirements :
● Strong support for parallel builds (make -j)
● No dependency on the build node...
● ...or ability to replicate the build environment
● Many more C files to build than machines
● Homogeneous build times (hence sizes)
=> The kernel fits pretty well here.
Same compiler everywhere ?
It is absolutely mandatory
=> Use crosstool-ng for this even for the local node
● Reusable, archivable configurations
● Builds relocatable toolchains that run everywhere
● Supports Canadian Cross compilers
● Supports bare-metal compilers
● Use of arch-vendor-os-abi naming allows many different toolchains to coexist
Hint: set the gcc version in the vendor field, eg: x86_64-gcc47-linux-gnu
Distributed build controller
This component will be responsible for distributing the load across all the
machines. There are a few prerequisites :
● Must not be intrusive in the build process :
– No extra steps
– No patching
● Must present a very low overhead
● Must support cross-compilers
● Must be smart enough to fall back on the local node if any risk
● Should ignore unreachable machines
=> distcc is exactly all this
A few words on distcc
● Works either as a wrapper or in masqueraded mode :
$ ln -s /usr/bin/distcc x86_64-gcc47-linux-gnu-gcc
$ make -j 20 CC=$PWD/x86_64-gcc47-linux-gnu-gcc
● Automatically detects options involving the local node.
● Can use environment variables for nodes list
● Supports per-node usage limits
● Uses file-based node locking (no daemon)
● Detects dead nodes and can sometimes retry locally
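The node list and per-node limits mentioned above are usually passed through the DISTCC_HOSTS environment variable, where each entry is HOST/LIMIT and an optional ",lzo" suffix enables compression. Here is a minimal sketch of a helper that builds such a value. The function name and the host names node1..node4 are hypothetical, and the slot counts are only illustrative:

```python
def distcc_hosts(nodes, local_slots=2, compress=False):
    """Build a DISTCC_HOSTS value from (hostname, max_jobs) pairs.

    Remote nodes come first so that they are preferred; "localhost"
    is appended last with a small limit so the controller keeps CPU
    for pre-processing and linking.
    """
    suffix = ",lzo" if compress else ""
    parts = [f"{host}/{limit}{suffix}" for host, limit in nodes]
    if local_slots > 0:
        parts.append(f"localhost/{local_slots}")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical farm: four quad-core nodes, four jobs each.
    farm = [("node1", 4), ("node2", 4), ("node3", 4), ("node4", 4)]
    print("export DISTCC_HOSTS='%s'" % distcc_hosts(farm))
```

The printed line can be pasted into the shell before running make with a -j value close to the sum of all limits.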
Warnings on distcc
● Will not use remote nodes if gcov profiling is enabled
=> sed -i 's,.*\(CONFIG_GCOV_KERNEL\).*,\1=n,' .config
● Pre-processing and linking are local and consume some time (approx 20-30%)
How to compare performance
● Depends on number of files, CPU count, parallelism, etc...
● Need to use a portable project as a reference to compare all machines, and stick to that version
● Count the project’s lines to establish a metric in lines per second.
– Hint: replace “gcc -c” with “gcc -S”, build, and count non-empty lines not starting with “#”.
– Eg: 458870 lines for haproxy-6bcb0a84.
How to compare performance (...)
● Set cpufreq governor to “performance”
● Run multiple times (get at least 3 similar results)
● For more details on the method, please consult the wiki link at the end.
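The line-counting hint can be sketched as follows. This is only an illustration of the rule stated above (non-empty lines not starting with “#”). The function names and the default "*.s" file pattern are assumptions, so adjust the pattern to wherever your build writes the “gcc -S” output:

```python
from pathlib import Path


def count_code_lines(text):
    """Count non-empty lines that do not start with '#'."""
    return sum(
        1
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    )


def count_tree(root, pattern="*.s"):
    """Sum the counts over all files under root matching pattern."""
    return sum(
        count_code_lines(path.read_text(errors="replace"))
        for path in Path(root).rglob(pattern)
    )


def lines_per_second(total_lines, build_seconds):
    """The metric itself: reference line count divided by build time."""
    return total_lines / build_seconds


if __name__ == "__main__":
    # 458870 is the haproxy-6bcb0a84 count from the slide.
    # The 20 s build time is a made-up value for illustration.
    print(lines_per_second(458870, 20.0))
```

The line count only needs to be established once per reference project. After that, each machine is compared by timing the same build and dividing.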
Suitable machines
The first question is :
What do we want to optimize ?
● Build performance for a given cost ?
● Number of nodes (ie switch ports) ?
● Hardware cost for a given performance level ?
● Hardware size / weight ?
● Power consumption / cooling / noise ?
Suitable machines (...)
The second question is :
What impacts performance ?
● CPU architecture and family (AMD Phenom and later, Intel Core i3/i5/i7, ARM Cortex A17 & A53 are efficient)
● DRAM latency (target high frequency and large buses)
● CPU cache sizes and latencies
● Storage access time if any (building in /dev/shm is better)
● Compiler build options (save ~10% by playing with ct-ng)
Suitable machines (...)
● The “BogoLocs” metric models expected performance numbers and avoids uselessly testing unsuitable hardware.
Suitable machines (...)
● BogoLocs only uses ramlat output
● It is surprisingly accurate for a prediction
● It reveals that the most important speed limiting factors are the following, from most to least important :
– DRAM latency (time to load a cache line)
– L3 cache latency (if any)
– L2 cache latency (if any)
– CPU frequency
– Core count.
Optimizing for performance
Always ensure the build nodes are properly used :
● Top/vmstat should almost always show ~100% CPU
● Local machine should almost never stay around 100% CPU
● Network must not be saturated
Real world example :
● Controller: Thinkpad T430s (core i5 2*3.1 GHz). Avg 70% CPU
● Build nodes: 4*CS-008 (RK3288 4*1.8 GHz). 95% CPU
● Network: 180 Mbps avg, peaks at 450 Mbps.
● Theoretical limit for this controller: ~6 build nodes.
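One plausible way to arrive at a figure close to ~6 is a linear extrapolation of the controller's CPU usage. The slides do not spell out the calculation, so the linear-scaling assumption and the function names below are mine:

```python
def max_build_nodes(nodes_in_use, controller_cpu_pct):
    """Extrapolate linearly to the node count at which the controller
    would reach 100% CPU (assumes its load grows with the node count)."""
    return nodes_in_use * 100.0 / controller_cpu_pct


def per_node_mbps(total_mbps, nodes_in_use):
    """Average network traffic generated per build node."""
    return total_mbps / nodes_in_use


if __name__ == "__main__":
    # Figures from the real-world example: 4 nodes, 70% controller CPU,
    # 180 Mbps average traffic.
    print(max_build_nodes(4, 70))
    print(per_node_mbps(180, 4))
```

Under the same assumption, the per-node traffic also helps estimate when the controller's network link would become the next bottleneck.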
Optimizing for performance (...)
PC-type machines are the fastest and can be tuned :
● Fill all memory channels with fastest DRAM available (20-50% gains possible)
● Enable Hyperthreading when available (~50% gain max)
● Overclocking is a possibility (~15% gain max)
● Add more RAM to improve file caching
● Less gain over 8 cores due to memory bottlenecks
… but they (are expensive) and (are huge or noisy (or both)) and (draw a lot of power) but they are versatile.
Hint: choose motherboards, CPUs and RAM designed for gamers.
Optimizing for the number of nodes
Use case : limited connectivity, limited build parallelism, boss willing to buy only one server, etc.
=> Need for highest performance per node
Dual-socket high-frequency, 8-core x86 machines with all RAM slots populated have a huge memory bandwidth and are not very expensive.
=> often bigger than the typical PC and draw more power.
Optimizing for hardware costs
In 2016, a quad-core 1.2 GHz machine costs $8.
Eg: NanoPI NEO :
http://www.friendlyarm.com/index.php?route=product/product&path=69&product_id=132
Optimizing for hardware costs (...)
But does it really work ?
● Yes it does, but it will be around 16 times slower than a $800 PC.
● It may be even slower once it starts heating and throttles
● It is limited to 100 Mbps per node (ok for this one)
… and there are hidden costs
Hidden costs of very small hardware
● Shipping costs : ~$5 per machine
● Connectivity : a switch costs at least ~$2 per port
● Ethernet cable : ~$1 per cable
● A heatsink is needed to avoid throttling : $1
● A micro-SD card is needed to boot : $3
● A USB power supply is needed : $2
=> Total: ~$22 per machine for 1/16 of a $800 PC = $350 per PC equivalent, not very interesting
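The arithmetic behind this total can be checked with a short sketch. The prices are the ones listed above plus the $8 board, and the 1/16 ratio is the slide's own estimate. The names are illustrative only:

```python
# Per-node prices in USD, copied from the slides.
NODE_COSTS = {
    "board": 8,
    "shipping": 5,
    "switch port": 2,
    "Ethernet cable": 1,
    "heatsink": 1,
    "micro-SD card": 3,
    "USB power supply": 2,
}

# From the slides: one such node is about 1/16 of an $800 PC.
NODES_PER_PC = 16


def cost_per_node(costs=NODE_COSTS):
    """Real cost of one node once the hidden costs are included."""
    return sum(costs.values())


def cost_per_pc_equivalent(costs=NODE_COSTS, nodes_per_pc=NODES_PER_PC):
    """Cost of enough nodes to match the throughput of one PC."""
    return cost_per_node(costs) * nodes_per_pc


if __name__ == "__main__":
    print(cost_per_node(), cost_per_pc_equivalent())
```

The exact product is $352, which the slide rounds to $350. The accessories cost almost twice as much as the board itself.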
Pitfalls of very small hardware
● Large files will take 15-20 times as long as they take on a PC. A single large file can extend the build time.
● Quickly requires some enclosures or rack mount options
● Heating, heating, heating...
● Stability issues : many vendors sell overclocked CPUs (eg: some OrangePI claim 1.6 GHz for this 1.2 GHz CPU)
● Sometimes advertised frequency cannot be reached
Mid-range hardware
● Often sold as “small PCs”
● $25-$60 price range
● CPUs up to quad-cortex A9 at 2 GHz
● Eg: NanoPI2-Fire and Odroid-C2:
Mid-range hardware (...)
Pros:
● Very interesting form factor, better cooling
● High performance density
● Some available in 8 A53 cores (eg: NanoPI-M3)
Cons:
● Comparatively more expensive per board
● Some platforms heat a lot (eg: Allwinner chips)
● Often forced to run their own kernel
High-range hardware
● Often sold as “set-top boxes” (STB)
● $50-$200 price range
● CPUs: 4-8 A17/A53 at 1.5-2.0 GHz
● Eg: MiQi, RKM-v5, CS-008 (clone) :
High-range hardware (...)
Pros:
● Very high performance density (up to ¼ of a $800 PC)
● Gigabit Ethernet connectivity (not always)
● On-board eMMC storage
● Often ultimately gets supported in mainline (eg: MiQi)
Cons:
● Often sold on Android, porting can be a pain (but fun)
● Setting up a chroot in Android doesn’t always work due to SELinux
High-range hardware (...)
Warnings:
● Varying build quality (eg: CS-008 clones with fake RAM chips)
● Varying software quality for cheaper clones
● Kernel lying about the real frequency
● Advertised power draw is lower than reality, need for a strong USB power supply.
● Thermal throttling may happen on heating, heatsinks are always needed even if sold without
Personal choice
My personal preference goes to these ones :
1. MiQi ($39, 4xA17 1.8 GHz, 1GB 64-bit DDR3-1600)
=> ~25k LoC/s
2. NanoPI-M3 ($35, 8xA53 1.4 GHz, 1GB 32-bit DDR3-1600)
=> ~21k LoC/s
3. Odroid-C2 ($35, 4xA53 1.5 GHz, 2GB 32-bit DDR3-912)
=> ~16k LoC/s
4. RKMv5/T034/CS008 ($60, 4xA17 1.6GHz, 2GB 32-bit DDR2-800)
=> ~18k LoC/s
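As an illustration, these figures can be turned into a throughput-per-dollar ratio. The slides do not say that the preference order was derived this way. This is only a sketch using the board prices and LoC/s values listed above, and it ignores the hidden costs discussed earlier:

```python
# (name, board price in USD, measured lines of code per second)
BOARDS = [
    ("MiQi", 39, 25000),
    ("NanoPI-M3", 35, 21000),
    ("Odroid-C2", 35, 16000),
    ("RKMv5/T034/CS008", 60, 18000),
]


def rank_by_value(boards):
    """Return (name, LoC/s per dollar) pairs, best value first."""
    ratios = [(name, loc_s / price) for name, price, loc_s in boards]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in rank_by_value(BOARDS):
        print(f"{name}: {ratio:.0f} LoC/s per $")
```

With these numbers the ratio happens to give the same order as the list above.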
Current Implementation
4 CS-008 at 1.8 GHz, 5-port GigE switch, 5x2A PSU.
Total cost: $280 for 16 A17 cores and 8 GB RAM.
Builds allmodconfig in 14 mn vs 43 mn on laptop.
Future research
● Experiment with distcc’s “pump” mode
● Use haproxy in front of distccd :
● Avoid the switch for the portable version :
– Try WiFi with LZO (unsuited without, 180-450 Mbps for 4 machines).
– Try USB-NET on OTG ports with LZO over a USB hub (some USB power supplies include the hub).
More or less important hints
● Don’t focus on kernel optimization/support as long as it is able to quickly and reliably execute gcc.
● Any lightweight distro works (even a chroot inside Android if selinux is not too strict). Using “formilux” here (~10 MB).
● Check with “top” that there’s no CPU hog on the machine (lightdm, systemd)
● Use “ramlat” to measure the performance impact of graphics mode. Disable, reduce, rebuild without support, or kill Xorg.
● Unlocked core i7 are amazing when stuffed with fast DRAM
● Xeons and Atoms are not really worth the price.
● Cortex A17 is very strong (half a core i5 at 2/3 the frequency) but expensive
● Cortex A53 is 2/3 its performance and much cheaper but often provided with slow RAM and more cores
● Cooling: it’s up to you (passive, active, ...)
More or less important hints (...)
● Configure LEDs as heartbeat to spot dead machines
● On a desk, daisy-chain clusters : 3 per 5-port switch, 6 per 8-port switch.
● Overclocking: sometimes OK, especially for build testing, or on “gamers” machines.
● Use power-meters to ensure the power usage remains below 10W per node for fanless operation
Useful links
● Distcc: https://github.com/distcc
● Crosstool-ng: http://crosstool-ng.org/
● Wiki page listing test reports of devices :
http://wiki.ant-computing.com/Choosing_a_processor_for_a_build_farm
● mqmaker’s MiQi board :
https://forum.mqmaker.com/t/announcing-miqi-a-credit-card-sized-computer/371
● FriendlyARM’s NanoPI boards : http://www.friendlyarm.com/
● Hardkernel’s Odroid-C2 board :
http://www.hardkernel.com/main/products/prdt_info.php
● Rikomagic’s RKM-v5 device : http://www.rikomagic.co.uk/
That's all folks!
Thanks!
Questions / Comments / Jokes ?
Contact: Willy Tarreau <willy@haproxy.com>
HAProxy is hiring talented people. Contact us!

Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Anne Nicolas
 


Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm

  • 1. Speeding up development by setting up a kernel build farm Willy Tarreau HAProxy Technologies Kernel Recipes 2016
  • 2. Outline ● Target ● Context ● Solutions ● Pitfalls ● Improvements
  • 3. Target ● Developers who build a lot of code ● Maintainers who backport lots of patches ● People who have to debug and bisect ● Developers having to use very slow laptops (“ultrabooks”) ● Those who like to have fun with clusters ● Others ?
  • 4. Observations ● Backporting fixes into old kernels is not trivial ● It often causes build failures ● Need to build a lot to validate backports ● Build time dominates in a backport session ● Not always building from the same place ● I spend a lot of time building distros as well...
  • 5. How is time spent ● Lots of time spent between keyboard and chair ● Few short build sessions (< 10s) ● Many medium build sessions (~ 1mn) ● Few long build sessions (>15mn) ● You never know how long it takes The goal is in fact to reduce the wait time!
  • 6. Ideas on how to reduce build time ● Stop testing backports and rely on previews :-) ● Release more often with fewer patches ● Buy a bigger machine ● Use a compiler cache ● Distribute the build over several machines
  • 12. Distributing the build ? For this, you need : ● A distributable workload (not kidding) ● Multiple machines ● The exact same compiler everywhere ● Some way to submit the job to these machines ● A low enough latency
  • 13. Distributable workloads Quite a few requirements : ● Strong support for parallel builds (make -j) ● No dependency on the build node... ● ...or ability to replicate the build environment ● Many more C files to build than machines ● Homogeneous build times (hence sizes) => The kernel fits pretty well here.
  • 16. Same compiler everywhere ? It is absolutely mandatory => Use crosstool-ng for this even for the local node ● Reusable, archivable configurations ● Builds relocatable toolchains that run everywhere ● Supports Canadian Cross compilers ● Supports bare-metal compilers ● Use of arch-vendor-os-abi naming allows many different toolchains to coexist Hint: set the gcc version in the vendor field, e.g. x86_64-gcc47-linux-gnu
  • 18. Distributed build controller This component will be responsible for distributing the load across all the machines. There are a few prerequisites : ● Must not be intrusive in the build process : – No extra steps – No patching ● Must present a very low overhead ● Must support cross-compilers ● Must be smart enough to fall back on the local node if any risk ● Should ignore unreachable machines => distcc is exactly all this
  • 19. A few words on distcc ● Works either as a wrapper or in masqueraded mode : $ ln -s /usr/bin/distcc x86_64-gcc47-linux-gnu-gcc $ make -j 20 CC=$PWD/x86_64-gcc47-linux-gnu-gcc ● Automatically detects options involving the local node. ● Can use environment variables for the node list ● Supports per-node usage limits ● Uses file-based node locking (no daemon) ● Detects dead nodes and can sometimes retry locally
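The node list and per-node limits mentioned above are usually passed through the `DISTCC_HOSTS` environment variable, where each entry has the form `host/limit` and may carry a `,lzo` option for compression. The sketch below is only an illustration of how such a value could be assembled: the host names `node1` to `node4`, the limit of 4 jobs per node and the 2 local slots are assumptions, not values taken from the talk.

```python
def distcc_hosts(nodes, local_slots=2, compress=False):
    """Build a DISTCC_HOSTS value from a {hostname: max_jobs} mapping.

    Each remote entry is written as host/limit, optionally followed by
    ",lzo" to request compression. localhost is listed first with its
    own limit so that some jobs can still run on the controller.
    """
    suffix = ",lzo" if compress else ""
    specs = [f"localhost/{local_slots}"]
    specs += [f"{host}/{limit}{suffix}" for host, limit in nodes.items()]
    return " ".join(specs)


# Hypothetical farm: four quad-core build nodes, four jobs each.
farm = {"node1": 4, "node2": 4, "node3": 4, "node4": 4}
hosts = distcc_hosts(farm)
total_jobs = 2 + sum(farm.values())

# Print the shell lines one would run on the controller.
print(f"export DISTCC_HOSTS='{hosts}'")
print(f"make -j {total_jobs} CC=$PWD/x86_64-gcc47-linux-gnu-gcc")
```

The `-j` value is simply the sum of all slots here; in practice it is worth trying slightly higher values, since preprocessing and linking keep the controller busy while remote jobs are in flight.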
  • 20. Warnings on distcc ● Will not use remote nodes if gcov profiling is enabled => sed -i 's,.*\(CONFIG_GCOV_KERNEL\).*,\1=n,' .config ● Pre-processing and linking are local and consume some time (approx. 20-30%)
  • 21. How to compare performance ● Depends on number of files, CPU count, parallelism, etc... ● Need to use a portable project as a reference to compare all machines, and stick to that version ● Measure the project’s line count to establish a metric in lines per second. – Hint: replace “gcc -c” with “gcc -S”, build, and count non-empty lines not starting with “#”. – E.g.: 458870 lines for haproxy-6bcb0a84.
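The counting rule in the hint above can be sketched as follows. This is a minimal illustration, assuming the `gcc -S` output is already available as text; the 20-second build time is a made-up figure used only to show the division, while the 458870-line count is the one quoted on the slide.

```python
def count_code_lines(asm_text):
    """Count non-empty lines that do not start with '#'.

    Applied to the output of "gcc -S", this skips blank lines and the
    "# ..." marker lines, as described in the hint.
    """
    return sum(
        1
        for line in asm_text.splitlines()
        if line.strip() and not line.startswith("#")
    )


def lines_per_second(line_count, build_seconds):
    """Turn a reference line count and a measured build time into LoC/s."""
    return line_count / build_seconds


sample = '# 1 "demo.c"\n\n\tmovl $0, %eax\n\tret\n'
print(count_code_lines(sample))  # two instruction lines are counted

# Reference count from the slide; the build time is a hypothetical value.
print(lines_per_second(458870, 20.0))
```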
  • 22. How to compare performance (...) ● Set cpufreq governor to “performance” ● Run multiple times (get at least 3 similar results) ● For more details on the method, please consult the wiki link at the end.
  • 23. Suitable machines The first question is : What do we want to optimize ? ● Build performance for a given cost ? ● Number of nodes (ie switch ports) ? ● Hardware cost for a given performance level ? ● Hardware size / weight ? ● Power consumption / cooling / noise ?
  • 24. Suitable machines (...) The second question is : What impacts performance ? ● CPU architecture and family (AMD Phenom and later, Intel Core i3/i5/i7, ARM Cortex A17 & A53 are efficient) ● DRAM latency (target high frequency and wide buses) ● CPU cache sizes and latencies ● Storage access time if any (building in /dev/shm is better) ● Compiler build options (save ~10% by playing with ct-ng)
  • 25. Suitable machines (...) ● The “BogoLocs” metric models expected performance numbers and avoids uselessly testing unsuitable hardware :
  • 26. Suitable machines (...) ● BogoLocs only uses ramlat output ● It is surprisingly accurate for a prediction ● It reveals that the most important speed-limiting factors are the following, from most to least important : – DRAM latency (time to load a cache line) – L3 cache latency (if any) – L2 cache latency (if any) – CPU frequency – Core count.
  • 27. Optimizing for performance Always ensure the build nodes are properly used : ● Top/vmstat should almost always show ~100% CPU ● Local machine should almost never stay around 100% CPU ● Network must not be saturated Real world example : ● Controller: Thinkpad T430s (Core i5 2*3.1 GHz). Avg 70% CPU ● Build nodes: 4*CS-008 (RK3288 4*1.8 GHz). 95% CPU ● Network: 180 Mbps avg, peaks at 450 Mbps. ● Theoretical limit for this controller: ~6 build nodes.
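The slide does not say how the limit of about 6 build nodes was obtained. One plausible reading, shown below purely as an assumption, is a linear extrapolation: if 4 nodes keep the controller at 70% CPU, the controller saturates at roughly 4 / 0.70 nodes. The same reasoning is applied to the network, assuming a hypothetical 1000 Mbps link on the controller.

```python
def max_nodes_for_cpu(nodes_in_use, controller_cpu_fraction):
    """Nodes the controller could feed before reaching 100% CPU,
    assuming its load grows linearly with the node count."""
    return nodes_in_use / controller_cpu_fraction


def max_nodes_for_network(nodes_in_use, avg_mbps, link_mbps):
    """Nodes the link could feed at the measured average traffic,
    assuming traffic grows linearly with the node count."""
    return link_mbps / (avg_mbps / nodes_in_use)


# Figures from the real-world example: 4 nodes, 70% CPU, 180 Mbps average.
cpu_limit = max_nodes_for_cpu(4, 0.70)
# The 1000 Mbps link speed is an assumption, not a figure from the slide.
net_limit = max_nodes_for_network(4, 180, 1000)

print(f"CPU-bound limit: {cpu_limit:.1f} nodes")
print(f"Network-bound limit: {net_limit:.1f} nodes")
```

Under these assumptions the controller CPU, not the average network load, is the first bottleneck, although the 450 Mbps peaks leave less headroom than the average suggests.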
  • 28. Optimizing for performance (...) PC-type machines are the fastest and can be tuned : ● Fill all memory channels with fastest DRAM available (20-50% gains possible) ● Enable Hyperthreading when available (~50% gain max) ● Overclocking is a possibility (~15% gain max) ● Add more RAM to improve file caching ● Less gain over 8 cores due to memory bottlenecks … but they (are expensive) and (are huge or noisy (or both)) and (draw a lot of power) but they are versatile. Hint: choose motherboards, CPUs and RAM designed for gamers.
  • 29. Optimizing for the number of nodes Use case : limited connectivity, limited build parallelism, boss willing to buy only one server, etc. => Need for highest performance per node Dual-socket high-frequency, 8-core x86 machines with all RAM slots populated have a huge memory bandwidth and are not very expensive. => often bigger than the typical PC and draw more power.
  • 30. Optimizing for hardware costs In 2016, a quad-core 1.2 GHz machine costs $8. Eg: NanoPI NEO : http://www.friendlyarm.com/index.php?route=product/product&path=69&product_id=132
  • 31. Optimizing for hardware costs (...) But does it really work ? ● Yes it does, but it will be around 16 times slower than a $800 PC. ● It may be even slower once it starts heating and throttles ● It is limited to 100 Mbps per node (ok for this one) … and there are hidden costs
  • 32. Hidden costs of very small hardware ● Shipping costs : ~$5 per machine ● Connectivity : a switch costs at least ~$2 per port ● Ethernet cable : ~$1 per cable ● A heatsink is needed to avoid throttling : $1 ● A micro-SD card is needed to boot : $3 ● A USB power supply is needed : $2 => Total: ~$22 per machine for 1/16 of a $800 PC = $350 per PC equivalent, not very interesting
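The total on this slide can be re-derived from the individual items. The sketch below adds the $8 board price from the earlier slide to the hidden costs listed here, then scales by the 16x slowdown relative to a $800 PC.

```python
# Per-node costs in dollars, taken from the slides.
costs = {
    "board": 8,
    "shipping": 5,
    "switch port": 2,
    "ethernet cable": 1,
    "heatsink": 1,
    "micro-SD card": 3,
    "USB power supply": 2,
}

per_node = sum(costs.values())

# Each small board is about 16 times slower than a $800 PC, so matching
# one PC takes about 16 boards.
slowdown = 16
pc_equivalent = per_node * slowdown

print(f"Cost per node: ${per_node}")
print(f"Cost per PC equivalent: ${pc_equivalent} (vs $800 for the PC)")
```

The exact product is $352, which the slide rounds to $350.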
  • 33. Pitfalls of very small hardware ● Large files will take 15-20 times as long as they take on a PC. A single large file can extend the build time. ● They quickly require enclosures or rack-mount options ● Heating, heating, heating... ● Stability issues : many vendors sell overclocked CPUs (e.g.: some OrangePI claim 1.6 GHz for this 1.2 GHz CPU) ● Sometimes the advertised frequency cannot be reached
  • 34. Mid-range hardware ● Often sold as “small PCs” ● $25-$60 price range ● CPUs up to quad-cortex A9 at 2 GHz ● Eg: NanoPI2-Fire and Odroid-C2:
  • 35. Mid-range hardware (...) Pros: ● Very interesting form factor, better cooling ● High performance density ● Some available in 8 A53 cores (eg: NanoPI-M3) Cons: ● Comparatively more expensive per board ● Some platforms heat a lot (eg: Allwinner chips) ● Often forced to run their own kernel
  • 36. High-range hardware ● Often sold as “set-top boxes” (STB) ● $50-$200 price range ● CPUs: 4-8 A17/A53 at 1.5-2.0 GHz ● Eg: MiQi, RKM-v5, CS-008 (clone) :
  • 37. High-range hardware (...) Pros: ● Very high performance density (up to ¼ of a $800 PC) ● Gigabit Ethernet connectivity (not always) ● On-board eMMC storage ● Often ultimately gets supported in mainline (e.g.: MiQi) Cons: ● Often sold with Android, porting can be a pain (but fun) ● Setting up a chroot in Android doesn’t always work due to SELinux
  • 38. High-range hardware (...) Warnings: ● Varying build quality (e.g.: CS-008 clones with fake RAM chips) ● Varying software quality for cheaper clones ● Kernel lying about the real frequency ● Advertised power draw is lower than reality, hence the need for a strong USB power supply. ● Thermal throttling may happen when heating up; heatsinks are always needed even if the device is sold without one
  • 39. Personal choice My personal preference goes to these ones : 1. MiQi ($39, 4xA17 1.8 GHz, 1GB 64-bit DDR3-1600) => ~25k LoC/s 2. NanoPI-M3 ($35, 8xA53 1.4 GHz, 1GB 32-bit DDR3-1600) => ~21k LoC/s 3. Odroid-C2 ($35, 4xA53 1.5 GHz, 2GB 32-bit DDR3-912) => ~16k LoC/s 4. RKMv5/T034/CS008 ($60, 4xA17 1.6GHz, 2GB 32-bit DDR2-800) => ~18k LoC/s
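The list above is ordered by personal preference. As a complementary view that is not part of the talk, the same prices and throughput figures can be turned into a lines-per-second-per-dollar ratio. The sketch counts board price only and ignores the hidden costs discussed earlier.

```python
# (name, price in dollars, approximate LoC/s), as quoted on the slide.
boards = [
    ("MiQi", 39, 25000),
    ("NanoPI-M3", 35, 21000),
    ("Odroid-C2", 35, 16000),
    ("RKMv5/T034/CS008", 60, 18000),
]


def loc_per_second_per_dollar(board):
    """Throughput bought per dollar of board price (derived metric)."""
    _name, price, loc_s = board
    return loc_s / price


ranked = sorted(boards, key=loc_per_second_per_dollar, reverse=True)
for board in ranked:
    print(f"{board[0]:<18} {loc_per_second_per_dollar(board):6.0f} LoC/s per $")
```

With these figures the ratio happens to produce the same order as the preference list.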
  • 40. Current Implementation 4 CS-008 at 1.8 GHz, 5-port GigE switch, 5x2A PSU. Total cost: $280 for 16 A17 cores and 8 GB RAM. Builds allmodconfig in 14 mn vs 43 mn on laptop.
  • 41. Future research ● Experiment with distcc’s “pump” mode ● Use haproxy in front of distccd : ● Avoid the switch for the portable version : – Try WiFi with LZO (unsuited without, 180-450 Mbps for 4 machines). – Try USB-NET on OTG ports with LZO over a USB hub (some USB power supplies include the hub).
  • 42. More or less important hints ● Don’t focus on kernel optimization/support as long as it is able to quickly and reliably execute gcc. ● Any lightweight distro works (even a chroot inside Android if SELinux is not too strict). Using “formilux” here (~10 MB). ● Check with “top” that there’s no CPU hog on the machine (lightdm, systemd) ● Use “ramlat” to measure the performance impact of graphics mode. Disable, reduce, rebuild without support, or kill Xorg. ● Unlocked Core i7 CPUs are amazing when stuffed with fast DRAM ● Xeons and Atoms are not really worth the price. ● Cortex A17 is very strong (half a Core i5 at 2/3 the frequency) but expensive ● Cortex A53 is 2/3 its performance and much cheaper but often provided with slow RAM and more cores ● Cooling: it’s up to you (passive, active, ...)
  • 43. More or less important hints (...) ● Configure LEDs as heartbeat to spot dead machines ● On a desk, daisy-chain clusters : 3 per 5-port switch, 6 per 8-port switch. ● Overclocking: sometimes OK, especially for build testing, or on “gamers” machines. ● Use power-meters to ensure the power usage remains below 10W per node for fanless operation
  • 45. Useful links ● Distcc: https://github.com/distcc ● Crosstool-ng: http://crosstool-ng.org/ ● Wiki page listing test reports of devices : http://wiki.ant-computing.com/Choosing_a_processor_for_a_build_farm ● mqmaker’s MiQi board : https://forum.mqmaker.com/t/announcing-miqi-a-credit-card-sized-computer/371 ● FriendlyARM’s NanoPI boards : http://www.friendlyarm.com/ ● Hardkernel’s Odroid-C2 board : http://www.hardkernel.com/main/products/prdt_info.php ● Rikomagic’s RKM-v5 device : http://www.rikomagic.co.uk/
  • 46. That's all folks! Thanks! Questions / Comments / Jokes ? Contact: Willy Tarreau <willy@haproxy.com> HAProxy is hiring talented people. Contact us!