Breaking the RPiDocker Challenge
Nicolas De Loof
Yoann Dubreuil
Damien Duportal
RPiDocker
Challenge
“Let’s break the challenge.”
Methodology
“Measure and automate all
the things.”
Damien Duportal
@DamienDuportal
1 - Measure and automate all the things
Measurement:
● sysstat for post-mortem analysis
● node-collector from Prometheus.io for “real time” monitoring
Provisioning:
● basic shell script published on Damien’s GitHub
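A minimal sketch of the kind of sysstat-based post-mortem recording described above. This is illustrative only, not the actual published script; the output path and sampling interval are assumptions.

```shell
# Illustrative sketch: record system activity during a benchmark run
# so it can be replayed after a crash (sysstat's sar tool).
sudo apt-get install -y sysstat

# Sample all counters every 10 seconds into a binary file, in the background
sar -o /var/log/container-bench.sar 10 >/dev/null 2>&1 &

# After the run (or the crash), replay memory and CPU usage:
sar -r -f /var/log/container-bench.sar   # memory
sar -u -f /var/log/container-bench.sar   # CPU
```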
Yoann Dubreuil
@YoannDubreuil
“Brainstorm for ideas,
then test everything
in arbitrary order”
Nicolas De Loof
@ndeloof
“... and have some beer”
Nicolas & Yoann: Where to start?
● first naïve try
○ only 38 containers :-(
○ but 70 on a RPi1 #WTF?
● figure out RPi2 limits without Docker
○ web server footprint
○ network namespace footprint
● get some help!
○ let’s collaborate with @DamienDuportal (aka the “French mafia”)
2 - Systemd tuning
The Docker daemon runs as root
… but still has limits set by systemd (hence the 38 containers...)
LimitSIGPENDING=infinity
LimitNOFILE=infinity
LimitAS=infinity
LimitNPROC=infinity
LimitSTACK=?
● Default stack size is 8 MB
○ each stack consumes 8 MB of process VM space (8 MB × 4 threads × 38 containers ≈ 1.2 GB)
=> tweak LimitSTACK to reach ~1,800–2,000 containers
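A sketch of the kind of systemd drop-in this tuning implies. The unit path is the standard one for the Docker service; the LimitSTACK value is illustrative only, since the slide leaves the exact number open (“LimitSTACK=?”).

```shell
# Sketch: override the Docker unit's resource limits via a drop-in
# (LimitSTACK value is illustrative, not the team's actual setting).
sudo mkdir -p /etc/systemd/system/docker.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/docker.service.d/limits.conf
[Service]
LimitSIGPENDING=infinity
LimitNOFILE=infinity
LimitAS=infinity
LimitNPROC=infinity
LimitSTACK=1M
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```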
3 - Lower the container footprint
● Tried a custom-compiled nginx for ARM with only a few extensions
~ 80 containers
● Per-container footprint still too big. Reading the Hypriot blog carefully: "rpi-nano-httpd" is written in ARM assembly and already highly optimized
➢ 1 page for code
➢ 1 page for data
➢ 1 page for stack
➢ 1 page for the vDSO
=> 16 KB memory footprint per process!
~150 containers
● launched 27,000 processes on a RPi2
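The arithmetic behind the 16 KB figure, assuming the 4 KiB default page size on the RPi2’s ARMv7 kernel:

```shell
# rpi-nano-httpd maps four pages: code, data, stack, vDSO.
PAGE_SIZE=4096   # default page size on ARMv7
PAGES=4
echo "$((PAGES * PAGE_SIZE)) bytes per process"   # 16384 bytes = 16 KB
```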
Network namespace RPi2 limit
● launched each web server in a dedicated network namespace
ip netns exec <NS_NUMBER> httpd
● RPi2 limit is ~1,100 network namespaces
=> To break the challenge, we needed to run without network isolation
--net=host
Reached ~1,000 containers
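A sketch of the two modes being compared. The namespace name and image name are illustrative, not from the slides:

```shell
# Isolated mode: one network namespace per web server
# (this is what hits the ~1,100 namespace ceiling).
sudo ip netns add web42
sudo ip netns exec web42 httpd

# Host mode: every container shares the host network stack,
# so the namespace ceiling no longer applies.
docker run -d --net=host rpi-nano-httpd
```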
4 - Speed up testing !
launching thousands of containers on a RPi2 takes hours, if not days!
● everything in memory with zram devices
○ swap (ratio 5:1)
○ /var/lib/docker on ext4 FS (ratio 10:1)
● swap as early as possible to keep free memory (vm.swappiness = 100)
● more CPU for Go with GOMAXPROCS=4
● reduce kernel perf event slowdown
○ kernel.perf_cpu_time_max_percent = 1
● USB external disk instead of the slow, I/O-limited SD card
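A hedged sketch of the zram setup described above. Device sizes are illustrative; the compression ratios are the ones observed on the slide, not guarantees:

```shell
# Sketch: keep everything in compressed RAM via zram devices.
sudo modprobe zram num_devices=2

# zram0 as swap (~5:1 compression observed on this workload)
echo 512M | sudo tee /sys/block/zram0/disksize
sudo mkswap /dev/zram0
sudo swapon /dev/zram0

# zram1 as an ext4 volume for /var/lib/docker (~10:1 observed)
echo 2G | sudo tee /sys/block/zram1/disksize
sudo mkfs.ext4 /dev/zram1
sudo mount /dev/zram1 /var/lib/docker

# Swap early rather than late, to keep free pages available
sudo sysctl vm.swappiness=100
```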
5 - Docker tuning
● Disable proxy process : no use here
● No logging : --log-driver=none
● Disable network / port forwarding
--bridge=none --iptables=false --ipv6=false
--ip-forward=false --ip-masq=false
--userland-proxy=false --sig-proxy=false
● reduce Golang memory consumption
○ launched docker with GODEBUG=gctrace=1 GOGC=1
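Putting the flags above together, the daemon launch would look roughly like this. GODEBUG and GOGC are environment variables read by the Go runtime; note that --sig-proxy is a `docker run` flag rather than a daemon flag, so it is left out of this sketch:

```shell
# Sketch: start the Docker daemon with networking, logging and the
# userland proxy disabled, and an aggressive Go garbage collector.
GODEBUG=gctrace=1 GOGC=1 docker daemon \
  --log-driver=none \
  --bridge=none --iptables=false --ipv6=false \
  --ip-forward=false --ip-masq=false \
  --userland-proxy=false
```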
6 - System tuning
● limit memory consumption
○ reduce GPU memory to 16 MB (can’t go lower)
○ blacklist non-required Linux modules
● remove some Linux limits
○ vm.overcommit_memory = 1
○ kernel.pid_max = 32768
○ kernel.threads-max = 14812
● reduce thread stack size
○ smallest working thread stack size: 24kb
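A sketch of the corresponding sysctl settings and the reduced thread stack size, using the values listed above (`ulimit -s` takes KB, so 24 means the 24 KB minimum that still worked):

```shell
# Kernel limits from the slide: allow memory overcommit, and bound
# how many processes/threads can exist at once.
sudo sysctl vm.overcommit_memory=1
sudo sysctl kernel.pid_max=32768
sudo sysctl kernel.threads-max=14812

# Shrink the per-thread stack in the shell that launches the daemon
ulimit -s 24
```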
Did not work
● Btrfs
○ not working properly: strange web server 404 failures after ~20 successful launches
○ stuck with overlayfs instead
● LXC driver
○ way sloooooooower
○ 4 threads per container anyway
● Go 1.5
○ compiled Docker with Go 1.5 for “better GC”, had no significant impact
Challenge
Completed
● We started 2,499 containers!
● RAM on RPi2 was not exhausted but Docker daemon crashed
docker[307]: runtime: program exceeds 10000-thread limit
Why is there a limit?
4 threads per container
● 10,000 threads for a Go application => 2,500 containers max
Need to understand why Docker needs 4 threads per container
(hey, lots of Docker core contributors here, time to ask!)
Worked around this with runtime/debug.SetMaxThreads(12000)
● hack not eligible for the RPiDocker challenge, just to confirm the limit
● can run ~2,740 webserver containers before actual OOM
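The back-of-the-envelope arithmetic for the ceiling, using the Go runtime’s default thread cap:

```shell
# Go runtime's default thread cap vs. Docker's threads per container
THREAD_LIMIT=10000        # default runtime/debug.SetMaxThreads value
THREADS_PER_CONTAINER=4
echo "$((THREAD_LIMIT / THREADS_PER_CONTAINER)) containers max"   # 2500
```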
“Collaboration (and beer)
were the keys to breaking this
challenge!”
Thank you!
@ndeloof @YoannDubreuil @DamienDuportal
