This document discusses Facebook's migration to systemd at scale. It covers why Facebook migrated from SysV init scripts to systemd, how they performed the migration and rollout, their configuration management approach using Chef, examples of systemd features and custom Chef cookbooks developed for systemd, case studies of issues encountered and solutions, and how Facebook works with the systemd upstream project.
11. • Migration == porting services, testing, reprovisioning
• Consultant approach
• Documentation, tutorials and examples
• Remove roadblocks
• Advocacy and selling points
Migrating the fleet
Working with service owners
12. • Track latest upstream release, staying as close as possible
• Backport from Fedora rawhide
• Add back compat-libs (http://tinyurl.com/compat-libs)
• Package dependencies
• Rework RPM macros
Migrating the fleet
Building systemd
13. • Test in-place upgrade, reboot, reprovision
• Canaries and phase-based rollout via Chef
• Staging yum repository for provisioning testing
• About a week from 0% to 100%
Migrating the fleet
New release rollout
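A phase-based rollout needs each host to land in the same phase on every Chef run. The following is a minimal sketch of one way to do that, not Facebook's actual mechanism: it assumes hosts are bucketed by a stable hash of their hostname. The names `rollout_bucket`, `in_rollout?`, the `PHASES` percentages and the sample hostnames are all hypothetical.

```ruby
require 'zlib'

# Map a hostname to a stable bucket in 0..99. CRC32 is used only because it
# is deterministic across runs and hosts; any stable hash would do.
def rollout_bucket(hostname)
  Zlib.crc32(hostname) % 100
end

# A host is in the rollout when its bucket falls below the current percentage,
# so raising the percentage only ever adds hosts and never removes them.
def in_rollout?(hostname, percent)
  rollout_bucket(hostname) < percent
end

# Hypothetical phases: canaries first, then progressively larger shares.
PHASES = [1, 10, 50, 100].freeze

hosts = %w[web001 web002 db001 cache001]
PHASES.each do |percent|
  selected = hosts.select { |h| in_rollout?(h, percent) }
  puts "#{percent}%: #{selected.join(', ')}"
end
```

Because the bucket depends only on the hostname, a host that received the new release at 10% still has it at 50%, which keeps the rollout monotonic.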
15. Chef
• Automates server configuration
• Every box converges every 15m with a random splay
• Commit-to-production time is about 30m
• Attribute-driven API model
• Our model: http://tinyurl.com/facebook-chef
• Our cookbooks: http://tinyurl.com/facebook-
Configuration management framework
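The splay spreads Chef runs out so that the whole fleet does not converge at the same instant. As a rough sketch only: the slide says the splay is random, and the code below assumes a per-host delay derived from the hostname so that it is reproducible. `MAX_SPLAY`, `splay_for` and `next_run` are hypothetical names and values, not part of Chef.

```ruby
require 'zlib'

INTERVAL  = 15 * 60 # seconds between scheduled runs, per the slide
MAX_SPLAY = 5 * 60  # hypothetical upper bound on the splay, in seconds

# Derive a delay in 0...max_splay seconds from the hostname. Seeding the
# generator with a hash of the hostname makes the delay stable per host.
def splay_for(hostname, max_splay = MAX_SPLAY)
  Random.new(Zlib.crc32(hostname)).rand(max_splay)
end

# The actual start time is the scheduled tick plus the host's splay.
def next_run(scheduled, hostname)
  scheduled + splay_for(hostname)
end

tick = Time.at(INTERVAL).utc # first scheduled tick after the epoch
%w[web001 web002 db001].each do |host|
  puts "#{host}: splay #{splay_for(host)}s, runs at #{next_run(tick, host)}"
end
```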
16. package 'foo' do
  action :upgrade
end

template '/etc/sysconfig/foo' do
  source 'foo.erb'
  owner 'root'
  group 'root'
  mode '0644'
  notifies :restart, 'service[foo]'
end

service 'foo' do
  action [:enable, :start]
end
Chef
Example
[Unit]
Description=Foo service
[Service]
Environment=FOO_ARGS=--bar
EnvironmentFile=-/etc/sysconfig/foo
ExecStart=/usr/local/bin/foo $FOO_ARGS
StandardOutput=syslog
Restart=always
[Install]
WantedBy=multi-user.target
17. • ohai: loginctl (#766), hostnamectl and machine_id (#867)
• chef: masking support for service resource (#4307)
• chef: user services support (#4661)
• chef: systemd_unit resource (#4700, @nathwill)
Chef
systemd support
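A resource such as systemd_unit lets a unit be described as data rather than as a hand-written file. To illustrate the idea only, here is a small plain-Ruby sketch that renders a nested hash into unit-file text. It is not how Chef implements the resource; `render_unit` is a hypothetical helper, and the hash mirrors the Foo service unit from the earlier example slide.

```ruby
# Render a nested hash ({ section => { key => value } }) as unit-file text.
# Array values are emitted as repeated keys, which systemd accepts for
# list-type settings such as Environment=.
def render_unit(sections)
  sections.map do |section, settings|
    lines = settings.flat_map do |key, value|
      Array(value).map { |v| "#{key}=#{v}" }
    end
    (["[#{section}]"] + lines).join("\n")
  end.join("\n\n") + "\n"
end

foo_unit = {
  'Unit' => { 'Description' => 'Foo service' },
  'Service' => {
    'Environment' => 'FOO_ARGS=--bar',
    'EnvironmentFile' => '-/etc/sysconfig/foo',
    'ExecStart' => '/usr/local/bin/foo $FOO_ARGS',
    'StandardOutput' => 'syslog',
    'Restart' => 'always',
  },
  'Install' => { 'WantedBy' => 'multi-user.target' },
}

puts render_unit(foo_unit)
```

Keeping the unit as a hash means other cookbooks can override a single key, which fits the attribute-driven model described earlier.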
18. • Install, upgrade and configure systemd and its components
• fb_systemd_reload
• fb_systemd_run
• http://tinyurl.com/fb-systemd
Chef cookbooks
fb_systemd
19. • Simple API to set up systemd timers
• Straightforward replacement for cron
• Timespec generation functions
• http://tinyurl.com/fb-timers
Chef cookbooks
fb_timers
20. # Run a command every 15 minutes
node.default['fb_timers']['jobs']['my_custom_job'] = {
  'calendar' => FB::Systemd::Calendar.every(15).minutes,
  'command' => '/usr/local/bin/foobar.sh',
}

# More complex example with other fields you can set
node.default['fb_timers']['jobs']['more_complex_job'] = {
  'calendar' => FB::Systemd::Calendar.every.weekday,
  'command' => '/usr/local/bin/foobar.sh thing1 thing2',
  'timeout' => '1d',
  'accuracy' => '1h',
  'persistent' => true,
  'splay' => '0.5h',
}
Chef cookbooks
fb_timers example
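To make the job fields above more concrete, the sketch below shows how such a definition could translate into a systemd timer unit. It is an illustration under assumptions, not the fb_timers implementation: `every_minutes` and `timer_unit` are hypothetical helpers, and the mapping of 'accuracy', 'persistent' and 'splay' to the real systemd directives AccuracySec=, Persistent= and RandomizedDelaySec= is a guess at the intent of those fields.

```ruby
# Hypothetical helper: an OnCalendar= expression that fires every n minutes,
# starting at minute 0 of each hour.
def every_minutes(n)
  raise ArgumentError, 'n must be between 1 and 59' unless (1..59).cover?(n)
  "*:0/#{n}"
end

# Render a timer unit for a job. Optional settings are emitted only when
# given, so the unit otherwise falls back to systemd's defaults.
def timer_unit(name, calendar:, accuracy: nil, persistent: false, splay: nil)
  lines = ['[Unit]', "Description=Timer for #{name}", '',
           '[Timer]', "OnCalendar=#{calendar}"]
  lines << "AccuracySec=#{accuracy}" if accuracy
  lines << 'Persistent=true' if persistent
  lines << "RandomizedDelaySec=#{splay}" if splay
  lines += ['', '[Install]', 'WantedBy=timers.target']
  lines.join("\n") + "\n"
end

puts timer_unit('my_custom_job', calendar: every_minutes(15))
puts timer_unit('more_complex_job', calendar: every_minutes(30),
                accuracy: '1h', persistent: true, splay: '30min')
```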
22. Case studies
dbus cannot die
• dbus-daemon doesn’t support being restarted
• Hack: trick dbus.service into reloading systemd
ExecStartPost=-/usr/lib/systemd/scripts/dbus-restart-hack.sh
23. Case studies
logind scale issues in systemd < 229
• ~30s delay in SSH session establishment
• ~2000 sessions, coming and going every few seconds
• Workaround: lower dbus timeout for systemd-logind
• Fixed in systemd 230 (thank you!)
24. Case studies
$PATH defaults
• systemd doesn’t include /sbin and /bin in PATH by default
• Workaround
Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
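A quick way to see the effect of the workaround is to compare a service's PATH against the directories it expects. The sketch below is illustrative only: `REQUIRED_DIRS`, `missing_path_dirs` and `path_environment_line` are hypothetical names, and the PATH string passed in is a sample value chosen to lack /sbin and /bin.

```ruby
# Directories the workaround puts on PATH, in the same order.
REQUIRED_DIRS = %w[/usr/local/sbin /usr/local/bin /usr/sbin /usr/bin /sbin /bin].freeze

# Return the required directories that are absent from a PATH string.
def missing_path_dirs(path, required = REQUIRED_DIRS)
  required - path.split(':')
end

# Build the Environment= line for a unit file or drop-in.
def path_environment_line(dirs = REQUIRED_DIRS)
  "Environment=PATH=#{dirs.join(':')}"
end

sample_path = '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin'
puts missing_path_dirs(sample_path).inspect # prints ["/sbin", "/bin"]
puts path_environment_line
```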
25. Case studies
Filesystem namespaces
• Problem: making sure sshd and our container manager are started in the same filesystem namespace
• Proposed fix: UseRootFilesystemNamespace in #4145
26. Case studies
Logging
• Journald setup: 10MB in-memory logging
• StandardOutput=syslog + chef rules for logrotate
• Problem: SIGPIPE, journald slowing down
27. Case studies
Timers
• Problem: chef not running on some hosts
• Lingering processes keep the service active
• Never killed due to TimeoutStartSec=0
• Workaround: TimeoutStopSec=900
• Proper fix: run misbehaving binaries under fb_systemd_run
28. Case studies
TasksMax defaults
• Used to default to 512 – way too low for our use case
• New relative default is much better
• DefaultTasksMax=infinity in system.conf and user.conf
• UserTasksMax=infinity in logind.conf
29. Case studies
cgroup2
• New resource management framework in the kernel
• Controllers for memory / CPU / IO
• Natively supported by systemd
• @htejun PRs: http://tinyurl.com/systemd-cgroup2
• Plan: cap system.slice, use workload.slice for actual work
31. • Follow upstream development
• Keep local delta as small as possible
• Develop patches on master
• Use the tools available for testing (mkosi, nspawn)
• Send PRs and bug reports, and encourage others to do so
Working with upstream