Multi-tenant Puppet
Automation for everyone
JJ
John Jawed, github.com/johnj
Dogs, anything with an ocean
3
Gap up
Linear
Exponential
Change
function of time
2014
118,000 hosts
13,000 environments
fewer puppetmasters
baremetal, VM, containers
Cha-cha-cha-changes
unavoidable
happen everywhere
Oops
changes do not always go according to plan
48 minutes
Goals
performance & scale
policy
seamless onboarding
Bottlenecks? Try giving up.
capacity, abilities
paradigms (epoll vs select)
insanity
Classification Catalog Reports/Facts
average puppet run 8 seconds
Classification
node_terminus = /enc_script.rb
320ms - loading gems, files, certs
only 100ms for API call to ENC
Optimize: ENC run time as close to 100ms as
possible
Classification
paradigm shift
from
exec /enc_script.rb fqdn
to
write fqdn to ENC workers
Classification
a little dash of bash
node_terminus = /enc_handler.sh
$ cat enc_handler.sh
...
echo "$1" | nc -U /unix.sock
...
Classification
a little go go
William Kennedy’s workpool
(github.com/goinggo/workpool)
go server listening on /unix.sock
workpool routes requests to an idle
worker
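The listen-and-route pattern above can be sketched in a few lines. The deck's server is written in Go with the workpool package; the version below is only a Ruby illustration of the same idea (one listener on a UNIX socket, a fixed pool of long-lived workers, each request handed to whichever worker is idle). The names `classify`, `start_enc_server` and `query_enc`, and the YAML body returned, are hypothetical and not taken from the talk.

```ruby
require "socket"

# Hypothetical stand-in for the real ENC lookup (the talk calls a remote API here).
def classify(fqdn)
  "---\nenvironment: production\nparameters:\n  fqdn: #{fqdn}\n"
end

# Bind a UNIX socket at `path` and start `workers` long-lived worker threads.
# Accepted connections go onto a queue; the next idle worker pops one,
# reads a single fqdn line, writes the classification, and closes.
def start_enc_server(path, workers: 4)
  server = UNIXServer.new(path)
  jobs = Queue.new

  workers.times do
    Thread.new do
      while (conn = jobs.pop)
        begin
          fqdn = conn.gets.to_s.chomp
          conn.write(classify(fqdn))
        rescue IOError, SystemCallError
          # Client went away mid-request; nothing to answer.
        ensure
          conn.close
        end
      end
    end
  end

  Thread.new do
    begin
      loop { jobs << server.accept }
    rescue IOError, SystemCallError
      # Listener was closed; stop accepting.
    end
  end

  server
end

# Client side, equivalent in spirit to `echo fqdn | nc -U /unix.sock`.
def query_enc(path, fqdn)
  UNIXSocket.open(path) do |sock|
    sock.puts(fqdn)
    sock.close_write
    sock.read
  end
end
```

The workers are created once and reused, so no request pays the cost of starting a new interpreter, which is the point of moving from exec/exit to listen/process.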
Classification
exec/exit to listen/process
$ cat /enc_script.rb
…
while certname = $stdin.gets do
  enc(certname.chomp)
end
…
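A slightly fuller sketch of that loop, written so it can be exercised without a real socket: the worker reads one certname per line from any IO object and writes one YAML document per request. The `enc` body here is a hypothetical placeholder (the slide elides the real one), and `serve` is a name introduced only for this illustration.

```ruby
require "yaml"

# Hypothetical classifier: the real script would call the ENC API here.
def enc(certname)
  {
    "classes" => {},
    "environment" => "production",
    "parameters" => { "certname" => certname }
  }
end

# Long-lived request loop: gems, files and certs are loaded once before
# this is called, then each incoming line costs only the lookup itself.
def serve(input, output)
  while (line = input.gets)
    certname = line.chomp
    next if certname.empty?          # ignore blank lines
    output.puts(enc(certname).to_yaml)
    output.flush                     # answer each request immediately
  end
end
```

In a real worker this would be started as `serve($stdin, $stdout)`; passing the streams in makes the loop easy to test with in-memory IO.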
Classification
PPM calls node_terminus
node_terminus writes request
to socket
go handles request, workpool
routes
Classification
end result
gets close to 100ms goal – 110ms
CPU usage – no constant bootstrapping
frees up resources, puppet master process
at scale, 200ms per run adds up quickly (30 for
every 60 seconds of CPU time)
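To get a feel for how the per-run saving adds up, here is a rough back-of-the-envelope calculation. The host count and the 200ms figure come from the slides; the 1800-second runinterval is Puppet's default and is an assumption here, since the deck does not state the interval in use.

```ruby
# CPU-seconds saved on the masters per hour, given a fleet size,
# a run interval, and the time saved on each run.
def cpu_seconds_saved_per_hour(hosts:, runinterval_s:, saved_s_per_run:)
  runs_per_hour = hosts * (3600.0 / runinterval_s)
  runs_per_hour * saved_s_per_run
end

saved = cpu_seconds_saved_per_hour(
  hosts: 118_000,        # from the slides
  runinterval_s: 1800,   # assumed: Puppet's default of 30 minutes
  saved_s_per_run: 0.2   # from the slides
)
puts format("%.0f CPU-seconds saved per hour (about %.1f cores kept busy)", saved, saved / 3600.0)
```

Under those assumptions the saving is roughly 47,200 CPU-seconds every hour, or about 13 cores that would otherwise do nothing but bootstrap the ENC script.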
catalogs
Catalog compilation – low hanging fruit, difficult
Catalog
source: http://www.isrubyfastyet.com
agents
everything is SSL, that is good
everything is SSL, that is expensive
use yum.puppetlabs.com or apt.puppetlabs.com
to make sure you run 3.7+
runtime savings: 40%
Catalog
post run woes
after agent runs, the real fun begins
puppetmaster and agent both wait for
report processors to finish
slow report collection will cause your
infrastructure to fall over – some just avoid it
Reports/Facts
foreman
foreman report/fact processing – need to spread
read I/O
fact processing is read heavy, reports are write
heavy
ruby activerecord: makara
postgresql: local read slaves, pg_shard
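The read/write split can be illustrated with a toy router. This is not how makara is implemented or configured; it is a minimal sketch of the idea it provides for ActiveRecord: statements that only read go to a local replica, everything else goes to the primary. The class name and the simple SQL check are assumptions made for the example.

```ruby
# Toy read/write splitter: SELECTs rotate across replicas,
# all other statements (and SELECT ... FOR UPDATE) go to the primary.
# Not thread-safe; a real proxy would guard the rotation counter.
class ReadWriteRouter
  def initialize(primary:, replicas:)
    @primary = primary
    @replicas = replicas
    @next = 0
  end

  def connection_for(sql)
    return @primary if @replicas.empty? || !read_only?(sql)

    replica = @replicas[@next % @replicas.length]
    @next += 1
    replica
  end

  private

  def read_only?(sql)
    sql.lstrip.match?(/\Aselect\b/i) && !sql.match?(/\bfor\s+update\b/i)
  end
end
```

With fact processing being read heavy and reports write heavy, a split like this keeps most fact queries on local replicas while report inserts still land on the primary.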
Reports/Facts
reports
4k run reports per minute
using pg_shard:
psql> SELECT master_create_distributed_table(table_name := 'reports',
partition_column := 'report_id');
psql> SELECT master_create_worker_shards(table_name := 'reports',
shard_count := 365);
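pg_shard distributes rows by hashing the partition column, so each report_id lands on exactly one of the 365 shards. The sketch below shows only that general idea; it uses CRC32 and a plain modulo as stand-ins, which is not pg_shard's actual hash function or shard-range layout, and `shard_for` is a name made up for the example.

```ruby
require "zlib"

SHARD_COUNT = 365 # matches shard_count in the slide

# Map a report_id to a shard index in 0...shard_count.
# CRC32 is a stand-in hash chosen for illustration only.
def shard_for(report_id, shard_count = SHARD_COUNT)
  Zlib.crc32(report_id.to_s) % shard_count
end
```

Because the mapping depends only on report_id, writes for different reports spread across shards, while every read or write for one report always goes to the same place.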
Reports/Facts
facts
most of the workload is read I/O, kept local
facts updated immediately after puppet runs
Master DB loadavg 2
Reports/Facts
Classification Catalog Reports/Facts
average puppet run 2 seconds
runinterval is not your friend
pvc
Open source, github.com/johnj/pvc
Basis of orchestration in 2014
pvc
pvc.conf
pvc
host_endpoint=your.pvcbackend.com/host
simple is hard
“Simple can be harder than complex: You have
to work hard to get your thinking clean to make
it simple. But it’s worth it in the end because
once you get there, you can move mountains.”
- Steve Jobs
Host Infrastructure
Host events
most systems have audit frameworks
files (inotify)
processes (audit)
network
puppet needs to react to these events
osquery
osquery
services, files, and any resource that can be
tracked as a host event
event information can also be recorded (doorman,
zentral, etc)
event info is stored in tables (sqlite)
file monitoring
{
  "file_paths": {
    "homes": [
      "/root/.ssh/%%",
      "/home/%/.ssh/%%"
    ],
    "binaries": [
      "/usr/bin/%%",
      "/sbin/%%"
    ],
    "etc": [
      "/etc/%%"
    ],
    "tmp": [
      "/tmp/%%"
    ]
  }
}
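To make the wildcards in that config concrete: in osquery's file path syntax a single % matches within one directory level and %% matches recursively. The sketch below approximates those rules with regular expressions to show which category a changed path would fall under. It is an illustration only, not osquery's own matcher, and the helper names are hypothetical.

```ruby
# The categories from the slide's file_paths config.
FILE_PATHS = {
  "homes"    => ["/root/.ssh/%%", "/home/%/.ssh/%%"],
  "binaries" => ["/usr/bin/%%", "/sbin/%%"],
  "etc"      => ["/etc/%%"],
  "tmp"      => ["/tmp/%%"]
}.freeze

# Approximate osquery wildcards: %% spans directory levels, % stays within one.
def pattern_to_regex(pattern)
  body = pattern.split(/(%%|%)/).map do |part|
    case part
    when "%%" then ".*"
    when "%"  then "[^/]*"
    else Regexp.escape(part)
    end
  end.join
  Regexp.new("\\A#{body}\\z")
end

# Return the first category whose patterns match the path, or nil.
def category_for(path, file_paths = FILE_PATHS)
  file_paths.each do |category, patterns|
    return category if patterns.any? { |p| pattern_to_regex(p).match?(path) }
  end
  nil
end
```

A change to /home/alice/.ssh/authorized_keys would be reported under "homes", which is the kind of host event a Puppet run could then react to.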
Infrastructure events
code releases, package upgrades,
access changes
puppet needs to be told to run when these
events occur
pvc and foreman
foreman’s puppetrun API to set flag
pvc queries foreman to trigger run
logical separation with host groups
runinterval is an afterthought
puppet runs instantly when it needs to
runinterval can be 3 minutes or 3 hours
frees up puppet masters, allows more resources
for other things
your infrastructure is still kept honest
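The resulting policy can be summarised as a small decision function: run immediately when an event has flagged the host, and otherwise only when the (possibly very long) runinterval has elapsed. This is a hypothetical sketch of that logic, not pvc's actual implementation; the function name and parameters are invented for the example.

```ruby
# Decide whether the agent should run now.
#   flagged:     true when a host or infrastructure event asked for a run
#   last_run:    Time of the previous run
#   now:         current Time
#   runinterval: fallback interval in seconds (3 minutes or 3 hours, per the slide)
def run_puppet?(flagged:, last_run:, now:, runinterval:)
  flagged || (now - last_run) >= runinterval
end
```

With event-driven runs doing most of the work, the interval check only serves as the safety net that keeps the infrastructure honest.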
git
I pummel people with questions, because I need to know
what they're thinking, what they're trying to achieve, what
they believe the final outcome is going to be.
Tim Gunn
PuppetConf 2016: Multi-Tenant Puppet at Scale – John Jawed, eBay, Inc.
