PuppetConf 2016: Multi-Tenant Puppet at Scale – John Jawed, eBay, Inc.

Multi-tenant Puppet
Automation for everyone

JJ
John Jawed, github.com./johnj
Dogs, anything with an ocean

[Chart: growth over time; curve labels: Gap up, Gap up, Linear, Exponential]

Change
function of time
2014
118,000 hosts
13,000 environments
fewer puppetmasters
baremetal, VM, containers

Cha-cha-cha-changes
unavoidable
happen everywhere

Oops
changes do not always go according to plan
48 minutes

Goals
performance & scale
policy
seamless onboarding

Bottlenecks? Try giving up.
capacity, abilities
paradigms (epoll vs select)
insanity

Classification Catalog Reports/Facts
average puppet run 8 seconds

Classification
node_terminus = /enc_script.rb
320ms - loading gems, files, certs
only 100ms for API call to ENC
Optimize: ENC run time as close to 100ms as
possible

Classification
paradigm shift
from
exec /enc_script.rb fqdn
to
write fqdn to ENC workers

Classification
a little dash of bash
node_terminus = /enc_handler.sh
$ cat enc_handler.sh
...
echo $1 | nc -U /unix.sock
...

Classification
a little go go
William Kennedy’s workpool
(github.com./goinggo/workpool)
go server listening on /unix.sock
workpool routes requests to an idle
worker

Classification
exec/exit to listen/process
$ cat /enc_script.rb
…
while certname = $stdin.gets do
  enc(certname)
end
…
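
A minimal sketch of the listen/process loop, filling in the elided parts with assumptions. The `enc` body here is a hypothetical stand-in (a real one would make the roughly 100ms API call), and `serve` is an illustrative name. The hash keys follow Puppet's ENC YAML output (classes, environment, parameters). StringIO replaces stdin and stdout so the demo runs on its own.

```ruby
require 'yaml'
require 'stringio'

# Hypothetical classifier: a real ENC would query the classification API here.
def enc(certname)
  {
    'classes' => ['base'],
    'environment' => 'production',
    'parameters' => { 'certname' => certname }
  }
end

# Read one certname per line until EOF; write one YAML document per request.
# The process stays alive between requests, so gems, files, and certs load once.
def serve(input, output)
  while (line = input.gets)
    certname = line.strip
    next if certname.empty?   # ignore blank lines
    output.puts(enc(certname).to_yaml)
    output.flush
  end
end

# Demo with in-memory streams instead of $stdin/$stdout.
input = StringIO.new("web01.example.com\n\ndb01.example.com\n")
output = StringIO.new
serve(input, output)
docs = YAML.load_stream(output.string)
puts docs.length
```

In a real worker the call would be `serve($stdin, $stdout)`.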

Classification
PPM calls node_terminus
node_terminus writes request
to socket
go handles request, workpool
routes
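
The talk uses a Go server with the workpool package for this step. The sketch below shows the same flow in Ruby for consistency with the ENC script, not the author's implementation: a Unix socket listener hands each connection to an idle worker through a queue. `classify`, `start_pool`, and `round_trip` are hypothetical names, and the socket lives in a temporary directory rather than /unix.sock.

```ruby
require 'socket'
require 'tmpdir'

# Hypothetical stand-in for the real classification call.
def classify(certname)
  "classified:#{certname}"
end

# Start `size` workers; each blocks on the queue until a connection arrives.
def start_pool(queue, size)
  Array.new(size) do
    Thread.new do
      while (conn = queue.pop)            # nil is the shutdown signal
        certname = conn.gets.to_s.strip
        conn.puts(classify(certname))
        conn.close
      end
    end
  end
end

# Send one certname through a temporary socket and return the reply.
def round_trip(certname, pool_size: 4)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'enc.sock')
    queue = Queue.new
    workers = start_pool(queue, pool_size)
    server = UNIXServer.new(path)
    acceptor = Thread.new { loop { queue << server.accept } }

    # Client side: the equivalent of `echo fqdn | nc -U /unix.sock`.
    reply = UNIXSocket.open(path) do |sock|
      sock.puts(certname)
      sock.gets.to_s.strip
    end

    # Shut down: stop accepting, then release every worker.
    acceptor.kill
    acceptor.join
    server.close
    pool_size.times { queue << nil }
    workers.each(&:join)
    reply
  end
end

puts round_trip('web01.example.com')
```

The queue is what routes work to an idle worker: whichever thread is blocked in `pop` picks up the next connection.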

Classification
end result
gets close to 100ms goal – 110ms
CPU usage – no constant bootstrapping
frees up resources, puppet master process
at scale, 200ms per run adds up quickly (30 for
every 60 seconds of CPU time)
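
A rough worked calculation of how the per-run saving adds up. It assumes 320ms is the full ENC time before the change, 110ms is the time after, and each of the 118,000 hosts makes one ENC call per fleet-wide pass. The function name is illustrative.

```ruby
# CPU seconds saved across the fleet for one pass (one ENC call per host).
def cpu_seconds_saved(hosts, before_ms, after_ms)
  hosts * (before_ms - after_ms) / 1000.0
end

saved = cpu_seconds_saved(118_000, 320, 110)
puts saved                        # 24780.0 seconds
puts((saved / 3600).round(1))     # 6.9 hours
```

Under these assumptions, every full pass over the fleet gives back close to seven CPU hours on the puppet masters.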

catalogs
Catalog compilation – low hanging fruit, difficult
Catalog
source: http://www.isrubyfastyet.com

agents
everything is SSL, that is good
everything is SSL, that is expensive
use yum.puppetlabs.com. or apt.puppetlabs.com.
to make sure you run 3.7+
runtime savings: 40%
Catalog

post run woes
after agent runs, the real fun begins
puppetmaster and agent both wait for
report processors to finish
slow report collection will cause your
infrastructure to fall over – some just avoid it
Reports/Facts

foreman
foreman report/fact processing – need to spread
read I/O
fact processing is read heavy, reports are write
heavy
ruby activerecord: makara
postgresql: local read slaves, pg_shard
Reports/Facts

reports
4k run reports per minute
using pg_shard:
psql> SELECT master_create_distributed_table(table_name := 'reports',
        partition_column := 'report_id');
psql> SELECT master_create_worker_shards(table_name := 'reports',
        shard_count := 365);
Reports/Facts
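
To illustrate why hash partitioning on report_id spreads write load, here is a small sketch that assigns ids to 365 shards. It is an illustration only: CRC32 is an assumed stand-in hash, not the function pg_shard uses, and `shard_for` is a hypothetical helper.

```ruby
require 'zlib'

SHARD_COUNT = 365  # matches shard_count in the statements above

# Illustrative only: CRC32 stands in for the database's own hash function.
def shard_for(report_id, shard_count = SHARD_COUNT)
  Zlib.crc32(report_id.to_s) % shard_count
end

# Distribute a batch of sequential ids and look at the spread.
counts = Hash.new(0)
(1..36_500).each { |id| counts[shard_for(id)] += 1 }
puts counts.size                   # number of shards that received rows
puts counts.values.minmax.inspect  # smallest and largest shard
```

Sequential ids land on different shards, so consecutive report inserts do not all hit the same worker.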

facts
most of the workload is read I/O, kept local
facts updated immediately after puppet runs
Master DB loadavg 2
Reports/Facts

Classification Catalog Reports/Facts
average puppet run 2 seconds

runinterval is not your friend

pvc
Open source, github.com./johnj/pvc
Basis of orchestration in 2014
pvc

pvc.conf
pvc
host_endpoint=your.pvcbackend.com./host

simple is hard
“Simple can be harder than complex: You have
to work hard to get your thinking clean to make
it simple. But it’s worth it in the end because
once you get there, you can move mountains.”
- Steve Jobs

Host events
most systems have audit frameworks
files (inotify)
processes (audit)
network
puppet needs to react to these events

osquery
services, files, and any resource that can be
tracked as a host event
event information can also be recorded (doorman,
zentral, etc)
event info is stored in tables (sqlite)

file monitoring
{
  "file_paths": {
    "homes": [
      "/root/.ssh/%%",
      "/home/%/.ssh/%%"
    ],
    "binaries": [
      "/usr/bin/%%",
      "/sbin/%%"
    ],
    "etc": [
      "/etc/%%"
    ],
    "tmp": [
      "/tmp/%%"
    ]
  }
}
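
In osquery file paths, % matches one path level and %% matches recursively. The sketch below approximates those semantics with regular expressions to show which category a changed path falls into. It is not osquery's matcher, and `pattern_to_regexp` and `categories_for` are hypothetical helpers.

```ruby
# The categories from the configuration above.
FILE_PATHS = {
  'homes'    => ['/root/.ssh/%%', '/home/%/.ssh/%%'],
  'binaries' => ['/usr/bin/%%', '/sbin/%%'],
  'etc'      => ['/etc/%%'],
  'tmp'      => ['/tmp/%%']
}.freeze

# Approximation: %% crosses directory boundaries, % stays within one level.
def pattern_to_regexp(pattern)
  parts = pattern.split(/(%%|%)/).map do |token|
    case token
    when '%%' then '.*'
    when '%'  then '[^/]*'
    else Regexp.escape(token)
    end
  end
  Regexp.new("\\A#{parts.join}\\z")
end

# Return the names of all categories with a pattern that matches the path.
def categories_for(path, file_paths = FILE_PATHS)
  file_paths.select do |_name, patterns|
    patterns.any? { |p| pattern_to_regexp(p).match?(path) }
  end.keys
end

puts categories_for('/home/alice/.ssh/authorized_keys').inspect
puts categories_for('/etc/ssh/sshd_config').inspect
puts categories_for('/home/alice/notes.txt').inspect
```

A change under a user's .ssh directory maps to "homes", which is the kind of host event that could trigger a puppet run.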

Infrastructure events
code releases, package upgrades,
access changes
puppet needs to be told to run when these
events occur

pvc and foreman
foreman’s puppetrun API to set flag
pvc queries foreman to trigger run
logical separation with host groups

runinterval is an afterthought
puppet runs instantly when it needs to
runinterval can be 3 minutes or 3 hours
frees up puppet masters, allows more resources
for other things
your infrastructure is still kept honest
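
A sketch of the resulting run decision: an event triggers a run immediately, and runinterval only acts as a fallback. This is an illustration of the idea, not pvc's code. `should_run?` and its parameters are hypothetical.

```ruby
# Run when an event is pending; otherwise only when runinterval has elapsed.
# runinterval is in seconds; last_run and now are Time objects.
def should_run?(event_pending:, last_run:, now:, runinterval:)
  return true if event_pending
  (now - last_run) >= runinterval
end

now = Time.at(10_000)
three_hours = 3 * 60 * 60

# An event arrived one minute after the last run: run right away.
puts should_run?(event_pending: true, last_run: now - 60, now: now, runinterval: three_hours)
# No event and the interval has not elapsed: stay idle.
puts should_run?(event_pending: false, last_run: now - 60, now: now, runinterval: three_hours)
```

With events driving most runs, a long runinterval still catches drift without loading the puppet masters.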

I pummel people with questions, because I need to know
what they're thinking, what they're trying to achieve, what
they believe the final outcome is going to be.
Tim Gunn
