Scaling Puppet Deployments
Matthew Mosesohn, Senior Deployment Engineer
© MIRANTIS 2013
Matthew Mosesohn - Configuration Management at Large Companies

Right from the PuppetConf, which gathered a lot of engineers at San Francisco, Matt will pass the experience of configuration management at big companies. Of course, with his own opinion and criticism, which you are welcome to discuss.

Transcript of "Matthew Mosesohn - Configuration Management at Large Companies"

  1. Scaling Puppet Deployments
     Matthew Mosesohn, Senior Deployment Engineer, Mirantis
  2. Configure by hand
     ● Insert media into the system
     ● Install OS
     ● Install software
     ● Configure software
     ● Verify
     ● Done?
  3. Automate
     ● PXE installation
       – Imaging
       – Cobbler
       – Foreman
       – Razor
     ● Configuration
       – Puppet
       – Chef
       – Salt
       – Ansible
  4. Puppet
     ● Powerful tool written in Ruby
     ● Extensible
     ● Built-in syntax checking
     ● Large community
     ● Used by many major companies, including:
       – Google
       – Cisco
       – PayPal
       – VMware
  5. Our purpose
     ● FUEL is a tool designed to deploy OpenStack
     ● FUEL consists of:
       – Astute: orchestration library built on MCollective
       – Library: Puppet manifests
       – Web: Python web app to deliver a rich user experience
       – Cobbler: provisioning of bare metal
       – Bootstrap: lightweight install environment for node discovery
  6. Tiny example
     ● 1 master Cobbler and Puppet server
     ● 2-node OpenStack cluster
     ● OS deployment: 5 minutes
     ● Puppet configuration: 15 minutes each
     ● Total time: ~40 minutes
  7. Typical example
     ● 1 master Cobbler and Puppet server
     ● 10-node OpenStack cluster
     ● OS deployment: 30 minutes total
     ● Puppet configuration: 15 minutes each
     ● Total time: ~2 hr 45 min
  8. Stretching the limits
     ● 1 master Cobbler and Puppet server
     ● 100-node OpenStack cluster
     ● OS deployment: ?? minutes total
     ● Puppet configuration: 15 minutes each
     ● Total time: maybe 24 hours?
  9. How to get to 1,000?
     ● Physical limitations of disks
     ● Physical limitations of the network
     ● Puppet limitations
     ● Cobbler limitations
     ● Messaging/orchestration limitations
     ● Durability/patience of client applications
  10. Approach: Scale the server!
      ● Pure speed; don't care about anything else
      ● Buy an expensive system with 2 SSDs in RAID-0, 12 cores, 256 GB of memory, and bonded NICs
      ● Peak I/O: ~800 MB/s
  11. How crowded is your network segment?
      ● More than 500 nodes on one network is bad
      ● Broadcast traffic will hinder normal traffic
      ● One lost packet means TFTP must fail and start over
      ● Make a second network and set up a DHCP relay
      ● Update your PXE server's DHCP configuration
  12. err: Could not retrieve catalog from remote server: Connection refused - connect(2)
  13. Puppet load
      ● Catalog compile time: 12 s per node
      ● Serve files: 12 MB per host
      ● Receive and store a 500 KB report per host in YAML format
      ● Store in PuppetDB
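The per-node figures above compound quickly at scale. A rough back-of-envelope sketch of aggregate master load for a deployment run; the 1,000-node count and the number of concurrent compile threads are illustrative assumptions, not figures from the slides:

```python
# Back-of-envelope Puppet master load per deployment run, using the
# per-node figures from the slide: 12 s catalog compile, 12 MB of
# served files, 500 KB YAML report per host. Node count and compile
# concurrency are illustrative assumptions.
NODES = 1000
COMPILE_S = 12          # catalog compile time per node, seconds
FILES_MB = 12           # files served per host, MB
REPORT_KB = 500         # YAML report per host, KB
COMPILE_THREADS = 4     # assumed concurrent compiles on the master

compile_hours = NODES * COMPILE_S / COMPILE_THREADS / 3600
served_gb = NODES * FILES_MB / 1024
reports_mb = NODES * REPORT_KB / 1024

print(f"compile wall time: {compile_hours:.2f} h")   # ~0.83 h
print(f"files served:      {served_gb:.1f} GB")      # ~11.7 GB
print(f"reports stored:    {reports_mb:.0f} MB")     # ~488 MB
```

Even under this optimistic assumption, one run pushes roughly 12 GB of files through one server and ingests close to half a gigabyte of reports, which is why the connection-refused errors on the previous slide appear.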
  14. How to avoid failure
      ● IPMI control of all nodes (expensive)
      ● Orchestration that can reset a host if it gets "stuck" along the way
      ● Staggered approach to avoid overload on the master
  15. How the pros do it
      ● Large US bank
      ● 2 Puppet CA servers
      ● 3 Puppet catalog masters
      ● DNS round robin for catalog servers
      ● 2,000 hosts
      ● Must stagger initial deployments
  16. Conclusion
      ● Not fast enough
      ● Too much data
      ● Still a bottleneck
      ● Expensive hardware
  17. Approach: Ditch the Puppet master!
      ● Still need to provision a base OS
      ● Still need a package repository
      ● Still need to be fast
      ● Still need some "brain" to identify servers
  18. Speed up provisioning
      ● Install every nth server as a provisioning mirror, held entirely in RAM
      ● TFTP still must come from the master server, but 30 minutes of pain for bootstrap is okay
      ● HTTP for OS installation can be balanced via DNS round robin across the mirrors
      ● Provision the mirror hosts last
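The "every nth server" layout implies simple sizing arithmetic. A sketch, assuming 1,000 nodes, one mirror per 50 nodes, and 10 simultaneous installs per mirror; all three numbers are illustrative assumptions, chosen to line up with the 20-mirror / 200-concurrent figures on the "Doing the math" slide:

```python
# Sizing provisioning mirrors for a large cluster. Node count,
# mirror ratio, and per-mirror concurrency are illustrative
# assumptions, not figures from the talk.
NODES = 1000
NODES_PER_MIRROR = 50       # install every 50th server as a mirror
CONCURRENT_PER_MIRROR = 10  # simultaneous HTTP installs one mirror sustains

mirrors = NODES // NODES_PER_MIRROR
total_concurrency = mirrors * CONCURRENT_PER_MIRROR

print(f"mirrors needed:    {mirrors}")            # 20
print(f"parallel installs: {total_concurrency}")  # 200
```

The mirror ratio is the tuning knob: more mirrors cost more up-front setup time but raise the install concurrency for the rest of the cluster.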
  19. Package repository
      ● The YUM repository should be located close to the cluster
      ● Mirror via Cobbler/Foreman
      ● Or host it somewhere in your organization with fast disks
  20. External Node Classifiers
      An arbitrary script that tells nodes what resources to install. ENC providers include:
      – Puppet Dashboard
      – Foreman
      – Hiera
      – LDAP
      – Amazon CloudFormation
      – YAML file carried by pigeon
  21. External Node Classifiers
      ● What they can provide:
        – Puppet master hostname
        – Environment name (production, devel, stage)
        – Classes to use
        – Puppet facts needed for installation
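Since an ENC is just "an arbitrary script", a minimal one fits in a page. Puppet invokes the ENC with the node's certname as its first argument and expects a YAML document on stdout with `classes`, `environment`, and `parameters` keys. The hostname-based role mapping and the `openstack::*` class names below are illustrative assumptions, not part of any of the ENC products listed above:

```python
#!/usr/bin/env python3
# Minimal External Node Classifier sketch. Puppet calls this script
# with the node certname as argv[1] and reads YAML from stdout.
# The naming convention (controller-* / compute-*) is hypothetical.
import sys

def classify(certname):
    """Map a certname to a list of Puppet classes (assumed convention)."""
    if certname.startswith("controller"):
        return ["openstack::controller"]
    return ["openstack::compute"]

if __name__ == "__main__":
    # Default certname lets the script run standalone for testing.
    certname = sys.argv[1] if len(sys.argv) > 1 else "compute-001"
    print("---")
    print("environment: production")
    print("classes:")
    for cls in classify(certname):
        print("  %s:" % cls)   # class with no parameters
    print("parameters:")
    print("  deployment_id: 1")
```

On the master (or on each node when running masterless with a wrapper), the script is wired in via `node_terminus = exec` and `external_nodes = /path/to/enc.py` in puppet.conf.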
  22. Getting Puppet manifests to nodes
      ● How do you place manifests on a node?
      ● Without relying on one host, pick the most robust system available
  23. Getting Puppet manifests to nodes
      ● Plain Git
        – Version-controlled
        – Widely implemented
        – Simple to get started
        – Fits into Puppet's environment structure via branches
  24. Getting Puppet manifests to nodes
      ● Puppet Librarian
        – Created by Tim "Rodjek" Sharpe of GitHub
        – Flexible manifest sources
        – Can specify a Puppet "forge"
        – Can retrieve from Git repositories
        – Dependency handling
        – Version specification optional
        – Creates a local Git repository to track changes
  25. Getting Puppet manifests to nodes
      ● RPM format
        – Technique used by Sam Bashton
        – Versioned as well
        – As easy to deploy as any other package
        – Requires a clever build process
  26. Getting Puppet manifests to nodes
      ● RPM format magic
        – Jenkins job takes Git code with manifests
        – Runs puppet-lint on all Puppet code
        – Creates a tarball of Puppet manifests and Hiera data
        – Wraps it inside a package with a new version number
        – Pushes the ready package to a software repository
  27. Running local is better
      ● Deploying on great new hardware
      ● Faster catalog builds
      ● No waiting for manifests or uploading reports
      ● No timeouts or refused connections
  28. What about my precious logs?!
  29. Rsyslog
      ● Scaling rsyslog requires lots of disk, but the disks don't have to be fast
      ● Rsyslog can throttle clients effectively
      ● Clients can hold logs until the server is ready to receive
      ● Everybody wins
  30. Doing the math

      Stage               | Before                        | After
      --------------------|-------------------------------|------------------------------------------
      Bootstrap OS        | 10 min                        | 10 min (but that's okay)
      Base OS provision   | 8 hr (10 concurrent)          | 30 min to set up 20 mirrors;
                          |                               | 25-40 min to install (200 concurrent);
                          |                               | 30 min to install mirrors
      Puppet provisioning | 10 d 10 hr (15 min x 1,000    | 45 min for all 3 controllers, one at a
                          | hosts, one at a time)         | time; 20 min for compute nodes
      Totals              | 12 days                       | 2-3 hours
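The "Before" Puppet-provisioning figure in the table follows directly from the per-node time, and is easy to check:

```python
# Check the serial Puppet-provisioning figure from the table:
# 15 minutes per host, 1,000 hosts, one at a time.
MINUTES_PER_HOST = 15
HOSTS = 1000

total_min = MINUTES_PER_HOST * HOSTS          # 15,000 minutes
days, rem = divmod(total_min, 24 * 60)
hours = rem // 60

print(f"{days} days {hours} hours")  # 10 days 10 hours
```

That serial 10 d 10 hr term dominates the "Before" column, which is why moving Puppet runs local to each node, so they execute in parallel, collapses the total from days to hours.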
  31. References
      ● http://www.tomshardware.com/reviews/ssd-raid-benchmark,3485-3.html
      ● http://www.masterzen.fr/2012/01/08/benchmarking-puppet-stacks/
      ● http://theforeman.org/manuals/1.3/index.html#3.5.5FactsandtheENC
      ● https://github.com/rodjek/librarian-puppet
      ● http://www.slideshare.net/PuppetLabs/sam-bashton
  32. Ref commands

      puppet agent --{summarize,test,debug,evaltrace,noop} | perl -pe 's/^/localtime().": "/e'

      Time:
        Nova paste api ini: 0.02
        Package: 0.03
        Notify: 0.03
        Nova config: 0.10
        File: 0.40
        Exec: 0.56
        Service: 1.39
        Augeas: 1.56
        Total: 11.85
        Last run: 1379522172
        Config retrieval: 7.73