John Adams Puppet Camp 2010

  1. 1. Puppet at High Scale. John Adams, Twitter
  2. 2. First, the bad news! (*) (*) We’re on 0.25.4. Things may be different now.
  3. 3. Problems: Sort of idempotent. Ruby file transfer is inefficient. Minimize use of recursion (home dirs, etc.). A single run is non-deterministic. Order matters if order is specified. Not specifying dependency order creates out-of-order delivery.
  4. 4. Working Together: Puppet is great with a small team. Management is hard with a large number of admins. Unforeseen interactions between changes. No simple means of review.
  5. 5. Security: Anyone who can check into the tree can kill production with simple mistakes. SVN access is effectively root equivalent. Divergence from desired configuration through use of chattr +i puppetmanagedfile. You can’t chattr +i with broken fingers.
  6. 6. Puppet DSL: The Puppet DSL is not Ruby enough. Stated as a plus, but really a minus when most engineers expect Ruby. Incomplete conditionals in the DSL.
  7. 7. Cron: Removing the configuration for a cron job leaves the cron job behind. Need to specify ensure => absent. If you forget the command with absent, duplicate cron job entries can occur. The vestigial tail of these “ensure absent” lines ends up living in the config long after they are needed.
  8. 8. Cron + NTP: NTP synchronizes the system time. Cron granularity is one minute, so every job starts on the zero second. Performance regression if you make puppet install many jobs across different modules, all firing on the same zero second. Introduce random delays before jobs: sleep $(($RANDOM % 60)); do_something...
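
The random delay above can also be made repeatable per host, which keeps a job's start time stable between runs while still spreading the fleet across the minute. The following is a minimal sketch under that assumption; it swaps the slide's `$RANDOM` for a hash-derived delay, and the names `splay_seconds` and `cron_command` are hypothetical, not part of Puppet or of the talk's tooling.

```python
import hashlib


def splay_seconds(hostname: str, job_name: str, window: int = 60) -> int:
    """Return a stable delay in [0, window) derived from host and job name."""
    digest = hashlib.sha256(f"{hostname}:{job_name}".encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then fold into the window.
    return int.from_bytes(digest[:8], "big") % window


def cron_command(hostname: str, job_name: str, command: str, window: int = 60) -> str:
    """Prefix a cron command with this host's delay (hypothetical helper)."""
    delay = splay_seconds(hostname, job_name, window)
    return f"sleep {delay}; {command}"
```

The trade-off is that a hashed delay is predictable, which helps when debugging load spikes, whereas `$RANDOM` gives a fresh offset on every run.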
  9. 9. Test and “Canary”: No facility in puppet for testing. Monolithic design. Controlled deploys are preferable to “full change”. Use representative machines first. Push to cluster when everything works.
  10. 10. Machine Database: Node membership in classes, and the nodes themselves, are not well exposed in a puppet configuration. Once entered, parsing is the only option to retrieve the machine list and associated “roles” from the SVN tree. ldapnodes is a possible solution here.
  11. 11. Node Class Changes: Still an unsolved problem. Removing a class definition from a node leaves all of the configuration from the class behind. Have to re-kickstart the host to get to a base state.
  12. 12. Why Puppet?(*) (*) the good news!
  13. 13. Configuration Management: Our world is changing. The end of the “Systems Administrator”. The beginning of “DevOps”.
  14. 14. Configuration Management: Consistent edits. Trackable changes. Consistent ability to rebuild. Find variance.
  15. 15. DevOps: Stop wasting time. Start delivering great ops software. Stop administering individual machines.
  16. 16. DevOps: Puppet definitions are code. Incorporate cross-functional skills. Build a bridge between your developers and the ops team.
  17. 17. Let’s fix this.
  18. 18. Change Process (diagram labels): initial, commit, HEAD. Generate review. Ad-hoc tests.
  19. 19. Change Process (diagram labels): HEAD, test integrate, TEST. Test integration on ~10% of hosts. Watch for failures!
  20. 20. Change Process (diagram labels): HEAD, TEST, production integrate, Production. 100%. Final review.
  21. 21. Change Process (diagram labels): HEAD, cherry pick, TEST (bypass), Production. No review.
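
The "~10% of hosts" step in the change process needs some way to choose which machines move to TEST first. One possible approach, sketched here as an assumption rather than as the tooling the talk used, is to pick a stable slice of each role so that every kind of machine is represented. The function `pick_canaries` and the host-to-role mapping are hypothetical.

```python
import hashlib
from collections import defaultdict


def _stable_key(hostname: str) -> str:
    """Hash the hostname so the ordering is stable but not alphabetical."""
    return hashlib.sha256(hostname.encode("utf-8")).hexdigest()


def pick_canaries(host_roles: dict, percent: int = 10) -> list:
    """Pick roughly `percent` of the hosts in each role, at least one per role."""
    by_role = defaultdict(list)
    for host, role in host_roles.items():
        by_role[role].append(host)

    canaries = []
    for role in sorted(by_role):
        hosts = sorted(by_role[role], key=_stable_key)
        # Integer ceiling division avoids floating-point rounding surprises.
        count = max(1, (len(hosts) * percent + 99) // 100)
        canaries.extend(hosts[:count])
    return canaries
```

Because the selection depends only on hostnames, the same machines act as canaries on every deploy; rotating them would need an extra input such as a date or change number.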
  22. 22. Testing / Staging: A test infrastructure is needed to ensure that updates don’t kill production. People make mistakes. Treat the puppet config as if it were code.
  23. 23. Security: Restrict access to the SVN tree itself (through ACLs). Create a concept of an OWNER for each module and manifest subdir; restrict access. Enforce ownership during SVN checkin. Enforce a proper review process.
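
Enforcing ownership at checkin comes down to comparing the commit author against an owner list for each changed path. The sketch below shows only that comparison, under stated assumptions: the `OWNERS` table, the user names, and both function names are hypothetical. In a real Subversion pre-commit hook the author and changed paths would typically be read with `svnlook author` and `svnlook changed`.

```python
# Hypothetical owner table: path prefix -> users allowed to commit under it.
OWNERS = {
    "modules/bind": {"alice"},
    "modules/apache": {"bob", "carol"},
    "manifests": {"alice", "bob"},
}


def owning_prefix(path: str, owners: dict):
    """Return the longest owned prefix that contains `path`, or None."""
    best = None
    for prefix in owners:
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return best


def rejected_paths(author: str, changed_paths: list, owners: dict) -> list:
    """List the changed paths that `author` does not own (unowned paths are rejected)."""
    rejected = []
    for path in changed_paths:
        prefix = owning_prefix(path, owners)
        if prefix is None or author not in owners[prefix]:
            rejected.append(path)
    return rejected
```

A hook built on this would refuse the commit whenever `rejected_paths` returns a non-empty list. Rejecting unowned paths by default is a design choice in this sketch; a looser policy could allow them.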
  24. 24. SVN can be smarter: Post-commit checks. BIND (verify zones, DNS, SOA++): a mistake here is a full site outage. Verify puppet config. Create Reviewboard entries.
  25. 25. puppet-util: A script on each box to select the current branch. Set the branch (by modifying facter fact + config). Show current branch. Enable or disable puppetd in emergencies or for ad-hoc testing.
  26. 26. (slide with no recoverable text)
  27. 27. Reviewboard: Visualize and centralize change. Keep teams informed. Prevent unknown interactions.
  28. 28. User Security: Distrust puppet for creating user accounts. Build them from an LDAP infrastructure. Base package connects to LDAP and creates users based on group and machine role. You still have to deal with RPMs creating system users.
  29. 29. Machine Database: No machine database in puppet. We used Django and MySQL, but you could use LDAP. Role membership is imported into the DB by parsing existing puppet definitions and special variables in the node stanza.
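
Importing role membership by parsing node stanzas might look something like the sketch below. It is an illustration under several assumptions, not the importer the talk describes: it handles only flat stanzas without nested braces, treats a `$role` variable as the "special variable" (that name is hypothetical), and returns a plain dictionary instead of writing to a database.

```python
import re

# Matches "node 'a', 'b' [inherits x] { ... }" with no nested braces in the body.
NODE_RE = re.compile(r"^\s*node\s+([^{]+?)\s*\{([^}]*)\}", re.MULTILINE)
INCLUDE_RE = re.compile(r"^\s*include\s+([\w:]+(?:\s*,\s*[\w:]+)*)", re.MULTILINE)
# "$role" is a hypothetical variable name used for illustration.
ROLE_RE = re.compile(r"^\s*\$role\s*=\s*['\"]([^'\"]+)['\"]", re.MULTILINE)


def parse_nodes(manifest: str) -> dict:
    """Map each node name to its role and included classes."""
    nodes = {}
    for match in NODE_RE.finditer(manifest):
        header, body = match.group(1), match.group(2)
        # Drop an optional "inherits <parent>" clause from the header.
        names_part = re.split(r"\s+inherits\s+", header)[0]
        names = [n.strip().strip("'\"") for n in names_part.split(",")]

        classes = []
        for inc in INCLUDE_RE.finditer(body):
            classes.extend(c.strip() for c in inc.group(1).split(","))

        role_match = ROLE_RE.search(body)
        role = role_match.group(1) if role_match else None

        for name in names:
            if name:
                nodes[name] = {"role": role, "classes": list(classes)}
    return nodes
```

A regex pass like this breaks on conditionals or nested resources inside a node stanza, which is part of why an external node source such as LDAP is attractive.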
  30. 30. Ad hoc scripting: No facility in puppet for immediate execution of a command on many hosts. SSH in a loop is not a solution at scale. Threaded SSH system through our own tool. Uses Paramiko, an open source Python library. See also: func.
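
The idea of a threaded fan-out, as opposed to SSH in a loop, can be sketched with the standard library alone. This is not Twitter's tool: to stay self-contained it shells out to the `ssh` command instead of using Paramiko, and `ssh_runner` and `run_everywhere` are hypothetical names. The runner is injectable so that a Paramiko-based function could be substituted.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed


def ssh_runner(host: str, command: str, timeout: int = 30):
    """Run `command` on `host` through the ssh CLI; returns (exit code, stdout)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def run_everywhere(hosts, command, runner=ssh_runner, max_workers=32):
    """Run `command` on all hosts concurrently and collect per-host results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(runner, host, command): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = future.result()
            except Exception as exc:
                # One unreachable host should not abort the whole run.
                results[host] = (None, f"error: {exc}")
    return results
```

For example, `run_everywhere(["web01", "web02"], "uptime")` would return a dictionary keyed by host. Capping `max_workers` bounds the number of simultaneous connections from the admin machine.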
  31. 31. Multiple Instances: Three complete puppetmasterd instances on each puppet master machine, on different ports, pointed to different SVN branches: HEAD, TEST, PRODUCTION.
  32. 32. Handling many clients: Distribute the SVN tree (eliminate the SPOF). Use more puppet servers. Rsync manifests, then run puppet. Selectively update hosts (func).
  33. 33. Puppet Web Server: Don’t run WEBrick (script/server); it is too slow. Unicorn (best choice). Passenger (mod_rails). Mongrel?
  34. 34. Distributed Puppet (diagram labels): SVN; PM ×3; host ×15.
  35. 35. Distributed Puppet: Too many clients eventually overwhelm the master. You must deploy more hosts. Distribute cron jobs. Randomize start times. Distribute the master itself.
  36. 36. Questions?
