John Adams Puppet Camp 2010

  • John Adams Puppet Camp 2010

    1. Puppet at High Scale
        • John Adams, Twitter
    2. First, the bad news! (*)
        • (*) We’re on 0.25.4. Things may be different now.
    3. Problems
        • Sort of idempotent
        • Ruby file transfer is inefficient
        • Minimize use of recursion (home dirs, etc.)
        • Single run is non-deterministic
        • Order matters if order is specified
        • Not specifying dependency order creates out of order delivery
    4. Working Together
        • Puppet is great with a small team
        • Management is hard with a large number of admins
        • Unforeseen interactions between changes
        • No simple means of review
    5. Security
        • Anyone who can check into the tree can kill production with simple mistakes
        • SVN access is effectively root equivalent
        • Divergence from desired configuration through use of chattr +i puppetmanagedfile
        • You can’t chattr +i with broken fingers.
    6. Puppet DSL
        • Puppet DSL not Ruby enough
        • Stated as a plus, but really a minus when most engineers expect Ruby
        • Incomplete conditionals in the DSL
    7. Cron
        • Removing configuration for a cron job leaves the cronjob behind
        • Need to specify ensure => absent
        • If you forget the command with absent, duplicate cronjob entries can occur
        • The vestigial tail of these “ensure absent” lines ends up living in the config long after they are needed
    8. Cron + NTP
        • NTP synchronizes the system time
        • Cron granularity is one minute, so every job fires at second zero
        • Performance regression if you make puppet install many jobs across different modules, on the same zero second
        • Introduce random delays before jobs: sleep $(($RANDOM % 60)); do_something...
    9. Test and “Canary”
        • No facility in puppet for testing
        • Monolithic design
        • Controlled deploys are preferable to “full change”
        • Use representative machines first
        • Push to cluster when everything works.
    10. Machine Database
        • Node membership to classes, and the nodes themselves in a puppet configuration, are not well exposed.
        • Once entered, parsing is the only option to retrieve the machine list and associated “roles” from the SVN tree.
        • ldapnodes is a possible solution here.
    11. Node Class Changes
        • Still an unsolved problem
        • Removing class definition from a node leaves all of the configuration from the class behind
        • Have to re-kickstart the host to get to a base state
    12. Why Puppet? (*)
        • (*) the good news!
    13. Configuration Management
        • Our world is changing.
        • The end of the “Systems Administrator”
        • The beginning of “DevOps”
    14. Configuration Management
        • Consistent edits
        • Trackable changes
        • Consistent ability to rebuild
        • Find variance
    15. DevOps
        • Stop wasting time
        • Start delivering great ops software
        • Stop administering individual machines.
    16. DevOps
        • Puppet definitions are code
        • Incorporate cross-functional skills.
        • Build a bridge between your developers and the ops team.
    17. Let’s fix this.
    18. Change Process (diagram)
        • Initial commit goes to HEAD
        • Generate review. Ad-hoc tests.
    19. Change Process (diagram)
        • HEAD → TEST via “test integrate”
        • Test integration: ~10% of hosts. Watch for failures!
    20. Change Process (diagram)
        • HEAD → TEST → PRODUCTION via “production integrate”
        • 100% production. Final review.
    21. Change Process (diagram)
        • HEAD → PRODUCTION via “cherry pick”, bypassing TEST
        • No review.
    22. Testing / Staging
        • A test infrastructure is needed to ensure that updates don’t kill production
        • People make mistakes
        • Treat the puppet config as if it were code
    23. Security
        • Restrict access to the SVN tree itself (through ACLs)
        • Create a concept of an OWNER for each module and manifest subdir; restrict access.
        • Enforce ownership during SVN checkin
        • Enforce a proper review process
    24. SVN can be smarter
        • Post-commit checks
        • BIND (verify zones, DNS, SOA++)
        • A mistake here is a full site outage
        • Verify puppet config
        • Create Reviewboard entries
    25. puppet-util
        • A script on each box to select the current branch
        • Set the branch (by modifying facter fact + config)
        • Show current branch
        • Enable or disable puppetd in emergencies or ad-hoc testing
    26. =
    27. Reviewboard
        • www.reviewboard.org
        • Visualize and centralize change
        • Keep teams informed
        • Prevent unknown interactions
    28. User Security
        • Distrust puppet for creating user accounts
        • Build them from an LDAP infrastructure
        • Base package connects to LDAP and creates users based on group and machine role
        • You still have to deal with RPMs creating system users
    29. Machine Database
        • No machine database in puppet
        • We used Django, MySQL, but you could use LDAP
        • Role membership imported to DB by parsing existing puppet definitions and special variables in the node stanza
    30. Ad hoc scripting
        • No facility in puppet for immediate execution of a command on many hosts
        • SSH in a loop is not a solution at scale
        • Threaded SSH system through our own tool
        • Uses Paramiko, open source (Python)
        • See also: func
    31. Multiple Instances
        • Three complete puppetmasterd instances on each puppet master machine, on different ports, pointed to different SVN branches: HEAD, TEST, PRODUCTION
    32. Handling many clients
        • Distribute the SVN tree (eliminate the SPOF)
        • Use more puppet servers
        • Rsync manifests, then run puppet
        • Selectively update hosts (func)
    33. Puppet Web Server
        • Don’t run WEBrick (script/server): too slow
        • Unicorn (best choice)
        • Passenger (mod_rails)
        • mongrel?
    34. Distributed Puppet (diagram)
        • SVN → three puppet masters (PM) → hosts
    35. Distributed Puppet
        • Too many clients eventually overwhelm the master
        • You must deploy more hosts
        • Distribute cron jobs
        • Randomize start times
        • Distribute the master itself
    36. Questions?
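
The random delays from slide 8 and the randomized start times from slide 35 can also be made repeatable per host. The following is a minimal sketch, not the talk's method: it assumes a stable offset derived from a hash of the hostname is an acceptable substitute for `$RANDOM`. The function name `splay_seconds`, the job label, and the hostnames are all hypothetical.

```python
import hashlib


def splay_seconds(hostname: str, job: str, window: int = 60) -> int:
    """Return a stable delay in [0, window) seconds for this host and job.

    Hypothetical helper: hashing "hostname:job" gives each host the same
    offset on every run, while different hosts spread across the window
    instead of all firing on second zero.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of seconds")
    digest = hashlib.sha256(f"{hostname}:{job}".encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % window


if __name__ == "__main__":
    # Made-up hostnames, for illustration only.
    for host in ("web001", "web002", "db001"):
        print(host, splay_seconds(host, "puppet-run"))
```

A wrapper script could sleep for the returned number of seconds before starting the real job. Unlike `$RANDOM`, the delay for a given host does not change between runs, which can make load patterns easier to reason about.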

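The “~10% of hosts” test integration step (slide 19), combined with the advice to use representative machines first (slide 9), could be sketched as below. This is an illustration under assumptions, not the tooling described in the talk: it supposes the machine database can supply a host-to-role mapping, and it picks roughly the requested fraction from each role, with at least one host per role. `pick_canaries` and the sample hostnames are hypothetical.

```python
import hashlib
from collections import defaultdict


def _stable_key(host: str) -> str:
    """Hash the hostname so the selection is repeatable but not alphabetical."""
    return hashlib.sha256(host.encode("utf-8")).hexdigest()


def pick_canaries(roles_by_host: dict, fraction: float = 0.10) -> list:
    """Pick about `fraction` of the hosts in each role, at least one per role.

    roles_by_host maps hostname -> role name (an assumed export from a
    machine database). The result is sorted and deterministic.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    by_role = defaultdict(list)
    for host, role in roles_by_host.items():
        by_role[role].append(host)
    canaries = []
    for role in sorted(by_role):
        members = sorted(by_role[role], key=_stable_key)
        count = max(1, round(len(members) * fraction))
        canaries.extend(members[:count])
    return sorted(canaries)


if __name__ == "__main__":
    # Made-up inventory: 20 web hosts and a single database host.
    inventory = {f"web{i:03d}": "web" for i in range(20)}
    inventory["db001"] = "db"
    print(pick_canaries(inventory))
```

Selecting per role keeps small roles from being skipped entirely, at the cost of the overall share drifting above 10% when there are many small roles.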