Managing Puppet using MCollective

R.I. Pienaar's talk "Managing Puppet using MCollective" at Puppet Camp Ghent, 2013 and at Puppet Camp New York 2013.


  1. Managing Puppet using MCollective Puppet Camp Ghent R.I.Pienaar
  2. Who am I?
     • Puppet user since 0.22.x
     • Architect of MCollective
     • Author of Extlookup and Hiera
     • Developer at Puppet Labs London
     • Blog at http://devco.net
     • Tweets at @ripienaar
     • Volcane on IRC
     R.I.Pienaar | rip@devco.net | http://devco.net | @ripienaar
  3. The Problem?
     • Puppet needs management just like other software
     • Enabling, disabling, ad-hoc runs, custom environments etc
     • The Puppet Master is a finite resource that needs protection
     • Orchestrated deploys
  4. MCollective Puppet Agent
     package { ["mcollective-puppet-agent", "mcollective-puppet-client"]:
       ensure => present
     }
     Available on yum.puppetlabs.com and apt.puppetlabs.com
     http://srt.ly/mcpuppet
  5. Obtaining The Agent Status
  6. Obtaining Statuses
     $ mco puppet status
      * [ ============================================================> ] 11 / 11
        node8.example.net: Currently stopped; last completed run 14 minutes 16 seconds ago
        ....
     Summary of Applying:
        false = 11
     Summary of Daemon Running:
        stopped = 11
     Summary of Enabled:
        enabled = 10
        disabled = 1
     Summary of Idling:
        false = 11
     Finished processing 11 / 11 hosts in 72.05 ms
     Callouts on the slide: "Per node status" (the per-host lines) and "Estate wide summary" (the Summary blocks).
  7. Obtaining Statuses
     $ mco puppet count
     Total Puppet nodes: 11
     Nodes currently enabled: 10
     Nodes currently disabled: 1
     Nodes currently doing puppet runs: 5
     Nodes currently stopped: 6
     Nodes with daemons started: 10
     Nodes without daemons started: 1
     Daemons started but idling: 6
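The counters printed by `mco puppet count` are plain "label: number" lines, so they are easy to feed into a monitoring check. The following is a minimal sketch, assuming the output format shown on the slide; the function name `parse_count` and the embedded sample text are illustrative only and are not part of MCollective.

```python
import re

# Sample text copied from the slide; a real check would capture the
# command's stdout instead (how to do that is left out of this sketch).
SAMPLE = """\
Total Puppet nodes: 11
Nodes currently enabled: 10
Nodes currently disabled: 1
Nodes currently doing puppet runs: 5
Nodes currently stopped: 6
Nodes with daemons started: 10
Nodes without daemons started: 1
Daemons started but idling: 6
"""


def parse_count(text):
    """Turn 'Label: number' lines into a {label: int} dict."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_count(SAMPLE)
# prints "1 of 11 nodes disabled"
print(counts["Nodes currently disabled"], "of", counts["Total Puppet nodes"], "nodes disabled")
```

With the dict in hand, a check could for example alert when the number of disabled nodes exceeds a threshold.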
  8. Obtaining Statuses
     $ mco rpc puppet last_run_summary
      * [ ============================================================> ] 28 / 28
     . . .
     Summary of Config Retrieval Time:
        Average: 20.13
     Summary of Total Resources:
        Average: 435
     Summary of Total Time:
        Average: 39.33
     Finished processing 28 / 28 hosts in 311.23 ms
  9. Running Puppet
  10. Doing Basic Runs
      $ mco puppet runonce
       * [ ============================================================> ] 11 / 11
      node9.example.net
         Request Aborted
         Puppet is disabled: machine under maintenance
      Finished processing 11 / 11 hosts in 2593.85 ms
      $ mco puppet count
      Total Puppet nodes: 11
      Nodes currently enabled: 10
      Nodes currently disabled: 1
      Nodes currently doing puppet runs: 2
      Nodes currently stopped: 9
      Nodes with daemons started: 10
      Nodes without daemons started: 1
      Daemons started but idling: 8
      Run with default configured splay and splaylimit.
      Callout on the slide: "Puppet 3 disable message" (the "machine under maintenance" text).
  11. Doing Basic Runs
      $ mco puppet runonce -f
       * [ ============================================================> ] 11 / 11
      node9.example.net
         Request Aborted
         Puppet is disabled: machine under maintenance
      Finished processing 11 / 11 hosts in 2661.99 ms
      Run with no splay, still subject to enable/disable.
  12. Doing Basic Runs
      $ mco puppet runonce --splay --splaylimit 120
       * [ ============================================================> ] 11 / 11
      node9.example.net
         Request Aborted
         Puppet is disabled: machine under maintenance
      Finished processing 11 / 11 hosts in 2661.99 ms
      Force splay and set a custom splay limit.
  13. Tags and Environment
      $ mco puppet runonce --tag webserver --tag syslog --environment development
       * [ ============================================================> ] 11 / 11
      node9.example.net
         Request Aborted
         Puppet is disabled: machine under maintenance
      Finished processing 11 / 11 hosts in 2661.99 ms
      Selects 2 tags in a specific Puppet Environment.
  14. Doing noop Runs
      $ mco puppet runonce --noop
       * [ ============================================================> ] 11 / 11
      node9.example.net
         Request Aborted
         Puppet is disabled: machine under maintenance
      Finished processing 11 / 11 hosts in 2661.99 ms
      Does a noop run, which gathers reports and audit information.
  15. Doing no-noop Runs
      $ mco puppet runonce --tag webserver --no-noop
       * [ ============================================================> ] 11 / 11
      node9.example.net
         Request Aborted
         Puppet is disabled: machine under maintenance
      Finished processing 11 / 11 hosts in 2661.99 ms
      When puppet.conf has noop=true, do an actual run on demand.
  16. Choosing a Master
      $ mco puppet runonce --server secops.example.net:8134 --tag compliance
       * [ ============================================================> ] 11 / 11
      node9.example.net
         Request Aborted
         Puppet is disabled: machine under maintenance
      Finished processing 11 / 11 hosts in 2661.99 ms
      Does a single run against a different Puppet Master.
  17. Preventing Puppet Runs
  18. The Big Red Button
      $ mco puppet disable "we f'd up, stop the train!"
       * [ ============================================================> ] 11 / 11
      node9.example.net
         Request Aborted
         Could not disable Puppet: Already disabled
      Summary of Enabled:
         disabled = 11
      Finished processing 11 / 11 hosts in 90.06 ms
      Disables Puppet; nodes that are already disabled keep their existing disable reason.
  19. The Big Green Button
      $ mco puppet enable -S 'puppet().disable_message=/stop the train/'
       * [ ============================================================> ] 10 / 10
      Summary of Enabled:
         enabled = 10
      Finished processing 10 / 10 hosts in 90.06 ms
      Enables the disabled Puppet nodes whose disable message matches /stop the train/.
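The interplay between the two buttons is easier to see with the selection logic written out. This is a minimal sketch of the idea only, not MCollective code: the dict below is a hypothetical in-memory stand-in for the state of the 11 agents after slide 18, and `enable_matching` is an illustrative name.

```python
import re

# Hypothetical state after slide 18: ten nodes carry the new disable
# message, while node9 keeps its earlier maintenance message.
disabled = {f"node{i}.example.net": "we f'd up, stop the train!" for i in range(9)}
disabled["middleware.example.net"] = "we f'd up, stop the train!"
disabled["node9.example.net"] = "machine under maintenance"


def enable_matching(disabled_nodes, pattern):
    """Enable (drop from the dict) every node whose message matches pattern."""
    regex = re.compile(pattern)
    matched = [name for name, msg in disabled_nodes.items() if regex.search(msg)]
    for name in matched:
        del disabled_nodes[name]
    return sorted(matched)


enabled = enable_matching(disabled, r"stop the train")
# prints "10 enabled; still disabled: ['node9.example.net']"
print(len(enabled), "enabled;", "still disabled:", sorted(disabled))
```

Filtering on the message is what lets the green button undo one specific emergency stop without touching nodes that were disabled for other reasons.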
  20. Operating On Groups Of Hosts
  21. Selective Runs
      $ mco puppet runonce -W "cluster=a roles::webserver"
       * [ ============================================================> ] 5 / 5
      Finished processing 5 / 5 hosts in 90.06 ms
      Run using a filter: all web servers with fact cluster=a.
      Callouts on the slide: "Facter fact" (cluster=a) and "Puppet Class" (roles::webserver).
  22. Selective Runs
      $ mco puppet runonce -S "resource('File[/srv/www]').managed=true"
       * [ ============================================================> ] 5 / 5
      Finished processing 5 / 5 hosts in 90.06 ms
      Run using a filter: nodes where we manage /srv/www.
      Callout on the slide: "Any Puppet resource" (File[/srv/www]).
  23. Selective Runs
      $ mco puppet runonce -S "resource().failed_resources>5 and resource().config_version=xyz"
       * [ ============================================================> ] 5 / 5
      Finished processing 5 / 5 hosts in 90.06 ms
      Run using a filter: nodes whose most recent run had config_version xyz and more than 5 resource failures.
  24. Roll Out A Change Quickly
      $ mco puppet runall 7
      2013-01-19 20:58:59: Running all nodes with a concurrency of 7
      2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
      2013-01-19 20:59:02: Found 11 enabled nodes
      2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
      2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
      2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
      2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
      2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
      2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
      2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run
      Runs all nodes with a maximum concurrency.
  25. Roll Out A Change Quickly
      2013-01-19 20:58:59: Running all nodes with a concurrency of 7
      2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
      2013-01-19 20:59:02: Found 11 enabled nodes
      Does not attempt to manage disabled nodes.
  26. Roll Out A Change Quickly
      2013-01-19 20:59:02: Found 11 enabled nodes
      2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
      Starts the first 6 quickly, but takes into account that an administrator is doing 1 other run at the same time.
  27. Roll Out A Change Quickly
      2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
      2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
      2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
      2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
      node9 was already being run by an administrator or by its normal schedule, so it was skipped in favour of the next node.
  28. Roll Out A Change Quickly
      2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
      2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
      2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
      2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
      2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run
      Regularly checks the concurrency and starts more nodes as soon as possible. Average node run time 34.39s, total time 55 seconds.
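The behaviour visible in the runall log (start nodes until the limit is reached, wait while the limit is hit, skip a node that is already applying) can be sketched as a small simulation. This is a simplified model and not the plugin's implementation: time advances in abstract ticks, every run takes the same `run_time`, and the function and status names are made up for illustration.

```python
def runall(nodes, concurrency, run_time, already_running=()):
    """Simulate starting runs with at most `concurrency` nodes applying at once.

    nodes:           names in the order they will be tried
    run_time:        ticks one run takes (identical for all nodes here)
    already_running: nodes that somebody else started at tick 0
    Returns a list of (tick, node, status) events.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    finish_at = {name: run_time for name in already_running}  # node -> end tick
    events = []
    pending = list(nodes)
    tick = 0
    while pending:
        applying = sum(1 for end in finish_at.values() if end > tick)
        if applying >= concurrency:
            tick += 1  # limit reached: wait for a run to finish
            continue
        name = pending.pop(0)
        if finish_at.get(name, 0) > tick:
            events.append((tick, name, "skipped"))  # already applying a catalog
        else:
            finish_at[name] = tick + run_time
            events.append((tick, name, "started"))
    return events


events = runall(["node1", "node2", "node3", "node4"], concurrency=3,
                run_time=3, already_running=("node2",))
for event in events:
    print(event)
```

In the example, node2 counts toward the limit because an administrator started it, so it is skipped, and node4 has to wait until tick 3 when the earlier runs have finished.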
  29. Roll Out A Change Slowly
      $ mco puppet runonce --batch 5 --batch-sleep 300
       * [ ============================================================> ] 11 / 11
      Finished processing 11 / 11 hosts in 903686.29 ms
      Does runonce in batches of 5, 5 minute sleep per batch. ^c after any batch to stop. 15 minute total run time.
      Callout on the slide: "Wait 5 minutes" (--batch-sleep 300).
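The 15 minute figure follows from the batch arithmetic. The sketch below assumes a sleep happens after every batch, including the last one; that assumption is inferred from the 903686.29 ms on the slide (three sleeps of 300 seconds) and is not taken from the MCollective source. The function name is illustrative.

```python
import math


def batched_run_seconds(hosts, batch_size, batch_sleep, sleep_after_last=True):
    """Return (number of batches, total seconds spent sleeping)."""
    batches = math.ceil(hosts / batch_size)
    # Assumption: one sleep per batch; pass sleep_after_last=False to model
    # a client that does not sleep once the final batch is done.
    sleeps = batches if sleep_after_last else batches - 1
    return batches, sleeps * batch_sleep


batches, seconds = batched_run_seconds(hosts=11, batch_size=5, batch_sleep=300)
# prints "3 batches, 15.0 minutes of sleeping"
print(batches, "batches,", seconds / 60, "minutes of sleeping")
```

The remaining few seconds in the reported total are the time spent actually sending the requests and collecting replies.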
  30. Advanced Status And Performance Metrics
  31. Performance Analysis
      $ mco puppet summary
      Summary statistics for 28 nodes:
        Total resources: ▂▇▂▁▁▃▁▂▂▂▄▁▂▁▁▁▁▁▂▁ min: 332.0 max: 695.0
        Out Of Sync resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 2.0
        Failed resources: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 0.0
        Changed resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ min: 0.0 max: 2.0
        Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁ min: 2.7 max: 57.1
        Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁ min: 7.0 max: 125.1
        Time since last run (seconds): ▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ min: 10.0 max: 89.0k
      Distribution of various metrics.
  32. Performance Analysis
      Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁ min: 2.7 max: 57.1
      Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁ min: 7.0 max: 125.1
      Distribution of various metrics.
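Each 20-character bar on these slides reads as a small histogram: the range between min and max is cut into equal buckets, and the height of each character shows how many nodes fall into that bucket. The following is a sketch of that idea under stated assumptions (equal-width buckets, heights scaled to the fullest bucket, a flat line when min equals max as in the "Failed resources" row); it is not the code `mco puppet summary` uses, and `sparkline` is an illustrative name.

```python
BARS = "▁▂▃▄▅▆▇"


def sparkline(values, buckets=20):
    """Histogram `values` into equal-width buckets, one bar character each."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Nothing to distribute: draw a flat line.
        return BARS[0] * buckets
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for value in values:
        # The maximum value would land one past the end, so clamp it.
        index = min(int((value - lo) / width), buckets - 1)
        counts[index] += 1
    top = max(counts)
    return "".join(BARS[round(c * (len(BARS) - 1) / top)] for c in counts)


# Hypothetical config retrieval times in seconds, not data from the talk.
print(sparkline([2.7, 3.1, 4.0, 5.5, 6.2, 30.2, 41.0, 57.1]))
```

Read this way, a tall bar at the left with a thin tail to the right means most nodes are fast and a few are slow, which is the pattern the config retrieval row shows.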
  33. Performance Analysis
      $ mco plot resource config_retrieval_time
      [ASCII plot titled "Information about Puppet managed resources": y-axis Nodes (0 to 8), x-axis Config Retrieval Time (0 to 60), with a callout "Slow machines"]
      Distribution of config retrieval time.
  34. Performance Analysis
      $ mco find -S "resource().config_retrieval_time > 30"
      dev3.example.net
      dev4.example.net
      dev7.example.net
      dev6.example.net
      dev8.example.net
      dev9.example.net
      dev10.example.net
      Find machines with config_retrieval_time over 30 seconds - all the dev servers.
  35. Maintenance Windows and Access Control
  36. Puppet State As ACL
      policy default deny
      allow   cert=manager     enable disable   *                         *
      allow   cert=sysadmin    runonce status   *                         *
      allow   cert=developer   *                environment=development   *
      Only cert=manager can enable and disable the Puppet Agent, which is how maintenance periods are indicated.
  37. Puppet State As ACL
      policy default deny
      allow   cert=manager     stop start   *                         *
      allow   cert=noc         stop start   puppet().enabled=false
      allow   cert=developer   *            environment=development   *
      NOC can start and stop services only during a maintenance window. Manager user can always override maintenance windows.
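How such a policy file is read can be sketched with a first-match evaluator. This is a deliberately simplified model, not the ActionPolicy plugin: it assumes tab-separated fields (verdict, caller, actions, facts, classes), supports only a single `key=value` fact or `*`, ignores the classes field, and does not handle compound statements such as `puppet().enabled=false`. The function name `allowed` is illustrative.

```python
# Policy from slide 36, written with explicit tabs between fields
# (the tab separator is an assumption of this sketch).
POLICY = (
    "policy default deny\n"
    "allow\tcert=manager\tenable disable\t*\t*\n"
    "allow\tcert=sysadmin\trunonce status\t*\t*\n"
    "allow\tcert=developer\t*\tenvironment=development\t*\n"
)


def allowed(policy, caller, action, facts):
    """Return True if the first rule matching the request allows it."""
    default = False
    for line in policy.splitlines():
        if line.startswith("policy default"):
            default = line.split()[-1] == "allow"
            continue
        verdict, rule_caller, actions, fact_filter, _classes = line.split("\t")
        if rule_caller not in ("*", caller):
            continue
        if actions != "*" and action not in actions.split():
            continue
        if fact_filter != "*":
            key, _, value = fact_filter.partition("=")
            if facts.get(key) != value:
                continue
        return verdict == "allow"
    return default  # no rule matched: fall back to the policy default


print(allowed(POLICY, "cert=manager", "disable", {}))
print(allowed(POLICY, "cert=sysadmin", "disable", {}))
print(allowed(POLICY, "cert=developer", "runonce", {"environment": "development"}))
```

Because the default is deny, a sysadmin asking to disable Puppet matches no rule and is refused, while the same request from the manager certificate is allowed.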
  38. What is MCollective?
      • Ruby framework for writing Orchestration systems
      • Provides Authentication, Authorization and Auditing
      • No direct communication between client and nodes
  39. Questions?
      twitter: @ripienaar
      email: rip@puppetlabs.com
      blog: www.devco.net
      github: ripienaar
      freenode: Volcane
