Managing Puppet using MCollective
Puppet Camp Ghent
R.I.Pienaar
Who am I?
• Puppet user since 0.22.x
• Architect of MCollective
• Author of Extlookup and Hiera
• Developer at Puppet Labs London
• Blog at http://devco.net
• Tweets at @ripienaar
• Volcane on IRC
          R.I.Pienaar | rip@devco.net | http://devco.net | @ripienaar
The Problem?
• Puppet needs management just like other
  software
• Enabling, disabling, ad-hoc runs, custom
  environments etc
• The Puppet Master is a finite resource that
  needs protection
• Orchestrated deploys
MCollective Puppet Agent
 package { ["mcollective-puppet-agent", "mcollective-puppet-client"]:
   ensure => present,
 }



Available on yum.puppetlabs.com and apt.puppetlabs.com

                http://srt.ly/mcpuppet


Obtaining The Agent Status



Obtaining Statuses
$ mco puppet status

* [ ============================================================> ] 11 / 11

        node8.example.net: Currently stopped; last completed run 14 minutes 16 seconds ago
        ....

Summary of Applying:

   false = 11

Summary of Daemon Running:

   stopped = 11

Summary of Enabled:

   enabled = 10
  disabled = 1

Summary of Idling:

   false = 11


Finished processing 11 / 11 hosts in 72.05 ms

Per node status comes first, followed by an estate wide summary.




Obtaining Statuses
$ mco puppet count

Total Puppet nodes: 11

          Nodes currently enabled: 10
         Nodes currently disabled: 1

Nodes currently doing puppet runs: 5
          Nodes currently stopped: 6

       Nodes with daemons started: 10
    Nodes without daemons started: 1
       Daemons started but idling: 6




Obtaining Statuses
$ mco rpc puppet last_run_summary

* [ ============================================================> ] 28 / 28

   .
   .
   .

Summary of Config Retrieval Time:

   Average: 20.13

Summary of Total Resources:

   Average: 435

Summary of Total Time:

   Average: 39.33


Finished processing 28 / 28 hosts in 311.23 ms




Running Puppet



Doing Basic Runs
$ mco puppet runonce

 * [ ============================================================> ] 11 / 11


node9.example.net                        Request Aborted
   Puppet is disabled: 'machine under maintenance'


Finished processing 11 / 11 hosts in 2593.85 ms

$ mco puppet count

Total Puppet nodes: 11

          Nodes currently enabled: 10
         Nodes currently disabled: 1

Nodes currently doing puppet runs: 2
          Nodes currently stopped: 9

       Nodes with daemons started: 10
    Nodes without daemons started: 1
       Daemons started but idling: 8



Run with default configured splay and splaylimit.
The reason shown for node9 is the Puppet 3 disable message.

Doing Basic Runs

$ mco puppet runonce -f

 * [ ============================================================> ] 11 / 11


node9.example.net                        Request Aborted
   Puppet is disabled: 'machine under maintenance'


Finished processing 11 / 11 hosts in 2661.99 ms




Run with no splay, still subject to enable/disable



Doing Basic Runs

$ mco puppet runonce --splay --splaylimit 120

* [ ============================================================> ] 11 / 11


node9.example.net                        Request Aborted
   Puppet is disabled: 'machine under maintenance'


Finished processing 11 / 11 hosts in 2661.99 ms




     Force splay and set a custom splay limit
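
A rough illustration of what splay does: each node waits a random delay of up to splaylimit seconds before starting its run, which spreads the load on the Puppet Master. The sketch below is a Python simulation only; the function name and node names are hypothetical, and the agent's real delay logic may differ in detail.

```python
import random

def splay_delays(nodes, splaylimit, seed=None):
    """Return a hypothetical per-node start delay in seconds, each in [0, splaylimit]."""
    rng = random.Random(seed)
    return {node: rng.randint(0, splaylimit) for node in nodes}

# Eleven illustrative nodes, matching the estate size used on the slides.
nodes = [f"node{i}.example.net" for i in range(11)]
delays = splay_delays(nodes, splaylimit=120, seed=1)

# Print nodes in the order they would start.
for node, delay in sorted(delays.items(), key=lambda item: item[1]):
    print(f"{node}: starts after {delay}s")
```

With a limit of 120 the runs are spread over two minutes instead of all hitting the master at once.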



Tags and Environment

$ mco puppet runonce --tag webserver --tag syslog --environment development

* [ ============================================================> ] 11 / 11


node9.example.net                        Request Aborted
   Puppet is disabled: 'machine under maintenance'


Finished processing 11 / 11 hosts in 2661.99 ms




Selects 2 tags in a specific Puppet Environment



Doing noop Runs

$ mco puppet runonce --noop

* [ ============================================================> ] 11 / 11


node9.example.net                        Request Aborted
   Puppet is disabled: 'machine under maintenance'


Finished processing 11 / 11 hosts in 2661.99 ms



        Does a noop run; gathers reports and
                audit information


Doing no-noop Runs

$ mco puppet runonce --tag webserver --no-noop

* [ ============================================================> ] 11 / 11


node9.example.net                        Request Aborted
   Puppet is disabled: 'machine under maintenance'


Finished processing 11 / 11 hosts in 2661.99 ms



          When puppet.conf has noop=true,
           do an actual run on demand


Choosing a Master

$ mco puppet runonce --server secops.example.net:8134 --tag compliance

* [ ============================================================> ] 11 / 11


node9.example.net                        Request Aborted
   Puppet is disabled: 'machine under maintenance'


Finished processing 11 / 11 hosts in 2661.99 ms



        Does a single run against a different
                  Puppet Master


Preventing Puppet Runs



The Big Red Button
$ mco puppet disable "we f'd up, stop the train!"

* [ ============================================================> ] 11 / 11


node9.example.net                        Request Aborted
   Could not disable Puppet: Already disabled


Summary of Enabled:

   disabled = 11


Finished processing 11 / 11 hosts in 90.06 ms




  Disables Puppet; does not change the reason on
        nodes that are already disabled

The Big Green Button
$ mco puppet enable -S 'puppet().disable_message=/stop the train/'

* [ ============================================================> ] 10 / 10


Summary of Enabled:

   enabled = 10


Finished processing 10 / 10 hosts in 90.06 ms




          Enables the nodes that were disabled with the
          "stop the train" message; node9, disabled for
          maintenance, is left alone



Operating On Groups Of Hosts


Selective Runs
Filter on a Facter fact (cluster=a) and a Puppet class (roles::webserver):

$ mco puppet runonce -W "cluster=a roles::webserver"

* [ ============================================================> ] 5 / 5



Finished processing 5 / 5 hosts in 90.06 ms




                  Run using a filter:
          all web servers with fact cluster=a



Selective Runs
The resource() filter can name any Puppet resource:

$ mco puppet runonce -S "resource('File[/srv/www]').managed=true"

* [ ============================================================> ] 5 / 5



Finished processing 5 / 5 hosts in 90.06 ms




                Run using a filter:
         nodes where we manage /srv/www



Selective Runs
$ mco puppet runonce -S "resource().failed_resources>5 and resource().config_version=xyz"

* [ ============================================================> ] 5 / 5



Finished processing 5 / 5 hosts in 90.06 ms




                      Run using a filter:
         nodes whose most recent run used config_version xyz
              and had more than 5 resource failures




Roll Out A Change Quickly
$ mco puppet runall 7
2013-01-19 20:58:59: Running all nodes with a concurrency of 7
2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
2013-01-19 20:59:02: Found 11 enabled nodes
2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run




         Runs all nodes with a maximum concurrency of 7



Roll Out A Change Quickly


2013-01-19 20:58:59: Running all nodes with a concurrency of 7
2013-01-19 20:58:59: Discovering enabled Puppet nodes to manage
2013-01-19 20:59:02: Found 11 enabled nodes




          Does not attempt to manage disabled nodes




Roll Out A Change Quickly

2013-01-19 20:59:02: Found 11 enabled nodes
2013-01-19 20:59:06: node3.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:07: node1.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:09: node4.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:10: node6.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:12: node0.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:13: node5.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7




         Starts the first 6 quickly, but takes into account
    1 other run an administrator started at the same time




Roll Out A Change Quickly

2013-01-19 20:59:17: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:21: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:25: node9.example.net schedule status: Puppet is currently applying a catalog, cannot run now
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run




   node9 was already being run by an administrator or its
     normal schedule, so it was skipped for the next node




Roll Out A Change Quickly
2013-01-19 20:59:29: node8.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:33: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:38: node2.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:41: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:46: middleware.example.net schedule status: Started a background Puppet run
2013-01-19 20:59:50: Currently 7 nodes applying the catalog; waiting for less than 7
2013-01-19 20:59:55: node7.example.net schedule status: Started a background Puppet run




         Regularly checks the concurrency and starts
              more nodes as soon as possible.

                  Average node run time 34.39s, total
                           time 55 seconds
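
The loop behind this log can be sketched as a small simulation: start a node whenever fewer than the concurrency limit are applying a catalog, otherwise wait and check again. This is a Python illustration under simplifying assumptions (fixed run durations, one check per tick, no runs started by anyone else); the names are hypothetical and this is not the plugin's actual implementation.

```python
from collections import deque

def simulate_runall(nodes, concurrency, run_ticks):
    """Simulate a concurrency-limited rollout.

    nodes: node names to run, in order.
    concurrency: maximum number of nodes applying a catalog at once.
    run_ticks: assumed number of ticks each run takes (hypothetical, fixed).
    Returns the (tick, node) start events and the peak number of concurrent runs.
    """
    pending = deque(nodes)
    running = {}      # node -> tick at which its run finishes
    started = []
    peak = 0
    tick = 0
    while pending or running:
        # Forget runs that have finished by this tick.
        running = {n: end for n, end in running.items() if end > tick}
        # Start more nodes while we are below the limit.
        while pending and len(running) < concurrency:
            node = pending.popleft()
            running[node] = tick + run_ticks
            started.append((tick, node))
        peak = max(peak, len(running))
        tick += 1
    return started, peak

nodes = [f"node{i}" for i in range(11)]
started, peak = simulate_runall(nodes, concurrency=7, run_ticks=3)
for tick, node in started:
    print(f"tick {tick}: started {node}")
print("peak concurrency:", peak)
```

The point of the design is that the limit bounds load on the master while the total time stays close to the average run time of a single node.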



Roll Out A Change Slowly
$ mco puppet runonce --batch 5 --batch-sleep 300

(--batch-sleep 300: wait 5 minutes)

* [ ============================================================> ] 11 / 11



Finished processing 11 / 11 hosts in 903686.29 ms




Does runonce in batches of 5 with a 5 minute sleep
   per batch. Press ^C after any batch to stop.

                    15 minute total run time.
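
The reported runtime follows from the batch arithmetic: 11 nodes in batches of 5 gives 3 batches, and 3 sleeps of 300 seconds is 900 seconds, close to the 903686.29 ms in the output. The Python sketch below assumes a sleep is counted after every batch, including the last; that is inferred from the timing shown, not confirmed behaviour, and the function name is hypothetical.

```python
import math

def batch_plan(node_count, batch_size, batch_sleep):
    """Return (number of batches, total sleep seconds), assuming one sleep per batch."""
    batches = math.ceil(node_count / batch_size)
    return batches, batches * batch_sleep

batches, sleep_seconds = batch_plan(node_count=11, batch_size=5, batch_sleep=300)
print(f"{batches} batches, about {sleep_seconds / 60:.0f} minutes of sleep")
```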

Advanced Status And Performance Metrics



Performance Analysis
$ mco puppet summary

Summary statistics for 28 nodes:

                  Total resources: ▂▇▂▁▁▃▁▂▂▂▄▁▂▁▁▁▁▁▂▁                                min: 332.0   max: 695.0
            Out Of Sync resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁                                min: 0.0     max: 2.0
                Failed resources: ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁                                 min: 0.0     max: 0.0
               Changed resources: ▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁                                 min: 0.0     max: 2.0
 Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁                                 min: 2.7     max: 57.1
         Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁                                min: 7.0     max: 125.1
    Time since last run (seconds): ▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂                                min: 10.0    max: 89.0k




              Distribution of various metrics.


Performance Analysis

Config Retrieval time (seconds): ▆▇▅▄▁▃▃▁▁▁▃▁▁▄▂▁▁▁▁▁                              min: 2.7   max: 57.1
     Total run-time (seconds): ▇▃▄▄▄▃▂▂▂▂▃▂▁▁▁▁▁▂▁▁                                min: 7.0   max: 125.1




              Distribution of various metrics.


Performance Analysis
$ mco plot resource config_retrieval_time

[Plot: Information about Puppet managed resources. Y axis: Nodes (0 to 8); X axis: Config Retrieval Time (0 to 60). A callout marks the slow machines.]




          Distribution of config retrieval time.


Performance Analysis
$ mco find -S "resource().config_retrieval_time > 30"
dev3.example.net
dev4.example.net
dev7.example.net
dev6.example.net
dev8.example.net
dev9.example.net
dev10.example.net



   Find machines with config_retrieval_time over
         30 seconds - all the dev servers.



Maintenance Windows and Access Control



Puppet State As ACL
policy default deny
allow   cert=manager      enable disable                         *                       *
allow   cert=sysadmin     runonce status                         *                       *
allow   cert=developer    *                                      environment=development *




        Only cert=manager can enable and disable
        the Puppet Agent, which is how maintenance
                   periods are marked




Puppet State As ACL
policy default deny
allow   cert=manager       stop start                            *                       *
allow   cert=noc           stop start                            puppet().enabled=false
allow   cert=developer     *                                     environment=development *




             NOC can start and stop services
            only during a maintenance window.

             Manager user can always override
                 maintenance windows.
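
One way to read these policies is as a first-match rule list with a default of deny. The Python sketch below models the second policy under that assumption, with only allow rules; the Rule class, the authorize function and the node state fields are all hypothetical, and real policy files are evaluated by the MCollective authorization plugin rather than by code like this.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

@dataclass
class Rule:
    caller: str                         # e.g. "cert=noc"
    actions: Set[str]                   # action names, or {"*"} for any action
    condition: Callable[[Dict], bool]   # check against the node's state

def authorize(rules, caller, action, node_state, default=False):
    """Return True if an allow rule matches the request; otherwise the default (deny)."""
    for rule in rules:
        if rule.caller != caller:
            continue
        if "*" not in rule.actions and action not in rule.actions:
            continue
        if rule.condition(node_state):
            return True
    return default

rules = [
    Rule("cert=manager", {"stop", "start"}, lambda state: True),
    Rule("cert=noc", {"stop", "start"}, lambda state: state["puppet_enabled"] is False),
    Rule("cert=developer", {"*"}, lambda state: state["environment"] == "development"),
]

# Hypothetical node states: Puppet disabled marks a maintenance window.
in_window = {"puppet_enabled": False, "environment": "production"}
normal = {"puppet_enabled": True, "environment": "production"}

print(authorize(rules, "cert=noc", "stop", in_window))    # NOC during a maintenance window
print(authorize(rules, "cert=noc", "stop", normal))       # NOC outside a maintenance window
print(authorize(rules, "cert=manager", "stop", normal))   # manager at any time
```

Tying the NOC rule to Puppet's enabled state means the same "big red button" that stops Puppet also opens the window for manual service work.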

What is MCollective?

• Ruby framework for writing Orchestration
  systems
• Provides Authentication, Authorization and
  Auditing
• No direct communication between client
  and nodes



Questions?
       twitter: @ripienaar
          email: rip@puppetlabs.com
            blog: www.devco.net
        github: ripienaar
   freenode: Volcane




