Monitoring your VMs at Scale


Published in: Technology

  1. Beyond VM deployment: Monitoring your VMs at scale. Kris Buytaert
  2. Kris Buytaert● I used to be a Dev,● Then Became an Op● Chief Trolling Officer and Open Source Consultant● Everything is an effing DNS Problem● Building Clouds since before the bookstore● Some books, some papers, some blogs● Evangelizing devops● But mostly, trying to be good at my job
  3. What's different in the cloud?● Scale● Velocity● Change
  4. Challenges● Reproducibility● Speed● Auditing● Keeping stuff in sync • Monitoring • Security
  5. Case: Using a configuration management tool to configure, update and keep your cloud-scale monitoring and metric infrastructure sane and manageable.
  6. Tools● Puppet● Chef● Cfengine● Ganglia● Collectd● Sensu● Graphite● Nagios / Icinga
  7. Not quite a Muppet.● Puppet is...● OSS● A DSL language● Written in Ruby● Client/server oriented● Contains abstraction layers● Repeatable processes
  8. Master of Puppets● Puppet master • Certificate authority (CA) • Hosts modules • Hosts node descriptions • Compare, compile, apply● Master is not a requirement!
  9. Puppet Clients● Daemon● Cron jobs● External orchestration: • for i in $hosts; do ssh "$i" "puppetd --test"; done • mCollective, Func, …● Get catalogs, play them● Reporting
  10. Puppet Environments● Different code bases on 1 master● Dev, UAT, Prod● Only break one environment at once :)● What about testing your Puppetmaster?
  11. Node definitions● Nodes.pp
      class defaults {
        $search = ""
        $nameservers = [,]
        include dns::resolv
        include ssh::keys
        include ssh::server
      }
      node "" {
        include defaults
        include dns::powerdns::server
        include dns::powerdns::resolver
      }
      node "" {
        include defaults
        include apache2
        include mysql
      }
  12. External Node Classifier● Fixed hostname?● How many nodes?● Naming schemas solve some issues● External script that sends back YAML class descriptions • Custom written • Foreman • ...
  13. Classes vs Modules● Module: ● Abstract definition of how to configure a service ● Reusable● Class: ● Specific implementation of your use case of such a module • e.g. usernames / passwords / hosts do not belong in modules
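An external node classifier can be any executable that receives the node name as its only argument and prints a YAML document on standard output. The following is a minimal shell sketch, not the deck's own classifier. It assumes a hypothetical naming schema in which `web*` hosts get `apache2` and `db*` hosts get `mysql`. The class names are borrowed from the node definitions on slide 11; the hostname patterns and the function name `classify_node` are invented for illustration.

```shell
#!/bin/sh
# Hypothetical ENC sketch: map a node name to Puppet classes via a naming schema.
# Prints a YAML document with a "classes" list, as Puppet expects from an ENC.
classify_node() {
  node_name="$1"

  # Assumed naming schema (illustrative only): web* -> apache2, db* -> mysql.
  case "$node_name" in
    web*) role_class="apache2" ;;
    db*)  role_class="mysql" ;;
    *)    role_class="" ;;
  esac

  # Every node gets the shared "defaults" class.
  printf '%s\n' '---' 'classes:' '  - defaults'

  # Add the role-specific class when the schema matched.
  if [ -n "$role_class" ]; then
    printf '  - %s\n' "$role_class"
  fi
}
```

To use it as a real classifier, the script would end with `classify_node "$1"` and be referenced from puppet.conf through `node_terminus = exec` and `external_nodes = <path to the script>`. A naming schema keeps the script stateless, which is why the slide notes that naming schemas solve some issues.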
  14. Modules● Files● Templates● Manifests • DSL • Classes • Elements
  15. Parametrized Classes
  16. Stored Configs
  17. Use Cases:● Ssh keys● Reverse proxy configs● Monitoring resources● Measuring resources
  18. Collection and Export● Export: @@resource { ... }● Collect: Resource <<| query |>>● Clean out nodes that disappear: puppet node clean
  19. Defining a Service● Local class that: • Configures service using a standard module call with Hiera-based parameters • Configures backup • Configures log rotation • Configures log shipping • Exports monitoring needs● Abuse modules for git ease
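Exported resources stay in the stored configuration until the node that exported them is cleaned, so hosts that disappear keep showing up in the collected monitoring config. The sketch below is one possible way to find such nodes. It assumes two plain-text files with one node name per line: the nodes Puppet knows about and the nodes that still exist. How those lists are produced is site-specific, and both function names are hypothetical. It only prints the `puppet node clean` commands so they can be reviewed before anything is removed.

```shell
#!/bin/sh
# Print node names that appear in the known list but not in the active list.
# Both arguments are files with one node name per line (assumed input format).
stale_nodes() {
  known_file="$1"
  active_file="$2"
  # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
  grep -F -x -v -f "$active_file" "$known_file"
}

# Dry run: print one "puppet node clean" command per stale node.
print_clean_commands() {
  stale_nodes "$1" "$2" | while IFS= read -r node; do
    printf 'puppet node clean %s\n' "$node"
  done
}
```

Printing the commands instead of running them is a deliberate choice here: cleaning a node also removes its certificate, so a wrong entry in the active list should be caught by a human first.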
  20. Apache Example:
  21. #monitoringsucks Monitoring is AWESOME. Metrics are AWESOME. I love it. Here's what I don't love:● Having my hands tied with the model of host and service bindings.● Having to set up "fake" hosts just to group arbitrary metrics together● Having to either collect metrics twice - once for alerting and another for trending● Only being able to see my metrics in 5 minute intervals● Having to choose between shitty interface but great monitoring or shitty monitoring but great interface● Dealing with a monitoring system that thinks IT is the system of truth for my environment● Not actually having any real choices John Vincent (@lusis) on his blog
  22. #monitoringlove● Puppet● Nagios (Icinga)● Graphite● Collectd● Logstash
  23. Graphite● Graphing at Scale● Graphing at Ease● Any metric is a graph● echo "somestring $somevalue $timestamp" | nc <%= graphitehost %> 2003
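The one-liner on this slide uses Graphite's plaintext protocol: one line per datapoint, in the form `<metric path> <value> <unix timestamp>`, sent to Carbon on port 2003. A small shell sketch of the same idea, split into a formatting step and a sending step. The function names and the `GRAPHITE_HOST` / `GRAPHITE_PORT` variables are assumptions made for this example, and the `-w 1` timeout flag may need adjusting for your netcat variant.

```shell
#!/bin/sh
# Format one datapoint in Graphite's plaintext protocol:
#   <metric path> <value> <unix timestamp>
# The timestamp defaults to the current time when omitted.
graphite_line() {
  metric_path="$1"
  value="$2"
  timestamp="${3:-$(date +%s)}"
  printf '%s %s %s\n' "$metric_path" "$value" "$timestamp"
}

# Send one datapoint to a Carbon plaintext listener.
# GRAPHITE_HOST and GRAPHITE_PORT are assumed environment variables.
send_metric() {
  graphite_line "$1" "$2" "$3" |
    nc -w 1 "${GRAPHITE_HOST:-localhost}" "${GRAPHITE_PORT:-2003}"
}
```

For example, `send_metric servers.web01.logins 3` would record the value 3 under that metric path at the current time. Because any path becomes a graph on first write, no server-side definition is needed beforehand.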
  24. Graphite Composer
  25. Graphite API
  26. Gdash In action
  27. Puppet and Graphite●● Includes Graphite / Gdash / Jmxtrans / Logster / Collectd / Statsd / Tattle and more modules as submodules !● git clone● git submodule init● git submodule update● vagrant up
  28. Collectd● Collects● Zillion plugins • Nginx, Apache, MySQL, disk● Graphite Carbon plugin● Sends metrics to Graphite
  29. Collectd & Graphite
  30. Exporting and Collecting
  31. Triggers on Graphs● Export Java Metrics● JMXTrans● Export JMXConfigs● Configure NRPE● Export NagiosCheck● Collect JMX Exports on JMXTransNode● Graph Em● Collect Nagios Configs on Nagios Check Server
  32. Triggers on Graphs
  33. Triggers on Graphs
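Alerting on data that is already in Graphite avoids collecting every metric twice, once for trending and once for alerting. The sketch below shows one way a Nagios-style check could do this; it is not the check used in the talk. It assumes the series was fetched from Graphite's render API with `format=raw`, which returns `name,start,end,step|v1,v2,...` with `None` for missing points. The function names and thresholds are hypothetical.

```shell
#!/bin/sh
# Extract the most recent non-empty datapoint from Graphite "raw" output,
# e.g. "load,1311836008,1311836011,1|1.0,2.0,None" -> "2.0".
last_datapoint() {
  sed 's/.*|//' | tr ',' '\n' | grep -v '^None$' | tail -n 1
}

# Map a value to a Nagios plugin status line and exit code:
#   0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_threshold() {
  value="$1"
  warn="$2"
  crit="$3"

  # No usable datapoint means the state cannot be determined.
  case "$value" in
    ''|None) echo "UNKNOWN - no data"; return 3 ;;
  esac

  # awk handles the floating-point comparison; its exit code becomes
  # the return code of this function.
  awk -v v="$value" -v w="$warn" -v c="$crit" 'BEGIN {
    if (v + 0 >= c + 0)      { printf "CRITICAL - value %s >= %s\n", v, c; exit 2 }
    else if (v + 0 >= w + 0) { printf "WARNING - value %s >= %s\n", v, w; exit 1 }
    else                     { printf "OK - value %s\n", v; exit 0 }
  }'
}

# Example wiring (hypothetical host and metric name):
#   raw=$(curl -s "http://graphite.example.com/render?target=servers.web01.load&from=-5min&format=raw")
#   check_threshold "$(printf '%s\n' "$raw" | last_datapoint)" 5 10
```

Keeping the threshold logic separate from the fetch makes it possible to test the check without a running Graphite instance.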
  34. Conclusion:● Reproducible monitoring setup● Dynamically generated monitoring config● Code is available at
  35. Contact● Kris Buytaert● Kris.Buytaert@inuits.be● @krisbuytaert● Inuits, Duboistraat 50, 2060 Antwerpen, Belgium● 891.514.231● +32 475 961221● Further Reading