Fact-Based Monitoring - PuppetConf 2014

Fact-Based Monitoring - Alexis Le-Quoc, Datadog

Published in: Technology

  1. Fact-based Monitoring
     PuppetConf 2014
     Alexis Lê-Quôc, @alq
  2. Alexis Lê-Quôc, @alq
     CTO at Datadog
  3. Poll: Monitoring makes me…
     • happy
     • proud
     • cry
     • want to hide
  4. Puppet brings Automation to Systems Management
  5. Improve Monitoring the way Puppet has improved Systems Management
  6. “The good old days”
     • Your “CMDB” was Excel
     • SSH in and hack away
     • Little time for anything else
  7. Then Puppet came…
     • Expressive rules that capture expected result
     • Using facts and classifiers, a.k.a. metadata, to figure out where to apply changes
     • That freed up a lot of our time*
     * on a per-machine basis
  8. “Puppet brings immunity of configuration to change in infrastructure”
     –Me (just now)
  9. I have seen this before…
  10. “[SQL brings] immunity of application to change in storage structure and access strategy”
      –C.J. Date (1977)
      http://www.cs.berkeley.edu/~brewer/cs262/SystemR.pdf
  11. SQL
      • 1974: IBM introduces System R and its Structured Query Language
      • Expressive rules that capture expected result
      • Using facts and predicates, a.k.a. metadata, to figure out what data to get
      • That freed up a lot of development time
  12. SQL
      • From a time-consuming, imperative mess (“how”)
      • … to expressive data queries (“what”)
      SQL query:
        SELECT (desired facts)
        FROM (existing facts)
        WHERE (matching criteria)
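The SELECT / FROM / WHERE shape can be tried out with a minimal sketch using Python's built-in sqlite3 module. The `servers` table and its columns are hypothetical, invented only to illustrate the "what, not how" idea.

```python
import sqlite3

# Hypothetical inventory table; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE servers (name TEXT, role TEXT, datacenter TEXT)")
conn.executemany(
    "INSERT INTO servers VALUES (?, ?, ?)",
    [("a", "web", "ABC"), ("b", "db", "ABC"), ("c", "web", "XYZ")],
)

# Declarative: state which facts are wanted and which criteria must match.
# Nothing here says how to scan, index, or order the storage.
rows = conn.execute(
    "SELECT name FROM servers WHERE role = ? AND datacenter = ? ORDER BY name",
    ("web", "ABC"),
).fetchall()
web_in_abc = [name for (name,) in rows]
print(web_in_abc)
```

The query would keep working unchanged if the table were re-indexed or stored differently, which is the "immunity" the Date quote refers to.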
  13. Puppet
      • From a time-consuming, imperative mess (“how”)
      • … to expressive configuration queries (“what”)
      puppet apply:
        CHANGE (desired facts)
        FROM (existing puppet facts)
        WHERE (matching puppet classes)
  14. Is there a pattern?
  15. “Break free from ever more complex naming conventions for hostnames as a means of identity. Use a very rich set of meta data provided by each machine to address them.”
      –MCollective overview
  16. MCollective
      • From a time-consuming, imperative mess (“how”)
      • … to expressive orchestration queries (“what”)
      mco rpc service restart service=nginx -F webpool=A
        EXEC (desired actions)
        FROM (existing puppet facts)
        WHERE (matching puppet classes)
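What a fact filter such as `-F webpool=A` does can be sketched roughly as follows. This is not MCollective's implementation; `select_nodes` and the inventory are hypothetical and only show targets being chosen by facts rather than by name.

```python
def select_nodes(inventory, **wanted_facts):
    """Return the names of nodes whose facts match every wanted key/value."""
    return sorted(
        name
        for name, facts in inventory.items()
        if all(facts.get(key) == value for key, value in wanted_facts.items())
    )


# Hypothetical inventory: node names are only keys, never part of the query.
inventory = {
    "node1": {"webpool": "A", "environment": "production"},
    "node2": {"webpool": "B", "environment": "production"},
    "node3": {"webpool": "A", "environment": "staging"},
}

# Equivalent in spirit to "-F webpool=A": address machines by metadata.
targets = select_nodes(inventory, webpool="A")
print(targets)
```

Adding a node to pool A changes the result without changing the query.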
  17. Back to monitoring
      • Monitoring is to behavior what Puppet is to configuration
      • Monitoring is to behavior what MCollective is to orchestration
  18. Monitoring
      • From a time-consuming, imperative mess (“how”)
      • … to expressive monitoring queries (“what”)
      Monitoring query:
        MONITOR (desired behavior)
        FROM (existing heartbeats/metrics)
        WHERE (matching puppet facts)
  19. Examples
      • “All provisioned web servers in the production environment, datacenter ABC must respond to queries within 200ms”
      • “All PostgreSQL servers must have a postgres: bgwriter process running”
      • “At least one ActiveMQ server is up to support mcollective”
      • Never mention a hostname
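The examples above differ in their quantifier ("all" versus "at least one") but share the same shape. A minimal sketch, assuming heartbeats arrive already annotated with facts; the `Heartbeat` type and the `role` fact are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    facts: dict  # facts reported alongside the heartbeat (hypothetical names)
    up: bool     # whether the monitored behavior was observed


def matching(heartbeats, **where):
    """WHERE: keep only heartbeats whose facts match every criterion."""
    return [
        h for h in heartbeats
        if all(h.facts.get(key) == value for key, value in where.items())
    ]


def all_up(heartbeats, **where):
    """'All X must ...': true only if something matches and every match is up."""
    selected = matching(heartbeats, **where)
    return bool(selected) and all(h.up for h in selected)


def at_least_one_up(heartbeats, **where):
    """'At least one X is up': true if any matching heartbeat is up."""
    return any(h.up for h in matching(heartbeats, **where))


heartbeats = [
    Heartbeat({"role": "activemq"}, up=True),
    Heartbeat({"role": "activemq"}, up=False),
    Heartbeat({"role": "postgres"}, up=True),
]
print(at_least_one_up(heartbeats, role="activemq"))
print(all_up(heartbeats, role="activemq"))
```

Neither query names a host, so neither needs editing when servers are added or replaced.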
  20. Hosts are not the center of the monitoring universe. Facts are!
      Hosts are just places where facts occur.
  21. The proof is in the pudding…
  22. Hosts at the center of the universe, a.k.a. the Wrong Way
  23. “Its fairly straightforward, so hopefully you find things easy to understand…”
      –Nagios Core 4 manual on monitoring clusters
  24. Host-centric: Monitor a DNS cluster
      check_command check_service_cluster!"DNS Cluster"!0!1!$SERVICESTATEID:host1:DNS Service$,$SERVICESTATEID:host2:DNS Service$,$SERVICESTATEID:host3:DNS Service$
      Where do host1, host2, host3 come from?
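One answer to "where do host1, host2, host3 come from?" is a generator that turns facts into the host-centric string. The sketch below assumes a hypothetical inventory with a `role` fact; `cluster_check_command` is an illustrative helper, not part of Nagios.

```python
def cluster_check_command(label, service, hosts, warn=0, crit=1):
    """Build the check_service_cluster argument string for a list of hosts."""
    macros = ",".join(f"$SERVICESTATEID:{host}:{service}$" for host in hosts)
    return f'check_service_cluster!"{label}"!{warn}!{crit}!{macros}'


# Hypothetical fact inventory; in practice this would come from Puppet facts.
inventory = {
    "host1": {"role": "dns"},
    "host2": {"role": "dns"},
    "host3": {"role": "dns"},
    "host4": {"role": "web"},
}

# The fact-based selection that the host-centric config cannot express itself.
dns_hosts = sorted(h for h, facts in inventory.items() if facts.get("role") == "dns")
command = cluster_check_command("DNS Cluster", "DNS Service", dns_hosts)
print(command)
```

The generator has to be re-run and the configuration reloaded every time the set of DNS hosts changes.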
  25. Host-centric: can’t use facts directly
      • “Host groups solve this problem”. No, they don’t.
      • Combinatorial explosion, e.g. trivially
        • 4 data centers (us-1, us-2, eu, apac)
        • 5 classes (web, db, cache, appserver, hadoop)
        • 3 environments (test, staging, prod)
        • => up to 119 materialized host groups
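The slide does not show how 119 is obtained. One reading that reproduces it, offered here as an assumption, is that a host group may pin each dimension to a value or leave it unconstrained, with the fully unconstrained group excluded: (4+1) × (5+1) × (3+1) − 1 = 119.

```python
from itertools import product

datacenters = ["us-1", "us-2", "eu", "apac"]
classes = ["web", "db", "cache", "appserver", "hadoop"]
environments = ["test", "staging", "prod"]

ANY = None  # marker for "this dimension is left unconstrained"

# Every group pins each dimension to a value or leaves it as ANY;
# the group that constrains nothing at all is excluded.
groups = [
    combo
    for combo in product(datacenters + [ANY], classes + [ANY], environments + [ANY])
    if combo != (ANY, ANY, ANY)
]

assert len(groups) == (4 + 1) * (5 + 1) * (3 + 1) - 1
print(len(groups))  # prints 119
```

Adding a fourth dimension with only two values would push the count to 359 under the same reading.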
  26. Nagios-bashing?
      • No!
      • Same fatal flaw with all host-centric monitoring tools
      • Host-centric monitoring forces an extra, expensive step:
        • replicate fact-based conditionals in host-centric templates
  27. “Please note that this module is not for the faint of heart. Even I (the author) have my head hurt each time I have to make modifications to it…”
      –puppet-nagios author
  28. Facts at the center of the universe, a.k.a. the Right Way
      Image: "De Revolutionibus manuscript p9b" by Nicolas Copernicus - www.bj.uj.edu.pl. Licensed under Public domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:De_Revolutionibus_manuscript_p9b.jpg#mediaviewer/File:De_Revolutionibus_manuscript_p9b.jpg
  29. Earlier Examples
      • “All provisioned web servers in the production environment, datacenter ABC must respond to queries within 200ms”
      • “All PostgreSQL servers must have a postgres: bgwriter process running”
      • “At least one ActiveMQ server is up to support mcollective”
  30. In Sensu (heartbeats)
      • “All PostgreSQL servers must have a postgres: bgwriter process running”
        class postgres::monitoring::sensu {
          sensu::subscription { 'postgres': }
        }
      • Monitoring using a fact-based query
      • Is node of class “postgres” and subscribed to “postgres” or not?
      • If so, it will execute the postgres check
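The subscription logic described on this slide can be sketched in a few lines. This is a simplified model rather than Sensu's code, and the check name `postgres_bgwriter` is hypothetical: a check lists the subscriptions it targets, and a node runs it only if it holds one of them.

```python
def checks_for(node_subscriptions, checks):
    """Return the names of checks targeting at least one of the node's subscriptions."""
    held = set(node_subscriptions)
    return sorted(
        name for name, subscribers in checks.items() if held & set(subscribers)
    )


# Hypothetical check definitions: check name -> subscriptions it is published to.
checks = {
    "postgres_bgwriter": ["postgres"],
    "nginx_latency": ["web"],
}

# A node of class postgres::monitoring::sensu holds the 'postgres' subscription.
print(checks_for(["postgres"], checks))
```

Because Puppet assigns the subscription from the node's class, no host list is maintained on the monitoring side.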
  31. In Datadog (metrics)
      • “All provisioned web servers in the production environment, datacenter ABC must respond to queries within 200ms”
        $ puppet module install datadog-datadog_agent

        class { 'datadog_agent':
          api_key      => …,
          tags         => [$environment],
          fact_to_tags => ["datacenter"]
        }
        include datadog_agent::integrations::nginx
  32. In Datadog (metrics)
      • Monitoring using a fact-based query
      • Puppet facts directly reused
        max(nginx.request.latency{production,datacenter:ABC}) < 200
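The meaning of that query can be illustrated with a small sketch. It is not Datadog's query engine; the data points are made up, and the latency values are assumed to be in milliseconds.

```python
def max_over(points, metric, required_tags):
    """Max value of `metric` across points carrying every required tag, or None."""
    values = [
        p["value"]
        for p in points
        if p["metric"] == metric and required_tags <= p["tags"]
    ]
    return max(values) if values else None


# Made-up data points; the tags mirror the Puppet environment and datacenter fact.
points = [
    {"metric": "nginx.request.latency", "tags": {"production", "datacenter:ABC"}, "value": 120},
    {"metric": "nginx.request.latency", "tags": {"production", "datacenter:ABC"}, "value": 180},
    {"metric": "nginx.request.latency", "tags": {"production", "datacenter:XYZ"}, "value": 450},
    {"metric": "nginx.request.latency", "tags": {"staging", "datacenter:ABC"}, "value": 900},
]

worst = max_over(points, "nginx.request.latency", {"production", "datacenter:ABC"})
within_target = worst is not None and worst < 200
print(worst, within_target)
```

The slow points from the other datacenter and from staging are excluded by tag alone, without naming a single host.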
  33. What to take away
  34. Fact-based monitoring
      1. Hosts are not at the center of the monitoring universe
      2. Expressive monitoring uses queries
      3. Monitoring queries should use Puppet facts
  35. Thank you!
