Fact-based Monitoring 
PuppetConf 2014
Alexis Lê-Quôc @alq
Alexis Lê-Quôc, @alq 
CTO at Datadog
Poll: Monitoring makes me… 
happy 
proud 
cry 
want to hide
Puppet brings Automation to 
Systems Management
Improve 
Monitoring 
the way Puppet has 
improved 
Systems Management
“The good old days” 
• Your “CMDB” was Excel 
• SSH in and hack away 
• Little time for anything else
Then Puppet came… 
• Expressive rules that capture expected result 
• Using facts and classifiers, a.k.a. metadata, to figure out where to
apply changes
• That freed up a lot of our time* 
* on a per-machine basis
“Puppet brings immunity of configuration to change in 
infrastructure” 
–Me (just now)
I have seen this before…
“[SQL brings] immunity of application to change in storage 
structure and access strategy” 
–C.J. Date (1977) 
http://www.cs.berkeley.edu/~brewer/cs262/SystemR.pdf
SQL 
• 1974: IBM introduces System R and its Structured Query Language
• Expressive rules that capture expected result
• Using facts and predicates, a.k.a. metadata, to figure out what data
to get
• That freed up a lot of development time
SQL 
• From a time-consuming, imperative mess (“how”) 
• … to expressive data queries (“what”) 
SQL query 
SELECT (desired facts) 
FROM (existing facts) 
WHERE (matching criteria)
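
As a rough illustration of the "what, not how" point, here is a minimal sketch using Python's built-in sqlite3 module. The `servers` table, its columns and its rows are made up for this example and do not come from the talk.

```python
import sqlite3

# Hypothetical inventory table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE servers (name TEXT, role TEXT, env TEXT)")
conn.executemany(
    "INSERT INTO servers VALUES (?, ?, ?)",
    [
        ("a1", "web", "prod"),
        ("a2", "web", "staging"),
        ("b1", "db", "prod"),
    ],
)

# SELECT (desired facts) FROM (existing facts) WHERE (matching criteria).
# The query states what we want; the engine decides how to fetch it.
rows = conn.execute(
    "SELECT name FROM servers WHERE role = ? AND env = ? ORDER BY name",
    ("web", "prod"),
).fetchall()
prod_web = [name for (name,) in rows]
print(prod_web)
```

Nothing in the query says how the rows are stored or scanned, which is the independence the C.J. Date quote describes.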
Puppet 
• From a time-consuming, imperative mess (“how”) 
• … to expressive configuration queries (“what”) 
puppet apply 
CHANGE (desired facts) 
FROM (existing puppet facts) 
WHERE (matching puppet classes)
Is there a pattern?
“Break free from ever more complex naming conventions for 
hostnames as a means of identity. Use a very rich set of meta 
data provided by each machine to address them.” 
–MCollective overview
MCollective 
• From a time-consuming, imperative mess (“how”) 
• … to expressive orchestration queries (“what”) 
mco rpc service restart service=nginx -F webpool=A
EXEC (desired actions) 
FROM (existing puppet facts) 
WHERE (matching puppet classes)
Back to monitoring 
• Monitoring is to behavior what Puppet is to configuration 
• Monitoring is to behavior what MCollective is to orchestration
Monitoring 
• From a time-consuming, imperative mess (“how”) 
• … to expressive monitoring queries (“what”) 
Monitoring query 
MONITOR (desired behavior) 
FROM (existing heartbeats/metrics) 
WHERE (matching puppet facts)
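
The MONITOR / FROM / WHERE shape can be sketched in a few lines of Python. This is only an illustration under assumptions: the `monitor` function, the observation records and the fact names are hypothetical, and a real system would read facts from Puppet and heartbeats from a monitoring backend.

```python
from typing import Callable, Dict, List

Facts = Dict[str, str]


def monitor(
    observations: List[dict],
    where: Facts,
    check: Callable[[List[float]], bool],
) -> bool:
    """MONITOR check FROM observations WHERE facts match."""
    values = [
        o["value"]
        for o in observations
        if all(o["facts"].get(k) == v for k, v in where.items())
    ]
    return check(values)


# Hypothetical heartbeats: 1 means seen recently, 0 means missed.
observations = [
    {"facts": {"role": "activemq", "environment": "production"}, "value": 0},
    {"facts": {"role": "activemq", "environment": "production"}, "value": 1},
    {"facts": {"role": "web", "environment": "production"}, "value": 1},
]

# "At least one ActiveMQ server is up", selected by fact, not by hostname.
at_least_one_up = monitor(
    observations,
    where={"role": "activemq"},
    check=lambda beats: sum(beats) >= 1,
)
print(at_least_one_up)
```

The selection is driven entirely by the `where` facts, so adding or removing an ActiveMQ node changes the result without changing the query.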
Examples 
• “All provisioned web servers in the production environment, 
datacenter ABC must respond to queries within 200ms” 
• “All PostgreSQL servers must have a postgres: bgwriter process 
running” 
• “At least one ActiveMQ server is up to support MCollective”
• Never mention a hostname
Hosts are not the center of the 
monitoring universe. 
Facts are! 
Hosts are just places where facts occur.
The proof is in the pudding…
Hosts at the center of the universe 
a.k.a. the Wrong Way
“It's fairly straightforward, so hopefully you find things easy to
understand…” 
–Nagios Core 4 manual on monitoring clusters
Host-centric: Monitor a DNS cluster 
check_command check_service_cluster!"DNS Cluster"!0!1!$SERVICESTATEID:host1:DNS Service$,$SERVICESTATEID:host2:DNS Service$,$SERVICESTATEID:host3:DNS Service$
Where do host1, host2, host3 come from?
Host-centric: can’t use facts directly 
• “Host groups solve this problem”. No, they don’t. 
• Combinatorial explosion, e.g. trivially 
• 4 data centers (us-1, us-2, eu, apac) 
• 5 classes (web, db, cache, appserver, hadoop) 
• 3 environments (test, staging, prod) 
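• => up to 119 materialized host groups (presumably (4+1) × (5+1) × (3+1) − 1, with each dimension either pinned to one value or left open)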
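
One way to arrive at 119 is to assume that a host group either pins a dimension to a single value or leaves it unspecified, and to exclude the single group that pins nothing. The sketch below enumerates the groups under that assumption; the slide itself does not spell out the counting rule.

```python
from itertools import product

datacenters = ["us-1", "us-2", "eu", "apac"]
classes = ["web", "db", "cache", "appserver", "hadoop"]
environments = ["test", "staging", "prod"]

ANY = None  # marks a dimension the host group leaves unspecified

host_groups = [
    combo
    for combo in product(
        datacenters + [ANY], classes + [ANY], environments + [ANY]
    )
    if combo != (ANY, ANY, ANY)  # skip the "every host" group
]
print(len(host_groups))
```

Each additional fact dimension multiplies the count again, which is why materializing groups up front scales poorly.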
Nagios-bashing? 
• No! 
• Same fatal flaw with all host-centric monitoring tools 
• Host-centric monitoring forces an extra, expensive step: 
• replicate fact-based conditionals in host-centric templates
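
To make the "extra, expensive step" concrete, here is a hedged sketch of what materializing a fact-based selection into the host-centric check_command shown earlier might look like. The inventory dictionary and the `render_cluster_check` helper are hypothetical; only the output format follows the Nagios example above.

```python
# Hypothetical fact inventory, e.g. exported from a configuration database.
inventory = {
    "host1": {"role": "dns"},
    "host2": {"role": "dns"},
    "host3": {"role": "dns"},
    "host4": {"role": "web"},
}


def render_cluster_check(inventory, role, label, service, warn, crit):
    """Turn a fact-based selection into a host-centric check_command."""
    hosts = sorted(h for h, facts in inventory.items() if facts.get("role") == role)
    states = ",".join(f"$SERVICESTATEID:{h}:{service}$" for h in hosts)
    return f'check_service_cluster!"{label}"!{warn}!{crit}!{states}'


command = render_cluster_check(inventory, "dns", "DNS Cluster", "DNS Service", 0, 1)
print(command)
```

The rendered string has to be regenerated and redeployed every time a DNS host is added or retired, which is the duplication the slide objects to.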
“Please note that this module is not for the faint of heart. Even I 
(the author) have my head hurt each time I have to make 
modifications to it…” 
–puppet-nagios author
Facts at the center of the universe 
a.k.a. the Right Way 
"De Revolutionibus manuscript p9b" by Nicolas Copernicus - www.bj.uj.edu.pl. Licensed under Public domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:De_Revolutionibus_manuscript_p9b.jpg
Earlier Examples 
• “All provisioned web servers in the production environment, 
datacenter ABC must respond to queries within 200ms” 
• “All PostgreSQL servers must have a postgres: bgwriter process 
running” 
• “At least one ActiveMQ server is up to support MCollective”
In Sensu (heartbeats) 
• “All PostgreSQL servers must have a postgres: bgwriter process 
running” 
class postgres::monitoring::sensu {
  sensu::subscription { 'postgres': }
}
• Monitoring using a fact-based query 
• Is the node of class “postgres” and subscribed to “postgres”, or not?
• If so, it will execute the postgres check
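
The subscription idea can be modelled roughly as follows. This is not Sensu's implementation; the check name and node names are invented, and the sketch only shows that a node's checks follow from what it subscribes to rather than from its hostname.

```python
from collections import defaultdict

# Hypothetical check and node names; only the subscription idea
# mirrors the slide.
checks_by_subscription = defaultdict(list)
checks_by_subscription["postgres"].append("check-bgwriter-process")

node_subscriptions = {
    "node-a": {"postgres"},
    "node-b": {"webserver"},
}


def checks_for(node):
    """Return the checks a node runs, based only on its subscriptions."""
    return sorted(
        check
        for sub in node_subscriptions.get(node, set())
        for check in checks_by_subscription.get(sub, [])
    )


print(checks_for("node-a"))
print(checks_for("node-b"))
```

A new PostgreSQL node picks up the check as soon as its class adds the subscription, with no per-host monitoring configuration.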
In Datadog (metrics) 
• “All provisioned web servers in the production environment, 
datacenter ABC must respond to queries within 200ms” 
$ puppet module install datadog-datadog_agent 
class { 'datadog_agent':
  api_key       => …,
  tags          => [$environment],
  facts_to_tags => ['datacenter'],
}
include datadog_agent::integrations::nginx
In Datadog (metrics) 
• Monitoring using a fact-based query 
• Puppet facts directly reused 
max(nginx.request.latency{production,datacenter:ABC}) < 200
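
A rough sketch of how facts could become tags and then scope a metric query is shown below. It assumes tags follow a "fact_name:fact_value" scheme; the helper names, the facts and the latency samples are all hypothetical, and this is not Datadog's query engine.

```python
def facts_to_tag_list(facts, fact_names, extra_tags=()):
    """Build tags: plain extra tags plus "fact_name:fact_value" pairs."""
    tags = list(extra_tags)
    tags += [f"{name}:{facts[name]}" for name in fact_names if name in facts]
    return tags


def max_over(points, scope):
    """Max of metric values whose tags include every tag in scope."""
    values = [v for tags, v in points if set(scope) <= set(tags)]
    return max(values) if values else None


# Hypothetical facts and latency samples (ms).
web1 = facts_to_tag_list({"datacenter": "ABC"}, ["datacenter"], ["production"])
web2 = facts_to_tag_list({"datacenter": "XYZ"}, ["datacenter"], ["production"])
points = [(web1, 150.0), (web1, 190.0), (web2, 450.0)]

worst = max_over(points, ["production", "datacenter:ABC"])
print(worst, worst is not None and worst < 200)
```

The slow sample from the other datacenter is ignored because the scope is expressed in tags derived from facts, not in hostnames.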
What to take away
Fact-based monitoring 
1. Hosts are not at the center of the monitoring universe 
2. Expressive monitoring uses queries 
3. Monitoring queries should use Puppet facts
Thank you!
