Monitoring
Challenges in a
World of
Automation
Monitoring is hard enough on its own.
Automation makes it harder.
Anthony Goddard
VP Operations
Sensu, Inc.
@anthonygoddard // @sensu
● Open core monitoring framework, released in 2011
● Enterprise offering launched in 2015
● Sensu, Inc. formed in January 2017
● 20 employees & growing!
About Sensu
What is Sensu?
● An open source, cloud native monitoring framework
● The monitoring router
● Infrastructure, service, and application monitoring
● Designed for automation
● Cross-platform (Linux, Windows, BSD, AIX, Solaris, macOS, etc.)
● Learn more: https://sensuapp.org
Mission Statement
Obviate the need to (re)build custom monitoring solutions.
This isn't a talk about
Sensu.
Purpose of this talk
● Discuss challenges of monitoring ephemeral systems
● Review basic cloud native monitoring requirements
○ Automated discovery
○ Automated monitoring
○ Automated decommissioning
● Talk about cloud native monitoring anti-patterns
● Live demo! (what could possibly go wrong?)
Let's do this
Cloud computing has
changed the world.
Which came first? Cloud computing or DevOps?
Problem Statement
● Cloud platforms and automation systems cause changes in
infrastructure that increase the complexity of monitoring
● New systems/endpoints must be discovered and monitored
automatically
● Monitoring must now distinguish the subtle differences between
"down" and "decommissioned"
Expectations
Our infrastructure is becoming increasingly automated and ephemeral.
Shouldn't we expect similar capabilities from our monitoring?
Cloud Native Monitoring Requirements
Overview
1. Automated discovery
2. Automated monitoring
3. Automated decommissioning
1. Automated
Discovery
New systems should be
automatically discovered.
Cloud Native Monitoring Requirements
Cloud concepts
● Provisioning events create and replace instances
● Cloud providers automate replication of instances (e.g.
auto-scaling groups)
● APIs allow external systems to invoke provisioning events
Automated Discovery
Automated Discovery
Cloud monitoring anti-patterns
● Polling-based discovery (regardless of protocol)
● Discovery that precludes complex network topologies
● Punching holes in firewalls (ingress traffic)
Polling is not a reliable discovery solution.
Automated Discovery
Cloud-native monitoring requirements
● New systems must be discovered in real time
● Provide push-based or event-based discovery + discovery APIs
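A minimal sketch of push-based discovery, assuming a hypothetical in-memory Registry and a made-up event format (neither is Sensu's actual API). The new system sends its own registration event outbound when it boots, so nothing has to poll for it and no ingress firewall rule is needed.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical in-memory monitoring registry (illustrative only)."""
    entities: dict = field(default_factory=dict)

    def handle_event(self, event: dict) -> bool:
        """Register the sender on its first event; return True if it is new."""
        name = event["name"]
        is_new = name not in self.entities
        self.entities[name] = {
            "roles": list(event.get("roles", [])),
            "last_seen": event.get("timestamp", time.time()),
        }
        return is_new


def registration_event(name: str, roles: list) -> dict:
    """Build the event a new system pushes (egress only) when it comes online."""
    return {"name": name, "roles": roles, "timestamp": time.time()}


registry = Registry()
# The first event from an unknown system discovers it; later events refresh it.
first = registry.handle_event(registration_event("web-01", ["web"]))
again = registry.handle_event(registration_event("web-01", ["web"]))
```

In a real deployment the event would travel over a message bus or an HTTP API rather than a direct method call; the point is only that discovery is triggered by the new system, not by a scan.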
2. Automated
Monitoring
New systems should be
monitored automatically.
Cloud Native Monitoring Requirements
Automated Monitoring
Cloud concepts
● Almost all infrastructures are distributed systems
● Disparate systems fulfill unique roles (e.g. db, web service)
● Simple architectures = one or more roles per system
● Complex architectures = one role per system
Automated Monitoring
Cloud monitoring anti-patterns
● Monitoring configuration mapped to individual systems
● Monitoring via remote access (e.g. SSH, WinRM, NRPE)
Nope.
Automated Monitoring
Cloud-native monitoring requirements
● Monitoring configuration should be mapped to roles
● Monitoring should begin the moment systems come online
Automated monitoring should "just work"
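A minimal sketch of role-based monitoring configuration. The role names and check names below are hypothetical placeholders, not checks from any real plugin set. Configuration is keyed by role, so a new system only has to announce its roles to receive the right checks.

```python
# Hypothetical mapping of roles to check names (assumptions for illustration).
ROLE_CHECKS = {
    "base": ["check-cpu", "check-disk"],
    "web": ["check-http"],
    "db": ["check-db-connections"],
}


def checks_for(roles):
    """Return the de-duplicated checks for a system's roles, keeping order.

    Every system implicitly gets the "base" role; unknown roles add nothing.
    """
    checks = []
    for role in ["base", *roles]:
        for check in ROLE_CHECKS.get(role, []):
            if check not in checks:
                checks.append(check)
    return checks


web_checks = checks_for(["web"])
combined_checks = checks_for(["web", "db"])
```

Because no hostname appears anywhere in the mapping, adding a tenth or a thousandth web server requires no configuration change.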
3. Automated
Decommissioning
Terminated systems should be
automatically removed
from monitoring.
Cloud Native Monitoring Requirements
Automated Decommissioning
Cloud Concepts
● Utility computing incentivizes cost savings
● Decommission systems when not in use, or during reduced load
● Intentional actions look very similar to failure scenarios
Automated Decommissioning
Cloud monitoring anti-patterns
● Making assumptions about the lack of monitoring data
● Making assumptions about the loss of network connectivity
● Using a monitoring system as a source of absolute truth
Cloud-native monitoring requirements
● Should be invoked by the terminated system (i.e. stop signal)
● May be triggered by the provisioning system (i.e. via APIs)
● Optionally verified via external source(s) of truth (as needed)
● Must be the most reliable function of the monitoring system
Automated Decommissioning
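A minimal sketch of telling "down" from "decommissioned", following the requirements above. The function name, its parameters, and the Status values are assumptions made for illustration. An explicit stop signal is trusted first; silence alone is never treated as proof of termination, and an external source of truth (for example the provisioning system's API) is consulted only if a result from it is supplied.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    UP = "up"
    DOWN = "down"
    DECOMMISSIONED = "decommissioned"


def classify(last_seen: float, now: float, timeout: float,
             stop_signal_received: bool,
             exists_in_provider: Optional[bool] = None) -> Status:
    """Classify a system from its last keepalive and any termination evidence.

    exists_in_provider is None when no external source of truth was asked.
    """
    # The terminated system (or the provisioning system) said so explicitly.
    if stop_signal_received:
        return Status.DECOMMISSIONED
    # Keepalives are still arriving within the allowed window.
    if now - last_seen <= timeout:
        return Status.UP
    # Silent system: only call it decommissioned if the provider confirms it.
    if exists_in_provider is False:
        return Status.DECOMMISSIONED
    # Otherwise assume a failure and alert.
    return Status.DOWN


silent_but_exists = classify(100.0, 200.0, 30.0, False, True)
silent_and_gone = classify(100.0, 200.0, 30.0, False, False)
```

Defaulting the unverified case to DOWN is a deliberate choice in this sketch: a false alert is cheaper than silently dropping a system that has actually failed.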
When you can no longer trust your monitoring alerts.
Demo!
But first, some questions…
Public/Private Cloud (IaaS)
Who knows what "the cloud" is?
Who understands basic cloud computing
concepts like ASGs and ELBs?
Who is currently using an IaaS provider like
AWS, GCP, Azure, or OpenStack?
Kubernetes
Who knows what Kubernetes is?
Who has Kubernetes on their roadmap?
Who is currently using Kubernetes?
Audience participation time!
(DEMO)
QUESTIONS?
Conclusion
● Cloud computing introduces challenges that demand
cloud-native monitoring solutions.
● Monitoring solutions must automatically discover new systems.
● Monitoring configuration should be applied automatically.
● Monitoring should comprehend "down" vs "decommissioned".
Thank You

OSMC 2017 | Monitoring Challenges in a World of Automation by Anthony Goddard
