Practical Monitoring Techniques
Today's Talk
● Our Mission
● Current Tools
● Increasing Coverage
● PD Schedules
● Automatic Self Healing
● Bots And Alerts channels
● Events Dashboard
● Dashboard Accessibility
● Best Practices Summary
Our Mission
Back up culture with the proper tools to support it
Current Tools
● Metrics collection: collectd, StatsD, CloudWatch
● Monitoring: Sensu, NewRelic
● Alert channels: PagerDuty, email, Slack
● Dashboards: Grafana, CloudWatch, NewRelic
● Application testing: E2E Testing System
● Internal tools: Sensu mobile, events system, Sensu bar and more
Increasing Coverage
● Automatic collection of basic system and 3rd party metrics for new instances
● Add alerts automatically for each new instance of an existing subscriber
● Each Developer / DevOps engineer is responsible for monitoring their own application / infrastructure
● Easy method to add new alerts and dashboards
● Automatic events flow
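
One way to read the second bullet is that alert definitions are attached to a subscription (a role) rather than to a host, so a new instance inherits them as soon as it registers. The sketch below assumes that model; the check names, the command strings and the `checks_for_instance` helper are hypothetical and are not taken from the talk or from Sensu's API.

```python
# Hypothetical check templates keyed by subscription (role) name.
# The command strings are placeholders, not real plugin invocations.
CHECKS_BY_SUBSCRIPTION = {
    "base": [
        {"name": "cpu", "command": "check-cpu --warn 80 --crit 95"},
        {"name": "disk", "command": "check-disk --warn 85 --crit 95"},
    ],
    "kafka": [
        {"name": "kafka-lag", "command": "check-kafka-lag --crit 10000"},
    ],
}


def checks_for_instance(hostname, subscriptions):
    """Return the checks a new instance inherits from its subscriptions."""
    checks = []
    # Every instance gets the base system checks, then its role-specific ones.
    for sub in ["base"] + [s for s in subscriptions if s != "base"]:
        for template in CHECKS_BY_SUBSCRIPTION.get(sub, []):
            checks.append({**template, "host": hostname, "subscription": sub})
    return checks
```

With this shape, adding a check to a subscription covers every current and future instance of that role, which is what keeps coverage from lagging behind new capacity.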
Pager Schedules
● Divided into logical groups of ownership
● Each schedule has an escalation point
● The on-call engineer should be able to connect and respond to issues in their area
● Easy method to override a schedule
● Ability to contact the relevant on-call engineer
● Ability to page the relevant on-call engineer
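
The schedule properties above (ownership groups, an escalation point, overrides) can be sketched as a small data model. This is an illustration only: the `Schedule` and `Override` classes, the weekly rotation rule and the names used are assumptions, not PagerDuty's data model or API.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Override:
    start: datetime
    end: datetime
    person: str


@dataclass
class Schedule:
    group: str          # logical ownership group, e.g. "data-infra"
    rotation: list      # people in weekly rotation order
    escalation: str     # escalation point if the on-call does not respond
    overrides: list = field(default_factory=list)

    def on_call(self, now):
        """Return who is on call at `now`, honoring overrides first."""
        for o in self.overrides:
            if o.start <= now < o.end:
                return o.person
        week = now.isocalendar()[1]  # ISO week number picks the rotation slot
        return self.rotation[week % len(self.rotation)]

    def page_targets(self, now):
        """Ordered list of people to page: on-call first, then escalation."""
        return [self.on_call(now), self.escalation]
```

Keeping overrides as plain time ranges checked before the rotation is what makes a schedule swap a one-line change rather than a rotation edit.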
Automatic Self Healing
● Better MTTR
● Avoid waking the on-call engineer if possible
● Log healing activity to surface recurring issues
● Limit the healing to avoid restart loops
● Make sure to keep Healer ↔ Alert in sync
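
A minimal sketch of the "limit the healing" and "log activity" bullets, assuming a sliding-window cap on remediation attempts per host and check. The `Healer` class, its parameters and the outcome strings are hypothetical; the talk does not describe how the actual healer is implemented.

```python
import time
from collections import defaultdict, deque


class Healer:
    """Runs a remediation at most `max_attempts` times per `window` seconds."""

    def __init__(self, max_attempts=3, window=3600, clock=time.time):
        self.max_attempts = max_attempts
        self.window = window
        self.clock = clock
        self.history = defaultdict(deque)  # (host, check) -> attempt timestamps
        self.log = []                      # activity log for spotting recurring issues

    def handle(self, host, check, remediate):
        """Try to heal; return "healed", "failed" or "escalate"."""
        now = self.clock()
        attempts = self.history[(host, check)]
        # Forget attempts that fell out of the sliding window.
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            outcome = "escalate"  # limit hit: hand over to a human instead of looping
        else:
            attempts.append(now)
            outcome = "healed" if remediate() else "failed"
        self.log.append((now, host, check, outcome))
        return outcome
```

Returning "escalate" rather than silently giving up is one way to keep the healer and the alert in sync: the alert only reaches the on-call once automatic healing has been exhausted.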
Bots, Integrations and Alerts Channels
● Alert channels: email, Slack, PD mobile, SMS, calls
● Integrations: Sensu to PD/Slack, CloudWatch to PD, 3rd party (e.g. CouchBase, NewRelic) to PD
● Slack Bot:
Events Dashboard
● Simple REST API for sending events
● Clean timeline view to spot production events
● Connections between events (“depends on” and “dependents”)
● Detailed view for each event
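
To make the "depends on" / "dependents" relationship concrete, here is a sketch of what an event record and its JSON body might look like. The field names, the `Event` class and both helpers are assumptions for illustration; the talk does not show the real API's schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Event:
    id: str
    title: str
    timestamp: str                                   # ISO 8601 string
    depends_on: list = field(default_factory=list)   # ids of upstream events


def to_payload(event):
    """Serialize an event as the JSON body a client might POST."""
    return json.dumps(asdict(event), sort_keys=True)


def dependents(events):
    """Invert `depends_on`: map each event id to the events that depend on it."""
    index = {e.id: [] for e in events}
    for e in events:
        for upstream in e.depends_on:
            index.setdefault(upstream, []).append(e.id)
    return index
```

Storing only `depends_on` on each event and deriving `dependents` on read keeps the write path simple: a sender needs to know what it depends on, never what depends on it.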
Accessibility
● Available from everywhere by mobile
● Easy to ack, resolve, and mute alerts
● Slack bots to reach help
● Automatically get a graph with the alert
● Ability to search, edit, copy, etc. alerts
● Treat alert management as code (SVC, DB, backups, etc.)
Best Practices Summary
● Share the pain
● Automate base metrics
● Automate healing
● Make help reachable
● Make it easy to add alerts and dashboards
● Use warning levels as soft events to avoid phone calls at night
● Automate graphs in alerts
● Run a positive check of the alerting system each day
● Define dependencies between alerts
● Hold postmortems
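
The "warning levels as soft events" practice can be sketched as severity-based routing. The channel names and the `route` helper below are hypothetical; the point is only that warnings go to channels that do not page anyone.

```python
WARNING, CRITICAL = "warning", "critical"

# Hypothetical channel names; what matters is the soft vs. paging split.
ROUTES = {
    WARNING: ["slack", "email"],                # soft event: nobody is woken up
    CRITICAL: ["slack", "email", "pagerduty"],  # pages the on-call engineer
}


def route(alert):
    """Return the channels an alert should be sent to, based on its severity."""
    # Unknown or missing severities page, so a misconfigured alert is never dropped.
    return ROUTES.get(alert.get("severity"), ROUTES[CRITICAL])
```

Defaulting unknown severities to the paging route is a deliberate choice in this sketch: a noisy page is easier to notice and fix than an alert that silently goes nowhere.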
Questions
