Monitoring & Alerting
Quick dive
How much do outages cost us?
Facebook - $500k in just 30 min of outage in 2014
Amazon - $66k/min
Industry average - $300k/hour
Industry total lost revenue - $26.5B
What is monitoring?
The process of becoming aware of the state of a system.
Is my website up and accessible?
Does all the important functionality work?
Is each server up?
Are all the applications we deployed up?
What is the CPU, disk, memory, and swap usage on each machine?
Start simple
Basic monitoring systems that you can try straight away:
● Google Analytics (Android, iOS, Unity, HTTP, analytics.js)
● Fabric (Crashlytics integration for Android and iOS)
You can also check this detailed comparison table of different monitoring systems.
What does monitoring help with?
● Early problem detection
● Decision making
● Automation
Early problem detection
Performance
● Monitoring for anomalies in system behavior helps detect resource
saturation and rare defects that are hard for QA to spot
● Some bugs appear only under heavy load: they are hard to reproduce in test
environments, but show up consistently in production
Availability
● Downtime usually translates directly to losses in revenue and credibility
● 99.99% availability is a common industry target (about 52.6 minutes of downtime per year)
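An availability target translates directly into a downtime budget. The minimal sketch below does that arithmetic, assuming a 365-day year and ignoring planned maintenance windows; the function name is illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Maximum downtime (in minutes) allowed over the period for a given availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return period_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
```

Each extra "nine" divides the budget by ten: 99.99% leaves roughly 52.6 minutes per year, or about 4.3 minutes in a 30-day month.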
Decision making
Baselining
● Know the normal, average state of your system (baseline)
● Data-backed Service-Level Agreements (SLAs)
● In-depth performance analysis, leading to cost savings
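A baseline is usually expressed as a few summary statistics over a representative period. The sketch below computes the mean and nearest-rank percentiles (p50, p95, p99) of latency samples; the choice of these particular statistics is an assumption, not something the slides prescribe.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def baseline(samples):
    """Summarize 'normal' behavior, e.g. request latencies collected over a typical week."""
    return {
        "mean": statistics.fmean(samples),
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }
```

Percentiles are often preferred over the mean alone because a handful of very slow requests can hide behind a healthy-looking average.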
Predictions
● Help predict normal traffic levels during peaks of activity, such as
holidays and social events (capacity planning)
● Close interaction with monitoring may help predict business trends
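As a rough illustration of capacity planning from monitoring data, the sketch below scales a measured baseline by a peak multiplier observed during a past event, adds safety headroom, and divides by per-server capacity. The sizing rule, the 30% default headroom, and all parameter names are hypothetical assumptions for the example.

```python
import math


def servers_needed(baseline_rps: float, peak_multiplier: float,
                   rps_per_server: float, headroom: float = 0.3) -> int:
    """Hypothetical sizing rule: projected peak plus headroom, divided by per-server capacity."""
    if rps_per_server <= 0:
        raise ValueError("rps_per_server must be positive")
    projected_peak = baseline_rps * peak_multiplier
    return math.ceil(projected_peak * (1 + headroom) / rps_per_server)
```

For example, a 2,000 req/s baseline, a 3.5x holiday peak seen last year, and 500 req/s per server gives 7,000 req/s at peak, 9,100 req/s with headroom, and therefore 19 servers.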
Automation
Allows the system to adapt automatically to high-load situations.
Bursts of input may saturate a system's capacity, forcing it to drop
some traffic. Rather than giving every user a uniformly bad experience,
the system deliberately rejects a portion of the inputs so the rest can
be served normally. This is commonly known as admission control.
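One simple form of admission control is probabilistic load shedding: below a soft limit every request is admitted, above a hard limit every request is rejected, and in between the rejection probability rises linearly. The sketch below is one such policy under those assumptions; the class name and limits are illustrative, and real systems often use queue latency or CPU rather than in-flight count as the load signal.

```python
import random


class AdmissionController:
    """Probabilistic load shedding based on the number of in-flight requests."""

    def __init__(self, soft_limit: int, hard_limit: int, rng=None):
        if not 0 < soft_limit < hard_limit:
            raise ValueError("require 0 < soft_limit < hard_limit")
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.in_flight = 0
        self.rng = rng or random.Random()

    def reject_probability(self) -> float:
        if self.in_flight < self.soft_limit:
            return 0.0
        if self.in_flight >= self.hard_limit:
            return 1.0
        # Linear ramp between the soft and hard limits.
        return (self.in_flight - self.soft_limit) / (self.hard_limit - self.soft_limit)

    def try_admit(self) -> bool:
        """Admit the request (and count it) unless it is randomly shed."""
        if self.rng.random() < self.reject_probability():
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Call when an admitted request finishes."""
        self.in_flight = max(0, self.in_flight - 1)
```

Shedding randomly, rather than rejecting only after a hard cutoff, spreads the degradation thinly instead of failing every request once the system tips over.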
Monitoring system architecture
● Data collection
● Data aggregation and storage
● Presentation
Data collection
Data sources include logs, device statistics, and system measurements, for example:
● Logging network request failure rates (4xx, 5xx)
● Tracking performance of calls to individual remote services
● Database calls and response time
● Disk and CPU usage
● Logging mobile client analytics events
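Two of the collection points above, request failure rates and call timings, can be sketched in a few lines. The in-process `LATENCIES_MS` store and the metric names are hypothetical; a real setup would ship these values to a metrics backend rather than keep them in memory.

```python
import time
from collections import Counter, defaultdict
from functools import wraps

# Hypothetical in-process store: metric name -> list of durations in milliseconds.
LATENCIES_MS = defaultdict(list)


def timed(name):
    """Decorator that records how long each call takes, even if it raises."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                LATENCIES_MS[name].append((time.perf_counter() - start) * 1000)
        return wrapper
    return decorator


def failure_rates(status_codes):
    """Fraction of HTTP responses in the 4xx and 5xx classes."""
    total = len(status_codes)
    if total == 0:
        return {"4xx": 0.0, "5xx": 0.0}
    classes = Counter(code // 100 for code in status_codes)
    return {"4xx": classes[4] / total, "5xx": classes[5] / total}
```

Usage: decorate a remote or database call with `@timed("db.query")`, and periodically compute `failure_rates` over the status codes seen in the last interval.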
Data aggregation and storage
● Incoming data points are grouped by their properties and stored as time series
● The resulting time series are submitted to an alarm evaluation engine, which
generates alarms when anomalies are detected (anomaly detection).
One widely used system for storing such time series is Graphite.
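The aggregation step can be sketched as bucketing raw points into fixed intervals and averaging each bucket, then emitting lines in Graphite's plaintext format (`<metric path> <value> <timestamp>`), which Carbon accepts on TCP port 2003 by default. The metric path, the 60-second interval, and averaging as the aggregate are assumptions for this example.

```python
import socket
from collections import defaultdict


def aggregate(points, interval=60):
    """Group (unix_timestamp, value) points into fixed buckets and average each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = int(ts) - int(ts) % interval
        buckets[bucket_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


def to_graphite_lines(metric_path, series):
    """Format an aggregated series using Graphite's plaintext protocol."""
    return "".join(f"{metric_path} {value} {ts}\n" for ts, value in series.items())


def send_to_graphite(payload, host="localhost", port=2003):
    """Send plaintext lines to a Carbon listener (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))
```

Usage: `send_to_graphite(to_graphite_lines("web.latency_ms", aggregate(points)))`, where `web.latency_ms` is a hypothetical metric name.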
Presentation
Allows visualisation of the real-time state of the system. When a fault is identified
and fixed, the correction should be immediately visible.
One powerful tool for dashboarding is Grafana:
● Integrates with Graphite, InfluxDB, OpenTSDB, and KairosDB
● Introduction and basic concepts can be found here
● Useful video on how to setup your first dashboard
● Give it a try
Alerting
Alerting is the capability of a monitoring system to detect meaningful events
and notify the engineer about them.
Levels of alert urgency
● Alerts as records: anomalies that do not affect service functionality.
● Alerts as notifications: need attention, but not immediately.
● Alerts as pages: high severity; response time is enforced by internal SLAs.
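The three urgency levels map naturally onto different delivery channels. The sketch below classifies an alert from two yes/no questions and routes it; the classification rule and the channel names ("log", "ticket-queue", "pager") are hypothetical placeholders, not part of any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class Urgency(Enum):
    RECORD = "record"              # no impact on service functionality
    NOTIFICATION = "notification"  # needs attention, not immediately
    PAGE = "page"                  # high severity, wake someone up


@dataclass
class Alert:
    name: str
    urgency: Urgency


def classify(affects_users: bool, needs_immediate_action: bool) -> Urgency:
    """Hypothetical rule: user impact plus urgency decides the level."""
    if not affects_users:
        return Urgency.RECORD
    if needs_immediate_action:
        return Urgency.PAGE
    return Urgency.NOTIFICATION


# Placeholder channel names; a page would typically go to an on-call tool.
ROUTES = {
    Urgency.RECORD: "log",
    Urgency.NOTIFICATION: "ticket-queue",
    Urgency.PAGE: "pager",
}


def route(alert: Alert) -> str:
    return ROUTES[alert.urgency]
```

Keeping pages rare is the point of the split: only alerts that need a human right now should interrupt one.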
Tools
● PagerDuty
● OpsGenie
● VictorOps
Anomaly detection
The identification of items, events or observations which do not conform to an
expected pattern or other items in a dataset.
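A very simple anomaly detector compares each new value with the mean and standard deviation of a trailing window and flags it when it lies more than a few standard deviations away (a rolling z-score). This is a swapped-in textbook technique for illustration, not the method Uber or any specific product uses; the window size and threshold are assumptions.

```python
import statistics
from collections import deque


def detect_anomalies(values, window=20, threshold=3.0):
    """Return indices of values that deviate more than `threshold` standard
    deviations from the mean of the previous `window` values."""
    history = deque(maxlen=window)
    anomalies = []
    for i, v in enumerate(values):
        if len(history) == window:
            recent = list(history)
            mean = statistics.fmean(recent)
            std = statistics.pstdev(recent)
            # A perfectly flat history has std == 0: any change counts as anomalous.
            if (std == 0 and v != mean) or (std > 0 and abs(v - mean) / std > threshold):
                anomalies.append(i)
        history.append(v)
    return anomalies
```

Note that a flagged value still enters the window, which temporarily widens the band; production detectors typically also model seasonality (daily and weekly cycles) so that expected peaks are not reported as anomalies.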
Let’s see how Uber does it.
Issue is detected and fixed, now what?
Detecting and fixing an issue are only the first steps; we also need to make sure the
issue does not happen again.
Postmortems are one useful approach.
Challenges
● Baselining
● Coverage
● Manageability
● Accuracy
● Context
● Human nature
Conclusion
● Get in the habit of measuring: you cannot manage what you cannot measure
● Monitor extensively
● Alarm selectively
● Work smart, not hard: learn from the experience of others
● Have a tactic
Further reading: Effective Monitoring and Alerting
Thank you!
Contact:
sabin.roman@gmail.com
https://nl.linkedin.com/in/sabinroman
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 

Monitoring & alerting presentation sabin&mustafa

  • 2. How much do outages cost us? ● Facebook: $500k for a 30-minute outage in 2014 ● Amazon: $66k/min ● Industry average: $300k/hour ● Total industry lost revenue: $26.5B
  • 3. What is monitoring? The process of becoming aware of the state of a system. Is my website up and accessible? Does all the important functionality work? Is each server up? Are all the applications we deployed up? What is the CPU, disk, memory, and swap usage of each machine?
  • 4. Start simple Basic monitoring systems that you can try straight away: ● Google Analytics (Android, iOS, Unity, HTTP, analytics.js) ● Fabric (Crashlytics integration for Android and iOS) You can also consult a detailed comparison table of different monitoring systems.
  • 5. What does monitoring help with? ● Early problem detection ● Decision making ● Automation
  • 6. Early problem detection Performance ● Monitoring for anomalies in system behavior helps detect resource saturation and rare defects that are hard for QA to spot ● Some bugs appear only under heavy load: they are hard to reproduce in test environments but show up consistently in production Availability ● Downtime usually translates directly into lost revenue and credibility ● 99.99% availability is a common industry target (about 52.6 min of downtime per year)
  • 7. Decision making Baselining ● Know the normal, average state of your system (baseline) ● Data-backed Service-Level Agreements (SLAs) ● In-depth performance analysis that saves costs Predictions ● Predict normal traffic levels during peaks of activity such as holidays and social events (capacity planning) ● Close interaction with monitoring may help predict business trends
  • 8. Automation Allows the system to adapt automatically to high-load situations. Bursts of input may saturate a system’s capacity, forcing it to drop some traffic. Rather than giving every user a uniformly bad experience, the system deliberately rejects a portion of the inputs. This is commonly known as admission control.
  • 9. Monitoring system architecture ● Data collection ● Data aggregation and storage ● Presentation
  • 10. Data collection Data comes from logs, device statistics, and system measurements: ● Logging network request failure rates (4xx, 5xx) ● Tracking the performance of calls to individual remote services ● Database calls and response times ● Disk and CPU usage ● Logging mobile client analytics events
  • 11. Data aggregation and storage ● Incoming data points are grouped by their properties and stored as time series ● The resulting time series are submitted to an alarm evaluation engine, which generates alarms when anomalies are detected (anomaly detection). Graphite is one system commonly used to store such time series.
  • 12. Presentation Visualizes the real-time state of the system. When a fault is identified and fixed, the correction should be immediately visible. One powerful dashboarding tool is Grafana: ● Integrates with Graphite, InfluxDB, OpenTSDB, and KairosDB ● Introduction and basic concepts can be found here ● Useful video on how to set up your first dashboard ● Give it a try
  • 13. Alerting Alerting is the capability of a monitoring system to detect meaningful events and notify engineers about them.
  • 14. Levels of alert urgency ● Alerts as records: anomalies that do not impact service functionality ● Alerts as notifications: do not need immediate attention ● Alerts as pages: high severity, with response time enforced by internal SLAs
  • 16. Anomaly detection The identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. Let’s see how Uber does it.
  • 17. Issue detected and fixed: now what? Detecting and fixing an issue are only the first steps; we also need to make sure the issue does not happen again. Postmortems are one useful approach.
  • 18. Challenges ● Baselining ● Coverage ● Manageability ● Accuracy ● Context ● Human nature
  • 19. Conclusion ● Get in the habit of measuring: you cannot manage what you cannot measure ● Monitor extensively ● Alarm selectively ● Work smart, not hard: learn from the experience of others ● Have a tactic Further reading: Effective Monitoring and Alerting
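Slide 3 opens with "Is my website up and accessible?". A minimal sketch of such a check using only Python's standard library follows; the function name `check_http`, the 5-second default timeout, and the rule that any 2xx/3xx status counts as "up" are assumptions for illustration, not part of the original slides.

```python
import urllib.error
import urllib.request


def check_http(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical probe: True if `url` answers with a 2xx/3xx status in time.

    urlopen follows redirects and raises HTTPError (a URLError subclass)
    for 4xx/5xx responses, so those are reported as "down" here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts and HTTP errors.
        return False
```

A scheduler (cron, a monitoring agent) would call this periodically and feed the result into the alerting pipeline; a real probe would usually also check the response body for a known marker, since a 200 page can still be broken.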
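Slide 6 pairs 99.99% availability with an annual downtime figure. The arithmetic behind such figures is allowed downtime = window × (1 - availability); for a 365.25-day year (525,960 minutes), 99.99% leaves about 52.6 minutes. The sketch below assumes that year length; real SLAs may define the measurement window differently (per month or per quarter), which is why the window is a parameter.

```python
# Assumes a 365.25-day year; pass a different window for monthly SLAs etc.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_budget_minutes(availability_pct: float,
                            window_minutes: float = MINUTES_PER_YEAR) -> float:
    """Minutes of downtime allowed in `window_minutes` for an availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return window_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
```

Each extra "nine" divides the budget by ten, which is why moving from 99.9% to 99.99% usually requires automated detection rather than someone noticing a problem by hand.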
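Slide 7 recommends knowing the normal, average state of your system. One simple way to compute such a baseline, assuming you already have historical samples of a metric such as request latency, is to summarize them with a mean, standard deviation, median, and 95th percentile. The `Baseline` type and `compute_baseline` function are hypothetical names for this sketch.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Baseline:
    """Summary of a metric's "normal" behaviour over some historical window."""
    mean: float
    stdev: float
    p50: float
    p95: float


def compute_baseline(samples: Sequence[float]) -> Baseline:
    """Summarize historical samples (e.g. latency in ms) into a baseline."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a baseline")
    # quantiles(n=100) returns the 99 cut points p1..p99; index 94 is p95.
    cuts = statistics.quantiles(samples, n=100)
    return Baseline(
        mean=statistics.fmean(samples),
        stdev=statistics.stdev(samples),
        p50=statistics.median(samples),
        p95=cuts[94],
    )
```

Comparing new readings against, say, `p95` or `mean + 3 * stdev` is one way to decide what counts as abnormal; the right history window (a day, a week, a season) depends on how strongly your traffic varies.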
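Slide 8 describes admission control: under a burst, reject a portion of inputs instead of degrading every request. A token bucket is one common way to implement this; it is a swapped-in technique, since the slides do not name a specific algorithm. In this sketch `capacity` bounds the burst size and `refill_rate` sets the sustained admission rate; both values and the `TokenBucket` class itself are illustrative assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Admit requests while tokens are available; reject the rest.

    capacity:    maximum burst size, in tokens
    refill_rate: tokens added per second
    clock:       injectable time source (defaults to time.monotonic)
    """

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if capacity <= 0 or refill_rate < 0:
            raise ValueError("capacity must be > 0 and refill_rate >= 0")
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last call, up to capacity.
        now = self._clock()
        elapsed = max(0.0, now - self._last)
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now

    def try_admit(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request is admitted."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # shed this request instead of slowing everyone down
```

Rejected requests would typically receive a fast error (for example HTTP 429 or 503) so that clients can back off; the injectable `clock` exists only to make the behaviour easy to test.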
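Slide 10 lists request failure rates (4xx, 5xx) as a data source. The sketch below derives those rates from access-log lines, assuming the Common Log Format (or the Combined format that extends it), where the status code directly follows the quoted request line. The regex and the `failure_rates` name are assumptions; adapt them to your own log format.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# In Common/Combined Log Format the status code directly follows the
# quoted request line, e.g.: ... "GET /index.html HTTP/1.1" 503 1234
STATUS_RE = re.compile(r'"[^"]*" (\d{3})\b')


def failure_rates(lines: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of parsed requests that ended in 4xx and in 5xx."""
    classes: Counter = Counter()
    total = 0
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # skip lines that are not access-log entries
        total += 1
        classes[match.group(1)[0] + "xx"] += 1  # "404" -> "4xx"
    if total == 0:
        return {"4xx": 0.0, "5xx": 0.0}
    return {"4xx": classes["4xx"] / total, "5xx": classes["5xx"] / total}
```

In practice these rates would be computed per time window (for example per minute) and shipped to the time-series store, rather than over a whole log file.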
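Slide 11 describes grouping incoming data into time series and storing them in a system such as Graphite. As a sketch, the code below averages raw (timestamp, value) samples into fixed 60-second buckets and formats them for Graphite's plaintext protocol, in which each line is `<metric path> <value> <unix timestamp>` and is conventionally sent to the Carbon daemon on TCP port 2003. The metric name, the bucket size, and the choice to pre-aggregate on the client are assumptions; Graphite can also aggregate server-side.

```python
import socket
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]  # (unix timestamp in seconds, value)


def bucket_average(points: Iterable[Point],
                   bucket_seconds: int = 60) -> List[Tuple[int, float]]:
    """Average raw samples into windows aligned to multiples of bucket_seconds."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in points:
        start = int(ts) - int(ts) % bucket_seconds
        sums[start] += value
        counts[start] += 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]


def to_graphite_lines(metric_path: str,
                      series: Iterable[Tuple[int, float]]) -> str:
    """Render (timestamp, value) pairs in Graphite's plaintext format."""
    return "".join(f"{metric_path} {value} {ts}\n" for ts, value in series)


def send_to_graphite(payload: str, host: str = "localhost", port: int = 2003,
                     timeout: float = 5.0) -> None:
    """Push plaintext lines to a Carbon listener (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("ascii"))
```

A hypothetical call would be `send_to_graphite(to_graphite_lines("web.api.latency_ms", bucket_average(samples)))`, after which Grafana can chart the series directly from Graphite.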
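Slide 14 distinguishes three levels of alert urgency: records, notifications, and pages. The sketch below maps one metric onto them, assuming two thresholds (`notify_at`, `page_at`) and requiring a breach to persist for several consecutive samples before anyone is notified or paged. The thresholds, the `sustained` window, and the `Urgency`/`classify` names are all hypothetical.

```python
from enum import Enum
from typing import Optional, Sequence


class Urgency(Enum):
    RECORD = "record"              # keep for later investigation
    NOTIFICATION = "notification"  # someone should look soon, not now
    PAGE = "page"                  # needs an immediate response


def classify(values: Sequence[float], notify_at: float, page_at: float,
             sustained: int = 3) -> Optional[Urgency]:
    """Map the latest samples of a metric (higher = worse) to an urgency level.

    Returns None when nothing in the recent window crossed notify_at.
    """
    if page_at < notify_at:
        raise ValueError("page_at must be >= notify_at")
    if sustained < 1:
        raise ValueError("sustained must be at least 1")
    recent = list(values[-sustained:])
    full_window = len(recent) == sustained
    if full_window and all(v >= page_at for v in recent):
        return Urgency.PAGE
    if full_window and all(v >= notify_at for v in recent):
        return Urgency.NOTIFICATION
    if any(v >= notify_at for v in recent):
        return Urgency.RECORD  # transient spike: record it, interrupt no one
    return None
```

Requiring a sustained breach is one way to "alarm selectively": a single slow sample is recorded, while only a persistent problem wakes someone up.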
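Slide 16 defines anomaly detection and points to Uber's approach, which is not reproduced here. As a much simpler stand-in, the sketch below flags a point when it lies at least `threshold` standard deviations from the mean of the preceding `window` points (a rolling z-score). The window length and threshold are assumptions, and this technique works poorly on strongly seasonal data.

```python
import statistics
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def zscore_anomalies(values: Iterable[float], window: int = 30,
                     threshold: float = 3.0) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, z) for points far from the trailing-window mean."""
    if window < 2:
        raise ValueError("window must hold at least two points")
    history: Deque[float] = deque(maxlen=window)
    for i, value in enumerate(values):
        if len(history) == window:
            past = list(history)
            mean = statistics.fmean(past)
            stdev = statistics.pstdev(past)
            if stdev > 0:  # a perfectly flat history gives no scale to compare against
                z = (value - mean) / stdev
                if abs(z) >= threshold:
                    yield i, value, z
        # Anomalous points also enter the history; excluding them would make the
        # baseline more robust but slower to adapt to a genuine level shift.
        history.append(value)
```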

Editor's Notes

  1. Today we will discuss what we love most in engineering: being woken up at 4 a.m. because of a bug! We will talk about how to detect problems with your application and how to fix them as quickly as possible.
  2. Has anybody used these tools?
  3. The ability to predict demands and then match them based on seasonality translates directly into revenue gains
  4. When a data store that supports a user-facing service starts serving queries much slower than usual, but not slow enough to make an appreciable difference in the overall service’s response time, that should generate a low-urgency alert that is recorded in your monitoring system for future reference or investigation but does not interrupt anyone’s work (an alert as a record). By contrast, a data store that is running low on disk space and should be scaled out in the next several days warrants an alert as a notification.
  5. Pics, charts, examples, how much time it takes to set up the system, conclusion, pitfalls.
  6. Baselining: “nothing endures but change.” Coverage: systems evolve, and so should the coverage.
  7. Tactic: runbooks, e.g. for an 80% disk storage issue.
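The runbook note mentions an 80% disk storage issue. A check that could trigger such a runbook step is sketched below with `shutil.disk_usage` from Python's standard library; the 80% threshold comes from the note, while the function names and the default path are assumptions. `used / total` may differ slightly from the percentage `df` reports, because filesystems can reserve blocks for the superuser.

```python
import shutil


def usage_percent(total_bytes: int, used_bytes: int) -> float:
    """Percentage of capacity in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def disk_needs_attention(path: str = "/", threshold_pct: float = 80.0) -> bool:
    """True when the filesystem holding `path` is at or above threshold_pct full."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage_percent(usage.total, usage.used) >= threshold_pct
```

The runbook itself would then describe the human steps (which logs to rotate, how to scale out the volume) once this check fires.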