SlideShare a Scribd company logo
SRE Demystified
Distributed Monitoring
ganesh@ganeshniyer.com
ganesh.vigneswara@gmail.com,
http://ganeshniyer.com
Dr Ganesh Neelakanta Iyer
SRE
•
2https://image.slidesharecdn.com/devopssreatgooglescale-190121123035/95/devops-sre-at-google-scale-30-638.jpg?cb=1548074257
Why Monitor?
3
Analyzing long-term trends
Comparing time/iterations
Alerting
Building dashboards
How big is my database and how fast is it
growing? How quickly is my daily-active user
count growing?
Are queries faster with Acme Bucket of Bytes
2.72 versus Ajax DB 3.14? How much better
is my memcache hit rate with an extra node?
Is my site slower than it was last week?
Something is broken,
and somebody needs to
fix it right now! Or,
something might break
soon, so attention
needed soon.
Dashboards should answer basic questions
about your service, and normally include
some form of the four golden signals
Conducting ad hoc retrospective analysis
Our latency just shot up; what else happened around the same time?
https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
Setting Reasonable Expectations for Monitoring
• Simpler and faster monitoring systems, with better tools for post
hoc analysis
• Other uses of monitoring data such as capacity planning and
traffic prediction can tolerate more fragility, and thus, more
complexity
• To keep noise low and signal high, the elements of your
monitoring system that direct to a pager need to be very simple
and robust
• Rules that generate alerts for humans should be simple to
understand and represent a clear failure
4https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
Example symptoms and causes
5
Symptom Cause
I’m serving HTTP 500s or 404s Database servers are refusing connections
My responses are slow
CPUs are overloaded by a bogosort, or an Ethernet
cable is crimped under a rack, visible as partial
packet loss
Users in Antarctica aren’t receiving animated cat
GIFs
Your Content Distribution Network hates scientists
and felines, and thus blacklisted some client IPs
Private content is world-readable
A new software push caused ACLs to be forgotten
and allowed all requests
https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
Black-box monitoring White-box monitoring
• Restricted to external
service behaviour
• e.g. ping, http
6
• Internal metrics and logs to
surface system issues
• e.g. request_errors_total,
request_processing_time,
DevOps School
7https://www.devopsschool.com/blog/white-box-monitoring-and-black-box-monitoring-explained/
Four Golden Signals
8
Choosing an Appropriate Resolution for
Measurements
• Observing CPU load over the time span of a minute won’t reveal even
quite long-lived spikes that drive high tail latencies
• For a web service targeting no more than 9 hours aggregate downtime
per year (99.9% annual uptime), probing for a 200 (success) status
more than once or twice a minute is probably unnecessarily frequent.
• Similarly, checking hard drive fullness for a service targeting 99.9%
availability more than once every 1–2 minutes is probably unnecessary
9
References
10
Dr Ganesh Neelakanta Iyer
ganesh@ganeshniyer.com
ganesh.vigneswara@gmail.com

More Related Content

What's hot

Understanding parametersniffing sqlsat
Understanding parametersniffing sqlsatUnderstanding parametersniffing sqlsat
Understanding parametersniffing sqlsat
Sanil Mhatre
 
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorksPerformance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Thoughtworks
 
Ride the database in JUnit tests with Database Rider
Ride the database in JUnit tests with Database RiderRide the database in JUnit tests with Database Rider
Ride the database in JUnit tests with Database Rider
Mikalai Alimenkou
 
Ohio 2012-help-sysad-out
Ohio 2012-help-sysad-outOhio 2012-help-sysad-out
Ohio 2012-help-sysad-out
mralexjuarez
 
Modern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with KubernetesModern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with Kubernetes
Mikalai Alimenkou
 
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...
South Tyrol Free Software Conference
 
ImprovingSBHSRemotetriggered server and website
ImprovingSBHSRemotetriggered server and websiteImprovingSBHSRemotetriggered server and website
ImprovingSBHSRemotetriggered server and website
Devdutt Kumar
 
Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014
Lari Hotari
 
Baltimore share point user group june 2015
Baltimore share point user group june 2015Baltimore share point user group june 2015
Baltimore share point user group june 2015
Toby McGrail
 

What's hot (9)

Understanding parametersniffing sqlsat
Understanding parametersniffing sqlsatUnderstanding parametersniffing sqlsat
Understanding parametersniffing sqlsat
 
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorksPerformance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
 
Ride the database in JUnit tests with Database Rider
Ride the database in JUnit tests with Database RiderRide the database in JUnit tests with Database Rider
Ride the database in JUnit tests with Database Rider
 
Ohio 2012-help-sysad-out
Ohio 2012-help-sysad-outOhio 2012-help-sysad-out
Ohio 2012-help-sysad-out
 
Modern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with KubernetesModern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with Kubernetes
 
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...
 
ImprovingSBHSRemotetriggered server and website
ImprovingSBHSRemotetriggered server and websiteImprovingSBHSRemotetriggered server and website
ImprovingSBHSRemotetriggered server and website
 
Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014
 
Baltimore share point user group june 2015
Baltimore share point user group june 2015Baltimore share point user group june 2015
Baltimore share point user group june 2015
 

Similar to SRE Demystified - 06 - Distributed Monitoring

Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
rschuppe
 
Csdn Drdobbs Tenni Theurer Yahoo
Csdn Drdobbs Tenni Theurer YahooCsdn Drdobbs Tenni Theurer Yahoo
Csdn Drdobbs Tenni Theurer Yahoo
guestb1b95b
 
Perfmon And Profiler 101
Perfmon And Profiler 101Perfmon And Profiler 101
Perfmon And Profiler 101
Quest Software
 
The 5S Approach to Performance Tuning by Chuck Ezell
The 5S Approach to Performance Tuning by Chuck EzellThe 5S Approach to Performance Tuning by Chuck Ezell
The 5S Approach to Performance Tuning by Chuck Ezell
Datavail
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scale
Balvinder Hira
 
High Performance Web Sites
High Performance Web SitesHigh Performance Web Sites
High Performance Web Sites
Páris Neto
 
High Performance Websites By Souders Steve
High Performance Websites By Souders SteveHigh Performance Websites By Souders Steve
High Performance Websites By Souders Steve
w3guru
 
Plop
PlopPlop
Plop
oakleaf
 
DockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopDockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability Workshop
Kevin Crawley
 
Big server-is-watching-you
Big server-is-watching-youBig server-is-watching-you
Big server-is-watching-you
mkherlakian
 
Performance Optimization
Performance OptimizationPerformance Optimization
Performance Optimization
Neha Thakur
 
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Sascha Wenninger
 
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 PotsdamFrom Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
Andreas Grabner
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
Yahoo Developer Network
 
Super Sizing Youtube with Python
Super Sizing Youtube with PythonSuper Sizing Youtube with Python
Super Sizing Youtube with Python
didip
 
Os Solomon
Os SolomonOs Solomon
Os Solomon
oscon2007
 
Orion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning TrainingOrion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning Training
SolarWinds
 
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...
rschuppe
 
Empowering Customers with Personalized Insights
Empowering Customers with Personalized InsightsEmpowering Customers with Personalized Insights
Empowering Customers with Personalized Insights
Cloudera, Inc.
 
SenchaCon Roadshow Irvine 2017
SenchaCon Roadshow Irvine 2017SenchaCon Roadshow Irvine 2017
SenchaCon Roadshow Irvine 2017
Speedment, Inc.
 

Similar to SRE Demystified - 06 - Distributed Monitoring (20)

Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
 
Csdn Drdobbs Tenni Theurer Yahoo
Csdn Drdobbs Tenni Theurer YahooCsdn Drdobbs Tenni Theurer Yahoo
Csdn Drdobbs Tenni Theurer Yahoo
 
Perfmon And Profiler 101
Perfmon And Profiler 101Perfmon And Profiler 101
Perfmon And Profiler 101
 
The 5S Approach to Performance Tuning by Chuck Ezell
The 5S Approach to Performance Tuning by Chuck EzellThe 5S Approach to Performance Tuning by Chuck Ezell
The 5S Approach to Performance Tuning by Chuck Ezell
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scale
 
High Performance Web Sites
High Performance Web SitesHigh Performance Web Sites
High Performance Web Sites
 
High Performance Websites By Souders Steve
High Performance Websites By Souders SteveHigh Performance Websites By Souders Steve
High Performance Websites By Souders Steve
 
Plop
PlopPlop
Plop
 
DockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopDockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability Workshop
 
Big server-is-watching-you
Big server-is-watching-youBig server-is-watching-you
Big server-is-watching-you
 
Performance Optimization
Performance OptimizationPerformance Optimization
Performance Optimization
 
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
 
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 PotsdamFrom Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
Super Sizing Youtube with Python
Super Sizing Youtube with PythonSuper Sizing Youtube with Python
Super Sizing Youtube with Python
 
Os Solomon
Os SolomonOs Solomon
Os Solomon
 
Orion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning TrainingOrion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning Training
 
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...
 
Empowering Customers with Personalized Insights
Empowering Customers with Personalized InsightsEmpowering Customers with Personalized Insights
Empowering Customers with Personalized Insights
 
SenchaCon Roadshow Irvine 2017
SenchaCon Roadshow Irvine 2017SenchaCon Roadshow Irvine 2017
SenchaCon Roadshow Irvine 2017
 

More from Dr Ganesh Iyer

SRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overviewSRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overview
Dr Ganesh Iyer
 
SRE Demystified - 13 - Docs that matter -2
SRE Demystified - 13 - Docs that matter -2SRE Demystified - 13 - Docs that matter -2
SRE Demystified - 13 - Docs that matter -2
Dr Ganesh Iyer
 
SRE Demystified - 10 - Release management-1
SRE Demystified - 10 - Release management-1SRE Demystified - 10 - Release management-1
SRE Demystified - 10 - Release management-1
Dr Ganesh Iyer
 
SRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement ModelSRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement Model
Dr Ganesh Iyer
 
Machine Learning for Statisticians - Introduction
Machine Learning for Statisticians - IntroductionMachine Learning for Statisticians - Introduction
Machine Learning for Statisticians - Introduction
Dr Ganesh Iyer
 
Making Decisions - A Game Theoretic approach
Making Decisions - A Game Theoretic approachMaking Decisions - A Game Theoretic approach
Making Decisions - A Game Theoretic approach
Dr Ganesh Iyer
 
Cloud and Industry4.0
Cloud and Industry4.0Cloud and Industry4.0
Cloud and Industry4.0
Dr Ganesh Iyer
 
Game Theory and Engineering Applications
Game Theory and Engineering ApplicationsGame Theory and Engineering Applications
Game Theory and Engineering Applications
Dr Ganesh Iyer
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its Applications
Dr Ganesh Iyer
 
How to become a successful entrepreneur
How to become a successful entrepreneurHow to become a successful entrepreneur
How to become a successful entrepreneur
Dr Ganesh Iyer
 
Dockers and kubernetes
Dockers and kubernetesDockers and kubernetes
Dockers and kubernetes
Dr Ganesh Iyer
 
Containerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deploymentContainerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deployment
Dr Ganesh Iyer
 
Game Theory and Engineering Applications
Game Theory and Engineering ApplicationsGame Theory and Engineering Applications
Game Theory and Engineering Applications
Dr Ganesh Iyer
 
Demystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data ScientistsDemystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data Scientists
Dr Ganesh Iyer
 
Cloud computing for image processing and bio informatics
Cloud computing for image processing and bio informaticsCloud computing for image processing and bio informatics
Cloud computing for image processing and bio informatics
Dr Ganesh Iyer
 
Io t trends
Io t trendsIo t trends
Io t trends
Dr Ganesh Iyer
 
Trends in IoT 2017
Trends in IoT 2017Trends in IoT 2017
Trends in IoT 2017
Dr Ganesh Iyer
 
Docker 101 - High level introduction to docker
Docker 101 - High level introduction to dockerDocker 101 - High level introduction to docker
Docker 101 - High level introduction to docker
Dr Ganesh Iyer
 
DrGanesh-Jan-17-Resume-V1.0
DrGanesh-Jan-17-Resume-V1.0DrGanesh-Jan-17-Resume-V1.0
DrGanesh-Jan-17-Resume-V1.0
Dr Ganesh Iyer
 
Simplify enterprise IT with no code platform - aPaaS
Simplify enterprise IT with no code platform - aPaaSSimplify enterprise IT with no code platform - aPaaS
Simplify enterprise IT with no code platform - aPaaS
Dr Ganesh Iyer
 

More from Dr Ganesh Iyer (20)

SRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overviewSRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overview
 
SRE Demystified - 13 - Docs that matter -2
SRE Demystified - 13 - Docs that matter -2SRE Demystified - 13 - Docs that matter -2
SRE Demystified - 13 - Docs that matter -2
 
SRE Demystified - 10 - Release management-1
SRE Demystified - 10 - Release management-1SRE Demystified - 10 - Release management-1
SRE Demystified - 10 - Release management-1
 
SRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement ModelSRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement Model
 
Machine Learning for Statisticians - Introduction
Machine Learning for Statisticians - IntroductionMachine Learning for Statisticians - Introduction
Machine Learning for Statisticians - Introduction
 
Making Decisions - A Game Theoretic approach
Making Decisions - A Game Theoretic approachMaking Decisions - A Game Theoretic approach
Making Decisions - A Game Theoretic approach
 
Cloud and Industry4.0
Cloud and Industry4.0Cloud and Industry4.0
Cloud and Industry4.0
 
Game Theory and Engineering Applications
Game Theory and Engineering ApplicationsGame Theory and Engineering Applications
Game Theory and Engineering Applications
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its Applications
 
How to become a successful entrepreneur
How to become a successful entrepreneurHow to become a successful entrepreneur
How to become a successful entrepreneur
 
Dockers and kubernetes
Dockers and kubernetesDockers and kubernetes
Dockers and kubernetes
 
Containerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deploymentContainerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deployment
 
Game Theory and Engineering Applications
Game Theory and Engineering ApplicationsGame Theory and Engineering Applications
Game Theory and Engineering Applications
 
Demystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data ScientistsDemystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data Scientists
 
Cloud computing for image processing and bio informatics
Cloud computing for image processing and bio informaticsCloud computing for image processing and bio informatics
Cloud computing for image processing and bio informatics
 
Io t trends
Io t trendsIo t trends
Io t trends
 
Trends in IoT 2017
Trends in IoT 2017Trends in IoT 2017
Trends in IoT 2017
 
Docker 101 - High level introduction to docker
Docker 101 - High level introduction to dockerDocker 101 - High level introduction to docker
Docker 101 - High level introduction to docker
 
DrGanesh-Jan-17-Resume-V1.0
DrGanesh-Jan-17-Resume-V1.0DrGanesh-Jan-17-Resume-V1.0
DrGanesh-Jan-17-Resume-V1.0
 
Simplify enterprise IT with no code platform - aPaaS
Simplify enterprise IT with no code platform - aPaaSSimplify enterprise IT with no code platform - aPaaS
Simplify enterprise IT with no code platform - aPaaS
 

Recently uploaded

zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
Edge AI and Vision Alliance
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 

Recently uploaded (20)

zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 

SRE Demystified - 06 - Distributed Monitoring

  • 3. Why Monitor? 3 Analyzing long-term trends Comparing time/iterations Alerting Building dashboards How big is my database and how fast is it growing? How quickly is my daily-active user count growing? Are queries faster with Acme Bucket of Bytes 2.72 versus Ajax DB 3.14? How much better is my memcache hit rate with an extra node? Is my site slower than it was last week? Something is broken, and somebody needs to fix it right now! Or, something might break soon, so attention needed soon. Dashboards should answer basic questions about your service, and normally include some form of the four golden signals Conducting ad hoc retrospective analysis Our latency just shot up; what else happened around the same time? https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
  • 4. Setting Reasonable Expectations for Monitoring • Simpler and faster monitoring systems, with better tools for post hoc analysis • Other uses of monitoring data such as capacity planning and traffic prediction can tolerate more fragility, and thus, more complexity • To keep noise low and signal high, the elements of your monitoring system that direct to a pager need to be very simple and robust • Rules that generate alerts for humans should be simple to understand and represent a clear failure 4https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
  • 5. Example symptoms and causes 5 Symptom Cause I’m serving HTTP 500s or 404s Database servers are refusing connections My responses are slow CPUs are overloaded by a bogosort, or an Ethernet cable is crimped under a rack, visible as partial packet loss Users in Antarctica aren’t receiving animated cat GIFs Your Content Distribution Network hates scientists and felines, and thus blacklisted some client IPs Private content is world-readable A new software push caused ACLs to be forgotten and allowed all requests https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
  • 6. Black-box monitoring White-box monitoring • Restricted to external service behaviour • e.g. ping, http 6 • Internal metrics and logs to surface system issues • e.g. request_errors_total, request_processing_time, DevOps School
  • 9. Choosing an Appropriate Resolution for Measurements • Observing CPU load over the time span of a minute won’t reveal even quite long-lived spikes that drive high tail latencies • For a web service targeting no more than 9 hours aggregate downtime per year (99.9% annual uptime), probing for a 200 (success) status more than once or twice a minute is probably unnecessarily frequent. • Similarly, checking hard drive fullness for a service targeting 99.9% availability more than once every 1–2 minutes is probably unnecessary 9
  • 11. Dr Ganesh Neelakanta Iyer ganesh@ganeshniyer.com ganesh.vigneswara@gmail.com