Prometheus
A next-generation monitoring system
Fabian Reinartz – Production Engineer, SoundCloud Ltd.
Monitoring at SC 2012 – from monolith ...
... to micro services
Monitoring at SC 2012
Service A
Service B
Service C
StatsD Graphite
History – monitoring at SoundCloud 2012
Source: http://eugenedvorkin.com/seven-micro-services-architecture-problems-and-solutions/
History – monitoring at SoundCloud 2012
Source: http://blog.sflow.com/2011/12/using-ganglia-to-monitor-java-virtual.html
History – monitoring at SoundCloud 2012
Source: http://www.bellarmine.edu/faculty/amahmood/tier3/monitoring.html
PROMETHEUS
Prometheus
- started by Matt Proud and Julius Volz as an Open Source project
- first commit 24-11-2012
- public announcement in January 2015
- inspired by Borgmon
- not Borgmon
Features – multi-dimensional data model
http_requests_total{instance="web-1", path="/index", status="401", method="GET"}
#metrics x #labels x #values ▶ millions of time series
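The cardinality arithmetic on this slide can be sketched in a few lines of Go. This is not Prometheus code: `seriesKey` and `cardinality` are hypothetical helpers, and the sketch assumes a time series is identified by its metric name plus its full label set. The label value counts in `main` are made-up numbers for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey renders a metric name and label set in the notation used on the
// slide. Labels are sorted so the same label set always yields the same key.
// Hypothetical helper, not part of any Prometheus library.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(pairs, ", ") + "}"
}

// cardinality returns the upper bound on the number of series for one metric:
// the product of the number of distinct values each label can take.
func cardinality(valuesPerLabel map[string]int) int {
	n := 1
	for _, v := range valuesPerLabel {
		n *= v
	}
	return n
}

func main() {
	fmt.Println(seriesKey("http_requests_total", map[string]string{
		"instance": "web-1", "path": "/index", "status": "401", "method": "GET",
	}))
	// Assumed counts: 50 instances, 20 paths, 10 status codes, 4 methods.
	fmt.Println(cardinality(map[string]int{
		"instance": 50, "path": 20, "status": 10, "method": 4,
	}))
}
```

With these assumed counts a single metric already fans out into up to 40,000 series, which is how a modest number of metrics reaches millions of time series.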
Features – powerful query language
topk(3, sum by(path, method) (
  rate(http_requests_total{status=~"5.."}[5m])
))
histogram_quantile(0.99, sum by(le, path) (
  rate(http_requests_duration_seconds_bucket[5m])
))
Features – powerful query language
topk(3, sum by(path, method) (
  rate(http_requests_total{status=~"5.."}[5m])
))
{path="/api/comments", method="POST"} 105.4
{path="/api/user/:id", method="GET"} 34.122
{path="/api/comment/:id/edit", method="POST"} 29.31
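What `rate(...[5m])` computes for one series can be approximated as follows. This is a simplified sketch, not the Prometheus implementation: it treats any decrease as a counter reset and divides the total increase by the time between the first and last sample, whereas the real function also extrapolates to the edges of the window. The `sample` type and `simpleRate` are names assumed for this example.

```go
package main

import "fmt"

// sample is one scraped value of a counter.
type sample struct {
	t float64 // Unix timestamp in seconds
	v float64 // counter value
}

// simpleRate returns the average per-second increase of a counter across the
// given samples. A drop in value is treated as a counter reset, so the new
// value counts as the increase since the restart.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 {
			// Counter reset: the process restarted and began again at zero.
			delta = samples[i].v
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].t - samples[0].t
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Made-up samples 10 s apart; the counter restarts before the third one.
	samples := []sample{{0, 90}, {10, 100}, {20, 5}}
	fmt.Println(simpleRate(samples))
}
```

Because resets are absorbed here, the query on the slide can aggregate rates across restarting instances before `topk` picks the three busiest path and method combinations.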
Features – easy to use, yet scalable
- single static binary, no dependencies
$ go get github.com/prometheus/prometheus/cmd/...
$ prometheus
- local storage
- high-throughput [millions of time series, 380,000 samples/sec]
- efficient compression
Integrations
Instrument – natively
var httpDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Namespace: namespace,
		Name:      "http_request_duration_seconds",
		Help:      "A histogram of HTTP request durations.",
		Buckets:   prometheus.ExponentialBuckets(0.0001, 1.5, 25),
	},
	[]string{"path", "method", "status"},
)

func handleAPI(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	// do work; status holds the response code as a string
	httpDuration.WithLabelValues(r.URL.Path, r.Method, status).Observe(time.Since(start).Seconds())
}
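To see which bucket boundaries `ExponentialBuckets(0.0001, 1.5, 25)` implies, the sketch below re-creates the series with the standard library only. `exponentialBuckets` and `bucketIndex` are stand-ins written for this example, assuming each boundary is the previous one multiplied by the factor and that a bucket counts observations less than or equal to its upper bound.

```go
package main

import (
	"fmt"
	"sort"
)

// exponentialBuckets returns count upper bounds, starting at start and
// multiplying by factor each step. Stand-in for the client library helper.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

// bucketIndex returns the index of the first bucket whose upper bound is
// greater than or equal to the observation. A result of len(buckets) means the
// observation only fits the implicit +Inf bucket.
func bucketIndex(buckets []float64, observation float64) int {
	return sort.SearchFloat64s(buckets, observation)
}

func main() {
	buckets := exponentialBuckets(0.0001, 1.5, 25)
	fmt.Printf("%d buckets from %gs to %.3fs\n",
		len(buckets), buckets[0], buckets[len(buckets)-1])
	fmt.Println(bucketIndex(buckets, 0.00012))
}
```

The 25 boundaries run from 0.1 ms to roughly 1.7 s, so any request slower than that is only counted in the +Inf bucket.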
Features – built-in expression browser
Features – native Grafana support
Features – PromDash
DOES IT EVEN SCALE?
Features – federation & sharding
Cluster A Cluster B
Cluster C
service metrics container metrics
SERVICE DISCOVERY
DNS SRV
$ dig +short SRV all.foo-api.srv.int.example.com
0 0 4738 ip-10-22-11-32.int.example.com.
0 0 3433 ip-10-22-11-32.int.example.com.
0 0 5934 ip-10-22-11-34.int.example.com.
0 0 5093 ip-10-22-11-42.int.example.com.
0 0 4589 ip-10-22-11-43.int.example.com.
0 0 9848 ip-10-22-12-11.int.example.com.
[...]
DNS SRV
scrape_configs:
  - job_name: "foo-api"
    metrics_path: "/metrics"
    dns_sd_configs:
      - names: ["all.foo-api.srv.int.example.com"]
        refresh_interval: 10s
Fancy SD
- Consul
- Kubernetes
- Zookeeper
- EC2
- Mesos-Marathon
- … any via file-based plugins
Relabel based on SD data.
Relabeling
relabel_configs:
  - action: replace
    source_labels: [__address__, __telemetry_port]
    target_label: __address__
    regex: (.+):(.+);(.+)
    replacement: $1:$3
IN
"__address__": "10.44.12.135:25431"
"__telemetry_port": "82432"
"cluster": "AB"
"environment": "production"
OUT
"__address__": "10.44.12.135:82432"
"__telemetry_port": "82432"
"cluster": "AB"
"environment": "production"
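The `replace` action can be modelled in Go as: join the source label values with `;`, match the result against the fully anchored regex, and write the expanded replacement into the target label. This is a sketch of that behaviour using only the standard library, not the Prometheus code; `relabelReplace` is a name assumed for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// relabelReplace applies one replace rule and returns a modified copy of the
// label set. If the regex does not match, the labels are returned unchanged.
func relabelReplace(labels map[string]string, sources []string, target, pattern, replacement string) map[string]string {
	// Source label values are concatenated with ";" before matching.
	values := make([]string, len(sources))
	for i, s := range sources {
		values[i] = labels[s]
	}
	joined := strings.Join(values, ";")

	out := make(map[string]string, len(labels))
	for k, v := range labels {
		out[k] = v
	}

	// Anchor the pattern so it has to match the whole joined string.
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	match := re.FindStringSubmatchIndex(joined)
	if match == nil {
		return out
	}
	out[target] = string(re.ExpandString(nil, replacement, joined, match))
	return out
}

func main() {
	in := map[string]string{
		"__address__":      "10.44.12.135:25431",
		"__telemetry_port": "82432",
		"cluster":          "AB",
		"environment":      "production",
	}
	out := relabelReplace(in,
		[]string{"__address__", "__telemetry_port"},
		"__address__", `(.+):(.+);(.+)`, "$1:$3")
	fmt.Println(out["__address__"])
}
```

With the slide's input the joined string is `10.44.12.135:25431;82432`, so `$1` is the host and `$3` is the telemetry port, and the discovered port is swapped for the telemetry port.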
AWS EC2
scrape_configs:
  - job_name: "foo-api"
    metrics_path: "/metrics"
    ec2_sd_configs:
      - region: us-east-1
        refresh_interval: 60s
        port: 80
The following meta labels are available during relabeling:
- __meta_ec2_instance_id: the EC2 instance ID
- __meta_ec2_public_ip: the public IP address of the instance
- __meta_ec2_private_ip: the private IP address of the instance, if present
- __meta_ec2_tag_<tagkey>: each tag value of the instance
AWS EC2 – relabeling
relabel_configs:
  - source_labels: [__meta_ec2_tag_Type]
    action: keep
    regex: foo-api
  - source_labels: [__meta_ec2_tag_Deployment]
    action: replace
    target_label: deployment
    regex: (.+)
    replacement: $1
ALERTMANAGER
Alerting
- no opinions
- directly defined on time series data
- verbose on firing ▶ compact but detailed on notification
Alerting
ALERT HighErrorRate
IF sum by(job, path)(rate(http_requests_total{status=~"5.."}[5m])) /
sum by(job, path)(rate(http_requests_total[5m])) * 100 > 1
FOR 10m
SUMMARY "high number of 5xx errors"
DESCRIPTION "{{$labels.job}} has {{$value}}% 5xx errors on {{ $labels.path }}"
Alerting
{path="/api/comments", method="POST"} 5.43
{path="/api/user/:id", method="GET"} 1.22
{path="/api/comment/:id/edit", method="POST"} 1.01
Alerting
ALERT HighErrorRate
IF ... * 100 > 1
FOR 10m
WITH { severity = "warning" } …
ALERT HighErrorRate
IF ... * 100 > 3
FOR 10m
WITH { severity = "critical" } …
ALERTMANAGER
alerts
silence
inhibit
group
dedup
route
PagerDuty
Mail
Slack
...
Alerting
ALERT DiskWillFillIn4Hours
IF predict_linear(node_filesystem_free{job='node'}[1h], 4*3600) < 0
FOR 5m
SUMMARY "device filling up"
DESCRIPTION "{{$labels.device}} mounted on {{$labels.mountpoint}} on {{$labels.instance}} will fill up within 4 hours."
http://www.robustperception.io/reduce-noise-from-disk-space-alerts/
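The idea behind `predict_linear` is a least-squares line fitted through the samples in the window and extended into the future. The Go sketch below shows that calculation under a simplifying assumption: it predicts relative to the last sample's timestamp rather than the evaluation time. `sample` and `predictLinear` are names made up for this example, and the data in `main` is invented.

```go
package main

import "fmt"

// sample is one scraped gauge value, for example free bytes on a filesystem.
type sample struct {
	t float64 // Unix timestamp in seconds
	v float64 // observed value
}

// predictLinear fits a least-squares line through the samples and returns the
// value it predicts `ahead` seconds after the last sample. The boolean is
// false when a line cannot be fitted (fewer than two distinct timestamps).
func predictLinear(samples []sample, ahead float64) (float64, bool) {
	n := float64(len(samples))
	if n < 2 {
		return 0, false
	}
	// Measure time relative to the last sample to keep the numbers small.
	last := samples[len(samples)-1].t
	var sumX, sumY, sumXY, sumXX float64
	for _, s := range samples {
		x := s.t - last
		sumX += x
		sumY += s.v
		sumXY += x * s.v
		sumXX += x * x
	}
	denom := n*sumXX - sumX*sumX
	if denom == 0 {
		return 0, false
	}
	slope := (n*sumXY - sumX*sumY) / denom
	intercept := (sumY - slope*sumX) / n
	return intercept + slope*ahead, true
}

func main() {
	// Invented data: free space shrinking by 300 units every 30 minutes.
	window := []sample{{0, 1000}, {1800, 700}, {3600, 400}}
	predicted, ok := predictLinear(window, 4*3600)
	fmt.Println(predicted, ok, predicted < 0)
}
```

A negative prediction means the fitted trend crosses zero free space within four hours, which is the condition the alert tests.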
DEMO
Turing complete
http://www.robustperception.io/conways-life-in-prometheus/
Excursion – recording rules
job:http_requests:rate5m = sum by(job) (
rate(http_requests_total[5m])
)

OSMC 2015 | Prometheus: A Next-Generation Monitoring System by Fabian Reinartz