Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Monitoring a Kubernetes-backed microservice
architecture with Prometheus
Björn “Beorn” Rabenstein — SoundCloud
Fabian Rein...
SoundCloud 2012 – from monolith …
… to microservices
Orchestration needed
Run containers in a cluster…
In-house innovation: Bazooka – PaaS, Heroko style.
Problems:
- Only 12-f...
K U B E R N E T E S
Kubernetes
- inspired by Google’s Borg
- not Borg
Today:
Tomorrow:
X
Labels Labels are the
new hierarchies!
name = MyAPI
app = MyAPI
version = 0.4
app = MyAPI
version = 0.5
app = MyAPI
versio...
Graphite
Monitoring at SC 2012
Service A
Service B
Service C
Monitoring challenges
‐ A lot of traffic to monitor
- Monitoring traffic should not be proportional to user traffic
‐ Way ...
Monitor everything, all levels, with the same system
Level What to monitor (examples) What exposes metrics (example)
Netwo...
P R O M E T H E U S
“Obviously the solution to all our problems with everything forever, right?”
Benjamin Staffin, Fitbit ...
Prometheus
- inspired by Google’s Borgmon
- not Borgmon
- initially developed at SoundCloud, open-source from the beginnin...
Features – multi-dimensional data model
http_requests_total{instance="web-1", path="/index", status="500", method="GET"}
h...
Features – powerful query language
The 3 path-method combinations with the highest number of failing requests?
topk(3, sum...
Features – powerful query language
topk(3, sum by(path, method) (
rate(http_requests_total{status=~"5.."}[5m])
))
{path="/...
Features – easy instrumentation
from prometheus_client import start_http_server, Histogram
# Create a metric to track time...
Integrations (selection)
K U B E R N E T E S
P R O M E T H E U S
Three questions
How to monitor
services running
on Kubernetes
with Prometheus?
X
?
?
How to monitor
Kubernetes and
contain...
Monitoring Services
svc = service-X
namespace = production
version = 22.4
version = 22.5
Monitoring Services via Exporters
svc = mysql
namespace = production
MySQL
exporter
MySQL
Monitoring Kubernetes
namespace = kube-system
apiserver
kubelet
kubelet
etcd
kubelet
kubelet
kubelet
kubelet
Running Prometheus on Kubernetes
- So far: Prometheus ran outside of cluster
- Pod IPs must be routable
- Conventional dep...
Monitoring Services
svc = service-X
namespace = production
version = 22.4
version = 22.5
app = prometheus
namespace = kube-system
apiserver
kubelet
kubelet
etcd
kubelet
kubelet
kubelet
kubelet
app = prometheus
namespace = infra
...
What about storage?
A) None
B) Network/Cloud volumes
C) Host volumes
DEMO
CC BY 2.0
Author: carstingaxion / Carsten Bach
The end
Monitoring a Kubernetes-backed microservice architecture with Prometheus
Upcoming SlideShare
Loading in …5
×

Monitoring a Kubernetes-backed microservice architecture with Prometheus

2,773 views

Published on

As many startups of the last decade, SoundCloud’s architecture started as a Ruby-on-Rails monolith, which later had to be broken into microservices to cope with the growing size and complexity of the site. The microservices initially ran on an in-house container management and deployment platform. Recently, the company has started to migrate to Kubernetes.
With the introduction of microservices, the existing conventional monitoring setup failed both conceptually and in terms of scalability. Thus, starting in 2012, SoundCloud invested heavily into the development of the open-source monitoring system Prometheus, which was designed for large-scale highly dynamic service-oriented architectures.
Migrating to Kubernetes, it became apparent that Prometheus and Kubernetes are a match made in open-source heaven. The talk will demonstrate the current Prometheus setup at SoundCloud, monitoring a large-scale Kubernetes cluster.

Published in: Technology

Monitoring a Kubernetes-backed microservice architecture with Prometheus

  1. 1. Monitoring a Kubernetes-backed microservice architecture with Prometheus Björn “Beorn” Rabenstein — SoundCloud Fabian Reinartz — CoreOS
  2. 2. SoundCloud 2012 – from monolith …
  3. 3. … to microservices
  4. 4. Orchestration needed Run containers in a cluster… In-house innovation: Bazooka – PaaS, Heroko style. Problems: - Only 12-factor apps (stateless etc.). - Limited resource isolation. - No sidecars. - Maturity. Meanwhile, the open-source world has evolved…
  5. 5. K U B E R N E T E S
  6. 6. Kubernetes - inspired by Google’s Borg - not Borg Today: Tomorrow: X
  7. 7. Labels Labels are the new hierarchies! name = MyAPI app = MyAPI version = 0.4 app = MyAPI version = 0.5 app = MyAPI version = 0.4 app = MyDB cluster = auth app = MyDB cluster = misc select by [ app = MyAPI ] service pod pod pod pod pod
  8. 8. Graphite Monitoring at SC 2012 Service A Service B Service C
  9. 9. Monitoring challenges ‐ A lot of traffic to monitor - Monitoring traffic should not be proportional to user traffic ‐ Way more targets to monitor - One host can run many containers ‐ And they constantly change - Deploys, scaling, rescheduling unhealthy instances … ‐ Need a fleet-wide view. - What’s my overall 99th percentile latency? ‐ Still need to be able to drill down for troubleshooting - Which instance/endpoint/version/... causes those errors I’m seeing? ‐ Meaningful alerting - Symptom-based alerting for pages, cause-based alerting for warnings - See Rob Ewaschuk’s “My philosophy on alerting" https://goo.gl/2vrpSO
  10. 10. Monitor everything, all levels, with the same system Level What to monitor (examples) What exposes metrics (example) Network Routers, switches SNMP exporter Host (OS, hardware) Hardware failure, provisioning, host resources Node exporter Container Resource usage, performance characteristics cAdvisor Application Latency, errors, QPS, internal state Your own code Orchestration Cluster resources, scheduling Kubernetes components
  11. 11. P R O M E T H E U S “Obviously the solution to all our problems with everything forever, right?” Benjamin Staffin, Fitbit Site Operations
  12. 12. Prometheus - inspired by Google’s Borgmon - not Borgmon - initially developed at SoundCloud, open-source from the beginning - public announcement early 2015 - collects metrics at scale via HTTP (think: yet another client of your microservice) - thousands of targets, millions of time series, 800k samples/s, no dependencies - easy to scale
  13. 13. Features – multi-dimensional data model http_requests_total{instance="web-1", path="/index", status="500", method="GET"} http_requests_total{instance="web-1", path="/index", status="404", method="POST"} http_requests_total{instance="web-3", path="/index", status="200", method="GET"} #metrics x #values(instance) x #values(path) x #values(status) x #values(method) ▶ millions of time series Labels are the new hierarchies!
  14. 14. Features – powerful query language The 3 path-method combinations with the highest number of failing requests? topk(3, sum by(path, method) ( rate(http_requests_total{status=~"5.."}[5m]) )) The 99th percentile request latency by request path? histogram_quantile(0.99, sum by(le, path) ( rate(http_requests_duration_seconds_bucket[5m]) )) The questions to ask are often not known beforehand.
  15. 15. Features – powerful query language topk(3, sum by(path, method) ( rate(http_requests_total{status=~"5.."}[5m]) )) {path="/api/comments", method="POST"} 105.4 {path="/api/user/:id", method="GET"} 34.122 {path="/api/comment/:id/edit", method="POST"} 29.31
  16. 16. Features – easy instrumentation from prometheus_client import start_http_server, Histogram # Create a metric to track time spent and requests made. REQUEST_TIME = Histogram('request_processing_seconds', 'Time spent processing request') # Decorate function with metric. @REQUEST_TIME.time() def process_request(t): # do work … return start_http_server(8000)
  17. 17. Integrations (selection)
  18. 18. K U B E R N E T E S P R O M E T H E U S
  19. 19. Three questions How to monitor services running on Kubernetes with Prometheus? X ? ? How to monitor Kubernetes and containers with Prometheus? How to run Prometheus on Kubernetes?
  20. 20. Monitoring Services svc = service-X namespace = production version = 22.4 version = 22.5
  21. 21. Monitoring Services via Exporters svc = mysql namespace = production MySQL exporter MySQL
  22. 22. Monitoring Kubernetes namespace = kube-system apiserver kubelet kubelet etcd kubelet kubelet kubelet kubelet
  23. 23. Running Prometheus on Kubernetes - So far: Prometheus ran outside of cluster - Pod IPs must be routable - Conventional deployment (Chef, Puppet, …) - Service discovery needs authentication - To run Prometheus inside of cluster: kubectl run --image="quay.io/prometheus/prometheus:0.18.0" prometheus
  24. 24. Monitoring Services svc = service-X namespace = production version = 22.4 version = 22.5 app = prometheus
  25. 25. namespace = kube-system apiserver kubelet kubelet etcd kubelet kubelet kubelet kubelet app = prometheus namespace = infra Monitoring Kubernetes
  26. 26. What about storage? A) None B) Network/Cloud volumes C) Host volumes
  27. 27. DEMO CC BY 2.0 Author: carstingaxion / Carsten Bach
  28. 28. The end

×