2. Before we start
- How do we answer these questions?
+ Why is the system too slow?
+ Does everything work fine?
+ What’s the main bottleneck of our system?
+ What happened at 10:00 AM this morning that made a
lot of customers complain?
+ What’s the average time the user has to wait until they get
the notification?
+ etc.
3. In short, we built a system successfully.
BUT WE HAVE NO IDEA HOW IT PERFORMS.
4. Observability
- Programmatically and continuously capture the states of a
running system
- Analyze the captured data and extract the knowledge
that the observer is interested in
- Detect abnormal behavior, notify the people responsible,
and automatically take action to resolve the situation
- Archive the data in convenient forms that support future
investigation and analysis
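
The four steps above (capture, analyze, detect, archive) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Sample` shape, the function names, the latency values, and the 300 ms threshold are hypothetical and do not come from the system described in these slides.

```python
import json
import statistics
from dataclasses import dataclass, asdict


@dataclass
class Sample:
    """One captured state of the running system (hypothetical shape)."""
    timestamp: float
    latency_ms: float


def analyze(samples):
    """Extract the knowledge the observer cares about: mean and max latency."""
    latencies = [s.latency_ms for s in samples]
    return {"mean_ms": statistics.mean(latencies), "max_ms": max(latencies)}


def detect(summary, threshold_ms):
    """Return an alert message when mean latency exceeds the threshold, else None."""
    if summary["mean_ms"] > threshold_ms:
        return f"mean latency {summary['mean_ms']:.1f} ms exceeds {threshold_ms} ms"
    return None


def archive(samples):
    """Serialize samples as JSON lines, a form that is easy to query later."""
    return "\n".join(json.dumps(asdict(s)) for s in samples)


# Captured states would come from the running system; these values are made up.
samples = [Sample(0.0, 120.0), Sample(1.0, 180.0), Sample(2.0, 900.0)]
summary = analyze(samples)
alert = detect(summary, threshold_ms=300)
archived = archive(samples)
```

In a real deployment the notification and the automatic remediation would hang off the result of `detect`; the sketch stops at producing the alert message.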
7. We need a solution that offers
- Detailed statistics (both real-time and aggregated) about our
microservices.
- Alerting when usage peaks or incidents happen.
- An easy way to instrument our microservices.
- Support for a variety of metric types (counter, gauge,
histogram, …).
- Two-way integration with Kubernetes.
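
To make the three metric types named above concrete, here is a rough, dependency-free sketch of how a counter, a gauge, and a histogram behave. The class names and methods are illustrative assumptions, not the API of any particular client library. Note also that this histogram keeps per-bucket counts for simplicity, whereas Prometheus exposes cumulative bucket counts.

```python
import bisect


class Counter:
    """A value that only goes up, e.g. total requests served."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """A value that can go up and down, e.g. jobs currently in a queue."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per upper bound, plus a running sum and count."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One slot per bound, plus a final slot for values above every bound.
        self.bucket_counts = [0] * (len(self.bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left finds the first bound >= value (upper bounds are inclusive).
        self.bucket_counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def mean(self):
        return self.sum / self.count if self.count else 0.0


# Hypothetical usage: the metric names and values are made up.
requests_total = Counter()
requests_total.inc()
requests_total.inc()

queue_depth = Gauge()
queue_depth.set(7)

wait_seconds = Histogram([0.5, 1.0, 5.0])
for seconds in (0.25, 0.75, 2.0):
    wait_seconds.observe(seconds)
```

A histogram's sum divided by its count is what answers a question like the average time a user waits for a notification, while the buckets show how those waits are distributed.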
9. Prometheus and Grafana
- Prometheus is an open-source systems
monitoring and alerting toolkit
originally built at SoundCloud.
- Grafana is an open-source
dashboard tool for data visualization.
- They are our selected approach to
collect and display monitoring
data.
12. Pull Model and Sidecar Model
[Diagram: Node 1 and Node 2 each run an Application and a Metric Server that share /tmp/monitoring. The Metrics collector on Node 3 sends GET /metrics to each Metric Server.]
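
The pull and sidecar models on this slide can be sketched with the Python standard library alone. The slide does not say how the application and the metric server use /tmp/monitoring, so the on-disk layout here is purely an assumption: one `<name>.value` file per metric, each containing a single number. The function names and the `jobs_processed_total` metric are hypothetical, and a temporary directory stands in for /tmp/monitoring.

```python
import http.client
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(directory):
    """Render every '<name>.value' file as one 'name value' line."""
    lines = []
    for filename in sorted(os.listdir(directory)):
        name, ext = os.path.splitext(filename)
        if ext != ".value":
            continue
        with open(os.path.join(directory, filename)) as f:
            lines.append(f"{name} {float(f.read().strip())}")
    return "\n".join(lines) + "\n"


def make_handler(directory):
    """Build a request handler that serves the metrics found in `directory`."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(directory).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the demo quiet

    return MetricsHandler


def scrape_once(directory):
    """Start the sidecar on a free port, act as the collector once, then stop."""
    server = HTTPServer(("127.0.0.1", 0), make_handler(directory))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/metrics")  # the pull: the collector asks, the sidecar answers
        body = conn.getresponse().read().decode("utf-8")
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()


# Application side: write a metric into the shared directory (assumed layout).
with tempfile.TemporaryDirectory() as shared_dir:
    with open(os.path.join(shared_dir, "jobs_processed_total.value"), "w") as f:
        f.write("42")
    scraped = scrape_once(shared_dir)
```

The point of the split is that the application only writes to a local directory, while the sidecar owns the HTTP endpoint, so the collector never talks to the application process directly.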
13. EhMonitoring gem
- This gem helps you monitor your
service with ease.
- It abstracts away much of the infrastructure
layer through helpers.
- Built-in native support for gRPC,
Kafka, and Sidekiq (soon).