Application monitoring on
Engineer at AI
Kubernetes at Arvind Internet
● Our infrastructure is deployed on AWS
● Kubernetes minions (worker nodes) run on m4.xlarge instances
● Kubernetes version 1.7.5 in QA/Prod, 1.8.3 in Pre-Prod
● QA/Dev, Pre-Prod and Production all run on Kubernetes
● Total Pods ⇒ More than 350 (QA/Dev, Prod)
● Total services ⇒ More than 200 (QA/Dev, Prod)
● Running Mongo, MySQL, Redis, Hazelcast in Kubernetes in QA/Dev
Monitoring at AI (earlier)
1. Multiple monitoring systems
2. Difficulty in troubleshooting
3. Additional infrastructure cost to support three monitoring systems
4. Graphite doesn’t provide pod-level application metrics
5. Infra team needs to understand both Sensu and Prometheus alerting
6. Application metrics are single-dimensional, e.g. (a.b.c.d.99)
7. Grafana alerting for Application metrics
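● Prometheus was developed at SoundCloud by ex-Googlers
● Prometheus is a close cousin of Kubernetes: Kubernetes was inspired by
Google's Borg, Prometheus by Borgmon, Borg's monitoring system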
● A multi-dimensional data model with time series data identified by
metric name and key/value pairs
● Alerting and graphing are unified, using the same language.
● Time series collection happens via a pull model over HTTP
● Targets are discovered via service discovery or static configuration
● Exporters are available to expose EC2, Kafka, Mongo, Cassandra,
RabbitMQ (RMQ) and Redis metrics
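To make the multi-dimensional data model concrete, here is a minimal sketch contrasting a Graphite-style dotted path with the same measurement expressed as a metric name plus key/value labels. The metric name, label names, and values are hypothetical and chosen only for illustration; the output line follows the Prometheus text exposition format.

```python
# Sketch: one measurement as a Graphite path and as a labeled Prometheus sample.
# All metric names, label names, and values below are hypothetical.

def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name: str, labels: dict, value: float) -> str:
    """Render one sample line in the Prometheus text exposition format."""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


# Graphite style: every dimension is baked into one dotted path,
# so filtering by pod or environment means parsing the name.
graphite_path = "prod.checkout.pod-7f9c.http.latency.p99"

# Prometheus style: one metric name, with each dimension as its own label.
sample = render_sample(
    "http_request_latency_seconds",
    {"env": "prod", "service": "checkout", "pod": "pod-7f9c", "quantile": "0.99"},
    0.42,
)
print(sample)
```

Because each dimension is a separate label, a query can select or aggregate by pod, service, or environment without depending on the position of a segment in a dotted name.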
Approach #1 - Prometheus on EC2
#1. Getting EC2 server metrics is quite easy and straightforward.
Prometheus provides EC2 discovery.
#2. Getting Kubernetes and application metrics is very complex. It takes 300+
lines of configuration just to support Kubernetes metrics.
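For comparison, the EC2 side of Approach #1 can be sketched in a handful of lines. The Python dict below mirrors the structure of a Prometheus scrape configuration that uses EC2 discovery; it is an illustrative assumption, not the configuration used in this setup. The job name, region, port, and the choice of the `Name` tag for relabeling are all hypothetical.

```python
import json

# Sketch of a scrape job using Prometheus EC2 service discovery.
# Job name, region, and port are hypothetical; 9100 is assumed here
# because it is the default node exporter port.
scrape_config = {
    "scrape_configs": [
        {
            "job_name": "ec2-nodes",
            "ec2_sd_configs": [
                {"region": "us-east-1", "port": 9100},
            ],
            "relabel_configs": [
                {
                    # Copy the EC2 "Name" tag into the instance label.
                    "source_labels": ["__meta_ec2_tag_Name"],
                    "target_label": "instance",
                },
            ],
        }
    ]
}

# Print the structure for inspection; the real file would be written as YAML.
print(json.dumps(scrape_config, indent=2))
```

The Kubernetes equivalent needs separate jobs and relabeling rules for nodes, pods, services, and endpoints, which is where the configuration size grows.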
What is Prometheus operator?
The Prometheus Operator creates, configures, and manages Prometheus
monitoring instances. It automatically generates monitoring target
configurations based on familiar Kubernetes label queries.
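The idea of selecting monitoring targets through label queries can be sketched as follows. This is a simplified illustration of `matchLabels` semantics, not the operator's actual implementation; the service names, labels, port name, and interval are hypothetical, and the manifest is written as a Python dict purely to keep the example runnable.

```python
# Sketch: how a label query picks which Services become scrape targets.
# Service names, labels, port name, and interval are hypothetical.

def matches(match_labels: dict, labels: dict) -> bool:
    """True when every selector key/value pair is present on the object."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# A ServiceMonitor-shaped object expressed as a dict.
service_monitor = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "ServiceMonitor",
    "metadata": {"name": "app-metrics"},
    "spec": {
        "selector": {"matchLabels": {"monitoring": "enabled"}},
        "endpoints": [{"port": "metrics", "interval": "30s"}],
    },
}

services = [
    {"name": "checkout", "labels": {"app": "checkout", "monitoring": "enabled"}},
    {"name": "search", "labels": {"app": "search"}},
    {"name": "payments", "labels": {"app": "payments", "monitoring": "enabled"}},
]

selector = service_monitor["spec"]["selector"]["matchLabels"]
targets = [svc["name"] for svc in services if matches(selector, svc["labels"])]
print(targets)
```

With this approach, a new service is picked up for scraping by adding the matching label to it, rather than by editing the Prometheus configuration.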
ServiceMonitor Custom Resource Definition (CRD)