@nileshgule
Improve Monitoring and Observability for Kubernetes with OSS Tools
$whoami
{
"name" : "Nilesh Gule",
"website" : "https://www.HandsOnArchitect.com",
"github" : "https://GitHub.com/NileshGule",
"twitter" : "@nileshgule",
"linkedin" : "https://www.linkedin.com/in/nileshgule",
"likes" : "Technical Evangelism, Cricket",
"co-organizer" : "Azure Singapore UG"
}
Pre-requisites
Docker
❖ Self-contained application with all its dependencies
Kubernetes
❖ Orchestrates containers
❖ Self healing
❖ Service discovery
❖ Scaling
Cloud Native Applications
❖ Scalable apps in dynamic environments (public / private / hybrid clouds)
❖ Exemplified by containers, service meshes, microservices, immutable infrastructure & declarative APIs
❖ Loosely coupled systems that are resilient, observable & manageable
❖ Robust automation
CNCF Cloud Native Trail Map
https://github.com/cncf/trailmap
CNCF Observability landscape
https://landscape.cncf.io
CNCF Observability Radar
https://radar.cncf.io/2020-09-observability
3 Pillars of Observability
Logs, Metrics, Traces
Centralized Logging
Why centralized logging
Container based workloads
❑ Application specific
❖ Long term log retention for compliance reasons
❖ Workloads scheduled on different nodes during application restarts / updates
❖ Autoscaling workloads
❑ Kubernetes upgrades
❖ Auto healing can reschedule workloads
❖ Underlying nodes added / deleted during cluster scaling
❖ Underlying nodes replaced during cluster upgrades
PaaS / Serverless services
❖ Not much control over underlying infra
❖ Relies on cloud provider specific logging and monitoring solution
Tech Talks EFK integration
Log collector | Log storage | Log search, visualise, dashboards
rabbitmq-producer-service rabbitmq-consumer-deployment
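A log collector has an easier job when each container writes one structured record per line. The sketch below is only an illustration of that idea using Python's standard `logging` module; it is not the logging setup of the Tech Talks services. The field names (`timestamp`, `level`, `logger`, `message`) are assumptions, and the logger name is borrowed from the slide purely as a label.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the stack trace inside the same record so it is not split
            # across several lines by the collector.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# In a container the stream would normally be sys.stdout; an in-memory
# buffer is used here so the sketch can be inspected.
buffer = io.StringIO()
log = build_logger("rabbitmq-producer-service", buffer)
log.info("published %d messages", 5)
print(buffer.getvalue(), end="")
```

Writing to standard output rather than to a file inside the container means the logs survive the rescheduling and node replacement scenarios listed earlier, as long as a collector on the node ships them to central storage.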
Demo 1 – Log Aggregation with EFK
Monitoring and Alerting
Why Monitoring & Alerting
Container based workloads
• Application specific
  • Monitor resource usage
  • Monitor scaling needs
  • Monitor anomalies / outliers
• Kubernetes platform level
  • Monitor cluster resources (CPU / RAM)
  • API health
  • Autoscaling
PaaS / Serverless services
• Monitor resource usage
• Scaling
• Bottlenecks
Prometheus Architecture
Demo 2 – Metrics using Prometheus & Grafana
Spring Boot Conference App integration
https://github.com/NileshGule/spring-boot-conference-app/tree/mssql-server
conference-demo-service-monitor
conference-demo-service
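Prometheus scrapes metrics from an HTTP endpoint that returns plain text. The sketch below hand-rolls a labelled counter and renders it in that text format, only to show what a scrape target returns. It is not the conference app's code and not a client library; the metric name `http_requests_total` and the `/sessions` path are hypothetical.

```python
from collections import defaultdict


class Counter:
    """A tiny in-memory counter with labels (illustrative only)."""

    def __init__(self, name: str, help_text: str):
        self.name = name
        self.help_text = help_text
        self._values = defaultdict(float)

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        # Sort the labels so the same label set always maps to the same series.
        key = tuple(sorted(labels.items()))
        self._values[key] += amount

    def render(self) -> str:
        """Return the counter in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            # Note: label values are not escaped in this sketch.
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = f"{{{label_text}}}" if label_text else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


requests_total = Counter("http_requests_total", "Total HTTP requests handled.")
requests_total.inc(method="GET", path="/sessions")
requests_total.inc(method="GET", path="/sessions")
requests_total.inc(method="POST", path="/sessions")
print(requests_total.render(), end="")
```

In practice an application would use a maintained metrics library instead of rendering the format by hand; the point here is only that each label combination becomes its own time series.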
Exception Handling
Sentry Architecture
https://develop.sentry.dev/architecture/
Spring Boot Sentry integration
conference-demo-service
Managed Kubernetes cluster
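Exception aggregation means that many occurrences of the same failure are collapsed into one issue with a count. The sketch below illustrates that idea with a very simple grouping key (exception type plus the function that raised it). This is an assumption made for illustration, not Sentry's actual grouping algorithm, and the functions `load_session` and `parse_port` are hypothetical.

```python
import traceback
from collections import Counter


def fingerprint(exc: BaseException) -> str:
    """Build a grouping key from the exception type and the raising function."""
    frames = traceback.extract_tb(exc.__traceback__)
    origin = frames[-1].name if frames else "<unknown>"
    return f"{type(exc).__name__}@{origin}"


def load_session(session_id: int) -> dict:
    # Hypothetical application function that fails for every id.
    raise KeyError(session_id)


def parse_port(raw: str) -> int:
    # Hypothetical helper that fails on non-numeric input.
    return int(raw)


groups = Counter()
actions = (
    lambda: load_session(1),
    lambda: load_session(2),
    lambda: parse_port("http"),
)
for action in actions:
    try:
        action()
    except Exception as exc:
        # Occurrences with the same fingerprint land in the same group.
        groups[fingerprint(exc)] += 1

for key, count in sorted(groups.items()):
    print(key, count)
```

The two `KeyError` occurrences carry different session ids but share a fingerprint, so they count as one issue seen twice, while the `ValueError` forms a separate group.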
Demo 3 – Exception aggregation using Sentry
End to End Observability
Observability challenges
➢ Too many telemetry agents
➢ Instrumentation of Apps
➢ Dynamic & small units in Cloud Native Applications
➢ Right retention period for each type of metric and usage
➢ Minimize vendor or feature lock-in
➢ Buy vs Build
➢ Transition from Monitoring to Observability
➢ Single pane of glass for consuming different information
➢ Correlation of signals
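One common approach to the correlation challenge is to carry a shared trace id through a request and stamp it on every log line, so that logs and traces can be joined later. The sketch below assumes the W3C Trace Context `traceparent` header layout (version, 32 hex digit trace id, 16 hex digit parent id, 2 hex digit flags). The helper names and the `trace_id=` log prefix are hypothetical.

```python
import re
import secrets
from typing import Optional

# version "00", 32 hex trace id, 16 hex parent id, 2 hex flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Create a header value for a new, sampled trace."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex digits
    parent_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex digits
    return f"00-{trace_id}-{parent_id}-01"


def trace_id_of(header: str) -> Optional[str]:
    """Extract the trace id, or return None if the header is malformed."""
    match = TRACEPARENT.match(header)
    return match.group(1) if match else None


def log_line(header: str, message: str) -> str:
    """Prefix a log message with the trace id taken from the header."""
    return f"trace_id={trace_id_of(header)} {message}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(log_line(incoming, "order saved"))
```

With the trace id present in each log record, a dashboard can jump from a slow trace to the log lines written while that request was being handled.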
Analogy - Use right tool for right purpose
Summary
✓ Use the best-in-class tool for the given use case
✓ Rely on open standards (e.g. OpenTelemetry)
✓ Build portable observability systems (e.g. hybrid cloud migration)
Log Aggregation
✓ EFK stack helps in centralized logging
✓ Kibana is used to visualize logs and build dashboards
Monitoring & Alerting
✓ Prometheus provides easy-to-use metrics for platforms and applications
✓ Grafana provides visualization capabilities to build intuitive dashboards
Exception Aggregation
✓ Sentry provides Exception Aggregation capabilities
✓ Sentry captures rich telemetry data that helps diagnose problems
Some Recommendations
Challenges | Tools
♣ Too many agents | ∞ OpenTelemetry collector
♣ Instrumentation, vendor lock-in | ∞ OpenTelemetry, OpenMetrics
♣ Cloud native logs | ∞ Fluent Bit / Fluentd, OpenSearch, Loki
♣ Cloud native metrics | ∞ Prometheus, Cortex, Thanos
♣ Cloud native traces | ∞ OpenTelemetry, Jaeger, Grafana
♣ Single pane of glass, correlation | ∞ Grafana
References
Log Aggregation
❖ Elastic stack
❖ Kibana
❖ Fluent Bit
Monitoring & Alerting
❖ Prometheus
❖ Grafana
❖ Kube Prometheus stack
❖ Dynatrace – Monitoring vs Observability
❖ Houssem Dellai – Prometheus & Grafana for monitoring Kubernetes
Sentry
❖ Sentry docs
Source Code & slide deck
Tech Talks
https://github.com/NileshGule/pd-tech-fest-2019
Observability & Monitoring markdown
Conference app
https://github.com/NileshGule/spring-boot-conference-app/tree/mssql-server
https://speakerdeck.com/nileshgule/
https://www.slideshare.net/nileshgule/
Nilesh Gule
ARCHITECT | MICROSOFT MVP
“Code with Passion and
Strive for Excellence”
nileshgule
@nileshgule Nilesh Gule
NileshGule
www.handsonarchitect.com
https://bit.ly/youtube-nileshgule
Q&A
