Improve Monitoring and Observability for Kubernetes with OSS Tools

@nileshgule

$whoami
{
"name" : "Nilesh Gule",
"website" : "https://www.HandsOnArchitect.com",
"github" : "https://GitHub.com/NileshGule",
"twitter" : "@nileshgule",
"linkedin" : "https://www.linkedin.com/in/nileshgule",
"likes" : "Technical Evangelism, Cricket",
"co-organizer" : "Azure Singapore UG"
}

Pre-requisites
Docker
❖ Self-contained application with all its dependencies
Kubernetes
❖ Orchestrates containers
❖ Self healing
❖ Service discovery
❖ Scaling
Cloud Native Applications
❖ Scalable apps in dynamic environments (public / private / hybrid clouds)
❖ Exemplified by containers, service meshes, microservices, immutable infrastructure & declarative APIs
❖ Loosely coupled systems that are resilient, observable & manageable
❖ Robust automation

CNCF Cloud Native Trail Map
https://github.com/cncf/trailmap

CNCF Observability landscape
https://landscape.cncf.io

CNCF Observability Radar
https://radar.cncf.io/2020-09-observability

3 Pillars of Observability
Logs, Metrics, Traces

Centralized Logging

Why centralized logging
Container based workloads
❑ Application specific
❖ Long term log retention for compliance reasons
❖ Workloads scheduled on different nodes during application restarts / updates
❖ Autoscaling workloads
❑ Kubernetes upgrades
❖ Auto healing can reschedule workloads
❖ Underlying nodes added / deleted during cluster scaling
❖ Underlying nodes replaced during cluster upgrades
PaaS / Serverless services
❖ Not much control over underlying infra
❖ Relies on cloud provider specific logging and monitoring solution

Tech Talks EFK integration
Pipeline stages: log collector, log storage, log search / visualisation / dashboards
Workloads: rabbitmq-producer-service, rabbitmq-consumer-deployment
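
A log collector is easiest to work with when every workload writes one structured record per line. The sketch below is an illustration only, not code from the Tech Talks application: it uses the Python standard library to emit JSON log lines, and the `service` field and the `build_logger` helper are hypothetical names chosen for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical field: the service name is passed via `extra=`.
            "service": getattr(record, "service", "unknown"),
        }
        return json.dumps(payload, sort_keys=True)


def build_logger(stream, name: str = "demo") -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


# A StringIO buffer stands in for stdout so the output can be inspected.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("message published", extra={"service": "rabbitmq-producer-service"})
line = buffer.getvalue().strip()
print(line)
```

In a container the stream would normally be stdout, so that whichever collector runs on the node can pick the lines up without the application knowing where the logs end up.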

Demo 1 – Log Aggregation with EFK

Monitoring and Alerting

Why Monitoring & Alerting
Container based workloads
• Application specific
  • Monitor resource usage
  • Monitor scaling needs
  • Monitor anomalies / outliers
• Kubernetes platform level
  • Monitor cluster resources (CPU / RAM)
  • API health
  • Autoscaling
PaaS / Serverless services
• Monitor resource usage
• Scaling
• Bottlenecks

Prometheus Architecture
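
Prometheus collects metrics by scraping an HTTP endpoint that returns plain text in its exposition format. The sketch below hand-renders that format for counters, purely to show what a scrape target returns. `CounterRegistry` is a hypothetical class written for this example; a real service would more likely use an existing client library, and this sketch skips label escaping and other metric types.

```python
from typing import Dict, Tuple

# A label set is stored as sorted (key, value) pairs so it can be a dict key.
Labels = Tuple[Tuple[str, str], ...]


class CounterRegistry:
    """Hypothetical in-memory registry for counter metrics."""

    def __init__(self) -> None:
        self._help: Dict[str, str] = {}
        self._values: Dict[Tuple[str, Labels], float] = {}

    def inc(self, name: str, help_text: str, amount: float = 1.0, **labels: str) -> None:
        """Increase the counter identified by its name and label set."""
        self._help[name] = help_text
        key = (name, tuple(sorted(labels.items())))
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        """Return all counters in the Prometheus text exposition format."""
        lines = []
        for name in sorted(self._help):
            lines.append(f"# HELP {name} {self._help[name]}")
            lines.append(f"# TYPE {name} counter")
            for (metric, labels), value in sorted(self._values.items()):
                if metric != name:
                    continue
                label_text = ",".join(f'{k}="{v}"' for k, v in labels)
                suffix = "{" + label_text + "}" if label_text else ""
                lines.append(f"{name}{suffix} {value}")
        return "\n".join(lines) + "\n"


registry = CounterRegistry()
registry.inc("http_requests_total", "Total HTTP requests.", method="GET", status="200")
registry.inc("http_requests_total", "Total HTTP requests.", method="GET", status="200")
registry.inc("http_requests_total", "Total HTTP requests.", method="POST", status="500")
output = registry.render()
print(output)
```

Serving this text from a `/metrics` path is the usual convention, which is why the pull model needs no Prometheus-specific code in the application beyond exposing the endpoint.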

Demo 2 – Metrics using Prometheus & Grafana

Spring Boot Conference App integration
https://github.com/NileshGule/spring-boot-conference-app/tree/mssql-server
conference-demo-service-monitor
conference-demo-service

Exception Handling

Sentry Architecture
https://develop.sentry.dev/architecture/

Spring Boot Sentry integration
conference-demo-service
Managed Kubernetes cluster
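
The idea behind exception aggregation is that many occurrences of the same failure collapse into one issue with a count. The sketch below illustrates that idea with a deliberately simple grouping key: the exception type plus the name of the function that raised it. This is an assumption made for illustration and is not Sentry's actual grouping algorithm; `fingerprint`, `parse_session_id` and `lookup_speaker` are hypothetical names.

```python
import hashlib
import traceback
from collections import Counter


def fingerprint(exc: BaseException) -> str:
    """Build a grouping key from the exception type and the raising function."""
    frames = traceback.extract_tb(exc.__traceback__)
    location = frames[-1].name if frames else "<no traceback>"
    raw = f"{type(exc).__name__}:{location}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]


def parse_session_id(value: str) -> int:
    return int(value)  # raises ValueError for non-numeric input


def lookup_speaker(speakers: dict, name: str) -> str:
    return speakers[name]  # raises KeyError for unknown speakers


groups = Counter()
for call in (
    lambda: parse_session_id("abc"),
    lambda: parse_session_id("xyz"),
    lambda: lookup_speaker({}, "nobody"),
):
    try:
        call()
    except Exception as exc:
        # Occurrences with the same fingerprint land in the same group.
        groups[fingerprint(exc)] += 1

print(sorted(groups.values()))
```

The two `ValueError` occurrences differ in their input but share a group, while the `KeyError` forms a separate one, which is the behaviour that keeps an issue list short when an error fires repeatedly.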

Demo 3 – Exception aggregation using Sentry

End to End Observability

Observability challenges
➢ Too many telemetry agents
➢ Instrumentation of Apps
➢ Dynamic & small units in Cloud Native Applications
➢ Right retention period for each type of metric and usage
➢ Minimize vendor or feature lock-in
➢ Buy vs Build
➢ Transition from Monitoring to Observability
➢ Single pane of glass for consuming different information
➢ Correlation of signals
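
Correlation of signals usually comes down to stamping every signal from one request with a shared identifier. The sketch below shows one way this could look for logs, using only the Python standard library: a context variable holds a trace ID and a logging filter copies it onto each record. The names `current_trace_id`, `TraceIdFilter` and `handle_request` are hypothetical, and in practice the ID would more likely come from a tracing library than from application code.

```python
import contextvars
import io
import logging
import uuid

# Hypothetical context variable holding the ID of the request being handled.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


def handle_request(logger: logging.Logger, trace_id: str = "") -> str:
    """Log two events that share one trace ID, then restore the context."""
    trace_id = trace_id or uuid.uuid4().hex
    token = current_trace_id.set(trace_id)
    try:
        logger.info("request received")
        logger.info("request completed")
    finally:
        current_trace_id.reset(token)
    return trace_id


stream = io.StringIO()  # stands in for stdout so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("trace_id=%(trace_id)s %(message)s"))
handler.addFilter(TraceIdFilter())

correlated_logger = logging.getLogger("correlation-demo")
correlated_logger.handlers.clear()
correlated_logger.propagate = False
correlated_logger.addHandler(handler)
correlated_logger.setLevel(logging.INFO)

trace = handle_request(correlated_logger, trace_id="abc123")
lines = stream.getvalue().splitlines()
print(lines)
```

With the same ID present on log lines and on a trace, a dashboard can jump from one to the other, which is what a single pane of glass depends on.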

Analogy: use the right tool for the right purpose

Summary
✓ Use best-in-class tools for the given use case
✓ Rely on open standards (e.g. OpenTelemetry)
✓ Build portable observability systems (e.g. hybrid cloud migration)
Log Aggregation
✓ EFK stack helps in centralized logging
✓ Kibana is used to visualize logs and build dashboards
Monitoring & Alerting
✓ Prometheus provides easy-to-use metrics for platforms and applications
✓ Grafana provides visualization capabilities to build intuitive dashboards
Exception Aggregation
✓ Sentry provides exception aggregation capabilities
✓ Sentry captures rich telemetry data that helps diagnose problems

Some Recommendations
Challenges | Tools
♣ Too many agents | ∞ OpenTelemetry collector
♣ Instrumentation, vendor lock-in | ∞ OpenTelemetry, OpenMetrics
♣ Cloud native logs | ∞ Fluent Bit / Fluentd, OpenSearch, Loki
♣ Cloud native metrics | ∞ Prometheus, Cortex, Thanos
♣ Cloud native traces | ∞ OpenTelemetry, Jaeger, Grafana
♣ Single pane of glass, correlation | ∞ Grafana

References
Log Aggregation
❖ Elastic stack
❖ Kibana
❖ Fluent Bit
Monitoring & Alerting
❖ Prometheus
❖ Grafana
❖ Kube Prometheus stack
❖ Dynatrace – Monitoring vs Observability
❖ Houssem Dellai – Prometheus & Grafana for monitoring Kubernetes
Sentry
❖ Sentry docs

Source Code & slide deck
Tech Talks
https://github.com/NileshGule/pd-tech-fest-2019
Observability & Monitoring markdown
Conference app
https://github.com/NileshGule/spring-boot-conference-app/tree/mssql-server
https://speakerdeck.com/nileshgule/
https://www.slideshare.net/nileshgule/

Nilesh Gule
ARCHITECT | MICROSOFT MVP
“Code with Passion and Strive for Excellence”
nileshgule
@nileshgule Nilesh Gule
NileshGule
www.handsonarchitect.com
https://bit.ly/youtube-nileshgule
