
Kubernetes Monitoring & Best Practices



I presented the Kubernetes Monitoring & Best Practices talk at the Sumo Logic Bengaluru User Group last Friday.

Published in: Technology


  1. Sumo Logic confidential: Kubernetes Monitoring & Best Practices
  2. Ajeet Singh Raina • Principal Development Engineer at DellEMC • First half of my career was at CGI & VMware • Second half of my career has been in System Integration Testing • Docker Captain (since 2016) • Docker Bangalore Meetup organizer (8800+ registered users) • DockerLabs Incubator (1700+ Slack members) • Frequent blogger • Twitter: @ajeetsraina • GitHub: ajeetraina
  3. Suresh Govindachetty • Enterprise Sales Engineer at Sumo Logic • Formerly with Citrix, HPE, and Nortel • Background mostly in presales, networking, and security
  4. There is a massive shift in monitoring requirements, from host-based monitoring to “container-specific & service-oriented monitoring”.
  5. Containers & Kubernetes: The New Reality [Diagram panels: Traditional Software Architecture, Containerized Architecture, Orchestrated Containerized Architecture; labels: App, Server]
  6. Traditional Monitoring Solution [Diagram labels: Bare Metal System, Hypervisor, Virtual Machines, Containers, Monitoring Agent]
  7. In a Monolithic World… What to Monitor? The application, and the hosts on which the application is deployed.
  8. In a Cloud Native World… What to Monitor? Hosts, the Kubernetes platform, Docker containers, and containerized microservices.
  9. Benefits of Containers & Kubernetes: portability, scalability, rolling updates, service discovery, load balancing, self-healing, and security.
  10. While Kubernetes solves old problems, it introduces new ones.
  11. Current Challenges in Kubernetes Monitoring & Troubleshooting: K8s is powerful… but complex! Kubernetes is great, but COMPLEX! ($ kubectl create -f web.yaml)
  12. Current Challenges in Kubernetes Monitoring & Troubleshooting: K8s is powerful… but complex! Everything in K8s is ephemeral by design.
  13. Current Challenges in Kubernetes Monitoring & Troubleshooting: K8s is powerful… but complex! Cascading failures: container communication, increased dependencies, changing architecture.
  14. Current Challenges in Kubernetes Monitoring & Troubleshooting: K8s is powerful… but complex! More and noisier metrics (100x): container-unique metrics, ephemeral data, false positives.
  15. Methodology Switch. Cattle (containers): named with strings of numbers; almost identical; ephemeral; when one is sick, get a new one. Pets (K8s services): one or more identical pods; a specific name (kube_app, kube_name); give context to container metrics; when one is sick, nurse it back to health.
  16. Visualizing Kubernetes Objects [Diagram labels: Namespace; Service A, Service B, Service C; Pod C1, Pod C2, Pod C3, each with a Container; Pet; Cattle]
  17. K8s Monitoring Strategies & Methods: remote polling (K8s metric/event APIs); node-based (an agent per host, via DaemonSets); sidecars (an agent per pod); logs & APM.
  18. K8s Metrics: Monitoring the Kubernetes Cluster. Node resource utilization: network bandwidth, disk utilization, CPU, and memory. The number of nodes: how many nodes are available, what you are paying for, and what the cluster is being used for. Running pods: is the number of available nodes sufficient, and can the nodes handle the entire workload if one node fails?
  19. K8s Metrics: Monitoring Pods. Kubernetes metrics: monitor how a specific pod and its deployment are being handled, such as the number of instances a pod has at the moment versus how many were expected, how an in-progress deployment is going (how many instances have been changed from an older version to a new one), health checks, and some network data available through network services. Container metrics: collected by cAdvisor and exposed by Heapster, which queries every node about its running containers; CPU, network, and memory usage compared with the maximum allowed are the highlights. Application metrics: developed by the application itself and related to the business rules it addresses; for example, a database application exposing metrics about the state of its indices and statistics about tables and relationships.
  20. Sources of Metrics in Kubernetes. Node metrics from node_exporter: node_exporter is installed as a DaemonSet (one instance per node) and exposes standard host metrics: load average, CPU, memory, disk, and network. Container metrics from cAdvisor (also called “K8s core metrics”): cAdvisor is embedded into the kubelet, so we scrape the kubelet to get container metrics; for each container on the node: CPU usage, filesystem reads/writes/limits, memory usage and limits, and network transmit/receive/dropped. K8s metrics from the K8s API server: metrics about the performance of the API server, including the performance of controller work queues, request rates and latencies, etcd helper cache work queues and cache performance, general process status (file descriptors, memory, CPU seconds), and Go runtime status (GC, memory, threads). About 100 unique series in a typical node.
  21. Sources of Metrics in Kubernetes. K8s-derived metrics from kube-state-metrics: counts and metadata about many k8s types (counts of many 'nouns'), resource limits, and container states (ready/restarts/running/terminated/waiting). Metrics from etcd, the "master of all truth" within a k8s cluster: leader existence and leader change rate, disk write performance, and inbound gRPC stats, plus etcd_http_received_total, etcd_http_failed_total, and etcd_http_successful_duration_*.
  22. Kubernetes Monitoring Best Practices
  23. #1: Collect metrics at the container level, but alert at the service level. $ cat /etc/docker/daemon.json { "metrics-addr" : "", "experimental" : true }
  24. #2: Monitor Service Level Objectives (SLOs) per service, per route • Error rate per service, per route • Latency per service, per route
  25. #3: Infra Metrics: Utilization - Resource availability for pods vs. allocation - Verify that every pod/container has a limit (best practice)
  26. #4: Always alert on high disk usage • Monitor ALL disk volumes, including the root file system. • The Prometheus node_exporter provides per-device filesystem metrics for tracking this.
  27. #5: Never ignore kube-system • Total DNS requests: may indicate a resource issue, scaling limits, or an application bug • DNS request time: high latency • Quorum loss in the cluster / failure in leader election • Unusually high snapshot duration • Network criticality
  28. #6: Consistent metadata enrichment. Tag individual Kubernetes components so that the tags provide context for your services.
  29. Best Practice #7: No better KPI than the API. Track the API gateway for microservices to detect application issues automatically.
  30. Discoverability: Infrastructure vs. Service View. Infrastructure-centric viewpoint: complex; slow to find and troubleshoot issues; disconnected from the customer reality. Service-centric viewpoint: simple to understand; quick to find and troubleshoot issues; tightly connected to the customer reality.
  31. Sumo Logic K8s Monitoring and Troubleshooting • Delivers a best-in-class, end-to-end Kubernetes monitoring and troubleshooting experience • Open source collectors (Fluent Bit, Fluentd, Prometheus, Falco) • Visualize K8s hierarchies through Deployment, Service, Node, and Namespace views • Honeycomb visualization: a quick overview of data in a visually digestible way • Simplified monitoring and troubleshooting • Correlation of logs, metrics, events, and security • Integrated security with Falco + partner apps
  32. Data Collection with Sumo Logic
  33. Our Kubernetes Partner Apps: Security. Purpose of each app: SecOps. • Provides a comprehensive monitoring and analysis solution for detecting vulnerabilities and potential threats throughout your environment, including hosts, containers, images, and registries. • Helps you detect, investigate, and remediate vulnerabilities, insecure configurations, and compliance violations across all container and Kubernetes environments. • Provides granular security and compliance control monitoring to DevSecOps teams throughout the cloud-native application lifecycle, from development to runtime in production. • Gives customers the ability to detect, investigate, and remediate vulnerabilities in software artifacts across their deployment environments.
  34. Ecosystem: Unified K8s DevOps and SecOps Monitoring. CI/CD: CircleCI, Codefresh, Armory, Harness. Kubernetes: Amazon EKS, Google Kubernetes Engine, Azure Kubernetes Service. SecOps: Falco, Twistlock, StackRox, Aqua, Tigera, JFrog Xray.
  35. It’s Demo Time…
  38. Thank You
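Slide 18 asks whether the nodes can handle the entire workload if one of them fails. A minimal Python sketch of that N-1 check follows. It assumes you already have each node's allocatable CPU and memory and the total resources requested by pods. The Node class and survives_node_loss function are hypothetical names, not part of any Kubernetes client library. The sketch treats the cluster as a single pool and ignores bin-packing, so a True result is necessary but not sufficient:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int    # allocatable CPU in millicores
    allocatable_mem_mib: int  # allocatable memory in MiB


def survives_node_loss(nodes: list[Node], requested_cpu_m: int, requested_mem_mib: int) -> bool:
    """True if, for every possible single-node failure, the remaining nodes
    still have enough pooled CPU and memory for all pod requests.

    Pooled-capacity check only: it ignores bin-packing, affinity rules and
    per-node fragmentation.
    """
    if len(nodes) < 2:
        return False  # with one node, losing it loses the whole workload
    total_cpu = sum(n.allocatable_cpu_m for n in nodes)
    total_mem = sum(n.allocatable_mem_mib for n in nodes)
    # Remove each node in turn; every failure scenario must still fit.
    return all(
        total_cpu - n.allocatable_cpu_m >= requested_cpu_m
        and total_mem - n.allocatable_mem_mib >= requested_mem_mib
        for n in nodes
    )


nodes = [
    Node("node-a", 4000, 16384),
    Node("node-b", 4000, 16384),
    Node("node-c", 2000, 8192),
]
print(survives_node_loss(nodes, requested_cpu_m=5000, requested_mem_mib=20000))  # True
print(survives_node_loss(nodes, requested_cpu_m=7000, requested_mem_mib=20000))  # False
```

Checking every node rather than only the largest one keeps the test exact for the pooled model, even when the node with the most CPU is not the one with the most memory.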
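Slide 19 suggests tracking how many instances a pod has versus how many were expected, and how an in-progress deployment is going. One way to sketch that in Python is to read a Deployment object and compare its spec and status replica counts. The input could be the JSON from kubectl get deployment NAME -o json, where NAME is a placeholder. The order of checks is loosely modeled on what kubectl rollout status reports. Treat rollout_state as a hypothetical illustration, not the kubectl implementation:

```python
def rollout_state(deployment: dict) -> str:
    """Summarize rollout progress from a Deployment object parsed from JSON."""
    meta = deployment.get("metadata", {})
    spec = deployment.get("spec", {})
    status = deployment.get("status", {})
    # Zero-valued status fields are omitted from the JSON, hence the defaults.
    desired = spec.get("replicas", 1)  # spec.replicas defaults to 1
    current = status.get("replicas", 0)
    updated = status.get("updatedReplicas", 0)
    available = status.get("availableReplicas", 0)

    if meta.get("generation", 0) > status.get("observedGeneration", 0):
        return "waiting: controller has not observed the latest spec yet"
    if updated < desired:
        return f"waiting: {updated} of {desired} new replicas updated"
    if current > updated:
        return f"waiting: {current - updated} old replica(s) pending termination"
    if available < updated:
        return f"waiting: {available} of {updated} updated replicas available"
    return "rolled out"


mid_rollout = {
    "metadata": {"generation": 2},
    "spec": {"replicas": 3},
    "status": {"observedGeneration": 2, "replicas": 4,
               "updatedReplicas": 3, "availableReplicas": 3},
}
print(rollout_state(mid_rollout))  # waiting: 1 old replica(s) pending termination
```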
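Slide 21 lists etcd counters such as etcd_http_received_total and etcd_http_failed_total. A counter only goes up until the process restarts, so what you alert on is its increase between two scrapes. The Python sketch below assumes a few things. You already have two scrapes of those counters, summed across labels. Failed requests are also counted in the received total. The helper names are hypothetical. Like Prometheus, the sketch treats any decrease as a counter reset, but unlike Prometheus's rate() it does no extrapolation:

```python
def counter_increase(prev: float, curr: float) -> float:
    """Increase of a monotonic counter between two scrapes.

    A decrease means the process restarted and the counter started again
    from zero, so the current value is the best estimate of the increase.
    """
    return curr - prev if curr >= prev else curr


def etcd_http_failure_ratio(prev: dict, curr: dict) -> float:
    """Share of received HTTP requests that failed between two scrapes."""
    received = counter_increase(prev["etcd_http_received_total"],
                                curr["etcd_http_received_total"])
    failed = counter_increase(prev["etcd_http_failed_total"],
                              curr["etcd_http_failed_total"])
    return failed / received if received > 0 else 0.0


scrape_1 = {"etcd_http_received_total": 1000, "etcd_http_failed_total": 10}
scrape_2 = {"etcd_http_received_total": 1500, "etcd_http_failed_total": 35}
print(etcd_http_failure_ratio(scrape_1, scrape_2))  # 0.05
```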
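Best practice #1 (slide 23) says to collect metrics per container but alert per service. The Python sketch below illustrates that split under stated assumptions. Each sample already carries a service label; how you derive it, for example from a pod label, depends on your setup. CPU is expressed as usage divided by the container's CPU limit, and the 0.8 threshold is an arbitrary example. The class and function names are hypothetical:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ContainerSample:
    service: str      # assumed service label attached to each container sample
    pod: str
    cpu_ratio: float  # CPU usage divided by the container's CPU limit


def service_cpu(samples):
    """Average the per-container CPU ratio for each service."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s in samples:
        totals[s.service] += s.cpu_ratio
        counts[s.service] += 1
    return {svc: totals[svc] / counts[svc] for svc in totals}


def service_alerts(samples, threshold=0.8):
    """Return the services whose average CPU ratio exceeds the threshold.

    A single hot replica among healthy ones does not fire; a saturated
    service does.
    """
    return sorted(svc for svc, avg in service_cpu(samples).items() if avg > threshold)


samples = [
    ContainerSample("checkout", "checkout-1", 0.95),  # one hot replica
    ContainerSample("checkout", "checkout-2", 0.30),
    ContainerSample("checkout", "checkout-3", 0.35),
    ContainerSample("search", "search-1", 0.90),
    ContainerSample("search", "search-2", 0.85),
]
print(service_alerts(samples))  # ['search']
```

A per-container rule would have paged on checkout-1 at 95%. The service-level rule only fires for search, where every replica is saturated.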
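Best practice #2 (slide 24) calls for error rate and latency per service, per route. Below is a self-contained Python sketch over raw request records. Each record is assumed to be a (service, route, HTTP status, latency in ms) tuple, and 5xx responses count as errors. The SLO targets are 1% errors and a 300 ms p95. All of these choices, and the function names, are illustrative. The sketch uses the nearest-rank percentile:

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, with 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def slo_report(requests, max_error_rate=0.01, max_p95_ms=300):
    """Error rate and p95 latency per (service, route), checked against SLO targets."""
    groups = defaultdict(list)
    for service, route, status, latency_ms in requests:
        groups[(service, route)].append((status, latency_ms))

    report = {}
    for key, rows in groups.items():
        errors = sum(1 for status, _ in rows if status >= 500)
        error_rate = errors / len(rows)
        p95 = percentile([latency for _, latency in rows], 95)
        report[key] = {
            "error_rate": error_rate,
            "p95_ms": p95,
            "violates_slo": error_rate > max_error_rate or p95 > max_p95_ms,
        }
    return report


sample_requests = (
    [("cart", "/add", 200, 120)] * 19 + [("cart", "/add", 503, 900)]
    + [("search", "/q", 200, 100)] * 9 + [("search", "/q", 200, 400)]
    + [("search", "/suggest", 200, 80)] * 5
)
for (service, route), stats in sorted(slo_report(sample_requests).items()):
    print(service, route, stats)
```

With this sample, cart /add breaches the error-rate target (1 of 20 requests failed) and search /q breaches the latency target (p95 of 400 ms). Grouping by route as well as service keeps a slow endpoint from hiding inside a healthy service-wide average.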
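Best practice #3 (slide 25) recommends verifying that every pod/container has a limit. The minimal Python sketch below assumes the input is the parsed JSON of kubectl get pods --all-namespaces -o json, which is a List object whose items are Pods. It walks spec.initContainers and spec.containers and reports any container whose resources.limits lacks a CPU or memory entry. The function name is hypothetical:

```python
def containers_missing_limits(pod_list: dict, required=("cpu", "memory")):
    """Return (namespace, pod, container, missing_resources) for each container
    whose resources.limits lacks any of the required resources."""
    findings = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        containers = spec.get("initContainers", []) + spec.get("containers", [])
        for container in containers:
            limits = container.get("resources", {}).get("limits", {})
            missing = [r for r in required if r not in limits]
            if missing:
                findings.append((meta.get("namespace", "default"), meta.get("name"),
                                 container.get("name"), missing))
    return findings


pods = {
    "items": [
        {"metadata": {"namespace": "shop", "name": "cart-1"},
         "spec": {"containers": [
             {"name": "app", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
             {"name": "sidecar", "resources": {"limits": {"memory": "64Mi"}}},
         ]}},
        {"metadata": {"namespace": "shop", "name": "db-0"},
         "spec": {"containers": [{"name": "postgres"}]}},
    ]
}
for finding in containers_missing_limits(pods):
    print(finding)
```

In practice you would replace the inline sample with json.load on the saved kubectl output.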
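Best practice #4 (slide 26) says to alert on high disk usage across all volumes, including the root file system. Recent node_exporter versions expose node_filesystem_size_bytes and node_filesystem_avail_bytes, labelled by device, fstype and mountpoint. The Python sketch below assumes you have already pulled those two series into dictionaries keyed by (device, mountpoint). The 85% threshold is arbitrary, and the function name is hypothetical:

```python
def disk_alerts(size_bytes: dict, avail_bytes: dict, threshold: float = 0.85):
    """Return (device, mountpoint, used_ratio) for every filesystem above the
    threshold, worst first. Both inputs map (device, mountpoint) -> bytes."""
    alerts = []
    for key, size in size_bytes.items():
        if size <= 0 or key not in avail_bytes:
            continue  # skip zero-sized pseudo filesystems and unmatched series
        used = 1 - avail_bytes[key] / size
        if used > threshold:
            alerts.append((key[0], key[1], round(used, 3)))
    # Highest usage first so the most urgent volume is at the top.
    return sorted(alerts, key=lambda alert: alert[2], reverse=True)


size = {
    ("/dev/sda1", "/"): 100e9,  # root file system is included, not filtered out
    ("/dev/sdb1", "/var/lib/docker"): 200e9,
    ("/dev/sdc1", "/data"): 50e9,
}
avail = {
    ("/dev/sda1", "/"): 5e9,
    ("/dev/sdb1", "/var/lib/docker"): 100e9,
    ("/dev/sdc1", "/data"): 5e9,
}
print(disk_alerts(size, avail))
```

Using the "avail" series rather than the "free" one means blocks reserved for root count as used, which errs on the side of alerting early.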
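Best practice #5 (slide 27) lists quorum loss and leader-election failure among the kube-system signals to watch. etcd uses Raft, so a cluster of n members needs a strict majority, n // 2 + 1, to elect a leader and commit writes. Below is a small Python sketch of that arithmetic. It assumes the count of healthy members comes from your own health checks. The function names and status strings are hypothetical:

```python
def etcd_quorum(total_members: int) -> int:
    """Members needed for quorum in an etcd (Raft) cluster: a strict majority."""
    return total_members // 2 + 1


def quorum_status(total_members: int, healthy_members: int) -> str:
    quorum = etcd_quorum(total_members)
    if healthy_members < quorum:
        # Without a majority, no leader can be elected and no write can commit.
        return "CRITICAL: quorum lost"
    if healthy_members == quorum:
        return "WARNING: one more member failure loses quorum"
    return "OK"


for total in (3, 5):
    tolerated = total - etcd_quorum(total)
    print(f"{total} members: quorum {etcd_quorum(total)}, tolerates {tolerated} failure(s)")
print(quorum_status(3, 2))
```

The same arithmetic explains why etcd clusters are usually sized with an odd number of members. Going from 3 to 4 members raises the quorum from 2 to 3 without tolerating any additional failure.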