DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and Kubernetes




The high level of automation in the container and microservice lifecycle makes monitoring Kubernetes or Swarm more challenging than monitoring traditional, more static deployments. A static setup that monitors specific application containers does not work, because orchestration tools like Kubernetes or Swarm make their own decisions according to the defined deployment rules. In this talk you will learn how DevOps teams can cope with the challenges of monitoring and log management on Docker Swarm and Kubernetes. We start with the basics of container monitoring and logging, including APIs and tools, followed by an overview of the key metrics of both platforms. We then cover cluster-wide deployment of monitoring and log management solutions, service discovery for log collection and monitoring, and tagging of logs and metrics. Finally, we share insights derived from monitoring a 4,700-node Swarm cluster as part of the Swarm3k project.

Published in: Technology


  1. 1. Monitoring and Log Management for Docker Swarm and Kubernetes Stefan Thies Sematext Group, Inc.
  2. 2. Sematext & I Logsene SPM logs metrics Docker Agent #nodejs
  3. 3. Agenda • What is • Centralized Log Management + Performance Monitoring • Kubernetes / Swarm • Container Logs • Container Metrics • Example: Swarm3k Monitoring • Summary
  4. 4. Centralized Log Management Logagent
  5. 5. Centralized Monitoring Expose Metrics Collect Metrics Ship Metrics Store Metrics Aggregate Metrics Visualize Metrics • Correlation with Logs Anomaly Detection Alerting Server + App / Container Configuration Monitoring Agents Time Series Database Dashboard Tools, Alerting Tools, ChatOps Tools
  6. 6.
  7. 7. Orchestration Container POD Node Node 1 POD 1 Namespace ns1 Kibana Elasticsearch POD 2 Namespace ns2 Redis Services (proxy) Replication Controllers DaemonSets 3 HorizontalPod Autoscaler
  8. 8. Kubernetes Dashboard / Heapster • Current status • Shows basic resource usage for workloads (Pod) • Simple logs view • Heapster is required for autoscaling features
  9. 9. Orchestration Container Stacks Nodes Node 1 ELK (compose, app bundle) Kibana 1 Elasticsearch 1 Redis (service) redis1 3 Node 2 ELK Elasticsearch 2 Elasticsearch 3
  10. 10. Kubernetes != Swarm • Common base is Docker • Docker Logs & Metrics • Docker API
  11. 11. Container Logs
  12. 12. Docker Logging Drivers • json-file (default): files • journald (CoreOS): system journal • syslog: TCP, UDP • fluentd: TCP • Splunk: TCP • GELF • Centralized Log Management • Local Log Shipper
  13. 13. Docker logs Containers (should) log to stdout/stderr !!! docker logs container_id docker logs container_name Docker API Docker client Container logs
  14. 14. Fun with Docker logging drivers $ docker run --log-driver=syslog --log-opt syslog-address=udp://$HOSTNAME:514 --log-opt tag="{{.ImageName}}#{{.Name}}#{{.ID}}" -p 9003:80 --name nginx1 -d nginx $ docker logs nginx1 "logs" command is supported only for "json-file" and "journald" logging drivers (got: syslog) Add Context!
  15. 15. More fun with TCP logging drivers! docker run --log-driver=syslog --log-opt syslog-address=tcp:// --log-opt tag="{{.ImageName}}#{{.Name}}#{{.ID}}" -p 9004:80 -d nginx docker: Error response from daemon: Failed to initialize logging driver: dial tcp getsockopt: connection refused.
  16. 16. Fix it – run syslog server first! docker run -d -p 514:514 factorish/syslog -t tcp docker run --log-driver=syslog … nginx curl localhost:9004 docker logs syslog ==> syslog listening on tcp <30>Nov 17 18:23:43 nginx#nginx1#afebdfff0eed[1710]: - - [17/Nov/2016:18:23:43 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.49.1" "-"
  17. 17. Is UDP better?
  18. 18. Alternatives? Docker Log files json-file or journald API Agent Remote Log Storage Disk Buffer Docker API provides the most complete information! Reliable networks and backend services? Better buffer & retransmit in case of failure! Attach metadata to logs/metrics or route data to different servers or indices? “docker logs” works & logs are stored on local disk! Centralize search, analytics, alerts, access permissions Parse logs
  19. 19. Automatic tagging of logs, metrics, events • Automatic tagging of logs / metrics with • Docker • Container Name / ID • Image Name / ID • Labels / Environment • Hostname / IP • Kubernetes • Namespace, Pod Name, UID • Swarm • Swarm Service Name, ID, Compose Project, Container # (scale) • Single collector for logs, metrics, events, metadata • Base for correlation and visualisation
  20. 20. Container Metrics Collection
  21. 21. Collection
  22. 22. Metric collection via Docker API
  23. 23. Smart monitoring agent - all in one Docker API Agent Remote Storage Disk Buffer Docker API provides Labels, Metrics, Logs, Events … Reliable networks and backend services? Better buffer & retransmit in case of failure! Auto-tagging using container labels. Discovery of services Centralize logs, metrics, analytics, alerts, access permissions Metrics, Logs, Events
  24. 24. Integrate application monitoring in the stack - Custom images - add/remove app with all req. options - Start monitoring, reading config from etcd App Config to expose metrics App Monitor Configured for App Container Service Discovery etcd consul
  25. 25. Auto Discovery via Docker API and Labels? App Container config to expose metrics App Monitor Docker Monitor run discovery Docker Automatic run
  26. 26. Key Container Metrics
  27. 27. Node Storage • Good kids clean up their rooms. Good Docker ops clean up their disks by removing unused containers & images.
  28. 28. Number of containers per host • Verify deployment strategies
  29. 29. CPU quota per container
  30. 30. Container memory and OOM counter
  31. 31. Docker Events
  32. 32. Swarm Task Status
  33. 33. Limit container resources for your apps! • Set CPU quotas: --cpu-quota=6000 • Limit memory and configure the app in the container to the same limit: -m 512mb • Disable swap: set --memory-swap to the same value as -m (--memory-swap=-1 allows unlimited swap) • To keep a Docker container from eating all your disk I/O, use e.g. --device-write-bps /dev/sda:1mb
  34. 34. Automatic Deployment of monitoring agents • One command to run a service on each node joining the cluster • Kubernetes: • DaemonSet creates a pod per node kubectl create -f sematext-agent.yml • Swarm: • Global Service docker service create --mode global ...
  35. 35. Swarm3k Monitoring
  36. 36. Swarm3k Requirements • Monitoring • Host metrics • Container metrics • Docker Events • Task Monitoring • Collect Container Logs: Task Errors only • 3000+ Nodes (actual: 4.7k) • 150.000 (actual: 60k) • Duration 8 hours – 28 GB data collected • Public/shared Dashboard for the community
  37. 37. Pre-flight test with 500 nodes • 60.000 containers deployed in less than 5 minutes!
  38. 38. Swarm3k in one picture
  39. 39. Limits in visualisation Missing Labels to group hosts or containers
  40. 40. Summary • Setup of Monitoring & Logging is complex in dynamic environments • Kubernetes != Swarm (yet). Common base: Docker Containers • Smart Agents to collect, analyze, aggregate metrics, events and logs • Auto discovery of containers for data collection • Use metadata to tag metrics & logs as a base for correlation and visualization • Integrate monitoring in application stacks for app specific metrics • Auto Discovery of services and automatic configuration for application level monitoring
  41. 41. We are engineers! We develop DevOps tools! We are DevOps people! We do fun stuff ;)
  42. 42. Thank you for listening! Get in touch! Stefan @seti321 @sematext Come talk to us at the booth
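
Slides 14 to 16 tag each syslog message with the template `{{.ImageName}}#{{.Name}}#{{.ID}}`, so the receiving side can recover the container context from the tag. The following is a minimal Python sketch of that parsing step, assuming the line layout printed on slide 16 (`<PRI>timestamp tag[pid]: message`). The function name `parse_tagged_line` and the returned field names are hypothetical and not part of any tool mentioned in the talk.

```python
import re

# Layout assumed from the sample line on slide 16:
# <PRI>Mon DD HH:MM:SS TAG[PID]: MESSAGE
SYSLOG_LINE = re.compile(
    r"^<(?P<pri>\d+)>"
    r"(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<tag>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"(?P<message>.*)$"
)


def parse_tagged_line(line):
    """Split a syslog line into container metadata and message.

    Returns None when the line does not match the assumed layout or the
    tag does not have the three '#'-separated parts used on the slides.
    """
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None
    parts = match.group("tag").split("#")
    if len(parts) != 3:
        return None
    image, name, container_id = parts
    priority = int(match.group("pri"))
    return {
        # Syslog priority encodes facility * 8 + severity.
        "facility": priority // 8,
        "severity": priority % 8,
        "timestamp": match.group("timestamp"),
        "image": image,
        "container_name": name,
        "container_id": container_id,
        "pid": int(match.group("pid")),
        "message": match.group("message"),
    }


if __name__ == "__main__":
    sample = (
        '<30>Nov 17 18:23:43 nginx#nginx1#afebdfff0eed[1710]: '
        '- - [17/Nov/2016:18:23:43 +0000] "GET / HTTP/1.1" 200 612 '
        '"-" "curl/7.49.1" "-"'
    )
    print(parse_tagged_line(sample))
```

A separator such as `#` only works if it cannot occur in image or container names, which is one reason an agent that reads metadata from the Docker API (slide 18) is more robust than encoding context in the tag.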
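
Slides 22 and 26 to 30 refer to collecting container metrics via the Docker API. As a rough illustration, the sketch below derives CPU and memory percentages from one sample of the Docker Engine stats endpoint (`GET /containers/{id}/stats`). The field names (`cpu_stats`, `precpu_stats`, `memory_stats`) follow the Docker Engine API. The function names are hypothetical, the sample values are made up for the demonstration, and this is not necessarily how the agent shown in the talk computes its metrics.

```python
def cpu_percent(stats):
    """CPU usage in percent from one Docker stats sample.

    The sample carries the current and the previous reading, so the
    percentage is the container's CPU time delta relative to the host's
    CPU time delta, scaled by the number of CPUs.
    """
    cpu = stats["cpu_stats"]
    pre = stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu["system_cpu_usage"] - pre["system_cpu_usage"]
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0
    # Older API versions lack online_cpus; fall back to the per-CPU list.
    online = (
        cpu.get("online_cpus")
        or len(cpu["cpu_usage"].get("percpu_usage") or [])
        or 1
    )
    return cpu_delta / system_delta * online * 100.0


def memory_percent(stats):
    """Memory usage as a percentage of the container's memory limit."""
    memory = stats["memory_stats"]
    limit = memory.get("limit", 0)
    if limit <= 0:
        return 0.0
    return memory["usage"] / limit * 100.0


if __name__ == "__main__":
    # Made-up sample in the shape returned by the stats endpoint.
    sample = {
        "cpu_stats": {
            "cpu_usage": {"total_usage": 400_000_000},
            "system_cpu_usage": 10_000_000_000,
            "online_cpus": 4,
        },
        "precpu_stats": {
            "cpu_usage": {"total_usage": 200_000_000},
            "system_cpu_usage": 9_000_000_000,
        },
        "memory_stats": {"usage": 268_435_456, "limit": 536_870_912},
    }
    print(round(cpu_percent(sample), 1), round(memory_percent(sample), 1))
```

With a CPU quota or a memory limit set as on slide 33, these percentages become meaningful alert thresholds per container rather than per host.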
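
Slides 18 and 23 recommend that an agent "buffer & retransmit in case of failure" instead of relying on a reliable network, as the syslog drivers on slides 14 to 16 do. The sketch below shows one simple way such a disk buffer could work. The class `DiskBufferedShipper`, its JSON-lines buffer file, and the `send` callable are all assumptions made for illustration and do not describe the implementation of any agent from the talk.

```python
import json
import os


class DiskBufferedShipper:
    """Ship records through `send`; spill to a local file when it fails."""

    def __init__(self, send, buffer_path):
        # `send` is any callable that raises OSError (for example
        # ConnectionError) when the backend is unreachable.
        self.send = send
        self.buffer_path = buffer_path

    def ship(self, record):
        """Return True if the record reached the backend, False if buffered."""
        # Retransmit older records first so ordering is preserved.
        if not self.flush():
            self._append(record)
            return False
        try:
            self.send(record)
            return True
        except OSError:
            self._append(record)
            return False

    def flush(self):
        """Try to resend everything in the buffer; True if it is now empty."""
        if not os.path.exists(self.buffer_path):
            return True
        with open(self.buffer_path, encoding="utf-8") as handle:
            pending = [json.loads(line) for line in handle if line.strip()]
        for index, record in enumerate(pending):
            try:
                self.send(record)
            except OSError:
                # Keep only the records that have not been delivered yet.
                self._rewrite(pending[index:])
                return False
        os.remove(self.buffer_path)
        return True

    def _append(self, record):
        with open(self.buffer_path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")

    def _rewrite(self, records):
        with open(self.buffer_path, "w", encoding="utf-8") as handle:
            for record in records:
                handle.write(json.dumps(record) + "\n")
```

A production agent would also need to cap the buffer size and guard against concurrent writers, which this sketch leaves out to keep the retransmit logic visible.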