Container Monitoring Best Practices Using AWS and InfluxData by Gunnar Aasen

Gunnar Aasen / Partner Engineering
Container Monitoring
Best Practices
Using AWS and InfluxData

Agenda
• What is container monitoring
• Options for running containers on AWS
• Best practices for container monitoring
• Run TICK on AWS container services
• Demo
• Questions

Partner Engineering Manager, InfluxData
InfluxDB expert
Based in San Francisco, Gunnar is a former InfluxData
support engineer. He has intimate knowledge of InfluxDB
and the rest of the TICK stack. As a partner engineer, he’s
focused on integrating InfluxDB into the larger open source
and cloud ecosystems to help InfluxData’s partners and
customers succeed.

Process in cgroup?
Docker?
Kubernetes?
Another buzzword like “cloud”?

It’s containers all the way down
🐢
🐢
🐢

AWS ECS/Fargate
• Elastic Container Service (ECS)
– Docker-based container deployment
– Essentially AWS’ version of Kubernetes
• Terminology a bit different: Tasks vs services
– Exposes the EC2 hosts used underneath
– Can use Docker Compose
• Fargate
– The same as ECS, with no EC2 instances exposed
– Pay only for container CPU/memory used

AWS EKS
• EKS is AWS’ managed Kubernetes offering
– Equivalent to Google’s GKE
• Uses EC2 instances underneath
– These are exposed to the user
• AWS manages the Kubernetes API
• Some integration with IAM and load balancers

Options for deploying TICK on AWS
• CloudFormation module for EC2
• Link: https://github.com/influxdata/amazon-cloud-formation-influxdb-enterprise
• ECS/Fargate via Docker Compose
• Link: https://github.com/influxdata/sandbox
• EKS
– Via Helm (On the AWS Marketplace)
• Link: https://aws.amazon.com/marketplace/pp/B07KGM885K
– Via InfluxDB operator
• Link: https://docs.influxdata.com/platform/integrations/kubernetes/

Kubernetes resources
• Summary
– Link: https://docs.influxdata.com/platform/integrations/kubernetes/
• kube-influxdb project
– Makes monitoring Kubernetes with TICK easy on different platforms
• Link: https://github.com/influxdata/kube-influxdb
– Similar to kube-prometheus
– Includes common container and Kubernetes inputs, ready to enable
– Includes graphs and dashboards for those metrics
– Will include alerts as well

Recommendations for monitoring on AWS

What’s different
• Proliferation of containers
– Running in AWS…
• Enables microservices
– Increases the amount of inter-container (inter-process) communication
• Minimal environments
– Lack of familiar debugging tools and techniques

Observability is the new paradigm
• A holistic understanding of reality in a system
– Monitoring
• Current state of the system
– Logging
• Actions taken by services in the system
– Tracing
• Interactions between different services
– Graphs/alerting
• Translating machine information into human information

Levels of container monitoring
• Host/node level monitoring
– EC2 node failures
• Container monitoring
– Lack of resources
• Application monitoring
– Service does not respond
• Cluster monitoring
– Is Kubernetes overextended?
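
The container-level "lack of resources" case can be sketched as a comparison of memory in use against the container's cgroup limit. This is a minimal sketch, not something shown in the talk: it assumes cgroup v2 semantics, where `memory.current` holds bytes in use and `memory.max` holds a byte count or the literal `max`. The function names and the 0.9 threshold are hypothetical.

```python
from typing import Optional


def memory_utilization(current_raw: str, max_raw: str) -> Optional[float]:
    """Return used/limit as a fraction, or None when no limit is set.

    Assumes the raw contents of cgroup v2 memory.current and memory.max
    are passed in as strings (reading the files is left to the caller).
    """
    limit = max_raw.strip()
    if limit == "max":
        return None  # unlimited container: no meaningful ratio
    limit_bytes = int(limit)
    if limit_bytes == 0:
        return 1.0  # a zero limit leaves no headroom at all
    return int(current_raw.strip()) / limit_bytes


def is_memory_starved(current_raw: str, max_raw: str, threshold: float = 0.9) -> bool:
    """True when usage is at or above the (hypothetical) alert threshold."""
    ratio = memory_utilization(current_raw, max_raw)
    return ratio is not None and ratio >= threshold
```

In practice the Telegraf docker and kubernetes inputs listed later report this kind of data; the sketch only shows what the check amounts to.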

Telegraf in Kubernetes
• Three options
– DaemonSet: monitoring per node (one Telegraf per EC2 instance)
• Collect host/node metrics
– Deployment: single service for a cluster (Prometheus scraping)
• Collect application and cluster metrics
– Sidecar: tight coupling with the application
• Collect container metrics
• DaemonSet or Sidecar? Start with DaemonSet
• Understand the metrics you’re generating before deploying
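
The DaemonSet option can be sketched as a manifest built in Python and dumped as JSON, which kubectl accepts alongside YAML. This is a minimal sketch under stated assumptions: the `monitoring` namespace, the `app: telegraf` label, and the `NODE_NAME` variable are illustrative choices, not taken from the talk, and a real deployment would also mount a Telegraf configuration.

```python
import json


def telegraf_daemonset(namespace: str = "monitoring", image: str = "telegraf") -> dict:
    """Build a minimal DaemonSet manifest so one Telegraf pod runs per node."""
    labels = {"app": "telegraf"}  # hypothetical label
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "telegraf", "namespace": namespace},
        "spec": {
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "telegraf",
                            "image": image,
                            # Expose the node name to the pod so a Telegraf
                            # config can reference it when tagging metrics.
                            "env": [
                                {
                                    "name": "NODE_NAME",
                                    "valueFrom": {
                                        "fieldRef": {"fieldPath": "spec.nodeName"}
                                    },
                                }
                            ],
                        }
                    ]
                },
            },
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize the manifest for use with `kubectl apply -f`."""
    return json.dumps(manifest, indent=2)
```

Switching `kind` to `Deployment` and adding `replicas` would give the single-service-per-cluster variant instead.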

Telegraf input plugins for instrumenting nodes
• cpu: standard CPU metrics
• system: general stats on system load, uptime, and number of users logged in
• processes: number of processes, grouped by state
• procstat: fine grained process stats like RSS memory
• diskio: metrics about disk traffic and timing
• disk: metrics about disk usage
• mem: system memory metrics
• netstat: network related metrics
• http_response: set up a local HTTP ping check
• filestat: stats about specified files (meta node only)
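
The plugin list above can be sketched as a generated Telegraf configuration. This is a minimal sketch, not the speaker's config: the option values (the `influxd` process pattern, the local `/ping` URL on InfluxDB's default port, the placeholder file path) are assumptions for illustration, and `json.dumps` is used only because JSON booleans, strings, and string arrays happen to be valid TOML for these simple values.

```python
import json

# Plugin names come from the slide; the option values are illustrative.
NODE_INPUTS = {
    "cpu": {"percpu": True, "totalcpu": True},
    "system": {},
    "processes": {},
    "procstat": {"pattern": "influxd"},
    "diskio": {},
    "disk": {},
    "mem": {},
    "netstat": {},
    "http_response": {"urls": ["http://localhost:8086/ping"]},
    "filestat": {"files": ["/path/to/file"]},  # placeholder path
}


def render_inputs(plugins: dict) -> str:
    """Render Telegraf [[inputs.<name>]] sections from {name: options}."""
    sections = []
    for name, options in plugins.items():
        lines = [f"[[inputs.{name}]]"]
        for key, value in options.items():
            # Simple scalars and string lists serialize identically
            # in JSON and TOML, so json.dumps is enough for this sketch.
            lines.append(f"  {key} = {json.dumps(value)}")
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"
```

Calling `print(render_inputs(NODE_INPUTS))` produces text that can be pasted into a `telegraf.conf` alongside an output section.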

Telegraf input plugins for instrumenting containers
• logs: requires syslog
• swap: system swap metrics
• internal: Telegraf related stats
• docker: if deployed in containers
• kubernetes: kubelet stats like per-node pod metrics
• kube_inventory: Kubernetes state metrics
• prometheus: Prometheus-style /metrics endpoints
• syslog: structured logging
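
For the prometheus input, it may help to see what a Prometheus-style /metrics endpoint returns. This is a minimal sketch of rendering one gauge in the Prometheus text exposition format; the metric and label names in the usage note are hypothetical, and a real service would normally use a client library rather than hand-rolled formatting.

```python
from typing import Optional


def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, value: float, labels: Optional[dict] = None) -> str:
    """Render a single gauge sample with its TYPE line."""
    label_text = ""
    if labels:
        pairs = ",".join(
            f'{key}="{escape_label(str(val))}"' for key, val in sorted(labels.items())
        )
        label_text = "{" + pairs + "}"
    return f"# TYPE {name} gauge\n{name}{label_text} {value}\n"
```

For example, `render_gauge("queue_depth", 3, {"service": "api"})` yields a TYPE line followed by `queue_depth{service="api"} 3`, which is the shape of text the plugin scrapes.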

Monitoring recommendations
• Remember to set up black box testing
– Kubernetes may look fine internally but egress may be failing
– Always start here for alerting
• Node health is still important in Kubernetes
– OOM killer, no disk space are still problems
– Pay attention to local system disk space
• Believe your users’ reports
– Most small problems are never reported
– Microservices/container scheduling can create many small outages
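
The black box recommendation can be sketched as a probe that runs outside the cluster and emits its result in InfluxDB line protocol. This is a minimal sketch using only the Python standard library; the `blackbox` measurement name, the field names, and the convention of status 0 for "no answer" are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> dict:
    """Request url from outside the cluster; record outcome and latency."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        status = 0  # no answer at all: DNS failure, refused, or timeout
    elapsed = time.monotonic() - start
    return {"url": url, "status": status, "ok": 200 <= status < 400, "seconds": elapsed}


def to_line_protocol(result: dict) -> str:
    """Format a probe result as one line of InfluxDB line protocol."""
    tag = result["url"]
    # Commas, equals signs, and spaces must be escaped in tag values.
    for ch in (",", "=", " "):
        tag = tag.replace(ch, "\\" + ch)
    ok = "true" if result["ok"] else "false"
    return (
        f"blackbox,url={tag} "
        f"ok={ok},status={result['status']}i,seconds={result['seconds']}"
    )
```

A script that prints `to_line_protocol(probe(url))` could be scheduled on a machine outside the cluster, which keeps the check independent of the infrastructure it tests.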

System recommendations
• Decouple the monitoring system from the target infrastructure
– SaaS, VMs work well for decoupling
• Test the monitoring system
– All large environments should have staging metrics
• Monitoring should be deployed with your application
– Infrastructure as code like CloudFormation or Terraform templates
• Always consider how cascading failures will affect monitoring
– Monitoring systems tend to go down during other service issues

AWS recommendations
• Keep an accessible record of CloudWatch stats
– Keep in mind CloudWatch API limits
• Always consider AWS limits ahead of time
– Available instance classes
– Hard to monitor without access to the AWS support API
• Kubernetes
– Stay up to date for the best experience
– Pay attention to IAM roles
– Use CloudFormation

Future Plans
• Next couple months
– Migrating to official Helm charts repo
• Deprecating TICK charts and kube-influxdb repos
• One well-known place for all charts
• This summer
– Operator extended for InfluxDB Enterprise
– Additional operator functionality for other TICK components
– Publish more tools for tracing

🙋 Questions? 🙋
