Monitoring in 2017 - TIAD Camp Docker

Monitoring in 2017
Challenges in monitoring containers and dynamic
infrastructure.
TIAD

Oct 6, 2017
Charly Fontaine
Software Engineer - Containers team 
Datadog

CharlyF
[charly@datadoghq.com]
Name: Charly Fontaine
Role: Software Engineer
 
Interests:
* Containerized Infrastructures
* Monitoring all the things
* Motorbikes

Datadog Overview
• SaaS-based infrastructure and app monitoring
• Open Source Agent
• Time series data (metrics and events)
• Processing nearly a trillion data points per day
• Intelligent Alerting
• We’re hiring! (www.datadoghq.com/careers/)

Monitor Everything
Operating Systems, Cloud Providers, Containers, Web Servers, Datastores, Caches,
Queues and more...

$ cat ~/.plan
1. Intro: The Importance of Monitoring
2. The Challenge: Monitoring Dynamic Infrastructure
3. Finding the Signal: How do we know what to monitor?
4. Implementation: Applying it to Containerized Workloads
5. Demo: Monitoring of a containerized web app deployment

Collecting data is cheap; 
not having it when you
need it can be expensive

Sharing
Using and Sharing the same
metrics and measurements
across teams is key to avoiding
misunderstandings.

Why do we focus on Docker and
Containers?

Hacker News Driven Development
When the choice of technology is
determined by what is popular on
Hacker News that week.

https://www.datadoghq.com/docker-adoption/

Docker Adoption Growth
We’ve seen a 5x increase in Docker adoption over the last year.

Open Questions
• Where is my container running?
• What is the capacity of my cluster?
• What’s the total throughput of my app?
• What’s its response time per tag? (app, version, region)
• What’s the distribution of 5xx errors per container?

More Details at: http://www.datadoghq.com/blog/monitoring-101-alerting/

Examples: NGINX - Metrics
Resource Metrics: 
• Disk I/O
• Memory
• CPU
• Queue Length
Work Metrics:  
• Requests Per Second
• Request Time
• Error Rates (4xx or 5xx)
• Success (2xx)

Examples: NGINX - Events
• Configuration Change
• Code Deployment
• Service Started / Stopped

What to demand from our
monitoring tooling?

EVERY ALERT MUST BE ACTIONABLE

Query Based Monitoring
“What’s the average throughput of
application:nginx per version?”
“Alert me when one of my pods from replication
controller:foo is not behaving like the others.”
“Show me rate of HTTP 500 responses from nginx”
“… across all data centers”
“… running my app version 2….”
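
Each query above comes down to filtering points by tag and grouping them by another tag. A minimal sketch, assuming a hypothetical in-memory list of samples (the `Point` type, the metric name, and the values are illustrative, not Datadog's query API):

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Point:
    """One metric sample with its tags (hypothetical structure)."""
    metric: str
    value: float
    tags: dict


def average_by(points, metric, where, group_by):
    """Average `metric` over points matching every tag in `where`,
    grouped by the value of the `group_by` tag."""
    groups = defaultdict(list)
    for p in points:
        if p.metric != metric:
            continue
        # Skip points that miss any of the required tag values.
        if any(p.tags.get(k) != v for k, v in where.items()):
            continue
        if group_by not in p.tags:
            continue
        groups[p.tags[group_by]].append(p.value)
    return {key: mean(values) for key, values in groups.items()}


# Illustrative samples: requests per second reported by four containers.
points = [
    Point("requests_per_s", 120.0, {"application": "nginx", "version": "1"}),
    Point("requests_per_s", 80.0, {"application": "nginx", "version": "1"}),
    Point("requests_per_s", 200.0, {"application": "nginx", "version": "2"}),
    Point("requests_per_s", 50.0, {"application": "redis", "version": "2"}),
]

# "What's the average throughput of application:nginx per version?"
result = average_by(points, "requests_per_s", {"application": "nginx"}, "version")
print(result)
```

Because the query names tags rather than hosts or container IDs, it keeps working as containers come and go.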

Resource Metrics
Utilization:
• CPU (user + system)
• memory
• i/o
• network traffic
Saturation
• throttling
• swap
Error
• Network Errors  
(receive vs transmit)

Container Events
• Starting / Stopping Containers
• Scaling Events for Underlying Instances
• Deploying a new container build

Pseudo-files
• Provide visibility into container metrics via the file system.
• Generally under:  
/cgroup/<resource>/docker/$CONTAINER_ID/  
or 
/sys/fs/cgroup/<resource>/docker/$CONTAINER_ID/

Pseudo-files: CPU Metrics
$ cat /sys/fs/cgroup/cpuacct/docker/$CONTAINER_ID/cpuacct.stat
> user 2451 # CPU time the container's processes spent in user mode (USER_HZ ticks)
> system 966 # CPU time the container's processes spent in system calls (USER_HZ ticks)

Pseudo-files: CPU Throttling
$ cat /sys/fs/cgroup/cpu/docker/$CONTAINER_ID/cpu.stat
> nr_periods 565 # Number of enforcement intervals that have elapsed
> nr_throttled 559 # Number of times the group has been throttled
> throttled_time 12119585961 # Total time that members of the group were throttled, in nanoseconds (12.12 seconds)
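
The cpu.stat counters above are easier to read as a ratio. A minimal sketch that parses the pseudo-file text and reports how often the group was throttled; the helper names are hypothetical, and the sample text is the output shown above rather than a live read:

```python
def parse_stat(text):
    """Parse 'key value' lines from a cgroup stat pseudo-file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            stats[parts[0]] = int(parts[1])
    return stats


def read_stat(path):
    """Read and parse a pseudo-file, e.g. a container's cpu.stat."""
    with open(path) as f:
        return parse_stat(f.read())


def throttling_summary(cpu_stat):
    """Return (share of periods throttled, total throttled seconds)."""
    periods = cpu_stat["nr_periods"]
    ratio = cpu_stat["nr_throttled"] / periods if periods else 0.0
    seconds = cpu_stat["throttled_time"] / 1e9  # throttled_time is in nanoseconds
    return ratio, seconds


# Sample text copied from the cpu.stat output above.
sample = """nr_periods 565
nr_throttled 559
throttled_time 12119585961
"""

cpu_stat = parse_stat(sample)
ratio, seconds = throttling_summary(cpu_stat)
print(f"throttled in {ratio:.1%} of periods, {seconds:.2f} s total")
```

On a Docker host, `read_stat` can be pointed at the cpu.stat path shown above for a given container ID.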

Docker API
• Detailed streaming metrics as JSON over an HTTP Unix socket
$ curl -v --unix-socket /var/run/docker.sock http://localhost/containers/28d7a95f468e/stats
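
Each JSON object streamed by the stats endpoint carries the current and previous CPU counters, so a usage percentage can be derived from a single sample. A minimal sketch of that calculation; the payload below is a trimmed, made-up example that keeps only the fields the function reads, and `cpu_percent` is a hypothetical helper name:

```python
import json


def cpu_percent(stats):
    """Estimate container CPU usage (%) from one Docker stats sample."""
    cpu = stats["cpu_stats"]
    pre = stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    # The first sample of a stream may lack a previous system counter.
    system_delta = cpu["system_cpu_usage"] - pre.get("system_cpu_usage", 0)
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0
    return cpu_delta / system_delta * cpu.get("online_cpus", 1) * 100.0


# Trimmed, illustrative payload (real responses contain many more fields).
sample = json.loads("""
{
  "cpu_stats": {
    "cpu_usage": {"total_usage": 400000000},
    "system_cpu_usage": 10000000000,
    "online_cpus": 2
  },
  "precpu_stats": {
    "cpu_usage": {"total_usage": 300000000},
    "system_cpu_usage": 9000000000
  }
}
""")

usage = cpu_percent(sample)
print(f"CPU usage: {usage:.1f}%")
```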

Service Discovery
[Diagram. Labels: Docker API, Kubernetes, Monitoring Agent, Container, Containers List & Metadata, Additional Metadata (Tags, etc.), Config Backends, Integration Configurations, Host Level Metrics]

Custom Metrics
• Instrument custom applications 
• You know your key transactions best. 
• Use async protocols like Etsy’s StatsD or DogStatsD
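
StatsD is asynchronous because the application writes a small UDP datagram and never waits for a reply. A minimal sketch of the wire format, with the `|#tag` suffix that DogStatsD adds for tags; the metric name, tags, and helper names are illustrative, and port 8125 is the conventional StatsD default:

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Build a StatsD datagram; the '|#tag:value' suffix is the DogStatsD extension."""
    payload = f"{name}:{value}|{metric_type}"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send: no connection, no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("utf-8"), (host, port))


# Count one completed checkout, tagged by app and version (illustrative names).
payload = format_metric("checkout.completed", 1, "c", ["app:web", "version:2"])
print(payload)
```

Calling `send_metric(payload)` would hand the datagram to an agent listening on the local host; if nothing is listening, the packet is simply dropped and the application is unaffected.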

Resources
Monitoring 101: Alerting  
https://www.datadoghq.com/blog/monitoring-101-alerting/
Monitoring 101: Collecting the Right Data
https://www.datadoghq.com/blog/monitoring-101-collecting-data/
Monitoring 101: Investigating performance issues
https://www.datadoghq.com/blog/monitoring-101-investigation/ 
The Power of Tagged Metrics
https://www.datadoghq.com/blog/the-docker-monitoring-problem/
How to Collect Docker Metrics
https://www.datadoghq.com/blog/how-to-collect-docker-metrics/
8 surprising facts about Docker Adoption
https://www.datadoghq.com/docker-adoption/
