DockerCon Europe 2018 Monitoring & Logging Workshop

Transparency into your workloads
Docker Monitoring & Logging

Agenda
- [ ] Introduction
- [ ] Operations Overview
- [ ] Logging Workshop
- [ ] Monitoring Workshop
- [ ] Best Practices & Recap

Brian Christner
Docker Captain
Site Reliability Engineer / Co-Founder, 56K.Cloud

Darragh Grealish
Site Reliability Engineer / Co-Founder, 56K.Cloud

Brandon Mitchell
Docker Captain / Solutions Architect, BoxBoat

Agenda
- [ X ] Introduction
- [ ] Operations Overview

Ops Paradise
• Everything is Automated
• Reduce Costs
• No support calls

Website Down?
• Total downtime: just under 4 minutes
• 502 error messages total: 12,000
• People affected by the 502 error who did not get their bargain: 400

Users Care About 3 Things
● Availability - Is my system online, yes or no?
● Latency - Does it take a long time to access application x, y, z?
● Reliability - Can the user rely on using the application?

Brain Based Tools
• We can track 8 objects on average
• 4 Moving Objects
• Build Dashboards & Tools accordingly

SRE (Site Reliability Engineering)
SRE treats Operations as if it were a Software Problem
“Hope is not a strategy.”
Traditional SRE saying
www.google.com/sre

4 Golden Signals
www.google.com/sre
Latency
Traffic
Errors
Saturation

R.E.D (Microservice level)
(Request) Rate: the number of requests per second your services are serving.
(Request) Errors: the number of failed requests per second.
(Request) Duration: the distribution of the amount of time each request takes.
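
A minimal sketch of how the three R.E.D values could be computed from a window of request records. The `Request` record shape and the `red_metrics` helper are hypothetical names chosen for illustration; in practice these numbers usually come from a metrics library rather than hand-rolled code.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    duration_s: float  # time taken to serve the request, in seconds
    failed: bool       # True if the request ended in an error


def red_metrics(requests, window_s):
    """Summarise a window of requests as Rate, Errors, Duration."""
    durations = sorted(r.duration_s for r in requests)
    failures = sum(1 for r in requests if r.failed)
    # Duration is a distribution, so report percentiles rather than a mean.
    if len(durations) >= 2:
        cuts = quantiles(durations, n=100, method="inclusive")
        p50, p95 = cuts[49], cuts[94]
    else:
        p50 = p95 = durations[0] if durations else 0.0
    return {
        "rate_per_s": len(requests) / window_s,
        "errors_per_s": failures / window_s,
        "duration_p50_s": p50,
        "duration_p95_s": p95,
    }
```

Reporting percentiles instead of an average keeps slow outliers visible, which is the point of treating Duration as a distribution.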

U.S.E (Low Level / Infrastructure)
For every resource, check Utilization, Saturation, and Errors
Resource: all physical server functional components (CPUs, disks, busses, ...)
Utilization: the average time that the resource was busy servicing work
Saturation: the degree to which the resource has extra work which it can't service, often queued
Errors: the count of error events
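
The U.S.E definitions above can be sketched as a small calculation over one sampling interval. The `ResourceSample` fields and the `use_metrics` helper are assumptions made for this example; real values would be read from the operating system or a collector.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """Counters for one resource over one interval (hypothetical shape)."""
    busy_s: float      # seconds the resource spent servicing work
    interval_s: float  # length of the sampling interval in seconds
    queued: int        # work items waiting that could not be serviced yet
    errors: int        # error events counted in the interval


def use_metrics(sample):
    """Return Utilization, Saturation and Errors for one resource."""
    return {
        # Fraction of the interval the resource was busy (0.0 to 1.0).
        "utilization": sample.busy_s / sample.interval_s,
        # Extra work the resource could not service, here the queue length.
        "saturation": sample.queued,
        # Plain count of error events.
        "errors": sample.errors,
    }
```

For example, a disk that was busy for 45 of 60 seconds with 3 requests queued reports a utilization of 0.75 and a saturation of 3.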

Black box vs. White box Monitoring
Black Box Monitoring: external app metrics (HTTP, Ping, etc.)
White Box Monitoring: internal app metrics (app metrics, requests, responses, process times)
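
As an illustration of black box monitoring, the sketch below checks a service purely from the outside with an HTTP request, using only the Python standard library. The `probe` function, its default timeout, and the rule that any status below 500 counts as "up" are assumptions for this example, not a recommendation from the workshop.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout_s=2.0):
    """Black-box check: request a URL from outside and time the answer."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but with an error code
    except (urllib.error.URLError, OSError):
        status = None      # no answer at all: unreachable or timed out
    latency_s = time.monotonic() - start
    return {
        "up": status is not None and status < 500,
        "status": status,
        "latency_s": latency_s,
    }
```

A white box check would instead read counters and timings that the application itself exposes, which an outside probe like this cannot see.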

Understanding Failure Modes
[Diagram: ENTERPRISE PLATFORM stack. Labels: Physical, Virtualization, Public Cloud; Operating Systems; Container Engine; Networking, Orchestration, Storage; Platform Security; Developer Services, Registry Services, Access Policies, App Lifecycle Management, Automation & Extensibility; Config Mgt, Monitoring, Logging, CI/CD, Images, Networking, Volumes, ...more]

Host / Hardware
[Diagram: ENTERPRISE PLATFORM stack with the Host / Hardware layer]
• CPU
• Memory
• Liveness
• File Descriptors
• Storage Capacity

Networking
[Diagram: ENTERPRISE PLATFORM stack with the Networking layer]
• Reachability
• Link Utilization
• File Descriptors
• Storage Capacity

Orchestration
[Diagram: ENTERPRISE PLATFORM stack with the Orchestration layer]
• State
• Deployment Rates
• Capacity
• Scheduling Events

Applications
[Diagram: ENTERPRISE PLATFORM stack with the Applications layer]
• Response Times
• Error Rates
• App Specific Metrics
• Availability

Agenda
- [ X ] Operations Overview

https://ee-labs.play-with-docker.com

Logging Challenges
● What should we log?
● Where and how long should we store logs?
● Analysis

Logging Tools
● docker logs
● docker-compose logs
● docker service logs
● log drivers
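
Docker collects whatever a container's main process writes to stdout and stderr, so an application only needs to log there for the tools above to pick it up. The sketch below shows one possible way to emit one JSON object per line with Python's standard `logging` module; the `JsonFormatter` class, the `make_logger` helper, and the chosen field names are illustrative assumptions.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def make_logger(name, stream=sys.stdout):
    """Build a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]  # replace handlers so lines are not duplicated
    logger.propagate = False     # keep records out of the root logger
    return logger
```

Calling `make_logger("shop").info("order placed")` inside a container writes a single JSON line to stdout. Structured lines like this tend to be easier to filter and analyse once they reach a central log store.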

Agenda
- [ X ] Logging Workshop

The Basis of Monitoring
● What's broken?
● Why is it broken?
● How long has it been broken?

Monitoring Tools
● docker stats
● docker top
● docker system df
● cAdvisor
● Prometheus

Docker Stats
• Live container resources
• All containers or single
• Very basic but useful info
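
A hedged sketch of reading `docker stats` from a script. It assumes the docker CLI is installed and that `docker stats --no-stream --format "{{json .}}"` prints one JSON object per container with `Name`, `CPUPerc`, and `MemPerc` fields holding percentage strings. The `parse_stats` and `snapshot` helpers are hypothetical names for this example.

```python
import json
import subprocess


def parse_stats(lines):
    """Turn JSON lines from `docker stats` into {name: (cpu %, mem %)}."""
    usage = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        row = json.loads(line)
        usage[row["Name"]] = (
            float(row["CPUPerc"].rstrip("%")),
            float(row["MemPerc"].rstrip("%")),
        )
    return usage


def snapshot():
    """Take one reading of all running containers (needs the docker CLI)."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_stats(out.splitlines())
```

Keeping the parsing separate from the CLI call means the parsing logic can be exercised without a running Docker daemon.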

Docker Top
● Display the running processes in a container

cAdvisor
• Developed by Google
• Real-time Data
• Clean UI
• Exposes Metrics
• Integrates well
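
The metrics cAdvisor exposes are served in the Prometheus text format, which is what lets it integrate with Prometheus. As a rough illustration of that format, the sketch below sums all samples of one metric from such a text payload. The `sum_metric` helper is hypothetical and deliberately simplified; a real setup would let Prometheus scrape the endpoint or use an existing parser library.

```python
def sum_metric(exposition_text, metric_name):
    """Sum every sample of one metric in Prometheus text-format output."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Labelled sample: name{label="value",...} value [timestamp]
            name = line[:line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # Unlabelled sample: name value [timestamp]
            name, _, rest = line.partition(" ")
        if name == metric_name:
            # The first field is the value; an optional timestamp may follow.
            total += float(rest.split()[0])
    return total
```

Summing a per-container memory gauge across all containers, for instance, gives a quick total for the host's container workload.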

Agenda
- [ X ] Monitoring Workshop

Best Practices
• Start small & increment
• Don’t over-alert yourself
• Set Resource Limits
• Aim for actionable information
• Run monitoring separate from the Workload
• Test for Failures
• Know your Failure Modes
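
One common way to avoid over-alerting is to fire only when a condition persists instead of on a single spike. The sketch below is a minimal illustration of that idea; the `should_alert` helper, its parameters, and the thresholds in the usage note are assumptions, not values from the workshop.

```python
def should_alert(samples, threshold, sustained):
    """Return True if the last `sustained` samples all exceed `threshold`."""
    if sustained <= 0 or len(samples) < sustained:
        return False  # not enough data to call the condition sustained
    return all(value > threshold for value in samples[-sustained:])
```

With a threshold of 0.9 and `sustained=2`, the CPU readings `[0.2, 0.95, 0.3]` do not alert because the spike recovered, while `[0.2, 0.95, 0.97]` do.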

Resources
- 56K.Cloud - https://56K.Cloud
- Prometheus - https://github.com/vegasbrianc/prometheus
- ELK - https://github.com/deviantony/docker-elk
- Labs - github.com/56kcloud/Training/tree/master/DockerCon
- Docker Resource Link - https://awesome-docker.netlify.com

Agenda
- [ X ] Monitoring Workshop
- [ X ] Best Practices & Recap

Thank You
Brian Christner
brian@56K.cloud
@idomyowntricks
