SlideShare a Scribd company logo
Transparency into your workloads
Docker Monitoring & Logging
Agenda
- [ ] Introduction
- [ ] Operations Overview
- [ ] Logging Workshop
- [ ] Monitoring Workshop
- [ ] Best Practices & Recap
Docker Captain
Site Reliability Engineer / Co-Founder
56K.Cloud
Brian Christner
Site Reliability Engineer / Co-Founder
56K.Cloud
Darragh Grealish
Docker Captain / Solutions Architect
BoxBoat
Brandon Mitchell
Agenda
- [ X ] Introduction
- [ ] Operations Overview
- [ ] Logging Workshop
- [ ] Monitoring Workshop
- [ ] Best Practices & Recap
Zugerberg, Zug, Switzerland
Ops Paradise
• Everything is Automated
• Reduce Costs
• No support calls
• Total Downtime: Just under 4
minutes
• 502 error messages total: 12 000
• People affected by the 502 error
who did not get their bargain: 400
Website Down?
Ops Firefighting
Operational Models
Users Care About 3 Things
● Availability - Is my system online yes/no
● Latency - Does it take a long time to access application x,y,z
● Reliability - Can the user rely on using the application
Brain Based Tools
• We can track 8 objects on average
• 4 Moving Objects
• Build Dashboards & Tools
accordingly
SRE
SRE is treats Operations as if it
were a Software Problem
“Hope is not a
strategy.”
Traditional SRE saying
SRE (Site Reliability Engineering)
www.google.com/sre
4 Golden Signals
www.google.com/sre
Latency
Traffic
Errors
Saturation
R.E.D (Microservice level)
(Request) Rate: the number of requests, per second, you
services are serving.
(Request) Errors: the number of failed requests per second.
Utilization: the average time that the resource was busy servicing
work
(Request) Duration: distributions of the amount of time each
request takes.
U.S.E (Low Level / Infrastructure)
For every resource, check Utilization, Saturation, and Errors
Resource: all physical server functional components (CPUs,
disks, busses, ...)
Utilization: the average time that the resource was busy
servicing work
Saturation: the degree to which the resource has extra work
which it can't service, often queued
Errors: the count of error events
Black box vs. White box
Monitoring
Black Box Monitoring White Box Monitoring
App metrics, requests,
responses, process times
HTTP, Ping, etc
External App Metrics Internal App Metrics
Healthchecks
Readiness / Liveness Probes
What to Monitor
Operating
Systems
Understanding Failure Modes
Config Mgt Monitoring LoggingCI/CD ..more..Images Networking Volumes
PhysicalVirtualizationPublic Cloud
Developer
Services
Registry
Services
Access
Policies
App Lifecycle
Management
Automation &
Extensibility
Networking Orchestration Storage
Container Engine
ENTERPRISE PLATFORM
Platform
Security
Operating
Systems
Host / Hardware
Config Mgt Monitoring LoggingCI/CD ..more..Images Networking Volumes
PhysicalVirtualizationPublic Cloud
Developer
Services
Registry
Services
Access
Policies
App Lifecycle
Management
Automation &
Extensibility
Networking Orchestration Storage
Container Engine
ENTERPRISE PLATFORM
Platform
Security
CPU
Memory
Liveness
File Descriptors
Storage Capacity
Operating
Systems
Networking
Config Mgt Monitoring LoggingCI/CD ..more..Images Networking Volumes
PhysicalVirtualizationPublic Cloud
Developer
Services
Registry
Services
Access
Policies
App Lifecycle
Management
Automation &
Extensibility
Networking Orchestration Storage
Container Engine
ENTERPRISE PLATFORM
Platform
Security
Reachability
Link Utilization
File Descriptors
Storage Capacity
Operating
Systems
Orchestration
Config Mgt Monitoring LoggingCI/CD ..more..Images Networking Volumes
PhysicalVirtualizationPublic Cloud
Developer
Services
Registry
Services
Access
Policies
App Lifecycle
Management
Automation &
Extensibility
Networking Orchestration Storage
Container Engine
ENTERPRISE PLATFORM
Platform
Security
State
Deployment Rates
Capacity
Scheduling Events
Operating
Systems
Applications
Config Mgt Monitoring LoggingCI/CD ..more..Images Networking Volumes
PhysicalVirtualizationPublic Cloud
Developer
Services
Registry
Services
Access
Policies
App Lifecycle
Management
Automation &
Extensibility
Networking Orchestration Storage
Container Engine
ENTERPRISE PLATFORM
Platform
Security
Response Times
Error Rates
App Specific Metrics
Availability
Improve incrementally
Agenda
- [ X ] Introduction
- [ X ] Operations Overview
- [ ] Logging Workshop
- [ ] Monitoring Workshop
- [ ] Best Practices & Recap
Logging Tools Overview
github.com/56kcloud/Training
https://ee-labs.play-with-docker.com
● What should we log?
● Where and how long should we store logs
● Analysis
Logging Challenges
● docker logs
● docker-compose logs
● docker service logs
● log drivers
Logging Tools
Logging Workshop
github.com/56kcloud/Training
Agenda
- [ X ] Introduction
- [ X ] Operations Overview
- [ X ] Logging Workshop
- [ ] Monitoring Workshop
- [ ] Best Practices & Recap
● What's Broken?
● Why is it Broken?
● How Long has it been broken?
The Basis of Monitoring
● Docker Stats
● Docker Top
● Docker df
● cAdvisor
● Prometheus
Monitoring Tools
• Live container resources
• All containers or single
• Very basic but useful info
Docker Stats
Docker Top
● Display running process in a container
• Developed by Google
• Real-time Data
• Clean UI
• Exposes Metrics
• Integrates well
PrometheusStack
Federated Configuration
github.com/56kcloud/Training
Agenda
- [ X ] Introduction
- [ X ] Operations Overview
- [ X ] Logging Workshop
- [ X ] Monitoring Workshop
- [ ] Best Practices & Recap
• Start small & increment
• Don’t Overlert yourself
• Set Resource Limits
• Aim for actionable Information
• Run separate from Workload
• Test for Failures
• Know your Failure Models
Best Practices
- 56K.Cloud - https://56K.Cloud
- Prometheus - https://github.com/vegasbrianc/prometheus
- ELK - https://github.com/deviantony/docker-elk
- Labs – github.com/56kcloud/Training/tree/master/DockerCon
- Docker Resource Link - https://awesome-docker.netlify.com
Resources
Agenda
- [ X ] Introduction
- [ X ] Operations Overview
- [ X ] Monitoring Workshop
- [ X ] Logging Workshop
- [ X ] Best Practices & Recap
Thank You
Brian Christner
brian@56K.cloud
@idomyowntricks
Transparency into your workloads
Docker Monitoring & Logging

More Related Content

What's hot

Mikhail Serkov - Zabbix for HPC Cluster Support | ZabConf2016
Mikhail Serkov - Zabbix for HPC Cluster Support | ZabConf2016Mikhail Serkov - Zabbix for HPC Cluster Support | ZabConf2016
Mikhail Serkov - Zabbix for HPC Cluster Support | ZabConf2016
Zabbix
 
Securing Databases with Dynamic Credentials and HashiCorp Vault
Securing Databases with Dynamic Credentials and HashiCorp VaultSecuring Databases with Dynamic Credentials and HashiCorp Vault
Securing Databases with Dynamic Credentials and HashiCorp Vault
Mitchell Pronschinske
 
Zabbix Conference LatAm 2016 - Rodrigo Mohr - Challenges on Large Env with Or...
Zabbix Conference LatAm 2016 - Rodrigo Mohr - Challenges on Large Env with Or...Zabbix Conference LatAm 2016 - Rodrigo Mohr - Challenges on Large Env with Or...
Zabbix Conference LatAm 2016 - Rodrigo Mohr - Challenges on Large Env with Or...
Zabbix
 
Nagios Conference 2011 - Jeff Sly - Case Study Nagios @ Nu Skin
Nagios Conference 2011 - Jeff Sly - Case Study Nagios @ Nu SkinNagios Conference 2011 - Jeff Sly - Case Study Nagios @ Nu Skin
Nagios Conference 2011 - Jeff Sly - Case Study Nagios @ Nu Skin
Nagios
 
Nagios XI Best Practices
Nagios XI Best PracticesNagios XI Best Practices
Nagios XI Best Practices
Nagios
 
Nagios Conference 2011 - Kimbrough Henley - Using Nagios To Monitor ServiceDesk
Nagios Conference 2011 - Kimbrough Henley - Using Nagios To Monitor ServiceDeskNagios Conference 2011 - Kimbrough Henley - Using Nagios To Monitor ServiceDesk
Nagios Conference 2011 - Kimbrough Henley - Using Nagios To Monitor ServiceDesk
Nagios
 
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Kyle Hailey
 
Nagios Conference 2011 - Nate Broderick - Nagios XI Large Implementation Tips...
Nagios Conference 2011 - Nate Broderick - Nagios XI Large Implementation Tips...Nagios Conference 2011 - Nate Broderick - Nagios XI Large Implementation Tips...
Nagios Conference 2011 - Nate Broderick - Nagios XI Large Implementation Tips...
Nagios
 
OpenStack Summit Vancouver: Lessons learned on upgrades
OpenStack Summit Vancouver:  Lessons learned on upgradesOpenStack Summit Vancouver:  Lessons learned on upgrades
OpenStack Summit Vancouver: Lessons learned on upgrades
Frédéric Lepied
 
Konstantin Yakovlev - Event Analysis Toolset | ZabConf2016
Konstantin Yakovlev - Event Analysis Toolset | ZabConf2016Konstantin Yakovlev - Event Analysis Toolset | ZabConf2016
Konstantin Yakovlev - Event Analysis Toolset | ZabConf2016
Zabbix
 
Structured Container Delivery by Oscar Renalias, Accenture
Structured Container Delivery by Oscar Renalias, AccentureStructured Container Delivery by Oscar Renalias, Accenture
Structured Container Delivery by Oscar Renalias, Accenture
Docker, Inc.
 
Release it! - Takeaways
Release it! - TakeawaysRelease it! - Takeaways
Release it! - Takeaways
Manuela Grindei
 
Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...
Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...
Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...
Docker, Inc.
 
Openstack summit 2015
Openstack summit 2015Openstack summit 2015
Openstack summit 2015
Andrew Yongjoon Kong
 
Nagios
NagiosNagios
Nagios
Eddy Mulyono
 
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ
Docker, Inc.
 
Google Cloud Platform monitoring with Zabbix
Google Cloud Platform monitoring with ZabbixGoogle Cloud Platform monitoring with Zabbix
Google Cloud Platform monitoring with Zabbix
Max Kuzkin
 
Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014
Uri Cohen
 
Accelerate Develoment with VIrtual Data
Accelerate Develoment with VIrtual DataAccelerate Develoment with VIrtual Data
Accelerate Develoment with VIrtual Data
Kyle Hailey
 
A map for DevOps on Microsoft Stack - MS DevSummit
A map for DevOps on Microsoft Stack - MS DevSummitA map for DevOps on Microsoft Stack - MS DevSummit
A map for DevOps on Microsoft Stack - MS DevSummit
Giulio Vian
 

What's hot (20)

Mikhail Serkov - Zabbix for HPC Cluster Support | ZabConf2016
Mikhail Serkov - Zabbix for HPC Cluster Support | ZabConf2016Mikhail Serkov - Zabbix for HPC Cluster Support | ZabConf2016
Mikhail Serkov - Zabbix for HPC Cluster Support | ZabConf2016
 
Securing Databases with Dynamic Credentials and HashiCorp Vault
Securing Databases with Dynamic Credentials and HashiCorp VaultSecuring Databases with Dynamic Credentials and HashiCorp Vault
Securing Databases with Dynamic Credentials and HashiCorp Vault
 
Zabbix Conference LatAm 2016 - Rodrigo Mohr - Challenges on Large Env with Or...
Zabbix Conference LatAm 2016 - Rodrigo Mohr - Challenges on Large Env with Or...Zabbix Conference LatAm 2016 - Rodrigo Mohr - Challenges on Large Env with Or...
Zabbix Conference LatAm 2016 - Rodrigo Mohr - Challenges on Large Env with Or...
 
Nagios Conference 2011 - Jeff Sly - Case Study Nagios @ Nu Skin
Nagios Conference 2011 - Jeff Sly - Case Study Nagios @ Nu SkinNagios Conference 2011 - Jeff Sly - Case Study Nagios @ Nu Skin
Nagios Conference 2011 - Jeff Sly - Case Study Nagios @ Nu Skin
 
Nagios XI Best Practices
Nagios XI Best PracticesNagios XI Best Practices
Nagios XI Best Practices
 
Nagios Conference 2011 - Kimbrough Henley - Using Nagios To Monitor ServiceDesk
Nagios Conference 2011 - Kimbrough Henley - Using Nagios To Monitor ServiceDeskNagios Conference 2011 - Kimbrough Henley - Using Nagios To Monitor ServiceDesk
Nagios Conference 2011 - Kimbrough Henley - Using Nagios To Monitor ServiceDesk
 
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [ CON3671]
 
Nagios Conference 2011 - Nate Broderick - Nagios XI Large Implementation Tips...
Nagios Conference 2011 - Nate Broderick - Nagios XI Large Implementation Tips...Nagios Conference 2011 - Nate Broderick - Nagios XI Large Implementation Tips...
Nagios Conference 2011 - Nate Broderick - Nagios XI Large Implementation Tips...
 
OpenStack Summit Vancouver: Lessons learned on upgrades
OpenStack Summit Vancouver:  Lessons learned on upgradesOpenStack Summit Vancouver:  Lessons learned on upgrades
OpenStack Summit Vancouver: Lessons learned on upgrades
 
Konstantin Yakovlev - Event Analysis Toolset | ZabConf2016
Konstantin Yakovlev - Event Analysis Toolset | ZabConf2016Konstantin Yakovlev - Event Analysis Toolset | ZabConf2016
Konstantin Yakovlev - Event Analysis Toolset | ZabConf2016
 
Structured Container Delivery by Oscar Renalias, Accenture
Structured Container Delivery by Oscar Renalias, AccentureStructured Container Delivery by Oscar Renalias, Accenture
Structured Container Delivery by Oscar Renalias, Accenture
 
Release it! - Takeaways
Release it! - TakeawaysRelease it! - Takeaways
Release it! - Takeaways
 
Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...
Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...
Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...
 
Openstack summit 2015
Openstack summit 2015Openstack summit 2015
Openstack summit 2015
 
Nagios
NagiosNagios
Nagios
 
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ
Everything You Need to Know About Docker and Storage by Ryan Wallner, ClusterHQ
 
Google Cloud Platform monitoring with Zabbix
Google Cloud Platform monitoring with ZabbixGoogle Cloud Platform monitoring with Zabbix
Google Cloud Platform monitoring with Zabbix
 
Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014
 
Accelerate Develoment with VIrtual Data
Accelerate Develoment with VIrtual DataAccelerate Develoment with VIrtual Data
Accelerate Develoment with VIrtual Data
 
A map for DevOps on Microsoft Stack - MS DevSummit
A map for DevOps on Microsoft Stack - MS DevSummitA map for DevOps on Microsoft Stack - MS DevSummit
A map for DevOps on Microsoft Stack - MS DevSummit
 

Similar to DockerCon Europe 2018 Monitoring & Logging Workshop

Monitor everything
Monitor everythingMonitor everything
Monitor everything
Brian Christner
 
How to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceHow to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experience
Docker, Inc.
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
rschuppe
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)
Brian Brazil
 
Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA
EDB
 
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...
gree_tech
 
Proactive ops for container orchestration environments
Proactive ops for container orchestration environmentsProactive ops for container orchestration environments
Proactive ops for container orchestration environments
Docker, Inc.
 
Early Software Development through Palladium Emulation
Early Software Development through Palladium EmulationEarly Software Development through Palladium Emulation
Early Software Development through Palladium Emulation
Raghav Nayak
 
SharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi Vončina
SPC Adriatics
 
(ATS6-PLAT07) Managing AEP in an enterprise environment
(ATS6-PLAT07) Managing AEP in an enterprise environment(ATS6-PLAT07) Managing AEP in an enterprise environment
(ATS6-PLAT07) Managing AEP in an enterprise environment
BIOVIA
 
Azure enterprise integration platform
Azure enterprise integration platformAzure enterprise integration platform
Azure enterprise integration platform
Michael Stephenson
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Redis Labs
 
Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)
CIVEL Benoit
 
Cerberus_Presentation1
Cerberus_Presentation1Cerberus_Presentation1
Cerberus_Presentation1CIVEL Benoit
 
DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.
Vlad Fedosov
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance Test
Rodolfo Kohn
 
Arm html5 presentation
Arm html5 presentationArm html5 presentation
Arm html5 presentationIan Renyard
 
Cognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksCognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & Tricks
Senturus
 
Building azure applications ireland
Building azure applications irelandBuilding azure applications ireland
Building azure applications ireland
Michael Meagher
 
Building Real World Applications using Windows Azure - Scott Guthrie, 2nd Dec...
Building Real World Applications using Windows Azure - Scott Guthrie, 2nd Dec...Building Real World Applications using Windows Azure - Scott Guthrie, 2nd Dec...
Building Real World Applications using Windows Azure - Scott Guthrie, 2nd Dec...
Vikas Sahni
 

Similar to DockerCon Europe 2018 Monitoring & Logging Workshop (20)

Monitor everything
Monitor everythingMonitor everything
Monitor everything
 
How to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experienceHow to accelerate docker adoption with a simple and powerful user experience
How to accelerate docker adoption with a simple and powerful user experience
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)
 
Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA
 
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...
Common Pitfalls of Functional Programming and How to Avoid Them: A Mobile Gam...
 
Proactive ops for container orchestration environments
Proactive ops for container orchestration environmentsProactive ops for container orchestration environments
Proactive ops for container orchestration environments
 
Early Software Development through Palladium Emulation
Early Software Development through Palladium EmulationEarly Software Development through Palladium Emulation
Early Software Development through Palladium Emulation
 
SharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi VončinaSharePoint 2013 Performance Analysis - Robi Vončina
SharePoint 2013 Performance Analysis - Robi Vončina
 
(ATS6-PLAT07) Managing AEP in an enterprise environment
(ATS6-PLAT07) Managing AEP in an enterprise environment(ATS6-PLAT07) Managing AEP in an enterprise environment
(ATS6-PLAT07) Managing AEP in an enterprise environment
 
Azure enterprise integration platform
Azure enterprise integration platformAzure enterprise integration platform
Azure enterprise integration platform
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)Cerberus : Framework for Manual and Automated Testing (Web Application)
Cerberus : Framework for Manual and Automated Testing (Web Application)
 
Cerberus_Presentation1
Cerberus_Presentation1Cerberus_Presentation1
Cerberus_Presentation1
 
DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.DevOps Fest 2020. immutable infrastructure as code. True story.
DevOps Fest 2020. immutable infrastructure as code. True story.
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance Test
 
Arm html5 presentation
Arm html5 presentationArm html5 presentation
Arm html5 presentation
 
Cognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & TricksCognos Performance Tuning Tips & Tricks
Cognos Performance Tuning Tips & Tricks
 
Building azure applications ireland
Building azure applications irelandBuilding azure applications ireland
Building azure applications ireland
 
Building Real World Applications using Windows Azure - Scott Guthrie, 2nd Dec...
Building Real World Applications using Windows Azure - Scott Guthrie, 2nd Dec...Building Real World Applications using Windows Azure - Scott Guthrie, 2nd Dec...
Building Real World Applications using Windows Azure - Scott Guthrie, 2nd Dec...
 

More from Brian Christner

56k.cloud intro and pitch deck
56k.cloud intro and pitch deck56k.cloud intro and pitch deck
56k.cloud intro and pitch deck
Brian Christner
 
Monitor Traefik with Prometheus
Monitor Traefik with PrometheusMonitor Traefik with Prometheus
Monitor Traefik with Prometheus
Brian Christner
 
56k.cloud training
56k.cloud training56k.cloud training
56k.cloud training
Brian Christner
 
Docker, DevOps, & IoT
Docker, DevOps, & IoTDocker, DevOps, & IoT
Docker, DevOps, & IoT
Brian Christner
 
Docker Switzelrand Meetup #18 DockerCon Recap
Docker Switzelrand Meetup #18 DockerCon RecapDocker Switzelrand Meetup #18 DockerCon Recap
Docker Switzelrand Meetup #18 DockerCon Recap
Brian Christner
 
Zero to Serverless in 60s - Anywhere
Zero to Serverless in 60s - AnywhereZero to Serverless in 60s - Anywhere
Zero to Serverless in 60s - Anywhere
Brian Christner
 
Experts Live CH Bern Docker & Kubernetes
Experts Live CH Bern Docker & KubernetesExperts Live CH Bern Docker & Kubernetes
Experts Live CH Bern Docker & Kubernetes
Brian Christner
 
56K.cloud Docker Training
56K.cloud Docker Training56K.cloud Docker Training
56K.cloud Docker Training
Brian Christner
 
Cloud Native & Docker
Cloud Native & DockerCloud Native & Docker
Cloud Native & Docker
Brian Christner
 
Cloud Native & Docker
Cloud Native & DockerCloud Native & Docker
Cloud Native & Docker
Brian Christner
 
Docker Serverless
Docker ServerlessDocker Serverless
Docker Serverless
Brian Christner
 
Lugano Tech Talks - Why Docker
Lugano Tech Talks - Why DockerLugano Tech Talks - Why Docker
Lugano Tech Talks - Why Docker
Brian Christner
 
Docker - Build, Ship and Run Any App, Anywhere Hollywood edition
Docker - Build, Ship and Run Any App, Anywhere Hollywood editionDocker - Build, Ship and Run Any App, Anywhere Hollywood edition
Docker - Build, Ship and Run Any App, Anywhere Hollywood edition
Brian Christner
 
Monitoring mayhem - Using Prometheus
Monitoring mayhem - Using PrometheusMonitoring mayhem - Using Prometheus
Monitoring mayhem - Using Prometheus
Brian Christner
 
Docker Swarm 1.12 Overview and Demo
Docker Swarm 1.12 Overview and DemoDocker Swarm 1.12 Overview and Demo
Docker Swarm 1.12 Overview and Demo
Brian Christner
 
2015 DockeCon monitoring presentation
2015 DockeCon monitoring presentation2015 DockeCon monitoring presentation
2015 DockeCon monitoring presentation
Brian Christner
 

More from Brian Christner (16)

56k.cloud intro and pitch deck
56k.cloud intro and pitch deck56k.cloud intro and pitch deck
56k.cloud intro and pitch deck
 
Monitor Traefik with Prometheus
Monitor Traefik with PrometheusMonitor Traefik with Prometheus
Monitor Traefik with Prometheus
 
56k.cloud training
56k.cloud training56k.cloud training
56k.cloud training
 
Docker, DevOps, & IoT
Docker, DevOps, & IoTDocker, DevOps, & IoT
Docker, DevOps, & IoT
 
Docker Switzelrand Meetup #18 DockerCon Recap
Docker Switzelrand Meetup #18 DockerCon RecapDocker Switzelrand Meetup #18 DockerCon Recap
Docker Switzelrand Meetup #18 DockerCon Recap
 
Zero to Serverless in 60s - Anywhere
Zero to Serverless in 60s - AnywhereZero to Serverless in 60s - Anywhere
Zero to Serverless in 60s - Anywhere
 
Experts Live CH Bern Docker & Kubernetes
Experts Live CH Bern Docker & KubernetesExperts Live CH Bern Docker & Kubernetes
Experts Live CH Bern Docker & Kubernetes
 
56K.cloud Docker Training
56K.cloud Docker Training56K.cloud Docker Training
56K.cloud Docker Training
 
Cloud Native & Docker
Cloud Native & DockerCloud Native & Docker
Cloud Native & Docker
 
Cloud Native & Docker
Cloud Native & DockerCloud Native & Docker
Cloud Native & Docker
 
Docker Serverless
Docker ServerlessDocker Serverless
Docker Serverless
 
Lugano Tech Talks - Why Docker
Lugano Tech Talks - Why DockerLugano Tech Talks - Why Docker
Lugano Tech Talks - Why Docker
 
Docker - Build, Ship and Run Any App, Anywhere Hollywood edition
Docker - Build, Ship and Run Any App, Anywhere Hollywood editionDocker - Build, Ship and Run Any App, Anywhere Hollywood edition
Docker - Build, Ship and Run Any App, Anywhere Hollywood edition
 
Monitoring mayhem - Using Prometheus
Monitoring mayhem - Using PrometheusMonitoring mayhem - Using Prometheus
Monitoring mayhem - Using Prometheus
 
Docker Swarm 1.12 Overview and Demo
Docker Swarm 1.12 Overview and DemoDocker Swarm 1.12 Overview and Demo
Docker Swarm 1.12 Overview and Demo
 
2015 DockeCon monitoring presentation
2015 DockeCon monitoring presentation2015 DockeCon monitoring presentation
2015 DockeCon monitoring presentation
 

Recently uploaded

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 

Recently uploaded (20)

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 

DockerCon Europe 2018 Monitoring & Logging Workshop

  • 1. Transparency into your workloads Docker Monitoring & Logging
  • 2. Agenda - [ ] Introduction - [ ] Operations Overview - [ ] Logging Workshop - [ ] Monitoring Workshop - [ ] Best Practices & Recap
  • 3. Docker Captain Site Reliability Engineer / Co-Founder 56K.Cloud Brian Christner
  • 4. Site Reliability Engineer / Co-Founder 56K.Cloud Darragh Grealish Docker Captain / Solutions Architect BoxBoat Brandon Mitchell
  • 5.
  • 6. Agenda - [ X ] Introduction - [ ] Operations Overview - [ ] Logging Workshop - [ ] Monitoring Workshop - [ ] Best Practices & Recap
  • 8.
  • 9.
  • 10. Ops Paradise • Everything is Automated • Reduce Costs • No support calls
  • 11. • Total Downtime: Just under 4 minutes • 502 error messages total: 12 000 • People affected by the 502 error who did not get their bargain: 400 Website Down?
  • 14. Users Care About 3 Things ● Availability - Is my system online yes/no ● Latency - Does it take a long time to access application x,y,z ● Reliability - Can the user rely on using the application
  • 15. Brain Based Tools • We can track 8 objects on average • 4 Moving Objects • Build Dashboards & Tools accordingly
  • 16. SRE
  • 17. SRE is treats Operations as if it were a Software Problem “Hope is not a strategy.” Traditional SRE saying SRE (Site Reliability Engineering) www.google.com/sre
  • 19. R.E.D (Microservice level) (Request) Rate: the number of requests, per second, you services are serving. (Request) Errors: the number of failed requests per second. Utilization: the average time that the resource was busy servicing work (Request) Duration: distributions of the amount of time each request takes.
  • 20. U.S.E (Low Level / Infrastructure) For every resource, check Utilization, Saturation, and Errors Resource: all physical server functional components (CPUs, disks, busses, ...) Utilization: the average time that the resource was busy servicing work Saturation: the degree to which the resource has extra work which it can't service, often queued Errors: the count of error events
  • 21. Black box vs. White box Monitoring Black Box Monitoring White Box Monitoring App metrics, requests, responses, process times HTTP, Ping, etc External App Metrics Internal App Metrics
  • 25. Operating Systems Understanding Failure Modes Config Mgt Monitoring LoggingCI/CD ..more..Images Networking Volumes PhysicalVirtualizationPublic Cloud Developer Services Registry Services Access Policies App Lifecycle Management Automation & Extensibility Networking Orchestration Storage Container Engine ENTERPRISE PLATFORM Platform Security
  • 26. Operating Systems Host / Hardware Config Mgt Monitoring LoggingCI/CD ..more..Images Networking Volumes PhysicalVirtualizationPublic Cloud Developer Services Registry Services Access Policies App Lifecycle Management Automation & Extensibility Networking Orchestration Storage Container Engine ENTERPRISE PLATFORM Platform Security CPU Memory Liveness File Descriptors Storage Capacity
  • 27. Operating Systems Networking Config Mgt Monitoring LoggingCI/CD ..more..Images Networking Volumes PhysicalVirtualizationPublic Cloud Developer Services Registry Services Access Policies App Lifecycle Management Automation & Extensibility Networking Orchestration Storage Container Engine ENTERPRISE PLATFORM Platform Security Reachability Link Utilization File Descriptors Storage Capacity
  • 28. Operating Systems Orchestration Config Mgt Monitoring LoggingCI/CD ..more..Images Networking Volumes PhysicalVirtualizationPublic Cloud Developer Services Registry Services Access Policies App Lifecycle Management Automation & Extensibility Networking Orchestration Storage Container Engine ENTERPRISE PLATFORM Platform Security State Deployment Rates Capacity Scheduling Events
  • 29. Operating Systems Applications Config Mgt Monitoring LoggingCI/CD ..more..Images Networking Volumes PhysicalVirtualizationPublic Cloud Developer Services Registry Services Access Policies App Lifecycle Management Automation & Extensibility Networking Orchestration Storage Container Engine ENTERPRISE PLATFORM Platform Security Response Times Error Rates App Specific Metrics Availability
  • 31.
  • 32. Agenda - [ X ] Introduction - [ X ] Operations Overview - [ ] Logging Workshop - [ ] Monitoring Workshop - [ ] Best Practices & Recap
  • 36. ● What should we log? ● Where and how long should we store logs ● Analysis Logging Challenges
  • 37. ● docker logs ● docker-compose logs ● docker service logs ● log drivers Logging Tools
  • 40. Agenda - [ X ] Introduction - [ X ] Operations Overview - [ X ] Logging Workshop - [ ] Monitoring Workshop - [ ] Best Practices & Recap
  • 41. ● What's Broken? ● Why is it Broken? ● How Long has it been broken? The Basis of Monitoring
  • 42. ● Docker Stats ● Docker Top ● Docker df ● cAdvisor ● Prometheus Monitoring Tools
  • 43. • Live container resources • All containers or single • Very basic but useful info Docker Stats
  • 44. Docker Top ● Display running process in a container
  • 45. • Developed by Google • Real-time Data • Clean UI • Exposes Metrics • Integrates well
  • 46.
  • 50. Agenda - [ X ] Introduction - [ X ] Operations Overview - [ X ] Logging Workshop - [ X ] Monitoring Workshop - [ ] Best Practices & Recap
  • 51. • Start small & increment • Don’t Overlert yourself • Set Resource Limits • Aim for actionable Information • Run separate from Workload • Test for Failures • Know your Failure Models Best Practices
  • 52. - 56K.Cloud - https://56K.Cloud - Prometheus - https://github.com/vegasbrianc/prometheus - ELK - https://github.com/deviantony/docker-elk - Labs – github.com/56kcloud/Training/tree/master/DockerCon - Docker Resource Link - https://awesome-docker.netlify.com Resources
  • 53. Agenda - [ X ] Introduction - [ X ] Operations Overview - [ X ] Monitoring Workshop - [ X ] Logging Workshop - [ X ] Best Practices & Recap
  • 55. Transparency into your workloads Docker Monitoring & Logging