Monitoring and Log Management for Docker Swarm and Kubernetes
Stefan Thies Sematext Group, Inc.
Sematext & I
Logsene
SPM
logs
metrics
Docker Agent
#nodejs
Agenda
• What is
• Centralized Log Management + Performance Monitoring
• Kubernetes / Swarm
• Container Logs
• Container Metrics
• Example: Swarm3k Monitoring
• Summary
Centralized Log Management
Logagent
Centralized Monitoring
Expose
Metrics
Collect
Metrics
Ship Metrics
Store
Metrics
Aggregate
Metrics
Visualize
Metrics
• Correlation
with Logs
Anomaly
Detection
Alerting
Server +
App / Container
Configuration
Monitoring Agents Time Series
Database
Dashboard Tools,
Alerting Tools,
ChatOps Tools
https://sematext.com/blog/2016/07/19/open-source-docker-monitoring-logging/
Orchestration
Container
POD
Node Node 1
POD 1
Namespace
ns1
Kibana Elasticsearch
POD 2
Namespace
ns2
Redis
Services (proxy)
Replication
Controllers
DaemonSets
3
HorizontalPod
Autoscaler
Kubernetes Dashboard / Heapster
• Current status
• Shows basic resource usage for workloads (Pod)
• Simple logs view
• Heapster is required for autoscaling features
Orchestration
Container
Stacks
Nodes Node 1
ELK
(compose,
app bundle)
Kibana 1 Elasticsearch 1
Redis
(service)
redis1
3
Node 2
ELK
Elasticsearch
2
Elasticsearch
3
Kubernetes != Swarm
• Common base is Docker
• Docker Logs & Metrics
• Docker API
Container Logs
Docker Logging Drivers
Docker
json-file (default): Files
journald (CoreOS): System journal
Syslog: TCP, UDP
Fluentd: TCP
$plunk: TCP
Gelf
Centralized
Log Management
Local Log Shipper
Docker logs
Containers (should) log to stdout/stderr !!!
docker logs container_id
docker logs container_name
Docker
API
Docker
client
Container
logs
Fun with Docker logging drivers
$ docker run --log-driver=syslog \
    --log-opt syslog-address=udp://$HOSTNAME:514 \
    --log-opt tag="{{.ImageName}}#{{.Name}}#{{.ID}}" \
    -p 9003:80 --name nginx1 -d nginx
$ docker logs nginx1
"logs" command is supported only for "json-file"
and "journald" logging drivers (got: syslog)
Add
Context!
More fun with TCP logging drivers!
docker run --log-driver=syslog \
    --log-opt syslog-address=tcp://127.0.0.1:514 \
    --log-opt tag="{{.ImageName}}#{{.Name}}#{{.ID}}" \
    -p 9004:80 -d nginx
docker: Error response from daemon: Failed to initialize logging driver: dial tcp 127.0.0.1:514: getsockopt: connection refused.
Fix it – run syslog server first!
docker run -d --name syslog -p 514:514 factorish/syslog -t tcp
docker run --log-driver=syslog … nginx
curl localhost:9004
docker logs syslog
==> syslog listening on tcp
<30>Nov 17 18:23:43 nginx#nginx1#afebdfff0eed[1710]: 172.17.0.1 - - [17/Nov/2016:18:23:43 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.49.1" "-"
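The tag template used in these examples joins image name, container name and container ID with "#", so a receiver can split the context back out of each syslog line. A minimal sketch of that split in plain shell; the helper names parse_docker_tag and tag_from_syslog_line are hypothetical, and the field position assumes a line shaped like the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: split a tag of the form image#name#id into its parts.
parse_docker_tag() {
    tag=$1
    image=${tag%%#*}    # everything before the first '#'
    rest=${tag#*#}      # everything after the first '#'
    name=${rest%%#*}    # between the first and second '#'
    id=${rest#*#}       # everything after the second '#'
    printf 'image=%s name=%s id=%s\n' "$image" "$name" "$id"
}

# Hypothetical helper: pull the tag out of a syslog line shaped like the
# sample above, where the tag is the 4th whitespace-separated field and is
# followed by "[pid]:".
tag_from_syslog_line() {
    field=$(printf '%s\n' "$1" | awk '{print $4}')
    printf '%s\n' "${field%%\[*}"
}

line='<30>Nov 17 18:23:43 nginx#nginx1#afebdfff0eed[1710]: 172.17.0.1 - - "GET / HTTP/1.1" 200 612'
parse_docker_tag "$(tag_from_syslog_line "$line")"
```

For the sample line this prints image=nginx name=nginx1 id=afebdfff0eed. A different syslog header layout would need a different field position.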
Is UDP
better?
Alternatives?
Docker
Log files
json-file or
journald
API
Agent
Remote
Log Storage
Disk
Buffer
Docker API provides the most complete information!
Reliable networks and backend services? Better buffer & retransmit in case of failure!
Attach metadata to logs/metrics or route data to different servers or indices?
“docker logs” works & logs are stored on local disk!
Centralize search, analytics, alerts, access permissions
Parse logs
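The "buffer & retransmit" idea can be sketched in a few lines of shell: when shipping a log line fails, append it to a file on disk and retry later. This is only an illustration under stated assumptions, not how any particular agent implements it. SEND_CMD, BUFFER_FILE and the function names are hypothetical; SEND_CMD stands in for any shipper that reads one line on stdin and exits non-zero on failure.

```shell
#!/bin/sh
# Hypothetical settings: where to buffer, and which command ships a line.
BUFFER_FILE=${BUFFER_FILE:-/tmp/log-buffer.txt}
SEND_CMD=${SEND_CMD:-false}   # default "false" simulates an unreachable backend

# Try to ship one line; the exit status of SEND_CMD decides success.
send_line() {
    printf '%s\n' "$1" | $SEND_CMD >/dev/null 2>&1
}

# Ship a line, or append it to the disk buffer if shipping fails.
ship_or_buffer() {
    if ! send_line "$1"; then
        printf '%s\n' "$1" >> "$BUFFER_FILE"
    fi
}

# Retry everything in the buffer; lines that fail again are re-buffered.
flush_buffer() {
    [ -s "$BUFFER_FILE" ] || return 0
    pending="$BUFFER_FILE.pending"
    mv "$BUFFER_FILE" "$pending"
    while IFS= read -r line; do
        ship_or_buffer "$line"
    done < "$pending"
    rm -f "$pending"
}
```

Calling ship_or_buffer for each new line and flush_buffer on a timer gives at-least-once delivery as long as the disk survives. A real shipper would also cap the buffer size.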
Automatic tagging of logs, metrics, events
• Automatic tagging of log / metrics with
  • Docker
    • Container Name / ID
    • Image Name / ID
    • Labels / Environment
    • Hostname / IP
  • Kubernetes
    • Namespace, Pod Name, UID
  • Swarm
    • Swarm Service Name, ID, Compose Project, Container # (scale)
• Single collector for logs, metrics, events, metadata
• Base for correlation and visualisation
Container Metrics Collection
Collection
Metric collection via Docker API
Smart monitoring agent - all in one
Docker
API
Agent
Remote
Storage
Disk
Buffer
Docker API provides Labels, Metrics, Logs, Events …
Reliable networks and backend services? Better buffer & retransmit in case of failure!
Auto-tagging using container labels.
Discovery of services
Centralize logs, metrics, analytics, alerts, access permissions
Metrics,
Logs, Events
Integrate application monitoring in the stack
- Custom images
- add/remove app with all req. options
- Start monitoring, reading config from etcd
App
Config to expose
metrics
App Monitor
Configured for App
Container
Service Discovery
etcd
consul
Auto Discovery via Docker API and Labels?
App Container
config to expose
metrics
App Monitor
Docker Monitor
run
discovery
Docker
Automatic
run
Key Container Metrics
Node Storage
• Good kids clean up their rooms. Good Docker ops clean up their disks by removing unused containers & images.
Number of containers per host
• Verify deployment strategies
CPU quota per container
Container memory and OOM counter
Docker Events
Swarm Task Status
Limit container resources for your apps!
• Set CPU quotas: --cpu-quota=6000
• Limit memory and configure the app in the container to the same limit: -m 512mb
• Disable swap: set --memory-swap to the same value as -m (--memory-swap=-1 allows unlimited swap)
• To keep a Docker container from eating all your disk IO, use e.g. --device-write-bps /dev/sda:1mb
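A small sketch of how the memory and CPU limits combine on one command line. The helper name limit_flags is hypothetical; the flags are standard docker run options. Per the Docker documentation, setting --memory-swap to the same value as --memory leaves the container without swap, while -1 permits unlimited swap. The command is only printed, so the sketch runs without a Docker daemon.

```shell
#!/bin/sh
# Hypothetical helper: build resource-limit flags for `docker run`.
#   $1 = memory limit (e.g. 512m)
#   $2 = CPU quota in microseconds per scheduling period
#        (the default period is 100000, so 6000 is roughly 6% of one CPU)
limit_flags() {
    mem=$1
    quota=$2
    # --memory-swap equal to --memory means the container gets no swap.
    printf '%s %s %s' "--memory=$mem" "--memory-swap=$mem" "--cpu-quota=$quota"
}

# Dry run: print the command instead of executing it.
echo docker run -d $(limit_flags 512m 6000) nginx
```

The application inside the container still needs its own matching setting (heap size, cache size, worker count), otherwise it may be killed by the OOM killer before it notices the limit.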
Automatic Deployment of monitoring agents
• One command to run a service on each node joining the cluster
• Kubernetes:
  • DaemonSet creates a pod per node
    kubectl create -f sematext-agent.yml
• Swarm:
  • Global Service
    docker service create --mode global ...
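For illustration, a DaemonSet manifest for an agent that reads the Docker socket could look roughly like the one written by this script. This is a sketch, not the sematext-agent.yml from the talk: the image name, labels, names and output path are placeholders, and the manifest uses the current apps/v1 API.

```shell
#!/bin/sh
# Write a minimal DaemonSet manifest: one agent pod per node, with the
# Docker socket mounted so the agent can query the Docker API.
# MANIFEST, the image name and all labels are hypothetical placeholders.
MANIFEST=${MANIFEST:-${TMPDIR:-/tmp}/agent-daemonset.yml}
cat > "$MANIFEST" <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-agent
spec:
  selector:
    matchLabels:
      app: monitoring-agent
  template:
    metadata:
      labels:
        app: monitoring-agent
    spec:
      containers:
      - name: agent
        image: example/monitoring-agent:latest   # placeholder image
        volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock
      volumes:
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
EOF
echo "wrote $MANIFEST; apply with: kubectl create -f $MANIFEST"
```

On Swarm the same effect comes from a global service, where the socket would be passed with a bind mount (--mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock).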
Swarm3k Monitoring
Swarm3k Requirements
• Monitoring
• Host metrics
• Container metrics
• Docker Events
• Task Monitoring
• Collect Container Logs: Task Errors only
• 3000+ Nodes (actual: 4.7k)
• 150.000 containers (actual: 60k)
• Duration 8 hours – 28 GB data collected
• Public/shared Dashboard for the community
Pre-flight test with 500 nodes
• 60.000 containers deployed in less than 5 minutes!
Swarm3k in one picture
Limits in visualisation
Missing Labels to
group hosts or
containers
Summary
• Setup of Monitoring & Logging is complex in dynamic environments
• Kubernetes != Swarm (yet). Common base: Docker Containers
• Smart Agents to collect, analyze, aggregate metrics, events and logs
• Auto discovery of containers for data collection
• Use metadata to tag metrics & logs as a base for correlation and visualization
• Integrate monitoring in application stacks for app specific metrics
• Auto Discovery of services and automatic configuration for application level monitoring
We are engineers!
We develop DevOps tools!
We are DevOps people!
We do fun stuff ;)
http://sematext.com/jobs
Thank you for listening! Get in touch!
Stefan
stefan.thies@sematext.com
@seti321
http://sematext.com
@sematext http://sematext.com/jobs
Come talk to us
at the booth

DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and Kubernetes

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

DOD 2016 - Stefan Thies - Monitoring and Log Management for Docker Swarm and Kubernetes

  • 1. Monitoring and Log Management for Docker Swarm and Kubernetes Stefan Thies Sematext Group, Inc.
  • 3. Agenda • What is • Centralized Log Management + Performance Monitoring • Kubernetes / Swarm • Container Logs • Container Metrics • Example: Swarm3k Monitoring • Summary
  • 5. Centralized Monitoring Expose Metrics Collect Metrics Ship Metrics Store Metrics Aggregate Metrics Visualize Metrics • Correlation with Logs Anomaly Detection Alerting Server + App / Container Configuration Monitoring Agents Time Series Database Dashboard Tools, Alerting Tools, ChatOps Tools
  • 7. Orchestration Container POD Node Node 1 POD 1 Namespace ns1 Kibana Elasticsearch POD 2 Namespace ns2 Redis Services (proxy) Replication Controllers DaemonSets 3 HorizontalPod Autoscaler
  • 8. Kubernetes Dashboard / Heapster • Current status • Shows basic resource usage for workloads (Pod) • Simple logs view • Heapster is required for autoscaling features
  • 9. Orchestration Container Stacks Nodes Node 1 ELK (compose, app bundle) Kibana 1 Elasticsearch 1 Redis (service) redis1 3 Node 2 ELK Elasticsearch 2 Elasticsearch 3
  • 10. Kubernetes != Swarm • Common base is Docker • Docker Logs & Metrics • Docker API
  • 12. Docker Logging Drivers Docker json-file (default) Files journald (CoreOS) System journal Syslog TCP UDP Fluentd TCP $plunk TCP Gelf Centralized Log Management Local Log Shipper
  • 13. Docker logs Containers (should) log to stdout/stderr !!! docker logs container_id docker logs container_name Docker API Docker client Container logs
  • 14. Fun with Docker logging drivers $ docker run --log-driver=syslog --log-opt syslog-address=udp://$HOSTNAME:514 --log-opt tag="{{.ImageName}}#{{.Name}}#{{.ID}}" -p 9003:80 --name nginx1 -d nginx $ docker logs nginx1 "logs" command is supported only for "json-file" and "journald" logging drivers (got: syslog) Add Context!
  • 15. More fun with TCP logging drivers! docker run --log-driver=syslog --log-opt syslog-address=tcp://127.0.0.1:514 --log-opt tag="{{.ImageName}}#{{.Name}}#{{.ID}}" -p 9004:80 -d nginx docker: Error response from daemon: Failed to initialize logging driver: dial tcp 127.0.0.1:514: getsockopt: connection refused.
  • 16. Fix it: run syslog server first! docker run -d -p 514:514 factorish/syslog -t tcp docker run --log-driver=syslog … nginx curl localhost:9004 docker logs syslog ==> syslog listening on tcp <30>Nov 17 18:23:43 nginx#nginx1#afebdfff0eed[1710]: 172.17.0.1 - - [17/Nov/2016:18:23:43 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.49.1" "-"
  • 18. Alternatives? Docker Log files json-file or journald API Agent Remote Log Storage Disk Buffer Docker API provides the most complete information! Reliable networks and backend services? Better buffer & retransmit in case of failure! Attach metadata to logs/metrics or route data to different servers or indices? “docker logs” works & logs are stored on local disk! Centralize search, analytics, alerts, access permissions Parse logs
  • 19. Automatic tagging of logs, metrics, events • Automatic tagging of log / metrics with • Docker • Container Name / ID • Image Name / ID • Labels / Environment • Hostname / IP • Kubernetes • Namespace, Pod Name , UID • Swarm • Swarm Service Name , ID, Compose Project, Container # (scale) • Single collector for logs, metrics, events, metadata • Base for correlation and visualisation
  • 22. Metric collection via Docker API
  • 23. Smart monitoring agent - all in one Docker API Agent Remote Storage Disk Buffer Docker API provides Labels, Metrics, Logs, Events … Reliable networks and backend services? Better buffer & retransmit in case of failure! Auto-tagging using container labels. Discovery of services Centralize logs, metrics, analytics, alerts, access permissions Metrics, Logs, Events
  • 24. Integrate application monitoring in the stack - Custom images - add/remove app with all req. options - Start monitoring, reading config from etcd App Config to expose metrics App Monitor Configured for App Container Service Discovery etcd consul
  • 25. Auto Discovery via Docker API and Labels? App Container config to expose metrics App Monitor Docker Monitor run discovery Docker Automatic run
  • 27. Node Storage • Good kids clean up their rooms. Good Docker ops clean up their disks by removing unused containers & images.
  • 28. Number of containers per host • Verify deployment strategies
  • 29. CPU quota per container
  • 30. Container memory and OOM counter
  • 33. Limit container resources for your apps! • Set CPU quotas --cpu-quota=6000 • Limit memory and configure the app in the container to the same limits! -m 512mb • Disable swap: set --memory-swap to the same value as -m (--memory-swap=-1 allows unlimited swap) • To keep a Docker container from eating all your disk IO use e.g. --device-write-bps /dev/sda:1mb
  • 34. Automatic Deployment of monitoring agents • One command to run a service on each node joining the cluster • Kubernetes: • DaemonSet creates a pod per node kubectl create -f sematext-agent.yml • Swarm: • Global Service docker service create --mode global ...
  • 36. Swarm3k Requirements • Monitoring • Host metrics • Container metrics • Docker Events • Task Monitoring • Collect Container Logs: Task Errors only • 3000+ Nodes (actual: 4.7k) • 150.000 containers (actual: 60k) • Duration 8 hours, 28 GB data collected • Public/shared Dashboard for the community
  • 37. Pre-flight test with 500 nodes • 60.000 containers deployed in less than 5 minutes!
  • 38. Swarm3k in one picture
  • 39. Limits in visualisation Missing Labels to group hosts or containers
  • 40. Summary • Setup of Monitoring & Logging is complex in dynamic environments • Kubernetes != Swarm (yet). Common base: Docker Containers • Smart Agents to collect, analyze, aggregate metrics, events and logs • Auto discovery of containers for data collection • Use metadata to tag metrics & logs as a base for correlation and visualization • Integrate monitoring in application stacks for app-specific metrics • Auto Discovery of services and automatic configuration for application-level monitoring
  • 41. We are engineers! We develop DevOps tools! We are DevOps people! We do fun stuff ;) http://sematext.com/jobs
  • 42. Thank you for listening! Get in touch! Stefan stefan.thies@sematext.com @seti321 http://sematext.com @sematext http://sematext.com/jobs Come talk to us at the booth
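
Slides 14 to 16 attach context to each syslog line with the tag `{{.ImageName}}#{{.Name}}#{{.ID}}`. On the receiving side that tag can be split back into fields. The following is a minimal sketch in POSIX shell, assuming the line has the shape shown on slide 16 (`<PRI>date TAG[PID]: message`); the function name `parse_docker_tag` is hypothetical and not part of Docker or any Sematext tool.

```shell
# Hypothetical helper: extract image, container name and container ID
# from a syslog line tagged with {{.ImageName}}#{{.Name}}#{{.ID}}.
parse_docker_tag() {
  _line=$1
  _header=${_line%%]: *}   # cut at the first "]: " that ends "TAG[PID]: "
  _tag=${_header##* }      # last space-separated word: image#name#id[pid
  _tag=${_tag%%\[*}        # drop the trailing "[pid"
  case $_tag in
    *#*#*) ;;              # need at least two '#' separators
    *) return 1 ;;         # not a line carrying the assumed tag format
  esac
  _rest=${_tag#*#}         # name#id
  printf 'image=%s name=%s id=%s\n' "${_tag%%#*}" "${_rest%%#*}" "${_rest#*#}"
}

# Usage with the log line from slide 16:
parse_docker_tag '<30>Nov 17 18:23:43 nginx#nginx1#afebdfff0eed[1710]: 172.17.0.1 - - [17/Nov/2016:18:23:43 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.49.1" "-"'
# prints: image=nginx name=nginx1 id=afebdfff0eed
```

The sketch relies on `#` not occurring inside image or container names, which is why the deck's tag format can use it as a separator.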

Editor's Notes

  1. Use json-file or journald log drivers. In the worst case your logs can be found on the Docker host! No connection issues with TCP and no dependency of containers on a running logging service (everything can break ...). And UDP? No dependency on startup, however UDP packets could be dropped and logs would be lost. Use a log shipper. Docker API based: Logspout, Sematext Docker Agent. File based: Rsyslog, Syslog-ng, Fluentd, Logstash, Logagent, ...
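
The deck's advice to "buffer & retransmit in case of failure" can be illustrated with a very small disk buffer. This is only a sketch under stated assumptions, not how Logagent or the Sematext Docker Agent is implemented: the function name `buffer_and_ship` is hypothetical, the shipping command is whatever reads log lines on stdin and exits non-zero on failure, and concurrent writers are not handled.

```shell
# Hypothetical sketch: persist incoming log lines on disk first, then try
# to ship the whole buffer. The buffer is kept if shipping fails.
# $1 = buffer file, remaining arguments = command that ships stdin.
buffer_and_ship() {
  _buf=$1
  shift
  cat >> "$_buf"            # append new lines from stdin to the disk buffer
  if "$@" < "$_buf"; then   # hand the complete buffer to the shipper
    : > "$_buf"             # shipped: empty the buffer
  else
    return 1                # failed: lines stay buffered for the next call
  fi
}
```

Each call retries everything that is still buffered, so a receiver may see a line more than once if a shipper fails after a partial send; this trades duplicates for not losing logs.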