Logging for Production Systems in The Container Era
1. Logging for Production Systems
in The Container Era
Sadayuki Furuhashi
Founder & Software Architect
DOCKER MOUNTAIN VIEW
2. A little about me…
Sadayuki Furuhashi
github: @frsyuki
A founder of Treasure Data, Inc. located in Silicon Valley.
Fluentd - Unifid log collection infrastracture Embulk - Plugin-based ETL tool
OSS projects I founded:
An open-source hacker.
4. The Container Era
Server Era Container Era
Service Architecture Monolithic Microservices
System Image Mutable Immutable
Managed By Ops Team DevOps Team
Local Data Persistent Ephemeral
Log Collection syslogd / rsync ?
Metrics Collection Nagios / Zabbix ?
5. Server Era Container Era
Service Architecture Monolithic Microservices
System Image Mutable Immutable
Managed By Ops Team DevOps Team
Local Data Persistent Ephemeral
Log Collection syslogd / rsync ?
Metrics Collection Nagios / Zabbix ?
The Container Era
How should log & metrics collection
be done in The Container Era?
7. The traditional logrotate + rsync on containers
Log Server
Application
Container A
File FileFile
Hard to analyze!!
Complex text parsers
Application
Container C
File FileFile
Application
Container B
File FileFile
High latency!!
Must wait for a day
Ephemeral!!
Could be lost at any time
8. Server 1
Container A
Application
Container B
Application
Server 2
Container C
Application
Container D
Application
Kafka
elasticsearch
HDFS
Container
Container
Container
Container
Small & many containers make storages overloaded
Too many connections
from micro containers!
9. Server 1
Container A
Application
Container B
Application
Server 2
Container C
Application
Container D
Application
Kafka
elasticsearch
HDFS
Container
Container
Container
Container
System images are immutable
Too many connections
from micro containers!
Embedding destination IPs
in ALL Docker images
makes management hard
10. Combination explosion with microservices
requires too many scripts for data integration
LOG
script to
parse data
cron job for
loading
filtering
script
syslog
script
Tweet-
fetching
script
aggregation
script
aggregation
script
script to
parse data
rsync
server
14. What’s Fluentd?
Simple core
+ Variety of plugins
Buffering, HA (failover),
Secondary output, etc.
Like syslogd
AN EXTENSIBLE & RELIABLE DATA COLLECTION TOOL
20. Server 1
Container A
Application
Container B
Application
Server 2
Container C
Application
Container D
Application
Kafka
elasticsearch
HDFS
Container
Container
Container
Container
Primitive deployment…
Too many connections
from many containers!
Embedding destination IPs
in ALL Docker images
makes management hard
21. Server 1
Container A
Application
Container B
Application
Fluentd
Server 2
Container C
Application
Container D
Application
Fluentd Kafka
elasticsearch
HDFS
Container
Container
Container
Container
destination is always
localhost from app’s
point of view
Source aggregation decouples config
from apps
22. Server 1
Container A
Application
Container B
Application
Fluentd
Server 2
Container C
Application
Container D
Application
Fluentd
active / standby /
load balancing
Destination aggregation makes storages scalable
for high traffic
Aggregation server(s)
23. Aggregation servers
• Logging directly from microservices makes log
storages overloaded.
> Too many RX connections
> Too frequent import API calls
• Aggregation servers make the logging infrastracture
more reliable and scalable.
> Connection aggregation
> Buffering for less frequent import API calls
> Data persistency during downtime
> Automatic retry at recovery from downtime
35. Error Handling and Recovery
in_tail
/var/log/access.log
/var/log/fluentd/buffer
but_file
Buffering for any outputs
Retrying automatically
With exponential wait
and persistence on a disk
and secondary output
39. Data partitioning by time on HDFS / S3
access.log
buffer
Custom file
formatter
Slice files based on time
2016-01-01/01/access.log.gz
2016-01-01/02/access.log.gz
2016-01-01/03/access.log.gz
…
in_tail
43. Microsoft
Operations Management Suite uses Fluentd: "The core of the agent uses an existing
open source data aggregator called Fluentd. Fluentd has hundreds of existing
plugins, which will make it really easy for you to add new data sources."
Syslog
Linux Computer
Operating System
Apache
MySQL
Containers
omsconfig (DSC)
PS DSC
Providers
OMI Server
(CIM Server)
omsagent
Firewall/proxy
OMSService
Upload Data
(HTTPS)
Pull
configuration
(HTTPS)
44. Atlassian
"At Atlassian, we've been impressed by Fluentd and have chosen to use it in
Atlassian Cloud's logging and analytics pipeline."
Kinesis
Elasticsearch
cluster
Ingestion
service
45. Amazon web services
The architecture of Fluentd (Sponsored by Treasure Data) is very similar to Apache
Flume or Facebook’s Scribe. Fluentd is easier to install and maintain and has better
documentation and support than Flume and Scribe.
Types of DataStoreCollect
Transactional
• Database reads & write (OLTP)
• Cache
Search
• Logs
• Streams
File
• Log files (/val/log)
• Log collectors & frameworks
Stream
• Log records
• Sensors & IoT data
Web Apps
IoTApplicationsLogging
Mobile Apps
Database
Search
File Storage
Stream Storage