Logging for Production Systems in The Container Era

Sadayuki Furuhashi
Founder & Software Architect at Treasure Data, Inc.
DOCKER MOUNTAIN VIEW
A little about me…
Sadayuki Furuhashi (github: @frsyuki)
A founder of Treasure Data, Inc., located in Silicon Valley. An open-source hacker.
OSS projects I founded:
• Fluentd - Unified log collection infrastructure
• Embulk - Plugin-based ETL tool
• MessagePack - "It's like JSON. but fast and small."
The Container Era

                     | Server Era      | Container Era
Service Architecture | Monolithic      | Microservices
System Image         | Mutable         | Immutable
Managed By           | Ops Team        | DevOps Team
Local Data           | Persistent      | Ephemeral
Log Collection       | syslogd / rsync | ?
Metrics Collection   | Nagios / Zabbix | ?

How should log & metrics collection be done in The Container Era?
Problems

The traditional logrotate + rsync approach on containers: each application container (A, B, C) writes local log files that are periodically shipped to a log server.
• Hard to analyze!! Complex text parsers are required.
• High latency!! Must wait for a day.
• Ephemeral!! Local files could be lost at any time.
[Diagram: application containers on Server 1 and Server 2 each connect directly to Kafka, Elasticsearch, and HDFS]
Small & many containers make storages overloaded: too many connections from micro containers!
[Diagram: the same direct-to-storage topology]
System images are immutable: too many connections from micro containers, and embedding destination IPs in ALL Docker images makes management hard.
Combination explosion with microservices requires too many scripts for data integration: scripts to parse data, cron jobs for loading, filtering scripts, syslog scripts, tweet-fetching scripts, aggregation scripts, rsync servers, and so on.
A solution: a centralized log collection service

The centralized log collection service decouples log producers from log destinations. We released one as open source: Fluentd (Apache License).
What's Fluentd?
AN EXTENSIBLE & RELIABLE DATA COLLECTION TOOL, like syslogd.
Simple core (buffering, HA / failover, secondary output, etc.) + a variety of plugins.
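To make the "like syslogd, but pluggable" idea concrete, here is a minimal configuration sketch, assuming only the built-in in_syslog and out_file plugins (port, tag, and path are illustrative):

  <source>
    @type syslog
    port 5140
    tag system
  </source>

  <match system.**>
    @type file
    path /var/log/fluentd/system
  </match>

Swapping the destination for Elasticsearch, S3, or Kafka is a one-block change inside <match>, which is the point of the plugin architecture.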
How to collect logs from Docker containers

Text logging with --log-driver=fluentd: the container's STDOUT / STDERR is forwarded to a Fluentd instance on the host.

  docker run \
    --log-driver=fluentd \
    --log-opt fluentd-address=localhost:24224

Each log line arrives as a record like:

  {
    "container_id": "ad6d5d32576a",
    "container_name": "myapp",
    "source": "stdout"
  }
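On the receiving side, a forward input listening on the address given to --log-opt is enough. A minimal sketch, assuming containers are tagged with something like --log-opt tag=docker.{{.Name}} so the docker.** pattern matches (the stdout output is only for demonstration):

  <source>
    @type forward
    port 24224
    bind 0.0.0.0
  </source>

  <match docker.**>
    @type stdout
  </match>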
Metrics collection with fluent-logger: the application in a container emits structured events to the Fluentd on the same server through the fluent-logger library.

  from fluent import sender
  from fluent import event

  # Point the logger at the local Fluentd and set the tag prefix
  sender.setup('app.events', host='localhost')

  # Emit one structured event; the label is appended to the tag prefix
  event.Event('purchase', {
      'user_id': 21, 'item_id': 321, 'value': '1'
  })

The event is delivered with tag = app.events.purchase and a record like:

  {
    "user_id": 21,
    "item_id": 321,
    "value": "1"
  }
Logging methods for each purpose
• Collecting log messages
> --log-driver=fluentd
• Application metrics
> fluent-logger
• Access logs, logs from middleware
> Shared data volume
• System metrics (CPU usage, Disk capacity, etc.)
> Fluentd’s input plugins

(Fluentd pulls that data periodically)
Deployment Patterns

Primitive deployment: every application container on Server 1 and Server 2 connects directly to Kafka, Elasticsearch, and HDFS.
Too many connections from many containers! Embedding destination IPs in ALL Docker images makes management hard.
Source aggregation: each server runs a local Fluentd, and the containers on that server send their logs to it; the local Fluentd then forwards to Kafka, Elasticsearch, and HDFS.
The destination is always localhost from the app's point of view, so source aggregation decouples config from apps.
Destination aggregation: the per-server Fluentd instances forward to aggregation server(s) (active / standby / load balancing), which write to the storages.
Destination aggregation makes storages scalable for high traffic.
Aggregation servers
• Logging directly from microservices makes log storages overloaded.
> Too many RX connections
> Too frequent import API calls
• Aggregation servers make the logging infrastructure more reliable and scalable (see the configuration sketch after this list).
> Connection aggregation
> Buffering for less frequent import API calls
> Data persistence during downtime
> Automatic retry at recovery from downtime
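A minimal sketch of how a per-server (leaf) Fluentd might forward to an active/standby pair of aggregation servers, with an on-disk buffer for persistence during downtime; the hostnames and tag pattern are hypothetical:

  <match app.**>
    @type forward
    buffer_type file
    buffer_path /var/log/fluentd/forward-buffer
    flush_interval 5s
    <server>
      host aggregator1.example.com
      port 24224
    </server>
    <server>
      host aggregator2.example.com
      port 24224
      standby true
    </server>
  </match>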
Fluentd Internal Architecture

Internal Architecture (simplified): Input Plugin → Filter Plugin → Buffer Plugin → Output Plugin.
Every event carries a Time, a Tag, and a Record, for example:
Time:   2012-02-04 01:33:51
Tag:    myapp.buylog
Record: { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" }
Architecture: Input Plugins
Input plugins receive logs, or pull logs from data sources, in a non-blocking manner.
Examples: HTTP+JSON (in_http), File tail (in_tail), Syslog (in_syslog), …
Architecture: Filter Plugins
Filter plugins transform logs, filter out unnecessary logs, and enrich logs.
Examples: encrypt personal data, convert IPs to countries, parse User-Agent, …
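As one hedged example of enrichment, the built-in record_transformer filter can add fields to every matching event; the tag pattern and added fields are illustrative:

  <filter app.**>
    @type record_transformer
    <record>
      # add the host name and a static service name to every record
      hostname ${hostname}
      service myapp
    </record>
  </filter>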
Architecture: Buffer Plugins
Buffer plugins improve performance, provide reliability, and provide thread-safety.
Examples: Memory (buf_memory), File (buf_file)
Architecture: Output Plugins
Output plugins write or send event logs to their destinations.
Examples: File (out_file), Amazon S3 (out_s3), MongoDB (out_mongo), …
Architecture: Buffer Plugins (between Input and Output)
The buffer holds events as a queue of chunks between input and output. When writing a chunk fails, only that chunk is retried: divide & conquer for retry, for both streaming and batch writes.
Divide & conquer for recovery: the buffer (on-disk or in-memory) queues chunks while the destination is erroring or overloaded, then flushes the queued chunks with flow control once it recovers.
Example Use Cases

Streaming from Apache/Nginx to Elasticsearch: in_tail reads /var/log/access.log, events are buffered on disk with buf_file under /var/log/fluentd/buffer, and then sent to Elasticsearch.
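A minimal configuration sketch of this pipeline, assuming fluent-plugin-elasticsearch is installed; host, tag, and file paths are illustrative:

  <source>
    @type tail
    path /var/log/access.log
    pos_file /var/log/fluentd/access.log.pos
    format apache2
    tag apache.access
  </source>

  <match apache.access>
    @type elasticsearch
    host localhost
    port 9200
    logstash_format true
    # on-disk buffering between Fluentd and Elasticsearch
    buffer_type file
    buffer_path /var/log/fluentd/buffer
    flush_interval 10s
  </match>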
Error Handling and Recovery: the same pipeline (in_tail on /var/log/access.log, buf_file under /var/log/fluentd/buffer) gets buffering for any output, automatic retrying with exponential backoff, persistence on disk, and a secondary output for events that exhaust their retries.
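A hedged sketch of the retry and secondary-output knobs on a buffered output; the values and the secondary file path are illustrative:

  <match apache.access>
    @type elasticsearch
    host localhost
    port 9200
    buffer_type file
    buffer_path /var/log/fluentd/buffer
    # retry with exponential backoff, starting at 1s
    retry_wait 1s
    retry_limit 17
    # events that exhaust retries go to the secondary output
    <secondary>
      @type file
      path /var/log/fluentd/failed-records
    </secondary>
  </match>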
Tailing & parsing files: in_tail reads a log file written by your app, remembers its position in a pos file, and parses each line.
Supported built-in formats: apache, apache_error, apache2, nginx, json, csv, tsv, ltsv, syslog, multiline, none.
Custom formats can be added with a regexp or a custom parser written in Ruby.
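A hedged in_tail sketch with a pos file; path, tag, and format are illustrative, and the commented regexp shows the custom-format escape hatch:

  <source>
    @type tail
    path /var/log/myapp/events.log
    pos_file /var/log/fluentd/events.log.pos
    # a built-in format, or a custom regexp with named captures such as:
    # format /^(?<time>[^ ]+) (?<level>[^ ]+) (?<message>.*)$/
    format ltsv
    tag myapp.events
  </source>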
Out to Multiple Locations: events read from access.log by in_tail are routed based on tags and copied to multiple storages through the buffer.
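A hedged sketch of copying one tagged stream to two stores with the built-in copy output; the first store assumes fluent-plugin-elasticsearch, the second uses the built-in file output, and the tag and paths are illustrative:

  <match apache.access>
    @type copy
    <store>
      @type elasticsearch
      host localhost
      port 9200
      logstash_format true
    </store>
    <store>
      @type file
      path /var/log/fluentd/access-archive
    </store>
  </match>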
Example configuration for a real-time + batch combo: data partitioning by time on HDFS / S3.
in_tail reads access.log into the buffer, a custom file formatter shapes the output, and files are sliced based on time:
2016-01-01/01/access.log.gz
2016-01-01/02/access.log.gz
2016-01-01/03/access.log.gz
…
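A hedged sketch of the S3 side of this combo using fluent-plugin-s3; the bucket, region, and paths are hypothetical, credentials are omitted (an IAM role or aws_key_id/aws_sec_key would be configured here), and fluent-plugin-webhdfs would be the HDFS counterpart:

  <match apache.access>
    @type s3
    s3_bucket my-log-bucket
    s3_region us-east-1
    path logs/
    buffer_path /var/log/fluentd/s3-buffer
    # one gzip'd object per hour, keyed by the time slice (e.g. 2016-01-01/01)
    time_slice_format %Y-%m-%d/%H
    time_slice_wait 10m
    store_as gzip
  </match>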
3rd party input plugins: dstat, df, AMQP, munin, jvmwatcher, SQL, …
3rd party output plugins: AMQP, Graphite, …
Real World Use Cases

Microsoft
Operations Management Suite uses Fluentd: "The core of the agent uses an existing open source data aggregator called Fluentd. Fluentd has hundreds of existing plugins, which will make it really easy for you to add new data sources."
[Diagram: on a Linux computer, omsagent collects Syslog, Apache, MySQL, container, and operating-system data alongside omsconfig (DSC), PS DSC providers, and the OMI (CIM) server; it uploads data to the OMS service over HTTPS and pulls configuration over HTTPS through the firewall/proxy]
Atlassian
"At Atlassian, we've been impressed by Fluentd and have chosen to use it in Atlassian Cloud's logging and analytics pipeline."
[Diagram: logs flow through an ingestion service and Kinesis into an Elasticsearch cluster]
Amazon Web Services
"The architecture of Fluentd (sponsored by Treasure Data) is very similar to Apache Flume or Facebook's Scribe. Fluentd is easier to install and maintain and has better documentation and support than Flume and Scribe."

Types of data collected (from web apps, mobile apps, IoT, applications, and logging) and where they are stored:

Type of Data  | Collect                                            | Store
Transactional | Database reads & writes (OLTP), Cache              | Database
Search        | Logs, Streams                                      | Search
File          | Log files (/var/log), Log collectors & frameworks  | File Storage
Stream        | Log records, Sensors & IoT data                    | Stream Storage
Thank you!