Logs in JSON? Why?
1. Machine-Readable
> machines are going to be the main consumers of logs
2. Schema-Free
> you want to add/remove fields from logs at any time
Write Logs for Machines, use JSON
http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
Tuesday, July 17, 2012
Logs As TEXT
"2011-04-01 host1 myapp: cmessage size=12MB user=me"
Logs As JSON
2011-04-01 myapp.message {
  "on_host": "host1",
  "combined": true,
  "size": 12000000,
  "user": "me"
}
+ Field Name
+ No Custom Parser
+ Type Information
+ Schema Free
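To make the contrast concrete, here is a minimal Python sketch (not part of the original slides) that reads the same event both ways. The regular expression, the unit table, and the `parse_text_log` helper are hypothetical, written only for this one line format; the JSON side needs nothing beyond the standard `json` module.

```python
import json
import re

TEXT_LOG = "2011-04-01 host1 myapp: cmessage size=12MB user=me"
JSON_LOG = '{"on_host": "host1", "combined": true, "size": 12000000, "user": "me"}'

# Hypothetical parser for this one text format; every new format needs its own.
TEXT_PATTERN = re.compile(
    r"(?P<date>\S+) (?P<host>\S+) (?P<app>\w+): (?P<kind>\w+) "
    r"size=(?P<size>\d+)(?P<unit>[KMG]B) user=(?P<user>\w+)"
)
# Assumption: decimal units, so that 12MB matches the 12000000 in the JSON record.
UNIT_BYTES = {"KB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_text_log(line):
    """Parse the text log line; every value starts as a string."""
    match = TEXT_PATTERN.fullmatch(line)
    if match is None:
        raise ValueError("unrecognized log line: " + line)
    fields = match.groupdict()
    # The text carries no type information, so the size is converted by hand.
    size = int(fields["size"]) * UNIT_BYTES[fields["unit"]]
    return {"on_host": fields["host"], "size": size, "user": fields["user"]}


text_record = parse_text_log(TEXT_LOG)
json_record = json.loads(JSON_LOG)  # field names and types come for free

print(text_record["size"], json_record["size"])
print(type(json_record["combined"]).__name__)
```

Adding a field to the JSON record needs no parser change, while the text version needs a new pattern for every change in layout.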
• Website
> http://fluentd.org/
• Community
> http://github.com/fluent
> 16 committers across many organizations
> web, game, enterprise
• Mailing list
> Google groups
Typical Log Collection by `rsync`
[Diagram: three app servers, each running an application that writes log files; the files are copied to a central log server]
• Burst of traffic
> rsync consumes all bandwidth
• High latency
> must wait for a day
• Hard to analyze
> complex text parsers
Log Collection using Fluentd
[Diagram: three Fluentd nodes forward logs to two Fluentd nodes, which feed Amazon S3 / EMR, Hadoop / Hive, and MongoDB]
• Realtime!
• Ready to Analyze!
Fluentd Case Study
[Diagram: Fluentd on each Ruby on Rails server forwards to Fluentd routing nodes, which send PV logs to Hadoop / Hive and user behavior logs to MongoDB]
✓ 127 RoR servers
✓ 100,000 msgs/sec
✓ 120Mbps at peak
✓ 1TB/day
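As a rough sanity check on these figures (a back-of-the-envelope sketch, not from the original slides), the Python below derives the implied average message size and the average daily rate. It assumes that the 100,000 msgs/sec and 120Mbps figures describe the same peak, and that 1TB means 10^12 bytes; neither is stated in the slides.

```python
PEAK_MSGS_PER_SEC = 100_000
PEAK_BITS_PER_SEC = 120 * 10**6  # 120Mbps
BYTES_PER_DAY = 10**12           # 1TB/day, assuming decimal terabytes
SECONDS_PER_DAY = 24 * 60 * 60

# Peak throughput in bytes, and the message size it implies.
peak_bytes_per_sec = PEAK_BITS_PER_SEC / 8
bytes_per_msg = peak_bytes_per_sec / PEAK_MSGS_PER_SEC

# Average rate implied by the daily volume, compared with the peak.
avg_bytes_per_sec = BYTES_PER_DAY / SECONDS_PER_DAY
avg_to_peak = avg_bytes_per_sec / peak_bytes_per_sec

print(f"peak: {peak_bytes_per_sec / 10**6:.1f} MB/s")
print(f"implied message size: {bytes_per_msg:.0f} bytes")
print(f"daily average: {avg_bytes_per_sec / 10**6:.1f} MB/s")
print(f"average / peak: {avg_to_peak:.2f}")
```

Under those assumptions a message averages about 150 bytes, and the daily average rate is roughly three quarters of the peak.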
# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
  type mongo
  host 127.0.0.1
</match>

# forward other logs to servers
# (load-balancing + fail-over)
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>
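The `weight` values in the forward section set how traffic is shared between the two servers. The Python sketch below illustrates one way weighted selection with fail-over could work. It is not Fluentd's actual implementation; `pick_server` and the `alive` set are hypothetical names introduced for illustration.

```python
import random

# Hosts and weights taken from the forward section of the configuration.
SERVERS = {"192.168.0.11": 20, "192.168.0.12": 60}


def pick_server(servers, alive, rng=random):
    """Choose a live server with probability proportional to its weight."""
    candidates = [host for host in servers if host in alive]
    if not candidates:
        raise RuntimeError("no forward server is available")
    weights = [servers[host] for host in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Load balancing: with both servers up, traffic splits roughly 20:60.
both_up = set(SERVERS)
picks = [pick_server(SERVERS, both_up, rng) for _ in range(8000)]
share = picks.count("192.168.0.12") / len(picks)
print(f"share sent to 192.168.0.12: {share:.2f}")

# Fail-over: with 192.168.0.12 down, everything goes to the remaining server.
only_first = {"192.168.0.11"}
fallback = pick_server(SERVERS, only_first, rng)
print("fail-over target:", fallback)
```

With weights of 20 and 60, the second server should receive about three quarters of the records while both are up.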
Scribe’s Pros & Cons
• Pros.
  • Fast (written in C++)
• Cons.
  • VERY HARD to install
    • nightmare of boost, thrift, libhdfs, etc.
  • Unstructured Logs
    • logs must be parsed before analysis
  • Hard to extend
    • requires recompiling C++ programs
  • No longer maintained
Fluentd vs Scribe
• Easy to install
  • “gem install fluentd”
• Stable RPM and Deb packages
  • http://packages.treasure-data.com/
• Easy to write plugins
  • you can use Ruby
• Easy plugin distribution
  • “gem search -rd fluent-plugin”
Flume’s Pros & Cons
• Pros.
  • Central master server manages all nodes
• Cons.
  • Difficult to understand
    • logical topologies, physical servers, and a configuration of the logical/physical mapping
  • Difficult to configure
    • replicated master servers, log servers and agents
  • Big footprint
    • 50,000 lines of Java
Fluentd vs Flume
• Easy to understand
  • “syslogd that understands JSON”
• Easy to set up
  • “sudo fluentd --setup && fluentd”
• Very small footprint
  • small engine (3,000 lines) + plugins
  • small, but battle-tested!
• Easy to configure
                      Fluentd               Scribe               Flume
Installation          gem/rpm/deb           make                 jar/rpm/deb
Footprint             3,000 lines of Ruby   8,000 lines of C++   50,000 lines of Java
Plugin                Ruby                  N/A                  Java
Plugin distribution   RubyGems.org          N/A                  N/A
Master Server         No                    No                   Yes
License               Apache License        Apache License       Apache License
fluent-plugin-mongo
• Included within rpm/deb by default!
  • http://github.com/fluent/fluent-plugin-mongo
• #1 plugin among 50+ Fluentd plugins
  • http://fluentd.org/plugin/
• Logs As JSON. WHY NOT Put Them Into Mongo??
• Supports most of the MongoDB features
  • Authentication
  • ReplicaSet
  • Capped Collection