13. Event structure(log message)
✓ Time ✓ Record
> second unit > JSON format
> from data source or > MessagePack
adding parsed time internally
✓ Tag > non-unstructured
> for message routing
16. Configuration and operation
> No central / master node
> HTTP include helps conf sharing
> Operation depends on your environment
> Use your deamon management
> Use chef in Treasure Data
> Scribe like syntax
17. # receive events via HTTP # save alerts to a file
<source> <match alert.**>
type http type file
port 8888 path /var/log/fluent/alerts
</source> </match>
# read logs from a file # forward other logs to servers
<source> <match **>
type tail type forward
path /var/log/httpd.log <server>
format apache host 192.168.0.11
tag apache.access weight 20
</source> </server>
<server>
# save access logs to MongoDB host 192.168.0.12
<match apache.access> weight 60
type mongo </server>
database apache </match>
collection log
</match> include http://example.com/conf
18. Reliability (core + plugin)
> Buffering
> Use file buffer for persistent data
> buffer chunk has ID for idempotent
> Retrying
> Error handling
> transaction, failover, etc on forward plugin
> secondary
19. Plugins - use rubygems
$ fluent-gem search -rd fluent-plugin
$ fluent-gem search -rd fluent-mixin
$ fluent-gem install fluent-plugin-mongo
※ Today, don’t talk the plugin development
30. td-agent
> Open sourced distribution package of fluentd
> ETL part of Treasure Data
> Including useful components
> ruby, jemalloc, fluentd
> 3rd party gems: td, mongo, webhdfs, etc...
td plugin is for TD
> http://packages.treasure-data.com/
31. v11
> Breaking source code compatibility
> Not protocol.
> Windows support
> Error handling enhancement
> Better DSL configuration
> etc: https://gist.github.com/frsyuki/2638703
34. Pros and Cons
>
● Pros
> Fast (written in C++)
>
● Cons
> Hard to install and extend
Are you a C++ magician?
> Deal with unstructured logs
> No longer maintained
Replaced with Calligraphus at Facebook
37. Pros and Cons
>
● Pros
> Using central master to manage all nodes
>
● Cons
> Java culture (Pros for Java-er?)
Difficult configuration and setup
> Difficult topology
> Mainly for Hadoop
less plugins?
39. Treasure Data
Frontend Worker
Job Queue Hadoop
Hadoop
Applications push
metrics to Fluentd
sums up data minutes
(via local Fluentd) Fluentd Fluentd (partial aggregation)
Treasure Librato
Data Metrics
for historical analysis for realtime analysis
40. Cookpad
hundreds of app servers
Rails app td-agent
sends event logs Daily/Hourly Google
Batch Spreadsheet
Rails app td-agent Treasure Data
sends event logs
MySQL
Rails app td-agent
Logs are available
sends event logs
after several mins.
KPI
Feedback rankings visualization
Unlimited scalability
Flexible schema
Realtime
Less performance impact ✓ Over 100 RoR servers (2012/2/4)
41. NHN Japan
Archive
Storage
Web
Servers Fluentd (scribed)
Cluster
Notifications
STREAM (IRC)
Fluentd
Watchers
Graph
Tools
webhdfs SCHEDULED
BATCH BATCH
✓ 16 nodes Hadoop Cluster hive
server
✓ 120,000+ lines/sec
CDH4 Shib ShibUI
✓ 400Mbps at peak Huahin
✓ 1.5+ TB/day (raw) (HDFS, YARN) Manager
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013 by @tagomoris
43. Conclusion
> Fluentd is a widely-used log collector
> There are many use cases
> Many contributors and plugins
> Keep it simple
> Easy to integrate your environment