Fluentd Introduction
at iPROS
Masahiro Nakagawa
Treasure Data, Inc.
Senior Software Engineer

Thursday, October 31, 13
Who are you?
● Masahiro Nakagawa
  > @repeatedly / masa@treasure-data.com
● Treasure Data, Inc
  > Senior Software Engineer, since 2012/11
● Open Source Projects
  > D Programming Language
  > MessagePack, Fluentd, etc...

Structured logging
Reliable forwarding
Pluggable architecture
http://fluentd.org/

Agenda

> Background
> Overview
> Product Comparison
> Use cases

Background

Data Processing
Data source -> Collect -> Store -> Process -> Visualize -> Reporting / Monitoring

Related Products
Collect: ???
Store / Process: Cloudera, Hortonworks, Treasure Data
Visualize: Excel, Tableau, R
easier & shorter time

Before Fluentd
[Diagram: Applications on Server1, Server2, Server3, ... -> Log Server]
High Latency! Must wait for a day...

After Fluentd
[Diagram: on Server1, Server2, Server3, ... each Application -> local Fluentd -> aggregating Fluentd nodes]
In streaming!

Overview

In short

> Open sourced log collector written in Ruby
> Using rubygems ecosystem for plugins

It’s like syslogd, but uses JSON for log messages

Example (apache to mongo)
Web Server access log:
127.0.0.1 - - [30/Oct/2013:07:26:27] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:30] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:32] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:40] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:27:01] "GET / ...
...

Web Server -> tail -> Fluentd (event buffering) -> insert -> MongoDB

2013-10-30 01:33:51
apache.log
{
  "host": "127.0.0.1",
  "method": "GET",
  ...
}

Event structure(log message)
✓ Time
  > default second unit
  > from data source or adding parsed time
✓ Tag
  > for message routing
✓ Record
  > JSON format
  > MessagePack internally
  > structured (not unstructured)

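The three parts of an event can be sketched in Ruby as follows. This is only an illustration: the `Event` struct and its `to_line` method are hypothetical names, not Fluentd's internal classes, and the record is shown as JSON only (the MessagePack encoding used internally is left out).

```ruby
require 'json'

# Hypothetical container for illustration; not Fluentd's internal class.
Event = Struct.new(:time, :tag, :record) do
  # Render the event as "time tag record".
  def to_line
    stamp = Time.at(time).utc.strftime('%Y-%m-%d %H:%M:%S')
    "#{stamp} #{tag} #{JSON.generate(record)}"
  end
end

# Time is an integer number of seconds since the epoch (second unit).
event = Event.new(Time.utc(2013, 10, 30, 1, 33, 51).to_i,
                  'apache.access',
                  { 'host' => '127.0.0.1', 'method' => 'GET' })
puts event.to_line
```

The tag is what routing works on; the record stays a plain key/value structure from input to output.
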
Pluggable Architecture
Input -> Engine -> Buffer -> Output (pluggable components)

Input
> Forward
> HTTP
> File tail
> dstat
> ...

Buffer
> File
> Memory

Output
> Forward
> File
> MongoDB
> ...

Output
> rewrite
> ...

Client libraries
> Ruby
> Java
> Perl
> PHP
> Python
> D
> Scala
> ...

Application -> Time:Tag:Record -> Fluentd

# Ruby
Fluent.open("myapp")
Fluent.event("login", {"user" => 38})
#=> 2013-10-30 18:56:01 myapp.login {"user":38}

Configuration and operation

● No central / master node
  > HTTP include helps conf sharing
● Operation depends on your environment
  > Use your daemon management
  > Use Chef in Treasure Data
● Apache like syntax and Ruby DSL

# receive events via HTTP
<source>
  type http
  port 8888
</source>

# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save alerts to a file
<match alert.**>
  type file
  path /var/log/fluent/alerts
</match>

# save access logs to MongoDB
<match apache.access>
  type mongo
  database apache
  collection log
</match>

# forward other logs to servers (catch-all, so it comes last)
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>

include http://example.com/conf

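How the match patterns above pick an output can be sketched in Ruby. This is a simplified model, not Fluentd's actual matcher: it assumes `*` matches exactly one tag part, `**` matches zero or more tag parts, and the first matching section wins. The helper names `tag_match?` and `route` and the output symbols are made up for the example.

```ruby
# Simplified tag matching: "*" is one tag part, "**" is zero or more parts.
def tag_match?(pattern, tag)
  match_parts?(pattern.split('.'), tag.split('.'))
end

def match_parts?(pat, parts)
  return parts.empty? if pat.empty?

  head, *rest = pat
  case head
  when '**'
    # Try consuming 0, 1, 2, ... tag parts.
    (0..parts.size).any? { |n| match_parts?(rest, parts.drop(n)) }
  when '*'
    !parts.empty? && match_parts?(rest, parts.drop(1))
  else
    parts.first == head && match_parts?(rest, parts.drop(1))
  end
end

# Rules in configuration order; the catch-all "**" goes last.
RULES = [
  ['alert.**', :file],
  ['apache.access', :mongo],
  ['**', :forward]
].freeze

# Return the output of the first rule whose pattern matches the tag.
def route(tag, rules = RULES)
  rule = rules.find { |pattern, _output| tag_match?(pattern, tag) }
  rule && rule.last
end

puts route('alert.disk.full')
puts route('apache.access')
puts route('myapp.login')
```

Putting `**` first in `RULES` would send every tag to `:forward`, which is why the catch-all section belongs at the end.
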
Reliability (core + plugin)
● Buffering
  > Use file buffer for persistent data
  > buffer chunk has ID for idempotency
● Retrying
● Error handling
  > transaction, failover, etc on forward plugin
  > secondary

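Why a chunk ID helps with idempotency can be sketched as below. This is an assumption-laden illustration, not Fluentd's buffer code: `Chunk`, `new_chunk` and `Receiver` are hypothetical names, and the receiver simply remembers the IDs it has already stored so that a retried chunk is not written twice.

```ruby
require 'securerandom'
require 'set'

# Hypothetical chunk: a batch of events plus a unique ID.
Chunk = Struct.new(:id, :events)

def new_chunk(events)
  Chunk.new(SecureRandom.hex(8), events)
end

# Hypothetical receiver that ignores chunks it has already stored.
class Receiver
  attr_reader :stored

  def initialize
    @seen = Set.new
    @stored = []
  end

  # Returns true when the chunk was newly stored, false for a duplicate.
  def write(chunk)
    return false unless @seen.add?(chunk.id)

    @stored.concat(chunk.events)
    true
  end
end

receiver = Receiver.new
chunk = new_chunk([{ 'user' => 38 }])
receiver.write(chunk)  # first delivery is stored
receiver.write(chunk)  # a retry of the same chunk is ignored
puts receiver.stored.size
```
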
Plugins - use rubygems

$ fluent-gem search -rd fluent-plugin

$ fluent-gem search -rd fluent-mixin

$ fluent-gem install fluent-plugin-mongo

http://fluentd.org/plugin/
in_tail
[Diagram: Apache -> access.log -> Fluentd]

Supported format:
> apache
> apache2
> syslog
> nginx
> json
> csv
> tsv
> ltsv

✓ read a log file
✓ custom regexp
✓ custom parser in Ruby

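The "custom regexp" idea can be sketched in plain Ruby: a regular expression with named captures turns one log line into a time and a record. The pattern, the field names and the sample line (including its timezone and status fields) are assumptions for illustration; they are not the built-in `apache` format definition.

```ruby
require 'time'

# Assumed pattern for a common-log-style line; not Fluentd's built-in one.
APACHE_PATTERN = /\A(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) \S+" (?<code>\d{3}) (?<size>\d+|-)\z/
TIME_FORMAT = '%d/%b/%Y:%H:%M:%S %z'

# Returns [time_in_seconds, record] or nil when the line does not match.
def parse_line(line)
  m = APACHE_PATTERN.match(line.chomp)
  return nil unless m

  time = Time.strptime(m[:time], TIME_FORMAT).to_i
  record = {
    'host' => m[:host],
    'user' => m[:user],
    'method' => m[:method],
    'path' => m[:path],
    'code' => m[:code].to_i
  }
  [time, record]
end

# Hypothetical sample line.
sample = '127.0.0.1 - - [30/Oct/2013:07:26:27 +0000] "GET / HTTP/1.1" 200 777'
p parse_line(sample)
```

Lines that do not match return `nil`, so the caller can decide whether to skip or report them.
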
out_mongo
[Diagram: Apache -> access.log -> Fluentd (buffer) -> MongoDB]

✓ retry automatically
✓ exponential retry wait
✓ persistent on a file

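"Retry automatically" with an "exponential retry wait" can be sketched as follows. The function names, the defaults (1 second initial wait, 60 second cap, 5 retries) and the injectable `sleeper` are assumptions chosen for the example, not Fluentd's actual parameters.

```ruby
# Wait before the n-th retry (0-based): doubled each time, up to a cap.
def retry_wait(attempt, initial: 1.0, max: 60.0)
  [initial * (2**attempt), max].min
end

# Run the block, retrying on errors with exponentially growing waits.
# `sleeper` is injectable so the waits can be observed without sleeping.
def with_retries(limit: 5, sleeper: ->(sec) { sleep(sec) })
  attempt = 0
  begin
    yield
  rescue StandardError
    raise if attempt >= limit

    sleeper.call(retry_wait(attempt))
    attempt += 1
    retry
  end
end

# Usage: a write that fails three times before succeeding.
calls = 0
waits = []
with_retries(sleeper: ->(sec) { waits << sec }) do
  calls += 1
  raise 'storage unavailable' if calls < 4
end
p waits
```

Once the limit is reached the last error is re-raised, which is the point where a secondary output or an alert could take over.
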
out_webhdfs
[Diagram: Apache -> access.log -> Fluentd (buffer) -> HDFS]

✓ custom text formatter
✓ slice files based on time
  2013-01-01/01/access.log.gz
  2013-01-01/02/access.log.gz
  2013-01-01/03/access.log.gz
  ...
✓ retry automatically
✓ exponential retry wait
✓ persistent on a file

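Slicing files by time amounts to deriving the output path from each event's timestamp. A minimal sketch, assuming hourly slices in UTC and a `strftime` path format that mirrors the file names above; the helper names are hypothetical:

```ruby
# Build the output path for a timestamp using a strftime format.
def sliced_path(time, format = '%Y-%m-%d/%H/access.log.gz')
  time.getutc.strftime(format)
end

# Group [time_in_seconds, record] pairs by the file they belong to.
def group_by_slice(events)
  events.group_by { |time, _record| sliced_path(Time.at(time)) }
end

events = [
  [Time.utc(2013, 1, 1, 1, 10).to_i, { 'path' => '/' }],
  [Time.utc(2013, 1, 1, 1, 50).to_i, { 'path' => '/a' }],
  [Time.utc(2013, 1, 1, 2, 5).to_i,  { 'path' => '/b' }]
]
group_by_slice(events).each do |path, group|
  puts "#{path}: #{group.size} events"
end
```
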
out_copy + other plugins
[Diagram: Apache -> access.log -> Fluentd (buffer) -> Hadoop, Amazon S3]

✓ routing based on tags
✓ copy to multiple storages

out_forward

[Diagram: Apache -> access.log -> Fluentd (buffer) -> multiple Fluentd servers; arrow labeled "apache"]

✓ automatic fail-over
✓ load balancing
✓ retry automatically
✓ exponential retry wait
✓ persistent on a file

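Load balancing with fail-over can be sketched as a weighted random choice among the servers that are currently reachable. This is an illustrative model only: the `Server` struct, its `available` flag and `pick_server` are hypothetical, and the hosts and weights are taken from the configuration example earlier in the deck.

```ruby
# Hypothetical server entry; `available` would be maintained by health checks.
Server = Struct.new(:host, :weight, :available)

# Pick an available server with probability proportional to its weight.
# Returns nil when no server is available.
def pick_server(servers, rng: Random.new)
  candidates = servers.select(&:available)
  return nil if candidates.empty?

  point = rng.rand(candidates.sum(&:weight))
  candidates.each do |server|
    return server if point < server.weight

    point -= server.weight
  end
end

servers = [
  Server.new('192.168.0.11', 20, true),
  Server.new('192.168.0.12', 60, true)
]
puts pick_server(servers).host

# Fail-over: once a server is marked unavailable it is never chosen.
servers[0].available = false
puts pick_server(servers).host
```

With both servers up, the second host should receive roughly three times as many chunks as the first, since its weight is three times larger.
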
Forward topology
[Diagram: tree of Fluentd nodes forwarding to one another; each hop is send/ack]

Other plugins
● Filter, Aggregator, Converter
  > rewrite-tag-filter, sampling-filter, ...
  > *-counter, *-monitor, ...
  > record-modifier, flatten, map, typecast, ...
● See @tagomoris’s slide
  > http://www.slideshare.net/tagomoris/fluentdmeetupfukuoka201303

Inputs
> Access logs: Apache
> App logs: Frontend, Backend
> System logs: syslogd
> Databases

-> filter / buffer / routing ->

Outputs
> Alerting: Nagios
> Analysis: MongoDB, MySQL, Hadoop
> Archiving: Amazon S3

Other status
● Localizing docs into Japanese
  > https://github.com/fluent/fluentd-docs/tree/master/docs/ja
● Windows support
  > Started by JBAT
  > https://github.com/fluent/fluentd/tree/windows

Feedback and patches are welcome!

v11
● Spec is not fixed yet
● Breaking source code compatibility
● Several improvements
  > routing label, filter, error stream, etc.
  > serverengine based: multi-process, signal, etc.
● http://magazine.rubyist.net/?0044FluentdV11NewFeatures

td-agent
● Open sourced distribution package of Fluentd
  > ETL part of Treasure Data
  > deb, rpm, homebrew
● Including useful components
  > ruby, jemalloc, fluentd
  > 3rd party gems: td, mongo, webhdfs, etc...
● http://packages.treasure-data.com/

Product Comparison

Flume
Flume: distributed log collector by Cloudera
[Diagram: Physical Topology and Logical Topology, showing a Flume Master and Flume nodes writing to Hadoop HDFS]

Network topology
Flume OG: [Diagram: Master, Agents, Collectors; arrows labeled send and ack]
Flume NG: [Diagram: Master (Option), Agents, Collectors; arrows labeled send/ack]

Pros and Cons
● Pros
  > Using central master to manage all nodes
● Cons
  > Java culture (Pros for Java-er?)
      Difficult configuration and setup
  > Difficult topology
  > Mainly for Hadoop
      fewer plugins?

Logstash
http://logstash.net/

Pros and Cons
● Pros
  > Built-in ElasticSearch and Kibana
  > Bundled 140 plugins (input/filter/codec/output)
  > Works on Windows but unstable...
● Cons
  > mainly for JRuby
  > Need external daemon for centralized env
      Redis, RabbitMQ, etc.

Use cases

Treasure Data
[Diagram: Frontend, Worker, Job Queue, Hadoop, Hadoop -> Fluentd -> Fluentd -> Treasure Data, Librato Metrics]

> Applications push metrics to Fluentd (via local Fluentd)
> Fluentd sums up data per minute (partial aggregation)
> Treasure Data: for historical analysis
> Librato Metrics: for realtime analysis

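Partial aggregation can be sketched as summing metric values per minute before sending them on, so that downstream receives one data point per metric per minute. The record layout (`name`, `value`) and the function name are assumptions made for the example.

```ruby
# Sum values per (minute, metric name).
# `events` is a list of [time_in_seconds, record] pairs.
def aggregate_per_minute(events)
  sums = Hash.new(0)
  events.each do |time, record|
    minute = time - (time % 60) # truncate to the start of the minute
    sums[[minute, record['name']]] += record['value']
  end
  sums
end

events = [
  [120, { 'name' => 'jobs', 'value' => 1 }],
  [150, { 'name' => 'jobs', 'value' => 2 }],
  [185, { 'name' => 'jobs', 'value' => 4 }]
]
aggregate_per_minute(events).each do |(minute, name), total|
  puts "#{minute} #{name} #{total}"
end
```
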
Cookpad
hundreds of app servers
[Diagram: Rails app -> td-agent (sends event logs) -> Treasure Data -> Daily/Hourly Batch -> MySQL, Google Spreadsheet]

> Logs are available after several mins.
> Feedback rankings
> KPI visualization

Unlimited scalability
Flexible schema
Realtime
Less performance impact

✓ Over 100 RoR servers (2012/2/4)

LINE
[Diagram: Web Servers -> Fluentd Cluster -> webhdfs -> Hadoop Cluster CDH4 (HDFS, YARN)]
Other components: Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, hive server, Huahin Manager, Shib, ShibUI
Flow labels: STREAM, BATCH, SCHEDULED BATCH

✓ 16 nodes
✓ 120,000+ lines/sec
✓ 400Mbps at peak
✓ 1.5+ TB/day (raw)

http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013 by @tagomoris

Other use-cases
● Scaleout by @choplin
  > データサイエンティスト養成読本 (Data Scientist Training Book)
  > http://gihyo.jp/book/2013/978-4-7741-5896-9
● Smartnews
  > http://developer.smartnews.be/blog/tag/fluentd/
● Nintendo 3DS StreetPass
  > http://www.nintendo.co.jp/3ds/interview/streetpass_relay/vol1/index4.html

Other companies

Conclusion

● Fluentd is now a widely-used project
  > There are many use cases
  > Many contributors and plugins
● Keep it simple
  > Easy to use and integrate into your environment

support@treasure-data.com