Fluentd introduction at ipros

Fluentd Introduction
at iPROS
Masahiro Nakagawa
Treasure Data, Inc.
Senior Software Engineer

Thursday, October 31, 13

Who are you?

> Masahiro Nakagawa
  > @repeatedly / masa@treasure-data.com
> Treasure Data, Inc
  > Senior Software Engineer, since 2012/11
> Open Source Projects
  > D programming Language
  > MessagePack, Fluentd, etc...


http://fluentd.org/

> Structured logging
> Reliable forwarding
> Pluggable architecture

Agenda

> Background
> Overview
> Product Comparison
> Use cases


Background


Data Processing

[Diagram: Data source -> Collect -> Store -> Process -> Visualize -> Reporting / Monitoring]

Related Products

easier & shorter time

> Collect: ???
> Store / Process: Cloudera, Hortonworks, Treasure Data
> Visualize: Excel, Tableau, R

Before Fluentd

[Diagram: logs from the applications on Server1, Server2, Server3, ... are collected on a Log Server.]

High latency! Must wait for a day...

After Fluentd

[Diagram: a Fluentd instance next to the application on each of Server1, Server2, Server3, ... sends logs on to further Fluentd instances.]

In streaming!

Overview


In short

> Open source log collector written in Ruby
> Uses the RubyGems ecosystem for plugins

It's like syslogd, but uses JSON for log messages

Example (apache to mongo)

[Diagram: Web Server -> apache.log -> (tail) -> Fluentd (event buffering) -> (insert) -> MongoDB]

apache.log:
127.0.0.1 - - [30/Oct/2013:07:26:27] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:30] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:32] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:40] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:27:01] "GET / ...
...

Event:
2013-10-30 01:33:51
apache.log
{
  "host": "127.0.0.1",
  "method": "GET",
  ...
}

Event structure (log message)

✓ Time
  > second resolution by default
  > taken from the data source, or added when the log is parsed
✓ Tag
  > for message routing
✓ Record
  > JSON format
  > MessagePack internally
  > structured, not free text
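
The three parts of an event can be sketched in Ruby as follows. This is only an illustration, not Fluentd's internal representation: the `Event` struct and its `to_line` helper are hypothetical names, and JSON from the standard library stands in for MessagePack.

```ruby
require "json"

# Hypothetical container for illustration; not a Fluentd class.
Event = Struct.new(:time, :tag, :record) do
  # Render the event in the "time tag record" form used on these slides.
  def to_line
    stamp = Time.at(time).utc.strftime("%Y-%m-%d %H:%M:%S")
    "#{stamp} #{tag} #{JSON.generate(record)}"
  end
end

event = Event.new(
  Time.utc(2013, 10, 30, 1, 33, 51).to_i,       # Time: integer seconds
  "apache.log",                                  # Tag: used for routing
  { "host" => "127.0.0.1", "method" => "GET" }   # Record: JSON-like hash
)

puts event.to_line
```

Keeping the time as integer seconds matches the "second resolution by default" point above; anything finer would have to travel inside the record in this sketch.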

Pluggable Architecture

[Diagram: Input -> Engine -> Buffer -> Output; Input, Buffer, and Output are pluggable.]

Input
> Forward
> HTTP
> File tail
> dstat
> ...

Buffer
> File
> Memory

Output
> Forward
> File
> MongoDB
> ...

Output / Input
> rewrite
> ...

Client libraries
> Ruby
> Java
> Perl
> PHP
> Python
> D
> Scala
> ...

[Diagram: Application -> (Time:Tag:Record) -> Fluentd]

# Ruby
Fluent.open("myapp")
Fluent.event("login", {"user" => 38})
#=> 2013-10-30 18:56:01 myapp.login {"user":38}

Configuration and operation

> No central / master node
  > HTTP include helps conf sharing
> Operation depends on your environment
  > Use your daemon management
  > Treasure Data uses Chef
> Apache-like syntax and Ruby DSL


# receive events via HTTP
<source>
  type http
  port 8888
</source>

# save alerts to a file
<match alert.**>
  type file
  path /var/log/fluent/alerts
</match>

# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# forward other logs to servers
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>

# save access logs to MongoDB
<match apache.access>
  type mongo
  database apache
  collection log
</match>

include http://example.com/conf
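
The tag patterns in the `<match>` blocks can be illustrated with a small Ruby sketch. It assumes that `*` matches exactly one tag part, that `**` matches zero or more parts, and that match blocks are tried top to bottom with the first match winning, which is why the catch-all `**` is listed last here. `tag_match?`, `ROUTES`, and `route` are hypothetical names, and only a bare `**` or a trailing `.**` is handled.

```ruby
# Simplified tag matcher, not Fluentd's implementation.
def tag_match?(pattern, tag)
  source = pattern.split(".").each_with_index.map do |part, index|
    separator = index.zero? ? "" : "\\."
    case part
    when "**" then index.zero? ? ".*" : "(?:\\..+)?"
    when "*"  then "#{separator}[^.]+"
    else           "#{separator}#{Regexp.escape(part)}"
    end
  end.join
  Regexp.new("\\A#{source}\\z").match?(tag)
end

# Hypothetical routing table: evaluated top to bottom, first match wins.
ROUTES = [
  ["alert.**",      :file],
  ["apache.access", :mongo],
  ["**",            :forward]
].freeze

def route(tag)
  _pattern, output = ROUTES.find { |pattern, _| tag_match?(pattern, tag) }
  output
end

puts route("alert.disk.full")
puts route("apache.access")
puts route("app.login")
```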

Reliability (core + plugin)

> Buffering
  > Use file buffer for persistent data
  > Buffer chunks have IDs for idempotency
> Retrying
> Error handling
  > transaction, failover, etc. on forward plugin
  > secondary
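
One way to read the chunk ID point is that a receiver can ignore a chunk it has already stored, so a retried send does not duplicate data. A minimal sketch under that assumption follows; `Receiver`, `accept`, and the ID format are hypothetical and do not describe Fluentd's actual classes or wire protocol.

```ruby
require "securerandom"
require "set"

# Hypothetical receiver that remembers which chunk IDs it has stored.
class Receiver
  attr_reader :stored

  def initialize
    @seen_ids = Set.new
    @stored = []
  end

  # Returns true when the chunk was new, false when it was a duplicate.
  def accept(chunk_id, events)
    return false unless @seen_ids.add?(chunk_id)
    @stored.concat(events)
    true
  end
end

receiver = Receiver.new
chunk_id = SecureRandom.hex(16)   # assumed ID format, for illustration only
events = [{ "user" => 38 }]

first = receiver.accept(chunk_id, events)    # first delivery is stored
second = receiver.accept(chunk_id, events)   # retry after a lost ack is ignored
puts receiver.stored.size
```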


Plugins - use rubygems

$ fluent-gem search -rd fluent-plugin

$ fluent-gem search -rd fluent-mixin

$ fluent-gem install fluent-plugin-mongo


http://fluentd.org/plugin/

in_tail

[Diagram: Apache -> access.log -> Fluentd]

Supported format:
> apache
> apache2
> syslog
> nginx
> json
> csv
> tsv
> ltsv

✓ read a log file
✓ custom regexp
✓ custom parser in Ruby
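
To give a feel for what a custom regexp does, here is a sketch that turns one access log line into a time and a record. The pattern, the field names, and the sample line (including the request path, status, and size that the slides elide) are assumptions for illustration, not the definition of the built-in `apache` format.

```ruby
require "time"

# Assumed pattern for a common-log-style line; illustrative only.
ACCESS_LOG = /\A(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)[^"]*" (?<code>\d+) (?<size>\S+)\z/

# Returns [time, record], or nil when the line does not match.
def parse_line(line)
  m = ACCESS_LOG.match(line)
  return nil unless m

  time = Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z").to_i
  record = {
    "host" => m[:host], "user" => m[:user], "method" => m[:method],
    "path" => m[:path], "code" => m[:code].to_i
  }
  [time, record]
end

# Hypothetical complete line, with a timezone added so the time parses.
line = '127.0.0.1 - - [30/Oct/2013:07:26:27 +0000] "GET / HTTP/1.1" 200 777'
time, record = parse_line(line)
puts time
puts record
```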

out_mongo

[Diagram: Apache -> access.log -> Fluentd (buffer) -> MongoDB]

✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
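
Automatic retry with an exponentially growing wait can be sketched as below. The initial wait, the doubling factor, and the retry limit are illustrative values rather than Fluentd's defaults, and `with_retry` is a hypothetical helper.

```ruby
# Retry a block, doubling the wait after each failure.
def with_retry(max_retries: 5, initial_wait: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield
  rescue StandardError
    raise if attempt >= max_retries
    sleeper.call(initial_wait * (2**attempt))   # 1s, 2s, 4s, ...
    attempt += 1
    retry
  end
end

# Usage: a write that fails twice before succeeding. Waits are recorded
# instead of slept so the example finishes immediately.
waits = []
calls = 0
result = with_retry(sleeper: ->(s) { waits << s }) do
  calls += 1
  raise IOError, "destination unavailable" if calls < 3
  :inserted
end
p result, waits
```

Doubling the wait keeps a struggling destination from being hammered while it recovers; combined with a file buffer, the pending data survives while the retries run.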


out_webhdfs

[Diagram: Apache -> access.log -> Fluentd (buffer) -> HDFS]

✓ custom text formatter
✓ slice files based on time
  2013-01-01/01/access.log.gz
  ...
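
Time-based slicing boils down to deriving the file path from the event time. In this sketch the `strftime` pattern is an assumption chosen to reproduce the example path above, and `sliced_path` is a hypothetical helper, not a plugin option.

```ruby
# Pattern assumed from the example path "2013-01-01/01/access.log.gz".
PATH_FORMAT = "%Y-%m-%d/%H/access.log.gz"

def sliced_path(time)
  time.getutc.strftime(PATH_FORMAT)
end

# Events from the same hour share one file; the next hour starts a new one.
puts sliced_path(Time.utc(2013, 1, 1, 1, 5, 0))
puts sliced_path(Time.utc(2013, 1, 1, 1, 59, 59))
puts sliced_path(Time.utc(2013, 1, 1, 2, 0, 0))
```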

out_copy + other plugins

[Diagram: Apache -> access.log -> Fluentd (buffer) -> Hadoop and Amazon S3]

✓ routing based on tags
✓ copy to multiple storages


out_forward

[Diagram: Apache -> access.log -> Fluentd (buffer) -> multiple downstream Fluentd instances]

✓ automatic fail-over
✓ load balancing
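
Weighted load balancing with fail-over can be sketched as follows. The hosts and weights mirror the forward configuration shown earlier (20 and 60); the `Server` struct, the `alive` flag, and `pick_server` are hypothetical, and the selection rule is one plausible reading of "weight", not the plugin's actual algorithm.

```ruby
# Hypothetical server record for illustration.
Server = Struct.new(:host, :weight, :alive)

# Pick a server with probability proportional to its weight, skipping
# servers marked as down. `point` is a number in [0, 1).
def pick_server(servers, point = rand)
  candidates = servers.select(&:alive)
  raise "no server available" if candidates.empty?

  target = point * candidates.sum(&:weight)
  candidates.each do |server|
    return server if target < server.weight
    target -= server.weight
  end
  candidates.last
end

servers = [
  Server.new("192.168.0.11", 20, true),
  Server.new("192.168.0.12", 60, true)
]

puts pick_server(servers, 0.1).host   # falls in the first 20/80 of the range
puts pick_server(servers, 0.5).host   # falls in the remaining 60/80

servers[1].alive = false              # simulate a failed server
puts pick_server(servers, 0.5).host   # fail-over: only the first remains
```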



Forward topology

[Diagram: Fluentd instances forward events to other Fluentd instances over several hops; the links are labeled send/ack.]

Other plugins

> Filter, Aggregator, Converter
  > rewrite-tag-filter, sampling-filter, ...
  > *-counter, *-monitor, ...
  > record-modifier, flatten, map, typecast, ...
> See @tagomoris's slide
  > http://www.slideshare.net/tagomoris/fluentdmeetupfukuoka201303


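[Diagram: Fluentd (filter / buffer / routing) sits between the sources and the destinations below.]

Sources
> Access logs: Apache
> App logs: Frontend, Backend
> System logs: syslogd, Databases

Destinations
> Alerting: Nagios
> Analysis: MongoDB, MySQL, Hadoop
> Archiving: Amazon S3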

Other status

> Localizing docs into Japanese
  > https://github.com/fluent/fluentd-docs/tree/master/docs/ja
> Windows support
  > Started by JBAT
  > https://github.com/fluent/fluentd/tree/windows
> Feedback and patches are welcome!

v11

> Spec is not fixed yet
> Breaking source code compatibility
> Several improvements
  > routing label, filter, error stream, etc.
  > serverengine based: multi-process, signal, etc.
> http://magazine.rubyist.net/?0044FluentdV11NewFeatures


td-agent

> Open sourced distribution package of Fluentd
  > ETL part of Treasure Data
  > deb, rpm, homebrew
> Including useful components
  > ruby, jemalloc, fluentd
  > 3rd party gems: td, mongo, webhdfs, etc...
> http://packages.treasure-data.com/


Product Comparison


Flume

Flume: distributed log collector by Cloudera

[Diagram: physical topology and logical topology, showing a Flume Master and Flume nodes delivering to Hadoop HDFS.]

Network topology

[Diagram: Flume OG and Flume NG topologies, each with Agents forwarding to Collectors; labels on the diagram: Master, ack, send, send/ack, Option.]

Pros and Cons

> Pros
  > Using central master to manage all nodes
> Cons
  > Java culture (Pros for Java-er?)
  > Difficult configuration and setup
  > Difficult topology
  > Mainly for Hadoop
  > Fewer plugins?


Logstash
http://logstash.net/


Pros and Cons

> Pros
  > Built-in ElasticSearch and Kibana
  > Bundled 140 plugins (input/filter/codec/output)
  > Works on Windows but unstable...
> Cons
  > Mainly for JRuby
  > Needs an external daemon for a centralized environment
    > Redis, RabbitMQ, etc.


Use cases


Treasure Data

[Diagram: Frontend, Job Queue, Worker, and Hadoop applications push metrics to Fluentd (via local Fluentd). Fluentd sums up the data per minute (partial aggregation) and sends it to Treasure Data for historical analysis and to Librato Metrics for realtime analysis.]
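
The partial aggregation step can be sketched as summing metric values into per-minute buckets before sending them on. The event layout (`:time`, `:name`, `:value`), the metric name, and `aggregate_per_minute` are hypothetical; the slide does not show how the aggregation is implemented.

```ruby
# Sum metric values into per-minute buckets keyed by [minute, name].
def aggregate_per_minute(events)
  events.each_with_object(Hash.new(0)) do |event, sums|
    minute = event[:time] - (event[:time] % 60)   # truncate to the minute
    sums[[minute, event[:name]]] += event[:value]
  end
end

events = [
  { time: 1_383_181_200, name: "jobs.finished", value: 3 },
  { time: 1_383_181_230, name: "jobs.finished", value: 2 },
  { time: 1_383_181_260, name: "jobs.finished", value: 5 }
]

aggregate_per_minute(events).each do |(minute, name), total|
  puts "#{Time.at(minute).utc.strftime('%H:%M')} #{name} #{total}"
end
```

Aggregating before sending reduces three events to two data points here; at scale, that cuts the volume pushed to a realtime metrics service while the raw events can still go to long-term storage.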

Cookpad

[Diagram: hundreds of app servers, each running a Rails app with td-agent that sends event logs to Treasure Data; other elements on the diagram: MySQL, Daily/Hourly Batch, Google Spreadsheet, Feedback rankings, KPI visualization.]

> Unlimited scalability
> Flexible schema
> Realtime
> Less performance impact

Logs are available after several mins.

✓ Over 100 RoR servers (2012/2/4)

LINE

[Diagram: Web Servers -> Fluentd Cluster (STREAM) -> Archive Storage (scribed), Fluentd Watchers -> Notifications (IRC), and webhdfs -> Hadoop Cluster CDH4 (HDFS, YARN); hive server, Huahin Manager, Shib, ShibUI, and Graph Tools run the BATCH and SCHEDULED BATCH jobs.]

✓ 16 nodes
✓ 120,000+ lines/sec
✓ 400Mbps at peak
✓ 1.5+ TB/day (raw)

http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013 by @tagomoris

Other use-cases

> Scaleout by @choplin
  > "Data Scientist Training Book" (データサイエンティスト養成読本)
  > http://gihyo.jp/book/2013/978-4-7741-5896-9
> Smartnews
  > http://developer.smartnews.be/blog/tag/fluentd/
> Nintendo 3DS StreetPass
  > http://www.nintendo.co.jp/3ds/interview/streetpass_relay/vol1/index4.html

Other companies


Conclusion

> Fluentd is now a widely-used project
  > There are many use cases
  > Many contributors and plugins
> Keep it simple
  > Easy to use and integrate into your environment

support@treasure-data.com
