Masahiro Nakagawa
June 1, 2015
Fluentd meetup 2015 Summer
Fluentd - v0.12 master guide -
#fluentdmeetup
Who are you?
> Masahiro Nakagawa
> github/twitter: @repeatedly
> Treasure Data, Inc.
> Senior Software Engineer
> Fluentd / td-agent developer
> I love OSS :)
> D language - Phobos committer
> Fluentd - Main maintainer
> MessagePack / RPC - D and Python (only RPC)
> The organizer of Presto Source Code Reading / meetup
> etc…
Structured logging
Reliable forwarding
Pluggable architecture
http://fluentd.org/
What’s Fluentd?
> Data collector for unified logging layer
> Streaming data transfer based on JSON
> Written in Ruby
> Various plugins distributed as gems
> http://www.fluentd.org/plugins
> Working in production
> http://www.fluentd.org/testimonials
v0.10 (old stable)
> Mainly for log forwarding
> with good performance
> working in production
> with td-agent 1 and td-agent 2.0 / 2.1
> Robust but not good for log processing
Architecture (v0.10)
(Diagram) Input → Engine → Buffer → Output, with Input, Buffer, and Output pluggable:
> Input: Forward, HTTP, File tail, dstat, ...
> Buffer: File, Memory
> Output: Forward, File, MongoDB, ...
Routing tricks are handled by Output plugins (rewrite, ...) re-emitting events into the Engine.
v0.12 (current stable)
> v1 configuration by default
> Event handling improvement
> Filter, Label, Error Stream
> At-least-once semantics in forwarding
> Add require_ack_response parameter
> HTTP RPC based management
> Latest release is v0.12.11
Architecture (v0.12 or later)
(Diagram) Input → Engine → Filter → Buffer → Output:
> Input: Forward, File tail, ...
> Filter: grep, record_transformer, …
> Buffer: File, Memory
> Output: Forward, File, ...
Parser and Formatter are pluggable; the Engine is not.
v1 configuration
> hash, array and enum types are added
> hash and array are json
> Embed Ruby code using "#{}",
> easy to set variable values: "#{ENV['KEY']}"
> Add :secret option to mask parameters
> “@” prefix for built-in parameters
> @type, @id and @log_level
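As a sketch of the new value types in one place (the plugin and all parameter names below are hypothetical, for illustration only), a v1 configuration can mix arrays, hashes, enums, and embedded Ruby directly:

```text
<source>
  # hypothetical plugin and parameters, for illustration only
  @type my_input
  servers ["host1:24224", "host2:24224"]
  options {"flush":"fast", "retry":3}
  mode balancing
  hostname "#{Socket.gethostname}"
</source>
```

Arrays and hashes are written as JSON; `"#{...}"` is evaluated as Ruby at configuration load time.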
New v1 formats
> Easy to write complex values
> No trick or additional work for common cases
<source>
  @type my_tail
  keys ["k1", "k2", "k3"]
</source>
<match **>
  @type my_filter
  add_keys {"k1" : "v1"}
</match>
<filter **>
  @type my_filter
  env "#{ENV['KEY']}"
</filter>
Hash, Array, etc: Embedded Ruby code:
• Socket.gethostname
• `command`
• etc...
:secret option
> For masking sensitive parameters
> In fluentd logs and in_monitor_agent
2015-05-29 19:50:10 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type forward
  </source>
  <match http.**>
    @type test
    sensitive_param xxxxxx
  </match>
</ROOT>
config_param :sensitive_param, :string, :secret => true
> Apply a filtering routine to the event stream
> No more tag tricks; filters don't modify tags
v0.10:
<match access.**>
  type record_reformer
  tag reformed.${tag}
</match>
<match reformed.**>
  type growthforecast
</match>

v0.12:
<filter access.**>
  @type record_transformer
  …
</filter>
<match access.**>
  @type growthforecast
</match>
Filter
Processing pipeline comparison
v0.10: Output → Engine → Output (1 transaction + 1 transaction)
v0.12: Filter → Output (1 transaction)
> Mutate events
> http://docs.fluentd.org/articles/filter_record_transformer
<filter event.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>
<match event.**>
  @type mongodb
</match>
Filter: record_transformer
> Grep event streams
> http://docs.fluentd.org/articles/filter_grep
<filter event.**>
  @type grep
  regexp1 message cool
  regexp2 hostname ^web\d+\.example\.com$
  exclude1 message uncool
</filter>
<match event.**>
  @type mongodb
</match>
Filter: grep
> Print events to stdout
> No need for the copy + stdout plugin combo!
> http://docs.fluentd.org/articles/filter_stdout
<filter event.**>
  @type stdout
</filter>
<match event.**>
  @type mongodb
</match>
Filter: stdout
> Override filter method
module Fluent
  class AddTagFilter < Filter
    # Same as other plugins: initialize, configure, start, shutdown
    # Define configuration parameters with the config_param utilities
    def filter(tag, time, record)
      # Process the record
      record["tag"] = tag
      # Return the processed record;
      # if nil is returned, the record is ignored
      record
    end
  end
end
Filter: Plugin development 1
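The filter contract above can be exercised without the Fluentd runtime. A minimal plain-Ruby sketch, where AddTagFilter is a standalone stand-in rather than a real Fluent::Filter subclass:

```ruby
# Plain-Ruby sketch of the Filter#filter(tag, time, record) contract.
# AddTagFilter here is a hypothetical stand-in with no Fluentd dependency.
class AddTagFilter
  # Return the processed record, or nil to drop the event from the stream.
  def filter(tag, time, record)
    record["tag"] = tag
    record
  end
end

f = AddTagFilter.new
f.filter("app.access", Time.now.to_i, {"msg" => "hello"})
# => {"msg"=>"hello", "tag"=>"app.access"}
```

The real plugin would be registered with Fluentd and configured via `<filter>` blocks, but the per-record logic is exactly this simple.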
> Override filter_stream method
module Fluent
  class AddTagFilter < Filter
    def filter_stream(tag, es)
      new_es = MultiEventStream.new
      es.each { |time, record|
        begin
          record["tag"] = tag
          new_es.add(time, record)
        rescue => e
          router.emit_error_event(tag, time, record, e)
        end
      }
      new_es
    end
  end
end
Filter: Plugin development 2
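The per-record error handling in filter_stream can likewise be sketched in plain Ruby, with an array standing in for MultiEventStream and an error list standing in for router.emit_error_event (all names here are hypothetical):

```ruby
# Plain-Ruby sketch of the filter_stream contract: walk the stream,
# build a new one, and divert records whose processing fails to an
# error list (where a real plugin would call router.emit_error_event).
class AddTagStreamFilter
  attr_reader :error_events

  def initialize
    @error_events = []  # stand-in for the built-in @ERROR label stream
  end

  # es is an array of [time, record] pairs, standing in for MultiEventStream
  def filter_stream(tag, es)
    new_es = []
    es.each do |time, record|
      begin
        record["tag"] = tag
        new_es << [time, record]
      rescue => e
        # a real plugin: router.emit_error_event(tag, time, record, e)
        @error_events << [tag, time, record, e]
      end
    end
    new_es
  end
end
```

A frozen record, for example, raises on mutation and is routed to the error stream instead of breaking the whole chunk.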
> Internal event routing
> Redirect events to another group
> much easier to group and share plugins
v0.10:
<source>
  @type forward
</source>
<match app1.**>
  @type s3
</match>
…

v0.12:
<source>
  @type forward
  @label @APP1
</source>
<label @APP1>
  <match access.**>
    @type s3
  </match>
</label>
Label
> Use router.emit instead of Engine.emit
> Engine#emit API is deprecated
v0.10:
tag = ""
time = Engine.now
record = {…}
Engine.emit(tag, time, record)

v0.12:
tag = ""
time = Engine.now
record = {…}
router.emit(tag, time, record)
Label : Need to update plugin
> Redirect events to another label
<source>
  @type forward
  @label @RAW
</source>

<label @RAW>
  <match **>
    @type copy
    <store>
      @type flowcounter
    </store>
    <store>
      @type relabel
      @label @MAIN
    </store>
  </match>
</label>

<label @MAIN>
  <match access.**>
    @type s3
  </match>
</label>
Label: relabel output
Error stream with Label
> Can handle an error at each record level
> router.emit_error_event(tag, time, record, error)
(Diagram: Input sends buffered chunks (chunk1: {"event":1, ...} to {"event":3, ...}; chunk2: {"event":4, ...} to {"event":6, ...}) to Output; most records succeed (OK), one fails (ERROR!))

<label @ERROR>
  <match **>
    type file
    ...
  </match>
</label>

The built-in @ERROR label is used
when an error occurs in “emit”
Error stream
Support at-least-once semantics
> Delivery guarantees in failure scenarios
> At-most-once: messages may be lost
> At-least-once: messages may be duplicated
> Exactly-once: no loss and no duplication
> Fluentd supports at-most-once in v0.10
> Fluentd supports at-least-once since v0.12!
> Set the require_ack_response parameter
At-most-once and At-least-once
With require_ack_response, events may be duplicated on error:
<match app.**>
  @type forward
  require_ack_response
</match>

Without it, events may be lost on error:
<match app.**>
  @type forward
</match>
HTTP RPC based management
> Use HTTP/JSON API instead of signals
> For Windows and JRuby support
> RPC is based on HTTP RPC style, not REST
> See https://api.slack.com/web#basics
> Enabled by rpc_endpoint in <system>
> Plans to add more APIs
> stop input plugins, check plugin status, etc.
Supported RPCs
> /api/processes.interruptWorkers
> /api/processes.killWorkers
> Same as SIGINT and SIGTERM
> /api/plugins.flushBuffers
> Same as SIGUSR1
> /api/config.reload
> Same as SIGHUP
RPC example
> Configuration

<system>
  rpc_endpoint 127.0.0.1:24444
</system>

> Curl

$ curl http://127.0.0.1:24444/api/plugins.flushBuffers
{"ok":true}
Ecosystem
Almost the whole ecosystem is v0.12 based
> Treasure Agent
> v2.2 ships with v0.12
> docs.fluentd.org is now v0.12 based
> You can still read the v0.10 documentation via the v0.10 prefix
> http://docs.fluentd.org/v0.10/articles/quickstart
> If the plugins you use don't support v0.12 features yet,
please contribute!
Roadmap
> v0.10 (old stable)
> v0.12 (current stable) <- Now!
> Filter / Label / At-least-once / HTTP RPC
> v0.14 (summer, 2015)
> New plugin APIs, ServerEngine, Time…
> v1 (fall/winter, 2015)
> Fix new features / APIs
https://github.com/fluent/fluentd/wiki/V1-Roadmap
https://jobs.lever.co/treasure-data
Cloud service for the entire data
pipeline. We’re hiring!