Masahiro Nakagawa
June 1, 2015
Fluentd meetup 2015 Summer
Fluentd - v0.12 master guide -
#fluentdmeetup
Who are you?
> Masahiro Nakagawa
> github/twitter: @repeatedly
> Treasure Data, Inc.
> Senior Software Engineer
> Fluentd / td-agent developer
> I love OSS :)
> D language - Phobos committer
> Fluentd - Main maintainer
> MessagePack / RPC - D and Python (only RPC)
> The organizer of Presto Source Code Reading / meetup
> etc…
Structured logging
Reliable forwarding
Pluggable architecture
http://fluentd.org/
What’s Fluentd?
> Data collector for unified logging layer
> Streaming data transfer based on JSON
> Written in Ruby
> Various plugins distributed as gems
> http://www.fluentd.org/plugins
> Working in production
> http://www.fluentd.org/testimonials
v0.10 (old stable)
> Mainly for log forwarding
> with good performance
> working in production
> with td-agent 1 and td-agent 2.0 / 2.1
> Robust but not good for log processing
Architecture (v0.10)
(Diagram) Input → Engine → Buffer → Output, with Input, Buffer, and Output pluggable:
> Input: Forward, HTTP, File tail, dstat, ...
> Buffer: File, Memory
> Output: Forward, File, MongoDB, ...
Routing tricks are handled by Output plugins (rewrite, ...) re-emitting events into the Engine.
v0.12 (current stable)
> v1 configuration by default
> Event handling improvement
> Filter, Label, Error Stream
> At-least-once semantics in forwarding
> Add require_ack_response parameter
> HTTP RPC based management
> Latest release is v0.12.11
Architecture (v0.12 or later)
(Diagram) Input → Engine → Filter → Buffer → Output:
> Input: Forward, File tail, ...
> Filter: grep, record_transformer, …
> Buffer: File, Memory
> Output: Forward, File, ...
Parser and Formatter are pluggable; the Engine is not.
v1 configuration
> hash, array and enum types are added
> hash and array are json
> Embed Ruby code using "#{}",
> easy to set variable values: "#{ENV['KEY']}"
> Add :secret option to mask parameters
> “@” prefix for built-in parameters
> @type, @id and @log_level
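As a sketch of the new value types in one place (the plugin and all parameter names below are hypothetical, for illustration only), a v1 configuration can mix arrays, hashes, enums, and embedded Ruby directly:

```text
<source>
  # hypothetical plugin and parameters, for illustration only
  @type my_input
  servers ["host1:24224", "host2:24224"]
  options {"flush":"fast", "retry":3}
  mode balancing
  hostname "#{Socket.gethostname}"
</source>
```

Arrays and hashes are written as JSON; `"#{...}"` is evaluated as Ruby at configuration load time.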
New v1 formats
> Easy to write complex values
> No trick or additional work for common cases
<source>
  @type my_tail
  keys ["k1", "k2", "k3"]
</source>
<match **>
  @type my_filter
  add_keys {"k1" : "v1"}
</match>
<filter **>
  @type my_filter
  env "#{ENV['KEY']}"
</filter>
Hash, Array, etc: Embedded Ruby code:
• Socket.gethostname
• `command`
• etc...
:secret option
> For masking sensitive parameters
> In fluentd logs and in_monitor_agent
2015-05-29 19:50:10 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type forward
  </source>
  <match http.**>
    @type test
    sensitive_param xxxxxx
  </match>
</ROOT>
config_param :sensitive_param, :string, :secret => true
> Apply a filtering routine to the event stream
> No more tag tricks; filters don't modify tags
v0.10:
<match access.**>
  type record_reformer
  tag reformed.${tag}
</match>
<match reformed.**>
  type growthforecast
</match>

v0.12:
<filter access.**>
  @type record_transformer
  …
</filter>
<match access.**>
  @type growthforecast
</match>
Filter
Processing pipeline comparison
v0.10: Output → Engine → Output (1 transaction + 1 transaction)
v0.12: Filter → Output (1 transaction)
> Mutate events
> http://docs.fluentd.org/articles/filter_record_transformer
<filter event.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>
<match event.**>
  @type mongodb
</match>
Filter: record_transformer
> Grep event streams
> http://docs.fluentd.org/articles/filter_grep
<filter event.**>
  @type grep
  regexp1 message cool
  regexp2 hostname ^web\d+\.example\.com$
  exclude1 message uncool
</filter>
<match event.**>
  @type mongodb
</match>
Filter: grep
> Print events to stdout
> No need for the copy + stdout plugin combo!
> http://docs.fluentd.org/articles/filter_stdout
<filter event.**>
  @type stdout
</filter>
<match event.**>
  @type mongodb
</match>
Filter: stdout
> Override filter method
module Fluent
  class AddTagFilter < Filter
    # Same as other plugins: initialize, configure, start, shutdown
    # Define configuration parameters with the config_param utilities
    def filter(tag, time, record)
      # Process the record
      record["tag"] = tag
      # Return the processed record;
      # if nil is returned, the record is ignored
      record
    end
  end
end
Filter: Plugin development 1
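The filter contract above can be exercised without the Fluentd runtime. A minimal plain-Ruby sketch, where AddTagFilter is a standalone stand-in rather than a real Fluent::Filter subclass:

```ruby
# Plain-Ruby sketch of the Filter#filter(tag, time, record) contract.
# AddTagFilter here is a hypothetical stand-in with no Fluentd dependency.
class AddTagFilter
  # Return the processed record, or nil to drop the event from the stream.
  def filter(tag, time, record)
    record["tag"] = tag
    record
  end
end

f = AddTagFilter.new
f.filter("app.access", Time.now.to_i, {"msg" => "hello"})
# => {"msg"=>"hello", "tag"=>"app.access"}
```

The real plugin would be registered with Fluentd and configured via `<filter>` blocks, but the per-record logic is exactly this simple.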
> Override filter_stream method
module Fluent
  class AddTagFilter < Filter
    def filter_stream(tag, es)
      new_es = MultiEventStream.new
      es.each { |time, record|
        begin
          record["tag"] = tag
          new_es.add(time, record)
        rescue => e
          router.emit_error_event(tag, time, record, e)
        end
      }
      new_es
    end
  end
end
Filter: Plugin development 2
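The per-record error handling in filter_stream can likewise be sketched in plain Ruby, with an array standing in for MultiEventStream and an error list standing in for router.emit_error_event (all names here are hypothetical):

```ruby
# Plain-Ruby sketch of the filter_stream contract: walk the stream,
# build a new one, and divert records whose processing fails to an
# error list (where a real plugin would call router.emit_error_event).
class AddTagStreamFilter
  attr_reader :error_events

  def initialize
    @error_events = []  # stand-in for the built-in @ERROR label stream
  end

  # es is an array of [time, record] pairs, standing in for MultiEventStream
  def filter_stream(tag, es)
    new_es = []
    es.each do |time, record|
      begin
        record["tag"] = tag
        new_es << [time, record]
      rescue => e
        # a real plugin: router.emit_error_event(tag, time, record, e)
        @error_events << [tag, time, record, e]
      end
    end
    new_es
  end
end
```

A frozen record, for example, raises on mutation and is routed to the error stream instead of breaking the whole chunk.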
> Internal event routing
> Redirect events to another group
> much easier to group and share plugins
v0.10:
<source>
  @type forward
</source>
<match app1.**>
  @type s3
</match>
…

v0.12:
<source>
  @type forward
  @label @APP1
</source>
<label @APP1>
  <match access.**>
    @type s3
  </match>
</label>
Label
> Use router.emit instead of Engine.emit
> Engine#emit API is deprecated
v0.10:
tag = ""
time = Engine.now
record = {…}
Engine.emit(tag, time, record)

v0.12:
tag = ""
time = Engine.now
record = {…}
router.emit(tag, time, record)
Label : Need to update plugin
> Redirect events to another label
<source>
  @type forward
  @label @RAW
</source>

<label @RAW>
  <match **>
    @type copy
    <store>
      @type flowcounter
    </store>
    <store>
      @type relabel
      @label @MAIN
    </store>
  </match>
</label>

<label @MAIN>
  <match access.**>
    @type s3
  </match>
</label>
Label: relabel output
Error stream with Label
> Can handle an error at each record level
> router.emit_error_event(tag, time, record, error)
(Diagram: Input sends buffered chunks (chunk1: {"event":1, ...} to {"event":3, ...}; chunk2: {"event":4, ...} to {"event":6, ...}) to Output; most records succeed (OK), one fails (ERROR!))

<label @ERROR>
  <match **>
    type file
    ...
  </match>
</label>

The built-in @ERROR label is used
when an error occurs in “emit”
Error stream
Support at-least-once semantics
> Delivery guarantees in failure scenarios
> At-most-once: messages may be lost
> At-least-once: messages may be duplicated
> Exactly-once: no loss and no duplication
> Fluentd supports at-most-once in v0.10
> Fluentd supports at-least-once since v0.12!
> Set the require_ack_response parameter
At-most-once and At-least-once
With require_ack_response, events may be duplicated on error:
<match app.**>
  @type forward
  require_ack_response
</match>

Without it, events may be lost on error:
<match app.**>
  @type forward
</match>
HTTP RPC based management
> Use HTTP/JSON API instead of signals
> For Windows and JRuby support
> RPC is based on HTTP RPC style, not REST
> See https://api.slack.com/web#basics
> Enabled by rpc_endpoint in <system>
> Plans to add more APIs
> stop input plugins, check plugin status, etc.
Supported RPCs
> /api/processes.interruptWorkers
> /api/processes.killWorkers
> Same as SIGINT and SIGTERM
> /api/plugins.flushBuffers
> Same as SIGUSR1
> /api/config.reload
> Same as SIGHUP
RPC example
> Configuration

<system>
  rpc_endpoint 127.0.0.1:24444
</system>

> Curl

$ curl http://127.0.0.1:24444/api/plugins.flushBuffers
{"ok":true}
Ecosystem
Almost the whole ecosystem is v0.12 based
> Treasure Agent
> v2.2 ships with v0.12
> docs.fluentd.org is now v0.12 based
> You can still read the v0.10 documentation via the v0.10 prefix
> http://docs.fluentd.org/v0.10/articles/quickstart
> If the plugins you use don't support v0.12 features yet,
please contribute!
Roadmap
> v0.10 (old stable)
> v0.12 (current stable) <- Now!
> Filter / Label / At-least-once / HTTP RPC
> v0.14 (summer, 2015)
> New plugin APIs, ServerEngine, Time…
> v1 (fall/winter, 2015)
> Fix new features / APIs
https://github.com/fluent/fluentd/wiki/V1-Roadmap
https://jobs.lever.co/treasure-data
Cloud service for the entire data
pipeline. We’re hiring!