Fluentd v0.14 Plugin API Details

Fluentd meetup 2016 Summer
Jun 1, 2016
Satoshi "Moris" Tagomori (@tagomoris)

Satoshi "Moris" Tagomori (@tagomoris)
Fluentd, MessagePack-Ruby, Norikra, ...
Treasure Data, Inc.

Topics
• Why Fluentd v0.14 has a new API set for plugins
• Compatibility of v0.12 plugins/configurations
• Plugin APIs: Input, Filter, Output & Buffer
• Storage Plugin, Plugin Helpers
• New Test Drivers for plugins
• Plans for v0.14.x & v1

Why does Fluentd v0.14 have a New API set for plugins?

Fluentd v0.12 Plugins
• No support from Fluentd core for writing plugins
• plugins create threads, sockets, timers and event loops by themselves
• writing tests is very hard and messy, relying on sleeps
• Fragmented implementations
• Output, BufferedOutput, ObjectBufferedOutput and TimeSlicedOutput
• Mixture of configuration parameters from output & buffer
• Uncontrolled plugin instance lifecycle (no "super" in start/shutdown)
• Imperfect buffering control and useless configurations
• the reason why fluent-plugin-forest exists and is widely used

Fluentd v0.12 Plugins
• Insufficient buffer chunking control
• only by size, not by the number of events in chunks
• Forced synchronous buffer flushing
• no way to flush-and-commit chunks asynchronously
• Unlimited freedom in using mix-ins
• everything overrides Plugin#emit ... (the only entry point for events into plugins)
• no reliable hook points to collect metrics or anything else
• Bad Ruby coding rules and practices
• too many classes directly under "Fluent::*" in fluent/plugin, missing "require"s, ...

Compatibility of v0.12 plugins/configurations

Compatibility of plugins
• v0.12 plugins are subclasses of Fluent::*
• Fluent::Input, Fluent::Filter, Fluent::Output, ...
• Compatibility layers for v0.12 plugins in v0.14
• Fluent::Compat::Klass -> Fluent::Klass (e.g., Input, Output, ...)
• it provides transformation of:
• namespaces, configuration parameters
• internal APIs, argument objects
• IT SHOULD WORK, except for :P
• 3rd party buffer plugins, some test code
• "Engine.emit"

Compatibility of configurations
• v0.14 plugins have a different set of parameters
• many old-fashioned parameters are removed
• "buffer_type", "num_threads", "timezone", "time_slice_format", "buffer_chunk_limit", "buffer_queue_limit", ...
• Plugin helper "compat_parameters"
• transforms parameters from v0.12-style configuration for v0.14 plugins
(Diagram: v0.12 configuration -> converted internally -> v0.14 plugin)
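The following hypothetical Ruby sketch makes the idea of "compat_parameters" concrete. It moves a few flat v0.12 parameters into a v0.14-style <buffer> section. The mapping covers only three names taken from this deck (buffer_type to @type, buffer_chunk_limit to chunk_limit_size, num_threads to flush_thread_count) and is an illustrative assumption. The real helper handles many more parameters and operates on Fluentd's config objects, not plain hashes.

```ruby
# Hypothetical subset of a v0.12 -> v0.14 parameter mapping (illustration only).
V012_TO_V014_BUFFER_PARAMS = {
  "buffer_type"        => "@type",
  "buffer_chunk_limit" => "chunk_limit_size",
  "num_threads"        => "flush_thread_count",
}.freeze

# Splits a flat v0.12-style output config into plugin parameters and
# parameters for a <buffer> section, renaming the latter.
def convert_v012_buffer_params(flat_conf)
  plugin_params = {}
  buffer_params = {}
  flat_conf.each do |key, value|
    if (new_key = V012_TO_V014_BUFFER_PARAMS[key])
      buffer_params[new_key] = value
    else
      plugin_params[key] = value
    end
  end
  { "plugin" => plugin_params, "buffer" => buffer_params }
end

v012_conf = {
  "host" => "example.com",
  "buffer_type" => "file",
  "buffer_chunk_limit" => "8m",
  "num_threads" => "2",
}
converted = convert_v012_buffer_params(v012_conf)
p converted["buffer"]
```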

FAQ:
Can we create plugins like this?
• it uses the v0.14 API
• it runs on Fluentd v0.12
Impossible :P

Overview of v0.14 Plugin classes

v0.14 plugin classes
• All files MUST be in `fluent/plugin/*.rb` (in gems)
• or just a "*.rb" file in a directory specified by "-p"
• All classes MUST be under Fluent::Plugin
• All plugins MUST be subclasses of Fluent::Plugin::Base
• All plugins MUST call `super` in methods overriding the default implementation (e.g., #configure, #start, #shutdown, ...)
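Here is a minimal sketch of why the `super` rule matters. StubBase is a hypothetical stand-in for Fluent::Plugin::Base that keeps lifecycle state the core would rely on. It is not Fluentd's class.

```ruby
# Hypothetical stand-in for Fluent::Plugin::Base: it keeps lifecycle state
# that the core would rely on, so subclasses must reach it via `super`.
class StubBase
  def start
    @started = true
  end

  def started?
    @started == true
  end
end

# Correct: calls super, then does its own setup.
class GoodPlugin < StubBase
  def start
    super
    @connection = :opened # placeholder for plugin-specific setup
  end
end

# Broken: overrides #start without super, so the base state is never set.
class BadPlugin < StubBase
  def start
    @connection = :opened
  end
end

good = GoodPlugin.new.tap(&:start)
bad = BadPlugin.new.tap(&:start)
p good.started? # => true
p bad.started?  # => false
```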

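Classes hierarchy (v0.12)
• Fluent::Input, F::Filter, F::Output
• BufferedOutput (subclass of F::Output)
• ObjectBufferedOutput, TimeSlicedOutput (subclasses of BufferedOutput)
• MultiOutput
• F::Buffer, F::Parser, F::Formatter
• 3rd party plugins (subclasses of the classes above)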

Classes hierarchy (v0.14)
• Fluent::Plugin::Base: the root class of all plugins below
• F::P::Input, F::P::Filter
• F::P::Output: both of buffered/non-buffered
• F::P::BareOutput (not for 3rd party plugins)
• F::P::MultiOutput: copy, roundrobin
• F::P::Buffer, F::P::Parser, F::P::Formatter, F::P::Storage

Tour of New Plugin APIs:
Fluent::Plugin::Input

• Nothing changed :)
• except for the overall rules
• But it's much easier to write plugins than in v0.12 :)
• Example: a plugin which
• fetches an HTTP resource at a specified interval
• parses the response body with the format specified in config
• emits the parse result
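The three steps of that example plugin can be sketched in plain Ruby. Everything here is hypothetical: a fetcher lambda replaces the HTTP request, JSON stands in for "the format specified in config", and an array replaces router emission. In a real v0.14 plugin, #tick would be driven by the timer_execute helper.

```ruby
require "json"

# Hypothetical sketch of the example Input plugin's core: fetch -> parse -> emit.
class HttpPollingInputSketch
  Event = Struct.new(:tag, :time, :record)

  attr_reader :emitted

  def initialize(tag:, fetcher:, clock: -> { Time.now.to_i })
    @tag = tag
    @fetcher = fetcher # callable returning a response body (String)
    @clock = clock
    @emitted = []      # stands in for emitting through the router
  end

  # One timer tick: fetch the resource, parse it (JSON assumed), emit the record.
  def tick
    body = @fetcher.call
    record = JSON.parse(body)
    @emitted << Event.new(@tag, @clock.call, record)
  rescue JSON::ParserError
    nil # a real plugin would log the parse error; the sketch skips this tick
  end
end

input = HttpPollingInputSketch.new(
  tag: "http.status",
  fetcher: -> { '{"status":"ok","count":3}' },
  clock: -> { 1_464_739_200 }
)
input.tick
p input.emitted.first.record
```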

Fluent::Plugin::Filter

• Almost nothing changed :)
• Required: #filter(tag, time, record) #=> record | nil
• Optional: #filter_stream(tag, es) #=> event_stream
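This is a hypothetical sketch of the Filter contract. #filter returns a (possibly modified) record, or nil to drop the event, and a default #filter_stream applies it to each event. Modeling the event stream as an array of [time, record] pairs is an assumption of this sketch, not Fluentd's EventStream API.

```ruby
# Hypothetical base class: default #filter_stream built on #filter.
class FilterSketch
  def filter_stream(tag, es)
    es.each_with_object([]) do |(time, record), out|
      result = filter(tag, time, record)
      out << [time, result] unless result.nil? # nil => event dropped
    end
  end
end

class DropDebugFilter < FilterSketch
  def filter(_tag, _time, record)
    return nil if record["level"] == "debug"
    record.merge("filtered" => true) # returned record replaces the original
  end
end

es = [[1, { "level" => "info" }], [2, { "level" => "debug" }]]
result = DropDebugFilter.new.filter_stream("app.log", es)
p result.size # => 1
```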

Fluent::Plugin::Output

• Many things changed!
• Merged Output, BufferedOutput, ObjectBufferedOutput, TimeSlicedOutput
• Output plugins can work
• with buffering
• without buffering
• both (buffering or not, depending on configuration)
• Buffers group events into chunks by:
• byte size, interval, tag
• number of records (new!)
• time (in any unit (new!): 30s, 5m, 15m, 3h, ...)
• any specified field in records (new!)
• any combination of above (new!)

Variations of buffering
NO MORE forest plugin!

Output Plugin: Methods to be implemented
• Non-buffered: #process(tag, es)
• Buffered synchronous: #write(chunk)
• Buffered asynchronous: #try_write(chunk)
• New feature for destinations with high write latency
• Plugins must call #commit_write(chunk_id) (otherwise, #try_write will be retried)
• Buffered w/ custom format: #format(tag, time, record)
• Without this method, output uses the standard format

(Flowchart: decides between non-buffered, sync buffered and async buffered output based on which of #process / #write / #try_write are implemented, whether a <buffer> section exists, and #prefer_buffered_processing / #prefer_delayed_commit (both default true). Implementing none of the methods, or configuring <buffer> without #write / #try_write, is an error.)

In other words :P
• If users configure a "<buffer>" section
• the plugin tries to do buffering
• Else if the plugin implements both (buffering/non-buffering)
• the plugin calls #prefer_buffered_processing to decide
• Else the plugin does as implemented
• When the plugin does buffering
• If the plugin implements both (sync/async write)
• the plugin calls #prefer_delayed_commit to decide
• Else the plugin does as implemented
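The same rules can be written as a hypothetical Ruby function. This is a model of the decision, not Fluentd's source. The error cases come from the flowchart: a plugin that implements none of the methods, or a <buffer> section on a plugin that cannot buffer.

```ruby
# Hypothetical model of the output-mode decision. `implemented` lists which
# of :process, :write, :try_write a plugin defines; prefer_buffered and
# prefer_delayed stand in for #prefer_buffered_processing and
# #prefer_delayed_commit (both default true).
def resolve_output_mode(implemented:, buffer_section:, prefer_buffered: true, prefer_delayed: true)
  non_buf   = implemented.include?(:process)
  sync_buf  = implemented.include?(:write)
  async_buf = implemented.include?(:try_write)
  buf = sync_buf || async_buf

  raise ArgumentError, "implement #process, #write or #try_write" unless non_buf || buf

  buffering =
    if buffer_section
      # <buffer> configured: buffering is mandatory
      raise ArgumentError, "<buffer> configured, but buffering is not implemented" unless buf
      true
    elsif non_buf && buf
      prefer_buffered # both implemented: ask the plugin
    else
      buf             # only one style implemented: do as implemented
    end

  return :non_buffered unless buffering

  delayed = sync_buf && async_buf ? prefer_delayed : async_buf
  delayed ? :async_buffered : :sync_buffered
end

p resolve_output_mode(implemented: [:process, :write], buffer_section: false)  # => :sync_buffered
p resolve_output_mode(implemented: [:write, :try_write], buffer_section: true) # => :async_buffered
```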

Delayed commit (1)
• High-latency #write operations lock a flush thread for a long time (e.g., waiting for ACK in forward)
(Diagram: the Output Plugin's #write sends data to a destination w/ high latency and returns only after the destination sends ACK; a flush thread stays locked meanwhile)

Delayed commit (2)
• #try_write & delayed #commit_write
(Diagram: the Output Plugin's #try_write sends data to a destination w/ high latency and returns immediately; an async check thread receives the ACK and calls #commit_write)
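Here is a deterministic sketch of the delayed-commit flow, using a hypothetical DelayedCommitSketch class. #try_write records the chunk as pending and returns at once. #commit_write is called when the ACK arrives. Any chunk not committed within delayed_commit_timeout is passed to #try_write again. Time is passed in as an argument, and an array replaces the real destination.

```ruby
# Hypothetical sketch of #try_write + delayed #commit_write.
class DelayedCommitSketch
  attr_reader :sent, :committed

  def initialize(delayed_commit_timeout:)
    @timeout = delayed_commit_timeout
    @pending = {} # chunk_id => [data, sent_at]
    @sent = []
    @committed = []
  end

  # Sends the chunk and returns without waiting for the ACK.
  def try_write(chunk_id, data, now)
    @sent << chunk_id
    @pending[chunk_id] = [data, now]
  end

  # Called when the destination's ACK arrives (the async check thread's job).
  def commit_write(chunk_id)
    @committed << chunk_id if @pending.delete(chunk_id)
  end

  # Periodic check: retries #try_write for chunks whose commit timed out.
  def check_timeouts(now)
    expired = @pending.select { |_id, (_data, sent_at)| now - sent_at >= @timeout }
    expired.each { |id, (data, _sent_at)| try_write(id, data, now) }
  end
end

out = DelayedCommitSketch.new(delayed_commit_timeout: 60)
out.try_write("chunk-a", "payload-a", 0)
out.try_write("chunk-b", "payload-b", 0)
out.commit_write("chunk-a") # ACK for chunk-a arrives
out.check_timeouts(61)      # chunk-b was never acked, so it is sent again
p out.sent                  # => ["chunk-a", "chunk-b", "chunk-b"]
```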

Use cases: delayed commit
• Forward protocol w/ ACK
• Distributed file systems or databases
• put data -> confirm the data is readable -> commit
• Submit tasks to job queues
• submit a job -> detect that it was executed -> commit

Standard chunk format
• Buffering w/o #format method
• Almost the same as ObjectBufferedOutput
• Implementing #format is not always required
• Implement it for performance/low latency
• Tool to dump & read buffer chunks on disk w/ standard format
• To be implemented in v0.14.x :)

<buffer CHUNK_KEYS>
• comma-separated tag, time or ANY_KEYS
• Nothing specified: all events are in the same chunk
• flushed when the chunk is full
• (optional) "flush_interval" after the first event in a chunk
• tag: events w/ the same tag are in the same chunk
• time: buffer chunks will be split by timekey
• timekey: unit of time to be chunked (1m, 15m, 3h, ...)
• flushed after the timekey unit expires + timekey_wait
• ANY_KEYS: any key names in records
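This hypothetical sketch shows how chunk keys could map events to chunks. Each event gets a metadata key built from tag, time and ANY_KEYS, and events with equal metadata land in the same chunk. It is not Fluentd's buffer code. Aligning timekeys to UTC epoch seconds is an assumption that ignores timezone settings.

```ruby
# Builds the metadata ("chunk key") for one event from the configured chunk keys.
def chunk_metadata(chunk_keys, tag, time, record, timekey:)
  meta = {}
  meta[:tag] = tag if chunk_keys.include?("tag")
  meta[:timekey] = time - (time % timekey) if chunk_keys.include?("time")
  (chunk_keys - %w[tag time]).each { |key| meta[key] = record[key] } # ANY_KEYS
  meta
end

# Events with equal metadata share a chunk.
def group_into_chunks(chunk_keys, events, timekey: 3600)
  events.group_by { |tag, time, record| chunk_metadata(chunk_keys, tag, time, record, timekey: timekey) }
end

# A time-keyed chunk is flushed once its timekey unit has expired plus timekey_wait.
def flush_time(timekey_start, timekey:, timekey_wait:)
  timekey_start + timekey + timekey_wait
end

events = [
  ["app.a", 3_600, { "host" => "web1" }],
  ["app.a", 3_700, { "host" => "web2" }],
  ["app.b", 7_300, { "host" => "web1" }],
]
p group_into_chunks(%w[tag time], events).size          # => 2
p group_into_chunks([], events).size                    # => 1 (all events in the same chunk)
p flush_time(3_600, timekey: 3_600, timekey_wait: 600)  # => 7800
```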

<buffer CHUNK_KEYS> vs. v0.12
• Nothing specified: like BufferedOutput in v0.12
• time: like TimeSlicedOutput in v0.12
• tag: like ObjectBufferedOutput in v0.12

Configurations: flushing buffers
• flush_mode: lazy, interval, immediate
• default: lazy if "time" is specified, otherwise interval
• flush_interval, flush_thread_count
• flush_thread_count: number of threads for flushing
• delayed_commit_timeout
• the output plugin retries #try_write when it expires

Retries, Secondary
• Explicit timeout for retries:
• retry_timeout: timeout after which no more retries happen
• retry_max_times: how many times to retry
• retry_type: "periodic" w/ fixed retry_wait
• retry_secondary_threshold (percentage)
• the output switches to the secondary once the specified percentage of retry_timeout has elapsed since the first error
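This hypothetical sketch shows how these parameters could interact in a retry schedule. It is not Fluentd's retry code. Two details are assumptions of the sketch: the doubling wait for non-periodic retries (backoff base 2), and treating retry_secondary_threshold as a plain percentage value.

```ruby
# Returns the retry points (seconds since the first error) and whether each
# retry would go to the secondary output.
def retry_schedule(retry_type:, retry_wait:, retry_timeout:,
                   retry_max_times: nil, retry_secondary_threshold: nil)
  schedule = []
  elapsed = 0
  attempt = 0
  loop do
    wait = retry_type == :periodic ? retry_wait : retry_wait * 2**attempt
    elapsed += wait
    break if elapsed > retry_timeout                       # give up: retry_timeout
    break if retry_max_times && attempt >= retry_max_times # give up: retry_max_times

    secondary = !retry_secondary_threshold.nil? &&
                elapsed >= retry_timeout * retry_secondary_threshold / 100.0
    schedule << { at: elapsed, secondary: secondary }
    attempt += 1
  end
  schedule
end

periodic = retry_schedule(retry_type: :periodic, retry_wait: 10, retry_timeout: 60,
                          retry_secondary_threshold: 80)
p periodic.map { |r| r[:at] }                              # => [10, 20, 30, 40, 50, 60]
p periodic.select { |r| r[:secondary] }.map { |r| r[:at] } # => [50, 60]
```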

Buffer parameters
• chunk_limit_size
• maximum bytesize per chunk
• chunk_records_limit (default: not specified)
• maximum number of records per chunk
• total_limit_size
• maximum bytesize which a buffer plugin can use
• (optional) queue_length_limit: no need to specify
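This hypothetical sketch writes the limits as simple predicates. A chunk counts as full once it reaches either limit, and total_limit_size is compared against the sum over all chunks. ChunkState and the method names are illustrative only, not Fluentd's buffer API.

```ruby
ChunkState = Struct.new(:bytesize, :records)

# Full when chunk_limit_size bytes or (if set) chunk_records_limit records are reached.
def chunk_full?(chunk, chunk_limit_size:, chunk_records_limit: nil)
  return true if chunk.bytesize >= chunk_limit_size
  return true if chunk_records_limit && chunk.records >= chunk_records_limit
  false
end

# total_limit_size caps the bytes used by all chunks of one buffer.
def buffer_over_total_limit?(chunks, total_limit_size:)
  chunks.sum(&:bytesize) >= total_limit_size
end

limit = 8 * 1024 * 1024 # 8MB, an example value
p chunk_full?(ChunkState.new(1_024, 1_000), chunk_limit_size: limit)                             # => false
p chunk_full?(ChunkState.new(1_024, 1_000), chunk_limit_size: limit, chunk_records_limit: 1_000) # => true
```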

Chunk metadata
• Stores various information about buffer chunks
• key-values of the chunking unit
• number of records
• created_at, modified_at
• `chunk.metadata`
• e.g., extract_placeholders(@path, chunk.metadata)
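This is a simplified, hypothetical stand-in for extract_placeholders. It assumes strftime patterns are filled from the chunk's timekey (in UTC), ${tag} from its tag, and ${KEY} from other chunk keys. The real helper's rules and the real metadata object may differ.

```ruby
Metadata = Struct.new(:timekey, :tag, :variables)

def sketch_extract_placeholders(path, metadata)
  result = path
  # Expand time first, so "%" inside tags or values is never read as a strftime directive.
  result = Time.at(metadata.timekey).utc.strftime(result) if metadata.timekey
  result = result.gsub("${tag}") { metadata.tag } if metadata.tag
  (metadata.variables || {}).each do |key, value|
    result = result.gsub("${#{key}}") { value.to_s }
  end
  result
end

meta = Metadata.new(1_464_739_200, "app.access", { "host" => "web1" })
puts sketch_extract_placeholders("/data/${tag}/${host}/%Y%m%d-%H.log", meta)
# => /data/app.access/web1/20160601-00.log
```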

Other plugin types


"Owned" plugins
• Primary plugins: Input, Output, Filter
• Instantiated by Fluentd core
• "Owned" plugins are owned by primary plugins
• Buffer, Parser, Formatter, Storage, ...
• They can refer to the owner's plugin id, logger, ...
• Fluent::Plugin.new_xxx("kind", parent:@input)
• "Owned" plugins can be configured by owner plugins

Owner plugins can control the defaults of owned plugins
Fluentd provides a standard way to configure owned plugins

Fluent::Plugin::Storage

Storage plugins
• Pluggable Key-Value store for plugins
• configurable: autosave, persistent, save_at_shutdown
• get, fetch, put, delete, update (transactional)
• Various possible implementations
• built-in: local (json) on-disk / on-memory
• possible: Redis, Consul, or anything that can serialize/deserialize JSON-like objects
• To store states of plugins:
• counter values of the data-counter plugin
• pos data of the file plugin
• To load configuration dynamically for plugins:
• load configurations from any file system
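This is a hypothetical in-memory sketch of the storage interface. Values are round-tripped through JSON to keep them JSON-like, and a Mutex makes #update a transactional read-modify-write. The class name is illustrative, and autosave and persistence are left out.

```ruby
require "json"

class MemoryStorageSketch
  def initialize
    @data = {}
    @mutex = Mutex.new
  end

  def get(key)
    @mutex.synchronize { @data[key] }
  end

  def fetch(key, default)
    @mutex.synchronize { @data.fetch(key, default) }
  end

  def put(key, value)
    @mutex.synchronize { @data[key] = json_copy(value) }
  end

  def delete(key)
    @mutex.synchronize { @data.delete(key) }
  end

  # Yields the current value and stores the block's result under one lock.
  def update(key)
    @mutex.synchronize { @data[key] = json_copy(yield(@data[key])) }
  end

  private

  # Deep-copies through JSON, so only JSON-like data is kept (e.g. symbols become strings).
  def json_copy(value)
    JSON.parse(JSON.generate({ "v" => value }))["v"]
  end
end

storage = MemoryStorageSketch.new
storage.put("pos", { "inode" => 1234, "offset" => 0 })
4.times.map { Thread.new { 100.times { storage.update("count") { |v| (v || 0) + 1 } } } }.each(&:join)
p storage.get("count") # => 400
```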

Plugin Helpers

• No more mixin!
• declare helpers to use with "helpers :name"
• Utility functions to support difficult things
• creating threads, timers, child processes...
• created timers will be stopped automatically in the plugin's shutdown sequence
• Integrated w/ New Test Drivers
• tests run after helpers have started everything requested
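This hypothetical sketch illustrates the idea behind timer helpers. The base class records every timer it creates and stops all of them in its own #shutdown, so plugin code never joins threads by hand. timer_execute_sketch only borrows the name of timer_execute, and its behavior here is an assumption, not Fluentd's implementation.

```ruby
class HelperBaseSketch
  # Runs the block every `interval` seconds on a thread tracked by the base class.
  def timer_execute_sketch(interval, &block)
    @timers ||= []
    state = { running: true }
    thread = Thread.new do
      while state[:running]
        sleep interval
        block.call if state[:running]
      end
    end
    @timers << [thread, state]
  end

  # Stops and joins every timer this plugin created.
  def shutdown
    (@timers || []).each do |thread, state|
      state[:running] = false
      thread.join
    end
    @timers = []
  end
end

class TickingPlugin < HelperBaseSketch
  attr_reader :ticks

  def initialize
    super()
    @ticks = 0
  end

  def start
    timer_execute_sketch(0.01) { @ticks += 1 }
  end
end

plugin = TickingPlugin.new
plugin.start
sleep 0.1
plugin.shutdown # the timer stops here; the plugin itself handles no threads
```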

Plugin Helpers Example
• Thread: thread_create, thread_current_running?
• Timer: timer_execute
• ChildProcess: child_process_execute
• command, arguments, subprocess_name, interval, immediate,
parallel, mode, stderr, env, unsetenv, chdir, ...
• EventEmitter: router (Output doesn't have a router by default in v0.14)
• Storage: storage_create
• (TBD) Socket/Server for TCP/UDP/TLS, Parser, Formatter

New Test Drivers

• Replace the old drivers Fluent::Test::*TestDriver
• Fluent::Test::Driver::Input, Output or Filter
• fully emulates actual plugin behavior
• w/ overridden SystemConfig
• capturing emitted events & error event streams
• inserting TestLogger to capture/test logs of plugins
• capturing "format" result of output plugins
• controlling "flush" timing of output plugins
• Running tests under control
• Plugin Helper integration
• conditions to keep/break running tests
• timeouts, number of emits/events to stop tests
• automatic start/shutdown call for plugins
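This hypothetical sketch shows "running tests under control". The driver injects a capturing emitter and starts the plugin. It then waits until an expected number of events arrive or a timeout passes, and shuts the plugin down in any case. DriverSketch and CounterInputSketch are illustrative names, not the API of Fluent::Test::Driver::Input.

```ruby
class DriverSketch
  attr_reader :events

  def initialize(plugin)
    @plugin = plugin
    @events = []
    @lock = Mutex.new
    # Capturing emitter injected in place of the router.
    @plugin.emitter = lambda do |tag, time, record|
      @lock.synchronize { @events << [tag, time, record] }
    end
  end

  # Runs until expect_events events were captured or timeout seconds passed.
  def run(expect_events:, timeout:)
    @plugin.start
    deadline = now + timeout
    sleep 0.01 until captured_count >= expect_events || now >= deadline
  ensure
    @plugin.shutdown # automatic shutdown, even if something raised
  end

  private

  def captured_count
    @lock.synchronize { @events.size }
  end

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# A toy input that emits one event every 5ms until shut down.
class CounterInputSketch
  attr_accessor :emitter

  def start
    @running = true
    @thread = Thread.new do
      n = 0
      while @running
        emitter.call("test.counter", n, { "n" => n })
        n += 1
        sleep 0.005
      end
    end
  end

  def shutdown
    @running = false
    @thread&.join
  end
end

d = DriverSketch.new(CounterInputSketch.new)
d.run(expect_events: 3, timeout: 5)
p d.events.size
```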

New Features
• Symmetric multi processing
• to use 2 or more CPU cores!
• by sharing a configuration between all processes
• "detach_process" will be deprecated
• forward: TLS + authentication/authorization support
• secure-forward integration
• Buffer compression & forwarding compressed data
• Plugin generator & template

New APIs
• Controlling global configuration from SystemConfig
• configured via the <system> section
• root buffer path + plugin id: no need to configure paths for each buffer
• total buffer size control per process
• Counter APIs
• counting everything across processes via RPC
• creating metrics for a whole fluentd cluster

v1: stable version of v0.14
• v0.12 plugins will still be supported in v1.0.0
• deprecated, and will be obsoleted in v1.x
• Will be obsoleted:
• v0 (traditional) configuration syntax
• "detach_process" feature
• Q4 2016?

To Be Written by me :-)
• As soooooooooon as possible...
• Plugin developers' guide for
• Updating v0.12 plugins with v0.14 APIs
• Writing plugins with v0.14 APIs
• Writing tests of plugins with v0.14 APIs
• Users' guide for
• How to use buffering in general (w/ <buffer>)
• Updated plugin documents
