Fluentd meets Beats
Elasticsearch meetup #14 - Jan 7, 2015
Who are you?
• Masahiro Nakagawa
• github: @repeatedly
• Treasure Data Inc.
• Fluentd / td-agent developer
• Fluentd Enterprise support
• I love OSS :)
• D Language, MessagePack, The organizer of several meetups, etc…
Beats
• Purpose-specific agents from Elastic
• https://www.elastic.co/products/beats
• official: topbeat, filebeat, packetbeat
• 3rd party: dockerbeat, nginxbeat, etc…
• Beats support several outputs: elasticsearch, logstash, stdout, etc.
• The logstash output uses the lumberjack protocol, so we can use that protocol to communicate with Beats.
Fluentd
• Pluggable streaming event collector
• Lightweight, robust and flexible
• Lots of plugins on RubyGems
• Used by AWS, GCP, MS and many other companies
• Resources
• http://www.fluentd.org/
• Webinar: https://www.youtube.com/watch?v=6uPB_M7cbYk
fluent-plugin-beats
• Input plugin for Elastic Beats
• https://github.com/repeatedly/fluent-plugin-beats
• Uses the lumberjack protocol to receive events
• Tested with topbeat, filebeat, packetbeat
• Beats share the same event format, so it should also work with 3rd party Beats.
Configuration example
<source>
  @type beats
  metadata_as_tag
  #format nginx # for filebeat
  #bind 0.0.0.0
  #port 5044
  #max_connections 10
  #tag beat.event
</source>
<match *beat>
  @type copy
  <store>
    @type elasticsearch_dynamic
    logstash_format true
    logstash_prefix ${tag_parts[0]}
    type_name ${record['type']}
  </store>
  <store>
    @type tdlog # for backup
  </store>
</match>
https://github.com/repeatedly/fluent-plugin-beats#configuration
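
To make the placeholders in the configuration example concrete, here is a rough Python sketch of how a single event might end up in Elasticsearch. It rests on two assumptions that are not stated on the slide: that metadata_as_tag takes the Fluentd tag from the event's @metadata.beat field, and that logstash_format names the index <logstash_prefix>-YYYY.MM.DD. The route function and the sample event are hypothetical and are not part of either plugin.

```python
from datetime import datetime


def route(event, when):
    """Return (tag, index, type_name) for one Beats event.

    Assumptions (illustrative, not taken from the plugins' source):
    - metadata_as_tag uses event["@metadata"]["beat"] as the Fluentd tag
    - logstash_format names the index "<logstash_prefix>-YYYY.MM.DD"
    """
    tag = event["@metadata"]["beat"]        # e.g. "filebeat"
    tag_parts = tag.split(".")              # what ${tag_parts[...]} indexes into
    prefix = tag_parts[0]                   # logstash_prefix ${tag_parts[0]}
    index = f"{prefix}-{when:%Y.%m.%d}"     # one index per beat per day
    type_name = event["type"]               # type_name ${record['type']}
    return tag, index, type_name


# Hypothetical filebeat event, trimmed to the fields used above.
sample = {
    "@metadata": {"beat": "filebeat", "type": "log"},
    "type": "log",
    "message": "GET / HTTP/1.1",
}

print(route(sample, datetime(2015, 1, 7)))
```

Under these assumptions each Beat gets its own daily index (filebeat-..., topbeat-..., packetbeat-...), which is why the example uses elasticsearch_dynamic rather than a fixed index name.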
Result
Note: Performance
• Tested on a single MacBook Pro, not on two separate machines
  (2.6 GHz Intel Core i7, 16 GB 1600 MHz DDR3)
• Benchmark: read 100,000 nginx log lines and count them with flowcounter_simple

Collector               Throughput
fluentd with in_tail    80,000 events/sec
fluent-agent-hydra      100,000+ events/sec
filebeat                18,000 events/sec
Why is filebeat slow?
1. The lumberjack protocol doesn’t focus on throughput
• lumberjack sends/receives an ack for each record
[Diagram: Lumberjack protocol sequence; labels: data frame, ack, Publish events, ack]
2. Is the Beats framework slow? [Issue #587]
• filebeat is slower than logstash-forwarder
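
The cost of acknowledging every record can be illustrated with a toy model. The Python sketch below assumes a sender that stops and waits one round trip for an ack after every `window` events. All timings are hypothetical values chosen for illustration, not measurements of filebeat or fluent-plugin-beats, and the function names are made up for this example.

```python
import math


def transfer_time(events, rtt_s, send_s, window):
    """Estimate seconds needed to ship `events` when the sender blocks
    for an ack after every `window` events.

    Toy model: writing one event costs `send_s` seconds, and each ack
    costs one round trip of `rtt_s` seconds during which nothing is sent.
    """
    acks = math.ceil(events / window)
    return events * send_s + acks * rtt_s


def throughput(events, rtt_s, send_s, window):
    """Events per second under the same toy model."""
    return events / transfer_time(events, rtt_s, send_s, window)


# Hypothetical numbers, for illustration only (not measured).
EVENTS = 100_000
RTT = 200e-6    # 200 microseconds per ack round trip
SEND = 10e-6    # 10 microseconds to write one event

per_record = throughput(EVENTS, RTT, SEND, window=1)
batched = throughput(EVENTS, RTT, SEND, window=1000)

print(f"ack per record      : {per_record:,.0f} events/sec")
print(f"ack per 1000 records: {batched:,.0f} events/sec")
```

In this model the round-trip wait dominates as soon as an ack is required per record, so throughput is bounded by the ack latency rather than by how fast events can be written.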
Conclusion
• Beats are useful for collecting various metrics
• fluent-plugin-beats can handle Beats events and route them to Elasticsearch properly
• Thanks to the fluent-plugin-elasticsearch plugin ;)
• Note that filebeat is slow, so it is not a good fit for high-volume environments
• Use fluentd or fluent-agent-hydra instead
