Fluentd and Kafka
Hadoop / Spark Conference Japan 2016

Feb 8, 2016
1. Fluentd and Kafka
Hadoop / Spark Conference Japan 2016
Feb 8, 2016
2. Who are you?
• Masahiro Nakagawa
• github: @repeatedly
• Treasure Data Inc.
• Fluentd / td-agent developer
• Fluentd Enterprise support
• I love OSS :)
• D Language, MessagePack, the organizer of several meetups, etc.
3. Fluentd
• Pluggable streaming event collector
• Lightweight, robust and flexible
• Lots of plugins on rubygems
• Used by AWS, GCP, MS and more companies
• Resources
  • http://www.fluentd.org/
  • Webinar: https://www.youtube.com/watch?v=6uPB_M7cbYk
4. Popular case
[Diagram: App → (push) → Forwarder → (push) → Aggregator → Destination]
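The forwarder-to-aggregator topology above can be sketched with Fluentd's built-in forward plugins. This is a minimal sketch, not taken from the talk; the tag `app.**` and the host `aggregator.example.com` are placeholders:

```
# Forwarder node: accept local events and push them on to the aggregator.
<source>
  @type forward
</source>

<match app.**>
  @type forward
  <server>
    host aggregator.example.com
    port 24224
  </server>
</match>
```

The aggregator runs its own `in_forward` source and a `<match>` that writes to the final destination.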
5. Apache Kafka
• Distributed messaging system
• Producer - Broker - Consumer pattern
• Pull model, replication, etc.
[Diagram: App → (push) → Producer → Broker → (pull) → Consumer → Destination]
6. Push vs Pull
• Push:
  • Easy to transfer data to multiple destinations
  • Hard to control the stream ratio across multiple streams
• Pull:
  • Easy to control stream flow / ratio
  • Consumers must be managed correctly
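The trade-off above can be illustrated with a toy Ruby sketch (not Kafka's actual API): in the push model the producer delivers to every destination itself, so it also dictates each destination's rate; in the pull model the producer only appends to a broker log, and each consumer polls at its own pace using its own offset.

```ruby
# Push: the producer writes directly into every destination.
# Simple to fan out, but all destinations receive at the producer's rate.
def push(event, destinations)
  destinations.each { |dest| dest << event }
end

# Pull: the producer only appends to a broker's log; consumers track
# their own offsets and read as much or as little as they can handle.
class Broker
  def initialize
    @log = []
  end

  def publish(event)
    @log << event
  end

  # Return up to `max` events starting at `offset`.
  def poll(offset, max)
    @log[offset, max] || []
  end
end

broker = Broker.new
3.times { |i| broker.publish("event-#{i}") }

fast = broker.poll(0, 3)  # a fast consumer drains the whole log
slow = broker.poll(0, 1)  # a slow consumer takes one event at a time
```

Because the broker retains the log, the slow consumer is never overwhelmed; it simply advances its offset on its own schedule, which is the "easy to control stream flow" point above.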
7. There are 2 ways
• fluent-plugin-kafka
• kafka-fluentd-consumer
8. fluent-plugin-kafka
• Input / Output plugin for Kafka
• https://github.com/htgc/fluent-plugin-kafka
• in_kafka, in_kafka_group, out_kafka, out_kafka_buffered
• Pros
  • Easy to use, with output (producer) support
• Cons
  • Performance is not its primary focus
9. Configuration example

<source>
  @type kafka
  topics web,system
  format json
  add_prefix kafka.
  # more options
</source>

<match kafka.**>
  @type kafka_buffered
  output_data_type msgpack
  default_topic metrics
  compression_codec gzip
  required_acks 1
</match>

https://github.com/htgc/fluent-plugin-kafka#usage
10. kafka-fluentd-consumer
• Stand-alone Kafka consumer for Fluentd
• https://github.com/treasure-data/kafka-fluentd-consumer
• Sends consumed events to Fluentd's in_forward
• Pros
  • High performance and access to Java API features
• Cons
  • Needs a Java runtime
11. Run consumer
• Edit the log4j and fluentd-consumer properties files
• Run the following command:

$ java \
    -Dlog4j.configuration=file:///path/to/log4j.properties \
    -jar path/to/kafka-fluentd-consumer-0.2.1-all.jar \
    path/to/fluentd-consumer.properties
12. Properties example

fluentd.tag.prefix=kafka.event.
fluentd.record.format=regexp          # default is json
fluentd.record.pattern=(?<text>.*)    # for the regexp format
fluentd.consumer.topics=app.*         # can use Java regex
fluentd.consumer.topics.pattern=blacklist  # default is whitelist
fluentd.consumer.threads=5

https://github.com/treasure-data/kafka-fluentd-consumer/blob/master/config/fluentd-consumer.properties
13. With Fluentd example

<source>
  @type forward
</source>

<source>
  @type exec
  command java -Dlog4j.configuration=file:///path/to/log4j.properties -jar /path/to/kafka-fluentd-consumer-0.2.1-all.jar /path/to/config/fluentd-consumer.properties
  tag dummy
  format json
</source>

https://github.com/treasure-data/kafka-fluentd-consumer#run-kafka-consumer-for-fluentd-via-in_exec
14. Conclusion
• Kafka has become an important component of data platforms
• Fluentd can communicate with Kafka
  • via the Fluentd plugin or the stand-alone Kafka consumer
• Build reliable and flexible data pipelines with Fluentd and Kafka