The basics of fluentd

34,028 views

Published on

fluentd meetup

Published in: Technology

The basics of fluentd

  1. 1. The basics of Fluentd Masahiro Nakagawa Treasuare Data, Inc. Senior Software Engineer
  2. 2. Structured logging Reliable forwardinghttp://fluentd.org/ Pluggable architecture
  3. 3. Agenda> Background> Overview> Product Comparison> Use cases
  4. 4. Background
  5. 5. Data Processing Data source Collect Store Process Visualize ReportingMonitoring
  6. 6. Related Products easier & shorter time Collect Store Process Visualize??? Cloudera Excel Horton Works Tableau Treasure Data R
  7. 7. Before Fluentd Server1 Server2 Server3Application Application Application ・・・ ・・・ ・・・ High Latency! must wait for a day... Fluent Log Server
  8. 8. After Fluentd Server1 Server2 Server3Application Application Application Fluentd ・・・ Fluentd ・・・ Fluentd ・・・ In streaming! Fluentd Fluentd
  9. 9. Overview
  10. 10. In short> Open sourced log collector written in Ruby> Using rubygems ecosystem for plugins It’s like syslogd, butuses JSON for log messages
  11. 11. Time 2012-12-11 07:26:27 Apache Tag apache.log Record { "host": "127.0.0.1", tail "method": "GET", "path": "/", write ... } insert127.0.0.1127.0.0.1127.0.0.1 - - - - - - [11/Dec/2012:07:26:27] [11/Dec/2012:07:26:30] [11/Dec/2012:07:26:32] "GET "GET "GET / / / ... ... ... Fluentd127.0.0.1 - - [11/Dec/2012:07:26:40] "GET / ...127.0.0.1 - - [11/Dec/2012:07:27:01] "GET / ... ... event buffering Mongo
  12. 12. Event structure(log message)✓ Time ✓ Record> second unit > JSON format> from data source or > MessagePack adding parsed time internally✓ Tag > non-unstructured> for message routing
  13. 13. ArchitecturePluggable Pluggable Pluggable Input Buffer Output> Forward > Memory > Forward> HTTP > File > File> File tail > Amazon S3> dstat > MongoDB> ... > ...
  14. 14. Client libraries> Ruby> Java Application> Perl> PHP Time:Tag:Record> Python>D> Scala> ... Fluentd# RubyFluent.open(“myapp”)Fluent.event(“login”, {“user” => 38})#=> 2012-12-11 07:56:01 myapp.login {“user”:38}
  15. 15. Configuration and operation> No central / master node > HTTP include helps conf sharing> Operation depends on your environment > Use your deamon management > Use chef in Treasure Data> Scribe like syntax
  16. 16. # receive events via HTTP # save alerts to a file<source> <match alert.**> type http type file port 8888 path /var/log/fluent/alerts</source> </match># read logs from a file # forward other logs to servers<source> <match **> type tail type forward path /var/log/httpd.log <server> format apache host 192.168.0.11 tag apache.access weight 20</source> </server> <server># save access logs to MongoDB host 192.168.0.12<match apache.access> weight 60 type mongo </server> database apache </match> collection log</match> include http://example.com/conf
  17. 17. Reliability (core + plugin)> Buffering > Use file buffer for persistent data > buffer chunk has ID for idempotent> Retrying> Error handling > transaction, failover, etc on forward plugin > secondary
  18. 18. Plugins - use rubygems$ fluent-gem search -rd fluent-plugin$ fluent-gem search -rd fluent-mixin$ fluent-gem install fluent-plugin-mongo※ Today, don’t talk the plugin development
  19. 19. http://fluentd.org/plugin/
  20. 20. in_tail Apache Fluentd ✓ read a log file ✓ custom regexp ✓ custom parser in Ruby access.logSupported format: > apache > json > apache2 > csv > syslog > tsv > nginx > ltsv (since v0.10.32)
  21. 21. out_mongo Apache Fluentd access.log buffer ✓ retry automatically ✓ exponential retry wait ✓ persistent on a file
  22. 22. out_webhdfs ✓ custom text formatter Apache Fluentd access.log buffer HDFS ✓ slice files based on time ✓ retry automatically 2013-01-01/01/access.log.gz ✓ exponential retry wait 2013-01-01/02/access.log.gz ✓ persistent on a file 2013-01-01/03/access.log.gz ...
  23. 23. out_copy + other plugins Hadoop Apache Fluentd access.log buffer Amazon S3 ✓ routing based on tags ✓ copy to multiple storages
  24. 24. out_forward ✓ automatic fail-over ✓ load balancing Fluentd apache Apache Fluentd Fluentd Fluentd access.log buffer ✓ retry automatically ✓ exponential retry wait ✓ persistent on a file
  25. 25. Forward topologyFluentd send/ackFluentd Fluentd send/ack FluentdFluentd FluentdFluentd
  26. 26. Access logs Alerting Apache NagiosApp logs Analysis Frontend MongoDB Backend MySQLSystem logs Hadoop syslogd filter / buffer / routing ArchivingDatabases Amazon S3
  27. 27. Access logs Alerting Apache NagiosApp logs Analysis Frontend MongoDB Backend MySQLSystem logs Hadoop syslogd filter / buffer / routing ArchivingDatabases Amazon S3
  28. 28. Access logs Alerting Apache NagiosApp logs Analysis Frontend MongoDB Backend MySQLSystem logs Hadoop syslogd filter / buffer / routing ArchivingDatabases Amazon S3
  29. 29. td-agent> Open sourced distribution package of fluentd> ETL part of Treasure Data> Including useful components > ruby, jemalloc, fluentd > 3rd party gems: td, mongo, webhdfs, etc... td plugin is for TD> http://packages.treasure-data.com/
  30. 30. v11> Breaking source code compatibility > Not protocol.> Windows support> Error handling enhancement> Better DSL configuration> etc: https://gist.github.com/frsyuki/2638703
  31. 31. Product Comparison
  32. 32. Scribe Scribe: log collector by Facebook
  33. 33. Pros and Cons>● Pros > Fast (written in C++)>● Cons > Hard to install and extend Are you a C++ magician? > Deal with unstructured logs > No longer maintained Replaced with Calligraphus at Facebook
  34. 34. FlumeFlume: distributed log collector by Cloudera Phisical Flume Master Topology Flume Flume Flume Logical Topology Hadoop HDFS
  35. 35. Network topology Master Agent ack Agent CollectorFlume OG Collector Agent Collector send Agent Master Option Agent send/ack Agent CollectorFlume NG Collector Agent Collector Agent
  36. 36. Pros and Cons>● Pros > Using central master to manage all nodes>● Cons > Java culture (Pros for Java-er?) Difficult configuration and setup > Difficult topology > Mainly for Hadoop less plugins?
  37. 37. Use cases
  38. 38. Treasure Data Frontend Worker Job Queue Hadoop Hadoop Applications push metrics to Fluentd sums up data minutes (via local Fluentd) Fluentd Fluentd (partial aggregation) Treasure Librato Data Metricsfor historical analysis for realtime analysis
  39. 39. Cookpadhundreds of app servers Rails app td-agent sends event logs Daily/Hourly Google Batch Spreadsheet Rails app td-agent Treasure Data sends event logs MySQL Rails app td-agent Logs are available sends event logs after several mins. KPI Feedback rankings visualization Unlimited scalability Flexible schema Realtime Less performance impact ✓ Over 100 RoR servers (2012/2/4)
  40. 40. NHN Japan Archive Storage Web Servers Fluentd (scribed) Cluster Notifications STREAM (IRC) Fluentd Watchers Graph Tools webhdfs SCHEDULED BATCH BATCH✓ 16 nodes Hadoop Cluster hive server✓ 120,000+ lines/sec CDH4 Shib ShibUI✓ 400Mbps at peak Huahin✓ 1.5+ TB/day (raw) (HDFS, YARN) Manager http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013 by @tagomoris
  41. 41. Other companies
  42. 42. Conclusion> Fluentd is a widely-used log collector > There are many use cases > Many contributors and plugins> Keep it simple > Easy to integrate your environment

×