The basics of fluentd
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

The basics of fluentd

on

  • 23,138 views

fluentd meetup

fluentd meetup

Statistics

Views

Total Views
23,138
Views on SlideShare
6,649
Embed Views
16,489

Actions

Likes
26
Downloads
115
Comments
0

17 Embeds 16,489

http://docs.fluentd.org 7887
http://repeatedly.github.io 4984
http://matsumana.wordpress.com 1458
http://gurimmer.lolipop.jp 1446
http://repeatedly.github.com 485
http://play.daumcorp.com 94
http://localhost 36
http://wiki.home.wols.org 30
https://matsumana.wordpress.com 22
http://127.0.0.1 17
http://matsumana.info 11
https://twitter.com 5
https://www.google.co.jp 4
http://webcache.googleusercontent.com 3
http://translate.googleusercontent.com 3
http://www.feedspot.com 2
http://mobile.home.wols.org 2
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

The basics of fluentd Presentation Transcript

  • 1. The basics of Fluentd Masahiro Nakagawa Treasuare Data, Inc. Senior Software Engineer
  • 2. Structured logging Reliable forwardinghttp://fluentd.org/ Pluggable architecture
  • 3. Agenda> Background> Overview> Product Comparison> Use cases
  • 4. Background
  • 5. Data Processing Data source Collect Store Process Visualize ReportingMonitoring
  • 6. Related Products easier & shorter time Collect Store Process Visualize??? Cloudera Excel Horton Works Tableau Treasure Data R
  • 7. Before Fluentd Server1 Server2 Server3Application Application Application ・・・ ・・・ ・・・ High Latency! must wait for a day... Fluent Log Server
  • 8. After Fluentd Server1 Server2 Server3Application Application Application Fluentd ・・・ Fluentd ・・・ Fluentd ・・・ In streaming! Fluentd Fluentd
  • 9. Overview
  • 10. In short> Open sourced log collector written in Ruby> Using rubygems ecosystem for plugins It’s like syslogd, butuses JSON for log messages
  • 11. Time 2012-12-11 07:26:27 Apache Tag apache.log Record { "host": "127.0.0.1", tail "method": "GET", "path": "/", write ... } insert127.0.0.1127.0.0.1127.0.0.1 - - - - - - [11/Dec/2012:07:26:27] [11/Dec/2012:07:26:30] [11/Dec/2012:07:26:32] "GET "GET "GET / / / ... ... ... Fluentd127.0.0.1 - - [11/Dec/2012:07:26:40] "GET / ...127.0.0.1 - - [11/Dec/2012:07:27:01] "GET / ... ... event buffering Mongo
  • 12. Event structure(log message)✓ Time ✓ Record> second unit > JSON format> from data source or > MessagePack adding parsed time internally✓ Tag > non-unstructured> for message routing
  • 13. ArchitecturePluggable Pluggable Pluggable Input Buffer Output> Forward > Memory > Forward> HTTP > File > File> File tail > Amazon S3> dstat > MongoDB> ... > ...
  • 14. Client libraries> Ruby> Java Application> Perl> PHP Time:Tag:Record> Python>D> Scala> ... Fluentd# RubyFluent.open(“myapp”)Fluent.event(“login”, {“user” => 38})#=> 2012-12-11 07:56:01 myapp.login {“user”:38}
  • 15. Configuration and operation> No central / master node > HTTP include helps conf sharing> Operation depends on your environment > Use your deamon management > Use chef in Treasure Data> Scribe like syntax
  • 16. # receive events via HTTP # save alerts to a file<source> <match alert.**> type http type file port 8888 path /var/log/fluent/alerts</source> </match># read logs from a file # forward other logs to servers<source> <match **> type tail type forward path /var/log/httpd.log <server> format apache host 192.168.0.11 tag apache.access weight 20</source> </server> <server># save access logs to MongoDB host 192.168.0.12<match apache.access> weight 60 type mongo </server> database apache </match> collection log</match> include http://example.com/conf
  • 17. Reliability (core + plugin)> Buffering > Use file buffer for persistent data > buffer chunk has ID for idempotent> Retrying> Error handling > transaction, failover, etc on forward plugin > secondary
  • 18. Plugins - use rubygems$ fluent-gem search -rd fluent-plugin$ fluent-gem search -rd fluent-mixin$ fluent-gem install fluent-plugin-mongo※ Today, don’t talk the plugin development
  • 19. http://fluentd.org/plugin/
  • 20. in_tail Apache Fluentd ✓ read a log file ✓ custom regexp ✓ custom parser in Ruby access.logSupported format: > apache > json > apache2 > csv > syslog > tsv > nginx > ltsv (since v0.10.32)
  • 21. out_mongo Apache Fluentd access.log buffer ✓ retry automatically ✓ exponential retry wait ✓ persistent on a file
  • 22. out_webhdfs ✓ custom text formatter Apache Fluentd access.log buffer HDFS ✓ slice files based on time ✓ retry automatically 2013-01-01/01/access.log.gz ✓ exponential retry wait 2013-01-01/02/access.log.gz ✓ persistent on a file 2013-01-01/03/access.log.gz ...
  • 23. out_copy + other plugins Hadoop Apache Fluentd access.log buffer Amazon S3 ✓ routing based on tags ✓ copy to multiple storages
  • 24. out_forward ✓ automatic fail-over ✓ load balancing Fluentd apache Apache Fluentd Fluentd Fluentd access.log buffer ✓ retry automatically ✓ exponential retry wait ✓ persistent on a file
  • 25. Forward topologyFluentd send/ackFluentd Fluentd send/ack FluentdFluentd FluentdFluentd
  • 26. Access logs Alerting Apache NagiosApp logs Analysis Frontend MongoDB Backend MySQLSystem logs Hadoop syslogd filter / buffer / routing ArchivingDatabases Amazon S3
  • 27. Access logs Alerting Apache NagiosApp logs Analysis Frontend MongoDB Backend MySQLSystem logs Hadoop syslogd filter / buffer / routing ArchivingDatabases Amazon S3
  • 28. Access logs Alerting Apache NagiosApp logs Analysis Frontend MongoDB Backend MySQLSystem logs Hadoop syslogd filter / buffer / routing ArchivingDatabases Amazon S3
  • 29. td-agent> Open sourced distribution package of fluentd> ETL part of Treasure Data> Including useful components > ruby, jemalloc, fluentd > 3rd party gems: td, mongo, webhdfs, etc... td plugin is for TD> http://packages.treasure-data.com/
  • 30. v11> Breaking source code compatibility > Not protocol.> Windows support> Error handling enhancement> Better DSL configuration> etc: https://gist.github.com/frsyuki/2638703
  • 31. Product Comparison
  • 32. Scribe Scribe: log collector by Facebook
  • 33. Pros and Cons>● Pros > Fast (written in C++)>● Cons > Hard to install and extend Are you a C++ magician? > Deal with unstructured logs > No longer maintained Replaced with Calligraphus at Facebook
  • 34. FlumeFlume: distributed log collector by Cloudera Phisical Flume Master Topology Flume Flume Flume Logical Topology Hadoop HDFS
  • 35. Network topology Master Agent ack Agent CollectorFlume OG Collector Agent Collector send Agent Master Option Agent send/ack Agent CollectorFlume NG Collector Agent Collector Agent
  • 36. Pros and Cons>● Pros > Using central master to manage all nodes>● Cons > Java culture (Pros for Java-er?) Difficult configuration and setup > Difficult topology > Mainly for Hadoop less plugins?
  • 37. Use cases
  • 38. Treasure Data Frontend Worker Job Queue Hadoop Hadoop Applications push metrics to Fluentd sums up data minutes (via local Fluentd) Fluentd Fluentd (partial aggregation) Treasure Librato Data Metricsfor historical analysis for realtime analysis
  • 39. Cookpadhundreds of app servers Rails app td-agent sends event logs Daily/Hourly Google Batch Spreadsheet Rails app td-agent Treasure Data sends event logs MySQL Rails app td-agent Logs are available sends event logs after several mins. KPI Feedback rankings visualization Unlimited scalability Flexible schema Realtime Less performance impact ✓ Over 100 RoR servers (2012/2/4)
  • 40. NHN Japan Archive Storage Web Servers Fluentd (scribed) Cluster Notifications STREAM (IRC) Fluentd Watchers Graph Tools webhdfs SCHEDULED BATCH BATCH✓ 16 nodes Hadoop Cluster hive server✓ 120,000+ lines/sec CDH4 Shib ShibUI✓ 400Mbps at peak Huahin✓ 1.5+ TB/day (raw) (HDFS, YARN) Manager http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013 by @tagomoris
  • 41. Other companies
  • 42. Conclusion> Fluentd is a widely-used log collector > There are many use cases > Many contributors and plugins> Keep it simple > Easy to integrate your environment