Fluentd: Data streams in Ruby world #rdrc2014

Published in: Software, Technology

  1. Fluentd: Data streams in Ruby world. @tagomoris, RedDotRubyConf 2014 Day 1, Thursday, 26 June 2014
  2. TAGOMORI Satoshi, a.k.a. @tagomoris
  6. Fluentd: Fluentd is an open source data collector to simplify log management. Fluentd is designed to process high-volume data streams reliably. Use cases include real-time search and monitoring, Big Data analytics, reliable archiving and more. http://www.fluentd.org/
  7. Before Fluentd. Inputs: Access logs (apache, nginx). Outputs: Metrics, Archives (graphs, Amazon S3, Filesystem). Glue: tail -f, scp, python. Open questions: Error handling? Buffering?
  8. Before Fluentd. Inputs: Access logs (apache, nginx). Outputs: Metrics, Analytics, Archives (graphs, Hadoop, MySQL, MongoDB, Redshift, Amazon S3, Filesystem). Glue: tail -f, scp, python, ruby, cmd. Open questions: Error handling? Buffering? Routing? API Keys?
  9. Before Fluentd. Inputs: Access logs, App logs (apache, nginx, frontend, backend). Outputs: Metrics, Analytics, Archives (graphs, Hadoop, MySQL, MongoDB, Redshift, Amazon S3, Filesystem). Glue: tail -f, scp, python, ruby, cmd, file, logger. Open questions: Error handling? Buffering? Routing? API Keys? Formats?
  10. Before Fluentd. Inputs: Access logs, App logs, System logs (apache, nginx, frontend, backend, syslogd, snmp data). Outputs: Metrics, Analytics, Archives (graphs, Hadoop, MySQL, MongoDB, Redshift, Amazon S3, Filesystem). Glue: tail -f, scp, python, ruby, cmd, file, logger. Open questions: Error handling? Buffering? Routing? API Keys? Formats?
  11. Before Fluentd: CHAOS. Inputs: Access logs, App logs, System logs, Various logs (apache, nginx, frontend, backend, syslogd, snmp data). Outputs: Metrics, Analytics, Archives (graphs, Hadoop, MySQL, MongoDB, Redshift, Amazon S3, Filesystem). Glue: tail -f, scp, python, ruby, cmd, file, logger. Open questions: Error handling? Buffering? Routing? API Keys? Formats?
  12. After Fluentd: Controllable. Inputs: Access logs, App logs, System logs, Various logs (apache, nginx, frontend, backend, syslogd, snmp data). Outputs: Metrics, Analytics, Archives (graphs, Hadoop, MySQL, MongoDB, Redshift, Amazon S3, Filesystem).
  13. After Fluentd: Controllable (same inputs and outputs as the previous slide). Fluentd does: Format, Buffer, Retry, Route.
  14. Fluentd: Open source data collector. Written in Ruby, runs on CRuby on UNIX-like OS. Error handling and routing in core. Plugin systems: Input, Output and Buffer (with many built-in plugins). Fluentd and its plugins are distributed on rubygems.org: gem install fluentd. rpm/deb packages are also available (td-agent).
  15. Why Fluentd?
  16. Why Fluentd? Fluentd’s logo is very cute!
  17. He is also very cute...
  18. Why Fluentd? Simple data structure: tag, time and record (hash). Apache-like configuration syntax. Simple and powerful routing. Many public plugins. Just a few steps to write custom plugins. Scalability.
  19. Fluentd Event: app.device.ios; 2014-06-24 16:28:50; { "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ... }
  20. Fluentd Event: tag: app.device.ios; time: 1403512916 (2014-06-23 16:41:56 +0800); record: { "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ... }
  21. Fluentd Event: tag (for routing): app.device.ios; time (by unix time): 1403512916 (2014-06-23 16:41:56 +0800); record (structured data): { "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ... }
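The three parts of an event can be pictured as a small Ruby value. This is only an illustrative sketch: the `Event` struct below is a hypothetical name, not a Fluentd class, and the record is the one shown on the slide without its elided fields.

```ruby
# Hypothetical Event type for illustration; not part of Fluentd's API.
Event = Struct.new(:tag, :time, :record)

event = Event.new(
  'app.device.ios',   # tag: used for routing
  1_403_512_916,      # time: unix time in seconds
  { 'username' => 'tagomoris', 'fullname' => 'TAGOMORI Satoshi',
    'age' => 34, 'device' => 'iPhone 5' } # record: structured data (a Hash)
)

# The time is a plain integer; convert it only when a readable form is needed.
local_time = Time.at(event.time).localtime('+08:00')
formatted = local_time.strftime('%Y-%m-%d %H:%M:%S %z')
puts "#{event.tag} #{formatted} #{event.record}"
```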
  22. Fluentd Configuration:

        # read from a file and parse
        <source>
          type tail
          path /var/log/httpd.log
          format apache2
          tag web.access
        </source>

        # logs from client libraries
        <source>
          type forward
          port 24224
        </source>

        # store logs to MongoDB and S3
        <match app.**>
          type copy
          <store>
            type mongo
            host mongo.example.com
            capped
            capped_size 200m
          </store>
          <store>
            type s3
            path archive/
          </store>
        </match>

  23. Fluentd Configuration (the same file, annotated): <source> sections are for input, <match> sections are for output.
  24. Fluentd Configuration DSL:

        # read from a file and parse
        source {
          type "tail"
          path "/var/log/httpd.log"
          format "apache2"
          tag "web.access"
        }

        # logs from client libraries
        source {
          type "forward"
          port 24224
        }

        # store logs to MongoDB and S3
        match("app.**") {
          type "copy"
          store {
            type "mongo"
            host "mongo.example.com"
            capped
            capped_size "200m"
          }
          store {
            type "s3"
            path "archive/"
          }
        }
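A block-based DSL of this shape can be evaluated in plain Ruby with `instance_eval` and `method_missing`. The sketch below is an assumption about how such a DSL could work, not Fluentd's own implementation; `ConfigElement` and `build_config` are hypothetical names. It inherits from `BasicObject` so that names such as `format` reach `method_missing` instead of `Kernel#format`.

```ruby
# Hypothetical DSL evaluator for illustration; not Fluentd's implementation.
class ConfigElement < BasicObject
  def initialize(name, arg = nil)
    @name = name
    @arg = arg
    @params = {}
    @children = []
  end

  # A call with a block opens a nested section; a call without one sets a parameter.
  def method_missing(key, *args, &block)
    if block
      child = ::ConfigElement.new(key.to_s, args.first)
      child.instance_eval(&block)
      @children << child
    else
      # A bare word such as `capped` is stored as a flag (true).
      @params[key.to_s] = args.empty? ? true : args.first
    end
    nil
  end

  def to_h
    { 'name' => @name, 'arg' => @arg, 'params' => @params,
      'children' => @children.map { |child| child.to_h } }
  end
end

def build_config(&block)
  root = ConfigElement.new('ROOT')
  root.instance_eval(&block)
  root.to_h
end

config = build_config do
  source do
    type 'tail'
    path '/var/log/httpd.log'
    format 'apache2'
    tag 'web.access'
  end
  match('app.**') do
    type 'copy'
    store do
      type 'mongo'
      host 'mongo.example.com'
      capped
      capped_size '200m'
    end
    store do
      type 's3'
      path 'archive/'
    end
  end
end
```

The result is a nested Hash, so `config['children'][1]['arg']` holds the match pattern and each `store` block appears as a child of the `match` section.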
  25. Tag based routing: inputs pass events (tag, time, record) to the core, which routes them to outputs by tag pattern: web.log, sys.*, app.**, **
  26. Tag based routing (same diagram, with an additional tag: converted.web.log)
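The matching rule behind these patterns can be sketched in a few lines of Ruby. This assumes the documented Fluentd semantics (`*` matches exactly one tag part, `**` matches zero or more tag parts, and the first matching pattern wins); the function names and the `ROUTES` table are hypothetical, and the real core is implemented differently.

```ruby
# Illustrative matcher; names are hypothetical, not Fluentd internals.
def parts_match?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  case head
  when '**'
    # Zero or more tag parts: try every possible number of consumed parts.
    (0..tag_parts.length).any? { |n| parts_match?(rest, tag_parts.drop(n)) }
  when '*'
    # Exactly one tag part.
    !tag_parts.empty? && parts_match?(rest, tag_parts.drop(1))
  else
    tag_parts.first == head && parts_match?(rest, tag_parts.drop(1))
  end
end

def tag_match?(pattern, tag)
  parts_match?(pattern.split('.'), tag.split('.'))
end

# Patterns from the slide, checked in order; the first match wins.
ROUTES = [
  ['web.log', :web_output],
  ['sys.*',   :sys_output],
  ['app.**',  :app_output],
  ['**',      :fallback_output]
].freeze

def route(tag, routes = ROUTES)
  found = routes.find { |pattern, _output| tag_match?(pattern, tag) }
  found && found[1]
end

puts route('app.device.ios')
puts route('converted.web.log')
```

Under these assumptions `converted.web.log` does not match `web.log`, which is why a re-tagged event can be routed separately from the original.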
  27. 300+ Public Plugins: access, add, aes-forward, airbrake-python, amazon_sns, amplifier-filter, amqp, amqp2, andon, anomalydetect, anonymizer, arango, arduino, axlsx, backlog, bigquery, boundio, buffer-lightening, buffered-filter, buffered-hipchat, buffered-stdout, bufferize, calc, cassandra, cassandra-cql, cloudstack, cloudwatch, cloudwatch_ya, combiner, conditional_filter, config-expander, config_pit, config_reloader, convert-value-to-sha, copy_ex, couch, couch-sharded, couchbase, dashing, data-rejecter, datacalculator, datacounter, dbi, dd, debug, delay-inspector, delayed, derive, df, droonga, dstat, dummydata-producer, dynamodb, ec2-metadata, elapsed-time, elasticsearch, elasticsearch-cluster, elasticsearch-ruby, elb-log, embedded-elasticsearch, eval-filter, event-tail, extract_query_params, file-alternative, file-sprintf, filter, filter_keys, flatten, flatten-hash, flowcounter, flowcounter-simple, flume, fnordmetric, forest, fork, format, forward-aws, ftp, gamobile, ganglia, gc, geoip, glusterfs, graphite, grassland, gree_community, grep, grepcounter, groonga, groupcounter, growl, growthforecast, gstore, hash-forward, hato, hbase, hekk_redshift, heroku-postgres, heroku-syslog, hipchat, histogram, hoop, hostname, hrforecast, http-enhanced, http-ex, http-list, http-status, https-json, idobata, ikachan, imagefile, imkayac, in-udp-event, incremental, influxdb, influxdb_metrics, inline-classifier, irc, jabber, json-api, json-nest2flat, jsonbucket, jstat, jubatus, jvmwatcher, kafka, kanicounter, keep-forward, kestrel, kibana-server, kinesis-alt, latency, leftronic, librato-metrics, loggly, lossycount, mackerel, mail, map, measure_time, mecab, metricsense, mixi_community, mixpanel, mobile-carrier, mongo, mongo-typed, mongokpi, mqtt, msgpack-rpc, mssql, multiprocess, munin, mysql, mysql-binlog, mysql-bulk, mysql-load, mysql-prepared-statement, mysql-query, mysql-replicator, mysqlslowquery, mysqlslowquerylog, nats, network-probe, nginx-status, nicorepo, norikra, notifier, numeric-counter, numeric-monitor, onlineuser, openldap-monitor, opentsdb, order, out-http, out-http-buffered, out-solr, parser, pgdist, pghstore, pgjson, ping-message, postgres, qqwry, rambler, rawexec, rds-log, rds-slowlog, reassemble, record ... http://www.fluentd.org/plugins
  28. Fluentd patterns
  29. Pattern 1: read logs from a file and write them to storage. file -> in_tail (read, parse) -> out_file (format, write) -> file
  30. Pattern 1: read logs from a file and write them to storage. file -> in_tail (read, parse) -> out_mongo (insert) -> MongoDB. https://github.com/fluent/fluent-plugin-mongo
  31. Pattern 1: read logs from a file and write them to storage. file -> in_tail (read, parse) -> out_mysql (insert) -> MySQL. https://github.com/tagomoris/fluent-plugin-mysql
  32. Pattern 1: read logs from a file and write them to storage. file -> in_tail (read, parse) -> out_elasticsearch (send) -> Elasticsearch. https://github.com/uken/fluent-plugin-elasticsearch
  33. Pattern 1: read logs from a file and write them to storage. file -> in_tail (read, parse) -> out_webhdfs (format, write) -> Hadoop HDFS. https://github.com/fluent/fluent-plugin-webhdfs
  34. Pattern 1: read logs from a file and write them to storage. file -> in_tail (read, parse) -> out_s3 (format, write) -> Amazon S3. https://github.com/fluent/fluent-plugin-s3
  35. Pattern 1: read logs from a file and write them to storage. file -> in_tail (read, parse) -> out_redshift (insert) -> Amazon Redshift. https://github.com/hapyrus/fluent-plugin-redshift
  36. Pattern 1: read logs from a file and write them to storage. file -> in_tail (read, parse) -> out_bigquery (insert) -> Google BigQuery. https://github.com/tagomoris/fluent-plugin-bigquery
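The "read, parse" step turns a raw log line into a tag, a time and a record. The sketch below illustrates that idea with a simplified regular expression for an Apache-style access log line; the regexp, the `parse_line` function and the sample line are assumptions for illustration and are not the parser that in_tail's apache2 format uses.

```ruby
require 'time'

# Simplified pattern for a common-log-format line (an assumption, not in_tail's).
ACCESS_LOG = /\A(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) \S+" (?<code>\d{3}) (?<size>\S+)/

# Returns [tag, unix_time, record], or nil when the line does not match.
def parse_line(line, tag)
  m = ACCESS_LOG.match(line)
  return nil unless m

  time = Time.strptime(m[:time], '%d/%b/%Y:%H:%M:%S %z').to_i
  record = {
    'host' => m[:host],
    'user' => m[:user],
    'method' => m[:method],
    'path' => m[:path],
    'code' => m[:code].to_i,
    'size' => (m[:size] == '-' ? 0 : m[:size].to_i)
  }
  [tag, time, record]
end

sample = '192.0.2.1 - - [23/Jun/2014:16:41:56 +0800] "GET /index.html HTTP/1.1" 200 1024'
tag, time, record = parse_line(sample, 'web.access')
puts "#{tag} #{time} #{record}"
```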
  37. Pattern 2: receive and forward data from/to other nodes. forward receives input events and sends output events to other forward nodes. fluent-logger-ruby, fluent-logger-java, ... send events over TCP.
  38. Pattern 2: receive and forward data from/to other nodes. forward -> several forward nodes, with load balancing and active-standby.
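Load balancing with an active-standby fallback can be pictured as a node-selection rule. The following is a hedged sketch of that idea only: `Node` and `pick_node` are hypothetical names, plain round-robin is assumed, and the forward output plugin's real selection logic (weights, heartbeats) is not reproduced here.

```ruby
# Hypothetical node model for illustration.
Node = Struct.new(:host, :standby, :alive)

# Round-robin over live active nodes; use standby nodes only when no active node is alive.
def pick_node(nodes, counter)
  alive = nodes.select(&:alive)
  active = alive.reject(&:standby)
  candidates = active.empty? ? alive : active
  return nil if candidates.empty?

  candidates[counter % candidates.length]
end

nodes = [
  Node.new('fluentd-a.example.com', false, true),
  Node.new('fluentd-b.example.com', false, true),
  Node.new('fluentd-standby.example.com', true, true)
]

3.times { |i| puts pick_node(nodes, i).host }
```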
  39. Pattern 2': receive and forward data from/to other nodes, over the internet with SSL. secure-forward (datacenter) -> secure-forward (datacenter): send events over SSL with authentication. https://github.com/tagomoris/fluent-plugin-secure-forward
  40. Pattern 3: connect with other middleware. syslog -> in_syslog. Flume, Scribe, Kafka -> in_flume, in_scribe, in_kafka. out_flume, out_scribe, out_kafka -> Flume, Scribe, Kafka. https://github.com/fluent/fluent-plugin-flume https://github.com/fluent/fluent-plugin-scribe https://github.com/htgc/fluent-plugin-kafka/
  41. Pattern 4: copy events. forward -> copy -> forward and webhdfs (Hadoop HDFS)
  42. Pattern 5: count events by string values. forward -> datacounter -> any outputs. datacounter counts records by regexp patterns and emits events such as { "pattern1_count": 60, "pattern1_rate" : 1.0, "pattern2_count": 20, "pattern2_rate" : 0.33, ... }. https://github.com/tagomoris/fluent-plugin-datacounter
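Counting by regexp patterns can be sketched as below. This is an illustration of the idea, not the plugin's code: `count_patterns` is a hypothetical function, each pattern is counted independently, the rate is assumed to be the count divided by the length of the counting interval in seconds, and rounding to two decimals is also an assumption.

```ruby
# Illustrative counter; not the datacounter plugin's implementation.
def count_patterns(values, patterns, interval_seconds)
  result = {}
  patterns.each do |name, regexp|
    count = values.count { |value| regexp.match?(value) }
    result["#{name}_count"] = count
    # Assumed definition: matches per second over the interval.
    result["#{name}_rate"] = (count.to_f / interval_seconds).round(2)
  end
  result
end

# Sample data chosen to reproduce the numbers on the slide over a 60-second window.
status_codes = ['200'] * 60 + ['404'] * 20
patterns = { 'pattern1' => /\A2\d\d\z/, 'pattern2' => /\A4\d\d\z/ }

counts = count_patterns(status_codes, patterns, 60)
puts counts
```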
  43. Pattern 5: count events by numeric values. forward -> numeric-counter -> any outputs. numeric-counter counts records by numerical range and emits events such as { "pattern1_count": 60, "pattern1_rate" : 1.0, "pattern2_count": 20, "pattern2_rate" : 0.33, ... }. https://github.com/tagomoris/fluent-plugin-numeric-counter
  44. Pattern 5: aggregate numeric values. forward -> numeric-monitor -> any outputs. numeric-monitor calculates real-time metrics of numeric values and emits events such as { "max": 128, "min": 16, "avg": 64.0, "sum": 1024, "num": 20, "percentile_50": 48, "percentile_90": 112, ... }. https://github.com/tagomoris/fluent-plugin-numeric-monitor
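The metrics in that output can be computed from a window of numeric values as in the sketch below. It is an assumption-laden illustration: `numeric_summary` is a hypothetical function and the percentile uses the nearest-rank method, which may differ from what the plugin does.

```ruby
# Illustrative aggregation; not the numeric-monitor plugin's implementation.
def numeric_summary(values, percentiles = [50, 90])
  return {} if values.empty?

  sorted = values.sort
  sum = sorted.sum
  summary = {
    'max' => sorted.last,
    'min' => sorted.first,
    'avg' => sum.to_f / sorted.length,
    'sum' => sum,
    'num' => sorted.length
  }
  percentiles.each do |p|
    # Nearest-rank percentile (assumed method): smallest value covering p percent.
    rank = (sorted.length * p / 100.0).ceil
    summary["percentile_#{p}"] = sorted[[rank, 1].max - 1]
  end
  summary
end

summary = numeric_summary((1..10).to_a)
puts summary
```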
  45. Pattern 6: various inputs: Linux performance (dstat). dstat -> in_dstat: collect server performance data. https://github.com/shun0102/fluent-plugin-dstat
  46. Pattern 6: various inputs: SQL execution. RDBMS -> in_sql: input from SELECT. https://github.com/fluent/fluent-plugin-sql
  47. Pattern 6: various inputs: external command. any commands -> in_exec: input from STDOUT of any commands
  48. Pattern 7: various outputs: notification on IRC. out_ikachan -> IRC: notice on IRC channel. https://github.com/tagomoris/fluent-plugin-ikachan
  49. Pattern 7: various outputs: notification on IRC. out_ikachan -> IRC: notice on IRC channel. https://github.com/tagomoris/fluent-plugin-ikachan Example: 14:56 ikachan: HTTP status_4xx crit [2014-06-23 14:56:29 +0900] serviceX: 100.00 (threshold 75.0) http://graph.tool.local/view_graph/accesslog/httpstatus/serviceX_4xx_percentage 14:57 kazeburo: ↑ 40x 100%...
  50. Pattern 7: various outputs: notification on HipChat. out_hipchat -> HipChat: notice on HipChat. https://github.com/hotchpotch/fluent-plugin-hipchat
  51. Pattern 7: various outputs: graph tools. out_growthforecast -> GrowthForecast or Focuslight: POST data into graph tools. https://github.com/tagomoris/fluent-plugin-growthforecast
  52. Pattern 7: various outputs. out_growthforecast -> GrowthForecast or Focuslight: POST data into graph tools. https://github.com/tagomoris/fluent-plugin-growthforecast
  53. Pattern 7: various outputs: external command. out_exec -> any commands: output into STDIN of any commands
  54. Pattern 8: filters: stream processing with an external command. any inputs -> exec_filter -> any outputs. exec_filter formats events and writes them into the STDIN of any command, then reads and parses events from its STDOUT. The command reads from STDIN, does WHATEVER you want, and writes into STDOUT. Example: tail -f | grep ... | sed ... | cat
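A command used this way is just a program that reads lines from STDIN and writes lines to STDOUT. Below is a minimal sketch of such a filter in Ruby, assuming one JSON object per line in both directions; the `filter_stream` function, the `code` and `error` fields, and the sample input are hypothetical. In a real script the function would be called with `$stdin` and `$stdout`; the demo uses `StringIO` so it runs on its own.

```ruby
require 'json'
require 'stringio'

# Keeps only records whose 'code' is 400 or above and marks them as errors.
def filter_stream(input, output)
  input.each_line do |line|
    record = JSON.parse(line)
    next unless record['code'].to_i >= 400

    record['error'] = true
    output.puts(JSON.generate(record))
  end
end

# Stand-ins for STDIN and STDOUT so the example is self-contained.
demo_input = StringIO.new(%({"path":"/","code":200}\n{"path":"/x","code":404}\n))
demo_output = StringIO.new
filter_stream(demo_input, demo_output)
puts demo_output.string
```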
  55. Pattern 8: filters: stream processing with an external server over RPC. any inputs -> out_norikra (send events) -> Norikra (stream processing with SQL) -> in_norikra (fetch) -> any outputs. http://norikra.github.io/ Example query: SELECT stage, score, COUNT(*) AS c FROM results.win:time_batch(1 min) WHERE stage > 1 AND user.valid GROUP BY stage, score
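To make the query easier to read, here is roughly what it computes for a single one-minute batch, written as plain Ruby over an array of events. This is only an approximation for illustration: `aggregate_batch` is a hypothetical function, the sample events are made up, and the time-window handling that Norikra performs is left out.

```ruby
# Approximates: SELECT stage, score, COUNT(*) AS c ... WHERE stage > 1 AND user.valid
#               GROUP BY stage, score   (for the events of one batch)
def aggregate_batch(events)
  events
    .select { |e| e['stage'] > 1 && e.dig('user', 'valid') }
    .group_by { |e| [e['stage'], e['score']] }
    .map { |(stage, score), group| { 'stage' => stage, 'score' => score, 'c' => group.length } }
end

# Made-up events standing in for one minute of the "results" stream.
batch = [
  { 'stage' => 2, 'score' => 10, 'user' => { 'valid' => true } },
  { 'stage' => 2, 'score' => 10, 'user' => { 'valid' => true } },
  { 'stage' => 1, 'score' => 10, 'user' => { 'valid' => true } },
  { 'stage' => 3, 'score' => 5,  'user' => { 'valid' => false } },
  { 'stage' => 3, 'score' => 7,  'user' => { 'valid' => true } }
]

rows = aggregate_batch(batch)
puts rows
```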
  56. ... And Fluentd does error handling and retries for all of these plugins!
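The kind of buffering and retrying that the core takes off a plugin author's hands can be sketched as follows. This is not Fluentd's code: `BufferedOutput`, its parameters and the doubling wait are assumptions chosen to illustrate buffered writes with retries, and the sleeper is injectable so the demo does not actually wait.

```ruby
# Illustrative buffered output with retries; names and defaults are hypothetical.
class BufferedOutput
  attr_reader :queue

  def initialize(retry_wait: 1.0, retry_limit: 5, sleeper: ->(seconds) { sleep(seconds) }, &writer)
    @queue = []
    @retry_wait = retry_wait
    @retry_limit = retry_limit
    @sleeper = sleeper
    @writer = writer
  end

  def emit(chunk)
    @queue << chunk
  end

  # Writes queued chunks in order; a chunk stays queued until its write succeeds.
  def flush
    until @queue.empty?
      return false unless write_with_retry(@queue.first)

      @queue.shift
    end
    true
  end

  private

  def write_with_retry(chunk)
    wait = @retry_wait
    attempts = 0
    begin
      @writer.call(chunk)
      true
    rescue StandardError
      attempts += 1
      return false if attempts > @retry_limit

      @sleeper.call(wait)
      wait *= 2 # back off: double the wait before each further retry
      retry
    end
  end
end

# Demo: the destination fails twice, then accepts the chunk.
waits = []
failures_left = 2
written = []
output = BufferedOutput.new(sleeper: ->(seconds) { waits << seconds }) do |chunk|
  if failures_left > 0
    failures_left -= 1
    raise IOError, 'destination unavailable'
  end
  written << chunk
end

output.emit('chunk-1')
flushed = output.flush
puts "flushed=#{flushed} waits=#{waits.inspect} written=#{written.inspect}"
```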
  57. Before Fluentd: CHAOS. Inputs: Access logs, App logs, System logs, Various logs (apache, nginx, frontend, backend, syslogd, snmp data). Outputs: Metrics, Analytics, Archives (graphs, Hadoop, MySQL, MongoDB, Redshift, Amazon S3, Filesystem). Glue: tail -f, scp, python, ruby, cmd, file, logger.
  58. After Fluentd: Controllable. Inputs: Access logs, App logs, System logs, Various logs (apache, nginx, frontend, backend, syslogd, snmp data). Outputs: Metrics, Analytics, Archives (graphs, Hadoop, MySQL, MongoDB, Redshift, Amazon S3, Filesystem).
  59. Fluentd: Now and then
  60. Fluentd versions. Latest: v0.10.50, released on Jun 17, 2014. v0.10.x: stable versions, with many minor feature updates, bug fixes, and new features for v1.
  61. Fluentd v1: planned as the first major release, someday in 2014 (?). 100% compatible with v0.10.x. New (and additional) features are on the v1.x roadmap (https://github.com/fluent/fluentd/issues/251): new configuration syntax, plugin backends, daemon process management, multi-core CPU support.
  62. Fluentd on JRuby: under development! Trying to fix Cool.io to support JRuby.
  63. Fluentd on Windows: under development! See the “windows” branch of fluent/fluentd on GitHub.
  64. Use case in LINE
  65. Analytics data flow overview. Components: servers, Fluentd Cluster, archive, visualization, notifications, Hadoop, Fluentd, Norikra, application metrics.
  66. Analytics data flow overview. Components: servers, Fluentd Cluster, archive, visualization, notifications, Hadoop, Fluentd, Norikra, application metrics. Stages: delivery/stream-map, aggregate/stream-reduce.
  67. Analytics data flow overview. Components: archive, visualization, notifications, Hadoop, Norikra, application metrics, fluent-agent-lite; non-parsed raw logs; non-parsed access logs. Fluentd roles: deliver: receive/archive/load-balance; worker: parse/store/forward; watcher: monitor/notify; cep: general-purpose stream processing.
  68. Fluentd cluster statistics. Fluentd nodes: access/application logs from 600+ nodes; receiver: 5 servers (60 processes); parser/converter: 10 servers (90 processes); stream processing: 3 servers.
  69. Fluentd cluster statistics. Daily: 5.5+ billion events, 1.5+ TB of data. Peak time: 150,000+ events/sec, 300+ Mbps.
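A quick back-of-the-envelope check relates the daily and peak figures. The inputs are the lower bounds from the slide; treating 1 TB as 10^12 bytes and 1 Mbps as 10^6 bits per second are assumptions, and the variable names are only for this sketch.

```ruby
SECONDS_PER_DAY = 86_400

daily_events = 5.5e9           # 5.5+ billion events per day
daily_bytes = 1.5e12           # 1.5+ TB per day (assuming 1 TB = 10^12 bytes)
peak_events_per_sec = 150_000  # 150,000+ events/sec at peak
peak_bits_per_sec = 300e6      # 300+ Mbps at peak (assuming 1 Mbps = 10^6 bit/s)

avg_events_per_sec = daily_events / SECONDS_PER_DAY
avg_bytes_per_event = daily_bytes / daily_events
peak_bytes_per_event = peak_bits_per_sec / 8 / peak_events_per_sec
peak_to_average = peak_events_per_sec / avg_events_per_sec

puts "average events/sec:   #{avg_events_per_sec.round}"
puts "average bytes/event:  #{avg_bytes_per_event.round}"
puts "peak bytes/event:     #{peak_bytes_per_event.round}"
puts "peak/average ratio:   #{peak_to_average.round(2)}"
```

Under these assumptions the daily average is roughly 64,000 events per second at about 270 bytes per event, and the peak figures imply about 250 bytes per event, so the two sets of numbers are consistent with each other.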
  70. Fluentd is the best partner for stream-processing newbies and rubyists! Check out the sites and code! http://fluentd.org/ https://github.com/fluent/fluentd
  71. FAQ
  72. Fault-tolerance? Node-level fault-tolerance: file buffer (data being processed can be serialized to disk). Cluster-level fault-tolerance: Copy + Forward (load balance, active-standby). Event-level assurance (ACK)? NO (for performance reasons).
  73. Performance? NOT SO BAD: real throughput depends on plugins and configuration. Simple event transfer: 10-20k events/sec.
  74. vs Scribe? vs Flume?
  75. vs Storm?
  76. Eco-system? Clones? ik, fluent-agent-lite, fluenpy
