Fluentd: Data streams in Ruby world #rdrc2014

Transcript

  • 1. Fluentd: Data streams in Ruby world. @tagomoris, RedDotRubyConf 2014, Day 1, Thursday, 26 June 2014
  • 2. TAGOMORI Satoshi a.k.a. @tagomoris
  • 6. Fluentd: Fluentd is an open source data collector to simplify log management. Fluentd is designed to process high-volume data streams reliably. Use cases include real-time search and monitoring, Big Data analytics, reliable archiving and more. http://www.fluentd.org/
  • 7. Before Fluentd: Access logs Metrics Archives apache nginx graphs Amazon S3 Filesystem tail -f scp python Error handling? Buffering?
  • 8. Before Fluentd: Access logs Metrics Analytics Archives apache nginx graphs Hadoop MySQL MongoDB Redshift Amazon S3 Filesystem tail -f scp python ruby ruby cmd Error handling? Buffering? Routing? API Keys?
  • 9. Before Fluentd: Access logs App logs Metrics Analytics Archives apache nginx frontend backend graphs Hadoop MySQL MongoDB Redshift Amazon S3 Filesystem tail -f scp python ruby ruby cmd file ruby logger Error handling? Buffering? Routing? API Keys? Formats?
  • 10. Before Fluentd: Access logs App logs System logs Metrics Analytics Archives apache nginx frontend backend syslogd snmp data graphs Hadoop MySQL MongoDB Redshift Amazon S3 Filesystem tail -f scp python ruby ruby cmd file ruby logger Error handling? Buffering? Routing? API Keys? Formats?
  • 11. Before Fluentd: CHAOS Access logs App logs System logs Various logs Metrics Analytics Archives apache nginx frontend backend syslogd snmp data graphs Hadoop MySQL MongoDB Redshift Amazon S3 Filesystem tail -f scp python ruby ruby cmd file ruby logger file logger ruby cmd ruby Error handling? Buffering? Routing? API Keys? Formats?
  • 12. After Fluentd: Controllable Access logs App logs System logs Various logs Metrics Analytics Archives apache nginx frontend backend syslogd snmp data graphs Hadoop MySQL MongoDB Redshift Amazon S3 Filesystem
  • 13. After Fluentd: Controllable Access logs App logs System logs Various logs Metrics Analytics Archives apache nginx frontend backend syslogd snmp data graphs Hadoop MySQL MongoDB Redshift Amazon S3 Filesystem. Fluentd does: Format, Buffer, Retry, Route
  • 14. Fluentd: Open source data collector. Written in Ruby, runs on CRuby on UNIX-like OS. With error handling and routing in core. Plugin systems: Input, Output and Buffer (w/ many built-in plugins). Distributed on rubygems.org: Fluentd and its plugins (gem install fluentd). rpm/deb are also available (td-agent).
  • 15. Why Fluentd?
  • 16. Why Fluentd? Fluentd’s logo is very cute!
  • 17. He is also very cute...
  • 18. Why Fluentd? Simple data structure: tag, time and record (hash). Apache-like configuration syntax. Simple / powerful routing. Many public plugins. Just a few steps for custom plugins. Scalability.
  • 19. Fluentd Event: app.device.ios 2014-06-24 16:28:50 { "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ... }
  • 20. Fluentd Event: tag: app.device.ios; time: 1403512916 (2014-06-23 16:41:56 +0800); record: { "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ... }
  • 21. Fluentd Event: tag (for routing): app.device.ios; time (by unix time): 1403512916 (2014-06-23 16:41:56 +0800); record (structured data): { "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ... }
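The three parts of an event can be sketched in plain Ruby. This is only an illustration: the `Event` struct and the `format_event_time` helper are hypothetical names introduced here, not Fluentd classes. The tag, unix time and record values are the ones shown on the slide.

```ruby
# Hypothetical value object for illustration; not a Fluentd class.
Event = Struct.new(:tag, :time, :record)

event = Event.new(
  "app.device.ios",   # tag: dot-separated, used for routing
  1403512916,         # time: unix time (seconds)
  { "username" => "tagomoris", "fullname" => "TAGOMORI Satoshi",
    "age" => 34, "device" => "iPhone 5" }  # record: structured data (hash)
)

# Render the unix time in a fixed UTC offset (the slide uses +0800).
def format_event_time(event, offset = "+08:00")
  Time.at(event.time).getlocal(offset).strftime("%Y-%m-%d %H:%M:%S %z")
end

puts format_event_time(event)
```

Keeping the time as an integer and the record as a plain hash is what makes events cheap to serialize and forward between nodes.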
  • 22. Fluentd Configuration
      # read from a file and parse
      <source>
        type tail
        path /var/log/httpd.log
        format apache2
        tag web.access
      </source>
      # logs from client libraries
      <source>
        type forward
        port 24224
      </source>
      # store logs to MongoDB and S3
      <match app.**>
        type copy
        <store>
          type mongo
          host mongo.example.com
          capped
          capped_size 200m
        </store>
        <store>
          type s3
          path archive/
        </store>
      </match>
  • 23. Fluentd Configuration (same configuration as slide 22, annotated): <source> sections are for input, <match> sections are for output
  • 24. Fluentd Configuration DSL
      # read from a file and parse
      source {
        type "tail"
        path "/var/log/httpd.log"
        format "apache2"
        tag "web.access"
      }
      # logs from client libraries
      source {
        type "forward"
        port 24224
      }
      # store logs to MongoDB and S3
      match("app.**") {
        type "copy"
        store {
          type "mongo"
          host "mongo.example.com"
          capped
          capped_size "200m"
        }
        store {
          type "s3"
          path "archive/"
        }
      }
  • 25. Tag based routing. Diagram labels: input (x3), core, output (x4); tag, time, record; match patterns: web.log, sys.*, app.**, **
  • 26. Tag based routing. Diagram labels: input (x3), core, output (x4); tag, time, record; match patterns: web.log, sys.*, app.**, **; converted.web.log
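A simplified sketch of how the match patterns on these slides could be evaluated. It assumes `*` matches exactly one tag part, `**` matches zero or more tag parts, and the first matching pattern wins. The function names and the `ROUTES` constant are hypothetical, and this is not Fluentd's actual matcher (which supports more syntax).

```ruby
# Patterns in the order they appear on the slide (assumed to be match order).
ROUTES = ["web.log", "sys.*", "app.**", "**"].freeze

# Recursively compare pattern parts against tag parts.
def match_parts?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  case head
  when "**"
    # "**" may consume zero or more tag parts.
    (0..tag_parts.length).any? { |n| match_parts?(rest, tag_parts.drop(n)) }
  when "*"
    # "*" consumes exactly one tag part.
    !tag_parts.empty? && match_parts?(rest, tag_parts.drop(1))
  else
    !tag_parts.empty? && tag_parts.first == head && match_parts?(rest, tag_parts.drop(1))
  end
end

def tag_match?(pattern, tag)
  match_parts?(pattern.split("."), tag.split("."))
end

# Return the first pattern that matches the tag, or nil.
def route(tag, patterns = ROUTES)
  patterns.find { |pattern| tag_match?(pattern, tag) }
end
```

With these rules, `route("converted.web.log")` falls through to the catch-all `**`, which is one way to read the re-tagged event on slide 26.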
  • 27. 300+ Public Plugins: access, add, aes-forward, airbrake-python, amazon_sns, amplifier-filter, amqp, amqp2, andon, anomalydetect, anonymizer, arango, arduino, axlsx, backlog, bigquery, boundio, buffer-lightening, buffered-filter, buffered-hipchat, buffered-stdout, bufferize, calc, cassandra, cassandra-cql, cloudstack, cloudwatch, cloudwatch_ya, combiner, conditional_filter, config-expander, config_pit, config_reloader, convert-value-to-sha, copy_ex, couch, couch-sharded, couchbase, dashing, data-rejecter, datacalculator, datacounter, dbi, dd, debug, delay-inspector, delayed, derive, df, droonga, dstat, dummydata-producer, dynamodb, ec2-metadata, elapsed-time, elasticsearch, elasticsearch-cluster, elasticsearch-ruby, elb-log, embedded-elasticsearch, eval-filter, event-tail, extract_query_params, file-alternative, file-sprintf, filter, filter_keys, flatten, flatten-hash, flowcounter, flowcounter-simple, flume, fnordmetric, forest, fork, format, forward-aws, ftp, gamobile, ganglia, gc, geoip, glusterfs, graphite, grassland, gree_community, grep, grepcounter, groonga, groupcounter, growl, growthforecast, gstore, hash-forward, hato, hbase, hekk_redshift, heroku-postgres, heroku-syslog, hipchat, histogram, hoop, hostname, hrforecast, http-enhanced, http-ex, http-list, http-status, https-json, idobata, ikachan, imagefile, imkayac, in-udp-event, incremental, influxdb, influxdb_metrics, inline-classifier, irc, jabber, json-api, json-nest2flat, jsonbucket, jstat, jubatus, jvmwatcher, kafka, kanicounter, keep-forward, kestrel, kibana-server, kinesis-alt, latency, leftronic, librato-metrics, loggly, lossycount, mackerel, mail, map, measure_time, mecab, metricsense, mixi_community, mixpanel, mobile-carrier, mongo, mongo-typed, mongokpi, mqtt, msgpack-rpc, mssql, multiprocess, munin, mysql, mysql-binlog, mysql-bulk, mysql-load, mysql-prepared-statement, mysql-query, mysql-replicator, mysqlslowquery, mysqlslowquerylog, nats, network-probe, nginx-status, nicorepo, norikra, notifier, numeric-counter, numeric-monitor, onlineuser, openldap-monitor, opentsdb, order, out-http, out-http-buffered, out-solr, parser, pgdist, pghstore, pgjson, ping-message, postgres, qqwry, rambler, rawexec, rds-log, rds-slowlog, reassemble, record http://www.fluentd.org/plugins
  • 28. Fluentd patterns
  • 29. 1. Read logs from file and write them to storage: file → in_tail (read, parse) → out_file (format, write) → file
  • 30. 1. Read logs from file and write them to storage: file → in_tail (read, parse) → out_mongo (insert) → MongoDB https://github.com/fluent/fluent-plugin-mongo
  • 31. 1. Read logs from file and write them to storage: file → in_tail (read, parse) → out_mysql (insert) → MySQL https://github.com/tagomoris/fluent-plugin-mysql
  • 32. 1. Read logs from file and write them to storage: file → in_tail (read, parse) → out_elasticsearch (send) → Elasticsearch https://github.com/uken/fluent-plugin-elasticsearch
  • 33. 1. Read logs from file and write them to storage: file → in_tail (read, parse) → out_webhdfs (format, write) → Hadoop HDFS https://github.com/fluent/fluent-plugin-webhdfs
  • 34. 1. Read logs from file and write them to storage: file → in_tail (read, parse) → out_s3 (format, write) → Amazon S3 https://github.com/fluent/fluent-plugin-s3
  • 35. 1. Read logs from file and write them to storage: file → in_tail (read, parse) → out_redshift (insert) → Amazon Redshift https://github.com/hapyrus/fluent-plugin-redshift
  • 36. 1. Read logs from file and write them to storage: file → in_tail (read, parse) → out_bigquery (insert) → Google BigQuery https://github.com/tagomoris/fluent-plugin-bigquery
  • 37. 2. Receive and forward data from/to other nodes. Diagram labels: forward (x3), input events (x2), output events; fluent-logger-ruby, fluent-logger-java, ...; send events over TCP
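A rough sketch of what "send events over TCP" could look like from a Ruby client. It assumes the receiving forward input accepts a JSON array of `[tag, time, record]`; the client libraries named on the slide use their own serialization, so treat this as an illustration rather than a replacement for fluent-logger-ruby. `forward_message` and `send_event` are hypothetical helper names, and the host and port defaults are assumptions (24224 is the port used in the configuration slide).

```ruby
require "json"
require "socket"

# Build one event as a JSON array: [tag, unix time, record].
# Assumption: the receiver accepts JSON-encoded events.
def forward_message(tag, time, record)
  JSON.generate([tag, time.to_i, record])
end

# Open a TCP connection, write a single event, and close the socket.
# No buffering or retry here; a real client library handles those.
def send_event(tag, record, host: "127.0.0.1", port: 24224, time: Time.now)
  TCPSocket.open(host, port) do |socket|
    socket.write(forward_message(tag, time, record))
  end
end
```

Calling `send_event("app.device.ios", { "username" => "tagomoris" })` would need a forward input listening on the given host and port.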
  • 38. 2. Receive and forward data from/to other nodes: load balance, active-standby. Diagram labels: forward (x6)
  • 39. 2’. Receive and forward data from/to other nodes, over internet & SSL: secure-forward (datacenter) → secure-forward (datacenter); send events over SSL with authentication https://github.com/tagomoris/fluent-plugin-secure-forward
  • 40. 3. Connect with other middleware. Inputs: in_syslog (syslog), in_flume (Flume), in_scribe (Scribe), in_kafka (Kafka). Outputs: out_flume (Flume), out_scribe (Scribe), out_kafka (Kafka). https://github.com/fluent/fluent-plugin-flume https://github.com/fluent/fluent-plugin-scribe https://github.com/htgc/fluent-plugin-kafka/
  • 41. 4. Copy events: forward → copy → forward, webhdfs (Hadoop HDFS)
  • 42. 5. Count events by string values. Events flow: forward → datacounter → any outputs. Count records by regexp patterns: { "pattern1_count": 60, "pattern1_rate": 1.0, "pattern2_count": 20, "pattern2_rate": 0.33, ... } https://github.com/tagomoris/fluent-plugin-datacounter
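The counting idea can be sketched in a few lines of Ruby. This is not the plugin's code: `count_patterns` is a hypothetical function, and it assumes that each value is counted against the first pattern it matches and that the rate is the count divided by the aggregation interval in seconds, truncated to two decimals. Under those assumptions, a 60-second interval reproduces the numbers on the slide.

```ruby
# values:       strings taken from one field of the incoming records
# patterns:     { name => Regexp }, tested in insertion order
# interval_sec: length of the counting window in seconds (assumed meaning of "rate")
def count_patterns(values, patterns, interval_sec)
  counts = Hash.new(0)
  values.each do |value|
    # Assumption: a value is counted only for the first pattern it matches.
    name, _regexp = patterns.find { |_, regexp| regexp.match?(value) }
    counts[name] += 1 if name
  end

  patterns.each_with_object({}) do |(name, _), result|
    count = counts[name]
    result["#{name}_count"] = count
    # Truncate the per-second rate to two decimal places.
    result["#{name}_rate"] = (count.to_f / interval_sec * 100).floor / 100.0
  end
end

statuses = ["200"] * 60 + ["404"] * 20
summary = count_patterns(statuses, { "pattern1" => /\A2\d\d\z/, "pattern2" => /\A4\d\d\z/ }, 60)
puts summary
```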
  • 43. 5. Count events by numeric values. Events flow: forward → numeric-counter → any outputs. Count records by numerical range: { "pattern1_count": 60, "pattern1_rate": 1.0, "pattern2_count": 20, "pattern2_rate": 0.33, ... } https://github.com/tagomoris/fluent-plugin-numeric-counter
  • 44. 5. Aggregate numeric values. Events flow: forward → numeric-monitor → any outputs. Calculate real-time metrics of numeric values: { "max": 128, "min": 16, "avg": 64.0, "sum": 1024, "num": 20, "percentile_50": 48, "percentile_90": 112, ... } https://github.com/tagomoris/fluent-plugin-numeric-monitor
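As an illustration of the metrics listed above, here is a small Ruby sketch that aggregates one window of numeric values. `monitor` and `percentile` are hypothetical names, and the percentile uses the nearest-rank definition as an assumption; the plugin may compute percentiles differently.

```ruby
# Nearest-rank percentile over an already sorted array (assumed definition).
# pct must be an integer between 1 and 100.
def percentile(sorted, pct)
  rank = (pct * sorted.length + 99) / 100  # integer ceiling of pct% of n
  sorted[[rank, 1].max - 1]
end

# Aggregate one window of numeric values into the metrics named on the slide.
def monitor(values)
  return {} if values.empty?

  sorted = values.sort
  sum = sorted.sum
  {
    "max" => sorted.last,
    "min" => sorted.first,
    "avg" => sum.to_f / sorted.length,
    "sum" => sum,
    "num" => sorted.length,
    "percentile_50" => percentile(sorted, 50),
    "percentile_90" => percentile(sorted, 90)
  }
end

puts monitor([120, 35, 16, 128, 64])
```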
  • 45. 6. Various inputs: Linux performance (dstat): dstat → in_dstat; collect server performance data https://github.com/shun0102/fluent-plugin-dstat
  • 46. 6. Various inputs: SQL execution: RDBMS → in_sql; input from SELECT https://github.com/fluent/fluent-plugin-sql
  • 47. 6. Various inputs: external command: any commands → in_exec; input from STDOUT of any commands
  • 48. 7. Various outputs: notification on IRC: out_ikachan → IRC; notice on IRC channel https://github.com/tagomoris/fluent-plugin-ikachan
  • 49. 7. Various outputs: notification on IRC: out_ikachan → IRC; notice on IRC channel https://github.com/tagomoris/fluent-plugin-ikachan
      14:56 ikachan: HTTP status_4xx crit [2014-06-23 14:56:29 +0900] serviceX: 100.00 (threshold 75.0) http://graph.tool.local/view_graph/accesslog/httpstatus/serviceX_4xx_percentage
      14:57 kazeburo: ↑ 40x 100%...
  • 50. 7. Various outputs: notification on HipChat: out_hipchat → HipChat; notice on HipChat https://github.com/hotchpotch/fluent-plugin-hipchat
  • 51. 7. Various outputs: graph tools: out_growthforecast → GrowthForecast or Focuslight; POST data into graph tools https://github.com/tagomoris/fluent-plugin-growthforecast
  • 52. 7. Various outputs: out_growthforecast → GrowthForecast or Focuslight; POST data into graph tools https://github.com/tagomoris/fluent-plugin-growthforecast
  • 53. 7. Various outputs: external command: out_exec → any commands; output into STDIN of any commands
  • 54. 8. Filters: stream processing: external command. Events flow: any inputs → exec_filter → any outputs. exec_filter: format & write into STDIN of any commands, read & parse from their STDOUT. The command: read from STDIN, do WHATEVER you want, write into STDOUT. ex: tail -f | grep ... | sed ... | cat
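The external command in this pattern can be any program that reads lines from STDIN and writes lines to STDOUT. Below is a minimal Ruby sketch of such a command. It assumes that events are exchanged as one JSON object per line, which is only one possible format and has to agree with how exec_filter is configured; `transform` and `run_filter` are hypothetical names, and the added field is purely illustrative.

```ruby
require "json"

# "Do WHATEVER you want": here, add an upper-cased copy of one field.
def transform(record)
  record.merge("username_upcase" => record["username"].to_s.upcase)
end

# Read one JSON record per line from input, write one JSON record per line to output.
def run_filter(input, output)
  input.each_line do |line|
    line = line.strip
    next if line.empty?

    output.puts(JSON.generate(transform(JSON.parse(line))))
  end
  output.flush
end

# In a standalone script the last line would be:
#   run_filter($stdin, $stdout)
```

Taking the input and output streams as arguments keeps the filter testable without spawning a process.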
  • 55. 8. Filters: stream processing w/ external server (RPC). Events flow: any inputs → out_norikra (send) → Norikra → in_norikra (fetch) → any outputs. Stream processing w/ SQL http://norikra.github.io/
      SELECT stage, score, COUNT(*) AS c FROM results.win:time_batch(1 min) WHERE stage > 1 AND user.valid GROUP BY stage, score
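To make the query easier to read, here is a plain-Ruby sketch of what it computes for a single one-minute batch: filter, group by `stage` and `score`, then count. It says nothing about how Norikra executes queries; `summarize_batch` is a hypothetical name, and the events are assumed to be hashes with a nested `user` hash.

```ruby
# events: all records that arrived within one 1-minute batch window
def summarize_batch(events)
  events
    .select { |e| e["stage"] > 1 && e.dig("user", "valid") }   # WHERE stage > 1 AND user.valid
    .group_by { |e| [e["stage"], e["score"]] }                  # GROUP BY stage, score
    .map do |(stage, score), group|
      { "stage" => stage, "score" => score, "c" => group.size } # COUNT(*) AS c
    end
end

batch = [
  { "stage" => 2, "score" => 10, "user" => { "valid" => true } },
  { "stage" => 2, "score" => 10, "user" => { "valid" => true } },
  { "stage" => 1, "score" => 10, "user" => { "valid" => true } },
  { "stage" => 3, "score" => 5,  "user" => { "valid" => false } }
]
puts summarize_batch(batch)
```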
  • 56. ... And, Fluentd does error handling and retries for all of these plugins!
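The general idea of retrying a failed write can be sketched as follows. This is a generic retry loop with a doubling wait, written for illustration only; it is not Fluentd's implementation. `flush_with_retry`, its parameters and the injectable `sleeper` are all hypothetical.

```ruby
# Try to write a chunk of buffered events; on failure wait and try again,
# doubling the wait each time, and give up after retry_limit retries.
# Returns the number of attempts that were needed.
def flush_with_retry(chunk, retry_limit: 5, retry_wait: 1.0, sleeper: ->(sec) { sleep(sec) })
  attempts = 0
  begin
    attempts += 1
    yield chunk
    attempts
  rescue StandardError
    raise if attempts > retry_limit            # out of retries: surface the error

    sleeper.call(retry_wait * (2**(attempts - 1)))
    retry
  end
end
```

Because the output plugin only supplies the block that writes a chunk, the same error handling applies to every destination, which is the point the slide makes.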
  • 57. Before Fluentd: CHAOS Access logs App logs System logs Various logs Metrics Analytics Archives apache nginx frontend backend syslogd snmp data graphs Hadoop MySQL MongoDB Redshift Amazon S3 Filesystem tail -f scp python ruby ruby cmd file ruby logger file logger ruby cmd ruby
  • 58. After Fluentd: Controllable Access logs App logs System logs Various logs Metrics Analytics Archives apache nginx frontend backend syslogd snmp data graphs Hadoop MySQL MongoDB Redshift Amazon S3 Filesystem
  • 59. Fluentd: Now and then
  • 60. Fluentd versions. Latest: v0.10.50, released on Jun 17, 2014. v0.10.x: stable versions; many minor feature updates, bug fixes; new features for v1
  • 61. Fluentd v1: Planned as the first major release, someday in 2014 (?). 100% compatible with v0.10.x. New (and additional) features on the v1.x roadmap (https://github.com/fluent/fluentd/issues/251): new configuration syntax, plugin backends, daemon process management, multi-core CPU support
  • 62. Fluentd on JRuby: Under development! Trying to fix Cool.io to support JRuby
  • 63. Fluentd on Windows: Under development! "windows" branch on github fluent/fluentd
  • 64. Use case in LINE
  • 65. Analytics data flow overview. Diagram labels: servers, Fluentd Cluster, archive, visualization, notifications, Hadoop, Fluentd, Norikra, application metrics
  • 66. Diagram labels: servers, Fluentd Cluster, archive, visualization, notifications, Hadoop, Fluentd, Norikra, application metrics; delivery/stream-map, aggregate/stream-reduce
  • 67. Diagram labels: archive, visualization, notifications, Hadoop, Norikra, application metrics; fluent-agent-lite, non-parsed raw logs, non-parsed access logs. deliver: receive/archive/load-balance; worker: parse/store/forward; watcher: monitor/notify; cep: general-purpose stream processing
  • 68. Fluentd cluster statistics. Fluentd nodes: access/application logs from 600+ nodes; receiver: 5 servers (60 processes); parser/converter: 10 servers (90 processes); stream processing: 3 servers
  • 69. Fluentd cluster statistics. Daily: 5.5+ billion events, 1.5TB+ data. Peak time: 150,000+ events/sec, 300+ Mbps
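For a sense of scale, the daily totals can be turned into averages with a little arithmetic. The sketch below treats the "5.5+" and "1.5TB+" figures as exact lower bounds and assumes 1 TB = 10^12 bytes; `average_rates` is a hypothetical helper.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Convert daily totals into per-second averages and a mean event size.
def average_rates(daily_events, daily_bytes)
  {
    events_per_sec: daily_events / SECONDS_PER_DAY,
    mbps: daily_bytes * 8 / SECONDS_PER_DAY / 1e6,
    bytes_per_event: daily_bytes / daily_events
  }
end

rates = average_rates(5.5e9, 1.5e12)  # assumes 1 TB = 10**12 bytes
puts rates[:events_per_sec].round     # average events per second
puts rates[:mbps].round(1)            # average megabits per second
puts rates[:bytes_per_event].round    # average bytes per event
```

Under these assumptions the averages come out at about 63,657 events/sec and about 139 Mbps, so the quoted peak of 150,000+ events/sec is roughly 2.4 times the daily average.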
  • 70. Fluentd is the best partner for stream-processing newbies and rubyists! Check out sites and code! http://fluentd.org/ https://github.com/fluent/fluentd
  • 71. FAQ
  • 72. Fault-tolerance? Node level fault-tolerance: File buffer: processing data can be serialized on disk. Cluster level fault-tolerance: Copy + Forward (load balance, active-standby). Event level assurance: ACK? NO (for performance reasons)
  • 73. Performance? NOT SO BAD: real throughput depends on plugin/configuration; simple event transferring: 10-20k events/sec
  • 74. vs Scribe? vs Flume?
  • 75. vs Storm?
  • 76. Eco-system? Clones? ik, fluent-agent-lite, fluenpy