Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012

  1. Fluentd ♥ MongoDB Log Everything As JSON Kazuki Ohta, CTO at Treasure Data, Inc. Tuesday, July 17, 2012
  2. Self-Introduction • Kazuki Ohta > twitter: @kzk_mover > github: kzk • Treasure Data, Inc. > Chief Technology Officer; Founder > Original Fluentd Author @frsyuki is another co-founder. • Open-Source Enthusiast > KDE, uim, Hadoop, memcached, Mozilla, Mongo, etc. > Fluentd rpm/deb package manager
  3. Logging? Why?
  4. Figure 1: Common Logging Purposes • Analytics • Error Notification • Recommendation
  5. Figure 2: Types of Logs • App Log • Access Log (Apache, Rails, etc.) • System Log (syslog etc.) • Others
  6. From “Scaling Lessons learned at Dropbox”
  7. Fragile under format changes, no type information, no field names, etc. From “Scaling Lessons learned at Dropbox”
  8. About Fluentd
  9. It's like syslogd, but uses JSON for log messages
  10. Logs in JSON? Why? 1. Machine-Readable > machines are going to be the main consumers of logs 2. Schema-Free > you want to add/remove fields from logs at any time • Write Logs for Machines, use JSON http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
  11. Logs As TEXT vs. Logs As JSON • Logs As JSON: + Field Name + No Custom Parser + Type Information + Schema Free
  12. Logs As TEXT: "2011-04-01 host1 myapp: cmessage size=12MB user=me" • Logs As JSON: 2011-04-01 myapp.message { "on_host": "host1", "combined": true, "size": 12000000, "user": "me" } + Field Name + No Custom Parser + Type Information + Schema Free
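The contrast on the slide above can be sketched in Ruby. This is a minimal illustration, not code from the talk: `parse_text` is a hypothetical hand-written parser for that one text format, and the "12MB means 12000000" conversion is taken from the slide's JSON example. The JSON version needs no custom parser and keeps field names and types.

```ruby
require "json"

# Text log line from the slide: field names and types are implicit.
TEXT_LINE = "2011-04-01 host1 myapp: cmessage size=12MB user=me"

# Same information as a JSON record, as shown on the slide.
JSON_RECORD = '{"on_host":"host1","combined":true,"size":12000000,"user":"me"}'

# Hypothetical parser tied to this exact layout; it fails as soon as
# a field is added, removed, or reordered.
def parse_text(line)
  m = line.match(/\A(\S+) (\S+) (\w+): (\w+) size=(\d+)MB user=(\w+)\z/)
  raise ArgumentError, "unrecognized format" unless m

  {
    "date" => m[1],
    "on_host" => m[2],
    "size" => m[5].to_i * 1_000_000, # unit conversion must be hand-coded
    "user" => m[6]
  }
end

from_text = parse_text(TEXT_LINE)
from_json = JSON.parse(JSON_RECORD) # no custom parser, types preserved

puts from_text["size"]
puts from_json["size"].class
puts from_json["combined"]
```

Reordering the text fields (for example, putting `user=` before `size=`) makes `parse_text` raise, while a JSON consumer is unaffected by key order or extra keys.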
  13. http://fluentd.org/
  14. Website > http://fluentd.org/ • Community > http://github.com/fluent > 16 committers across many organizations > web, game, enterprise • Mailing list > Google groups
  15. Fluentd Architecture
  16. Fluentd: Log Format • Application → Fluentd → Storage
  17. Fluentd: Log Format • Application → Fluentd → Storage • 2012-02-04 01:33:51 myapp.buylog { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" }
  18. Fluentd: Log Format • Application → Fluentd → Storage • time: 2012-02-04 01:33:51 • tag: myapp.buylog • record: { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" }
  19. Fluentd: Plugins • Application → Fluentd (filter / buffer / routing) → Storage
  20. Fluentd: Plugins • Application → Fluentd (filter / buffer / routing) → output Plug-ins → SaaS, Storage, Fluentd
  22. Fluentd: Plugins • syslogd, Scribe, Application, File (tail) → input Plug-ins → Fluentd (filter / buffer / routing) → output Plug-ins → SaaS, Storage, Fluentd
  23. Client libraries > Ruby > Perl > PHP > Python > Java > ... • Application → Fluentd over HTTP / TCP / UDS, with Buffering
  24. Client libraries > Ruby > Perl > PHP > Python > Java > ... • Application → Fluentd over HTTP / TCP / UDS, with Buffering • Fluent.open("myapp") Fluent.event("login", {"user"=>38}) #=> 2012-02-04 04:56:01 myapp.login {"user":38}
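To make the client call on the slide above concrete, here is a rough sketch of what such a call produces. `MiniFluent` is a hypothetical stand-in written for this illustration; it is not the real Fluentd client library, and it keeps events in memory instead of sending them over HTTP, TCP, or a Unix domain socket. It only shows how a tag prefix, a label, and a record combine into the time / tag / record triple.

```ruby
require "json"

# Hypothetical stand-in for a Fluentd client library (assumption: the
# real library sends events to Fluentd; this one just stores them).
module MiniFluent
  Event = Struct.new(:time, :tag, :record) do
    # Render the event the way the slide prints it: time, tag, JSON record.
    def to_s
      "#{time.strftime('%Y-%m-%d %H:%M:%S')} #{tag} #{JSON.generate(record)}"
    end
  end

  @prefix = nil
  @events = []

  class << self
    attr_reader :events

    # Set the tag prefix shared by all later events.
    def open(prefix)
      @prefix = prefix
      @events = []
    end

    # Build an event tagged "<prefix>.<label>" and buffer it in memory.
    def event(label, record, time: Time.now)
      ev = Event.new(time, "#{@prefix}.#{label}", record)
      @events << ev
      ev
    end
  end
end

MiniFluent.open("myapp")
ev = MiniFluent.event("login", { "user" => 38 }, time: Time.new(2012, 2, 4, 4, 56, 1))
puts ev.to_s # 2012-02-04 04:56:01 myapp.login {"user":38}
```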
  25. Typical Log Collection by `rsync` • Burst of traffic: rsync consumes all bandwidth
  26. Typical Log Collection by `rsync` • App servers: each Application writes many Files • Burst of traffic: rsync consumes all bandwidth • High latency: must wait for a day • Log server: hard to analyze, complex text parsers
  27. Log Collection using Fluentd • Fluentd, Fluentd, Fluentd → Fluentd, Fluentd • Realtime!
  28. Log Collection using Fluentd • Fluentd, Fluentd, Fluentd → Fluentd, Fluentd • Realtime! • Outputs: Amazon S3 / EMR, Hadoop / Hive, MongoDB • Ready to Analyze!
  29. Fluentd Case Study • Ruby on Rails servers, each with Fluentd → Fluentd, Fluentd (routing) ✓ 127 RoR servers ✓ 100,000 msgs/sec ✓ 120Mbps at peak ✓ 1TB/day • Outputs: Hadoop / Hive, MongoDB • Logs: PV logs, user behavior logs
  30. Configuration example
      # read logs from a file
      <source>
        type tail
        path /var/log/httpd.log
        format apache
        tag apache.access
      </source>

      # save access logs to MongoDB
      <match apache.access>
        type mongo
        host 127.0.0.1
      </match>

      # forward other logs to servers
      # (load-balancing + fail-over)
      <match **>
        type forward
        <server>
          host 192.168.0.11
          weight 20
        </server>
        <server>
          host 192.168.0.12
          weight 60
        </server>
      </match>
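The routing behavior in the configuration above can be sketched as follows. This is a simplified model under stated assumptions, not Fluentd's implementation: it assumes the first matching pattern wins, that `*` matches exactly one tag part, and that `**` matches zero or more parts. `pattern_to_regexp`, `ROUTES`, and `route` are names invented for the sketch, and other pattern forms are not handled.

```ruby
# Convert a simplified match pattern into a Regexp.
# Only "*" (one tag part) and "**" (zero or more parts) are handled.
def pattern_to_regexp(pattern)
  pieces = pattern.split(".").each_with_index.map do |part, i|
    sep = i.zero? ? "" : "\\."
    case part
    when "**" then i.zero? ? ".*" : "(?:\\..+)?"
    when "*"  then "#{sep}[^.]+"
    else           "#{sep}#{Regexp.escape(part)}"
    end
  end
  Regexp.new("\\A#{pieces.join}\\z")
end

# Routes in the same order as the configuration: the specific
# apache.access match comes before the catch-all "**".
ROUTES = [
  ["apache.access", :mongo],
  ["**", :forward]
].map { |pattern, output| [pattern_to_regexp(pattern), output] }

# Return the output of the first route whose pattern matches the tag.
def route(tag)
  _, output = ROUTES.find { |regexp, _| regexp.match?(tag) }
  output
end

puts route("apache.access") # mongo
puts route("myapp.login")   # forward
```

Order matters in this model: if the `**` route were listed first, it would capture `apache.access` events before the MongoDB route could see them.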
  31. Comparison
  32. Scribe: log collector by Facebook • Frontend servers (scribe) → Aggregator nodes (scribe) → Hadoop HDFS
  33. Scribe’s Pros & Cons • Pros. • Fast (written in C++) • Cons. • VERY HARD to install • nightmare of boost, thrift, libhdfs, etc. • Unstructured Logs • parsing is required before analysis • Hard to extend • recompiling C++ programs is required • No longer maintained
  34. Fluentd vs Scribe • Easy to install • `gem install fluentd` • Stable RPM and Deb packages • http://packages.treasure-data.com/ • Easy to write plugins • you can use Ruby • Easy plugin distribution • `gem search -rd fluent-plugin`
  35. Flume: distributed log collector by Cloudera • Physical Topology: Flume Master; Flume, Flume, Flume • Logical Topology • Hadoop HDFS
  36. Flume’s Pros & Cons • Pros. • Central master server manages all nodes • Cons. • Difficult to understand • logical topologies, physical servers and a configuration of the logical/physical mapping • Difficult to configure • replicated master servers, log servers and agents • Big footprint • 50,000 lines of Java
  37. Fluentd vs Flume • Easy to understand • “syslogd that understands JSON” • Easy to set up • `sudo fluentd --setup && fluentd` • Very small footprint • small engine (3,000 lines) + plugins • small, but battle-tested! • Easy to configure
  38. Fluentd vs Scribe vs Flume
                            Fluentd               Scribe                Flume
      Installation          gem/rpm/deb           make                  jar/rpm/deb
      Footprint             3,000 lines of Ruby   8,000 lines of C++    50,000 lines of Java
      Plugin                Ruby                  N/A                   Java
      Plugin distribution   RubyGems.org          N/A                   N/A
      Master Server         No                    No                    Yes
      License               Apache License        Apache License        Apache License
  39. Fluentd Plugin for
  40. fluent-plugin-mongo • Included within rpm/deb by default! • http://github.com/fluent/fluent-plugin-mongo • #1 plugin among 50+ Fluentd plugins • http://fluentd.org/plugin/ • Logs As JSON. WHY NOT Put Them Into Mongo?? • Supports most of the MongoDB features • Authentication • ReplicaSet • Capped Collection
  41. MongoDB Output Plugin • Maintain JSON Structure • Reliable Buffering • Batch Insertion • Handle Broken Records • Ruby Driver #82 • Application → Fluentd (Buffering) → MongoDB (Authentication) • Targets: Single Instance (Capped or Not), Sharding, ReplicaSet
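The buffering, batch insertion, and broken-record handling listed for the output plugin can be illustrated with a small sketch. Everything here is an assumption made for illustration: `FakeCollection` stands in for a MongoDB collection, `BufferedOutput` is a hypothetical class, and the fallback strategy (retry records one by one and set failures aside) is one plausible approach, not necessarily what fluent-plugin-mongo does.

```ruby
# In-memory stand-in for a MongoDB collection (assumption, not the driver API).
class FakeCollection
  attr_reader :docs, :batches

  def initialize
    @docs = []
    @batches = 0
  end

  # Reject the whole batch if any record is not a Hash, loosely imitating
  # a bulk insert that fails on an invalid document.
  def insert_many(records)
    raise ArgumentError, "invalid document" unless records.all? { |r| r.is_a?(Hash) }

    @docs.concat(records)
    @batches += 1
  end
end

# Hypothetical buffered output: collect records, write them in batches.
class BufferedOutput
  attr_reader :broken

  def initialize(collection, batch_size:)
    @collection = collection
    @batch_size = batch_size
    @buffer = []
    @broken = []
  end

  # Buffer a record and flush once the batch is full.
  def emit(record)
    @buffer << record
    flush if @buffer.size >= @batch_size
  end

  # Try one batch insert; on failure, insert records one at a time and
  # set aside the ones that still fail.
  def flush
    batch = @buffer
    @buffer = []
    return if batch.empty?

    begin
      @collection.insert_many(batch)
    rescue ArgumentError
      batch.each do |record|
        begin
          @collection.insert_many([record])
        rescue ArgumentError
          @broken << record
        end
      end
    end
  end
end

collection = FakeCollection.new
out = BufferedOutput.new(collection, batch_size: 3)
[{ "user" => 1 }, "not a record", { "user" => 2 }, { "user" => 3 }].each { |r| out.emit(r) }
out.flush
puts collection.docs.size # 3
puts out.broken.inspect   # ["not a record"]
```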
  43. MongoDB Input Plugin • Tailing Capped Collections • Sources: ReplicaSet (Capped Collection), Single Instance (Capped Collection) • MongoDB (Authentication) → Fluentd (Buffering)
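Tailing a capped collection, as the input plugin slide describes, can be modeled with a toy example. `ToyCappedCollection` and `Tailer` are hypothetical classes for illustration only; they do not use the MongoDB driver, and the sequence-number bookkeeping is an assumption standing in for a real tailable cursor.

```ruby
# Toy model of a capped collection: fixed capacity, oldest documents
# are dropped first. Not the MongoDB driver API.
class ToyCappedCollection
  def initialize(capacity)
    @capacity = capacity
    @docs = []     # [sequence number, document] pairs, oldest first
    @next_seq = 0
  end

  def insert(doc)
    @docs << [@next_seq, doc]
    @next_seq += 1
    @docs.shift if @docs.size > @capacity
  end

  # Return the [sequence number, document] pairs inserted after `after`.
  def since(after)
    @docs.select { |seq, _| seq > after }
  end
end

# Hypothetical tailing reader: remembers the last sequence number it saw
# and returns only newer documents on each poll.
class Tailer
  def initialize(collection)
    @collection = collection
    @last_seq = -1
  end

  def poll
    fresh = @collection.since(@last_seq)
    @last_seq = fresh.last[0] unless fresh.empty?
    fresh.map { |_, doc| doc }
  end
end

capped = ToyCappedCollection.new(2)
tailer = Tailer.new(capped)
capped.insert({ "n" => 1 })
puts tailer.poll.inspect
[2, 3, 4].each { |n| capped.insert({ "n" => n }) }
puts tailer.poll.inspect
puts tailer.poll.inspect
```

In this model the document with `"n" => 2` is overwritten before the second poll, which shows why a reader of a capped collection has to keep up with the writer.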
  45. Realtime Analytics with Fluentd + MongoDB • App + Fluentd (×3) → Fluentd, Fluentd (routing) → MongoDB • Mongo query → Charting, Alert (Nagios, Zabbix, etc.)
  46. Realtime or Batch? No, BOTH! • App + Fluentd (×3) → Fluentd, Fluentd (routing) • batch: Hadoop / Hive • archive: Amazon S3 • realtime: MongoDB → query → Charting
  47. Intro of our company’s service: Treasure Data • App + Fluentd (×3) → Fluentd, Fluentd (routing) • batch: Treasure Data (Hadoop-based Cloud Data Warehouse) • realtime: MongoDB
  48. Exercise: Apache Logs into MongoDB
  49. Log File
  52. Conclusion • Log Everything as JSON • Machine Readability • Schema Freeness • MongoDB fits perfectly as a Fluentd backend • Both use a JSON representation