Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012

Presentation given at the MongoDB SV User Group: http://www.meetup.com/MongoDB-SV-User-Group/events/72760092/

Published in: Technology, Education

  1. Fluentd ♥ MongoDB: Log Everything As JSON. Kazuki Ohta, CTO at Treasure Data, Inc. Tuesday, July 17, 2012
  2. Self-Introduction
     • Kazuki Ohta
       > twitter: @kzk_mover
       > github: kzk
     • Treasure Data, Inc.
       > Chief Technology Officer; Founder
       > Original Fluentd author @frsyuki is another co-founder.
     • Open-Source Enthusiast
       > KDE, uim, Hadoop, memcached, Mozilla, Mongo, etc.
       > Fluentd rpm/deb package manager
  3. Logging? Why?
  4. Figure 1: Common Logging Purposes: Analytics, Error Notification, Recommendation
  5. Figure 2: Types of Logs: App Log; Access Log (Apache, Rails, etc.); System Log (syslog etc.); Others
  6. From “Scaling Lessons learned at Dropbox”
  7. Fragile under format changes, no type information, no field names, etc. (From “Scaling Lessons learned at Dropbox”)
  8. About Fluentd
  9. It's like syslogd, but uses JSON for log messages
  10. Logs in JSON? Why?
      1. Machine-Readable
         > machines are going to be the main consumers of logs
      2. Schema-Free
         > you want to add/remove fields from logs at any time
      “Write Logs for Machines, use JSON”: http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
  11. Logs As TEXT vs. Logs As JSON
      JSON adds: + Field Name + No Custom Parser + Type Information + Schema Free
  12. Logs As TEXT vs. Logs As JSON
      Logs As TEXT: "2011-04-01 host1 myapp: cmessage size=12MB user=me"
      Logs As JSON: 2011-04-01 myapp.message { "on_host": "host1", "combined": true, "size": 12000000, "user": "me" }
      JSON adds: + Field Name + No Custom Parser + Type Information + Schema Free
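
The contrast on the slide above can be sketched in Ruby. This is only an illustration: the regular expression and the `parse_text` helper are hypothetical, and the reading of `cmessage` as "combined" and of `12MB` as 12,000,000 is inferred from the slide's JSON record.

```ruby
require "json"

# The text line from the slide, and a JSON record carrying the same facts.
text_line = "2011-04-01 host1 myapp: cmessage size=12MB user=me"
json_line = '{"on_host":"host1","combined":true,"size":12000000,"user":"me"}'

# Text logs need a hand-written parser that breaks when the format changes.
# This pattern is a hypothetical parser for this one line shape only.
TEXT_PATTERN = /\A(?<date>\S+) (?<host>\S+) (?<app>\w+): (?<kind>\w+) size=(?<size>\d+)MB user=(?<user>\w+)\z/

def parse_text(line)
  m = TEXT_PATTERN.match(line)
  return nil unless m

  {
    "on_host"  => m[:host],
    "combined" => m[:kind] == "cmessage",    # meaning inferred from the slide's JSON
    "size"     => m[:size].to_i * 1_000_000, # the unit has to be decoded by hand
    "user"     => m[:user]
  }
end

from_text = parse_text(text_line)
from_json = JSON.parse(json_line) # field names and types come for free

puts from_text == from_json   # prints "true"
puts from_json["size"].class  # prints "Integer"
```

The JSON side needs no custom parser, and adding a field to the record does not break `JSON.parse`, whereas the text pattern would have to be rewritten.
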
  13. http://fluentd.org/
  14. • Website
        > http://fluentd.org/
      • Community
        > http://github.com/fluent
        > 16 committers across many organizations
        > web, game, enterprise
      • Mailing list
        > Google groups
  15. Fluentd Architecture
  16. Fluentd: Log Format. Application → Fluentd → Storage
  17. Fluentd: Log Format. Application → Fluentd → Storage. Example event:
      2012-02-04 01:33:51 myapp.buylog { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" }
  18. Fluentd: Log Format. Application → Fluentd → Storage. An event has three parts:
      time:   2012-02-04 01:33:51
      tag:    myapp.buylog
      record: { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" }
  19. Fluentd: Plugins. Application → Fluentd (filter / buffer / routing) → Storage
  20. Fluentd: Plugins. Application → Fluentd (filter / buffer / routing) → output plug-ins: SaaS, Storage, Fluentd
  22. Fluentd: Plugins. Input plug-ins: syslogd, Scribe, Application, File (tail) → Fluentd (filter / buffer / routing) → output plug-ins: SaaS, Storage, Fluentd
  23. • Client libraries
        > Ruby
        > Perl
        > PHP
        > Python
        > Java
        > ...
      Application → (HTTP / TCP / UDS, with buffering) → Fluentd
  24. • Client libraries
        > Ruby
        > Perl
        > PHP
        > Python
        > Java
        > ...
      Application → (HTTP / TCP / UDS, with buffering) → Fluentd
      Fluent.open("myapp")
      Fluent.event("login", {"user"=>38})
      #=> 2012-02-04 04:56:01 myapp.login {"user":38}
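
How a client call turns into a (time, tag, record) event can be sketched as follows. `SketchLogger` is a hypothetical in-memory stand-in written for this illustration; it is not the fluent-logger API and sends nothing over the network. The injectable clock is an assumption that keeps the output reproducible.

```ruby
require "json"

# Hypothetical stand-in for a client library; NOT the fluent-logger API.
# It only shows how a tag prefix and an event label combine into an event tag.
class SketchLogger
  attr_reader :events

  def initialize(tag_prefix, clock: -> { Time.now })
    @tag_prefix = tag_prefix
    @clock = clock
    @events = [] # in-memory buffer standing in for the connection to Fluentd
  end

  # Record one event as [time, tag, record].
  def event(label, record)
    @events << [@clock.call, "#{@tag_prefix}.#{label}", record]
  end

  # Render buffered events in the "time tag record" form used on the slides.
  def lines
    @events.map do |time, tag, record|
      "#{time.strftime('%Y-%m-%d %H:%M:%S')} #{tag} #{JSON.generate(record)}"
    end
  end
end

logger = SketchLogger.new("myapp", clock: -> { Time.utc(2012, 2, 4, 4, 56, 1) })
logger.event("login", { "user" => 38 })
puts logger.lines # prints: 2012-02-04 04:56:01 myapp.login {"user":38}
```
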
  25. Typical Log Collection by `rsync`
      • Burst of traffic: rsync consumes all bandwidth
  26. Typical Log Collection by `rsync`. App servers (Application → File, File, File, ...) → Log server
      • Burst of traffic: rsync consumes all bandwidth
      • High latency: must wait for a day
      • Hard to analyze: complex text parsers
  27. Log Collection using Fluentd. Fluentd, Fluentd, Fluentd → Fluentd, Fluentd. Realtime!
  28. Log Collection using Fluentd. Fluentd, Fluentd, Fluentd → Fluentd, Fluentd → Hadoop / Hive, MongoDB, Amazon S3 / EMR. Realtime! Ready to Analyze!
  29. Fluentd Case Study. Ruby on Rails servers (each with Fluentd) → Fluentd, Fluentd (routing) → Hadoop / Hive, MongoDB. Logs: user behavior logs, PV logs
      ✓ 127 RoR servers
      ✓ 100,000 msgs/sec
      ✓ 120Mbps at peak
      ✓ 1TB/day
  30. Example configuration:
      # read logs from a file
      <source>
        type tail
        path /var/log/httpd.log
        format apache
        tag apache.access
      </source>

      # save access logs to MongoDB
      <match apache.access>
        type mongo
        host 127.0.0.1
      </match>

      # forward other logs to servers
      # (load-balancing + fail-over)
      <match **>
        type forward
        <server>
          host 192.168.0.11
          weight 20
        </server>
        <server>
          host 192.168.0.12
          weight 60
        </server>
      </match>
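
The routing idea behind this configuration can be sketched in Ruby. This is a simplified model, not Fluentd's implementation: the pattern translation only covers the two patterns on the slide, the first matching rule wins, and `pick_server` is a hypothetical weighted choice that just illustrates how weights 20 and 60 split traffic roughly 1:3.

```ruby
# Simplified tag matching: "*" matches one tag part, "**" matches anything.
def tag_pattern_to_regexp(pattern)
  parts = pattern.split(".").map do |part|
    case part
    when "**" then ".*"
    when "*"  then "[^.]+"
    else Regexp.escape(part)
    end
  end
  Regexp.new("\\A#{parts.join('\\.')}\\z")
end

# Rules in configuration order; the first match decides the output.
ROUTES = [
  ["apache.access", :mongo],
  ["**",            :forward]
].map { |pattern, output| [tag_pattern_to_regexp(pattern), output] }

def route(tag)
  rule = ROUTES.find { |regexp, _output| regexp.match?(tag) }
  rule && rule.last
end

# Hosts and weights from the slide.
SERVERS = [["192.168.0.11", 20], ["192.168.0.12", 60]]

# Hypothetical weighted choice: `ticket` is an integer in 0...total_weight.
def pick_server(servers, ticket)
  servers.each do |host, weight|
    return host if ticket < weight
    ticket -= weight
  end
  nil
end

puts route("apache.access") # prints "mongo"
puts route("myapp.login")   # prints "forward"

total = SERVERS.sum { |_host, weight| weight }
counts = Hash.new(0)
(0...total).each { |ticket| counts[pick_server(SERVERS, ticket)] += 1 }
puts counts["192.168.0.11"] # prints "20"
puts counts["192.168.0.12"] # prints "60"
```

Because rules are tried in order, the catch-all `**` rule has to come after the more specific `apache.access` rule, or the MongoDB output would never receive anything.
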
  31. Comparison
  32. Scribe: log collector by Facebook. Frontend servers (scribe, scribe, scribe) → Aggregator nodes (scribe, scribe, scribe) → Hadoop HDFS
  33. Scribe’s Pros & Cons
      • Pros
        • Fast (written in C++)
      • Cons
        • VERY HARD to install
          • nightmare of boost, thrift, libhdfs, etc.
        • Unstructured logs
          • parsing is required before analysis
        • Hard to extend
          • recompiling C++ programs is required
        • No longer maintained
  34. Fluentd vs Scribe
      • Easy to install
        • "gem install fluentd"
        • Stable RPM and Deb packages
        • http://packages.treasure-data.com/
      • Easy to write plugins
        • you can use Ruby
      • Easy plugin distribution
        • "gem search -rd fluent-plugin"
  35. Flume: distributed log collector by Cloudera. Physical Topology / Logical Topology: Flume Master; Flume, Flume, Flume; Hadoop HDFS
  36. Flume’s Pros & Cons
      • Pros
        • Central master server manages all nodes
      • Cons
        • Difficult to understand
          • logical topologies, physical servers, and a configuration of the logical/physical mapping
        • Difficult to configure
          • replicated master servers, log servers and agents
        • Big footprint
          • 50,000 lines of Java
  37. Fluentd vs Flume
      • Easy to understand
        • “syslogd that understands JSON”
      • Easy to set up
        • "sudo fluentd --setup && fluentd"
      • Very small footprint
        • small engine (3,000 lines) + plugins
        • small, but battle-tested!
      • Easy to configure
  38. Comparison table:
                           Fluentd              Scribe               Flume
      Installation         gem/rpm/deb          make                 jar/rpm/deb
      Footprint            3000 lines of Ruby   8000 lines of C++    50,000 lines of Java
      Plugin               Ruby                 N/A                  Java
      Plugin distribution  RubyGems.org         N/A                  N/A
      Master Server        No                   No                   Yes
      License              Apache License       Apache License       Apache License
  39. Fluentd Plugin for
  40. fluent-plugin-mongo
      • Included within rpm/deb by default!
        • http://github.com/fluent/fluent-plugin-mongo
      • #1 plugin among 50+ Fluentd plugins
      • Logs As JSON. WHY NOT Put Them Into Mongo??
        • http://fluentd.org/plugin/
      • Supports most of the MongoDB features
        • Authentication
        • ReplicaSet
        • Capped Collection
  41. MongoDB Output Plugin. Application → Fluentd (Buffering) → MongoDB
      • Maintain JSON Structure
      • Reliable Buffering
      • Batch Insertion
      • Handle Broken Records
      • Ruby Driver #82
      MongoDB targets shown: Single Instance (Capped or Not), Sharding, ReplicaSet, Authentication
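
Buffering, batch insertion, and setting broken records aside can be sketched as below. Everything here is an assumption made for illustration: `BatchBuffer` is a hypothetical class, the `sink` array stands in for a MongoDB collection, and "broken" is modelled as "not valid JSON", which may differ from what the plugin actually treats as a broken record.

```ruby
require "json"

# Hypothetical buffer that flushes records in batches. `sink` stands in for a
# MongoDB collection; a real plugin would call the driver's bulk insert instead.
class BatchBuffer
  attr_reader :broken

  def initialize(sink, batch_size: 3)
    @sink = sink
    @batch_size = batch_size
    @pending = []
    @broken = [] # records set aside instead of failing the whole batch
  end

  # Parse one incoming line; flush once a full batch has accumulated.
  def push(raw_line)
    @pending << JSON.parse(raw_line)
    flush if @pending.size >= @batch_size
  rescue JSON::ParserError
    @broken << raw_line
  end

  # Write everything still pending as a single batch.
  def flush
    return if @pending.empty?

    @sink << @pending # one batch insert
    @pending = []
  end
end

sink = []
buffer = BatchBuffer.new(sink, batch_size: 2)
['{"user":"me"}', "not json", '{"user":"you"}', '{"user":"them"}'].each do |line|
  buffer.push(line)
end
buffer.flush

puts sink.size          # prints "2"
puts buffer.broken.size # prints "1"
```

Batching reduces the number of round trips to the database, and isolating a bad record keeps one malformed event from blocking the rest of the buffer.
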
  43. MongoDB Input Plugin. MongoDB (Single Instance or ReplicaSet, Capped Collection; Authentication) → Fluentd (Buffering)
      • Tailing Capped Collections
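
What "tailing a capped collection" means can be modelled in a few lines of Ruby. `CappedLog` is a hypothetical in-memory model, not the MongoDB driver API: it has a fixed capacity, drops the oldest document when full, and lets a reader ask for everything inserted after the last document it has seen.

```ruby
# Hypothetical in-memory model of a capped collection: fixed capacity, and the
# oldest documents are dropped first. This is not the MongoDB driver API.
class CappedLog
  def initialize(capacity)
    @capacity = capacity
    @docs = []
    @next_id = 0
  end

  def insert(doc)
    @docs << [@next_id, doc]
    @next_id += 1
    @docs.shift if @docs.size > @capacity
  end

  # Return [id, doc] pairs inserted after `last_seen_id`, as a tailing reader would.
  def tail(last_seen_id)
    @docs.select { |id, _doc| id > last_seen_id }
  end
end

log = CappedLog.new(3)
%w[a b c d].each { |msg| log.insert({ "msg" => msg }) }

# A reader that last saw id 1 picks up only the newer documents.
new_docs = log.tail(1)
puts new_docs.map { |_id, doc| doc["msg"] }.join(",") # prints "c,d"

# A reader starting from scratch no longer sees "a": it was overwritten.
puts log.tail(-1).map { |_id, doc| doc["msg"] }.join(",") # prints "b,c,d"
```
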
  45. Realtime Analytics with Fluentd + MongoDB. App, App, App → Fluentd, Fluentd, Fluentd → Fluentd, Fluentd (routing) → MongoDB (query, charting) and Nagios, Zabbix, etc. (alert)
  46. Realtime or Batch? No, BOTH! App, App, App → Fluentd, Fluentd, Fluentd → Fluentd, Fluentd (routing) → Hadoop / Hive (batch), Amazon S3 (archive), MongoDB (realtime: query, charting)
  47. Intro of our company’s service: Treasure Data. App, App, App → Fluentd, Fluentd, Fluentd → Fluentd, Fluentd (routing) → Treasure Data, a Hadoop-based Cloud Data Warehouse (batch), and MongoDB (realtime)
  48. Exercise: Apache Logs into MongoDB
  49. Log File
  52. Conclusion
      • Log Everything as JSON
        • Machine Readability
        • Schema Freeness
      • MongoDB fits into Fluentd’s backend perfectly
        • Both using JSON representation