Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012

The presentation given at the MongoDB SV User Group: http://www.meetup.com/MongoDB-SV-User-Group/events/72760092/

Usage Rights

CC Attribution License

Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012 Presentation Transcript

  • 1. Fluentd ♥ MongoDB: Log Everything As JSON. Kazuki Ohta, CTO at Treasure Data, Inc. Tuesday, July 17, 2012
  • 2. Self-Introduction • Kazuki Ohta > twitter: @kzk_mover > github: kzk • Treasure Data, Inc. > Chief Technology Officer; Founder > Original Fluentd Author @frsyuki is another co-founder. • Open-Source Enthusiast > KDE, uim, Hadoop, memcached, Mozilla, Mongo, etc. > Fluentd rpm/deb package manager
  • 3. Logging? Why?
  • 4. Figure 1: Common Logging Purposes: Analytics, Error Notification, Recommendation
  • 5. Figure 2: Types of Logs: App Log, Access Log (Apache, Rails, etc.), System Log (syslog etc.), Others
  • 6. From “Scaling Lessons learned at Dropbox”
  • 7. Fragile under format changes, no type information, no field names, etc. From “Scaling Lessons learned at Dropbox”
  • 8. About Fluentd
  • 9. It's like syslogd, but uses JSON for log messages
  • 10. Logs in JSON? Why? 1. Machine-Readable > machines are going to be the main consumers of logs 2. Schema-Free > you want to add/remove fields from logs at any time. Write Logs for Machines, use JSON: http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
  • 11. Logs As TEXT / Logs As JSON: + Field Name + No Custom Parser + Type Information + Schema Free
  • 12. Logs As TEXT: "2011-04-01 host1 myapp: cmessage size=12MB user=me" Logs As JSON: 2011-04-01 myapp.message { "on_host": "host1", "combined": true, "size": 12000000, "user": "me" } + Field Name + No Custom Parser + Type Information + Schema Free
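
The contrast on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the talk: the regular expression, the function names, and the reading of "12MB" as 12,000,000 bytes are assumptions made for the example.

```python
import json
import re

# The two sample logs from the slide (straight quotes so the JSON parses).
TEXT_LINE = "2011-04-01 host1 myapp: cmessage size=12MB user=me"
JSON_RECORD = '{"on_host": "host1", "combined": true, "size": 12000000, "user": "me"}'

# Hypothetical hand-written pattern; it breaks as soon as the text format changes.
TEXT_PATTERN = re.compile(
    r"(?P<date>\S+) (?P<host>\S+) (?P<app>\w+): (?P<kind>\w+) "
    r"size=(?P<size_mb>\d+)MB user=(?P<user>\w+)"
)


def parse_text(line):
    """Parse the text log; every captured field comes back as a string."""
    match = TEXT_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not match the expected format")
    fields = match.groupdict()
    # Type information must be restored by hand (assuming 1 MB = 1,000,000 bytes).
    fields["size"] = int(fields.pop("size_mb")) * 1_000_000
    return fields


def parse_json(payload):
    """Parse the JSON log; field names and types travel with the data."""
    return json.loads(payload)
```

The text path needs a custom parser and manual type conversion, while the JSON path is a single library call, which is the point the "+" items on the slide make.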
  • 13. http://fluentd.org/
  • 14. • Website > http://fluentd.org/ • Community > http://github.com/fluent > 16 committers across many organizations > web, game, enterprise • Mailing list > Google groups
  • 15. Fluentd Architecture
  • 16. Fluentd: Log Format. [Diagram: Application, Fluentd, Storage]
  • 17. Fluentd: Log Format. 2012-02-04 01:33:51 myapp.buylog { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" } [Diagram: Application, Fluentd, Storage]
  • 18. Fluentd: Log Format. time: 2012-02-04 01:33:51, tag: myapp.buylog, record: { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" } [Diagram: Application, Fluentd, Storage]
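
A minimal sketch of the three-part event shown on this slide (time, tag, record), written in Python for illustration. The `Event` type and `format_event` helper are hypothetical names, not Fluentd APIs; they only mirror the layout printed on the slide.

```python
import json
from datetime import datetime
from typing import Any, Dict, NamedTuple


class Event(NamedTuple):
    """One log event: when it happened, where it is routed, and its payload."""
    time: datetime
    tag: str
    record: Dict[str, Any]


def format_event(event):
    """Render an event the way the slide prints it: time, tag, JSON record."""
    stamp = event.time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} {event.tag} {json.dumps(event.record)}"


# The example event from the slide.
event = Event(
    time=datetime(2012, 2, 4, 1, 33, 51),
    tag="myapp.buylog",
    record={"user": "me", "path": "/buyItem", "price": 150, "referer": "/landing"},
)
print(format_event(event))
```

Keeping the tag separate from the record is what lets routing decisions be made without inspecting the payload.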
  • 19. Fluentd: Plugins. [Diagram: Application, Fluentd (filter / buffer / routing), Storage]
  • 20. Fluentd: Plugins. [Diagram: Application, Fluentd (filter / buffer / routing), plug-ins for SaaS, Storage, and Fluentd]
  • 21. Fluentd: Plugins. [Diagram: Application, Fluentd (filter / buffer / routing), plug-ins for SaaS, Storage, and Fluentd]
  • 22. Fluentd: Plugins. [Diagram: plug-ins for syslogd, Scribe, Application, and File (tail); Fluentd (filter / buffer / routing); plug-ins for SaaS, Storage, and Fluentd]
  • 23. • Client libraries > Ruby > Perl > PHP > Python > Java > ... [Diagram: Application, Buffering, HTTP / TCP / UDS, Fluentd]
  • 24. • Client libraries > Ruby > Perl > PHP > Python > Java > ... [Diagram: Application, Buffering, HTTP / TCP / UDS, Fluentd] Fluent.open("myapp") Fluent.event("login", {"user"=>38}) #=> 2012-02-04 04:56:01 myapp.login {"user":38}
  • 25. Typical Log Collection by `rsync`. Burst of traffic: rsync consumes all bandwidth
  • 26. Typical Log Collection by `rsync`. [Diagram: three app servers, each with an Application writing Files; a Log server] Burst of traffic: rsync consumes all bandwidth. High latency: must wait for a day. Hard to analyze: complex text parsers
  • 27. Log Collection using Fluentd. Realtime! [Diagram: three Fluentd nodes feeding two Fluentd nodes]
  • 28. Log Collection using Fluentd. Realtime! Ready to Analyze! [Diagram: three Fluentd nodes feeding two Fluentd nodes; Amazon S3 / EMR, Hadoop / Hive, MongoDB]
  • 29. Fluentd Case Study ✓ 127 RoR servers ✓ 100,000 msgs/sec ✓ 120Mbps at peak ✓ 1TB/day [Diagram: Ruby on Rails servers, each with Fluentd; two Fluentd routing nodes; PV logs and user behavior logs stored in Hadoop / Hive and MongoDB]
  • 30. Fluentd configuration:
      # read logs from a file
      <source>
        type tail
        path /var/log/httpd.log
        format apache
        tag apache.access
      </source>

      # save access logs to MongoDB
      <match apache.access>
        type mongo
        host 127.0.0.1
      </match>

      # forward other logs to servers
      # (load-balancing + fail-over)
      <match **>
        type forward
        <server>
          host 192.168.0.11
          weight 20
        </server>
        <server>
          host 192.168.0.12
          weight 60
        </server>
      </match>
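
The routing in this configuration can be sketched as follows. This is a simplified Python model, not Fluentd's implementation. It assumes that `*` matches exactly one dot-separated tag part, that `**` matches zero or more parts, that `<match>` blocks are tried in the order they appear, and that forward weights split traffic proportionally. All function names are hypothetical.

```python
def _match(pattern_parts, tag_parts):
    """Recursively match dot-separated pattern parts against tag parts."""
    if not pattern_parts:
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # "**" may absorb zero or more tag parts.
        return any(_match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        return _match(rest, tag_parts[1:])
    return False


def tag_matches(pattern, tag):
    """Return True if the tag satisfies the match pattern."""
    return _match(pattern.split("."), tag.split("."))


# Match blocks in configuration order; the first matching pattern wins.
ROUTES = [("apache.access", "mongo"), ("**", "forward")]


def route(tag):
    """Return the output type that would receive an event with this tag."""
    for pattern, output in ROUTES:
        if tag_matches(pattern, tag):
            return output
    return None


# Weights from the two <server> blocks.
WEIGHTS = {"192.168.0.11": 20, "192.168.0.12": 60}


def traffic_share(weights):
    """Fraction of forwarded events each server receives, assuming proportional weights."""
    total = sum(weights.values())
    return {host: weight / total for host, weight in weights.items()}
```

Under these assumptions, access logs tagged `apache.access` go to MongoDB, every other tag falls through to the forward output, and the two forward servers receive events in a 1:3 ratio.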
  • 31. Comparison
  • 32. Scribe: log collector by Facebook. [Diagram: scribe on frontend servers, scribe on aggregator nodes, Hadoop HDFS]
  • 33. Scribe’s Pros & Cons • Pros. • Fast (written in C++) • Cons. • VERY HARD to install • nightmare of boost, thrift, libhdfs, etc. • Unstructured Logs • parsing is required before analysis • Hard to extend • recompiling C++ programs is required • No longer maintained
  • 34. Fluentd vs Scribe • Easy to install • “gem install fluentd” • Stable RPM and Deb packages • http://packages.treasure-data.com/ • Easy to write plugins • you can use Ruby • Easy plugin distribution • “gem search -rd fluent-plugin”
  • 35. Flume: distributed log collector by Cloudera. [Diagram: Flume Master, Physical Topology of Flume nodes, Logical Topology, Hadoop HDFS]
  • 36. Flume’s Pros & Cons • Pros. • Central master server manages all nodes • Cons. • Difficult to understand • logical topologies, physical servers, and a configuration of the logical/physical mapping • Difficult to configure • replicated master servers, log servers, and agents • Big footprint • 50,000 lines of Java
  • 37. Fluentd vs Flume • Easy to understand • “syslogd that understands JSON” • Easy to set up • “sudo fluentd --setup && fluentd” • Very small footprint • small engine (3,000 lines) + plugins • small, but battle-tested! • Easy to configure
  • 38. Comparison:
                            Fluentd               Scribe               Flume
      Installation          gem/rpm/deb           make                 jar/rpm/deb
      Footprint             3,000 lines of Ruby   8,000 lines of C++   50,000 lines of Java
      Plugin                Ruby                  N/A                  Java
      Plugin distribution   RubyGems.org          N/A                  N/A
      Master Server         No                    No                   Yes
      License               Apache License        Apache License       Apache License
  • 39. Fluentd Plugin for
  • 40. fluent-plugin-mongo • Included within rpm/deb by default! • http://github.com/fluent/fluent-plugin-mongo • #1 plugin among 50+ Fluentd plugins • Logs As JSON. WHY NOT Put Them Into Mongo?? • http://fluentd.org/plugin/ • Supports most of the MongoDB features • Authentication • ReplicaSet • Capped Collection
  • 41. • MongoDB Output Plugin • Maintain JSON Structure • Reliable Buffering • Batch Insertion • Handle Broken Records • Ruby Driver #82 [Diagram: Application, Fluentd (buffering), MongoDB targets: Single Instance (Capped or Not), Sharding, ReplicaSet, Authentication]
  • 42. • MongoDB Output Plugin • Maintain JSON Structure • Reliable Buffering • Batch Insertion • Handle Broken Records • Ruby Driver #82 [Diagram: Application, Fluentd (buffering), MongoDB targets: Single Instance (Capped or Not), Sharding, ReplicaSet, Authentication]
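
The buffering, batch insertion, and broken-record handling listed on this slide can be illustrated with a small Python sketch. It is not the plugin's code. The `BufferedOutput` class, the `insert_many` callable standing in for a MongoDB batch insert, and the definition of "broken" as "not JSON-serializable" are all assumptions made for the example.

```python
import json


class BufferedOutput:
    """Collect records in memory and write them to a sink in batches."""

    def __init__(self, insert_many, batch_size=3):
        self.insert_many = insert_many  # hypothetical stand-in for a batch insert
        self.batch_size = batch_size
        self.buffer = []
        self.broken = []  # records set aside instead of failing the whole batch

    def emit(self, record):
        """Buffer one record; flush once a full batch has accumulated."""
        try:
            json.dumps(record)  # assumption: "broken" means not JSON-serializable
        except (TypeError, ValueError):
            self.broken.append(record)
            return
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write the buffered records as one batch; keep them if the write fails."""
        if not self.buffer:
            return
        batch = list(self.buffer)
        try:
            self.insert_many(batch)
        except Exception:
            # Leave the records buffered so a later flush can retry them.
            return
        del self.buffer[:len(batch)]


# Usage with a plain list standing in for a collection.
collection = []
output = BufferedOutput(collection.extend, batch_size=2)
output.emit({"user": "me", "price": 150})
output.emit({"user": "you", "price": 200})
```

Records leave the buffer only after a successful write, so a failed insert is retried on the next flush rather than dropped.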
  • 43. • MongoDB Input Plugin • Tailing Capped Collections [Diagram: MongoDB ReplicaSet (Capped Collection), MongoDB Single Instance (Capped Collection), Authentication, Fluentd (buffering)]
  • 44. • MongoDB Input Plugin • Tailing Capped Collections [Diagram: MongoDB ReplicaSet (Capped Collection), MongoDB Single Instance (Capped Collection), Authentication, Fluentd (buffering)]
  • 45. Realtime Analytics with Fluentd + MongoDB. [Diagram: three Apps, each with Fluentd; two Fluentd routing nodes; MongoDB; query; Charting; Alert (Nagios, Zabbix, etc.)]
  • 46. Realtime or Batch? No, BOTH! [Diagram: three Apps, each with Fluentd; two Fluentd routing nodes; Hadoop / Hive (batch), Amazon S3 (archive), MongoDB (realtime); query; Charting]
  • 47. Intro of our company’s service: Treasure Data. [Diagram: three Apps, each with Fluentd; two Fluentd routing nodes; Treasure Data, a Hadoop-based Cloud Data Warehouse (batch); MongoDB (realtime)]
  • 48. Exercise: Apache Logs into MongoDB
  • 49. Log File
  • 52. Conclusion • Log Everything as JSON • Machine Readability • Schema Freeness • MongoDB fits perfectly as a Fluentd backend • Both use a JSON representation