Fluentd
The Event Collector Service

•   Structured logging
•   Pluggable architecture
•   Reliable forwarding

Sadayuki Furuhashi
Treasure Data, Inc.
@frsyuki
Fluentd in brief



It's like syslogd, but uses JSON for log messages
Fluentd :: format of logs

Application -> Fluentd -> Storage

    time:    2012-02-04 01:33:51
    tag:     myapp.buylog
    record:  {
               "user": "me",
               "path": "/buyItem",
               "price": 150,
               "referer": "/landing"
             }
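
The three parts of an event (time, tag, record) can be modeled directly. A minimal Ruby sketch, assuming a hypothetical `Event` struct and `format_event` helper (neither is part of Fluentd's API), that renders an event in the "time tag record" layout used on this slide:

```ruby
require "json"

# Hypothetical container for the three parts of an event (not Fluentd's API).
Event = Struct.new(:time, :tag, :record)

# Render an event as "time tag json-record", mirroring the slide's layout.
def format_event(event)
  stamp = event.time.strftime("%Y-%m-%d %H:%M:%S")
  "#{stamp} #{event.tag} #{JSON.generate(event.record)}"
end

event = Event.new(
  Time.utc(2012, 2, 4, 1, 33, 51),
  "myapp.buylog",
  { "user" => "me", "path" => "/buyItem", "price" => 150, "referer" => "/landing" }
)

puts format_event(event)
```

Because the record is already JSON, the storage side can read it back with `JSON.parse` and needs no custom text parser.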
Fluentd :: plugins

    Inputs (via plug-ins):   syslogd, Scribe, Application, File (tail)
                                 |
                                 v
    Fluentd:                 filter / buffer / routing
                                 |
                                 v
    Outputs (via plug-ins):  SaaS, Storage, Fluentd
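
The input -> filter / buffer / routing -> output flow can be sketched in a few lines of Ruby. This is an illustration only: `MiniEngine`, `add_filter`, `add_output`, and `emit` are hypothetical names, not Fluentd's plug-in API, and routing here is a plain tag-prefix check.

```ruby
# Hypothetical miniature of the input -> filter/buffer/routing -> output flow.
# Class and method names are illustrative only, not Fluentd's plug-in API.
class MiniEngine
  def initialize(flush_size:)
    @flush_size = flush_size
    @filters = []   # callables: record -> record, or nil to drop the event
    @outputs = []   # [tag_prefix, callable] pairs, checked in order
    @buffer = []    # events waiting to be flushed
  end

  def add_filter(&block)
    @filters << block
  end

  def add_output(tag_prefix, &block)
    @outputs << [tag_prefix, block]
  end

  # Input plug-ins would call this for every event they read.
  def emit(tag, record)
    @filters.each do |filter|
      record = filter.call(record)
      return if record.nil?
    end
    @buffer << [tag, record]
    flush if @buffer.size >= @flush_size
  end

  # Hand each buffered event to the first output whose prefix matches its tag.
  def flush
    pending, @buffer = @buffer, []
    pending.each do |tag, record|
      _, output = @outputs.find { |prefix, _| tag.start_with?(prefix) }
      output&.call(tag, record)
    end
  end
end

stored = []
engine = MiniEngine.new(flush_size: 2)
engine.add_filter { |record| record["price"].to_i.negative? ? nil : record }
engine.add_output("myapp.") { |tag, record| stored << [tag, record] }

engine.emit("myapp.buylog", { "price" => 150 })
engine.emit("myapp.buylog", { "price" => -1 })   # dropped by the filter
engine.emit("myapp.login", { "user" => 38 })     # second kept event triggers a flush
p stored
```

Keeping inputs, filters, and outputs as small callables is what makes each stage replaceable without touching the engine.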
Fluentd :: client libraries
•   Client libraries
    > Ruby
    > Perl
    > PHP
    > Python
    > Java
    > ...

Application -> Fluentd

Fluent.open("myapp")
Fluent.event("login", {"user"=>38})
#=> 2012-02-04 04:56:01 myapp.login   {"user":38}
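
The example implies that the name passed to `open` becomes a tag prefix and the label passed to `event` completes the tag. A minimal sketch of that behavior, assuming a hypothetical `MiniFluent` module that collects events in memory (it is not the real client library and sends nothing over the network):

```ruby
# Hypothetical stand-in for a client library; not the real Fluentd Ruby client.
module MiniFluent
  @prefix = nil
  @sink = []   # collects events in memory instead of sending them to Fluentd

  class << self
    attr_reader :sink

    # Remember the application name used as the tag prefix.
    def open(prefix)
      @prefix = prefix
    end

    # Build the full tag "<prefix>.<label>" and record the event.
    def event(label, record, time: Time.now)
      raise "call open first" if @prefix.nil?
      @sink << [time, "#{@prefix}.#{label}", record]
    end
  end
end

MiniFluent.open("myapp")
MiniFluent.event("login", { "user" => 38 })
_time, tag, record = MiniFluent.sink.last
puts "#{tag} #{record}"
```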
Typical architecture before Fluentd

    App servers:  Application -> File File File ...   (on each app server)
                       |
                       v
    Log server:   File

•   High latency
    >   must wait for a day
•   Hard to analyze
    >   complex text parsers
Architecture after Fluentd

    App servers:  Application -> Fluentd   (on each app server)
                       |
                       v   Realtime!
    Aggregators:  Fluentd, Fluentd
                       |
                       v   Ready to Analyze!
    Storage:      Hadoop / Hive, MongoDB, Amazon S3 / EMR
Case study

    Ruby on Rails servers:  Ruby on Rails -> Fluentd   (on each server)
                                 |
                                 v
    Routing:                Fluentd, Fluentd
                                 |
                                 v
    Storage:                PV logs -> Hadoop / Hive
                            User behavior logs -> MongoDB

✓ 127 RoR servers
✓ 70,000 msgs/sec
✓ 120Mbps at peak
✓ 650GB/day
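
The case-study figures can be cross-checked with simple arithmetic. The sketch below makes two assumptions that the slide does not state: gigabytes are decimal (10^9 bytes), and the 70,000 msgs/sec rate coincides with the 120Mbps peak.

```ruby
# Figures quoted on the slide.
bytes_per_day   = 650 * 10**9   # 650GB/day, assuming decimal gigabytes
peak_bits_per_s = 120 * 10**6   # 120Mbps at peak
msgs_per_s      = 70_000        # assumed here to be the peak message rate

seconds_per_day = 24 * 60 * 60

# Average bandwidth implied by the daily volume.
avg_mbps = bytes_per_day * 8.0 / seconds_per_day / 10**6

# Average message size if the peak bandwidth and message rate coincide.
bytes_per_msg = peak_bits_per_s / 8.0 / msgs_per_s

puts format("average throughput: %.1f Mbps", avg_mbps)
puts format("approx. message size at peak: %.0f bytes", bytes_per_msg)
```

Under these assumptions the daily volume averages about 60 Mbps, roughly half the quoted peak, and a message is a little over 200 bytes.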
# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
  type mongo
  host 127.0.0.1
</match>

# forward other logs to servers
# (load-balancing + fail-over)
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>
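
How this configuration routes an event can be sketched in Ruby. The sketch assumes that `*` matches exactly one tag part, that `**` matches zero or more parts, that the first matching `<match>` block wins, and that traffic is split in proportion to `weight`. The helper names `tag_match?` and `route` are hypothetical and are not Fluentd internals.

```ruby
# Does a dotted pattern match a dotted tag? Both are given as arrays of parts.
# Assumed semantics: "*" = exactly one part, "**" = zero or more parts.
def tag_match?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  case head
  when "**"
    # Try consuming 0, 1, 2, ... tag parts.
    (0..tag_parts.size).any? { |n| tag_match?(rest, tag_parts.drop(n)) }
  when "*"
    !tag_parts.empty? && tag_match?(rest, tag_parts.drop(1))
  else
    tag_parts.first == head && tag_match?(rest, tag_parts.drop(1))
  end
end

# Return the output type of the first pattern that matches the tag.
def route(tag, routes)
  _, output = routes.find do |pattern, _|
    tag_match?(pattern.split("."), tag.split("."))
  end
  output
end

# The two <match> blocks from the configuration, in order.
routes = [["apache.access", "mongo"], ["**", "forward"]]

puts route("apache.access", routes)   # handled by the MongoDB block
puts route("myapp.login", routes)     # falls through to the forward block

# Expected share of forwarded traffic per server, proportional to weight.
weights = { "192.168.0.11" => 20, "192.168.0.12" => 60 }
total = weights.values.sum
shares = weights.transform_values { |w| w.fdiv(total) }
p shares
```

With weights 20 and 60, the second server would receive three quarters of the forwarded events under the proportional-split assumption.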
Scribe
Scribe: log collector by Facebook

    Frontend servers (scribe) -> Aggregator nodes (scribe) -> Hadoop HDFS
Scribe’s Pros & Cons
•   Pros.
    >   Fast (C++)

•   Cons.
    >   VERY hard to install
    >   Deals with unstructured logs
          you must parse logs before analyzing them
    >   Hard to extend
          you must re-compile C++ programs
    >   No longer maintained?
Fluentd vs Scribe
•   Easy to install
    >   “gem install fluentd”
    >   stable RPM and DEB packages
          http://packages.treasure-data.com/

•   Easy to write plugins
    >   you can use Ruby

•   Easy to distribute plugins
    >   “gem search -rd fluent-plugin”
Flume
Flume: distributed log collector by Cloudera

    Physical topology:  Flume Master + Flume, Flume, Flume
    Logical topology:   data flows into Hadoop HDFS
Flume’s Pros & Cons
•   Pros.
    >   Central master server manages all nodes

•   Cons.
    >   Difficult to understand
          logical topologies, physical servers, and a configuration of
          the logical/physical mapping
    >   Difficult to configure
          replicated master servers, log servers, and agents
    >   Big footprint
          50,000 lines of Java code
Fluentd vs Flume
•   Easy to understand
    >   “syslogd that understands JSON”

•   Easy to set up
    >   “sudo fluentd --setup && fluentd”

•   Very small footprint
    >   small engine (3,000 lines) + plugins

•   Easy to configure
Fluentd vs Scribe/Flume

                      Fluentd               Scribe               Flume
Installation          gem/rpm/deb           make                 rpm/deb
Footprint             3,000 lines of Ruby   8,000 lines of C++   50,000 lines of Java
Plugin                Ruby                  N/A                  Java
Plugin distribution   RubyGems.org          N/A                  N/A
Master server         No                    No                   Yes
License               Apache License        Apache License       Apache License
Fluentd
•   Documents
    >   http://fluentd.org

•   Source code
    >   http://github.com/fluent
    >   14 committers across
        many organizations

•   Mailing list
    >   Google groups
•   Sadayuki Furuhashi
    >   twitter: @frsyuki

•   Treasure Data, Inc.
    >   Software Engineer; founder

•   Author of MessagePack

•   Author of Fluentd

Fluentd meetup
