Fluentd ♥ MongoDB
                          Log Everything As JSON


                 Kazuki Ohta, CTO at Treasure Data, Inc.




Tuesday, July 17, 2012
Self-Introduction
           •       Kazuki Ohta
                   >     twitter: @kzk_mover
                   >     github: kzk

           •       Treasure Data, Inc.
                   >     Chief Technology Officer; Founder
                   >     Original Fluentd Author @frsyuki is another co-founder.

           •       Open-Source Enthusiast
                   >     KDE, uim, Hadoop, memcached, Mozilla, Mongo, etc.
                   >     Fluentd rpm/deb package manager
Logging? Why?




Figure 1: Common Logging Purposes




                                                  Analytics

                                                  Error Notification

                                                  Recommendation


Figure 2: Types of Logs




                                           App Log

                                           Access Log
                                           (Apache, Rails, etc.)
                                           System Log
                                           (syslog etc.)
                                           Others
                         Fragile to format changes,
                         No type information,
                         No field names, etc.

                         From “Scaling Lessons learned at Dropbox”
About Fluentd




It's like syslogd, but uses JSON for log messages
Logs in JSON? Why?

                     1. Machine-Readable
                     > machines are going to be the main consumers of logs

                     2. Schema-Free
                     > you want to add/remove fields from logs at any time

    Write Logs for Machines, use JSON
    http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
Logs As TEXT
         “2011-04-01 host1 myapp: cmessage size=12MB user=me”

Logs As JSON
         2011-04-01 myapp.message {
             "on_host": "host1",
             "combined": true,
             "size": 12000000,
             "user": "me"
         }

         + Field Name
         + No Custom Parser
         + Type Information
         + Schema Free
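The difference between the two log styles can be sketched in a few lines of Ruby (standard library only). The regular expression and the helper name `parse_text_log` are illustrative assumptions, not part of Fluentd; the two sample lines come from the slide.

```ruby
require "json"

# Hypothetical hand-written parser for the text line on the slide.
# It stops matching as soon as the format changes, and every value
# comes back as a String.
TEXT_LINE = "2011-04-01 host1 myapp: cmessage size=12MB user=me"
TEXT_PATTERN = /\A(\S+) (\S+) (\S+): (\S+) size=(\S+) user=(\S+)\z/

def parse_text_log(line)
  m = TEXT_PATTERN.match(line)
  return nil unless m
  { "date" => m[1], "host" => m[2], "app" => m[3],
    "message" => m[4], "size" => m[5], "user" => m[6] }
end

# The JSON record from the slide needs no custom parser and keeps its types.
JSON_RECORD = '{"on_host": "host1", "combined": true, "size": 12000000, "user": "me"}'

text = parse_text_log(TEXT_LINE)
json = JSON.parse(JSON_RECORD)

puts text["size"].class   # the text value is still a string such as "12MB"
puts json["size"].class   # the JSON value is already a number
puts json["combined"]     # booleans survive as booleans
```

Adding a field to the JSON record changes nothing for the reader of the log; adding one to the text line means editing `TEXT_PATTERN`.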
http://fluentd.org/




           •       Website
                   >     http://fluentd.org/

           •       Community
                   >     http://github.com/fluent
                   >     16 committers across many organizations
                   >     web, game, enterprise

           •       Mailing list
                   >     Google groups
Fluentd Architecture




Fluentd: Log Format

                         Application -> Fluentd -> Storage

                         time:    2012-02-04 01:33:51
                         tag:     myapp.buylog
                         record:  {
                                      "user": "me",
                                      "path": "/buyItem",
                                      "price": 150,
                                      "referer": "/landing"
                                  }
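As a rough illustration of the three parts of an event (time, tag, record), here is a minimal Ruby sketch. The `Event` struct and its `to_line` method are hypothetical names for this example only; Fluentd's internal representation differs.

```ruby
require "json"

# Hypothetical value object for illustration; not Fluentd's own class.
Event = Struct.new(:time, :tag, :record) do
  # Render the event the way the slide prints it: "time tag record-as-JSON".
  def to_line
    "#{time.strftime('%Y-%m-%d %H:%M:%S')} #{tag} #{JSON.generate(record)}"
  end
end

event = Event.new(
  Time.utc(2012, 2, 4, 1, 33, 51),
  "myapp.buylog",
  { "user" => "me", "path" => "/buyItem", "price" => 150, "referer" => "/landing" }
)

puts event.to_line
```

The tag is what routing decisions are based on; the record is an ordinary hash, so fields can be added or removed without touching the transport.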
Fluentd: Plugins

             Inputs:    syslogd (Plug-in), Scribe (Plug-in), Application, File (tail Plug-in)
             Fluentd:   filter / buffer / routing
             Outputs:   SaaS (Plug-in), Storage (Plug-in), Fluentd (Plug-in)
           •       Client libraries
                   > Ruby
                   > Perl
                   > PHP
                   > Python
                   > Java
                   > ...

            Application (Buffering) -> HTTP / TCP / UDS -> Fluentd

            Fluent.open("myapp")
            Fluent.event("login", {"user"=>38})
            #=> 2012-02-04 04:56:01 myapp.login    {"user":38}
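The two calls on the slide are shorthand, so the following is only a sketch of what such a client does: it joins a tag prefix and a label into a tag, stamps the event with a time, and serializes the record as JSON. `TinyLogger` is a hypothetical class, and it writes to any IO object rather than speaking HTTP, TCP, or UDS to a Fluentd process.

```ruby
require "json"
require "stringio"

# Hypothetical stand-in for a Fluentd client library: it only shows how a
# tag prefix, a label, and a record combine into one event line.
class TinyLogger
  def initialize(tag_prefix, out: $stdout, clock: -> { Time.now.utc })
    @tag_prefix = tag_prefix
    @out = out
    @clock = clock
  end

  # Emit one event: "<time> <prefix>.<label> <record as JSON>".
  def event(label, record)
    time = @clock.call.strftime("%Y-%m-%d %H:%M:%S")
    @out.puts("#{time} #{@tag_prefix}.#{label} #{JSON.generate(record)}")
  end
end

buffer = StringIO.new
fixed_clock = -> { Time.utc(2012, 2, 4, 4, 56, 1) }   # fixed time so the output is repeatable
logger = TinyLogger.new("myapp", out: buffer, clock: fixed_clock)
logger.event("login", { "user" => 38 })
puts buffer.string
```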
Typical Log Collection by `rsync`

                     App server:   Application -> File File File ...
                     App server:   Application -> File File File ...
                     App server:   Application -> File File File ...
                                        |
                                        v
                     Log server:   File

               Burst of traffic:   rsync consumes all bandwidth
               High latency:       must wait for a day
               Hard to analyze:    complex text parsers
Log Collection using Fluentd

                         Fluentd        Fluentd          Fluentd

                                   Fluentd   Fluentd               Realtime!

                         Hadoop / Hive   MongoDB   Amazon S3 / EMR   Ready to Analyze!
Fluentd Case Study

               Ruby on Rails              Ruby on Rails          Ruby on Rails
                  Fluentd                    Fluentd                Fluentd

                              Fluentd    Fluentd      (routing)

               PV logs            -> Hadoop / Hive
               User behavior logs -> MongoDB

      ✓    127 RoR servers
      ✓    100,000 msgs/sec
      ✓    120Mbps at peak
      ✓    1TB/day
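A quick back-of-the-envelope check of the case-study figures, as a Ruby sketch. It assumes decimal units (1 Mbps = 1,000,000 bits/s, 1 TB = 10^12 bytes) and that the message rate and the bandwidth figure describe the same peak; neither assumption is stated on the slide.

```ruby
# Figures from the slide; the unit conventions are assumptions.
MSGS_PER_SEC = 100_000
PEAK_BITS_PER_SEC = 120 * 1_000_000
BYTES_PER_DAY = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

peak_bytes_per_sec = PEAK_BITS_PER_SEC / 8                  # bits to bytes
bytes_per_msg = peak_bytes_per_sec / MSGS_PER_SEC           # average message size at peak
avg_bytes_per_sec = BYTES_PER_DAY.to_f / SECONDS_PER_DAY    # daily volume spread over a day

puts "peak throughput: #{peak_bytes_per_sec / 1_000_000} MB/s"
puts "message size at peak: about #{bytes_per_msg} bytes"
puts "daily average: #{(avg_bytes_per_sec / 1_000_000).round(1)} MB/s"
```

Under these assumptions the numbers are mutually consistent: small JSON messages of roughly 150 bytes, and a daily average somewhat below the peak rate.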
      # read logs from a file
      <source>
        type tail
        path /var/log/httpd.log
        format apache
        tag apache.access
      </source>

      # save access logs to MongoDB
      <match apache.access>
        type mongo
        host 127.0.0.1
      </match>

      # forward other logs to servers
      # (load-balancing + fail-over)
      <match **>
        type forward
        <server>
          host 192.168.0.11
          weight 20
        </server>
        <server>
          host 192.168.0.12
          weight 60
        </server>
      </match>
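Fluentd checks `<match>` blocks in the order they appear and hands each event to the first block whose pattern matches the event's tag. That is why `apache.access` events go to MongoDB while everything else is forwarded. The Ruby sketch below imitates this with a simplified matcher (`*` for exactly one tag part, `**` for zero or more parts). The function names are hypothetical and this is not Fluentd's actual implementation.

```ruby
# Simplified tag matcher: "*" matches exactly one tag part,
# "**" matches zero or more parts. Hypothetical helper, not Fluentd's code.
def tag_match?(pattern, tag)
  match_parts(pattern.split("."), tag.split("."))
end

def match_parts(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  case head
  when "**"
    # Try letting "**" swallow 0, 1, 2, ... tag parts.
    (0..tag_parts.length).any? { |n| match_parts(rest, tag_parts.drop(n)) }
  when "*"
    !tag_parts.empty? && match_parts(rest, tag_parts.drop(1))
  else
    tag_parts.first == head && match_parts(rest, tag_parts.drop(1))
  end
end

# Rules in the same order as the <match> blocks in the configuration.
RULES = [
  ["apache.access", "mongo"],
  ["**", "forward"]
].freeze

# Return the output type of the first rule whose pattern matches the tag.
def route(tag, rules = RULES)
  rule = rules.find { |pattern, _type| tag_match?(pattern, tag) }
  rule && rule[1]
end

puts route("apache.access")
puts route("myapp.login")
```

Because the first match wins, putting the catch-all `**` rule first would send access logs to the forwarders as well, so specific patterns belong above general ones.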
Comparison




Scribe: log collector by Facebook

                         Frontend servers (scribe) -> Aggregator nodes (scribe) -> Hadoop HDFS
Scribe’s Pros & Cons
                • Pros.
                         • Fast (written in C++)
                • Cons.
                         • VERY HARD to install
                            • nightmare of boost, thrift, libhdfs, etc.
                         • Unstructured Logs
                            • logs must be parsed before analysis
                         • Hard to extend
                            • requires recompiling C++ programs
                         • No longer maintained
Fluentd vs Scribe
                • Easy to install
                         • “gem install fluentd”
                         • Stable RPM and Deb packages
                           • http://packages.treasure-data.com/
                • Easy to write plugins
                         • you can use Ruby
                • Easy plugin distribution
                         • “gem search -rd fluent-plugin”


Flume: distributed log collector by Cloudera

           Physical Topology:   Flume Master
                                Flume      Flume       Flume

           Logical Topology:    Hadoop HDFS
Flume’s Pros & Cons
                • Pros.
                         • Central master server manages all nodes
                • Cons.
                         • Difficult to understand
                            • logical topologies, physical servers and a
                              configuration of the logical/physical mapping
                         • Difficult to configure
                            • replicated master servers, log servers and agents
                         • Big footprint
                            • 50,000 lines of Java
Fluentd vs Flume
                 • Easy to understand
                         • “syslogd that understands JSON”
                 • Easy to set up
                         • “sudo fluentd --setup && fluentd”
                 • Very small footprint
                         • small engine (3,000 lines) + plugins
                         • small, but battle-tested!
                 • Easy to configure
                                Fluentd               Scribe               Flume
          Installation          gem/rpm/deb           make                 jar/rpm/deb
          Footprint             3,000 lines of Ruby   8,000 lines of C++   50,000 lines of Java
          Plugin                Ruby                  N/A                  Java
          Plugin distribution   RubyGems.org          N/A                  N/A
          Master Server         No                    No                   Yes
          License               Apache License        Apache License       Apache License
Fluentd Plugin for MongoDB
fluent-plugin-mongo
                • Included within rpm/deb by default!
                         • http://github.com/fluent/fluent-plugin-mongo
                • #1 plugin among 50+ Fluentd plugins
                         • Logs As JSON. WHY NOT Put Them Into Mongo??
                         • http://fluentd.org/plugin/
                • Supports most of the MongoDB features
                         • Authentication
                         • ReplicaSet
                         • Capped Collection

• MongoDB Output Plugin
                         • Maintain JSON Structure
                         • Reliable Buffering
                         • Batch Insertion
                         • Handle Broken Records
                            • Ruby Driver #82

                     Application -> Fluentd (Buffering) -> Authentication -> MongoDB

                     MongoDB: Single Instance (Capped or Not), ReplicaSet, or Sharding
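Buffering and batch insertion can be sketched without MongoDB at all. In the Ruby example below, `BatchBuffer` is a hypothetical class and the block passed to it stands in for a bulk insert; the plugin's real buffering (chunks, retry policy) is more elaborate than this.

```ruby
# Hypothetical buffer for illustration: collect records, hand them to a
# sink in batches, and keep them if the sink keeps failing.
class BatchBuffer
  def initialize(batch_size:, max_retries: 3, &sink)
    @batch_size = batch_size
    @max_retries = max_retries
    @sink = sink          # called with an Array of records (stand-in for a bulk insert)
    @pending = []
  end

  def emit(record)
    @pending << record
    flush if @pending.size >= @batch_size
  end

  # Send everything buffered as one batch; records stay buffered if all retries fail.
  def flush
    return if @pending.empty?
    attempts = 0
    begin
      attempts += 1
      @sink.call(@pending.dup)
      @pending.clear
    rescue StandardError
      retry if attempts < @max_retries
      raise
    end
  end
end

inserted = []
failures = 1   # simulate one transient outage of the storage backend
buffer = BatchBuffer.new(batch_size: 2) do |batch|
  if failures > 0
    failures -= 1
    raise IOError, "temporary outage"
  end
  inserted << batch
end

buffer.emit({ "user" => "me", "price" => 150 })
buffer.emit({ "user" => "you", "price" => 80 })   # second record triggers a flush
buffer.emit({ "user" => "me", "price" => 20 })
buffer.flush                                      # flush the remainder explicitly
p inserted
```

The application only ever calls `emit`; the outage is absorbed by the retry, and the storage backend sees two batches instead of three single inserts.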
• MongoDB Input Plugin
                         • Tailing Capped Collections

                     MongoDB -> Authentication -> Fluentd (Buffering)

                     MongoDB: Single Instance (Capped Collection) or ReplicaSet (Capped Collection)
Realtime Analytics with Fluentd + MongoDB

                          App                    App                 App
                         Fluentd             Fluentd                Fluentd

                                   Fluentd         Fluentd      (routing)

                         -> Alert (Nagios, Zabbix, etc.)
                         -> MongoDB -> query -> Charting
Realtime or Batch? No, BOTH!

                          App                    App                 App
                         Fluentd             Fluentd                Fluentd

                                   Fluentd         Fluentd      (routing)

                         -> Hadoop / Hive (batch)
                         -> Amazon S3 (archive)
                         -> MongoDB (realtime) -> query -> Charting
Intro of our company’s service: Treasure Data

                          App                    App                 App
                         Fluentd             Fluentd                Fluentd

                                   Fluentd         Fluentd      (routing)

                         -> Treasure Data (batch): Hadoop-based Cloud Data Warehouse
                         -> MongoDB (realtime)
Exercise: Apache Logs into MongoDB




Log File
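The exercise slides themselves did not survive as text, so the following is only a sketch of the first step: turning one Apache access log line into a JSON-ready record. The regular expression is loosely modeled on the Apache combined log format, the helper name `parse_apache_line` is hypothetical, and the sample line is made up rather than taken from the talk.

```ruby
require "json"
require "time"

# Loosely modeled on the Apache combined log format; an assumption for illustration.
APACHE_PATTERN = /\A(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) \S+" (?<code>\d{3}) (?<size>\d+|-)(?: "(?<referer>[^"]*)" "(?<agent>[^"]*)")?\z/

# Return a record Hash for one log line, or nil if the line does not match.
def parse_apache_line(line)
  m = APACHE_PATTERN.match(line.chomp)
  return nil unless m
  {
    "time"    => Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z"),
    "host"    => m[:host],
    "user"    => m[:user],
    "method"  => m[:method],
    "path"    => m[:path],
    "code"    => m[:code].to_i,
    "size"    => m[:size] == "-" ? nil : m[:size].to_i,
    "referer" => m[:referer],
    "agent"   => m[:agent]
  }
end

# Made-up sample line, not taken from the talk.
line = '127.0.0.1 - - [17/Jul/2012:10:00:00 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"'
record = parse_apache_line(line)
puts JSON.generate(record.reject { |key, _value| key == "time" })
```

With the `format apache` setting in a `tail` source, this parsing is done by Fluentd itself, so the resulting records can be stored in MongoDB without a custom parser.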




Conclusion
                • Log Everything as JSON
                         • Machine Readability
                         • Schema Freeness
                • MongoDB fits into Fluentd’s backend perfectly
                         • Both using JSON representation





More Related Content

What's hot

Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David AndersonVerverica
 
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정PgDay.Seoul
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineSunil Nagaraj
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022HostedbyConfluent
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producerconfluent
 
Webinar: PostgreSQL continuous backup and PITR with Barman
Webinar: PostgreSQL continuous backup and PITR with BarmanWebinar: PostgreSQL continuous backup and PITR with Barman
Webinar: PostgreSQL continuous backup and PITR with BarmanGabriele Bartolini
 
Google BigQuery Best Practices
Google BigQuery Best PracticesGoogle BigQuery Best Practices
Google BigQuery Best PracticesMatillion
 
Kong, Keyrock, Keycloak, i4Trust - Options to Secure FIWARE in Production
Kong, Keyrock, Keycloak, i4Trust - Options to Secure FIWARE in ProductionKong, Keyrock, Keycloak, i4Trust - Options to Secure FIWARE in Production
Kong, Keyrock, Keycloak, i4Trust - Options to Secure FIWARE in ProductionFIWARE
 
Apache NiFi Record Processing
Apache NiFi Record ProcessingApache NiFi Record Processing
Apache NiFi Record ProcessingBryan Bende
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data SolutionsGuido Schmutz
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementationSimon Su
 
An Enterprise Architect's View of MongoDB
An Enterprise Architect's View of MongoDBAn Enterprise Architect's View of MongoDB
An Enterprise Architect's View of MongoDBMongoDB
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...Andrew Lamb
 
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017Amazon Web Services
 

What's hot (20)

Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David Anderson
 
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture Pipeline
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Webinar: PostgreSQL continuous backup and PITR with Barman
Webinar: PostgreSQL continuous backup and PITR with BarmanWebinar: PostgreSQL continuous backup and PITR with Barman
Webinar: PostgreSQL continuous backup and PITR with Barman
 
Google BigQuery Best Practices
Google BigQuery Best PracticesGoogle BigQuery Best Practices
Google BigQuery Best Practices
 
Kong, Keyrock, Keycloak, i4Trust - Options to Secure FIWARE in Production
Kong, Keyrock, Keycloak, i4Trust - Options to Secure FIWARE in ProductionKong, Keyrock, Keycloak, i4Trust - Options to Secure FIWARE in Production
Kong, Keyrock, Keycloak, i4Trust - Options to Secure FIWARE in Production
 
The basics of fluentd
The basics of fluentdThe basics of fluentd
The basics of fluentd
 
Apache NiFi Record Processing
Apache NiFi Record ProcessingApache NiFi Record Processing
Apache NiFi Record Processing
 
Architecture of Big Data Solutions
Architecture of Big Data SolutionsArchitecture of Big Data Solutions
Architecture of Big Data Solutions
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementation
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
An Enterprise Architect's View of MongoDB
An Enterprise Architect's View of MongoDBAn Enterprise Architect's View of MongoDB
An Enterprise Architect's View of MongoDB
 
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
2022-06-23 Apache Arrow and DataFusion_ Changing the Game for implementing Da...
 
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
Best Practices for Running PostgreSQL on AWS - DAT314 - re:Invent 2017
 

Similar to Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012

Fluentd: the missing log collector
Fluentd: the missing log collectorFluentd: the missing log collector
Fluentd: the missing log collectortd_kiyoto
 
Symfony2 and MongoDB
Symfony2 and MongoDBSymfony2 and MongoDB
Symfony2 and MongoDBPablo Godel
 
Symfony2 y MongoDB - deSymfony 2012
Symfony2 y MongoDB - deSymfony 2012Symfony2 y MongoDB - deSymfony 2012

Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012

  • 1. Fluentd ♥ MongoDB Log Everything As JSON Kazuki Ohta, CTO at Treasure Data, Inc. Tuesday, July 17, 2012
  • 2. Self-Introduction
        • Kazuki Ohta
            > twitter: @kzk_mover
            > github: kzk
        • Treasure Data, Inc.
            > Chief Technology Officer; Founder
            > Original Fluentd author @frsyuki is another co-founder.
        • Open-Source Enthusiast
            > KDE, uim, Hadoop, memcached, Mozilla, Mongo, etc.
            > Fluentd rpm/deb package manager
  • 4. Figure 1: Common Logging Purposes
        Analytics, Error Notification, Recommendation
  • 5. Figure 2: Types of Logs
        App Log; Access Log (Apache, Rails, etc.); System Log (syslog etc.); Others
  • 7. Fragile under format changes, no type information, no field names, etc.
        From “Scaling Lessons learned at Dropbox”
  • 9. It's like syslogd, but uses JSON for log messages
  • 10. Logs in JSON? Why?
        1. Machine-Readable
            > machines are going to be the main consumers of logs
        2. Schema-Free
            > you want to add/remove fields from logs at any time
        Write Logs for Machines, use JSON: http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
  • 12. Logs As TEXT vs. Logs As JSON
        Logs As TEXT: "2011-04-01 host1 myapp: cmessage size=12MB user=me"
        Logs As JSON: 2011-04-01 myapp.message { "on_host": "host1", "combined": true, "size": 12000000, "user": "me" }
        Logs As JSON: + Field Name, + No Custom Parser, + Type Information, + Schema Free
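The contrast on this slide can be sketched in a few lines of Ruby. This is only an illustration under assumptions: the regular expression, the constant names, and the helper methods are hypothetical and written for this one sample line; only the two log lines come from the slide.

```ruby
require "json"

# Sample lines taken from the slide.
TEXT_LINE = "2011-04-01 host1 myapp: cmessage size=12MB user=me"
JSON_LINE = '{"on_host": "host1", "combined": true, "size": 12000000, "user": "me"}'

# Text log: a hand-written pattern (hypothetical) tied to one exact layout.
# Every captured value comes back as a String.
TEXT_PATTERN = /\A(?<date>\S+) (?<host>\S+) (?<app>\w+): (?<kind>\w+) size=(?<size>\S+) user=(?<user>\S+)\z/

def parse_text(line)
  m = TEXT_PATTERN.match(line)
  return nil unless m # any change to the layout makes the parser fail
  m.named_captures
end

# JSON log: the standard parser recovers field names and value types.
def parse_json(line)
  JSON.parse(line)
end

text_record = parse_text(TEXT_LINE)
json_record = parse_json(JSON_LINE)
```

With the text line, `size` is the string "12MB" and still needs unit handling; with the JSON line, `size` is already an Integer and `combined` a boolean, and adding a field needs no parser change.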
  • 13. http://fluentd.org/
  • 14. Website
            > http://fluentd.org/
        • Community
            > http://github.com/fluent
            > 16 committers across many organizations
            > web, game, enterprise
        • Mailing list
            > Google groups
  • 18. Fluentd: Log Format (Application → Fluentd → Storage)
        time:   2012-02-04 01:33:51
        tag:    myapp.buylog
        record: { "user": "me", "path": "/buyItem", "price": 150, "referer": "/landing" }
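As a rough illustration of the time / tag / record layout, the following Ruby sketch models one log entry as a three-field value. The `Event` struct and its `to_line` method are hypothetical names chosen here, not part of Fluentd; the sample values are the ones on the slide.

```ruby
require "json"

# Hypothetical value object; the three fields follow the slide's labels.
Event = Struct.new(:time, :tag, :record) do
  # Render in the "time tag record" layout shown on the slide.
  def to_line
    "#{time.strftime('%Y-%m-%d %H:%M:%S')} #{tag} #{JSON.generate(record)}"
  end
end

event = Event.new(
  Time.utc(2012, 2, 4, 1, 33, 51),
  "myapp.buylog",
  { "user" => "me", "path" => "/buyItem", "price" => 150, "referer" => "/landing" }
)

line = event.to_line
```

Keeping the record as a hash rather than a preformatted string is what lets later stages route on the tag and store the fields without re-parsing.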
  • 22. Fluentd: Plugins
        Input plug-ins: syslogd, Scribe, Application, File (tail)
        Fluentd: filter / buffer / routing
        Output plug-ins: SaaS, Storage, Fluentd
  • 24. Client libraries
            > Ruby
            > Perl
            > PHP
            > Python
            > Java
            > ...
        Application (Buffering) → HTTP / TCP / UDS → Fluentd
        Fluent.open("myapp")
        Fluent.event("login", {"user"=>38})
        #=> 2012-02-04 04:56:01 myapp.login {"user":38}
  • 26. Typical Log Collection by `rsync`
        App servers (Application → Files) → Log server
        • Burst of traffic: rsync consumes all bandwidth
        • High latency: must wait for a day
        • Hard to analyze: complex text parsers
  • 28. Log Collection using Fluentd
        Fluentd → Fluentd (Realtime!) → Amazon S3 / EMR, Hadoop / Hive, Mongo DB (Ready to Analyze!)
  • 29. Fluentd Case Study
        Ruby on Rails → Fluentd → Fluentd (routing) → Hadoop / Hive, Mongo DB (PV logs, user behavior logs)
        ✓ 127 RoR servers
        ✓ 100,000 msgs/sec
        ✓ 120Mbps at peak
        ✓ 1TB/day
  • 30. # read logs from a file
        <source>
          type tail
          path /var/log/httpd.log
          format apache
          tag apache.access
        </source>

        # save access logs to MongoDB
        <match apache.access>
          type mongo
          host 127.0.0.1
        </match>

        # forward other logs to servers
        # (load-balancing + fail-over)
        <match **>
          type forward
          <server>
            host 192.168.0.11
            weight 20
          </server>
          <server>
            host 192.168.0.12
            weight 60
          </server>
        </match>
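The two `<match>` patterns in this configuration can be read as a small routing table keyed on the tag. The Ruby sketch below is a simplified model, not Fluentd's own code: it assumes that `*` stands for exactly one dot-separated tag part, that `**` stands for zero or more parts, and that the first matching rule wins. `tag_match?`, `route`, and `RULES` are hypothetical names.

```ruby
# Does a pattern (split on ".") match a tag (split on ".")?
# Assumption: "*" matches exactly one part, "**" matches zero or more parts.
def tag_match?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  if head == "**"
    # Try letting "**" absorb 0, 1, 2, ... leading tag parts.
    (0..tag_parts.length).any? { |n| tag_match?(rest, tag_parts.drop(n)) }
  elsif tag_parts.empty?
    false
  else
    (head == "*" || head == tag_parts.first) && tag_match?(rest, tag_parts.drop(1))
  end
end

# Rules in the order used by the slide's configuration; first match wins.
RULES = [
  ["apache.access", :mongo],
  ["**", :forward]
].freeze

def route(tag)
  rule = RULES.find { |pattern, _output| tag_match?(pattern.split("."), tag.split(".")) }
  rule && rule.last
end

access_output = route("apache.access")
other_output = route("myapp.login")
```

Under these assumptions the order of the rules matters: if the catch-all `**` rule came first, the `apache.access` rule would never be reached.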
  • 32. Scribe: log collector by Facebook
        Frontend servers (scribe) → Aggregator nodes (scribe) → Hadoop HDFS
  • 33. Scribe’s Pros & Cons
        • Pros
            > Fast (written in C++)
        • Cons
            > VERY HARD to install: nightmare of boost, thrift, libhdfs, etc.
            > Unstructured logs: parsing is required before analysis
            > Hard to extend: requires recompiling C++ programs
            > No longer maintained
  • 34. Fluentd vs Scribe
        • Easy to install
            > “gem install fluentd”
            > Stable RPM and Deb packages: http://packages.treasure-data.com/
        • Easy to write plugins
            > you can use Ruby
        • Easy plugin distribution
            > “gem search -rd fluent-plugin”
  • 35. Flume: distributed log collector by Cloudera
        Diagram: Flume Master and Flume nodes (physical topology, logical topology) → Hadoop HDFS
  • 36. Flume’s Pros & Cons
        • Pros
            > Central master server manages all nodes
        • Cons
            > Difficult to understand: logical topologies, physical servers, and the configuration of the logical/physical mapping
            > Difficult to configure: replicated master servers, log servers, and agents
            > Big footprint: 50,000 lines of Java
  • 37. Fluentd vs Flume
        • Easy to understand
            > “syslogd that understands JSON”
        • Easy to set up
            > “sudo fluentd --setup && fluentd”
        • Very small footprint
            > small engine (3,000 lines) + plugins
            > small, but battle-tested!
        • Easy to configure
  • 38. Fluentd vs Scribe vs Flume
                               Fluentd               Scribe               Flume
        Installation           gem/rpm/deb           make                 jar/rpm/deb
        Footprint              3,000 lines of Ruby   8,000 lines of C++   50,000 lines of Java
        Plugin                 Ruby                  N/A                  Java
        Plugin distribution    RubyGems.org          N/A                  N/A
        Master server          No                    No                   Yes
        License                Apache License        Apache License       Apache License
  • 40. fluent-plugin-mongo
        • Included within rpm/deb by default!
            > http://github.com/fluent/fluent-plugin-mongo
        • #1 plugin among 50+ Fluentd plugins
            > Logs As JSON. WHY NOT Put Them Into Mongo??
            > http://fluentd.org/plugin/
        • Supports most of the MongoDB features
            > Authentication
            > ReplicaSet
            > Capped Collection
  • 42. MongoDB Output Plugin
        • Maintain JSON Structure
        • Reliable Buffering
        • Batch Insertion
        • Handle Broken Records
            > Ruby Driver #82
        Diagram: Application → Fluentd (Buffering) → Authentication → MongoDB as Single Instance (Capped or Not), Sharding, or ReplicaSet
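The buffering, batch insertion, and broken-record handling listed on this slide could look roughly like the sketch below. It is an illustration only, not the plugin's actual implementation: `FakeCollection` is a hypothetical in-memory stand-in for a MongoDB collection, and `BatchBuffer`, its batch size, and its retry policy are assumptions made for the example.

```ruby
# Hypothetical in-memory stand-in for a MongoDB collection.
# It rejects any batch that contains a record that is not a Hash.
class FakeCollection
  attr_reader :documents

  def initialize
    @documents = []
  end

  def insert_many(records)
    raise ArgumentError, "broken record in batch" unless records.all? { |r| r.is_a?(Hash) }
    @documents.concat(records)
  end
end

# Hypothetical buffer: collects records and writes them in batches.
class BatchBuffer
  attr_reader :broken

  def initialize(collection, batch_size)
    @collection = collection
    @batch_size = batch_size
    @pending = []
    @broken = []
  end

  def emit(record)
    @pending << record
    flush if @pending.size >= @batch_size
  end

  def flush
    return if @pending.empty?
    batch, @pending = @pending, []
    begin
      @collection.insert_many(batch) # one round trip for the whole batch
    rescue ArgumentError
      # Retry one by one so a single broken record does not sink the batch.
      batch.each do |record|
        begin
          @collection.insert_many([record])
        rescue ArgumentError
          @broken << record # set aside for later inspection
        end
      end
    end
  end
end

collection = FakeCollection.new
buffer = BatchBuffer.new(collection, 3)
[{ "user" => "me" }, "not a document", { "user" => "you" }, { "user" => "them" }].each do |record|
  buffer.emit(record)
end
buffer.flush # write whatever is still pending
```

The design choice illustrated here is that the common case costs one insert per batch, while a bad record only triggers the slower per-record path for the batch that contains it.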
  • 44. MongoDB Input Plugin
        • Tailing Capped Collections
        Diagram: MongoDB ReplicaSet (Capped Collection) or Single Instance (Capped Collection) → Authentication → Fluentd (Buffering)
  • 45. Realtime Analytics with Fluentd + MongoDB
        App → Fluentd → Fluentd (routing) → Mongo DB (query, charting) and Nagios, Zabbix, etc. (alert)
  • 46. Realtime or Batch? No, BOTH!
        App → Fluentd → Fluentd (routing) → Hadoop / Hive (batch), Amazon S3 (archive), Mongo DB (realtime: query, charting)
  • 47. Intro of our company’s service: Treasure Data
        App → Fluentd → Fluentd (routing) → Treasure Data, a Hadoop-based Cloud Data Warehouse (batch), and Mongo DB (realtime)
  • 48. Exercise: Apache Logs into MongoDB
  • 49. Log File
  • 52. Conclusion
        • Log Everything as JSON
            > Machine Readability
            > Schema Freeness
        • MongoDB fits into Fluentd’s backend perfectly
            > Both using JSON representation