fluentd -- the missing log collector

This talk was given at QCon Tokyo 2013.

Published in: Technology

  1. Muga Nishizawa, Treasure Data, Inc.: the missing log collector
  2. Muga Nishizawa (@muga_nishizawa), Chief Software Architect, Treasure Data
  3. Treasure Data Overview. Founded to deliver big data analytics in days, not months, without specialist IT resources, for one-tenth the cost of other alternatives. Service-based subscription business model. World-class open source team: founded the world's largest Hadoop User Group; developed Fluentd and MessagePack; contributed to Memcached, Hibernate, etc. Treasure Data is in production: 60+ customers incl. Fortune 500 companies; 400+ billion records stored; processing 40,000 messages per second.
  4. Fluentd = syslogd + many
  5. Fluentd = syslogd + many ✓ Plugins ✓ JSON
  6. In short: an open-source log collector written in Ruby, using the RubyGems ecosystem for plugins. It's like syslogd, but uses JSON for log messages.
  7. Make log collection easy using Fluentd
  8. Reporting & Monitoring
  9. Reporting & Monitoring: Collect, Store, Process, Visualize
  10. Reporting & Monitoring: Collect, Store, Process, Visualize. Easier & shorter time: Hadoop / Hive, MongoDB, Treasure Data, Tableau, Excel, R.
  11. Collect, Store, Process, Visualize. Easier & shorter time: Hadoop / Hive, MongoDB, Treasure Data, Tableau, Excel, R. How to shorten here?
  12. Collect, Store, Process, Visualize. Easier & shorter time: Hadoop / Hive, MongoDB, Treasure Data, Tableau, Excel, R. How to shorten here?
  13. Before Fluentd: Application on Server1, Server2, Server3, ...; Log Server. High latency! Must wait for a day...
  14. After Fluentd: Application on Server1, Server2, Server3, ...; Fluentd on each server and on the receiving side. In streaming!
  15. Many Users
  16. Many Meetups
  17. Growth by Community
  18. Why did we develop Fluentd?
  19. Treasure Data Service Architecture: Apache, App, App, other data sources, RDBMS; td-agent; Treasure Data columnar data warehouse; Query Processing Cluster; Query API; Hive, Pig (to be supported); JDBC, REST; MapReduce jobs; User; td-command; BI apps.
  20. Treasure Data Service Architecture: Apache, App, App, other data sources, RDBMS; td-agent; Treasure Data columnar data warehouse; Query Processing Cluster; Query API; Hive, Pig (to be supported); JDBC, REST; MapReduce jobs; User; td-command; BI apps. Open Sourced.
  21. Example Use Case: MySQL to TD. Hundreds of app servers; each Rails app writes logs to text files; nightly INSERT; MySQL (x4); daily/hourly batch; Google Spreadsheet; KPI visualization; feedback rankings. Limited scalability; fixed schema; not realtime; unexpected INSERT latency.
  22. Example Use Case: MySQL to TD. Hundreds of app servers; each Rails app sends event logs; td-agent (x3); Treasure Data; logs are available after several mins.; MySQL; daily/hourly batch; Google Spreadsheet; KPI visualization; feedback rankings. Unlimited scalability; flexible schema; realtime; less performance impact.
  23. td-agent: open-source distribution package of fluentd; the ETL part of Treasure Data; includes useful components (ruby, jemalloc, fluentd) and 3rd-party gems (td, mongo, webhdfs, etc.); the td plugin is for TD. http://packages.treasure-data.com/
  24. How does Fluentd work?
  25. Fluentd = syslogd + many ✓ Plugins ✓ JSON
  26. Nagios, MongoDB, Hadoop, Alerting, Amazon S3, Analysis, Archiving, MySQL, Apache, Frontend, Access logs, syslogd, App logs, System logs, Backend, Databases; filter / buffer / routing
  27. Nagios, MongoDB, Hadoop, Alerting, Amazon S3, Analysis, Archiving, MySQL, Apache, Frontend, Access logs, syslogd, App logs, System logs, Backend, Databases; filter / buffer / routing
  28. Nagios, MongoDB, Hadoop, Alerting, Amazon S3, Analysis, Archiving, MySQL, Apache, Frontend, Access logs, syslogd, App logs, System logs, Backend, Databases; filter / buffer / routing
  29. Input Plugins, Output Plugins, Buffer Plugins, (Filter Plugins). Nagios, MongoDB, Hadoop, Alerting, Amazon S3, Analysis, Archiving, MySQL, Apache, Frontend, Access logs, syslogd, App logs, System logs, Backend, Databases; filter / buffer / routing
  30. Architecture. Input (pluggable): Forward, HTTP, File tail, dstat, ... Buffer (pluggable): Memory, File. Output (pluggable): Forward, File, Amazon S3, MongoDB, ...
  31. Architecture. Input (pluggable): Forward, HTTP, File tail, dstat, ... Buffer (pluggable): Memory, File. Output (pluggable): Forward, File, Amazon S3, MongoDB, ... 117 plugins! Contributions by community.
  32. Input Plugins, Output Plugins. Log: 2012-02-04 01:33:51 myapp.buylog {"user": "me", "path": "/buyItem", "price": 150, "referer": "/landing"} (time, tag, record; the record is JSON)
  33. Event structure (log message). ✓ Time: second unit; from data source or adding parsed time. ✓ Tag: for message routing. ✓ Record: JSON format; MessagePack internally; not unstructured.
  34. in_tail: reads a file and parses lines. apache, access.log, in_tail, fluentd. ✓ read a log file ✓ custom regexp ✓ custom parser in Ruby
  35. out_mongo: writes buffered chunks. apache, access.log, in_tail, buffer, fluentd.
  36. Failure handling & retrying. apache, access.log, in_tail, buffer, fluentd. ✓ retry automatically ✓ exponential retry wait ✓ persistent on a file
  37. out_s3. apache, access.log, in_tail, buffer, fluentd, Amazon S3. ✓ retry automatically ✓ exponential retry wait ✓ persistent on a file ✓ slice files based on time: 2013-01-01/01/access.log.gz, 2013-01-01/02/access.log.gz, 2013-01-01/03/access.log.gz, ...
  38. out_hdfs. apache, access.log, in_tail, buffer, fluentd, HDFS. ✓ retry automatically ✓ exponential retry wait ✓ persistent on a file ✓ slice files based on time: 2013-01-01/01/access.log.gz, 2013-01-01/02/access.log.gz, 2013-01-01/03/access.log.gz, ... ✓ custom text formatter
  39. Routing / copying. apache, access.log, in_tail, buffer, fluentd, Amazon S3, Hadoop. ✓ routing based on tags ✓ copy to multiple storages
  40. Client libraries: Ruby, Java, Perl, PHP, Python, D, Scala, ... Application; Time:Tag:Record; Fluentd. Ruby example:
        Fluent.open("myapp")
        Fluent.event("login", {"user" => 38})
        #=> 2012-12-11 07:56:01 myapp.login {"user":38}
  41. Fluentd configuration:
        # logs from a file
        <source>
          type tail
          path /var/log/httpd.log
          format apache2
          tag web.access
        </source>
        # logs from client libraries
        <source>
          type forward
          port 24224
        </source>
        # store logs to MongoDB and S3
        <match **>
          type copy
          <match>
            type mongo
            host mongo.example.com
            capped
            capped_size 200m
          </match>
          <match>
            type s3
            path archive/
          </match>
        </match>
  42. out_forward. apache, access.log, in_tail, buffer, fluentd, forwarding to multiple fluentd instances. ✓ retry automatically ✓ exponential retry wait ✓ persistent on a file ✓ slice files based on time: 2013-01-01/01/access.log.gz, 2013-01-01/02/access.log.gz, 2013-01-01/03/access.log.gz, ... ✓ automatic fail-over ✓ load balancing
  43. Forwarding: fluentd instances forward to other fluentd instances; send / ack.
  44. Fluentd plugin distribution platform: $ fluent-gem search -rd fluent-plugin; $ fluent-gem install fluent-plugin-mongo
  45. Use cases
  46. Cookpad: ✓ Over 100 RoR servers (2012/2/4). Hundreds of app servers; each Rails app sends event logs; td-agent (x3); Treasure Data; logs are available after several mins.; MySQL; daily/hourly batch; Google Spreadsheet; KPI visualization; feedback rankings. Unlimited scalability; flexible schema; realtime; less performance impact.
  47. NHN Japan, by @tagomoris: http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013 ✓ 16 nodes ✓ 120,000+ lines/sec ✓ 400Mbps at peak ✓ 1.5+ TB/day (raw). Web Servers; Fluentd Cluster; Archive Storage (scribed); Fluentd Watchers; Graph Tools; Notifications (IRC); Hadoop Cluster CDH4 (HDFS, YARN); webhdfs; Huahin Manager; hiveserver; Shib; ShibUI; STREAM; BATCH; SCHEDULED BATCH.
  48. Treasure Data: Frontend, Job Queue, Worker, Hadoop, Hadoop; Fluentd. Applications push metrics to Fluentd (via local Fluentd). Librato Metrics for realtime analysis; Treasure Data for historical analysis. Fluentd sums up data minutes (partial aggregation).
  49. Key to Fluentd's growth is...
  50. Fluentd = syslogd + many + Community ✓ Plugins ✓ JSON
  51. Muga Nishizawa, Treasure Data, Inc.: the missing log collector
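
Slides 32, 33, and 39 describe an event as time, tag, and record, and say that routing and copying are driven by the tag. The following is a minimal sketch of that idea in plain Ruby, not Fluentd's own code: `Event`, `tag_match?`, and `Router` are hypothetical names introduced here, and the wildcard rules (`*` for one tag part, `**` for any number of parts) are a simplified assumption about how tag patterns behave.

```ruby
require "json"

# Hypothetical model of an event: time in seconds, a tag, and a record (Hash).
Event = Struct.new(:time, :tag, :record) do
  # Print the event in the "time tag record" layout shown on the slides.
  def to_s
    stamp = Time.at(time).utc.strftime("%Y-%m-%d %H:%M:%S")
    "#{stamp} #{tag} #{JSON.generate(record)}"
  end
end

# Compare pattern parts against tag parts, one dot-separated part at a time.
def parts_match?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?
  head, *rest = pattern_parts
  case head
  when "**" # assumed: matches zero or more parts
    (0..tag_parts.length).any? { |n| parts_match?(rest, tag_parts.drop(n)) }
  when "*"  # assumed: matches exactly one part
    !tag_parts.empty? && parts_match?(rest, tag_parts.drop(1))
  else
    head == tag_parts.first && parts_match?(rest, tag_parts.drop(1))
  end
end

def tag_match?(pattern, tag)
  parts_match?(pattern.split("."), tag.split("."))
end

# Hypothetical router: the first matching pattern wins, and the event is
# copied to every sink registered for that pattern.
class Router
  def initialize
    @routes = []
  end

  def match(pattern, *sinks)
    @routes << [pattern, sinks]
    self
  end

  def emit(event)
    _, sinks = @routes.find { |pattern, _| tag_match?(pattern, event.tag) }
    return false unless sinks
    sinks.each { |sink| sink << event }
    true
  end
end

mongo, s3, fallback = [], [], []
router = Router.new.match("web.*", mongo, s3).match("**", fallback)

access = Event.new(Time.utc(2013, 1, 1, 1, 15, 0).to_i, "web.access", { "path" => "/" })
buy = Event.new(Time.utc(2012, 2, 4, 1, 33, 51).to_i, "myapp.buylog",
                { "user" => "me", "path" => "/buyItem", "price" => 150 })
router.emit(access) # copied to both mongo and s3
router.emit(buy)    # falls through to the catch-all
puts buy
```

Plain arrays stand in for storage here; anything that responds to `<<` would do as a sink.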
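
Slide 36 lists "retry automatically" and "exponential retry wait" for buffered output. One way to picture that behaviour is the sketch below. It is an assumption-laden illustration rather than Fluentd's implementation: `flush_with_retry` and its `initial_wait`, `retry_limit`, and `sleeper` parameters are hypothetical, and the doubling schedule is simply the most common form of exponential backoff.

```ruby
# Hypothetical helper: try to write a buffered chunk, doubling the wait after
# each failure. Returns true once the write succeeds, false when retries run out.
# The sleeper is injectable so the schedule can be inspected without sleeping.
def flush_with_retry(chunk, initial_wait: 1.0, retry_limit: 5, sleeper: ->(s) { sleep(s) })
  failures = 0
  begin
    yield chunk
    true
  rescue StandardError
    # Give up after retry_limit retries; the caller still holds the chunk.
    return false if failures >= retry_limit
    sleeper.call(initial_wait * (2**failures))
    failures += 1
    retry
  end
end

waits = []
calls = 0
ok = flush_with_retry(["GET / 200"], sleeper: ->(s) { waits << s }) do |_chunk|
  calls += 1
  # Simulate a destination that is down for the first three attempts.
  raise IOError, "storage unavailable" if calls < 4
end
puts "delivered=#{ok} attempts=#{calls} waits=#{waits.inspect}"
```

Because the chunk is only handed to the block and never discarded on failure, the same data is resent on each attempt, which mirrors the role the buffer plays on the slide.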
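
Slides 37 and 38 show output files sliced by time, with paths such as 2013-01-01/01/access.log.gz. A small sketch of how such hourly keys could be derived from event times follows. The helper names `slice_path` and `slice_by_hour` are hypothetical, and the use of UTC and of a fixed `access.log.gz` file name are assumptions made for illustration.

```ruby
# Hypothetical helper: build a "YYYY-MM-DD/HH/<name>" key for an event time
# given in seconds since the epoch (interpreted as UTC here).
def slice_path(epoch_seconds, name: "access.log.gz")
  Time.at(epoch_seconds).utc.strftime("%Y-%m-%d/%H/") + name
end

# Group [time, line] pairs by the hourly slice they belong to.
def slice_by_hour(entries)
  entries
    .group_by { |time, _line| slice_path(time) }
    .transform_values { |pairs| pairs.map(&:last) }
end

entries = [
  [Time.utc(2013, 1, 1, 1, 5).to_i,  "GET /a"],
  [Time.utc(2013, 1, 1, 1, 50).to_i, "GET /b"],
  [Time.utc(2013, 1, 1, 2, 10).to_i, "GET /c"]
]
slice_by_hour(entries).each do |path, lines|
  puts "#{path}: #{lines.length} line(s)"
end
```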
