Fluentd and WebHDFS

Published in: Technology

  1. Fluentd and WebHDFS, and what makes it possible to write out_webhdfs in 30 minutes. TAGOMORI Satoshi (@tagomoris), NHN Japan. Fluentd meetup 3 (Thursday, 2012/11/08).
  2. @tagomoris: NHN Japan Corp (Web Service Division). Fluentd committer and plugin developer (fluent-agent-lite, ...).
  3. Use cases of Fluentd. Monitoring, notification and visualization: growthforecast, notifier, ikachan, .... Real-time aggregation: datacounter, numeric-counter, numeric-aggregator, .... Real-time processing: parser, exec_filter, ....
  4. Log collection!
  5. Fluentd as log collector. Many, many output plugins for various storage systems: file, file-alternative, mongo, couch, cassandra, redis, s3, ..., and Hadoooooooooooooooooooooooooooooooooooooop.
  6. Fluentd with HDFS. Ways to write data to HDFS: the Java native protocol (HDFSClient.java); hadoop fs -put; libhdfs and its bindings (as scribed uses); Cloudera Hoop (2011/07-); plus WebHDFS (Apache Hadoop 1.0-) and HttpFs (Apache Hadoop 2.0-).
  7. fluent-plugin-webhdfs. Output plugin that writes data into HDFS. Supports WebHDFS and HttpFs. First release: 2012/05/20 by tagomoris. v0.1.0 is bundled with td-agent v1.1.10 (or later).
  8. WebHDFS. HTTP REST API of HDFS. Clients communicate with the NameNode and with all DataNodes (like HDFSClient). [Diagram: the client talks HTTP directly to the NameNode and to each DataNode.]
  9. HttpFs. A proxy server, httpfs, provides a REST API for HDFS. Same method set as WebHDFS (unlike Hoop). Clients communicate with the httpfs server only. [Diagram: the client talks HTTP to the httpfs server, which talks the Java native protocol to the NameNode and DataNodes.]
  10. WebHDFS or HttpFs. WebHDFS: peer-to-peer communication; Jetty-based HTTP server; high throughput and stability. HttpFs: proxied and centralized communication; Tomcat-based HTTP server; simple network topology; relatively low performance and a single point of failure (SPOF).
  11. Configuration: WebHDFS. Use Apache Hadoop 1.0.0 (or later), CDH3u5, or CDH4 (or later). On the NameNode/DataNodes: dfs.webhdfs.enabled=true; dfs.support.append=true (only CDH3u5?); dfs.support.broken.append=true (only CDH3u5?). In fluent-plugin-webhdfs: type webhdfs; host hostname.of.namenode; port 50070; path /hdfs/access.%Y%m%d_%H.${hostname}.log
  12. WebHDFS in NHN Japan. BEFORE: 1400 timeouts/day with Hoop. Tue Aug 14 15:04:34 2012 +0900: "fix to use webhdfs to write into hdfs". "2012-08-14 15:08:18 +0900: starting fluentd-0.10.25". Wed Aug 15 13:11:04 2012 +0900: "fix timeouts for busy AM2-5". AFTER: 130 timeouts from 08/16 to 11/07, with 1.2-1.5 TB/day from 10 fluentd nodes.
  13. CONCLUSION 1. WebHDFS is good enough for: continuous appending to log files; daily operations to move/remove/copy/head/tail via client libraries (and your scripts). Fluentd and td-agent are good enough as a log collector in front of Hadoop/HDFS.
  14. break
  15. fluent-plugin-webhdfs commit log. Thu May 17 18:20:15 2012 on fluent-plugin-webhdfs: "writing code" (in fact, no lines of Ruby code...). Sun May 20 19:01:26 2012 on xxxxx (some commits). Sun May 20 19:35:34 2012 on fluent-plugin-webhdfs: "fix typo", tagged as v0.0.1.
  16. 30min!? fluent-plugin-webhdfs: 120 lines (including blank lines and end); 65 lines of configuration; very few lines of actual code. WebHDFS operations are handled by the webhdfs gem; output formatting by PlainTextFormatterMixin.
  17. webhdfs gem commit log. Sun May 20 17:00:57 2012; (15 commits); Sun May 20 19:01:26 2012: "v0.3: add WebHDFS::Client".
  18. fluent-mixin-*. fluent-mixin-plaintextformatter: output text data formatter; used by webhdfs, file-alternative, hoop. fluent-mixin-config-placeholders: provides placeholders like ${hostname} and ${uuid} in configurations; used by webhdfs, ping-message.
  19. CONCLUSION 2. Output plugins have many (complex) problems: communication, formatting, configuration formats, .... We CAN/MUST depend on existing gems! We SHOULD write fluent-mixin gems for other plugin developers! Many features and much code can be shared by many plugins, giving unified syntax and features across plugins.
  20. Questions? Thanks! Photo: crouton. Thanks to @kbysmnr.
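
Slides 8 and 11 describe writing to HDFS over the WebHDFS REST API, with a path template that carries time placeholders and ${hostname}. The following is a minimal sketch of how such a path might be expanded and turned into a NameNode request URL. It is not the plugin's or the webhdfs gem's code: the helper names expand_path and webhdfs_url and the host name web01 are made up for illustration, and only the Ruby standard library is used. The /webhdfs/v1 prefix and the op=CREATE / op=APPEND parameters come from the WebHDFS REST API.

```ruby
require 'uri'
require 'socket'

# Hypothetical helper: expand strftime placeholders (%Y, %m, %d, %H) and
# ${hostname} in a path template like the one on the configuration slide.
def expand_path(template, time, hostname = Socket.gethostname)
  # Block form of gsub avoids special handling of backslashes in the replacement.
  time.strftime(template).gsub('${hostname}') { hostname }
end

# Hypothetical helper: build the NameNode URL for one WebHDFS operation.
# CREATE is sent with HTTP PUT and APPEND with HTTP POST; the NameNode
# answers with a redirect to a DataNode, which receives the actual data.
def webhdfs_url(host, port, path, op, params = {})
  query = URI.encode_www_form({ 'op' => op }.merge(params))
  URI::HTTP.build(host: host, port: port, path: "/webhdfs/v1#{path}", query: query)
end

# Example: the file an event from 2012-11-08 15:xx UTC would be appended to.
path = expand_path('/hdfs/access.%Y%m%d_%H.${hostname}.log',
                   Time.utc(2012, 11, 8, 15), 'web01')
puts path
puts webhdfs_url('hostname.of.namenode', 50070, path, 'APPEND')
```

Because the hour is part of the file name, each node appends to one file per hour, which matches the "continuous appending" use case in CONCLUSION 1. The redirect step is also why WebHDFS clients need network access to every DataNode, whereas HttpFs clients only need to reach the httpfs server.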

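Slides 18 and 19 argue that formatting logic should live in shared mixin gems rather than be rewritten in every output plugin. The sketch below illustrates that idea with a plain Ruby module included by two unrelated output classes. Every name here (HypotheticalTextFormatter, HypotheticalFileOutput, HypotheticalBufferedOutput) is invented for the example, and the tab-separated time/tag/JSON line layout is an assumption; this is not the API of fluent-mixin-plaintextformatter or of Fluentd itself.

```ruby
require 'json'
require 'stringio'

# Hypothetical mixin: one shared implementation of "event -> text line".
module HypotheticalTextFormatter
  # time is a Unix timestamp (seconds); record is a Hash.
  def format_line(tag, time, record)
    stamp = Time.at(time).utc.strftime('%Y-%m-%dT%H:%M:%SZ')
    [stamp, tag, JSON.generate(record)].join("\t") + "\n"
  end
end

# Hypothetical output that writes each formatted line to an IO object.
class HypotheticalFileOutput
  include HypotheticalTextFormatter

  def initialize(io)
    @io = io
  end

  def emit(tag, time, record)
    @io.write(format_line(tag, time, record))
  end
end

# Hypothetical output that accumulates lines into a chunk, as a plugin
# might do before appending the whole chunk to HDFS in one request.
class HypotheticalBufferedOutput
  include HypotheticalTextFormatter

  attr_reader :chunk

  def initialize
    @chunk = String.new
  end

  def emit(tag, time, record)
    @chunk << format_line(tag, time, record)
  end
end

io = StringIO.new
HypotheticalFileOutput.new(io).emit('access.log', 0, { 'code' => 200 })
print io.string
```

Both classes produce identical lines without duplicating the formatting code, which is the "unified syntax/features over plugins" benefit named in CONCLUSION 2.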