Hadoop and subsystems in livedoor #Hcj11f


  1. Hadoop and Subsystems in livedoor (Hadoop Conference Japan 2011 Fall, 2011/09/26, tagomoris)
  2. (no text)
  3. We are hiring!
  4. What's livedoor?
  5. (no text)
  6. Large-scale web services: 2800+ servers, 3200+ hosts, 530+ web servers
  7. 20 Aug 2009
  8. Aug 2011: 15 Gbps (10 Gbps + 5 Gbps via CDN)
  9. Hadoop in livedoor
      • 10 nodes (1+9)
      • 36 cores, 32 TB HDFS
      • CDH3b2, with libhdfs and fuse-hdfs
      • Hive 0.6.0 (community package)
  10. Hadoop in livedoor: data mining and reporting (page views, unique users, traffic amount per page, ...)
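To make the reporting metrics concrete, here is a minimal single-machine sketch in Python of page views, unique users, and traffic per page. It assumes Apache "combined"-style log lines and treats the client address as the "user", which is only a rough proxy; `page_stats` and `LOG_RE` are hypothetical names, and the slides do not say how livedoor actually defined unique users (at their volume this work runs as Hive or Streaming jobs, not a local loop).

```python
import re
from collections import defaultdict

# Hypothetical: Apache "combined"-style lines. The client address stands in
# for a "user", which is only a rough proxy for unique users.
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|POST|HEAD) (\S+)[^"]*" (\d{3}) (\S+)'
)

def page_stats(lines):
    """Return {path: (page_views, unique_users, bytes_sent)}."""
    views = defaultdict(int)
    users = defaultdict(set)
    traffic = defaultdict(int)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines and methods not listed in LOG_RE
        client, path, _status, size = m.groups()
        views[path] += 1
        users[path].add(client)
        traffic[path] += 0 if size == "-" else int(size)  # "-" means no body
    return {p: (views[p], len(users[p]), traffic[p]) for p in views}
```

With two clients requesting `/a` three times in total (one of them a 304 logged with `-` as the size) and one request for `/b`, `page_stats` returns `{'/a': (3, 2, 150), '/b': (1, 1, 10)}`.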
  11. Super-large-scale sed | grep | wc, with Hadoop Streaming + Hive
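The "sed | grep | wc at scale" idea maps directly onto Hadoop Streaming, where the mapper and reducer are ordinary programs that read stdin and write tab-separated key/value lines. The sketch below is hypothetical (the `/blog/` pattern and the per-status count are made up for illustration): the mapper plays `grep`, and the reducer plays `wc -l` per key, relying on Streaming to sort mapper output by key before the reduce step.

```python
import re
import sys
from itertools import groupby

# Hypothetical example: among lines mentioning "/blog/" (the grep part),
# count requests per HTTP status (the wc part).
PATTERN = re.compile(r"/blog/")
STATUS = re.compile(r'" (\d{3}) ')  # status code right after the quoted request

def mapper(lines):
    """Emit 'status<TAB>1' for each line matching PATTERN."""
    for line in lines:
        if PATTERN.search(line):
            m = STATUS.search(line)
            if m:
                yield f"{m.group(1)}\t1"

def reducer(lines):
    """Sum the counts per key. Hadoop Streaming sorts mapper output by key,
    so equal keys arrive next to each other and groupby can total them."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"

def main(argv, stdin=sys.stdin, stdout=sys.stdout):
    """Script entry point: 'map' as the first argument runs the mapper,
    anything else runs the reducer."""
    step = mapper if argv[1:2] == ["map"] else reducer
    for out in step(stdin):
        stdout.write(out + "\n")
```

In a real script, `main(sys.argv)` would sit under an `if __name__ == "__main__":` guard, so the pipeline can be checked locally as `cat access_log | python count.py map | sort | python count.py reduce` before being submitted through the Streaming jar's `-mapper` and `-reducer` options (`count.py` is a placeholder name).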
  12. httpd logs from 96 servers (Apache / nginx): 580 GB/day (raw)
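For a sense of scale, a back-of-the-envelope average, assuming 1 GB = 10^9 bytes and load spread evenly across servers and over the day (real access logs have peaks, so instantaneous rates run higher):

```latex
\[
\frac{580\ \text{GB/day}}{96\ \text{servers}} \approx 6.0\ \text{GB per server per day}
\]
\[
\frac{580 \times 10^{9}\ \text{B}}{86\,400\ \text{s}} \approx 6.7 \times 10^{6}\ \text{B/s} \approx 54\ \text{Mbit/s}
\]
```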
  13. Overview (diagram labels: hourly, daily, on demand)
  14. Topics
      • log delivery network with scribe and scribeline
      • Hive client web application: shib
  15. Overview (diagram labels: hourly, daily, on demand)
  16. scribe: log delivery daemon based on Thrift; scalable and reliable; supports HDFS (https://github.com/facebook/scribe)
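scribe's Thrift IDL defines a `Log` call that takes a list of entries, each carrying a `category` and a `message`, and answers `OK` or `TRY_LATER`. The Python sketch below models that contract without real Thrift bindings: `LogEntry` and `ResultCode` are local stand-ins, and `deliver` with its `log_call` parameter is a hypothetical helper showing how a client might batch lines under one category and keep the remainder when the server asks it to back off.

```python
from dataclasses import dataclass
from enum import Enum

@dataclass(frozen=True)
class LogEntry:
    """Local stand-in for scribe's Thrift LogEntry struct."""
    category: str
    message: str

class ResultCode(Enum):
    """Local stand-in for scribe's Thrift ResultCode enum."""
    OK = 0
    TRY_LATER = 1

def deliver(lines, category, log_call, batch_size=100):
    """Send lines in batches through log_call(list_of_LogEntry).

    Stops at the first non-OK answer and returns the lines not yet accepted,
    so the caller can keep them and retry later; returns [] when all were sent."""
    pending = list(lines)
    while pending:
        batch = [LogEntry(category, line) for line in pending[:batch_size]]
        if log_call(batch) is not ResultCode.OK:
            return pending
        pending = pending[batch_size:]
    return []
```

Returning the undelivered lines instead of dropping them leaves the retry policy (wait, or switch to another scribe node) to the caller.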
  17. Scribe nodes (diagram: several scribed instances)
  18. Delivery node traffic
  19. Scribe nodes (diagram: several scribed instances)
  20. What we want from a scribe agent
      • easy to deploy
      • works without any httpd configuration
      • delivery target failover/takeback
      • lightweight (no JVM)
      • stable
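The failover/takeback requirement can be sketched as a small target selector: deliver to the primary scribe node, switch to a secondary when the primary fails, and try the primary again once a retry interval has passed. This is a hypothetical Python illustration, not scribeline's code; `TargetSelector`, `retry_interval`, and the `transport(target, message)` callable are names invented here, and only `OSError` (which covers refused connections and socket timeouts) counts as a delivery failure.

```python
import time

class TargetSelector:
    """Choose where to deliver: the primary while it is healthy, the secondary
    after the primary fails, and the primary again (takeback) once
    retry_interval seconds have passed since the failure."""

    def __init__(self, primary, secondary, retry_interval=60.0, clock=time.monotonic):
        self.primary = primary
        self.secondary = secondary
        self.retry_interval = retry_interval
        self.clock = clock              # injectable so tests can control time
        self.primary_failed_at = None   # None: primary is considered healthy

    def current(self):
        if self.primary_failed_at is None:
            return self.primary
        if self.clock() - self.primary_failed_at >= self.retry_interval:
            self.primary_failed_at = None  # takeback: give the primary another try
            return self.primary
        return self.secondary

    def report_failure(self, target):
        if target == self.primary:
            self.primary_failed_at = self.clock()

    def send(self, message, transport):
        """Deliver with transport(target, message); on an OSError from the
        primary, record the failure and retry once on the secondary."""
        target = self.current()
        try:
            return transport(target, message)
        except OSError:
            self.report_failure(target)
            if target == self.primary:
                return transport(self.secondary, message)
            raise
```

Injecting `clock` makes the takeback timing testable: with a 60-second interval, a primary that failed at t=0 is skipped at t=30 and selected again at t=61.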
  21. Scribe nodes (diagram: scribed instances, now with scribeline)
  22. scribeline: log delivery agent tool
      • Python 2.4, Thrift
      • easy to set up and start/stop
      • works without any httpd configuration
      • works with logrotate-ed log files
      • automatic delivery target failover/takeback
      • https://github.com/tagomoris/scribe_line
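Working with logrotate-ed files mostly means noticing when the file behind a path changes. A minimal Python sketch of one approach (not necessarily what scribeline does) compares the inode behind the path with the inode of the open handle to detect rename-and-recreate rotation, and compares the file size with the read offset to detect `copytruncate`; `RotatingTail` is a hypothetical name.

```python
import os

class RotatingTail:
    """Return lines appended to a log file since the last call, following
    logrotate's two styles: rename + recreate (new inode) and copytruncate.

    Hypothetical sketch: it reads from the start of the file on first open and
    does not hold back a partially written last line."""

    def __init__(self, path):
        self.path = path
        self.file = None
        self.inode = None

    def _open(self):
        self.file = open(self.path, "rb")  # binary mode so tell() is a byte offset
        self.inode = os.fstat(self.file.fileno()).st_ino

    def read_new_lines(self):
        if self.file is None:
            try:
                self._open()
            except FileNotFoundError:
                return []
        lines = self.file.readlines()  # drain whatever is left in the current file
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return lines  # rotated away and not recreated yet: keep the old handle
        if st.st_ino != self.inode:
            # Rename-style rotation: the old file is drained, switch to the new one.
            self.file.close()
            self._open()
            lines += self.file.readlines()
        elif st.st_size < self.file.tell():
            # copytruncate: the file shrank under us, so restart from offset 0.
            # (A file truncated and refilled past our offset goes unnoticed.)
            self.file.seek(0)
            lines += self.file.readlines()
        return lines

    def close(self):
        if self.file is not None:
            self.file.close()
            self.file = None
```

A delivery loop would call `read_new_lines()` periodically and hand the result to the sender; draining the old handle before switching reduces the chance of losing lines written just before the rotation was noticed.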
  23. How to set up scribeline in livedoor
      1. yum install scribeline
         (or from source: tar xzf && cd && sudo make install)
      2. vi /etc/scribeline.conf
           blog     /var/log/httpd/access_log
           blogimg  /var/log/nginx/access_log
      3. /etc/init.d/scribeline start
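The configuration in step 2 appears to map a category name to a log file path, one pair per line. Assuming exactly that format (and, as an extra assumption, allowing blank lines and `#` comments), a parser might look like this hypothetical `parse_scribeline_conf`:

```python
def parse_scribeline_conf(text):
    """Parse 'category path' lines into (category, path) pairs.

    The two-column format is inferred from the slide's example; skipping
    blank lines and lines starting with '#' is an extra assumption."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'category path', got {raw!r}")
        category, path = fields
        if not path.startswith("/"):
            raise ValueError(f"line {lineno}: log path should be absolute: {path!r}")
        entries.append((category, path))
    return entries
```

Failing loudly on a malformed line is a deliberate choice here: an agent that silently skipped a mistyped entry would simply never ship that log.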
  24. Scribe nodes (diagram: several scribed instances)
  25. Overview (diagram labels: hourly, daily, on demand)
  26. What we want from a Hive client
      • easy to experiment with
      • usable from the PCs on our desks
      • result caching
      • protection against data loss
      • friendly look & feel
  27. shib: Hive client web application
      • node.js, Thrift, Kyoto Tycoon
      • query history browser
      • query editor, based on copy & paste
      • result caching, with TSV/CSV download
      • filters out INSERT/DROP/CREATE ...
      • https://github.com/tagomoris/shib
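Result caching and the INSERT/DROP/CREATE filter together address the Hive client wishes for fast repeat queries and protection against data loss. shib itself is a node.js application backed by Kyoto Tycoon; the Python sketch below only illustrates the idea, with a dict standing in for the key-value store and hypothetical names (`is_allowed`, `ResultCache`, `run`). It blocks a query if any `;`-separated statement begins with a blocked keyword, and it does not guess at the other statement types hidden behind the slide's "...".

```python
import hashlib
import re

# Statement types named on the slide; anything else is allowed in this sketch.
BLOCKED = {"INSERT", "DROP", "CREATE"}

def first_keyword(statement):
    """Return the first word of a statement, upper-cased, skipping leading
    whitespace and '--' comment lines."""
    body = re.sub(r"^(?:\s*--[^\n]*(?:\n|$))*\s*", "", statement)
    m = re.match(r"[A-Za-z]+", body)
    return m.group(0).upper() if m else ""

def is_allowed(query):
    """Reject a query if any ';'-separated statement starts with a blocked
    keyword. The split is naive: a ';' inside a string literal can cause a
    false rejection, which errs on the safe side."""
    return all(first_keyword(s) not in BLOCKED for s in query.split(";"))

class ResultCache:
    """Cache query results by a hash of the query text. A dict stands in for
    an external key-value store such as Kyoto Tycoon."""

    def __init__(self):
        self._store = {}

    def get_or_run(self, query, run):
        if not is_allowed(query):
            raise PermissionError("query rejected by statement filter")
        key = hashlib.sha256(query.strip().encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = run(query)  # run() is only called on a cache miss
        return self._store[key]
```

Keying the cache on the stripped query text means `SELECT 1` and `  SELECT 1\n` share a result; more aggressive normalization (case folding, collapsing inner whitespace) would risk merging queries whose string literals differ.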
  28. (no text)
  29. shib system overview
  30. What shib cannot do now
      • access control
      • graphs & charts
      • support for Hive 0.7.0+ features
      • database, authentication and ...
      • MapReduce status notification
  31. What we are trying now
      • New cluster
          • more nodes
          • CDH3b2 + Hive 0.6.0 -> CDH3u1
      • New tools
          • Hoop (instead of fuse-hdfs)
          • any stream processing framework
  32. Thanks!
