Hadoop and subsystems in livedoor #Hcj11f
Published in: Technology
Transcript

  • 1. Hadoop and Subsystems in livedoor
      Hadoop Conference Japan 2011 Fall, 2011/09/26, tagomoris
  • 3. we are hiring!
  • 4. what's livedoor?
  • 6. large scale web services
      • 2800+ servers
      • 3200+ hosts
      • 530+ web servers
  • 7. 20 Aug 2009
  • 8. Aug 2011: 15Gbps (10Gbps + CDN 5Gbps)
  • 9. Hadoop in livedoor
      • 10 nodes (1+9)
      • 36 cores, 32TB HDFS
      • CDH3b2
      • with libhdfs, fuse-hdfs
      • Hive 0.6.0 (community package)
  • 10. Hadoop in livedoor: data mining and reporting (page views, unique users, traffic amount per page, ...)
  • 11. super large scale "sed | grep | wc" with Hadoop Streaming + Hive
  • 12. httpd logs from 96 servers (apache / nginx), 580GB/day (raw)
  • 13. overview (diagram labels: hourly, daily, on demand)
  • 14. topics
      • log delivery network with scribe
      • and scribeline
      • hive client web application shib
  • 15. overview (diagram labels: hourly, daily, on demand)
  • 16. scribe
      • log delivery daemon based on Thrift
      • scalable, reliable
      • supports HDFS
      • https://github.com/facebook/scribe
  • 17. scribe nodes (diagram: scribed, scribed, scribed)
  • 18. deliver node traffic
  • 19. scribe nodes (diagram: scribed, scribed, scribed)
  • 20. what we want from scribe agent
      • easy to deploy
      • works w/o any httpd configurations
      • delivery target failover/takeback
      • lightweight (without JVM)
      • stable
  • 21. scribe nodes (diagram: scribed, scribed, scribeline, scribed)
  • 22. scribeline
      • log delivery agent tool
      • python 2.4, thrift
      • easy to set up and start/stop
      • works without any httpd configurations
      • works with logrotate-ed log files
      • automatic delivery target failover/takeback
      • https://github.com/tagomoris/scribe_line
  • 23. how to set up scribeline in livedoor
      1. yum install scribeline (tar xzf && cd && sudo make install)
      2. vi /etc/scribeline.conf
           blog /var/log/httpd/access_log
           blogimg /var/log/nginx/access_log
      3. /etc/init.d/scribeline start
  • 24. scribe nodes (diagram: scribed, scribed, scribed)
  • 25. overview (diagram labels: hourly, daily, on demand)
  • 26. what we want from a hive client
      • easy to experiment
      • from PC on our desks
      • result caching
      • protection against data loss
      • friendly look & feel
  • 27. shib
      • hive client web application
      • node.js, thrift, kyoto tycoon
      • query history browser
      • query editor, based on copy&paste
      • result caching & download tsv/csv
      • filter INSERT/DROP/CREATE ...
      • https://github.com/tagomoris/shib
  • 29. shib system overview
  • 30. what shib cannot do now
      • access control
      • graph & chart
      • hive 0.7.0+ features support
      • database, authentication and ...
      • mapreduce status notification
  • 31. what we are trying now
      • New cluster
          • more nodes
          • CDH3b2 + Hive 0.6.0 -> CDH3u1
      • New tools
          • Hoop (instead of fuse-hdfs)
          • Any stream processing framework
  • 32. thanks!
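
Slide 11 describes the workload as a very large "sed | grep | wc" run with Hadoop Streaming. The pattern can be sketched in Python as below, assuming access logs in the Apache/nginx combined format and a page-view count per request path. The regular expression, the function names, and the choice to count only status 200 are illustrative assumptions, not livedoor's actual jobs.

```python
import re
from itertools import groupby

# Assumption: combined-format access logs. Only the request path and
# the status code are extracted.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3}) ')


def mapper(lines):
    """The 'grep' step: keep successful requests and emit (path, 1)."""
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            # The 'sed' step: strip the query string from the path.
            path = match.group(1).split("?", 1)[0]
            yield path, 1


def reducer(pairs):
    """The 'wc' step: sum the counts per key.

    The input must be sorted by key; Hadoop Streaming sorts map output
    by key before handing it to the reducer.
    """
    for path, group in groupby(pairs, key=lambda kv: kv[0]):
        yield path, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce in a single process."""
    return dict(reducer(sorted(mapper(lines))))
```

In a real Streaming job the mapper and reducer would be separate scripts that read stdin and write tab-separated "key<TAB>value" lines; `run_local` only exists to exercise the logic on a small sample.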

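Slides 20 and 22 list "delivery target failover/takeback" as a requirement for the log agent. One way such behaviour could work is sketched below: prefer the primary scribe node, fall back to a secondary when the primary is unreachable, and periodically retry the primary so that traffic returns to it. This is not scribeline's actual code; the class name, the `connect` callback, and the 60 second retry interval are all hypothetical.

```python
import time


class FailoverTarget:
    """Choose between a primary and a secondary delivery target."""

    def __init__(self, primary, secondary, connect,
                 retry_interval=60.0, clock=time.monotonic):
        self.primary = primary
        self.secondary = secondary
        self.connect = connect            # hypothetical: host -> bool
        self.retry_interval = retry_interval
        self.clock = clock
        self.active = None
        self.last_primary_attempt = 0.0

    def select(self):
        """Return the host to deliver to, or None if both are down."""
        now = self.clock()
        # While on the secondary, probe the primary only once per
        # retry interval ("takeback"); otherwise always try it first.
        try_primary = (
            self.active != self.secondary
            or now - self.last_primary_attempt >= self.retry_interval
        )
        if try_primary:
            self.last_primary_attempt = now
            if self.connect(self.primary):
                self.active = self.primary
                return self.active
        # Failover: the primary is down or not yet due for a retry.
        if self.connect(self.secondary):
            self.active = self.secondary
            return self.active
        self.active = None
        return None
```

Injecting `clock` and `connect` keeps the selection logic testable without real sockets or waiting.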
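Slide 27 says shib filters INSERT/DROP/CREATE statements and caches query results. A minimal sketch of those two ideas follows, assuming a filter on the first keyword and an in-memory dict as the cache. shib itself is a node.js application backed by kyoto tycoon, so the names here (`run_query`, `execute`, `BLOCKED_KEYWORDS`) are hypothetical, and the blocked set contains only the three statements the slide names.

```python
import hashlib

# Only the statement types named on the slide; the slide's list is elided.
BLOCKED_KEYWORDS = {"insert", "drop", "create"}


def first_keyword(query):
    """Return the first word of the query in lower case, or ''."""
    words = query.strip().split(None, 1)
    return words[0].lower() if words else ""


def cache_key(query):
    """Hash the query after collapsing runs of whitespace."""
    normalized = " ".join(query.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def run_query(query, execute, cache):
    """Reject blocked statements, otherwise return a cached result.

    `execute` is a hypothetical callable that sends the query to Hive
    and returns its rows; it runs only on a cache miss.
    """
    keyword = first_keyword(query)
    if keyword in BLOCKED_KEYWORDS:
        raise ValueError("statement type is not allowed: %s" % keyword)
    key = cache_key(query)
    if key not in cache:
        cache[key] = execute(query)
    return cache[key]
```

A first-keyword check is deliberately naive. Hive also accepts statements of the form `FROM ... INSERT OVERWRITE ...`, which this check would let through, so a real filter would need to inspect more than the leading word.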