Hadoop and subsystems in livedoor #Hcj11f
Presentation Transcript

  • Hadoop and Subsystems in livedoor, Hadoop Conference Japan 2011 Fall, 2011/09/26, tagomoris 2011 9 26
  • we are hiring!
  • what's livedoor?
  • large-scale web services: 2800+ servers, 3200+ hosts, 530+ web servers
  • 20 Aug 2009
  • Aug 2011: 15Gbps (10Gbps + CDN 5Gbps)
  • Hadoop in livedoor
    • 10 nodes (1+9)
    • 36 cores, 32TB HDFS
    • CDH3b2
    • with libhdfs, fuse-hdfs
    • Hive 0.6.0 (community package)
  • Hadoop in livedoor: data mining and reporting (page views, unique users, traffic amount per page, ...)
  • super-large-scale "sed | grep | wc", with Hadoop Streaming + Hive
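The "sed | grep | wc" idea maps directly onto Hadoop Streaming: the mapper plays grep/sed and the reducer plays wc. Below is a minimal sketch of a page-view counter, assuming Apache common/combined log format; the script name pv.py and its map/reduce argument convention are hypothetical, not livedoor's actual jobs.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop Streaming job (pv.py): count page views per request path.

Assumes Apache common/combined log format, e.g.
1.2.3.4 - - [26/Sep/2011:10:00:00 +0900] "GET /index.html HTTP/1.1" 200 512
"""
import re
import sys
from itertools import groupby

# Captures the path from the quoted request line: "METHOD PATH PROTOCOL"
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*"')


def map_lines(lines):
    """Mapper (the grep/sed part): emit 'path<TAB>1' per parsable log line."""
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            yield f"{m.group(1)}\t1"


def reduce_lines(lines):
    """Reducer (the wc part): input arrives sorted by key after the shuffle,
    so consecutive lines with the same path can be summed."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for path, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{path}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pv.py map   or   pv.py reduce   (reads stdin, writes stdout)
    stage = {"map": map_lines, "reduce": reduce_lines}[sys.argv[1]]
    for out in stage(sys.stdin):
        print(out)
```

It can be tried locally as cat access_log | python3 pv.py map | sort | python3 pv.py reduce, and submitted with the streaming jar (its path depends on the distribution) using -input, -output, -mapper "python3 pv.py map", -reducer "python3 pv.py reduce" and -file pv.py.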
  • httpd logs from 96 servers (Apache / nginx): 580GB/day (raw)
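For scale, the slide's figures can be turned into per-server, per-hour and per-second averages. A quick calculation, assuming decimal units (1 GB = 10^9 bytes) and a perfectly even spread over servers and hours, which real traffic is not:

```python
# Back-of-envelope averages from the slide's numbers: 580GB/day from 96 servers.
# Assumption: decimal units and evenly distributed traffic (real peaks are higher).
DAILY_BYTES = 580 * 10**9
SERVERS = 96
HOURS_PER_DAY = 24
SECONDS_PER_DAY = HOURS_PER_DAY * 60 * 60

per_server_gb = DAILY_BYTES / SERVERS / 10**9          # raw log volume per server per day
per_hour_gb = DAILY_BYTES / HOURS_PER_DAY / 10**9       # size of one hourly batch
avg_mb_per_s = DAILY_BYTES / SECONDS_PER_DAY / 10**6    # aggregate average, megabytes/s
avg_mbit_per_s = avg_mb_per_s * 8                       # same, in megabits/s

print(f"{per_server_gb:.2f} GB per server per day")
print(f"{per_hour_gb:.2f} GB per hour")
print(f"{avg_mb_per_s:.2f} MB/s = {avg_mbit_per_s:.1f} Mbit/s on average")
```

That works out to roughly 6GB per server per day, about 24GB per hourly batch, and around 54Mbit/s of aggregate log traffic on average.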
  • overview (diagram labels: hourly, daily, on demand)
  • topics
    • log delivery network with scribe and scribeline
    • Hive client web application: shib
  • overview (diagram labels: hourly, daily, on demand)
  • scribe: log delivery daemon based on Thrift
    • scalable, reliable
    • supports HDFS
    • https://github.com/facebook/scribe
  • scribe nodes (diagram: scribed nodes)
  • delivery node traffic
  • scribe nodes (diagram: scribed nodes)
  • what we want from a scribe agent
    • easy to deploy
    • works without any httpd configuration
    • delivery target failover/takeback
    • lightweight (no JVM)
    • stable
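Failover/takeback can be read as: send to the preferred scribed, fall back to the next one when it fails, and move back to the preferred one once it may have recovered. A minimal sketch of that policy, assuming a priority-ordered target list and a fixed retry interval; the class name, the 30-second default and the target names are hypothetical, not scribeline's actual implementation.

```python
import time


class TargetSelector:
    """Hypothetical failover/takeback policy for a log-delivery agent.

    Targets are tried in priority order. A failed target is skipped for
    `retry_after` seconds; after that it becomes eligible again, so traffic
    moves back ("takeback") to the higher-priority target.
    """

    def __init__(self, targets, retry_after=30.0, clock=time.monotonic):
        self.targets = list(targets)   # e.g. ["scribed-a", "scribed-b"] (hypothetical names)
        self.retry_after = retry_after
        self.clock = clock             # injectable so the policy can be tested
        self.down_until = {}           # target -> time at which it may be retried

    def current(self):
        """Return the highest-priority target not currently marked down."""
        now = self.clock()
        for target in self.targets:
            if self.down_until.get(target, 0.0) <= now:
                return target
        return None                    # every target is down: caller buffers and retries later

    def mark_failed(self, target):
        """Record a send failure; skip this target until the retry window ends."""
        self.down_until[target] = self.clock() + self.retry_after

    def mark_ok(self, target):
        """Record a successful send; the target is fully healthy again."""
        self.down_until.pop(target, None)
```

Keeping the policy separate from the Thrift transport means the failover rules can be tested with a fake clock, without any scribed running.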
  • scribe nodes (diagram: scribed nodes and scribeline)
  • scribeline: log delivery agent tool
    • Python 2.4, Thrift
    • easy to set up and start/stop
    • works without any httpd configuration
    • works with logrotate-ed log files
    • automatic delivery target failover/takeback
    • https://github.com/tagomoris/scribe_line
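Working with logrotate-ed files means noticing that the path now refers to a different file. One common technique, sketched here as an assumption rather than scribeline's actual logic, is to compare the inode of the open file with the inode currently at the path, and to treat a file that shrank below the read offset as truncated (logrotate's copytruncate mode). The class name is hypothetical.

```python
import os


class RotatingFileReader:
    """Hypothetical logrotate-aware reader (a sketch, not scribeline's code).

    Returns newly appended complete lines from `path`. When the file at `path`
    is replaced (different inode) or truncated, the rest of the old file is
    drained first, then the file at `path` is reopened from the beginning.
    """

    def __init__(self, path):
        self.path = path
        self.f = open(path, "rb")          # binary mode: tell() is a plain byte offset
        self.f.seek(0, os.SEEK_END)        # like `tail -f`: start at the current end
        self.buf = b""                     # unterminated tail of the last read

    def _drain(self):
        """Read everything available now and split off complete lines."""
        self.buf += self.f.read()
        *complete, self.buf = self.buf.split(b"\n")
        return [line.decode("utf-8", "replace") for line in complete]

    def _rotated(self):
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return False                   # renamed away, new file not created yet
        cur = os.fstat(self.f.fileno())
        replaced = (st.st_ino, st.st_dev) != (cur.st_ino, cur.st_dev)
        truncated = st.st_size < self.f.tell()
        return replaced or truncated

    def read_lines(self):
        """Non-blocking: return complete lines available now; call periodically."""
        lines = self._drain()
        if self._rotated():
            lines += self._drain()         # last writes to the old file
            if self.buf:                   # old file is finished: flush its final partial line
                lines.append(self.buf.decode("utf-8", "replace"))
                self.buf = b""
            self.f.close()
            self.f = open(self.path, "rb") # the new file is read from its start
            lines += self._drain()
        return lines
```

The agent would poll read_lines() every second or so. A file that is truncated and then refilled past the old offset between two polls would go unnoticed, which is one reason to poll frequently.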
  • how to set up scribeline in livedoor
    1. yum install scribeline (or: tar xzf && cd && sudo make install)
    2. vi /etc/scribeline.conf
         blog /var/log/httpd/access_log
         blogimg /var/log/nginx/access_log
    3. /etc/init.d/scribeline start
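The configuration shown is one "category path" pair per line; presumably the first column is the scribe category under which that file's lines are delivered. A parser for that format might look like the sketch below; the function name is hypothetical, and treating "#" as a comment marker is an assumption the slide does not show.

```python
def parse_scribeline_conf(text):
    """Parse 'category  logfile-path' pairs, one per line (format as on the slide).

    Blank lines and '#' comments are skipped (comment syntax is an assumption).
    Returns a list of (category, path) tuples in file order.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop assumed '#' comments
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'category path', got {raw!r}")
        category, path = fields
        entries.append((category, path))
    return entries
```

Rejecting malformed lines at startup, rather than skipping them, makes a typo in /etc/scribeline.conf visible when the init script runs instead of silently dropping a log file.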
  • scribe nodes (diagram: scribed nodes)
  • overview (diagram labels: hourly, daily, on demand)
  • what we want from a Hive client
    • easy to experiment with
    • usable from the PCs on our desks
    • result caching
    • protection against data loss
    • friendly look & feel
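Result caching means an identical query can be answered without rerunning MapReduce. A minimal sketch, assuming the cache key is a hash of the query text; shib keeps results in Kyoto Tycoon, and a plain dict stands in for that store here (class and method names are hypothetical).

```python
import hashlib


class QueryResultCache:
    """Hypothetical query-result cache keyed by the query text.

    Only leading/trailing whitespace is ignored; any other textual difference
    counts as a different query, so string literals are never altered.
    """

    def __init__(self, store=None):
        self.store = {} if store is None else store   # stand-in for a KVS like Kyoto Tycoon

    @staticmethod
    def key(query):
        return hashlib.sha256(query.strip().encode("utf-8")).hexdigest()

    def get_or_run(self, query, run):
        """Return (result, was_cached); call run(query) only on a cache miss."""
        k = self.key(query)
        if k in self.store:
            return self.store[k], True
        result = run(query)
        self.store[k] = result
        return result, False
```

Keying on text alone returns stale results if the underlying partitions change later (for example, an hour that is still being loaded), so a real cache would need an expiry or an explicit refresh.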
  • shib: Hive client web application
    • node.js, Thrift, Kyoto Tycoon
    • query history browser
    • query editor, based on copy & paste
    • result caching & download as TSV/CSV
    • filters out INSERT/DROP/CREATE ... queries
    • https://github.com/tagomoris/shib
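Filtering out INSERT/DROP/CREATE queries is what protects against data loss in a shared, read-mostly client. A minimal sketch of such a guard, assuming a case-insensitive whole-word match is acceptable; it is not shib's actual code, and the slide's "..." implies more blocked statements, which are not guessed here.

```python
import re

# Only the keywords named on the slide; the slide's "..." is left unfilled.
BLOCKED_RE = re.compile(r"\b(INSERT|DROP|CREATE)\b", re.IGNORECASE)


def rejected_keyword(query):
    """Return the first blocked keyword found anywhere in the query, or None.

    Deliberately naive (not a HiveQL parser): it searches the raw text, so a
    blocked word inside a string literal or comment also causes rejection.
    """
    m = BLOCKED_RE.search(query)
    return m.group(1).upper() if m else None
```

Searching the whole text, rather than only the leading keyword, also catches Hive's "FROM src INSERT OVERWRITE TABLE ..." form; the price is occasional false rejections, which is the safer failure mode for a client meant to be read-only.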
  • shib system overview
  • what shib cannot do yet
    • access control
    • graphs & charts
    • Hive 0.7.0+ feature support
    • database, authentication and ...
    • MapReduce status notification
  • what we are trying now
    • new cluster
      • more nodes
      • CDH3b2 + Hive 0.6.0 -> CDH3u1
    • new tools
      • Hoop (instead of fuse-hdfs)
      • some stream processing framework
  • thanks!