Introduction to Hadoop and Using It in the Cloud
 


    Presentation Transcript

    • Hadoop! 2010/05/16, naoki yanai (id:yanaoki)
    • Agenda: Hadoop; Hadoop in the cloud (Elastic MapReduce)
    • About me: naoki yanai (id:yanaoki). Web, Hadoop, iPhone, Ruby, Java
    • Hadoop
    • Hadoop: open-source software written in Java, developed as an Apache project
    • Hadoop is based on papers published by Google: MapReduce (2004) http://labs.google.com/papers/mapreduce.html and the Google File System (GFS) http://labs.google.com/papers/gfs.html (2010, Google)
    • Hadoop: Web →
    • Hadoop
    • Hadoop users: Yahoo (a major user of Hadoop), Facebook, Amazon
    • Hadoop vs. RDBMS: joins; a join must be written as MapReduce rather than in SQL
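MapReduce has no JOIN keyword, so a join has to be written by hand. One common approach is a reduce-side join. The map step tags each record with its source table and emits it under the join key, and the reduce step pairs up records that share a key. The following is a minimal in-memory sketch that assumes two hypothetical tables, `users` and `orders`. It illustrates the general technique and is not taken from the slides.

```python
from collections import defaultdict

def map_phase(users, orders):
    """Tag each record with its source table and emit it under the join key (user_id)."""
    for user_id, name in users:
        yield user_id, ("users", name)
    for user_id, item in orders:
        yield user_id, ("orders", item)

def reduce_join(key, tagged_values):
    """Inner join: combine every users row with every orders row that shares this key."""
    names = [v for tag, v in tagged_values if tag == "users"]
    items = [v for tag, v in tagged_values if tag == "orders"]
    for name in names:
        for item in items:
            yield key, name, item

def reduce_side_join(users, orders):
    # Shuffle: group the tagged map output by join key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, tagged in map_phase(users, orders):
        groups[key].append(tagged)
    result = []
    for key in sorted(groups):
        result.extend(reduce_join(key, groups[key]))
    return result

if __name__ == "__main__":
    users = [(1, "alice"), (2, "bob")]             # hypothetical table
    orders = [(1, "book"), (1, "pen"), (3, "cup")]  # hypothetical table
    for row in reduce_side_join(users, orders):
        print(row)
```

Keys that appear in only one table (user 2, order key 3) produce no output, which matches SQL inner-join semantics.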
    • Hadoop in a system (diagram): Web, HDFS, MapReduce, RDB
    • Hadoop: N machines
    • Hadoop consists of two parts: MapReduce and HDFS
    • MapReduce: input → map → reduce → output; in Hadoop, map and reduce both work on key-value pairs
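The input → map → reduce → output flow over key-value pairs can be simulated in a single process. The sketch below is a minimal one and is not Hadoop's API. `run_mapreduce`, the mapper/reducer signatures, and the "year,temperature" records are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator, List, Tuple

KV = Tuple[Any, Any]

def run_mapreduce(records: Iterable[KV],
                  mapper: Callable[[Any, Any], Iterable[KV]],
                  reducer: Callable[[Any, List[Any]], Iterable[KV]]) -> List[KV]:
    """Hypothetical single-process stand-in for a Hadoop job (not Hadoop's API)."""
    # map: each input key-value pair yields zero or more intermediate key-value pairs
    intermediate = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            intermediate[k].append(v)
    # shuffle/sort: each intermediate key reaches reduce once, with all its values, in key order
    output: List[KV] = []
    for k in sorted(intermediate):
        output.extend(reducer(k, intermediate[k]))
    return output

# Hypothetical example: maximum temperature per year from "year,temperature" lines
def max_temp_mapper(line_no: int, line: str) -> Iterator[KV]:
    year, temp = line.split(",")
    yield year, int(temp)

def max_temp_reducer(year: str, temps: List[int]) -> Iterator[KV]:
    yield year, max(temps)

if __name__ == "__main__":
    lines = [(0, "2009,21"), (1, "2010,25"), (2, "2009,30")]
    print(run_mapreduce(lines, max_temp_mapper, max_temp_reducer))
    # [('2009', 30), ('2010', 25)]
```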
    • HDFS: Hadoop's distributed file system, which MapReduce reads its input from and writes its output to
    • MapReduce (diagram): master MR:JobTracker, slaves MR:TaskTracker. The JobTracker splits a job (Job) into tasks (map and reduce) and hands them to the TaskTrackers
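In Hadoop, a job gets one map task per input split and a number of reduce tasks set in the job configuration. The real JobTracker hands out tasks as TaskTrackers report in, and it prefers map tasks whose data is local. This toy sketch replaces that logic with plain round-robin so that only the job → tasks → TaskTrackers structure is visible. `Task`, `plan_job`, `assign_round_robin`, and the split names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Task:
    kind: str        # "map" or "reduce"
    index: int
    split: str = ""  # input split for map tasks (hypothetical names)

def plan_job(input_splits: List[str], num_reduce_tasks: int) -> List[Task]:
    """Split a job into tasks: one map task per input split plus the configured reduce tasks."""
    maps = [Task("map", i, s) for i, s in enumerate(input_splits)]
    reduces = [Task("reduce", i) for i in range(num_reduce_tasks)]
    return maps + reduces

def assign_round_robin(tasks: List[Task], tasktrackers: List[str]) -> Dict[str, List[str]]:
    """Toy scheduler: hand tasks to TaskTrackers in turn (not the JobTracker's real policy)."""
    assignment: Dict[str, List[str]] = {tt: [] for tt in tasktrackers}
    for i, task in enumerate(tasks):
        tt = tasktrackers[i % len(tasktrackers)]
        assignment[tt].append(f"{task.kind}-{task.index}")
    return assignment

if __name__ == "__main__":
    tasks = plan_job(["part-0", "part-1", "part-2"], num_reduce_tasks=2)
    print(assign_round_robin(tasks, ["slave1", "slave2"]))
    # {'slave1': ['map-0', 'map-2', 'reduce-1'], 'slave2': ['map-1', 'reduce-0']}
```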
    • HDFS (diagram): master HDFS:NameNode, slaves HDFS:DataNode
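HDFS stores a file as fixed-size blocks on the DataNodes. The NameNode keeps only the metadata that maps each block to the DataNodes holding its copies. Hadoop releases of this era defaulted to a 64 MB block size and a replication factor of 3. The following is a toy sketch of that bookkeeping. It uses simple round-robin placement instead of HDFS's real rack-aware policy, and the function and node names are hypothetical.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default dfs.block.size of this era
REPLICATION = 3                # default dfs.replication

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) for each block; the last block may be shorter."""
    return [(off, min(block_size, file_size - off)) for off in range(0, file_size, block_size)]

def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Toy NameNode metadata: block id -> DataNodes holding a copy.
    Round-robin placement, NOT HDFS's rack-aware policy."""
    r = min(replication, len(datanodes))
    return {b: [datanodes[(b + i) % len(datanodes)] for i in range(r)]
            for b in range(num_blocks)}

if __name__ == "__main__":
    blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
    print(len(blocks))  # 4 blocks: 64 + 64 + 64 + 8 MB
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```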
    • Hadoop = MapReduce + HDFS (diagram): the master runs MR:JobTracker and HDFS:NameNode; each slave runs MR:TaskTracker and HDFS:DataNode. Data is stored in HDFS, MapReduce runs map and reduce over it, and the JobTracker distributes the map and reduce tasks
    • MapReduce example (diagram): input AA, AB, BC → map → reduce → output A 3, B 2, C 1
    • MapReduce example from Google's paper (word count):
        map(String key, String value):
          // key: document name
          // value: document contents
          for each word w in value:
            EmitIntermediate(w, "1");

        reduce(String key, Iterator values):
          // key: a word
          // values: a list of counts
          int result = 0;
          for each v in values:
            result += ParseInt(v);
          Emit(AsString(result));
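The paper's word-count pseudocode translates almost line for line into runnable Python. This is a local sketch only. It assumes that words are separated by whitespace, and it keeps counts as strings to mirror `EmitIntermediate(w, "1")` and `Emit(AsString(result))`. `wc_map`, `wc_reduce`, and `word_count` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def wc_map(key: str, value: str) -> Iterator[Tuple[str, str]]:
    # key: document name, value: document contents
    for w in value.split():
        yield w, "1"                      # EmitIntermediate(w, "1")

def wc_reduce(key: str, values: Iterable[str]) -> str:
    # key: a word, values: a list of counts
    result = 0
    for v in values:
        result += int(v)                  # ParseInt(v)
    return str(result)                    # Emit(AsString(result))

def word_count(documents: Dict[str, str]) -> Dict[str, str]:
    # Local driver: run map over every document, group by word, then reduce each group.
    intermediate: Dict[str, List[str]] = defaultdict(list)
    for name, contents in documents.items():
        for w, one in wc_map(name, contents):
            intermediate[w].append(one)
    return {w: wc_reduce(w, counts) for w, counts in sorted(intermediate.items())}

if __name__ == "__main__":
    print(word_count({"doc1": "hadoop map reduce", "doc2": "map reduce map"}))
    # {'hadoop': '1', 'map': '3', 'reduce': '2'}
```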
    • MapReduce dataflow (diagram):
        input (HDFS): AA, AB, BC
        map: AA → A:1, A:1 / AB → A:1, B:1 / BC → B:1, C:1
        shuffle (sort): A:<1,1,1>, B:<1,1>, C:<1>
        reduce: A:3, B:2, C:1
        output (HDFS)
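With more than one reduce task, the shuffle must also decide which reducer receives each key. Hadoop's default HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so every occurrence of a key lands on the same reducer. The following sketch runs the AA/AB/BC example through two reducers. It uses `zlib.crc32` as a stand-in for Java's `hashCode()`, so the actual reducer numbers will differ from Hadoop's. All function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

def map_letters(record: str) -> List[Tuple[str, int]]:
    # map: "AB" -> [("A", 1), ("B", 1)]
    return [(ch, 1) for ch in record]

def partition(key: str, num_reducers: int) -> int:
    # Same shape as HashPartitioner, with zlib.crc32 standing in for hashCode()
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers

def shuffle(pairs, num_reducers: int) -> List[Dict[str, List[int]]]:
    # shuffle (sort): route each pair to its reducer, then group and sort by key there
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for k, v in pairs:
        buckets[partition(k, num_reducers)][k].append(v)
    return [dict(sorted(b.items())) for b in buckets]

def reduce_sum(group: Dict[str, List[int]]) -> Dict[str, int]:
    # reduce: A:<1,1,1> -> A:3
    return {k: sum(vs) for k, vs in group.items()}

if __name__ == "__main__":
    mapped = [kv for r in ["AA", "AB", "BC"] for kv in map_letters(r)]
    for i, group in enumerate(shuffle(mapped, num_reducers=2)):
        print(f"reducer {i}: input {group} -> output {reduce_sum(group)}")
```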
    • MapReduce (Google)
    • Mahout: an Apache library of machine-learning algorithms that run on Hadoop (CollaborativeFiltering, Classifier, Clustering, DecisionForest)
    • Hadoop
    • Hadoop
    • Amazon Web Services: EC2
    • Amazon Web Services: services offered through Web APIs
    • Amazon Web Services:
        EC2 (Elastic Compute Cloud): virtual servers with root/admin access
        S3 (Simple Storage Service): storage
        EMR (Elastic MapReduce): Hadoop as a web service → MapReduce = EC2 + S3 + α
    • Elastic MapReduce: run Hadoop without building your own Hadoop cluster; input and output are stored on S3
    • Elastic MapReduce (Amazon)
    • Elastic MapReduce (diagram): the client submits a job through the API to the master in the cloud; the slaves read input from and write output to S3
    • Elastic MapReduce: a MapReduce example
    • Elastic MapReduce: based on the article "Finding Similar Items with Amazon Elastic MapReduce, Python, and Hadoop Streaming" http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2294 (item similarity)
    • Elastic MapReduce: similarity is computed by chaining map/reduce jobs; input: rating data from http://www.grouplens.org/ (5-point scale)
    • Elastic MapReduce: input on S3 as [user ID] [item ID] [rating] → map/reduce → output on S3 as [item ID] [item ID] [similarity]
    • Elastic MapReduce (diagram): map and reduce tasks running in parallel between the input on S3 and the output on S3
    • Elastic MapReduce step1:
        input: key:[] value:[user ID_item ID_rating]
        map: emit key:[user ID] values:[item ID_rating]
        reduce: combine per user ID
        output: user ID \t item ID_rating | item ID_rating | ...
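Step1 can be sketched as Hadoop Streaming-style Python, where the mapper and reducer consume and produce tab-separated lines. These functions are hypothetical and are not the deck's map1.py/reduce1.py. The sketch assumes input lines of the form `user_id \t item_id \t rating`, with any extra columns ignored. Locally, `sorted()` stands in for the sort Hadoop performs between map and reduce. That sort makes each user's lines contiguous, so `itertools.groupby` can collect them.

```python
from itertools import groupby
from typing import Iterable, Iterator

def mapper1(lines: Iterable[str]) -> Iterator[str]:
    """Input: user_id \t item_id \t rating. Emit key = user ID, value = item_rating."""
    for line in lines:
        user, item, rating = line.rstrip("\n").split("\t")[:3]
        yield f"{user}\t{item}_{rating}"

def reducer1(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Lines arrive sorted, so each user's records are contiguous.
    Emit user_id \t item_rating|item_rating|..."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for user, group in groupby(pairs, key=lambda kv: kv[0]):
        yield user + "\t" + "|".join(value for _, value in group)

if __name__ == "__main__":
    # Local stand-in for Hadoop Streaming: map, sort, reduce.
    sample = ["1\t10\t5", "2\t10\t3", "1\t20\t4"]
    for out in reducer1(sorted(mapper1(sample))):
        print(out)  # "1\t10_5|20_4", then "2\t10_3"
```

In a real Streaming script, the mapper would iterate over `sys.stdin` and print each yielded line.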
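    • Elastic MapReduce step2:
        input: key:[user ID] value:[item ID_rating | item ID_rating | ...]
        map: for each pair of items the user rated, emit key:[item IDx_item IDy] values:[ratingx_ratingy]
        reduce: compute the similarity of each item pair
        output: item IDx_similarity_item IDy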
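The sketch below implements step2 in the same Streaming style. The slides do not state the similarity measure, so this sketch assumes cosine similarity over the ratings from users who rated both items. Because ratings are positive, that similarity falls between 0 and 1, which suits step3's "1 - similarity" key. Item pairs are emitted in sorted item-ID order so that each pair has a single key. The function names are hypothetical.

```python
import math
from itertools import combinations, groupby
from typing import Iterable, Iterator

def mapper2(lines: Iterable[str]) -> Iterator[str]:
    """Input: user_id \t item_rating|item_rating|...
    For every pair of items this user rated, emit itemX_itemY \t ratingX_ratingY."""
    for line in lines:
        _, ratings = line.rstrip("\n").split("\t", 1)
        items = sorted(r.split("_") for r in ratings.split("|"))  # one canonical order per pair
        for (x, rx), (y, ry) in combinations(items, 2):
            yield f"{x}_{y}\t{rx}_{ry}"

def cosine(pairs) -> float:
    """Cosine similarity of two items' rating vectors over co-rating users (assumed measure)."""
    dot = sum(a * b for a, b in pairs)
    na = math.sqrt(sum(a * a for a, _ in pairs))
    nb = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (na * nb) if na and nb else 0.0

def reducer2(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Group rating pairs by item pair and emit itemX_similarity_itemY."""
    kvs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(kvs, key=lambda kv: kv[0]):
        pairs = [tuple(float(r) for r in v.split("_")) for _, v in group]
        x, y = key.split("_")
        yield f"{x}_{cosine(pairs):.4f}_{y}"

if __name__ == "__main__":
    step1_output = ["1\t10_5|20_4", "2\t10_3|20_1"]
    for out in reducer2(sorted(mapper2(step1_output))):
        print(out)
```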
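    • Elastic MapReduce step3:
        input: item IDx_similarity_item IDy
        map: emit key:<item IDx_(1-similarity)> values:<item IDy>, so sorting by key lists each item's neighbors from most to least similar
        reduce: convert (1-similarity) back to similarity
        output: item IDx item IDy_similarity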
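The following sketch implements step3 in the same style. The mapper emits `itemX_(1-similarity)` as the key, in both directions. Once the keys are sorted, each item's neighbors therefore appear in descending order of similarity. The sort is lexical, so distances are formatted with a fixed four decimals. This local sketch relies on a single sort. With several reducers, Hadoop would also have to partition on the item-ID part of the key alone (for example, with KeyFieldBasedPartitioner). The function names are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple

def mapper3(lines: Iterable[str]) -> Iterator[str]:
    """Input: itemX_similarity_itemY. Emit key itemX_(1-similarity), value itemY, both directions."""
    for line in lines:
        x, sim, y = line.strip().split("_")
        dist = 1.0 - float(sim)
        yield f"{x}_{dist:.4f}\t{y}"
        yield f"{y}_{dist:.4f}\t{x}"

def _parse(line: str) -> Tuple[str, str, str]:
    # "10_0.0433\t20" -> ("10", "0.0433", "20")
    key, y = line.rstrip("\n").split("\t", 1)
    x, dist = key.split("_")
    return x, dist, y

def reducer3(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Group by itemX and emit itemX \t itemY_similarity|... (most similar first)."""
    for x, group in groupby(map(_parse, sorted_lines), key=lambda r: r[0]):
        yield x + "\t" + "|".join(f"{y}_{1.0 - float(d):.4f}" for _, d, y in group)

if __name__ == "__main__":
    step2_output = ["10_0.9567_20", "10_0.5000_30"]
    for out in reducer3(sorted(mapper3(step2_output))):
        print(out)
```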
    • Elastic MapReduce
    • Elastic MapReduce (1): start a job flow
        elastic-mapreduce --create --name "item similarity job" --alive --log-uri s3n://bucket/logs --num-instances 10 --instance-type m1.small --availability-zone us-west-1a
    • EC2
    • Elastic MapReduce: the job flow is in the WAITING state
    • Elastic MapReduce (2): upload the input and the Python map/reduce scripts to S3 (with s3cmd)
        s3cmd.rb put bucket:input/input.tsv input.tsv
        s3cmd.rb put bucket:script/map1.py map1.py
        s3cmd.rb put bucket:script/reduce1.py reduce1.py
        ...
    • Elastic MapReduce (4): submit the job
        elastic-mapreduce --job-flow-id j-2ROU0QKL6KOV6 --json item_similarity.json
    • Elastic MapReduce: Step1 is RUNNING
    • Elastic MapReduce (5): download the output, then terminate the job flow
        s3sync.rb -r --make-dirs bucket:output .
        elastic-mapreduce --terminate --job-flow-id j-2ROU0QKL6KOV6
    • Hadoop x
    • Book: Hadoop, Tom White (Japanese edition), ¥4,830