Introduction to Hadoop and Cloud Usage
Presentation Transcript

  • Hadoop! 2010/05/16 naoki yanai id:yanaoki 1
  • Hadoop Hadoop (Elastic MapReduce) 2
  • naoki yanai (id:yanaoki) Web Hadoop m m iPhone Ruby Java 3
  • Hadoop 4
  • Hadoop Java Apache 5
  • Hadoop Google 2004 MapReduce http://labs.google.com/papers/mapreduce.html Google File System (GFS) http://labs.google.com/papers/gfs.html 2010 Google 6
  • Hadoop Web → 7
  • 8
  • Hadoop 9
  • Hadoop Yahoo Yahoo Hadoop Facebook Amazon 10
  • Hadoop RDBMS Join mapreduce join SQL Hadoop 11
  • Hadoop MapReduce web HDFS RDB Web Hadoop 12
  • Hadoop N Hadoop 14
  • Hadoop MapReduce HDFS Hadoop MapReduce HDFS 15
  • MapReduce → map → reduce → map reduce hadoop key-value Hadoop 16
  • HDFS Hadoop MapReduce 17
  • MapReduce slave MR:TaskTracker master MR:JobTracker slave MR:TaskTracker (Job) (map reduce 18
  • HDFS slave HDFS:DataNode master HDFS:NameNode slave HDFS:DataNode 19
  • Hadoop MapReduce HDFS slave MR:TaskTracker master HDFS:DataNode MR:JobTracker slave HDFS:NameNode MR:TaskTracker HDFS:DataNode Hadoop HDFS MapReduce map reduce JobTracker map reduce 20
  • MapReduce AA A3 AB B2 BC C1 input output map reduce 21
  • MapReduce Example Google 22
      map(String key, String value):
        // key: document name
        // value: document contents
        for each word w in value:
          EmitIntermediate(w, "1");

      reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
          result += ParseInt(v);
        Emit(AsString(result));
  • MapReduce A:1 A:1 map A:<1,1,1> A:3 C:1 AA AB C:<1> reduce BC A:1 B:1 map B:2 HDFS B:<1,1> reduce B:1 input map C:1 HDFS shuffle output map reduce (sort) 23
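
The map, shuffle, and reduce flow on the slides above can be approximated in a few lines of plain Python. This is only a single-process sketch, not Hadoop code: the function names (`map_fn`, `shuffle`, `reduce_fn`, `word_count`) and the document names are made up for illustration, and the input is assumed to be the three records "A A", "A B", and "B C", which is how the diagram's input appears to read.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map phase: emit (word, 1) for every word in one document."""
    # key is the document name (unused here), value is the document contents.
    for word in value.split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group intermediate values by key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_fn(key, values):
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence inside one process."""
    intermediate = []
    for name, contents in documents.items():
        intermediate.extend(map_fn(name, contents))
    grouped = shuffle(intermediate)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Assumed reading of the slide's input: three short records.
    docs = {"doc1": "A A", "doc2": "A B", "doc3": "B C"}
    print(word_count(docs))
```

On a real cluster the map and reduce calls run on different TaskTrackers and the shuffle moves data between them; here everything happens in one list and one dictionary.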
  • MapReduce Google 24
  • Hadoop Mahout Hadoop Apache CollaborativeFiltering Classifier Clustering DecisionForest 25
  • Hadoop 26
  • Hadoop 27
  • Amazon Web Service EC2 28
  • Amazon Web Service WebAPI 29
  • Amazon Web Service EC2 ( Elastic Compute Cloud ) root/admin S3 ( Simple Storage Service ) EMR ( Elastic MapReduce ) Web Hadoop → MapReduce EC2 S3 +α 30
  • Elastic MapReduce Hadoop Hadoop input output S3 31
  • Elastic MapReduce Amazon 32
  • Elastic MapReduce client cloud master API Job input/output slave S3 slave slave 33
  • Elastic MapReduce MapReduce 34
  • Elastic MapReduce Finding Similar Items with Amazon Elastic MapReduce, Python, and Hadoop Streaming http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2294 Item 35
  • Elastic MapReduce map/reduce map/reduce input http://www.grouplens.org/ 5 36
  • Elastic MapReduce input S3 [ ID] [ ID] [ ] map/reduce output S3 [ ID] [ ID] [ ] 37
  • Elastic MapReduce S S map map reduce map map reduce map map reduce map reduce reduce map reduce reduce map reduce reduce 38
  • Elastic MapReduce step1 : input key:[] value:[ ID_ ID_ ] map ID key:[ ID] values[ ID_ ] reduce ID output ID \t ID_ | ID_ |... 39
  • Elastic MapReduce step2 : input key:[ ID] value:[ ID_ | ID_ |...] ID map key:[ IDx_ IDy] values[ x_ y] ID reduce output IDx_ _ IDy 40
  • Elastic MapReduce step3 : input IDx_ _ IDy IDx_(1- ) key map map map key: < IDx_(1- )> values < IDy> reduce 1- output IDx_ IDy_ 41
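
As a rough illustration of step 1 only, the grouping it describes might look like the Python below. Much of the slide text is missing, so this rests on assumptions: each input record is taken to be fields joined by "_" with the grouping ID first, the sample IDs are invented, and the function names are hypothetical. Only the output shape (ID, a tab, then values joined by "|") comes from the slide. Steps 2 and 3 are not sketched because too little of their detail survives.

```python
from collections import defaultdict


def step1_map(record):
    """Split one record into (grouping ID, remaining fields)."""
    # Assumption: fields are joined by "_" and the first field is the ID to group on.
    group_id, rest = record.split("_", 1)
    return group_id, rest


def step1_reduce(group_id, values):
    """Emit one line per ID: the ID, a tab, then its values joined by "|"."""
    return group_id + "\t" + "|".join(values)


def run_step1(records):
    """Group mapped records by ID in memory and reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        key, value = step1_map(record)
        grouped[key].append(value)
    return [step1_reduce(key, grouped[key]) for key in sorted(grouped)]


if __name__ == "__main__":
    # Invented sample records; the real input.tsv layout is not shown on the slides.
    sample = ["u1_i1_5", "u1_i2_3", "u2_i1_4"]
    for line in run_step1(sample):
        print(line)
```

With Hadoop Streaming the mapper and reducer would instead be separate scripts reading standard input, with the framework doing the grouping between them.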
  • Elastic MapReduce 42
  • Elastic MapReduce 1 elastic-mapreduce --create --name "item similarity job" --alive --log-uri s3n://bucket /logs --num-instances 10 --instance-type m1.small --availability-zone us-west-1a 43
  • EC2 EC2 44
  • Elastic MapReduce WAITING 45
  • Elastic MapReduce 2 S3 (s3cmd input map/reduce python 46
      s3cmd.rb put bucket :input/input.tsv input.tsv
      s3cmd.rb put bucket :script/map.py map1.py
      s3cmd.rb put bucket :script/reduce1.py reduce1.py
      ...
  • Elastic MapReduce 4 Job elastic-mapreduce --job-flow-id j-2ROU0QKL6KOV6 --json item_similarity.json 47
  • Elastic MapReduce Step1 RUNNING 48
  • Elastic MapReduce 5 output 49
      s3sync.rb -r --make-dirs bucket :output .
      elastic-mapreduce --terminate --job-flow-id j-2ROU0QKL6KOV6
  • Hadoop x 50
  • Hadoop Tom White ( ) ( ) ( ) ¥4,830 51
  • 52