6. Hadoop
Google 2004
MapReduce
http://labs.google.com/papers/mapreduce.html
Google File System (GFS)
http://labs.google.com/papers/gfs.html
2010 Google
12. Hadoop
MapReduce
web
HDFS
RDB
Web Hadoop
16. MapReduce
input → map → reduce → output
map reduce hadoop
key-value
Hadoop
18. MapReduce
master: MR:JobTracker (manages each Job)
slave: MR:TaskTracker (runs map and reduce tasks)
slave: MR:TaskTracker (runs map and reduce tasks)
19. HDFS
master: HDFS:NameNode
slave: HDFS:DataNode
slave: HDFS:DataNode
20. Hadoop
MapReduce HDFS
master: HDFS:NameNode, MR:JobTracker
slave: HDFS:DataNode, MR:TaskTracker
slave: HDFS:DataNode, MR:TaskTracker
Hadoop
HDFS
MapReduce map reduce JobTracker
map reduce
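Running a TaskTracker and a DataNode on the same slave lets a map task read its input block from local disk. The following is only a toy sketch of that idea, not Hadoop's actual scheduler: the block names, host names, and the `assign_map_task` helper are all hypothetical.

```python
# Toy model only: the real NameNode and JobTracker are far more involved.
# Assumed data: which DataNode hosts store each block (NameNode metadata).
block_locations = {
    "blk_1": ["slave1", "slave2"],
    "blk_2": ["slave2"],
}
# Assumed data: free map slots per TaskTracker host.
free_slots = {"slave1": 1, "slave2": 1}


def assign_map_task(block, locations, slots):
    """Prefer a TaskTracker on a host that already stores the block."""
    for host in locations[block]:
        if slots.get(host, 0) > 0:
            slots[host] -= 1
            return host, True       # data-local: no network read needed
    for host, free in slots.items():
        if free > 0:
            slots[host] -= 1
            return host, False      # remote: the block must be fetched
    return None, False              # no free slot anywhere


first = assign_map_task("blk_1", block_locations, free_slots)
second = assign_map_task("blk_2", block_locations, free_slots)
third = assign_map_task("blk_1", block_locations, free_slots)
print(first, second, third)
```

In this sketch the first two tasks land on hosts that hold their blocks, and the third finds no free slot.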
22. MapReduce
Example Google
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
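The pseudocode can be tried locally with a minimal Python sketch. This is not Hadoop code: `run_word_count` is a hypothetical in-memory driver standing in for the framework, and unlike the pseudocode the reduce function here returns the key together with the count.

```python
from collections import defaultdict


def map_fn(key, value):
    # key: document name (unused), value: document contents
    for word in value.split():
        yield word, "1"


def reduce_fn(key, values):
    # key: a word, values: an iterable of counts held as strings
    result = 0
    for v in values:
        result += int(v)
    return key, str(result)


def run_word_count(documents):
    # Hypothetical local driver: groups intermediate pairs by key in memory.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    return dict(reduce_fn(word, counts) for word, counts in intermediate.items())


counts = run_word_count({"doc1": "a b a", "doc2": "b c"})
print(counts)
```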
23. MapReduce
[Figure: word count data flow]
input (HDFS): AA, AB, BC
map: AA → A:1, A:1; AB → A:1, B:1; BC → B:1, C:1
shuffle (sort): A:<1,1,1>, B:<1,1>, C:<1>
reduce: A:3, B:2, C:1
output (HDFS)
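The shuffle (sort) stage of this flow can be imitated with a sort followed by a group-by. This is a sketch under the assumption that each character of the inputs AA, AB, BC counts as one word; the function names are illustrative only.

```python
from itertools import groupby
from operator import itemgetter


def map_chars(record):
    # Treat each character of the record as one "word" and emit (word, 1).
    return [(ch, 1) for ch in record]


def shuffle(pairs):
    # Sort by key, then collect the values that share a key: A -> [1, 1, 1].
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [v for _, v in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


def reduce_sum(key, values):
    return key, sum(values)


records = ["AA", "AB", "BC"]
mapped = [pair for record in records for pair in map_chars(record)]
grouped = shuffle(mapped)
result = [reduce_sum(key, values) for key, values in grouped]
print(grouped)
print(result)
```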
30. Amazon Web Service
EC2 (Elastic Compute Cloud)
root/admin
S3 (Simple Storage Service)
EMR (Elastic MapReduce)
Web
Hadoop → MapReduce
EC2 S3 +α
35. Elastic MapReduce
Finding Similar Items with Amazon Elastic MapReduce, Python, and Hadoop Streaming
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2294
Item
38. Elastic MapReduce
[Figure: chained map and reduce stages]
39. Elastic MapReduce
step1 :
input
key:[] value:[ ID_ ID_ ]
map ID
key:[ ID] values[ ID_ ]
reduce ID
output
ID \t ID_ | ID_ |...
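Step 1 can be sketched as a map that re-keys each record and a reduce that joins the values with "|". The field names were lost from the slide, so the layout below (user ID, item ID, rating, tab-separated) is an assumption, as are the function names.

```python
from itertools import groupby


def step1_map(line):
    # Assumed input layout: user_id, item_id, rating separated by tabs.
    user_id, item_id, rating = line.rstrip("\n").split("\t")
    return user_id, f"{item_id}_{rating}"


def step1_reduce(user_id, values):
    # One output line per key: key, a tab, then the values joined by "|".
    return f"{user_id}\t{'|'.join(values)}"


def run_step1(lines):
    pairs = sorted(step1_map(line) for line in lines)  # stands in for the shuffle
    return [step1_reduce(key, [v for _, v in group])
            for key, group in groupby(pairs, key=lambda p: p[0])]


step1_out = run_step1(["u1\ti1\t5", "u2\ti1\t3", "u1\ti2\t4"])
print(step1_out)
```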
40. Elastic MapReduce
step2 :
input
key:[ ID] value:[ ID_ | ID_ |...]
ID
map
key:[ IDx_ IDy] values[ x_ y]
ID
reduce
output
IDx_ _ IDy
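A rough sketch of step 2, under assumptions: the map emits one record per pair of items that appear on the same input line, and the reduce scores each pair. The slide does not say which similarity measure is used, so cosine similarity is a stand-in here, and IDs are assumed to contain no "_".

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def step2_map(line):
    """Emit one record per pair of items found on the same input line."""
    _key, packed = line.split("\t")
    ratings = sorted(tuple(entry.split("_")) for entry in packed.split("|"))
    for (item_x, rx), (item_y, ry) in combinations(ratings, 2):
        yield f"{item_x}_{item_y}", (float(rx), float(ry))


def step2_reduce(pair_key, rating_pairs):
    """Cosine similarity of the two rating vectors (an assumed measure)."""
    dot = sum(x * y for x, y in rating_pairs)
    norm = (sqrt(sum(x * x for x, _ in rating_pairs))
            * sqrt(sum(y * y for _, y in rating_pairs)))
    similarity = dot / norm if norm else 0.0
    item_x, item_y = pair_key.split("_")
    return f"{item_x}_{similarity:.3f}_{item_y}"


def run_step2(lines):
    grouped = defaultdict(list)  # in-memory stand-in for the shuffle
    for line in lines:
        for key, value in step2_map(line):
            grouped[key].append(value)
    return [step2_reduce(key, grouped[key]) for key in sorted(grouped)]


step2_out = run_step2(["u1\ti1_5|i2_4", "u2\ti1_3|i2_3", "u3\ti1_1"])
print(step2_out)
```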
41. Elastic MapReduce
step3 :
input
IDx_ _ IDy
IDx_(1- ) key map
map map
key: < IDx_(1- )> values < IDy>
reduce 1-
output
IDx_ IDy_
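Putting (1 - score) into the key makes the framework's ascending sort list the highest scores first for each item. The sketch below assumes the score is a similarity in [0, 1] written with three decimals, so that string order matches numeric order; the names are illustrative.

```python
def step3_map(line):
    # Input: itemX_similarity_itemY. The key carries 1 - similarity.
    item_x, sim, item_y = line.split("_")
    return f"{item_x}_{1 - float(sim):.3f}", item_y


def step3_reduce(key, item_y):
    # Turn 1 - similarity back into the similarity for the output.
    item_x, inverted = key.split("_")
    return f"{item_x}_{item_y}_{1 - float(inverted):.3f}"


def run_step3(lines):
    # sorted() stands in for the sort that happens between map and reduce.
    return [step3_reduce(key, value)
            for key, value in sorted(step3_map(line) for line in lines)]


step3_out = run_step3(["i1_0.200_i2", "i1_0.900_i3", "i2_0.500_i3"])
print(step3_out)
```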
43. Elastic MapReduce
1
elastic-mapreduce \
    --create \
    --name "item similarity job" \
    --alive \
    --log-uri s3n://bucket/logs \
    --num-instances 10 \
    --instance-type m1.small \
    --availability-zone us-west-1a
46. Elastic MapReduce
2
Upload the input data and the Python map/reduce scripts to S3 (with s3cmd)
s3cmd.rb put bucket:input/input.tsv input.tsv
s3cmd.rb put bucket:script/map1.py map1.py
s3cmd.rb put bucket:script/reduce1.py reduce1.py
...
47. Elastic MapReduce
4
Job
elastic-mapreduce \
    --job-flow-id j-2ROU0QKL6KOV6 \
    --json item_similarity.json
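The contents of item_similarity.json are not shown on the slide. As a hedged sketch, the file could hold a list of Hadoop Streaming step descriptions built like this. The bucket name, the intermediate paths, the map2/reduce2/map3/reduce3 script names, and the streaming jar path are all assumptions; the field names follow the Elastic MapReduce step configuration format.

```python
import json

BUCKET = "s3n://example-bucket"  # hypothetical bucket name
# Assumed jar location; check the path on the actual cluster.
STREAMING_JAR = "/home/hadoop/contrib/streaming/hadoop-streaming.jar"


def streaming_step(n, input_uri, output_uri):
    """Describe one streaming step that runs map<n>.py and reduce<n>.py."""
    return {
        "Name": f"item similarity step {n}",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": STREAMING_JAR,
            "Args": [
                "-input", input_uri,
                "-output", output_uri,
                "-mapper", f"{BUCKET}/script/map{n}.py",
                "-reducer", f"{BUCKET}/script/reduce{n}.py",
            ],
        },
    }


steps = []
input_uri = f"{BUCKET}/input"
for n in (1, 2, 3):
    output_uri = f"{BUCKET}/output" if n == 3 else f"{BUCKET}/tmp/step{n}"
    steps.append(streaming_step(n, input_uri, output_uri))
    input_uri = output_uri  # the next step reads what this one wrote

print(json.dumps(steps, indent=2))
```

Chaining each step's output into the next step's input mirrors the step1, step2, step3 sequence described earlier.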
49. Elastic MapReduce
5
output
s3sync.rb -r --make-dirs bucket:output .
elastic-mapreduce \
    --terminate \
    --job-flow-id j-2ROU0QKL6KOV6