Introduction to Hadoop and Using It in the Cloud


Transcript

  • 1. Hadoop! 2010/05/16 naoki yanai id:yanaoki
  • 2. Hadoop Hadoop (Elastic MapReduce)
  • 3. naoki yanai (id:yanaoki) Web Hadoop m m iPhone Ruby Java
  • 4. Hadoop
  • 5. Hadoop Java Apache
  • 6. Hadoop Google 2004 MapReduce http://labs.google.com/papers/mapreduce.html Google File System (GFS) http://labs.google.com/papers/gfs.html 2010 Google
  • 7. Hadoop Web →
  • 8.
  • 9. Hadoop
  • 10. Hadoop Yahoo Yahoo Hadoop Facebook Amazon
  • 11. Hadoop RDBMS Join mapreduce join SQL Hadoop
  • 12. Hadoop MapReduce web HDFS RDB Web Hadoop
  • 13. Hadoop MapReduce web HDFS RDB Web Hadoop
  • 14. Hadoop N Hadoop
  • 15. Hadoop MapReduce HDFS Hadoop MapReduce HDFS
  • 16. MapReduce → map → reduce → map reduce hadoop key-value Hadoop
  • 17. HDFS Hadoop MapReduce
  • 18. MapReduce slave MR:TaskTracker master MR:JobTracker slave MR:TaskTracker (Job) (map reduce
  • 19. HDFS slave HDFS:DataNode master HDFS:NameNode slave HDFS:DataNode
  • 20. Hadoop MapReduce HDFS slave MR:TaskTracker master HDFS:DataNode MR:JobTracker slave HDFS:NameNode MR:TaskTracker HDFS:DataNode Hadoop HDFS MapReduce map reduce JobTracker map reduce
  • 21. MapReduce AA A3 AB B2 BC C1 input output map reduce
  • 22. MapReduce Example Google
        map(String key, String value):
          // key: document name
          // value: document contents
          for each word w in value:
            EmitIntermediate(w, "1");

        reduce(String key, Iterator values):
          // key: a word
          // values: a list of counts
          int result = 0;
          for each v in values:
            result += ParseInt(v);
          Emit(AsString(result));
  • 23. MapReduce A:1 A:1 map A:<1,1,1> A:3 C:1 AA AB C:<1> reduce BC A:1 B:1 map B:2 HDFS B:<1,1> reduce B:1 input map C:1 HDFS shuffle output map reduce (sort)
  • 24. MapReduce Google
  • 25. Hadoop Mahout Hadoop Apache CollaborativeFiltering Classifier Clustering DecisionForest
  • 26. Hadoop
  • 27. Hadoop
  • 28. Amazon Web Service EC2
  • 29. Amazon Web Service WebAPI
  • 30. Amazon Web Service EC2 ( Elastic Compute Cloud ) root/admin S3 ( Simple Storage Service ) EMR ( Elastic MapReduce ) Web Hadoop → MapReduce EC2 S3 +α
  • 31. Elastic MapReduce Hadoop Hadoop input output S3
  • 32. Elastic MapReduce Amazon
  • 33. Elastic MapReduce client cloud master API Job input/output slave S3 slave slave
  • 34. Elastic MapReduce MapReduce
  • 35. Elastic MapReduce Finding Similar Items with Amazon Elastic MapReduce, Python, and Hadoop Streaming http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2294 Item
  • 36. Elastic MapReduce map/reduce map/reduce input http://www.grouplens.org/ 5
  • 37. Elastic MapReduce input S3 [ ID] [ ID] [ ] map/reduce output S3 [ ID] [ ID] [ ]
  • 38. Elastic MapReduce S S map map reduce map map reduce map map reduce map reduce reduce map reduce reduce map reduce reduce
  • 39. Elastic MapReduce step1 : input key:[] value:[ ID_ ID_ ] map ID key:[ ID] values[ ID_ ] reduce ID output ID \t ID_ | ID_ |...
  • 40. Elastic MapReduce step2 : input key:[ ID] value:[ ID_ | ID_ |...] ID map key:[ IDx_ IDy] values[ x_ y] ID reduce output IDx_ _ IDy
  • 41. Elastic MapReduce step3 : input IDx_ _ IDy IDx_(1- ) key map map map key: < IDx_(1- )> values < IDy> reduce 1- output IDx_ IDy_
  • 42. Elastic MapReduce
  • 43. Elastic MapReduce 1
        elastic-mapreduce --create --name "item similarity job" --alive --log-uri s3n://bucket /logs --num-instances 10 --instance-type m1.small --availability-zone us-west-1a
  • 44. EC2 EC2
  • 45. Elastic MapReduce WAITING
  • 46. Elastic MapReduce 2 S3 (s3cmd input map/reduce python
        s3cmd.rb put bucket :input/input.tsv input.tsv
        s3cmd.rb put bucket :script/map.py map1.py
        s3cmd.rb put bucket :script/reduce1.py reduce1.py
        ...
  • 47. Elastic MapReduce 4 Job
        elastic-mapreduce --job-flow-id j-2ROU0QKL6KOV6 --json item_similarity.json
  • 48. Elastic MapReduce Step1 RUNNING
  • 49. Elastic MapReduce 5 output
        s3sync.rb -r --make-dirs bucket :output .
        elastic-mapreduce --terminate --job-flow-id j-2ROU0QKL6KOV6
  • 50. Hadoop x
  • 51. Hadoop Tom White ( ) ( ) ( ) ¥4,830
  • 52.
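
The map, shuffle, and reduce stages shown on slides 21 to 23 can be tried without a cluster. The following is a minimal in-memory sketch, not Hadoop code. It assumes that each character of an input record is one token, which reproduces the AA / AB / BC inputs and the A:3, B:2, C:1 outputs on those slides. The names `map_fn`, `reduce_fn`, `shuffle`, and `run_job` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Emit (token, 1) for every token in one input record.

    Assumption: one character is one token, as in the slide example.
    """
    for token in value:
        yield token, 1


def reduce_fn(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Sum all counts that were grouped under one key."""
    return key, sum(values)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key and sort the keys.

    Hadoop does this between the map and reduce phases; here it is
    a plain dictionary built in memory.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def run_job(records: List[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over a list of input records."""
    intermediate: List[Tuple[str, int]] = []
    for index, record in enumerate(records):
        # The record index stands in for the input key (e.g. a file offset).
        intermediate.extend(map_fn(str(index), record))
    grouped = shuffle(intermediate)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["AA", "AB", "BC"]))
```

In a real job the map calls run on many TaskTrackers in parallel and the grouped values arrive at the reducers over the network; the data flow is otherwise the same.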
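
The three-step item-similarity job on slides 39 to 41 (group ratings by user, score item pairs, rank by 1 minus similarity) can be sketched in memory as follows. Much of the slide text did not survive extraction, so several details here are assumptions rather than the author's method: the input is taken to be tab-separated user ID, item ID, and rating; cosine similarity over co-rating users stands in for the similarity measure, which the transcript does not name; and all function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def step1_group_by_user(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Step 1: collect (item, rating) pairs under each user ID.

    Assumption: each line is "user<TAB>item<TAB>rating[<TAB>...]".
    """
    by_user: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in lines:
        user, item, rating = line.rstrip("\n").split("\t")[:3]
        by_user[user].append((item, float(rating)))
    return dict(by_user)


def step2_pair_similarity(
    by_user: Dict[str, List[Tuple[str, float]]]
) -> Dict[Tuple[str, str], float]:
    """Step 2: score every item pair that at least one user rated together.

    Assumption: cosine similarity is used as a stand-in measure.
    """
    co_ratings: Dict[Tuple[str, str], List[Tuple[float, float]]] = defaultdict(list)
    for ratings in by_user.values():
        # Sorting keeps each pair key in a fixed (item_x < item_y) order.
        for (ix, rx), (iy, ry) in combinations(sorted(ratings), 2):
            co_ratings[(ix, iy)].append((rx, ry))

    sims: Dict[Tuple[str, str], float] = {}
    for pair, co in co_ratings.items():
        dot = sum(rx * ry for rx, ry in co)
        norm_x = math.sqrt(sum(rx * rx for rx, _ in co))
        norm_y = math.sqrt(sum(ry * ry for _, ry in co))
        sims[pair] = dot / (norm_x * norm_y) if norm_x and norm_y else 0.0
    return sims


def step3_rank(sims: Dict[Tuple[str, str], float]) -> Dict[str, List[str]]:
    """Step 3: for each item, list the other items by ascending 1 - similarity."""
    candidates: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for (ix, iy), sim in sims.items():
        candidates[ix].append((1.0 - sim, iy))
        candidates[iy].append((1.0 - sim, ix))
    return {item: [other for _, other in sorted(c)] for item, c in candidates.items()}


if __name__ == "__main__":
    sample = [
        "u1\tA\t5", "u1\tB\t5", "u1\tC\t1",
        "u2\tA\t4", "u2\tB\t4", "u2\tC\t5",
        "u3\tA\t1", "u3\tB\t1", "u3\tC\t5",
    ]
    print(step3_rank(step2_pair_similarity(step1_group_by_user(sample))))
```

Sorting on 1 minus similarity mirrors the key used in step 3 of the slides: an ascending sort, which is what the MapReduce shuffle provides, then yields the most similar items first.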