Hadoop入門とクラウド利用 (Introduction to Hadoop and Using It in the Cloud)

Transcript

  • 1. Hadoop! 2010/05/16, naoki yanai (id:yanaoki)
  • 2. Agenda: Hadoop; Hadoop in the cloud (Elastic MapReduce)
  • 3. naoki yanai (id:yanaoki): Web, Hadoop, iPhone, Ruby, Java
  • 4. Hadoop
  • 5. Hadoop: written in Java, developed as an Apache project
  • 6. Hadoop is based on Google's papers: MapReduce (2004), http://labs.google.com/papers/mapreduce.html, and the Google File System (GFS), http://labs.google.com/papers/gfs.html. 2010, Google
  • 7. Hadoop Web →
  • 8.
  • 9. Hadoop
  • 10. Hadoop users: Yahoo (a major Hadoop developer), Facebook, Amazon
  • 11. Hadoop vs. RDBMS: joins (a join in MapReduce vs. a join in SQL)
  • 12. Hadoop MapReduce web HDFS RDB Web Hadoop
  • 13. Hadoop MapReduce web HDFS RDB Web Hadoop
  • 14. Hadoop N Hadoop
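  • 15. Hadoop = MapReduce + HDFS (Hadoop's two core components)
  • 16. MapReduce: input → map → reduce → output; in Hadoop, map and reduce read and write key-value pairs
  • 17. HDFS: Hadoop's distributed file system, from which MapReduce reads its input and to which it writes its output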
  • 18. MapReduce: the master runs MR:JobTracker, which accepts a job and splits it into map and reduce tasks; each slave runs MR:TaskTracker, which executes those tasks
  • 19. HDFS: the master runs HDFS:NameNode; each slave runs HDFS:DataNode
  • 20. Hadoop = MapReduce + HDFS: the master runs MR:JobTracker and HDFS:NameNode; each slave runs MR:TaskTracker and HDFS:DataNode; the JobTracker hands map and reduce tasks to the slaves
  • 21. MapReduce example (word count): input AA, AB, BC → map → reduce → output A 3, B 2, C 1
  • 22. MapReduce example from Google's paper (word count):
      map(String key, String value):
        // key: document name
        // value: document contents
        for each word w in value:
          EmitIntermediate(w, "1");
      reduce(String key, Iterator values):
        // key: a word
        // values: a list of counts
        int result = 0;
        for each v in values:
          result += ParseInt(v);
        Emit(AsString(result));
  • 23. MapReduce data flow: input (HDFS) AA, AB, BC → map emits A:1 A:1 / A:1 B:1 / B:1 C:1 → shuffle (sort) groups them into A:<1,1,1>, B:<1,1>, C:<1> → reduce emits A:3, B:2, C:1 → output (HDFS)
  • 24. MapReduce Google
  • 25. Mahout: an Apache machine-learning library built on Hadoop, with collaborative filtering, classifiers, clustering, and decision forests
  • 26. Hadoop
  • 27. Hadoop
  • 28. Amazon Web Services: EC2
  • 29. Amazon Web Services: web APIs
  • 30. Amazon Web Services: EC2 (Elastic Compute Cloud), virtual servers with root/admin access; S3 (Simple Storage Service), storage; EMR (Elastic MapReduce), Hadoop as a web service → MapReduce = EC2 + S3 + α
  • 31. Elastic MapReduce: hosted Hadoop; input and output are stored in S3
  • 32. Elastic MapReduce Amazon
  • 33. Elastic MapReduce: the client submits a job through the API; in the cloud, a master and several slaves run it, reading input from S3 and writing output to S3
  • 34. Elastic MapReduce MapReduce
  • 35. Elastic MapReduce example: "Finding Similar Items with Amazon Elastic MapReduce, Python, and Hadoop Streaming", http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2294 (item similarity)
  • 36. Elastic MapReduce: a chain of map/reduce steps; input: rating data from GroupLens (http://www.grouplens.org/), ratings on a 5-point scale
  • 37. Elastic MapReduce: input in S3, [user ID] [item ID] [rating] → map/reduce → output in S3, [item ID] [item ID] [similarity]
  • 38. Elastic MapReduce: S3 → step 1 (map → reduce) → step 2 (map → reduce) → step 3 (map → reduce) → S3, each step running many map and reduce tasks in parallel
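  • 39. Elastic MapReduce step 1: input key: [] (none), value: [userID_itemID_rating]; map emits key: [user ID], value: [itemID_rating]; reduce collects each user's ratings; output: userID\titemID_rating|itemID_rating|...
  • 40. Elastic MapReduce step 2: input key: [user ID], value: [itemID_rating|itemID_rating|...]; for every pair of items rated by that user, map emits key: [itemIDx_itemIDy], value: [ratingx_ratingy]; reduce computes the similarity of each item pair; output: itemIDx_similarity_itemIDy
  • 41. Elastic MapReduce step 3: input itemIDx_similarity_itemIDy; map emits key: <itemIDx_(1 - similarity)>, value: <itemIDy>, so that the key sort lists each item's most similar items first; reduce turns 1 - similarity back into similarity; output: itemIDx, itemIDy, similarity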
  • 42. Elastic MapReduce
  • 43. Elastic MapReduce (1): start a job flow with elastic-mapreduce --create --name "item similarity job" --alive --log-uri s3n://bucket/logs --num-instances 10 --instance-type m1.small --availability-zone us-west-1a
  • 44. EC2 EC2
  • 45. Elastic MapReduce: the job flow is in the WAITING state
  • 46. Elastic MapReduce (2): upload the input and the Python map/reduce scripts to S3 with s3cmd: s3cmd.rb put bucket:input/input.tsv input.tsv; s3cmd.rb put bucket:script/map.py map1.py; s3cmd.rb put bucket:script/reduce1.py reduce1.py ...
  • 47. Elastic MapReduce (4): submit the job steps with elastic-mapreduce --job-flow-id j-2ROU0QKL6KOV6 --json item_similarity.json
  • 48. Elastic MapReduce: Step 1 is RUNNING
  • 49. Elastic MapReduce (5): download the output with s3sync.rb -r --make-dirs bucket:output . and shut down the job flow with elastic-mapreduce --terminate --job-flow-id j-2ROU0QKL6KOV6
  • 50. Hadoop x
  • 51. Book: Hadoop, Tom White (Japanese edition), ¥4,830
  • 52.
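Slide 22 quotes the word-count pseudocode from Google's MapReduce paper. The following is a direct Python transcription, offered as a rough illustration. The function names map_fn and reduce_fn are made up for this sketch. Splitting a document into words on whitespace is an assumption that the pseudocode leaves open.

```python
from typing import Iterable, Iterator, Tuple


def map_fn(key: str, value: str) -> Iterator[Tuple[str, str]]:
    # key: document name, value: document contents (hypothetical name map_fn)
    for w in value.split():  # assumption: words are whitespace-separated
        yield (w, "1")  # EmitIntermediate(w, "1")


def reduce_fn(key: str, values: Iterable[str]) -> str:
    # key: a word, values: a list of counts as strings (hypothetical name reduce_fn)
    result = 0
    for v in values:
        result += int(v)  # ParseInt(v)
    return str(result)  # Emit(AsString(result))
```

For example, map_fn("doc1", "A A") yields ("A", "1") twice, and reduce_fn("A", ["1", "1", "1"]) returns "3".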
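Slides 21 and 23 trace one word count through map, shuffle (sort), and reduce. The sketch below replays that flow in a single Python process. It assumes the slide's records AA, AB, and BC are written as space-separated words. The helper names are hypothetical, and nothing here talks to Hadoop or HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    # map: emit (word, 1) for every word in one input record
    for word in record.split():
        yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    # shuffle (sort): gather values per key and order by key, e.g. A:<1,1,1>
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # reduce: add up the counts for one key, e.g. A:<1,1,1> -> A:3
    return (key, sum(values))


def run_job(records: Iterable[str]) -> List[Tuple[str, int]]:
    # Chain the three phases in the order shown in the slide's diagram
    intermediate = [pair for record in records for pair in map_phase(record)]
    return [reduce_phase(key, values) for key, values in shuffle(intermediate)]


if __name__ == "__main__":
    for key, count in run_job(["A A", "A B", "B C"]):
        print(f"{key}:{count}")
```

Running it prints A:3, B:2 and C:1 on separate lines, which matches the output column of slide 21.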
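Slide 35 points to an article that drives Elastic MapReduce with Python through Hadoop Streaming. In Streaming, the mapper and reducer are ordinary programs. Each reads lines on standard input and writes tab-separated key/value lines to standard output, and the reducer receives its input already sorted by key. A minimal word-count pair in that style might look like the sketch below. The script name wordcount.py and its map/reduce argument are hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit "word\t1" for every whitespace-separated word on every input line
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Streaming delivers reducer input sorted by key, so equal keys are
    # adjacent and groupby can find where one key ends and the next begins
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


if __name__ == "__main__":
    # Hypothetical usage: python wordcount.py map|reduce < input
    if len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
        step = mapper if sys.argv[1] == "map" else reducer
        for out in step(sys.stdin):
            print(out)
    else:
        print("usage: wordcount.py map|reduce", file=sys.stderr)
```

You can try it without a cluster by letting sort stand in for the shuffle, for example: cat input.txt | python wordcount.py map | sort | python wordcount.py reduce.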
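Step 1 on slide 39 regroups the ratings by user. A Streaming-style sketch of that step follows. It assumes each input line holds user ID, item ID, and rating separated by tabs, since the input file on slide 46 is input.tsv. The function names are hypothetical, and this code is not taken from the linked article.

```python
from itertools import groupby
from typing import Iterable, Iterator


def step1_mapper(lines: Iterable[str]) -> Iterator[str]:
    # Assumed input line: "userID\titemID\trating" (tab-separated)
    for line in lines:
        user_id, item_id, rating = line.rstrip("\n").split("\t")
        # key: user ID, value: "itemID_rating"
        yield f"{user_id}\t{item_id}_{rating}"


def step1_reducer(lines: Iterable[str]) -> Iterator[str]:
    # Input arrives sorted by user ID; join each user's values with "|"
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for user_id, group in groupby(pairs, key=lambda kv: kv[0]):
        yield user_id + "\t" + "|".join(value for _, value in group)
```

Locally, sorted() can play the role of Hadoop's shuffle. For example, list(step1_reducer(sorted(step1_mapper(["1\t10\t5", "2\t10\t3", "1\t20\t4"])))) gives ["1\t10_5|20_4", "2\t10_3"].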
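Step 2 on slide 40 turns each user's rating list into item pairs and scores every pair. The surviving slide text does not say which similarity measure is used, so this sketch swaps in cosine similarity over the users who rated both items. That choice, the four-decimal output, and the function names are all assumptions. The sketch reads the step 1 output format from slide 39 and writes itemIDx_similarity_itemIDy lines as on slide 40.

```python
import math
from itertools import groupby, permutations
from typing import Iterable, Iterator


def step2_mapper(lines: Iterable[str]) -> Iterator[str]:
    # Input (step 1 output): "userID\titemID_rating|itemID_rating|..."
    for line in lines:
        _, ratings = line.rstrip("\n").split("\t", 1)
        items = [entry.split("_") for entry in ratings.split("|")]
        # Emit every ordered pair of items rated by this user:
        # key "IDx_IDy", value "ratingx_ratingy"
        for (x, rx), (y, ry) in permutations(items, 2):
            yield f"{x}_{y}\t{rx}_{ry}"


def step2_reducer(lines: Iterable[str]) -> Iterator[str]:
    # Input arrives sorted by key, so all ratings for one item pair are adjacent
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        x, y = key.split("_")
        dot = norm_x = norm_y = 0.0
        for _, value in group:
            rx, ry = (float(r) for r in value.split("_"))
            dot += rx * ry
            norm_x += rx * rx
            norm_y += ry * ry
        # Cosine similarity (an assumption; the slides do not name the measure)
        sim = dot / math.sqrt(norm_x * norm_y) if norm_x and norm_y else 0.0
        yield f"{x}_{sim:.4f}_{y}"
```

The mapper emits both orderings (x, y) and (y, x) so that every item later appears as IDx in step 3. The cost is twice as much shuffle traffic. The sketch also assumes item IDs contain no underscores.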
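Step 3 on slide 41 relies on Hadoop's sort. Putting 1 - similarity into the key makes ascending key order list each item's most similar items first. The sketch below works under the same assumptions as a cosine-similarity step 2: similarities lie in [0, 1] and are printed with four decimals, so string order matches numeric order. The tab-separated output and the function names are illustrative choices, not the article's.

```python
from typing import Iterable, Iterator


def step3_mapper(lines: Iterable[str]) -> Iterator[str]:
    # Input (step 2 output): "itemIDx_similarity_itemIDy"
    for line in lines:
        x, sim, y = line.rstrip("\n").split("_")
        # Key "itemIDx_(1 - similarity)": with similarities in [0, 1] and a
        # fixed four-decimal format, ascending key order is descending similarity
        yield f"{x}_{1 - float(sim):.4f}\t{y}"


def step3_reducer(lines: Iterable[str]) -> Iterator[str]:
    # Input arrives sorted by key; undo the 1 - similarity transform
    for line in lines:
        key, y = line.rstrip("\n").split("\t", 1)
        x, distance = key.rsplit("_", 1)
        yield f"{x}\t{y}\t{1 - float(distance):.4f}"
```

With more than one reducer, keys for the same IDx would also need to be partitioned on the IDx part alone. Hadoop ships a KeyFieldBasedPartitioner for this kind of job. A local sorted() stand-in hides that detail.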
