Introduction to Hadoop and Using It in the Cloud (Hadoop入門とクラウド利用)
Hadoop!
2010/05/16
naoki yanai (id:yanaoki)

Hadoop
Hadoop (Elastic MapReduce)

naoki yanai (id:yanaoki)
Web
Hadoop
m m
iPhone
Ruby Java

Hadoop

Hadoop
Java
Apache

Hadoop
Google 2004
MapReduce
  http://labs.google.com/papers/mapreduce.html
Google File System (GFS)
  http://labs.google.com/papers/gfs.html
2010 Google

Hadoop
Web
→

Hadoop

Hadoop
Yahoo
Yahoo Hadoop
Facebook Amazon

Hadoop
RDBMS
Join   MapReduce join
SQL   Hadoop

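The slide contrasts RDBMS joins with joins in MapReduce; its Japanese explanation did not survive extraction. One standard way to express a join in MapReduce is a reduce-side join, sketched here under that assumption: map tags each record with its source table and emits it under the join key, the shuffle groups records by key, and reduce pairs up the records from the two tables. The tables, tags and function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical input tables: (user_id, name) and (user_id, item).
users = [("u1", "alice"), ("u2", "bob")]
orders = [("u1", "book"), ("u1", "pen"), ("u2", "cup"), ("u3", "hat")]


def map_phase(users, orders):
    """Emit (join_key, (tag, value)); the tag records which table the row came from."""
    for user_id, name in users:
        yield user_id, ("U", name)
    for user_id, item in orders:
        yield user_id, ("O", item)


def shuffle(pairs):
    """Group values by key in sorted key order, as the framework does before reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, tagged_values):
    """Inner join: pair every user row with every order row that shares the key."""
    names = [v for tag, v in tagged_values if tag == "U"]
    items = [v for tag, v in tagged_values if tag == "O"]
    for name, item in product(names, items):
        yield key, name, item


def reduce_side_join(users, orders):
    result = []
    for key, values in shuffle(map_phase(users, orders)):
        result.extend(reduce_phase(key, values))
    return result


if __name__ == "__main__":
    for row in reduce_side_join(users, orders):
        print(row)
```

u3 has no matching user record, so this inner join drops it; keeping unmatched keys (an outer join) would only change reduce_phase.
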
Hadoop
MapReduce
web
HDFS
RDB
Web   Hadoop

Hadoop
N
Hadoop

Hadoop
MapReduce HDFS
Hadoop
MapReduce HDFS

MapReduce
→ map → reduce →
map reduce Hadoop
key-value
Hadoop

HDFS
Hadoop
MapReduce

MapReduce
master: MR:JobTracker (Job)
slave: MR:TaskTracker (map reduce)
slave: MR:TaskTracker (map reduce)

HDFS
master: HDFS:NameNode
slave: HDFS:DataNode
slave: HDFS:DataNode

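This slide shows the HDFS master/slave split: one NameNode and many DataNodes. As a simplified sketch (not HDFS's actual code), the NameNode can be modelled as holding only metadata, namely which blocks make up a file and which DataNodes store each block's replicas, while the DataNodes hold the block data itself. The 64 MB block size and replication factor of 3 are Hadoop's defaults of this era; the class name, node names, file path and the round-robin placement are assumptions (real HDFS uses rack-aware placement).

```python
from dataclasses import dataclass, field
from itertools import cycle

BLOCK_SIZE = 64 * 1024 * 1024  # default dfs.block.size in Hadoop 0.20 (64 MB)
REPLICATION = 3                # default dfs.replication


@dataclass
class ToyNameNode:
    """Metadata only: which blocks form a file and which DataNodes hold each replica."""
    datanodes: list
    block_map: dict = field(default_factory=dict)  # path -> [(block_id, [datanode, ...])]
    next_block_id: int = 0

    def add_file(self, path, size_bytes):
        # Ceiling division: the last block of a file may be only partly full.
        n_blocks = -(-size_bytes // BLOCK_SIZE)
        replicas_per_block = min(REPLICATION, len(self.datanodes))
        # Simplification: round-robin placement instead of rack-aware placement.
        nodes = cycle(self.datanodes)
        blocks = []
        for _ in range(n_blocks):
            replicas = [next(nodes) for _ in range(replicas_per_block)]
            blocks.append((self.next_block_id, replicas))
            self.next_block_id += 1
        self.block_map[path] = blocks
        return blocks

    def locate(self, path):
        """What a client asks the NameNode before reading block data from DataNodes."""
        return self.block_map[path]


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
    for block in nn.add_file("/input/input.tsv", 200 * 1024 * 1024):
        print(block)
```
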
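Hadoop
MapReduce HDFS
master: MR:JobTracker, HDFS:NameNode
slave: MR:TaskTracker, HDFS:DataNode
slave: MR:TaskTracker, HDFS:DataNode
Hadoop HDFS MapReduce map reduce JobTracker map reduce
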
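With the JobTracker and NameNode on the master and a TaskTracker plus a DataNode on every slave, the JobTracker can prefer to run a map task on a node that already stores a replica of that task's input block, so the block does not have to cross the network (data locality). The sketch below shows only that preference under simplifying assumptions: the function name, the host names, and the one-free-map-slot-per-node model are hypothetical, and Hadoop's real scheduler also considers rack locality and slot counts.

```python
def assign_map_tasks(block_locations, free_trackers):
    """Toy data-local assignment of map tasks (one task per input block).

    block_locations: {block_id: [hosts that store a replica of the block]}
    free_trackers:   hosts whose TaskTracker has one free map slot (hypothetical model)
    Returns {block_id: (host, is_data_local)}; leftover blocks wait for a later round.
    """
    free = list(free_trackers)
    assignment = {}
    # Pass 1: prefer a TaskTracker on a node that already holds the block.
    for block_id, hosts in block_locations.items():
        local = next((h for h in hosts if h in free), None)
        if local is not None:
            assignment[block_id] = (local, True)
            free.remove(local)
    # Pass 2: remaining blocks run elsewhere and read their input over the network.
    for block_id in block_locations:
        if block_id not in assignment and free:
            assignment[block_id] = (free.pop(0), False)
    return assignment


if __name__ == "__main__":
    locations = {0: ["dn1", "dn2"], 1: ["dn1", "dn3"], 2: ["dn1"]}
    print(assign_map_tasks(locations, ["dn1", "dn2", "dn3", "dn4"]))
```

Block 2 is stored only on dn1, which is already busy with block 0, so it falls back to a remote read on another node.
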
MapReduce
input   output
AA      A3
AB      B2
BC      C1
map   reduce

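MapReduce
Example (Google)
map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
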
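The word-count pseudocode on this slide can be run almost line for line in Python. In the sketch below, map_fn and reduce_fn mirror the paper's map and reduce, and run_mapreduce is a hypothetical single-process stand-in for the framework: it calls map on each document, groups the intermediate pairs by key in sorted order, and calls reduce once per word. Counts stay as strings to match EmitIntermediate(w, "1") and ParseInt.

```python
from collections import defaultdict


def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield word, "1"  # EmitIntermediate(w, "1")


def reduce_fn(key, values):
    # key: a word, values: a list of counts
    result = 0
    for v in values:
        result += int(v)  # ParseInt(v)
    return str(result)  # Emit(AsString(result))


def run_mapreduce(documents):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            intermediate[word].append(count)
    # The framework sorts intermediate keys before reduce; sorted() mimics that.
    return {word: reduce_fn(word, counts) for word, counts in sorted(intermediate.items())}


if __name__ == "__main__":
    docs = {"doc1": "hadoop map reduce", "doc2": "map reduce hdfs"}
    print(run_mapreduce(docs))
```
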
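MapReduce
input (HDFS):    AA | AB | BC
map:             A:1 A:1 | A:1 B:1 | B:1 C:1
shuffle (sort):  A:<1,1,1>  B:<1,1>  C:<1>
reduce:          A:3  B:2  C:1
output (HDFS)
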
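The data flow on this slide can be traced in a few lines of Python. The sketch assumes one map task per input record (AA, AB, BC), each emitting (character, 1); the shuffle sorts and groups the pairs by key, giving A:<1,1,1>, B:<1,1> and C:<1>, and reduce sums each group. It runs in one process and only illustrates the order of the stages, not how Hadoop distributes them; the function names are illustrative.

```python
from itertools import groupby
from operator import itemgetter


def map_record(record):
    # Emit (character, 1) for each character, e.g. "AB" -> ("A", 1), ("B", 1).
    return [(ch, 1) for ch in record]


def shuffle(map_outputs):
    # Merge all map outputs, sort by key, then group values per key.
    pairs = sorted((kv for output in map_outputs for kv in output), key=itemgetter(0))
    return [(key, [v for _, v in group]) for key, group in groupby(pairs, key=itemgetter(0))]


def reduce_group(key, values):
    return key, sum(values)


inputs = ["AA", "AB", "BC"]
map_outputs = [map_record(r) for r in inputs]  # one map task per input record
grouped = shuffle(map_outputs)
output = [reduce_group(k, vs) for k, vs in grouped]

if __name__ == "__main__":
    print("map:    ", map_outputs)
    print("shuffle:", grouped)
    print("reduce: ", output)
```
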
MapReduce
Google

Hadoop
Mahout
Hadoop
Apache
  CollaborativeFiltering
  Classifier
  Clustering
  DecisionForest

Hadoop

Hadoop

Amazon Web Services   EC2

Amazon Web Services
WebAPI

Amazon Web Services
EC2 (Elastic Compute Cloud)
  root/admin
S3 (Simple Storage Service)
EMR (Elastic MapReduce)
  Web Hadoop → MapReduce
EC2 S3 +α

Elastic MapReduce
Hadoop
Hadoop
input, output: S3

Elastic MapReduce
Amazon

Elastic MapReduce
(Diagram: client → API → cloud → Job → master → slave, slave, slave; input/output in S3)

Elastic MapReduce
MapReduce

Elastic MapReduce
Finding Similar Items with Amazon Elastic MapReduce, Python, and Hadoop Streaming
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2294
Item

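The tutorial cited on this slide runs Python scripts through Hadoop Streaming. In Streaming, the mapper and reducer are ordinary programs that read lines from stdin and write tab-separated key/value lines to stdout, and Hadoop sorts the mapper output by key before the reducer reads it, so the reducer only has to notice when the key changes. A minimal word-count-style sketch of that contract follows; the file name and the map/reduce command-line switch are hypothetical.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Yield "key<TAB>1" for every whitespace-separated token."""
    for line in lines:
        for token in line.split():
            yield f"{token}\t1"


def reducer(lines):
    """Sum the values per key; relies on the input being sorted by key, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run as `python wordcount.py map` or `python wordcount.py reduce` (hypothetical file name).
    step = mapper if sys.argv[1] == "map" else reducer
    for out_line in step(sys.stdin):
        print(out_line)
```

Outside Hadoop, the same sequence can be reproduced with a pipe such as `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`.
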
Elastic MapReduce
map/reduce
map/reduce
input: http://www.grouplens.org/
5

Elastic MapReduce
input (S3): [ ID] [ ID] [ ]
map/reduce
output (S3): [ ID] [ ID] [ ]

Elastic MapReduce
(Diagram of chained map and reduce tasks)

Elastic MapReduce
step1:
input: key:[] value:[ ID_ ID_ ]
map: ID key:[ ID] values[ ID_ ]
reduce: ID
output: ID \t ID_ | ID_ |...

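The Japanese labels in steps 1 to 3 were lost in extraction, so the field meanings cannot be read from the text. Assuming, from the GroupLens rating data cited for this example, that each input value is a user ID, an item ID and a rating joined by underscores, step 1 can be sketched as below: map re-keys each rating by user, and reduce emits one line per user in the output shape shown (user ID, a tab, then item_rating entries joined by |). The function names are hypothetical, and Python's sorted() stands in for Hadoop's shuffle and sort.

```python
from itertools import groupby


def step1_map(line):
    # Assumed input value: "<user_id>_<item_id>_<rating>" (field meanings are a guess).
    user_id, item_id, rating = line.strip().split("_")
    return user_id, f"{item_id}_{rating}"


def step1_reduce(user_id, values):
    # Output shape from the slide: "<user_id>\t<item>_<rating>|<item>_<rating>|..."
    return f"{user_id}\t{'|'.join(values)}"


def run_step1(lines):
    pairs = sorted(step1_map(line) for line in lines)  # stands in for Hadoop's shuffle/sort
    return [step1_reduce(user, [v for _, v in group])
            for user, group in groupby(pairs, key=lambda kv: kv[0])]


if __name__ == "__main__":
    for out in run_step1(["u1_i1_5", "u2_i1_3", "u1_i2_4"]):
        print(out)
```
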
Elastic MapReduce
step2:
input: key:[ ID] value:[ ID_ | ID_ |...]
map: ID key:[ IDx_ IDy] values[ x_ y]
reduce: ID
output: IDx_ _ IDy

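A sketch of step 2, assuming the step 1 output is one line per user of the form user ID, tab, item_rating|item_rating|... (an interpretation of the garbled labels). For each user, map emits every ordered pair of items that user rated, keyed itemx_itemy with the two ratings as the value, and reduce turns all rating pairs for an item pair into a similarity written as itemx_similarity_itemy. The similarity measure used in the talk did not survive extraction; cosine similarity is only a stand-in here, and emitting both (x, y) and (y, x) is a design assumption so that every item later gets its own neighbour list. Function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import permutations


def step2_map(line):
    # Assumed input: "<user>\t<item>_<rating>|<item>_<rating>|..."
    _user, ratings_field = line.rstrip("\n").split("\t")
    ratings = [entry.split("_") for entry in ratings_field.split("|")]
    # Emit both orderings so every item later gets its own ranked neighbour list.
    for (item_x, r_x), (item_y, r_y) in permutations(ratings, 2):
        yield f"{item_x}_{item_y}", (float(r_x), float(r_y))


def cosine(rating_pairs):
    # Stand-in similarity over all users who rated both items.
    dot = sum(x * y for x, y in rating_pairs)
    norm_x = math.sqrt(sum(x * x for x, _ in rating_pairs))
    norm_y = math.sqrt(sum(y * y for _, y in rating_pairs))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def run_step2(lines):
    grouped = defaultdict(list)  # stands in for Hadoop's shuffle
    for line in lines:
        for key, value in step2_map(line):
            grouped[key].append(value)
    output = []
    for key in sorted(grouped):
        item_x, item_y = key.split("_")
        output.append(f"{item_x}_{cosine(grouped[key]):.4f}_{item_y}")
    return output


if __name__ == "__main__":
    for out in run_step2(["u1\ti1_5|i2_4", "u2\ti1_3|i2_3"]):
        print(out)
```
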
Elastic MapReduce
step3:
input: IDx_ _ IDy
map: IDx_(1- ) key
  key: < IDx_(1- )> values < IDy>
reduce: 1-
output: IDx_ IDy_

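Step 3 keys each record on itemx_(1 - similarity). Because Hadoop sorts keys in ascending order, using 1 - similarity puts the most similar items for each itemx first, and the reducer converts back with 1 - (1 - similarity). The sketch assumes step 2 lines of the form itemx_similarity_itemy (an interpretation of the garbled labels) and formats 1 - similarity with a fixed width so that a text sort matches numeric order. sorted() stands in for Hadoop's sort; that ordering only holds within one reducer's input, so a job with several reducers would also need to partition on itemx. Function names are hypothetical.

```python
def step3_map(line):
    # Assumed input: "<item_x>_<similarity>_<item_y>"
    item_x, similarity, item_y = line.strip().split("_")
    # Fixed-width text, so an ascending text sort lists each item's neighbours
    # from most similar (smallest 1 - similarity) to least similar.
    return f"{item_x}_{1 - float(similarity):.4f}", item_y


def step3_reduce(key, item_y):
    # Undo the 1 - similarity trick and emit "<item_x>_<item_y>_<similarity>".
    item_x, one_minus = key.split("_")
    return f"{item_x}_{item_y}_{1 - float(one_minus):.4f}"


def run_step3(lines):
    pairs = sorted(step3_map(line) for line in lines)  # stands in for Hadoop's key sort
    return [step3_reduce(key, item_y) for key, item_y in pairs]


if __name__ == "__main__":
    lines = ["i1_0.9000_i2", "i1_0.5000_i3", "i1_0.9500_i4", "i2_0.9000_i1"]
    for out in run_step3(lines):
        print(out)
```
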
Elastic MapReduce

Elastic MapReduce
1
elastic-mapreduce \
  --create \
  --name "item similarity job" \
  --alive \
  --log-uri s3n://bucket/logs \
  --num-instances 10 \
  --instance-type m1.small \
  --availability-zone us-west-1a

EC2
EC2

Elastic MapReduce
WAITING

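Elastic MapReduce
2
S3 (s3cmd)
input map/reduce python
s3cmd.rb put bucket:input/input.tsv input.tsv
s3cmd.rb put bucket:script/map.py map1.py
s3cmd.rb put bucket:script/reduce1.py reduce1.py
...
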
Elastic MapReduce
4
Job
elastic-mapreduce \
  --job-flow-id j-2ROU0QKL6KOV6 \
  --json item_similarity.json

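The --json option points to a file that describes the job flow's steps; its contents are not shown on the slide. As a hedged sketch, this script prints such a file for three Hadoop Streaming steps, using the field names of the EMR API's step configuration (Name, ActionOnFailure, HadoopJarStep with Jar and Args) and Streaming's -input, -output, -mapper and -reducer arguments. The bucket, script names, S3 prefixes and the streaming jar path are hypothetical placeholders to check against the EMR and Hadoop version in use.

```python
import json

BUCKET = "bucket"  # hypothetical bucket name, matching the placeholder on the slides
# Hypothetical location of the Streaming jar on the cluster; it varies by EMR/Hadoop version.
STREAMING_JAR = "/home/hadoop/contrib/streaming/hadoop-streaming.jar"


def streaming_step(name, mapper, reducer, input_path, output_path):
    """One step shaped like the EMR API's StepConfig."""
    return {
        "Name": name,
        # CANCEL_AND_WAIT cancels the remaining steps but leaves the --alive cluster running.
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": STREAMING_JAR,
            "Args": [
                "-input", input_path,
                "-output", output_path,
                "-mapper", mapper,
                "-reducer", reducer,
            ],
        },
    }


def build_steps():
    s3 = f"s3n://{BUCKET}"
    # Each step reads the previous step's output (hypothetical prefixes and script names).
    return [
        streaming_step("step1", f"{s3}/script/map1.py", f"{s3}/script/reduce1.py",
                       f"{s3}/input/", f"{s3}/step1/"),
        streaming_step("step2", f"{s3}/script/map2.py", f"{s3}/script/reduce2.py",
                       f"{s3}/step1/", f"{s3}/step2/"),
        streaming_step("step3", f"{s3}/script/map3.py", f"{s3}/script/reduce3.py",
                       f"{s3}/step2/", f"{s3}/output/"),
    ]


if __name__ == "__main__":
    print(json.dumps(build_steps(), indent=2))
```

For example, `python make_steps.py > item_similarity.json` (the script name is hypothetical). Chaining each step's -output into the next step's -input mirrors the step1, step2, step3 flow of this example.
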
Elastic MapReduce
Step1   RUNNING

Elastic MapReduce
5
output
s3sync.rb -r --make-dirs bucket:output .
elastic-mapreduce \
  --terminate \
  --job-flow-id j-2ROU0QKL6KOV6

Hadoop
x

Hadoop
Tom White
¥4,830