Data Analysis Supporting the Dinner Tables of 9.61 Million People
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
•          (@sasata299)

• 2009 8                  JOIN

•
• Hadoop
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
•      961

• 30         3   1

•
   -
   -
•
•       (       ,   , ...)

    -
    -       (   ,       ,    , …)

    -
•         Hadoop     MySQL

•
    -   GROUP BY

•                   7000      …

•                  (´Д`   )
• MySQL
   -
•
   -
  -
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
Hadoop
• Google   MapReduce   OSS

•
   -
   -
   -
   -
Hadoop
master (   )

               slave (   )
Hadoop
master (   )

               slave (   )




Map
Hadoop
master (    )

                       slave (   )



           <key,value>
Map
           Shuffle & Sort
Hadoop
master (    )

                       slave (            )



           <key,value>
Map                           Reduce
           Shuffle & Sort
• Hadoop Streaming (Ruby       )

• EC2 Cloudera Hadoop
   - Cloudera CDH1
   - Hadoop           0.18.3

•                S3
MySQL → Hadoop
•
• GROUP BY        MapReduce
    -       (          )
    -           key

• JOIN   MapReduce
    •
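
The GROUP BY bullet above can be read as: the grouping column becomes the MapReduce key, and the reducer aggregates each key's values. A minimal sketch of that idea, simulating the map, shuffle & sort, and reduce phases in-process. It is written in Python for brevity (the slides mention Ruby with Hadoop Streaming), and the record layout, column names, and function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(rows, key_column):
    """Emit (key, 1) per row; the GROUP BY column becomes the key."""
    for row in rows:
        yield row[key_column], 1


def shuffle_and_sort(pairs):
    """Stand-in for Hadoop's shuffle: sort pairs so equal keys are adjacent."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Sum the values per key, like SELECT key, COUNT(*) ... GROUP BY key."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def group_by_count(rows, key_column):
    return dict(reduce_phase(shuffle_and_sort(map_phase(rows, key_column))))


# Hypothetical sample rows, invented for illustration only.
rows = [
    {"user_id": "u1", "recipe_id": "r10"},
    {"user_id": "u2", "recipe_id": "r10"},
    {"user_id": "u1", "recipe_id": "r20"},
]
counts = group_by_count(rows, "recipe_id")
print(counts)
```

On a real cluster the sort happens between the mapper and reducer processes, so only the bodies of `map_phase` and `reduce_phase` would need to be written.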
(1)   master



(2)   S3
(1)   master
               master   slave scp


(2)   S3
(1)        master
                    master      slave scp


(2)        S3


      S3            slave scp
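
The slides above show a file reaching every slave, either via the master or via S3. They do not preserve which file this was or how the JOIN was written. Assuming it was a small lookup table, one common approach is a map-side join: each mapper loads the table from local disk and joins records as they stream past. The sketch below illustrates that assumption in Python; the field layout and all names are hypothetical.

```python
import csv
import io


def load_lookup(lines):
    """Build an in-memory {key: name} table from tab-separated lines."""
    reader = csv.reader(lines, delimiter="\t")
    return {row[0]: row[1] for row in reader}


def join_mapper(records, lookup):
    """Attach the lookup value to each record; unmatched keys are dropped (inner join)."""
    for line in records:
        key, value = line.rstrip("\n").split("\t", 1)
        if key in lookup:
            yield "\t".join([key, lookup[key], value])


# Hypothetical data: a small lookup table (as copied to each slave) and a log.
lookup_file = io.StringIO("r10\tcurry\nr20\tramen\n")
log = ["r10\tu1\n", "r30\tu2\n", "r20\tu1\n"]
joined = list(join_mapper(log, load_lookup(lookup_file)))
print(joined)
```

This only works while the lookup table fits in each mapper's memory; larger tables would need a reduce-side join keyed on the join column.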
MySQL vs Hadoop


   7000




    MySQL    Hadoop
   MySQL    Hadoop
MySQL vs Hadoop

            ( Д )
   7000
            30

    MySQL    Hadoop
   MySQL    Hadoop
Hadoop++
   ←Hadoop


        ↓MySQL
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
• Hadoop
   -
• Hadoop                        (HADOOP-6254)
   -   S3
   -   SocketTimeoutException
• EMR (Elastic MapReduce)
   -   Amazon               Hadoop

• Cloudera   CDH2
   -
AMI (Amazon Machine Image)
       UP




EMR


CDH2
EMR




Job Flow (   )
EMR
             BootStrap Action




Job Flow (               )
EMR
             BootStrap Action


             Step (Hadoop Job)




Job Flow (               )
•
    -
    - --alive
• AMI
   -            AMI

    - BootStrap Action
Created job flow j-8IXS98OW1WEE
                         ID
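
A job flow started with `--alive` keeps running after its steps finish, so the ID printed at creation is presumably what later commands need in order to refer to it. A small sketch of capturing that ID from the output line shown above; the helper name and the regular expression are assumptions based only on that one sample line.

```python
import re

# Pattern inferred from the single sample line on the slide (an assumption).
JOB_FLOW_RE = re.compile(r"Created job flow (j-[A-Z0-9]+)")


def parse_job_flow_id(cli_output):
    """Return the job flow ID found in the CLI output, or None if absent."""
    match = JOB_FLOW_RE.search(cli_output)
    return match.group(1) if match else None


job_flow_id = parse_job_flow_id("Created job flow j-8IXS98OW1WEE")
print(job_flow_id)
```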
Hadoop
•
    - mapred.child.java.opts
    -   streaming

•
    -
    -   ElasticMapReduce-master 5100
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
• Map
   -
• Reduce
   -       key   Reduce
   -
UU


      Map


     Reduce
UU


           Map
     ID

          Reduce
UU


           Map


          Reduce
     ID
Map


Reduce
Map
ID key

         Reduce
Map


               Reduce
key   Reduce
Map
100
                 100

                       Reduce
  key   Reduce
×
                        Map
100
                 ×
                 100

                       Reduce
  key   Reduce
×
                           Map
100
                   ×100

                          Reduce
      Reduce   key sort
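
The UU (unique users) slides, together with the note that Reduce receives its keys sorted, suggest a reducer that counts distinct IDs in a single pass by comparing each key with the previous one. The sketch below illustrates that reading in Python; the log format (user ID as the first tab-separated field) and the function names are hypothetical.

```python
def uu_mapper(lines):
    """Emit the user ID (assumed to be the first tab-separated field) as the key."""
    for line in lines:
        user_id = line.split("\t", 1)[0].strip()
        if user_id:
            yield user_id


def uu_reducer(sorted_keys):
    """Count distinct keys in one pass; valid only because the input is sorted."""
    count = 0
    previous = None
    for key in sorted_keys:
        if key != previous:
            count += 1
            previous = key
    return count


# Hypothetical access-log lines, invented for illustration.
log_lines = ["u2\t/recipe/10\n", "u1\t/recipe/10\n", "u2\t/recipe/20\n"]
unique_users = uu_reducer(sorted(uu_mapper(log_lines)))
print(unique_users)
```

Because the reducer keeps only the previous key, its memory use stays constant however many IDs pass through. With several reducers, each receives a disjoint set of keys, so the per-reducer counts can simply be added.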
Hadoop

 •
     -
         Hadoop
•
•
• Hadoop         (Cloudera)

• Elastic MapReduce
•
•
•                Hadoop
    -
    -
    -   Reduce
Data Analysis Supporting the Dinner Tables of 9.61 Million People (961万人の食卓を支えるデータ解析)
Published on

Slides used for the talk at JJUG CCC 2010 Fall on 2010/10/18.

Published in: Technology, Spiritual
