Hadoop
http://hadoop.apache.org/
Hadooooo

• Google   MapReduce

•
•
•   PC   …
•   PC
         PC   /
PC
Yahoo! Search Assist

•

•
Hadoop
•
             7000   …

•   Hadoop
816
30         3   1
•
    DB

•        Hadoop   SQL
Hive
• Hadoop
•      SQL(HiveQL)   SQL



• SQL
Hive

•                      (each do ... end)

• Hive    DB,

•           (HiveQL)

• MySQL
            EXISTS
          ...
Hadoop
Hadoop
1) Map
2) Shuffle & Sort
3) Reduce
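The three phases can be modelled in a few lines of plain Ruby. This is only an in-memory sketch, not Hadoop itself: the `KEY_FOR` lookup table and the method names are hypothetical, and the reduce step (counting values per key) is an arbitrary choice made for illustration.

```ruby
# In-memory model of Map -> Shuffle & Sort -> Reduce (illustration only).

# Hypothetical key assignment; a real mapper would compute the key.
KEY_FOR = { "aaa" => 2, "bbb" => 0, "ccc" => 1, "ddd" => 1, "eee" => 0 }.freeze

# 1) Map: turn each input record into a [key, value] pair.
def map_phase(records)
  records.map { |record| [KEY_FOR.fetch(record), record] }
end

# 2) Shuffle & Sort: group the values by key and order the groups by key.
def shuffle_and_sort(pairs)
  pairs.group_by(&:first)
       .sort_by { |key, _group| key }
       .map { |key, group| [key, group.map(&:last)] }
end

# 3) Reduce: collapse each key's values into one result (here, a count).
def reduce_phase(grouped)
  grouped.map { |key, values| [key, values.size] }
end

records = %w[aaa bbb ccc ddd eee]
pairs   = map_phase(records)
grouped = shuffle_and_sort(pairs)
result  = reduce_phase(grouped)
# grouped == [[0, ["bbb", "eee"]], [1, ["ccc", "ddd"]], [2, ["aaa"]]]
# result  == [[0, 2], [1, 2], [2, 1]]
```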
Map
aaa
bbb
ccc
ddd
eee


          Mapper
                   2
      ※
aaa
bbb
ccc
ddd
eee


      2   aaa
      0   bbb
      1   ccc
      1   ddd
      0   eee
aaa
bbb
ccc
ddd
eee


key   value
2     aaa
0     bbb
1     ccc
1     ddd
0     eee
Shuffle & Sort
key
Reducer
Map

key   value
2     aaa
0     bbb
1     ccc
1     ddd
0     eee
Map

 2    aaa
 0    bbb
 1    ccc
 1    ddd
 0    eee
Reducer
Map
key
Map   Reduce



Reduce
Map
 2    aaa
 0    bbb
 1    ccc
 1    ddd
 0    eee


            Reducer
                      1
      ※
key value

2   aaa
0   bbb
1   ccc
1   ddd
0   eee


      Reducer 3
2   aaa
0   bbb
1   ccc
1   ddd
0   eee
2   aaa
0   bbb
1   ccc
1   ddd
0   eee
2   aaa
0   bbb
1   ccc
1   ddd
0   eee
          key   Reducer
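How a key selects its Reducer can be sketched as a small partition function. The modulo rule and the names `partition` and `route` are assumptions made for illustration; Hadoop's default partitioner works on a hash of the key rather than on the raw integer, but the property shown is the same: pairs with equal keys always land on the same Reducer.

```ruby
# Route [key, value] pairs to one of NUM_REDUCERS reducers (illustration only).
NUM_REDUCERS = 3

# Assumed rule: integer key modulo the number of reducers.
def partition(key, num_reducers)
  key % num_reducers
end

# Build one bucket per reducer and drop each pair into its bucket.
def route(pairs, num_reducers)
  buckets = Array.new(num_reducers) { [] }
  pairs.each do |key, value|
    buckets[partition(key, num_reducers)] << [key, value]
  end
  buckets
end

pairs   = [[2, "aaa"], [0, "bbb"], [1, "ccc"], [1, "ddd"], [0, "eee"]]
buckets = route(pairs, NUM_REDUCERS)
# buckets[0] == [[0, "bbb"], [0, "eee"]]
# buckets[1] == [[1, "ccc"], [1, "ddd"]]
# buckets[2] == [[2, "aaa"]]
```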
Hadoop
Google MapReduce
Reduce

• Reduce

•
                                 …

• Google   MapReduce   Reducer
Hadoop
Iterator
id:naoya


http://d.hatena.ne.jp/naoya/20080513/1210684438
Hadoop
Hadoop

• Hadoop Streaming (Ruby)
• EC2 Hadoop
•                    S3

•             50
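With Hadoop Streaming, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated `key<TAB>value` lines to standard output. The sketch below shows that shape in Ruby. The method names, the key rule (line length modulo 3), and the counting reducer are all made up for illustration and are not taken from the deck.

```ruby
require "stringio"

# Mapper: read raw lines, emit "key<TAB>value" lines.
# The key rule below is a placeholder, not the author's logic.
def run_mapper(input, output)
  input.each_line do |line|
    value = line.chomp
    next if value.empty?
    key = value.length % 3
    output.puts "#{key}\t#{value}"
  end
end

# Reducer: Hadoop Streaming delivers lines sorted by key, so a change
# of key marks the end of one group. Here each group is just counted.
def run_reducer(input, output)
  current_key = nil
  count = 0
  input.each_line do |line|
    key, _value = line.chomp.split("\t", 2)
    if key != current_key
      output.puts "#{current_key}\t#{count}" unless current_key.nil?
      current_key = key
      count = 0
    end
    count += 1
  end
  output.puts "#{current_key}\t#{count}" unless current_key.nil?
end

# Local dry run with in-memory IO objects instead of a cluster.
mapped = StringIO.new
run_mapper(StringIO.new("aaa\nbb\ncccc\n"), mapped)
```

In a real job each method would be wired to `$stdin` and `$stdout` in its own script, and the two scripts would be passed to Hadoop Streaming through its `-mapper` and `-reducer` options.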
EC2        S3
              Amazon



•   EC2 •••


                       ※



•   S3 •••
•
    DB

•        Hadoop   SQL
1. (CSV or Marshal) S3
2. EC2 Hadoop 1. S3
3. S3 2. MySQL
DB
1. (CSV or Marshal) S3
2. EC2 Hadoop 1. S3
3. S3 2. MySQL
Hadoop
1. (CSV or Marshal) S3
2. EC2 Hadoop 1. S3
3. S3 2. MySQL
DB
1. (CSV or Marshal) S3
2. EC2 Hadoop 1. S3
3. S3 2. MySQL
MySQL
        …orz
1\taaa,bbb,ccc     aaa,bbb,ccc
1\thoge,fuga,foo   hoge,fuga,foo
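One reading of the two columns above is that each output line carries a leading key separated from the value by a tab, and that only the comma-separated value is wanted for the MySQL import. Under that assumption, a minimal sketch of the clean-up step follows; the method name `strip_key` is hypothetical.

```ruby
# Drop a leading "key<TAB>" from an output line and keep only the value.
# Lines without a tab are returned unchanged (minus the newline).
def strip_key(line)
  key_or_line, value = line.chomp.split("\t", 2)
  value.nil? ? key_or_line.to_s : value
end

cleaned = ["1\taaa,bbb,ccc\n", "1\thoge,fuga,foo\n"].map { |l| strip_key(l) }
# cleaned == ["aaa,bbb,ccc", "hoge,fuga,foo"]
```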
Mapper, Reducer
•


•   Mapper,   Reducer



•
Hadoop    S3




`hadoop dfs -cat s3://xxx/input/user_info`
failed to allocate memory (NoMemoryError)
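Ruby backticks return the whole output of a command as a single String, which is one plausible cause of a NoMemoryError on a large file. As a hedged sketch of one mitigation (not necessarily the fix used in the deck), the output can be consumed line by line with `IO.popen`. The helper name `each_output_line` is hypothetical.

```ruby
# Yield a command's output one line at a time instead of building
# one large String in memory, as backticks would.
def each_output_line(*command)
  IO.popen(command) do |io|
    io.each_line { |line| yield line.chomp }
  end
end

# Intended use (requires a Hadoop installation, so left commented out):
# each_output_line("hadoop", "dfs", "-cat", "s3://xxx/input/user_info") do |line|
#   # process one record here
# end
```

Only one line is held in memory at a time, so the footprint no longer grows with the size of the input file.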
Mapper
         or
7000   →

30
•   Hadoop



•        MapReduce
             MapReduce



•               Hadoop
I Tried Using Hadoop at Work