Using Hadoop at COOKPAD
http://cookpad.com/
@sasata299 (   )
•          Hadoop

• Hadoop
•
COOKPAD
               896
          30         3   1




                             etc..
”   ”
GROUP BY




                 MySQL
(   3.5   )




          7000
…
…
…
Hadoop


• Open-source (OSS) implementation of Google's MapReduce
•
•
Hadoop
  •
  •
  •
[Diagram: master( ), slave( ); Mapper, key, Reducer]
(2009/10   )



•
•
•   MapReduce
•               etc..
Hadoop
•      Hadoop

• EC2 S3
                            ver. 0.18.3




• Cloudera & Hadoop Streaming
S3 Native FileSystem
  • Hadoop
  •                    5GB
  •            s3n:// ← ”n”



S3 Block FileSystem
  • Hadoop
  • HDFS
  •
  • s3://
7000
  1/223




 30
Hadoop++
   ←Hadoop


      ↓MySQL
cat hoge.csv | ruby mapper.rb | ruby reducer.rb



Reducer
 Mapper                               Reducer
 Mapper→Reducer  ...
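
The local pipeline above can be mimicked in a single Ruby script. This is a minimal sketch, not the author's mapper.rb or reducer.rb: the CSV layout (id,type), the sample rows, and the method names map_line and reduce_sorted are all assumptions made for illustration. It also adds a sort between the two stages, because Hadoop Streaming hands the reducer its input sorted by key, which a plain pipe does not do.

```ruby
# Hypothetical mapper: emit "type\t1" for each CSV log line "id,type".
def map_line(line)
  _id, type = line.chomp.split(/,/)
  "#{type}\t1"
end

# Hypothetical reducer: expects input sorted by key (as Hadoop Streaming
# provides) and sums the counts of each run of identical keys.
def reduce_sorted(lines)
  results = []
  current_key = nil
  count = 0
  lines.each do |line|
    key, value = line.chomp.split("\t")
    if key != current_key
      results << "#{current_key}\t#{count}" unless current_key.nil?
      current_key = key
      count = 0
    end
    count += value.to_i
  end
  results << "#{current_key}\t#{count}" unless current_key.nil?
  results
end

# Stand-in for hoge.csv (made-up rows).
sample_log = ["1,search", "2,view", "3,search", "4,view", "5,search"]

# Equivalent of: cat hoge.csv | ruby mapper.rb | sort | ruby reducer.rb
mapped  = sample_log.map { |line| map_line(line) }
reduced = reduce_sorted(mapped.sort)
puts reduced
```

Keeping the per-line logic in methods makes it easy to check locally before submitting a job. In a real streaming script the same methods would read from ARGF and write to standard output.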
master

1) -file
   master   slave   scp

      hadoop jar xxx.jar
       -mapper hoge.rb -reducer fuga.rb
       -file hoge.rb -file fuga.rb
       -file

2) mapper, reducer

      File.open(' ') {|f| ...}
S3

1) -cacheFile
   S3   slave

      hadoop jar xxx.jar
       -mapper hoge.rb -reducer fuga.rb
       -file hoge.rb -file fuga.rb
       -cacheFile s3n://path/to/ #

2) mapper, reducer

      File.open(' ') {|f| ...}
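
How a mapper or reducer might read such a side file can be sketched as follows. As far as I understand Hadoop Streaming, a file passed with -file is copied into the task's working directory, and with -cacheFile the part after # in the URI names a symlink there, so in both cases the script opens the file by a plain name. The file name, its one-ID-per-line format, and the method name load_target_ids are assumptions for illustration. A temporary file stands in for the shipped file so that the sketch runs locally.

```ruby
require "tempfile"

# Read one integer ID per line from a side file, skipping blank lines.
def load_target_ids(path)
  File.open(path) do |f|
    f.each_line.map(&:strip).reject(&:empty?).map(&:to_i)
  end
end

# Local stand-in for a file shipped with -file or -cacheFile
# (file name and contents are made up for illustration).
side_file = Tempfile.new(["target_ids", ".txt"])
side_file.write("13930\n29011\n\n39291\n")
side_file.close

target_ids = load_target_ids(side_file.path)
p target_ids
side_file.unlink
```

Loading the file once at startup, before the ARGF loop, keeps the per-line work in the mapper small.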
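p target_ids.size # 50000

ARGF.each do |log|
  log.chomp!
  id, type, *rest = log.split(/,/)      # remaining fields are elided on the slide
  # Linear scan over all 50000 IDs for every log line.
  # to_i assumes target_ids holds integers, as shown on the next slide.
  next if target_ids.include?(id.to_i)
end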
[13930, 29011, 39291, ...] # 50000

1000

{
    '139' => [13930, 13989, 13991, ...], # 50
    '290' => [29011, 29098, 29076, ...], # 50
    '392' => [39291, 39244, 39251, ...], # 50
}
50
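# Bucket the target IDs by their first three digits.
hash = Hash.new {|h,k| h[k] = []}
target_ids.each do |id|
  hash[ id.to_s[0,3] ] << id
end

ARGF.each do |log|
  log.chomp!
  id, type, *rest = log.split(/,/)      # remaining fields are elided on the slide
  # Only the small bucket for this prefix is scanned.
  # to_i assumes the buckets hold integers, as built above.
  next if hash[ id[0,3] ].include?(id.to_i)
end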
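
The bucketing idea can be tried on a small data set. This is a sketch under stated assumptions, not the author's code: the IDs and log rows are made up, the helper target? is hypothetical, and the string ID from the log is converted with to_i because the slides show integer IDs. The Set variant at the end is an alternative that does not appear in the slides. It is included only to check that both lookups agree.

```ruby
require "set"

# Made-up target IDs standing in for the 50,000 real ones.
target_ids = [13930, 13989, 13991, 29011, 29098, 39291]

# Bucket IDs by their first three digits, as on the slide.
buckets = Hash.new { |h, k| h[k] = [] }
target_ids.each { |id| buckets[id.to_s[0, 3]] << id }

# Hypothetical helper: fetch avoids the default block, so a miss
# does not add an empty bucket to the hash.
def target?(buckets, id_str)
  bucket = buckets.fetch(id_str[0, 3], nil)
  !bucket.nil? && bucket.include?(id_str.to_i)
end

sample_log = ["13989,view", "55555,view", "39291,search", "1399,view"]

# Drop lines whose ID is a target, mirroring the slide's `next if`.
kept = sample_log.reject do |log|
  id, _type = log.chomp.split(/,/)
  target?(buckets, id)
end

# Alternative not in the slides: a Set gives hash-based lookup directly.
target_set = target_ids.to_set
kept_with_set = sample_log.reject do |log|
  target_set.include?(log.split(/,/).first.to_i)
end

p kept
```

Each lookup scans one small bucket instead of the whole array. A Set removes the scan altogether, at the cost of departing from the approach shown in the slides.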
8 7 8 …
http://ow.ly/2bdW1
S3 Native FileSystem
java.net.SocketTimeoutException: Read timed out
Hadoop 0.21
Amazon Elastic MapReduce
  • Using Hadoop at COOKPAD

    1. COOKPAD Hadoop
    2. http://cookpad.com/
    3. @sasata299 ( )
    4. @sasata299 ( )
    5. @sasata299 ( )
    6. • Hadoop • Hadoop •
    7. • Hadoop • Hadoop •
    8. COOKPAD 896 30 3 1
    9. COOKPAD 896 30 3 1 etc..
    10. ” ”
    11. GROUP BY MySQL ( 3.5 )
    12. GROUP BY MySQL ( 3.5 ) 7000
    13.
    14.
    15.
    16. Hadoop • Google MapReduce OSS • •
    17. Hadoop • • •
    18. master( ) slave( ) key Reducer Mapper Reducer
    19. (2009/10 ) • • • MapReduce • etc..
    20. Hadoop • Hadoop • EC2 S3 ver. 0.18.3 • Cloudera & Hadoop Streaming
    21. S3 Native FileSystem • Hadoop • 5GB • s3n:// ← ”n” S3 Block FileSystem • Hadoop • HDFS • • s3://
    22. 7000 1/223 30
    23. Hadoop++ ←Hadoop ↓MySQL
    24. • Hadoop • Hadoop •
    25. cat hoge.csv | ruby mapper.rb | ruby reducer.rb
        Reducer Mapper Reducer Mapper→Reducer key Reducer
    26. master
        1) -file master slave scp
           hadoop jar xxx.jar -mapper hoge.rb -reducer fuga.rb -file hoge.rb -file fuga.rb -file
        2) mapper, reducer
           File.open(' ') {|f| ...}
    27. S3
        1) -cacheFile S3 slave
           hadoop jar xxx.jar -mapper hoge.rb -reducer fuga.rb -file hoge.rb -file fuga.rb -cacheFile s3n://path/to/ #
        2) mapper, reducer
           File.open(' ') {|f| ...}
    28. p target_ids.size # 50000
        ARGF.each do |log|
          log.chomp!
          id, type, ... = log.split(/,/)
          next if target_ids.include?(id)
        end
        target_ids 5 …
    29. [13930, 29011, 39291, ...] # 50000
        1000
        {
          '139' => [13930, 13989, 13991, ...], # 50
          '290' => [29011, 29098, 29076, ...], # 50
          '392' => [39291, 39244, 39251, ...], # 50
        }
    30. 50
        hash = Hash.new {|h,k| h[k] = []}
        target_ids.each do |id|
          hash[ id.to_s[0,3] ] << id
        end
        ARGF.each do |log|
          log.chomp!
          id, type, ... = log.split(/,/)
          next if hash[ id[0,3] ].include?(id)
        end
    31. • Hadoop • Hadoop •
    32. 8 7 8 … http://ow.ly/2bdW1 S3 Native FileSystem java.net.SocketTimeoutException: Read timed out
    33. 8Amazon7 Elastic MapReduce 8 … http://ow.ly/2bdW1 S3 Native FileSystem java.net.SocketTimeoutException: Read timed out
    34. 8Amazon7 Elastic MapReduce 8 … http://ow.ly/2bdW1 Amazon Elastic MapReduce S3 Native FileSystem java.net.SocketTimeoutException: Read timed out Hadoop 0.21
    35. Amazon Elastic MapReduce
