Hadoop Usage at COOKPAD

Transcript

  • 1. Hadoop Usage at COOKPAD
  • 2. http://cookpad.com/
  • 3. @sasata299 ( )
  • 4. @sasata299 ( )
  • 5. @sasata299 ( )
  • 6. • Hadoop • Hadoop •
  • 7. • Hadoop • Hadoop •
  • 8. COOKPAD 896 30 3 1
  • 9. COOKPAD 896 30 3 1 etc..
  • 10. ” ”
  • 11. GROUP BY MySQL ( 3.5 )
  • 12. GROUP BY MySQL ( 3.5 ) 7000
  • 13.
  • 14.
  • 15.
  • 16. Hadoop • an open-source (OSS) implementation of Google's MapReduce • •
  • 17. Hadoop • • •
  • 18. master( ) slave( ) key Reducer Mapper Reducer
  • 19. (2009/10 ) • • • MapReduce • etc..
  • 20. How we run Hadoop • Hadoop … • on EC2 with S3, ver. 0.18.3 • Cloudera & Hadoop Streaming
  • 21. S3 Native FileSystem • files are stored as ordinary S3 objects, so tools other than Hadoop can also read them • 5GB per-file limit • URI scheme is s3n:// ← note the "n" / S3 Block FileSystem • usable only from Hadoop • files are stored as blocks, like HDFS • no 5GB limit • URI scheme is s3://
  • 22. 7000 1/223 30
  • 23. Hadoop++ ←Hadoop ↓MySQL
  • 24. • Hadoop • Hadoop •
  • 25. cat hoge.csv | ruby mapper.rb | ruby reducer.rb (Hadoop Streaming): Mapper output is sorted by key, and records with the same key are passed to the same Reducer (a runnable sketch follows the transcript)
  • 26. Files on the master: 1) pass them with -file, and they are copied from the master to the slaves (scp): hadoop jar xxx.jar -mapper hoge.rb -reducer fuga.rb -file hoge.rb -file fuga.rb -file … 2) in the mapper / reducer, read them with File.open('…') {|f| ...}
  • 27. Files on S3: 1) pass them with -cacheFile, and they are pulled from S3 onto the slaves: hadoop jar xxx.jar -mapper hoge.rb -reducer fuga.rb -file hoge.rb -file fuga.rb -cacheFile s3n://path/to/…#… 2) in the mapper / reducer, read them with File.open('…') {|f| ...} (a sketch follows the transcript)
  • 28. p target_ids.size # 50000 ARGF.each do |log| log.chomp!; id, type, ... = log.split(/,/); next if target_ids.include?(id); end. With target_ids at 50,000 entries, this include? check on every log line is …
  • 29. [13930, 29011, 39291, ...] # 50,000 IDs in one flat array. Split them into ~1,000 buckets keyed by the first three digits: { '139' => [13930, 13989, 13991, ...], '290' => [29011, 29098, 29076, ...], '392' => [39291, 39244, 39251, ...], ... } # ~50 IDs per bucket
  • 30. Each include? now scans only ~50 IDs: hash = Hash.new {|h,k| h[k] = []}; target_ids.each do |id| hash[ id.to_s[0,3] ] << id end; ARGF.each do |log| log.chomp!; id, type, ... = log.split(/,/); next if hash[ id[0,3] ].include?(id); end (a runnable sketch follows the transcript)
  • 31. • Hadoop • Hadoop •
  • 32. 8 7 8 … http://ow.ly/2bdW1 S3 Native FileSystem java.net.SocketTimeoutException: Read timed out
  • 33. 8Amazon7 Elastic MapReduce 8 … http://ow.ly/2bdW1 S3 Native FileSystem java.net.SocketTimeoutException: Read timed out
  • 34. 8Amazon7 Elastic MapReduce 8 … http://ow.ly/2bdW1 Amazon Elastic MapReduce S3 Native FileSystem java.net.SocketTimeoutException: Read timed out Hadoop 0.21
  • 35. Amazon Elastic MapReduce
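
The cat pipeline on slide 25 is the usual way to explain and locally test a Hadoop Streaming job: the mapper and reducer are ordinary scripts that read lines on standard input and write lines on standard output, and Hadoop sorts the mapper's key/value output so that all records for the same key reach the same reducer (slide 18). The sketch below is illustrative only: the file name streaming_job.rb, the map/reduce dispatch on ARGV, and the assumed log format (CSV lines whose first field is an ID) do not come from the slides.

    #!/usr/bin/env ruby
    # streaming_job.rb: illustrative Hadoop Streaming mapper/reducer pair.
    # Run as `ruby streaming_job.rb map` or `ruby streaming_job.rb reduce`.

    # Mapper: read CSV log lines, emit "id<TAB>1" for each line.
    def run_map
      ARGF.each do |line|
        line.chomp!
        id, = line.split(',')            # assumed format: id,type,...
        puts "#{id}\t1"
      end
    end

    # Reducer: input arrives sorted by key, so identical ids are adjacent;
    # here we simply sum the counts per id.
    def run_reduce
      counts = Hash.new(0)
      ARGF.each do |line|
        id, count = line.chomp.split("\t")
        counts[id] += count.to_i
      end
      counts.each { |id, total| puts "#{id}\t#{total}" }
    end

    case ARGV.shift                      # shift so ARGF falls back to STDIN
    when 'map'    then run_map
    when 'reduce' then run_reduce
    else abort 'usage: ruby streaming_job.rb map|reduce'
    end

A local run mirrors slide 25, with sort standing in for Hadoop's shuffle: cat hoge.csv | ruby streaming_job.rb map | sort | ruby streaming_job.rb reduce. On the cluster the same pair is passed with -mapper, -reducer and -file, as in the hadoop jar command lines on slides 26 and 27.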
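
Slides 26 and 27 ship a lookup file to every slave: with -file when the file sits on the master, or with -cacheFile s3n://…#name when it sits on S3 (the part after # is the local name the task sees; note the s3n:// native scheme from slide 21). Either way the file ends up in the task's working directory, so the mapper or reducer opens it by name, which is what the File.open(…) fragment on those slides shows. A minimal sketch, assuming a made-up side file target_ids.txt with one ID per line and a made-up S3 path:

    #!/usr/bin/env ruby
    # mapper.rb: illustrative mapper that loads a side file shipped by Hadoop.
    # Assumes the job was started with either
    #   hadoop jar xxx.jar ... -file target_ids.txt
    # or
    #   hadoop jar xxx.jar ... -cacheFile s3n://some-bucket/path/target_ids.txt#target_ids.txt
    # so that target_ids.txt is present in the task's working directory.

    # Read the side file once, before streaming the log lines.
    target_ids = File.open('target_ids.txt') { |f| f.read.split("\n").map(&:strip) }

    ARGF.each do |log|
      log.chomp!
      id, = log.split(',')                    # assumed format: id,type,...
      puts log if target_ids.include?(id)     # emit only log lines whose ID is in the side file (illustrative)
    end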
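
Slides 28 to 30 then speed up that membership test. Array#include? is a linear scan, so checking every log line against one flat array of 50,000 IDs means up to 50,000 comparisons per line; grouping the IDs into roughly 1,000 buckets keyed by their first three digits (slide 29) means each lookup only scans the ~50 IDs sharing that prefix. A runnable sketch of the idea, with made-up file names, following slide 30 in skipping the lines whose ID is in the target set:

    #!/usr/bin/env ruby
    # bucket_filter.rb: illustrative version of the lookup from slides 28-30.

    # Stand-in for the ~50,000 target IDs; in the real job they arrive as a
    # side file via -file / -cacheFile (see the previous sketch).
    target_ids = File.read('target_ids.txt').split

    # Bucket the IDs by their first three characters: ~1,000 buckets of ~50 IDs,
    # so each include? below scans a short array instead of the whole list.
    buckets = Hash.new { |h, k| h[k] = [] }
    target_ids.each { |id| buckets[id[0, 3]] << id }

    ARGF.each do |log|
      log.chomp!
      id, = log.split(',')                    # assumed format: id,type,...
      next if buckets[id[0, 3]].include?(id)  # as on slide 30: skip target IDs
      puts log
    end

A Set (require 'set') or a Hash keyed by ID would give constant-time lookups and is probably what one would reach for today; the bucketed array above simply keeps the structure shown on the slides.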