Distributed processing of 8 million people's "I want to eat" with Hadoop

  1. Distributed processing of 8 million people's "I want to eat" with Hadoop
  2. • id:sasata299 • Ruby, Perl • http://blog.livedoor.jp/sasata299/
  3. Outline: 1. Hadoop … 2. Hadoop … 3. … 4. … 5. …
  4. Hadoop
  5. 816 30 3 1
  6. (…)
  7. (…)
  8. • … • GROUP BY … (´Д`) • 7000 (…)
  9. …!!
  10. Hadoop
  11. Hadoop
  12. • Google MapReduce • … • … • HDFS
  13. (…) (…) Mapper Reducer (…) (…)
  14. ‣ Hadoop Streaming ‣ Ruby ‣ EC2 Hadoop (… 50 …) ‣ HDFS … S3 (s3fs)
  15. (…) (…) Mapper (…) (…)
  16. HDFS … Mapper, Reducer
  17. Hadoop cat: `hadoop dfs -cat s3://xxx/user/root/in/hoge` ※
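  18.
      require 'csv'

      path = 's3://xxx/user/root/in/user_info'  # side file on S3, read through the hadoop CLI
      user_info = `hadoop dfs -cat #{path}`     # the whole file as one String
      abort "cannot read #{path}" unless $?.success?

      ARGF.each_line do |line|                  # one input record per line (stdin under Streaming)
        line.chomp!
        csv = CSV.parse_line(line)              # fields of this record as an Array
        # ... combine csv with user_info here ...
      end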
  19. 7000 (…) →
  20. 7000 (…) → 30
  21. Hadoop …!!
  22. • Mapper, Reducer … HDFS (Hadoop cat) • … • DB …
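Slides 8 and 13 contrast a slow database GROUP BY with a Mapper → Reducer job, run through Hadoop Streaming in Ruby (slide 14). The following is a minimal sketch of that pattern, not the code from the talk: it assumes hypothetical tab-separated input lines of the form `user_id<TAB>keyword` and counts lines per keyword, the MapReduce counterpart of `SELECT keyword, COUNT(*) ... GROUP BY keyword`. The script name `group_count.rb`, the `map`/`reduce` arguments, and the helper names are illustrative.

```ruby
# group_count.rb: minimal Hadoop Streaming mapper/reducer sketch.
# Assumption (hypothetical input format): each line is "user_id<TAB>keyword".

# Mapper: turn one input line into "keyword<TAB>1", or nil for a malformed line.
def map_line(line)
  _user_id, keyword = line.chomp.split("\t", 2)
  return nil if keyword.nil? || keyword.empty?
  "#{keyword}\t1"
end

# Reducer: Hadoop sorts mapper output by key, so equal keys arrive on
# consecutive lines. Sum the counts of each run and yield "key<TAB>total".
def reduce_lines(lines)
  current_key = nil
  total = 0
  lines.each do |line|
    key, count = line.chomp.split("\t", 2)
    if key != current_key
      yield "#{current_key}\t#{total}" unless current_key.nil?
      current_key = key
      total = 0
    end
    total += count.to_i
  end
  # Flush the last run.
  yield "#{current_key}\t#{total}" unless current_key.nil?
end

# Entry point: "ruby group_count.rb map" or "ruby group_count.rb reduce".
# With no argument the script only defines the functions above.
if __FILE__ == $PROGRAM_NAME
  case ARGV.shift
  when 'map'
    STDIN.each_line do |line|
      out = map_line(line)
      puts out if out
    end
  when 'reduce'
    reduce_lines(STDIN.each_line) { |out| puts out }
  end
end
```

Locally, the shuffle can be imitated with `sort`: `cat input.tsv | ruby group_count.rb map | sort | ruby group_count.rb reduce`. Under Hadoop Streaming the same commands go to `-mapper "ruby group_count.rb map"` and `-reducer "ruby group_count.rb reduce"` (with `-file group_count.rb` to ship the script); the reducer is only correct because the framework delivers its input sorted by key.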
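Slides 16 to 18 read a side file (`user_info`) with `hadoop dfs -cat` from inside the Mapper and use it while processing each input record. A minimal sketch of that join step, under hypothetical assumptions: both files are CSV, the first column of `user_info` is a user id, and the first field of each input record is the same id. The function names are illustrative; parsing and joining are plain functions so they can be tried without a cluster, and only the entry point shells out to `hadoop dfs -cat`.

```ruby
require 'csv'

# Build a lookup Hash from the side file's text (e.g. the String returned by
# `hadoop dfs -cat ...`). Assumption: column 0 is the user id.
def load_user_info(text)
  table = {}
  CSV.parse(text) do |row|
    next if row.empty?            # skip blank lines
    table[row[0]] = row.drop(1)   # remaining columns, keyed by user id
  end
  table
end

# Join one input CSV record with the lookup table and return a CSV line.
# Assumption: field 0 is the user id; records without a match return nil.
def join_line(line, user_info)
  fields = CSV.parse_line(line.chomp)
  return nil if fields.nil? || fields.empty?
  extra = user_info[fields[0]]
  return nil if extra.nil?
  (fields + extra).to_csv         # Array#to_csv comes from the csv library
end

# Entry point, e.g. -mapper "ruby join_mapper.rb map" (hypothetical script name).
if __FILE__ == $PROGRAM_NAME && ARGV.first == 'map'
  path = 's3://xxx/user/root/in/user_info'   # path as given on the slide
  raw = `hadoop dfs -cat #{path}`
  abort "cannot read #{path}" unless $?.success?
  user_info = load_user_info(raw)
  STDIN.each_line do |line|
    joined = join_line(line, user_info)
    print joined if joined
  end
end
```

Each map task loads the whole side file into a Hash once at startup, so this suits side data that fits in memory; dropping records that have no matching user is likewise an assumption of the sketch.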
