Distributed processing of 8 million users' "I want to eat" with Hadoop

  1. 1. Distributed processing of 8 million users' "I want to eat" with Hadoop
  2. 2. • id:sasata299 ( ) • Ruby Perl • • http://blog.livedoor.jp/sasata299/
  3. 3. 1. Hadoop 2. Hadoop 3. 4. 5.
  4. 4. Hadoop
  5. 5. 816 30 3 1
  6. 6. ( )
  7. 7. ( )
  8. 8. • • GROUP BY ( ( Д`) • 7000 ( )
  9. 9. !!
  10. 10. Hadoop
  11. 11. Hadoop
  12. 12. • Google MapReduce • • • HDFS
  13. 13. ( ) ( ) Mapper Reducer ( ) ( )
  14. 14. ‣ Hadoop Streaming ‣ Ruby ‣ EC2 Hadoop ( 50 ) ‣ HDFS S3 (s3fs)
  15. 15. ( ) ( ) Mapper ( ) ( )
  16. 16. HDFS Mapper, Reducer
  17. 17. Hadoop cat `hadoop dfs -cat s3://xxx/user/root/in/hoge` ※
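  18. 18. require 'csv'
          path = 's3://xxx/user/root/in/user_info'
          # Read the side file once through the Hadoop CLI
          user_info = `hadoop dfs -cat #{path}`
          ARGF.each_line do |line|
            # Strip the trailing newline, then parse the line as one CSV record
            line.chomp!
            csv = CSV.parse_line(line)
            # Process the record together with user_info here
          end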
  19. 19. 7000 ( )→
  20. 20. 7000 ( )→ 30
  21. 21. Hadoop !!
  22. 22. • Mapper, Reducer HDFS (Hadoop cat) • • DB
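
Slides 13 to 18 describe running Ruby scripts under Hadoop Streaming, with the Mapper pulling side data through `hadoop dfs -cat`. The sketch below illustrates that pattern under stated assumptions; it is not the speaker's actual job. The column layouts (`user_id,attribute` for the side file and `user_id,keyword` for the input), the helper names `build_user_table`, `map_line`, and `reduce_lines`, and the sample values are all hypothetical. Only the `hadoop dfs -cat` call and the `s3://xxx/user/root/in/user_info` path come from the slides, and the call is left in a comment so that the example runs without a cluster.

```ruby
require 'csv'

# Build a lookup table from the raw CSV text of the side file.
# Hypothetical layout: user_id,attribute
def build_user_table(raw_csv)
  CSV.parse(raw_csv).each_with_object({}) do |(user_id, attribute), table|
    next if user_id.nil? # skip blank rows
    table[user_id] = attribute
  end
end

# Map step: turn one input line into "key<TAB>1", or nil if it cannot be joined.
# Hypothetical input layout: user_id,keyword
def map_line(line, user_table)
  user_id, keyword = CSV.parse_line(line.chomp)
  attribute = user_table[user_id]
  return nil if attribute.nil? || keyword.nil?
  "#{attribute}:#{keyword}\t1"
end

# Reduce step: sum the counts of "key<TAB>count" lines per key.
def reduce_lines(lines)
  counts = Hash.new(0)
  lines.each do |l|
    key, count = l.chomp.split("\t", 2)
    counts[key] += count.to_i
  end
  counts
end

if __FILE__ == $PROGRAM_NAME
  # In a real mapper the side file would be fetched once, as on slide 18:
  #   raw = `hadoop dfs -cat s3://xxx/user/root/in/user_info`
  # A small inline sample stands in for it here.
  raw = "u1,30s\nu2,20s\n"
  table = build_user_table(raw)

  input = ["u1,curry", "u2,curry", "u1,curry"]
  mapped = input.map { |l| map_line(l, table) }.compact
  reduce_lines(mapped).each { |key, count| puts "#{key}\t#{count}" }
end
```

Under Hadoop Streaming the map and reduce steps would live in two separate scripts, each reading lines from standard input and writing tab-separated key/value pairs to standard output. They are kept in one file here only to make the data flow easy to follow.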
