Distributed Processing of 8 Million Users' "Want to Eat" Data with Hadoop
• id:sasata299
• Ruby / Perl
• …
• Blog: http://blog.livedoor.jp/sasata299/
1. Hadoop …
2. Hadoop …
3. …
4. …
5. …
Hadoop
816 …
30 … 3 … 1 …
• …
• GROUP BY … (´Д`)
• … 7,000 … ( … )
!!
Hadoop
Hadoop
• Based on Google's MapReduce
• …
• …
• HDFS (Hadoop Distributed File System)
[Diagram: input data → Mapper → (shuffle/sort) → Reducer → output data]
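The Mapper → Reducer flow above can be sketched with the classic word count. This is an illustrative example, not code from the talk: under Hadoop Streaming, Mapper and Reducer run as separate scripts that read lines on STDIN and emit tab-separated key/value pairs on STDOUT, so their logic is shown here as plain Ruby methods that run standalone.

```ruby
# Illustrative word-count sketch of the Mapper -> Reducer flow (not code from
# the talk). Under Hadoop Streaming each part runs as its own script over STDIN.

# Mapper: emit a "word\t1" pair for every word in the line.
def map_words(line)
  line.split.map { |word| "#{word}\t1" }
end

# Reducer: sum up the counts for each key emitted by the mappers.
def reduce_counts(pairs)
  counts = Hash.new(0)
  pairs.each do |pair|
    word, n = pair.split("\t")
    counts[word] += n.to_i
  end
  counts
end
```

A streaming job then wires the two scripts together, roughly `hadoop jar hadoop-streaming.jar -input in -output out -mapper mapper.rb -reducer reducer.rb` (the exact jar path varies by install).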
‣ Uses Hadoop Streaming
‣ Mapper/Reducer written in Ruby
‣ Hadoop runs on EC2 ( … 50 … )
‣ S3 (via s3fs) instead of HDFS
[Diagram: input → Mapper ( … ) → output]
Reading from HDFS in the Mapper and Reducer
Use Hadoop's own cat to read a file from HDFS (here, on S3):

`hadoop dfs -cat s3://xxx/user/root/in/hoge`

※ …
require 'csv'

path = 's3://xxx/user/root/in/user_info'  # lookup file on S3
user_info = `hadoop dfs -cat #{path}`     # read it via Hadoop's cat

ARGF.each_line do |line|
  line.chomp!
  csv = CSV.parse(line)
  # ... use csv together with user_info ...
end
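The point of reading user_info up front is a map-side join: build an in-memory lookup table once, then annotate each streamed line against it. A runnable sketch of that pattern, with a hypothetical two-column user_info layout (user_id,name) standing in for the real file:

```ruby
require 'csv'

# Hypothetical user_info contents; in the real job this string would come from
# `hadoop dfs -cat #{path}` as on the slide above.
USER_INFO_CSV = "1,alice\n2,bob\n"

# Build the lookup table once, before streaming input through the mapper.
USER_NAMES = CSV.parse(USER_INFO_CSV).to_h

# Mapper logic: attach the user's name to each incoming "user_id,item" line.
def map_line(line)
  user_id, item = line.chomp.split(',')
  "#{user_id}\t#{USER_NAMES.fetch(user_id, 'unknown')}\t#{item}"
end
```

For example, `map_line("1,ramen")` yields `"1\talice\tramen"` under the sample table above.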
7,000 … ( … ) →

7,000 … ( … ) → 30 …
Hadoop   !!
• Mapper and Reducer can read files on HDFS (via Hadoop's cat)
• …
• … DB …
