How to use Hadoop via EMR

Slides used at the study session with 頓智・ (Tonchidot) on 2010/9/30.

Published in: Technology

  • How to use Hadoop via EMR

    1. Elastic MapReduce Hadoop EMR
    2. • (@sasata299) • NoSQL • • http://blog.livedoor.jp/sasata299/
    3. Hadoop
    4. etc…
    5. • • EC2 Hadoop & S3 • Cloudera (CDH1) • • Hadoop Streaming (Ruby ) •
    6. • • ( ) • • master ssh • Hadoop (HADOOP-6254) • S3 cpu • S3 → …
    7. SocketTimeoutException
    8. HADOOP-6254 Elastic MapReduce !! https://issues.apache.org/jira/browse/HADOOP-6254
    9. HADOOP-6254 Cloudera (CDH2) !! http://archive.cloudera.com/cdh/2/hadoop-0.20.1+169.88.releasenotes.html
    10. Elastic MapReduce (EMR)
    11. • EC2, S3 • • • GUI( )
    12. • EC2, S3 → • → • → • GUI( ) →CUI •
    13. • EC2, S3 → • → • → • GUI( ) →CUI •
    14. EMR CDH2 AMI (Amazon Machine UP Image) EMR CDH2
    15. EMR CDH2 AMI (Amazon Machine UP Image) EMR CDH2
    16. EMR !! (eHarmony)
    17.
    18. EMR BootStrap Action Step (Hadoop Job) Job Flow ( )
    19. EMR BootStrap Action Step (Hadoop Job) Job Flow ( )
    20. EMR BootStrap Action Step (Hadoop Job) Job Flow ( )
    21. EMR BootStrap Action Step (Hadoop Job) Job Flow ( )
    22. ( ) elastic-mapreduce --create # --num-instances 10 # master:1 , slave:9 --bootstrap-action s3n://xxx/hoge.sh # --alive #
    23. ( ) elastic-mapreduce --create # --num-instances 10 # master:1 , slave:9 --bootstrap-action s3n://xxx/hoge.sh # --alive # Created job flow j-8IXS98OW1WEE ID
    24. ( ) elastic-mapreduce --stream # Hadoop streaming --input, --output, --mapper, --reducer # --cache s3n://xxx/fuga.rb # --jobconf xxx=yyy # --jobflow j-xxxxx # ID
    25. ( ) elastic-mapreduce --stream # Hadoop streaming --input, --output, --mapper, --reducer # --cache s3n://xxx/fuga.rb # --jobconf xxx=yyy # --jobflow j-xxxxx # ID
    26. • • • • --alive • AMI • Cloudera AMI • BootStrap Action
    27. • • mapred.child.java.opts • Java • Streaming • • • ElasticMapReduce-master 5100
    28. • EMR Hadoop • EMR • • --alive
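
The slides mention running Hadoop Streaming jobs written in Ruby and passing scripts through `--mapper` and `--reducer`, but the transcript does not show any script. As a rough illustration only, here is a minimal word-count sketch of what such a script might look like. The file name `wordcount.rb`, the function names `run_mapper` and `run_reducer`, and the word-count task itself are assumptions for this example, not taken from the slides.

```ruby
# wordcount.rb: hypothetical Hadoop Streaming word count (not from the slides).
# Streaming feeds records on STDIN and expects "key<TAB>value" lines on STDOUT.

# Mapper: emit "word\t1" for every whitespace-separated token.
def run_mapper(input, output)
  input.each_line do |line|
    line.split.each { |word| output.puts "#{word}\t1" }
  end
end

# Reducer: input arrives sorted by key, so equal words are adjacent.
# Sum the counts of each run of identical words and emit "word\ttotal".
def run_reducer(input, output)
  current = nil
  total = 0
  input.each_line do |line|
    word, count = line.chomp.split("\t", 2)
    next if word.nil? || count.nil? # skip blank or malformed lines
    if word != current
      output.puts "#{current}\t#{total}" unless current.nil?
      current = word
      total = 0
    end
    total += count.to_i
  end
  output.puts "#{current}\t#{total}" unless current.nil?
end

# Dispatch on the first argument so one file can serve as both scripts.
# With no argument nothing runs, which keeps the file safe to load.
case ARGV[0]
when "map"    then run_mapper($stdin, $stdout)
when "reduce" then run_reducer($stdin, $stdout)
end
```

Under these assumptions, the mapper and reducer of a streaming step would be `wordcount.rb map` and `wordcount.rb reduce`. Because both sides only read STDIN and write STDOUT, the logic can be checked locally by piping a text file through the mapper, `sort`, and the reducer before starting a job flow.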
