My First MapReduce

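These are slides I presented at a study session with colleagues who joined the company in the same year.
There are probably many mistakes, so I would appreciate any corrections.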

Published in: Technology, Business

  1. 2011/7/10 @a_bicky
  2. Takeshi Arabiki
     ‣ Twitter: @a_bicky
     ‣ Hatena: id:a_bicky
     • R
     • http://d.hatena.ne.jp/a_bicky/
  3. • MapReduce
     • MapReduce
     • MapReduce
     • MapReduce
  4. MapReduce
  5. MapReduce
     • TB, PB; Facebook: 20TB
     • etc. ↑ MPI orz; MapReduce
  6. MapReduce
     • Google
     • map, reduce

         >>> map(lambda x: x ** 2, range(1, 6))  # map
         [1, 4, 9, 16, 25]
         >>> reduce(lambda a, b: a + b, range(1, 6))  # reduce
         15

     • OK
     • [Diagram: KVS (Big Table) / MapReduce / Google File System]
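The interpreter session on this slide is Python 2, where `map` returns a list and `reduce` is a builtin. As a minimal sketch for readers on Python 3, the same two calls look roughly like this; the variable names `squares` and `total` are only for illustration and are not from the slides.

```python
from functools import reduce  # reduce is no longer a builtin in Python 3

# map: apply a function to every element independently.
# In Python 3 map() is lazy, so list() is needed to see the values.
squares = list(map(lambda x: x ** 2, range(1, 6)))

# reduce: fold all elements into one value with a binary function.
total = reduce(lambda a, b: a + b, range(1, 6))

print(squares)  # [1, 4, 9, 16, 25]
print(total)    # 15
```

Because each `map` call touches only one element, the map step can be spread over many machines, which is the property MapReduce builds on.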
  7. Hadoop
     • Google
     • MapReduce: Hadoop MapReduce
     • [Diagram: Google stack vs. Hadoop stack. KVS: Big Table ↔ HBase; MapReduce ↔ Hadoop MapReduce; Google File System ↔ Hadoop Distributed File System (HDFS)]
  8. Hadoop MapReduce
     [Diagram: the JobClient talks to the JobTracker, which assigns map tasks and reduce tasks. Mappers read from HDFS, their output is copied and sorted on its way to the reducers, and the reducers write to HDFS. Phases: Map phase → Shuffle phase (copy & sort) → Reduce phase]
  9. MapReduce
  10. WordCount
      [Diagram: JobTracker, JobClient, and HDFS. Input stored in HDFS: "the end of money is" / "the end of love"]
  11. WordCount
      [Diagram: the JobTracker assigns map tasks and a reduce task. One mapper receives "the end of money is", the other receives "the end of love"; there is one reducer]
  12. WordCount: Map phase
      • mapper 1 output: the 1, end 1, of 1, money 1, is 1
      • mapper 2 output: the 1, end 1, of 1, love 1
  13. WordCount: Shuffle phase
      • The mapper outputs are copied to the reducer and sorted (copy & sort)
      • Reducer input after sorting: end 1, end 1, is 1, love 1, money 1, of 1, of 1, the 1, the 1
  14. WordCount
      • Values grouped by key: end <1, 1>, is <1>, love <1>, money <1>, of <1, 1>, the <1, 1>
  15. WordCount: Reduce phase
      • Reducer input: end <1, 1>, is <1>, love <1>, money <1>, of <1, 1>, the <1, 1>
      • Reducer output written to HDFS: end 2, is 1, love 1, money 1, of 2, the 2
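The WordCount walk-through in slides 10 to 15 can be sketched in a few lines of Python. This is an in-memory illustration only, not Hadoop code; the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are hypothetical and simply mirror the phase names on the slides.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

# Two input splits, one per mapper, as in the slides.
splits = ["the end of money is", "the end of love"]

mapped = [pair for line in splits for pair in map_phase(line)]
grouped = shuffle_phase(mapped)   # e.g. ("end", [1, 1])
counts = [reduce_phase(key, values) for key, values in grouped]

for word, count in counts:
    print(word, count)
```

The printed result matches the reducer output on slide 15: end 2, is 1, love 1, money 1, of 2, the 2.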
  16. MapReduce ※ Java
      mapred.pl:

          #!/usr/bin/env perl
          use strict;
          use warnings;

          package MapReduce;
          sub map {  # map function
              my $text = shift;
              my @words = split /\s/, $text;
              foreach my $word (@words) {
                  print $word, "\t", 1, "\n";
              }
          }

          sub reduce {  # reduce function
              my ($key, @values) = @_;
              my $cnt = 0;
              foreach my $value (@values) {
                  $cnt += $value;
              }
              print $key, "\t", $cnt, "\n";
          }

          package main;  # MapReduce Framework
          my $phase = shift;
          if ($phase eq 'map') {  # map phase
              while (my $line = <STDIN>) {
                  chomp $line;
                  MapReduce::map($line);  # call map once per input line
              }
          } elsif ($phase eq 'reduce') {  # reduce phase
              my ($prev_key, @values);
              while (my $line = <STDIN>) {
                  chomp $line;
                  my ($key, $value) = split /\t/, $line;
                  # defined() so that a key such as "0" is not mistaken for "no key yet"
                  if (!defined $prev_key || $key eq $prev_key) {
                      push @values, $value;
                  } else {  # key changed: reduce the previous key's values
                      MapReduce::reduce($prev_key, @values);
                      @values = ($value);
                  }
                  $prev_key = $key;
              }
              # reduce the last key (skipped when the input was empty)
              MapReduce::reduce($prev_key, @values) if defined $prev_key;
          }
  17. MapReduce

          $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

      [Diagram: text.txt ("the end of money is" / "the end of love") → mapper → reducer]
  18. MapReduce

          $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

      Map phase, using the map function from mapred.pl:

          sub map {
              my $text = shift;
              my @words = split /\s/, $text;
              foreach my $word (@words) {
                  print $word, "\t", 1, "\n";
              }
          }

      • Mapper output: the 1, end 1, of 1, money 1, is 1, the 1, end 1, of 1, love 1
  19. MapReduce

          $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

      Shuffle phase (copy & sort), played here by sort:
      • Before: the 1, end 1, of 1, money 1, is 1, the 1, end 1, of 1, love 1
      • After: end 1, end 1, is 1, love 1, money 1, of 1, of 1, the 1, the 1
  20. MapReduce

          $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

      Reduce phase, using the reduce function from mapred.pl:

          sub reduce {
              my ($key, @values) = @_;
              my $cnt = 0;
              foreach my $value (@values) {
                  $cnt += $value;
              }
              print $key, "\t", $cnt, "\n";
          }

      • Reducer input: end <1, 1>, is <1>, love <1>, money <1>, of <1, 1>, the <1, 1>
      • Reducer output: end 2, is 1, love 1, money 1, of 2, the 2
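For comparison, here is a rough Python rendering of the `cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce` pipeline. It is a sketch under the assumption that text.txt holds the two lines shown on the slides; `mapper` and `reducer` are hypothetical names, and `sorted()` stands in for the `sort` command. Like the Perl reduce loop, the reducer depends on equal keys being adjacent in its input.

```python
from itertools import groupby

def mapper(lines):
    """Map step: turn each input line into "word<TAB>1" records."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_records):
    """Reduce step: sum the values of consecutive records sharing a key.

    Relies on the input being sorted, so equal keys are adjacent.
    """
    parsed = (record.split("\t") for record in sorted_records)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"

text = ["the end of money is", "the end of love"]  # assumed contents of text.txt

# cat text.txt | map | sort | reduce
result = list(reducer(sorted(mapper(text))))
print("\n".join(result))
```

`itertools.groupby` replaces the manual `$prev_key` bookkeeping: it starts a new group whenever the key changes, which is exactly the condition the Perl loop checks by hand.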
  21. MapReduce
  22. MapReduce
      • Split
      • Map
      • Combine
      • Shuffle
      • Reduce
  23. Split
      • HDFS, mapper
      • HDFS: 64MB, 128MB
      • mapper, HDFS, PC
  24. Map
      • map
      • HDFS
  25. Combine
      • Map, reducer; WordCount: Map
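Most of the text on the Combine slide did not survive, so the following is a general sketch of what a combiner does for WordCount rather than a reconstruction of the slide: it sums one mapper's output locally so that fewer records have to be sent to the reducer. `map_phase` and `combine` are hypothetical helper names.

```python
from collections import Counter

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Locally sum counts per word within a single mapper's output."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())

# A single mapper that sees repeated words shows the saving.
mapped = map_phase("the end of money is the end of love")
combined = combine(mapped)

print(len(mapped), "records before combine")   # 9
print(len(combined), "records after combine")  # 6
print(combined)
```

This works for WordCount because addition can be applied in stages: summing partial sums gives the same total as summing the original ones.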
  26. Shuffle
      • Map, Combine, reducer; shuffle, sort
      [Diagram: each mapper sorts and partitions its Map output by hash(key) % 2. Records whose key gives hash(key) % 2 = 0 (the, end, is) are copied to one reducer, and records whose key gives hash(key) % 2 = 1 (of, money, love) are copied to the other. Each reducer then sorts and merges the records copied from all mappers]
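The `hash(key) % 2` partitioning in the Shuffle diagram can be sketched as follows. This is an illustration, not Hadoop's partitioner: `partition` is a hypothetical helper, and `zlib.crc32` is used as a stand-in hash because Python's built-in `hash()` for strings is randomized between runs by default. The reducer each word lands on will therefore not necessarily match the 0/1 values drawn on the slide.

```python
import zlib

NUM_REDUCERS = 2

def partition(key, num_reducers=NUM_REDUCERS):
    """Pick a reducer index for a key using a stable hash.

    zlib.crc32 stands in for the framework's hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers

records = [("the", 1), ("end", 1), ("of", 1), ("money", 1), ("is", 1),
           ("the", 1), ("end", 1), ("of", 1), ("love", 1)]

# One bucket per reducer; every record for a key lands in the same bucket.
buckets = {i: [] for i in range(NUM_REDUCERS)}
for key, value in records:
    buckets[partition(key)].append((key, value))

# Each reducer sorts what it received before reducing.
for index, bucket in buckets.items():
    print(index, sorted(bucket))
```

The important property is that the same key always maps to the same reducer, so one reducer sees every value for that key.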
  27. Reduce
      • shuffle, reducer
      • reduce
      • HDFS
  28. MapReduce
  29. MapReduce
      ‣ Word Count
      ‣ Grep
      ‣ etc.
  30. MapReduce
      • MapReduce: mapper → reducer → mapper → reducer; HDFS, MapReduce
      • WordCount: MapReduce, MapReduce
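The surviving "mapper → reducer → mapper → reducer" on this slide suggests chaining MapReduce jobs. As an assumption about the kind of example meant (the slide text itself is lost), the sketch below runs WordCount and then a second job that groups the words by their count. `run_job` is a hypothetical in-memory stand-in for one MapReduce job; in Hadoop the output of the first job would be written to HDFS and read back by the second.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Run one in-memory MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]

# Job 1: WordCount.
def count_mapper(line):
    return [(word, 1) for word in line.split()]

def count_reducer(word, ones):
    return word, sum(ones)

# Job 2 (assumed example): group words by how often they occur.
def invert_mapper(pair):
    word, count = pair
    return [(count, word)]

def invert_reducer(count, words):
    return count, sorted(words)

lines = ["the end of money is", "the end of love"]
word_counts = run_job(lines, count_mapper, count_reducer)        # mapper → reducer
by_count = run_job(word_counts, invert_mapper, invert_reducer)   # → mapper → reducer

print(by_count)  # [(1, ['is', 'love', 'money']), (2, ['end', 'of', 'the'])]
```

Each extra job adds another full round of writing and reading intermediate data, which is one reason multi-step processing in plain MapReduce gets expensive.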
  31. MapReduce: Hadoop Streaming
      • map and reduce can be written in languages other than Java: Perl, Python, Ruby, JavaScript etc.
      • Java MapReduce map; Hadoop Streaming mapper; map, combine
      Hadoop Streaming WordCount map:

          #!/usr/bin/env perl
          use strict;
          use warnings;
          while (my $line = <STDIN>) {
              chomp $line;  # otherwise the last word keeps its newline
              my @words = split / /, $line;
              foreach my $word (@words) {
                  print $word . "\t" . 1 . "\n";
              }
          }
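Since the slide lists Python among the languages usable with Hadoop Streaming, here is a rough Python counterpart of the Perl mapper. A streaming mapper is just a program that reads lines from standard input and writes tab-separated key/value lines to standard output. The function name `wordcount_mapper` is hypothetical, and passing the streams as parameters is only a convenience for trying it without Hadoop.

```python
import io
import sys

def wordcount_mapper(stdin=sys.stdin, stdout=sys.stdout):
    """Read text lines and write one "word<TAB>1" line per word."""
    for line in stdin:
        for word in line.split():  # split() also drops the trailing newline
            stdout.write(f"{word}\t1\n")

# Local check with in-memory streams instead of a real Hadoop job.
out = io.StringIO()
wordcount_mapper(io.StringIO("the end of money is\nthe end of love\n"), out)
print(out.getvalue(), end="")
```

When used as an actual mapper script, the file would simply call `wordcount_mapper()` with no arguments so that it reads the real standard input.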
  32. MapReduce: Hadoop Streaming
      • map and reduce can be written in languages other than Java: Perl, Python, Ruby, JavaScript etc.
      • Java MapReduce map; Hadoop Streaming mapper; map, combine
      • http://hapyrus.com/
        cf. http://www.slideshare.net/fujibee/tokyo-webmining12-8349942
  33. MapReduce: DSL
      • Pig
        ‣ Yahoo!; SQL-like language
        ‣ http://pig.apache.org/
        ‣ MapReduce
      • Hive
        ‣ Facebook; SQL
        ‣ http://hive.apache.org/
        ‣ SQL; Pig
      • Cascading
        ‣ Pig; Java API
        ‣ Java: http://www.cascading.org/1.2/userguide/html/ch10.html
  34. • MapReduce
      • Java
  35. • SlideShare
        • Map Reduce: http://www.slideshare.net/doryokujin/map-reduce-8349406
        • Hadoop: http://www.slideshare.net/pfi/hadoop-2525724
        • Hadoop for programmer: http://www.slideshare.net/shiumachi/hadoop-for-programmer-5202246
      • Web
        • MapReduce - naoya: http://d.hatena.ne.jp/naoya/20080511/1210506301
        • Hadoop: http://www.atmarkit.co.jp/fjava/index/index_hadoop_tm.html
        • Hadoop hBase 1/2, CodeZine: http://codezine.jp/article/detail/2448
      • Hadoop, 2011
      • Tom White, Hadoop, 2010
      • Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, 6th OSDI, 2004
