Let's Talk a Bit About Hadoop (Working Title)

Published in: Technology

  1. 1. Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
  2. 2. ✓ Open-source implementation of Google's Map/Reduce and GFS ✓ Written in Java - Apache project - http://hadoop.apache.org/
  3. 3. ✓ ✓
  4. 4. ✓ - HDD - ✓ ✓ ✓
  6. 6. ✓ ✓Map Reduce
  7. 7. ✓ - grep - - ✓ - - -
  8. 8. PV UU 30+2
  9. 9. 5+1
  11. 11. • JobTracker and TaskTracker: Map/Reduce • NameNode and DataNode: HDFS
  12. 12. •JobTracker NameNode •SecondaryNameNode NameNode •TaskTracker DataNode •JobTracker/NameNode •TaskTracker/DataNode
  14. 14. ✓ - MapTask - ReduceTask - JobClient JobTracker Job - HDFS - Map/Reduce
  15. 15.

      ```java
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.*;
      import org.apache.hadoop.mapred.*;

      public class WordCount {

        public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
          // Map (implementation elided on the slide)
        }

        public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
          // Reduce (implementation elided on the slide)
        }

        public static void main(String[] args) throws Exception {
          JobConf conf = new JobConf(WordCount.class);
          conf.setJobName("wordcount");

          // Types of the keys and values the job emits.
          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(IntWritable.class);

          conf.setMapperClass(Map.class);
          conf.setReducerClass(Reduce.class);

          // Plain-text input and output.
          conf.setInputFormat(TextInputFormat.class);
          conf.setOutputFormat(TextOutputFormat.class);

          // args[0]: input path, args[1]: output path.
          FileInputFormat.setInputPaths(conf, new Path(args[0]));
          FileOutputFormat.setOutputPath(conf, new Path(args[1]));

          // Submit the job and wait for it to finish.
          JobClient.runJob(conf);
        }
      }
      ```
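The slide leaves the Map and Reduce bodies as placeholder comments. As a rough sketch of the data flow a word-count job of this shape follows (map, then group by key, then reduce), here is a single-process toy in plain Python. It does not use Hadoop at all; `wc_map`, `wc_reduce`, `run_mapreduce`, and the sample lines are hypothetical names and data chosen for illustration, not the deck's code.

```python
from collections import defaultdict


def wc_map(_offset, line):
    """Map: emit (word, 1) for every whitespace-separated word in one input line."""
    for word in line.split():
        yield word, 1


def wc_reduce(word, counts):
    """Reduce: sum all counts that were emitted for one word."""
    yield word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy single-process stand-in for a Map/Reduce job (not Hadoop's implementation)."""
    groups = defaultdict(list)
    # Map phase: apply map_fn to every (key, value) input record.
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            # Shuffle: collect all values that share an intermediate key.
            groups[out_key].append(out_value)
    # Reduce phase: call reduce_fn once per key, in sorted key order.
    output = []
    for out_key in sorted(groups):
        output.extend(reduce_fn(out_key, groups[out_key]))
    return output


lines = ["hadoop map reduce", "map reduce map"]
# The mapper's input type is (LongWritable, Text), i.e. (offset, line);
# line numbers stand in for byte offsets here.
result = run_mapreduce(enumerate(lines), wc_map, wc_reduce)
print(result)  # [('hadoop', 1), ('map', 3), ('reduce', 2)]
```

The type parameters in the Java skeleton line up with this sketch: the mapper takes an offset and a line of text and emits (word, count) pairs, and the reducer takes a word with all its counts and emits a single total.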
  17. 17. ✓Hadoop Streaming ✓LibHDFS ✓Hadoop Pipes ✓Amazon Elastic MapReduce
  18. 18. ✓ Bundled with Hadoop as hadoop-streaming.jar ✓ Map/Reduce - Write Map/Reduce in C, Perl, Ruby, Python, etc. - Map/Reduce
  19. 19. ✓ Map: cat / Reduce: wc ✓ HDFS

      ```
      $ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
      $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar \
          -input /dfs/test/input/ -output /dfs/test/output \
          -mapper cat -reducer "wc -l"
      09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
      09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
      09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4
      09/09/26 17:00:31 INFO mapred.FileInputFormat: Total input paths to process : 4
      09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0%
      09/09/26 17:00:42 INFO streaming.StreamJob: map 100% reduce 0%
      09/09/26 17:00:44 INFO streaming.StreamJob: map 100% reduce 100%
      09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output
      $ hadoop dfs -cat /dfs/test/output/*
      8842
      ```
  20. 20. ✓ python http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python

      ```
      $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar \
          -input /dfs/test/input/ -output /dfs/test/output \
          -mapper "python map.py" -reducer "python reduce.py"
      09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
      09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
      09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
      09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
      09/09/26 17:29:30 INFO streaming.StreamJob: map 0% reduce 0%
      09/09/26 17:29:33 INFO streaming.StreamJob: map 100% reduce 0%
      09/09/26 17:29:35 INFO streaming.StreamJob: map 100% reduce 100%
      09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output
      $ hadoop dfs -cat /dfs/test/output/*
      via 1942
      the 1476
      to 1394
      in 819
      a 816
      cutting) 740
      ```
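The slide runs `map.py` and `reduce.py` without showing their contents. Below is a minimal sketch of what a word-count mapper and reducer for Hadoop Streaming might look like, assuming the usual streaming convention of one `key<TAB>value` record per line. The function names and sample input are hypothetical, and the scripts behind the linked tutorial may differ. The logic is written as functions over iterables of lines so it can be dry-run locally as map, sort, reduce.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: turn each input line into 'word<TAB>1' records."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive records that share a key.

    Relies on the records arriving sorted by key, which the framework
    arranges between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local dry run without Hadoop: map, sort, reduce.
sample = ["the quick fox", "the lazy dog the end"]
dry_run = list(reduce_lines(sorted(map_lines(sample))))
for record in dry_run:
    print(record)
```

In an actual pair of streaming scripts, each function would be fed `sys.stdin` and its records printed to standard output, since the streaming jar talks to the mapper and reducer processes only through stdin and stdout.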
  21. 21. ✓ C API for HDFS http://wiki.apache.org/hadoop/LibHDFS
  22. 22. ✓ C C++ HDFS Map/Reduce API http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/pipes/package-summary.html
  23. 23. ✓Amazon EC2 MapReduce http://aws.amazon.com/elasticmapreduce/
  25. 25. ✓ - NameNode/TaskTracker - ✓HDFS - HDFS DataNode
  26. 26. ✓JMX metrics - Hadoop - Hadoop Java jmxremote - http://www.cloudera.com/blog/2009/03/12/hadoop-metrics/ ✓metrics - DFS / MapReduce / JVM / RPC - Map/Reduce Task (Keyword Tracker
  27. 27. ✓JobTracker - (http://jobtracker:50030/jobtracker.jsp) - Map/Reduce
  28. 28. ✓ wiki - http://wiki.apache.org/hadoop/FAQ ✓ Yahoo - http://www.docstoc.com/docs/3766688/Hadoop-Map-Reduce-Tuning-and-Debugging-Arun-C-Murthy-acmurthy
  29. 29. ✓ TaskTracker Map Reduce - (hadoop-site.xml) mapred.tasktracker.reduce.tasks.maximum - TaskTracker - TaskTracker 4 8GB
  30. 30. ✓ Map→Reduce - io.sort.mb - io.sort.factor - io.sort.record.percent - io.sort.spill.percent
  31. 31. ✓Reduce - mapred.reduce.parallel.copies
  32. 32. ✓ Map - mapred.compress.map.output (true)
  33. 33. ✓Map→Reduce HDFS - fs.inmemory.size.mb
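The tuning slides name several properties that are set in hadoop-site.xml. As a small illustration of the file layout, the sketch below renders those property names into the `<configuration>`/`<property>` structure Hadoop reads. The property names are the ones listed on the slides, with the two sort settings spelled `io.sort.record.percent` and `io.sort.spill.percent`. Every value is an illustrative placeholder, not a recommendation from the deck, and `build_site_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Property names come from the tuning slides; every value below is an
# illustrative placeholder, not a recommendation.
TUNING = {
    "mapred.tasktracker.reduce.tasks.maximum": "4",
    "io.sort.mb": "200",
    "io.sort.factor": "50",
    "io.sort.record.percent": "0.05",
    "io.sort.spill.percent": "0.80",
    "mapred.reduce.parallel.copies": "10",
    "mapred.compress.map.output": "true",
    "fs.inmemory.size.mb": "200",
}


def build_site_xml(properties):
    """Render a dict of settings as <configuration><property>... XML text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


site_xml = build_site_xml(TUNING)
print(site_xml)
```

Suitable values depend on the cluster's memory, disks, and job mix, so anything generated this way is only a starting point to measure against.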
  34. 34. ✓Reduce HDFS ✓ HDFS - org.apache.hadoop.mapred.lib.NullOutputFormat
  35. 35. ✓Reduce ✓ Reduce - -
  36. 36. ✓MRUnit - MapTask/ReduceTask - cloudera Hadoop - http://www.cloudera.com/hadoop-mrunit ✓JMock - Mock - http://www.jmock.org/ ✓Hadoop - (´ ω `)
  37. 37. ✓ Hudson Hadoop - zero conf Hudson Hadoop - Hudson Hadoop - http://d.hatena.ne.jp/kkawa/20090315/p1 - http://weblogs.java.net/blog/kohsuke/archive/2009/03/instantly_turni.html
  38. 38. ✓ ✓ Hadoop Streaming Java ✓ Let's Try Hadoop Programming!
