Let's Talk a Bit About Hadoop (Working Title)

Transcript

  • 1. Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
  • 2. ✓Google Map/Reduce GFS ✓Java - Apache - http://hadoop.apache.org/
  • 3. ✓ ✓
  • 4. ✓ - HDD - ✓ ✓ ✓
  • 5. Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
  • 6. ✓ ✓Map Reduce
  • 7. ✓ - grep - - ✓ - - -
  • 8. PV UU 30+2
  • 9. 5+1
  • 10. Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
  • 11. •JobTracker TaskTracker Map/Reduce •NameNode DataNode HDFS
  • 12. •JobTracker NameNode •SecondaryNameNode NameNode •TaskTracker DataNode •JobTracker/NameNode •TaskTracker/DataNode
  • 13. Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
  • 14. ✓ - MapTask - ReduceTask - JobClient JobTracker Job - HDFS - Map/Reduce
  • 15. public class WordCount {

          public static class Map extends MapReduceBase
              implements Mapper<LongWritable, Text, Text, IntWritable> {
            //Map
          }

          public static class Reduce extends MapReduceBase
              implements Reducer<Text, IntWritable, Text, IntWritable> {
            //Reduce
          }

          public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);
            conf.setReducerClass(Reduce.class);
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
          }
        }
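The `//Map` and `//Reduce` bodies are elided on the slide. As a rough illustration of the data flow those two classes would take part in, here is a minimal in-memory sketch in Python, assuming the job is the usual word count that the class name suggests. The names `map_fn`, `reduce_fn`, and `run_job` are hypothetical and are not part of any Hadoop API; the running character offset only approximates the `LongWritable` key that `TextInputFormat` supplies.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for each whitespace-separated token.

    `offset` stands in for the input key and is not used by the logic.
    """
    for word in line.split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all the counts collected for one word."""
    return word, sum(counts)


def run_job(lines: List[str]) -> Dict[str, int]:
    """Run map, then group by key (the shuffle), then reduce, all in memory."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    offset = 0
    for line in lines:
        for word, one in map_fn(offset, line):
            grouped[word].append(one)
        offset += len(line) + 1  # assumption: one newline per input line
    # The framework hands keys to the reducer in sorted order; mimic that here.
    return dict(reduce_fn(word, grouped[word]) for word in sorted(grouped))


if __name__ == "__main__":
    print(run_job(["hadoop map reduce", "map reduce map"]))
    # prints {'hadoop': 1, 'map': 3, 'reduce': 2}
```

The sketch keeps everything in one process, so it shows only the shape of the key/value pairs, not how Hadoop distributes the work.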
  • 17. ✓Hadoop Streaming ✓LibHDFS ✓Hadoop Pipes ✓Amazon Elastic MapReduce
  • 18. ✓Hadoop hadoop-streaming.jar ✓ Map/Reduce - C Perl Ruby Python Map/Reduce - Map/Reduce
  • 19. ✓Map:cat / Reduce:wc ✓HDFS
        $ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
        $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper cat -reducer "wc -l"
        09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
        09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
        09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:00:31 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0%
        09/09/26 17:00:42 INFO streaming.StreamJob: map 100% reduce 0%
        09/09/26 17:00:44 INFO streaming.StreamJob: map 100% reduce 100%
        09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output
        $ hadoop dfs -cat /dfs/test/output/*
        8842
  • 20. ✓python http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python
        $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper "python map.py" -reducer "python reduce.py"
        09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
        09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
        09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:29:30 INFO streaming.StreamJob: map 0% reduce 0%
        09/09/26 17:29:33 INFO streaming.StreamJob: map 100% reduce 0%
        09/09/26 17:29:35 INFO streaming.StreamJob: map 100% reduce 100%
        09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output
        $ hadoop dfs -cat /dfs/test/output/*
        via 1942
        the 1476
        to 1394
        in 819
        a 816
        cutting) 740
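The transcript does not include the contents of `map.py` and `reduce.py`. Since the job output looks like word/count pairs, the following is a minimal sketch of what such scripts might contain, assuming a plain word count. Hadoop Streaming passes records to the mapper and reducer as lines on stdin, and the reducer sees its input sorted by key, which is what lets `itertools.groupby` work here. Putting both roles in one file selected by an argument is a choice made for this illustration (the slide uses two separate files), and the function names are hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one tab-separated "word<TAB>1" record per token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word; relies on the input being sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Only act as a streaming script when a role ("map" or "reduce") is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

The same logic can be checked locally without a cluster by piping a text file through the map role, `sort`, and then the reduce role, which mimics the sort that happens between the two phases.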
  • 21. ✓C HDFS http://wiki.apache.org/hadoop/LibHDFS
  • 22. ✓C C++ HDFS Map/Reduce API http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/pipes/package-summary.html
  • 23. ✓Amazon EC2 MapReduce http://aws.amazon.com/elasticmapreduce/
  • 24. Hadoop Map/Reduce Hadoop Hadoop(Map/Reduce)
  • 25. ✓ - NameNode/TaskTracker - ✓HDFS - HDFS DataNode
  • 26. ✓JMX metrics - Hadoop - Hadoop Java jmxremote - http://www.cloudera.com/blog/2009/03/12/hadoop-metrics/ ✓metrics - DFS / MapReduce / JVM / RPC - Map/Reduce Task (Keyword Tracker
  • 27. ✓JobTracker - (http://jobtracker:50030/jobtracker.jsp) - Map/Reduce
  • 28. ✓ wiki - http://wiki.apache.org/hadoop/FAQ ✓Yahoo - http://www.docstoc.com/docs/3766688/Hadoop-Map-Reduce-Tuning-and-Debugging-Arun-C-Murthy-acmurthy
  • 29. ✓TaskTracker Map Reduce - hadoop-site.xml) mapred.tasktracker.reduce.tasks.maximum - TaskTracker - TaskTracker 4 8GB
  • 30. ✓Map→Reduce - io.sort.mb - io.sort.factor - io.sort.record.percent - io.sort.spill.percent
  • 31. ✓Reduce - mapred.reduce.parallel.copies
  • 32. ✓Map - mapred.compress.map.output (true )
  • 33. ✓Map→Reduce HDFS - fs.inmemory.size.mb
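The tuning properties named on slides 29 to 33 are set in `hadoop-site.xml` as `<property>` elements with a `<name>` and a `<value>`. The sketch below generates such a fragment in Python. The property names come from the slides, with the two `io.sort.*.percent` names spelled as Hadoop 0.20 spells them. Every value is an illustrative placeholder and not a recommendation, except `mapred.compress.map.output`, which slide 32 gives as `true`. The helper name `build_site_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Placeholder values for illustration only; tune against your own cluster.
ILLUSTRATIVE_SETTINGS: Dict[str, str] = {
    "mapred.tasktracker.reduce.tasks.maximum": "4",
    "io.sort.mb": "200",
    "io.sort.factor": "20",
    "io.sort.record.percent": "0.05",
    "io.sort.spill.percent": "0.80",
    "mapred.reduce.parallel.copies": "10",
    "mapred.compress.map.output": "true",  # value taken from slide 32
    "fs.inmemory.size.mb": "100",
}


def build_site_xml(settings: Dict[str, str]) -> str:
    """Serialize settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_site_xml(ILLUSTRATIVE_SETTINGS))
```

Generating the file from a dictionary makes it easy to keep one set of values per experiment while measuring the effect of each setting.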
  • 34. ✓Reduce HDFS ✓ HDFS - org.apache.hadoop.mapred.lib.NullOutputFormat
  • 35. ✓Reduce ✓ Reduce - -
  • 36. ✓MRUnit - MapTask/ReduceTask - cloudera Hadoop - http://www.cloudera.com/hadoop-mrunit ✓JMock - Mock - http://www.jmock.org/ ✓Hadoop - (´ ω `)
  • 37. ✓Hudson Hadoop - zero conf Hudson Hadoop - Hudson Hadoop - http://d.hatena.ne.jp/kkawa/20090315/p1 - http://weblogs.java.net/blog/kohsuke/archive/2009/03/instantly_turni.html
  • 38. ✓ ✓Hadoop Streaming Java ✓Let's Try Hadoop Programming!