Hadoop Book Reading Group: Chapters 1-2 (Hadoop本 輪読会 1章〜2章)

  • Transcript

    • 1. Hadoop
    • 2. 19:00 20:00 ※
    • 3. The book's chapters and page counts: 1 Meet Hadoop / 2 MapReduce (43p), 3 The Hadoop Distributed Filesystem (38p), 4 Hadoop I/O (46p), 5 Developing a MapReduce Application (42p), 6 How MapReduce Works (22p), 7 MapReduce Types and Formats (38p), 8 MapReduce Features (36p), 9 Setting Up a Hadoop Cluster (30p), 10 Administering Hadoop (30p), 11 Pig (46p), 12 HBase (30p), 13 ZooKeeper (38p)
    • 4. http://twitter.com/just_do_neet http://ameblo.jp/just-do-neet/ Hadoop MapReduce
    • 5. Hadoop
    • 6. "In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers." -- Grace Hopper
    • 7. (p1) / 100 / Facebook 2.5 Ancestry.com) 20 The Internet Archive) 15
    • 8. How Much Information? http://hmi.ucsd.edu/pdf/HMI_2009_ConsumerReport_Dec9_2009.pdf
    • 9. The problem (p3): HDD capacity has grown enormously, but disk transfer speeds have not kept pace, so reading an entire drive takes longer and longer.
    • 10. The idea (p3): read from many disks in parallel; Hadoop supplies the two pieces this requires, storage that tolerates hardware failure (HDFS) and a way to combine the data for analysis (MapReduce).
    • 11. RDBMS vs. MapReduce (p4): why not just use databases? A traditional RDBMS suits interactive queries and updates on structured data, while MapReduce suits batch analysis over whole, very large datasets.
    • 12. A brief history of Hadoop (p9): created by Doug Cutting (Douglas Read Cutting), the author of Lucene; it grew out of the Nutch web crawler after Google published its MapReduce paper, was developed at Yahoo! Inc., and Cutting later moved to Cloudera. http://www.sdtimes.com/blog/post/2009/08/10/Hadoop-creator-goes-to-Cloudera.aspx
    • 13. The name "Hadoop" (p9): not an acronym; it was the name Doug Cutting's child gave a stuffed yellow elephant.
    • 14. Hadoop (p13)
    • 15. [ ]Eclipse http://d.hatena.ne.jp/torazuka/20100102/p1
    • 16. [ ]GAE http://d.hatena.ne.jp/torazuka/20100101/gaetan
    • 17. http://d.hatena.ne.jp/torazuka/20091011/pic
    • 18. Hadoop subprojects (p13): Core (the distributed filesystem and general I/O components), Avro (a serialization system for efficient, cross-language RPC), MapReduce, HDFS.
    • 19. Hadoop subprojects, continued (p13): Pig / HIVE (higher-level DSLs / query languages that compile to MapReduce), HBase (a distributed, column-oriented database, ≒ Google's BigTable), ZooKeeper (a distributed coordination service, ≒ Google's Chubby), Chukwa (a distributed data collection and analysis system).
    • 20. MapReduce
    • 21. Analyzing the data with Hadoop (p19): a MapReduce job is expressed as two functions, map and reduce (illustrated conceptually below and in the diagrams that follow).
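      The chapter's running example in the book is finding the highest recorded temperature per year from weather records. Below is a minimal conceptual sketch of what map and reduce do, with no Hadoop classes involved; the simplified "year<TAB>temperature" input format, the class name MaxTemperatureConcept, and the sample values are illustrative, not taken from the slides:

      import java.util.ArrayList;
      import java.util.Collections;
      import java.util.List;
      import java.util.Map;
      import java.util.TreeMap;

      public class MaxTemperatureConcept {
          // map: one input record (a line) -> an intermediate (year, temperature) pair
          static Map.Entry<String, Integer> map(String line) {
              String[] fields = line.split("\t");
              return Map.entry(fields[0], Integer.parseInt(fields[1]));
          }

          // reduce: (year, all temperatures seen for that year) -> the maximum temperature
          static int reduce(String year, List<Integer> temperatures) {
              return Collections.max(temperatures);
          }

          public static void main(String[] args) {
              List<String> lines = List.of("1949\t111", "1949\t78", "1950\t0", "1950\t22", "1950\t-11");

              // The framework's shuffle groups map output by key; emulated here with a TreeMap.
              Map<String, List<Integer>> grouped = new TreeMap<>();
              for (String line : lines) {
                  Map.Entry<String, Integer> kv = map(line);
                  grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
              }

              // Run reduce once per key: prints "1949<TAB>111" and "1950<TAB>22".
              grouped.forEach((year, temps) -> System.out.println(year + "\t" + reduce(year, temps)));
          }
      }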
    • 22.〜27. MapReduce (a sequence of diagrams walking through the MapReduce processing flow)
    • 28. Java MapReduce (p20): with the old Java API you extend MapReduceBase and implement the Mapper/Reducer interfaces, then configure the MapReduce job and submit it through JobClient.
    • 29. WordCount with the old API (< 0.20.0); the //Map and //Reduce bodies elided here are sketched just after slide 30:

      public class WordCount {

          public static class Map extends MapReduceBase
                  implements Mapper<LongWritable, Text, Text, IntWritable> {
              //Map
          }

          public static class Reduce extends MapReduceBase
                  implements Reducer<Text, IntWritable, Text, IntWritable> {
              //Reduce
          }

          public static void main(String[] args) throws Exception {
              JobConf conf = new JobConf(WordCount.class);
              conf.setJobName("wordcount");

              conf.setOutputKeyClass(Text.class);
              conf.setOutputValueClass(IntWritable.class);

              conf.setMapperClass(Map.class);
              conf.setReducerClass(Reduce.class);

              conf.setInputFormat(TextInputFormat.class);
              conf.setOutputFormat(TextOutputFormat.class);

              FileInputFormat.setInputPaths(conf, new Path(args[0]));
              FileOutputFormat.setOutputPath(conf, new Path(args[1]));

              JobClient.runJob(conf);
          }
      }
    • 30. (the same old-API WordCount code as slide 29, shown again)
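      A sketch of the Map/Reduce bodies that the slide leaves as //Map and //Reduce, assuming the standard word-count logic (meant to drop into the WordCount class above; the talk's actual bodies may differ). With the old API, output is emitted through OutputCollector and the reduce values arrive as an Iterator:

      // Needed imports (at the top of WordCount.java):
      //   java.io.IOException, java.util.Iterator, java.util.StringTokenizer,
      //   org.apache.hadoop.io.*, org.apache.hadoop.mapred.*

      public static class Map extends MapReduceBase
              implements Mapper<LongWritable, Text, Text, IntWritable> {
          private static final IntWritable one = new IntWritable(1);
          private final Text word = new Text();

          // key = byte offset of the line, value = the line itself
          public void map(LongWritable key, Text value,
                          OutputCollector<Text, IntWritable> output, Reporter reporter)
                  throws IOException {
              StringTokenizer tokenizer = new StringTokenizer(value.toString());
              while (tokenizer.hasMoreTokens()) {
                  word.set(tokenizer.nextToken());
                  output.collect(word, one);          // emit (word, 1)
              }
          }
      }

      public static class Reduce extends MapReduceBase
              implements Reducer<Text, IntWritable, Text, IntWritable> {
          public void reduce(Text key, Iterator<IntWritable> values,
                             OutputCollector<Text, IntWritable> output, Reporter reporter)
                  throws IOException {
              int sum = 0;
              while (values.hasNext()) {
                  sum += values.next().get();         // add up the 1s emitted for this word
              }
              output.collect(key, new IntWritable(sum));
          }
      }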
    • 31. JobConf describes the job configuration: which Map/Reduce classes to run, the input/output formats and paths (plain files in HDFS, HBase tables, and so on), and the other Map/Reduce settings.
    • 32. Map/Reduce is typed by its Key-Value pairs. In Mapper<LongWritable, Text, Text, IntWritable> the first two type parameters are the input pair and the last two the output pair: input Key=Long (the line's byte offset) / Value=Text (the line), output Key=Text (the word) / Value=Int (the count). LongWritable, Text and IntWritable are Hadoop's own serializable types.
    • 33. Writable is Hadoop's own serialization interface, implemented by Text, LongWritable, IntWritable, BytesWritable, and so on. Rather than standard Java serialization, Hadoop uses org.apache.hadoop.io.serializer.WritableSerialization (objects are instantiated via ReflectionUtils). ※ serialization goes through Writable#write, and deserialization reads the fields back from a DataInputStream through Writable#readFields.
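      A minimal custom Writable, to show the two hooks the slide names (Writable#write / Writable#readFields); the PairWritable name and its two fields are made up for illustration. A type used as a key would additionally implement WritableComparable:

      import java.io.DataInput;
      import java.io.DataOutput;
      import java.io.IOException;
      import org.apache.hadoop.io.Writable;

      public class PairWritable implements Writable {
          private long first;
          private int second;

          @Override
          public void write(DataOutput out) throws IOException {    // serialize the fields in a fixed order
              out.writeLong(first);
              out.writeInt(second);
          }

          @Override
          public void readFields(DataInput in) throws IOException { // deserialize in the same order
              first = in.readLong();
              second = in.readInt();
          }
      }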
    • 34. WordCount with the new API (>= 0.20.0); Job replaces JobConf/JobClient, and the Mapper/Reducer bodies are sketched just after slide 35:

      public class WordCount {

          public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
              //Map
          }

          public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
              //Reduce
          }

          public static void main(String[] args) throws Exception {
              Job job = new Job();
              job.setJarByClass(WordCount.class);

              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(IntWritable.class);

              job.setMapperClass(Map.class);
              job.setReducerClass(Reduce.class);

              job.setInputFormatClass(TextInputFormat.class);
              job.setOutputFormatClass(TextOutputFormat.class);

              FileInputFormat.addInputPath(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));

              job.waitForCompletion(true);
          }
      }
    • 35. (the same new-API WordCount code as slide 34, shown again)
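      A sketch of the Mapper/Reducer bodies for the new API, again assuming the standard word-count logic (meant to drop into the WordCount class above; the talk's actual bodies may differ). The single Context object here replaces the OutputCollector/Reporter pair of the old API, which is the change slide 36 describes:

      // Needed imports: java.io.IOException, java.util.StringTokenizer,
      //   org.apache.hadoop.io.*, org.apache.hadoop.mapreduce.Mapper, org.apache.hadoop.mapreduce.Reducer

      public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
          private static final IntWritable one = new IntWritable(1);
          private final Text word = new Text();

          @Override
          public void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              StringTokenizer tokenizer = new StringTokenizer(value.toString());
              while (tokenizer.hasMoreTokens()) {
                  word.set(tokenizer.nextToken());
                  context.write(word, one);            // emit (word, 1) through the Context
              }
          }
      }

      public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
          @Override
          public void reduce(Text key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
              int sum = 0;
              for (IntWritable value : values) {       // values is now an Iterable, not an Iterator
                  sum += value.get();
              }
              context.write(key, new IntWritable(sum));
          }
      }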
    • 36. Differences between the APIs: JobConf is replaced by Configuration, JobClient by Job, and the OutputCollector/Reporter arguments of Mapper/Reducer by a single Context object, through which the Key-Value output of map and reduce is written (as in the sketch above).
    • 37. Scaling out: data flow (p29): a MapReduce job is broken into Map tasks and Reduce tasks; the input is divided into splits, one Map task runs per split, and the Map output is passed to the Reduce tasks.
    • 38. Job execution (p29): the JobTracker coordinates the job and hands tasks to the TaskTrackers; each TaskTracker runs its tasks and reports progress back to the JobTracker.
    • 39. Hadoop
    • 40. MapReduce data flow with a single Reduce task (p30)
    • 41. Data flow with multiple Reduce tasks (p31) ※ the transfer of Map output to the Reduce tasks is known as the shuffle
    • 42. Combiner functions (p31): a Combiner runs on the Map output before it is sent to Reduce, pre-aggregating Key-Value pairs (max/min/average-style aggregation) so that less data has to be transferred from Map to Reduce in the shuffle; see the snippet below for how one is registered.
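      Registering a combiner is one extra line in the driver. For word count the Reduce class itself can double as the combiner, since summing partial counts is associative and commutative (the conf/job variables refer to the WordCount drivers on slides 29 and 34):

      // Old API (JobConf-based driver, slide 29):
      conf.setCombinerClass(Reduce.class);

      // New API (Job-based driver, slide 34):
      job.setCombinerClass(Reduce.class);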
    • 43. Hadoop Streaming (p34): lets you write the Map/Reduce logic in any language that can read standard input and write standard output (C, Python, Ruby, Perl, ...); the job is run through the bundled hadoop-streaming.jar.
    • 44. Hadoop Streaming example (Map: cat, Reduce: wc -l), after copying the input files into HDFS:

      $ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
      $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar \
          -input /dfs/test/input/ -output /dfs/test/output -mapper cat -reducer "wc -l"
      09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
      09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
      09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4
      09/09/26 17:00:31 INFO mapred.FileInputFormat: Total input paths to process : 4
      09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0%
      09/09/26 17:00:42 INFO streaming.StreamJob: map 100% reduce 0%
      09/09/26 17:00:44 INFO streaming.StreamJob: map 100% reduce 100%
      09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output
      $ hadoop dfs -cat /dfs/test/output/*
      8842
    • 45. Hadoop Streaming word count in Python (map.py/reduce.py as in http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python):

      $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar \
          -input /dfs/test/input/ -output /dfs/test/output -mapper "python map.py" -reducer "python reduce.py"
      09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
      09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
      09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
      09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
      09/09/26 17:29:30 INFO streaming.StreamJob: map 0% reduce 0%
      09/09/26 17:29:33 INFO streaming.StreamJob: map 100% reduce 0%
      09/09/26 17:29:35 INFO streaming.StreamJob: map 100% reduce 100%
      09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output
      $ hadoop dfs -cat /dfs/test/output/*
      via 1942
      the 1476
      to 1394
      in 819
      a 816
      cutting) 740
    • 46. Hadoop Pipes (p38): the C++ interface to Hadoop MapReduce; instead of JNI, the TaskTracker talks to the C++ process running the Map/Reduce code over sockets.