Hadoop Book Reading Group: Chapters 1-2
Presentation Transcript

  • Hadoop
  • 19:00 20:00 ※
  • Reading schedule (chapter / pages): 1 Hadoop / 2 MapReduce (43p) / 3 Hadoop (38p) / 4 Hadoop I/O (46p) / 5 MapReduce (42p) / 6 MapReduce (22p) / 7 MapReduce (38p) / 8 MapReduce (36p) / 9 Hadoop (30p) / 10 Hadoop (30p) / 11 Pig (46p) / 12 HBase (30p) / 13 ZooKeeper (38p)
  • http://twitter.com/just_do_neet http://ameblo.jp/just-do-neet/ Hadoop MapReduce
  • Hadoop
  • "In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers." -- Grace Hopper
  • The data deluge (p1): Facebook hosts some 10 billion photos; Ancestry.com stores about 2.5 PB; The Internet Archive grows by roughly 20 TB per month; the LHC will produce about 15 PB of data per year
  • How Much Information? http://hmi.ucsd.edu/pdf/HMI_2009_ConsumerReport_Dec9_2009.pdf
  • (p3) HDD
  • (p3) Hadoop couples reliable storage (HDFS) with analysis (MapReduce)
  • (p4) Comparing RDBMS with MapReduce
  • Hadoop's history (p9): Doug Cutting (Douglas Read Cutting), creator of Lucene and Nutch, developed Hadoop (inspired by Google's MapReduce paper) while at Yahoo! Inc., and later joined Cloudera. http://www.sdtimes.com/blog/post/2009/08/10/Hadoop-creator-goes-to-Cloudera.aspx
  • Hadoop (p9) Doug Cutting
  • Hadoop (p13)
  • [ ]Eclipse http://d.hatena.ne.jp/torazuka/20100102/p1
  • [ ]GAE http://d.hatena.ne.jp/torazuka/20100101/gaetan
  • http://d.hatena.ne.jp/torazuka/20091011/pic
  • Hadoop subprojects (p13): Core (common I/O components), Avro (RPC / serialization), MapReduce, HDFS
  • Hadoop subprojects (p13), continued: Pig / Hive (DSLs for analysis), HBase (≒ BigTable), ZooKeeper (≒ Chubby), Chukwa
  • MapReduce
  • Hadoop MapReduce (p19): a job is expressed as a map function and a reduce function
  • MapReduce (step-by-step example; figure-only slides)
  • Java MapReduce (p20): extend MapReduceBase and implement the Mapper/Reducer interfaces; the MapReduce job is then submitted via JobClient
  • Old API (< 0.20.0):
    public class WordCount {
      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        // Map
      }
      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        // Reduce
      }
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }
  • JobConf configures the job: which Map/Reduce classes to run, and the input/output formats and paths (plain files, or other stores such as HBase) used by the Map/Reduce tasks
  • Map/Reduce exchange key-value pairs. In Mapper<LongWritable, Text, Text, IntWritable>, the four type parameters are input key (LongWritable), input value (Text), output key (Text), and output value (IntWritable): Hadoop turns Key=Long / Value=Text input records into Key=Text / Value=Int output records
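As a plain-Java sketch (no Hadoop required), the role of Mapper's four type parameters can be mimicked with a hypothetical MiniMapper interface; the names below are illustrative, not Hadoop's own API:

```java
import java.util.function.BiConsumer;

// Stripped-down model of Hadoop's Mapper type parameters:
// KEYIN/VALUEIN describe the input record, KEYOUT/VALUEOUT the emitted pairs.
public class TypedMapperDemo {
    interface MiniMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
        void map(KEYIN key, VALUEIN value, BiConsumer<KEYOUT, VALUEOUT> emit);
    }

    public static void main(String[] args) {
        // Word count: input key = byte offset (Long), input value = line (String),
        // output key = word (String), output value = count (Integer).
        MiniMapper<Long, String, String, Integer> wordCountMap =
                (offset, line, emit) -> {
                    for (String word : line.split("\\s+")) emit.accept(word, 1);
                };

        wordCountMap.map(0L, "hello hadoop",
                (k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Reading the instantiation `MiniMapper<Long, String, String, Integer>` left to right gives exactly the Key=Long / Value=Text in, Key=Text / Value=Int out pattern of the WordCount mapper.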
  • Writable is Hadoop's own serialization interface, implemented by Text, LongWritable, IntWritable, BytesWritable, ... Instead of standard Java serialization, Hadoop uses org.apache.hadoop.io.serializer.WritableSerialization (instantiating classes via ReflectionUtils). ※ Deserialization reads fields from a DataInputStream through Writable#readFields, and serialization writes them through Writable#write
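The readFields/write contract can be illustrated without Hadoop itself: PairWritable below is a hypothetical stand-in that follows the Writable pattern using only java.io (a real implementation would implement org.apache.hadoop.io.Writable):

```java
import java.io.*;

// Minimal stand-in for Hadoop's Writable contract: the object serializes
// itself to a DataOutput and repopulates itself from a DataInput.
public class PairWritable {
    private String word;  // would be a Text in Hadoop
    private int count;    // would be an IntWritable in Hadoop

    public PairWritable() {}  // no-arg constructor, as Writable requires
    public PairWritable(String word, int count) { this.word = word; this.count = count; }

    // Analogue of Writable#write(DataOutput)
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(count);
    }

    // Analogue of Writable#readFields(DataInput)
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        count = in.readInt();
    }

    public static void main(String[] args) throws IOException {
        // Round-trip: serialize to a byte buffer, read back into a fresh object.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new PairWritable("hadoop", 3).write(new DataOutputStream(buf));

        PairWritable copy = new PairWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.word + "=" + copy.count);
    }
}
```

The no-arg constructor plus readFields is exactly the pattern WritableSerialization relies on when it reconstructs keys and values during the shuffle.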
  • New API (>= 0.20.0):
    public class WordCount {
      public static class Map
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        // Map
      }
      public static class Reduce
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Reduce
      }
      public static void main(String[] args) throws Exception {
        Job job = new Job();
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);   // setInputFormat in the old API
        job.setOutputFormatClass(TextOutputFormat.class); // setOutputFormat in the old API
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
      }
    }
  • API changes in 0.20.0: JobConf is replaced by Configuration (and Job), JobClient by Job, and the OutputCollector and Reporter passed to Mapper/Reducer are unified into a single Context, through which the Map task emits its key-value pairs
  • Job data flow (p29): the input is divided into splits → a Map task runs for each split (scheduled close to the data where possible) → Map output is sorted and shuffled → Reduce tasks merge the Map output into the final result
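The split → map → sort/shuffle → reduce flow can be modeled in a single plain-Java process; this is only a conceptual sketch, where the in-memory splits and the TreeMap grouping stand in for what HDFS and the framework actually do:

```java
import java.util.*;

// Single-process model of the MapReduce data flow: map emits (word, 1)
// pairs, the "shuffle" groups values by key, reduce sums each group.
public class MiniMapReduce {
    public static void main(String[] args) {
        String[] splits = { "hello hadoop", "hello world" };  // stand-ins for input splits

        // Map phase: emit (word, 1) per token; the sorted map plays the
        // role of the framework's sort/shuffle, grouping values by key.
        SortedMap<String, List<Integer>> shuffled = new TreeMap<>();
        for (String split : splits)
            for (String word : split.split("\\s+"))
                shuffled.computeIfAbsent(word, k -> new ArrayList<>()).add(1);

        // Reduce phase: one call per key, over all values grouped for that key.
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            int sum = e.getValue().stream().mapToInt(Integer::intValue).sum();
            System.out.println(e.getKey() + "\t" + sum);
        }
    }
}
```

The real framework does the same grouping, but partitioned across machines and spilled to disk; each Reduce task sees only the keys routed to its partition.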
  • Job control (p29): the client submits the job to the JobTracker → the JobTracker assigns Map/Reduce tasks to TaskTrackers → each TaskTracker reports progress back to the JobTracker
  • Hadoop
  • Reduce (p30)
  • Reduce (p31) ※ the phase that transfers Map output to the Reduce side is called the shuffle
  • Combiner (p31): between Map→Reduce, a Combiner (for aggregates such as max/min/average) can pre-aggregate Map output on the map side, shrinking the key-value data sent from Map to Reduce during the shuffle
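For an associative, commutative reduce such as a sum, the reducer itself can serve as the combiner (old API: JobConf#setCombinerClass(Reduce.class)). The plain-Java sketch below, with made-up data, checks that pre-aggregating each map task's output leaves the final totals unchanged:

```java
import java.util.*;

// Why a sum reducer is safe to reuse as a combiner: combining partial
// counts per "map task" before the shuffle yields the same final totals.
public class CombinerDemo {
    // Sum reducer, also usable as the combiner (sum is associative and commutative).
    static Map<String, Integer> reduce(Collection<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            out.merge(p.getKey(), p.getValue(), Integer::sum);
        return out;
    }

    static List<Map.Entry<String, Integer>> concat(
            Collection<Map.Entry<String, Integer>> a,
            Collection<Map.Entry<String, Integer>> b) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>(a);
        all.addAll(b);
        return all;
    }

    public static void main(String[] args) {
        // Two map tasks' raw (word, 1) output.
        List<Map.Entry<String, Integer>> task1 =
                List.of(Map.entry("hello", 1), Map.entry("hello", 1), Map.entry("hadoop", 1));
        List<Map.Entry<String, Integer>> task2 = List.of(Map.entry("hello", 1));

        // Without a combiner: all four (word, 1) records cross the network.
        Map<String, Integer> direct = reduce(concat(task1, task2));

        // With a combiner: each map task pre-sums locally, so only three records cross.
        Map<String, Integer> viaCombiner =
                reduce(concat(reduce(task1).entrySet(), reduce(task2).entrySet()));

        System.out.println(direct.equals(viaCombiner) + " " + direct);
    }
}
```

Note the same trick does not apply to a mean: averaging per-task averages generally gives the wrong answer, which is why only associative/commutative operations may reuse the reducer as the combiner.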
  • Hadoop Streaming (p34): write Map/Reduce in any language that reads stdin and writes stdout (C / Python / Ruby / Perl, ...); Hadoop runs the external Map/Reduce programs via hadoop-streaming.jar
  • Hadoop Streaming example (Map: cat / Reduce: wc -l) -- copy the input into HDFS, then run:
    $ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
    $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar \
        -input /dfs/test/input/ -output /dfs/test/output \
        -mapper cat -reducer "wc -l"
    09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:00:31 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0%
    09/09/26 17:00:42 INFO streaming.StreamJob: map 100% reduce 0%
    09/09/26 17:00:44 INFO streaming.StreamJob: map 100% reduce 100%
    09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output
    $ hadoop dfs -cat /dfs/test/output/*
    8842
  • Hadoop Streaming with Python (http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python):
    $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar \
        -input /dfs/test/input/ -output /dfs/test/output \
        -mapper "python map.py" -reducer "python reduce.py"
    09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:29:30 INFO streaming.StreamJob: map 0% reduce 0%
    09/09/26 17:29:33 INFO streaming.StreamJob: map 100% reduce 0%
    09/09/26 17:29:35 INFO streaming.StreamJob: map 100% reduce 100%
    09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output
    $ hadoop dfs -cat /dfs/test/output/*
    via 1942
    the 1476
    to 1394
    in 819
    a 816
    cutting) 740
  • Hadoop Pipes (p38): the C++ interface to Hadoop MapReduce; the TaskTracker talks to the C++ Map/Reduce process over a socket rather than through JNI