Hadoop Book Reading Group: Chapters 1-2

    1. Hadoop
    2. 19:00 20:00 ※
    3. Hadoop / MapReduce (43p), Hadoop (38p), Hadoop I/O (46p), MapReduce (42p), MapReduce (22p), MapReduce (38p), MapReduce (36p), Hadoop (30p), 10 Hadoop (30p), 11 Pig (46p), 12 HBase (30p), 13 ZooKeeper (38p)
    4. http://twitter.com/just_do_neet http://ameblo.jp/just-do-neet/ Hadoop MapReduce
    5. Hadoop
    6. −−Grace Hopper
    7. (p1) / 100 / Facebook 2.5 (Ancestry.com) 20 (The Internet Archive) 15
    8. How Much Information? http://hmi.ucsd.edu/pdf/HMI_2009_ConsumerReport_Dec9_2009.pdf
    9. (p3) HDD
    10. (p3) Hadoop(HDFS (HDFS) (MapReduce)
    11. (p4) RDBMS MapReduce
    12. Hadoop (p9) Doug Cutting (Douglas Read Cutting) Lucene Nutch Google MapReduce Hadoop Yahoo! Inc. Cloudera http://www.sdtimes.com/blog/post/2009/08/10/Hadoop-creator-goes-to-Cloudera.aspx
    13. Hadoop (p9) Doug Cutting
    14. Hadoop (p13)
    15. [ ]Eclipse http://d.hatena.ne.jp/torazuka/20100102/p1
    16. [ ]GAE http://d.hatena.ne.jp/torazuka/20100101/gaetan
    17. http://d.hatena.ne.jp/torazuka/20091011/pic
    18. Hadoop (p13) Core I/O Avro RPC MapReduce HDFS
    19. Hadoop (p13) Pig/Hive DSL HBase (≒BigTable) ZooKeeper (≒Chubby) Chukwa
    20. MapReduce
    21. Hadoop (p19) MapReduce map reduce
    22. MapReduce
    23. MapReduce
    24. MapReduce
    25. MapReduce
    26. MapReduce
    27. MapReduce
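
The map, shuffle, and reduce phases named on slides 21-27 can be tried out in a single process without Hadoop. The following is a minimal sketch under stated assumptions: the function names (`map_fn`, `shuffle`, `reduce_fn`, `run_job`) and the sample input are made up for illustration and are not part of the Hadoop API, and the line index stands in for the byte offset that Hadoop would pass as the input key.

```python
from collections import defaultdict


def map_fn(offset, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_fn(key, values):
    """Reduce: collapse all values of one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map over every line, shuffle the pairs, then reduce each group."""
    mapped = [pair
              for offset, line in enumerate(lines)  # line index as stand-in key
              for pair in map_fn(offset, line)]
    return [reduce_fn(key, values) for key, values in shuffle(mapped)]


counts = run_job(["hadoop map reduce", "map reduce map"])
print(counts)  # [('hadoop', 1), ('map', 3), ('reduce', 2)]
```

In a real cluster the three steps run on different machines; only `map_fn` and `reduce_fn` correspond to code the user writes.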
    28. Java MapReduce (p20): extend MapReduceBase, implement Mapper/Reducer, and run the MapReduce job through JobClient.
    29. < 0.20.0

        // Old API: the job classes live in org.apache.hadoop.mapred.
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.*;

        public class WordCount {
            public static class Map extends MapReduceBase
                    implements Mapper<LongWritable, Text, Text, IntWritable> {
                // Map: map() implementation omitted on the slide
            }

            public static class Reduce extends MapReduceBase
                    implements Reducer<Text, IntWritable, Text, IntWritable> {
                // Reduce: reduce() implementation omitted on the slide
            }

            public static void main(String[] args) throws Exception {
                JobConf conf = new JobConf(WordCount.class);
                conf.setJobName("wordcount");
                conf.setOutputKeyClass(Text.class);
                conf.setOutputValueClass(IntWritable.class);
                conf.setMapperClass(Map.class);
                conf.setReducerClass(Reduce.class);
                conf.setInputFormat(TextInputFormat.class);
                conf.setOutputFormat(TextOutputFormat.class);
                FileInputFormat.setInputPaths(conf, new Path(args[0]));
                FileOutputFormat.setOutputPath(conf, new Path(args[1]));
                JobClient.runJob(conf);  // submits the job and waits for it to finish
            }
        }

    30. < 0.20.0 (20) Same WordCount listing as slide 29.
    31. JobConf Job Map/Reduce / / /HBase Map/Reduce
    32. Map/Reduce Key-Value: in Mapper<LongWritable, Text, Text, IntWritable>, LongWritable, Text, and IntWritable are Hadoop types; the input is Key=Long/Value=Text and the output is Key=Text/Value=Int.
    33. Writable Hadoop Text, LongWritable, IntWritable, BytesWritable..... Java org.apache.hadoop.io.serializer.WritableSerialization (ReflectionUtils) ※deserialize DataInputStream (Writable#readFields / Writable#write)
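
The write/readFields pair on slide 33 is easier to picture with a small round trip. This sketch is only an analogy in Python, not Hadoop code: `IntWritableSketch` and its method names are hypothetical, and the one thing borrowed from the real `IntWritable` is that it serializes its value as a 4-byte big-endian integer, as Java's `DataOutput.writeInt` does.

```python
import io
import struct


class IntWritableSketch:
    """Hypothetical Python stand-in for a Writable that holds one 32-bit int."""

    def __init__(self, value=0):
        self.value = value

    def write(self, out):
        # Same layout as DataOutput.writeInt: 4 bytes, big-endian, signed.
        out.write(struct.pack(">i", self.value))

    def read_fields(self, inp):
        # Overwrite this instance's state from the stream, so an object that
        # was created empty (or is being reused) takes on the stored value.
        (self.value,) = struct.unpack(">i", inp.read(4))


# Serialize one value into a byte buffer.
buf = io.BytesIO()
IntWritableSketch(1942).write(buf)
raw = buf.getvalue()

# Deserialize: create an empty instance first, then fill it from the stream.
restored = IntWritableSketch()
restored.read_fields(io.BytesIO(raw))
print(raw.hex(), restored.value)  # 00000796 1942
```

Creating the empty instance first and then calling `read_fields` mirrors the order on the slide, where the instance is created through ReflectionUtils and then populated from a DataInputStream.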
    34. >= 0.20.0

        // New API: the job classes live in org.apache.hadoop.mapreduce.
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

        public class WordCount {
            public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
                // Map: map() implementation omitted on the slide
            }

            public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
                // Reduce: reduce() implementation omitted on the slide
            }

            public static void main(String[] args) throws Exception {
                Job job = new Job();
                job.setJarByClass(WordCount.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);
                job.setMapperClass(Map.class);
                job.setReducerClass(Reduce.class);
                // The new API uses set...FormatClass, not set...Format.
                job.setInputFormatClass(TextInputFormat.class);
                job.setOutputFormatClass(TextOutputFormat.class);
                // addInputPath takes a Path; addInputPaths takes a comma-separated String.
                FileInputFormat.addInputPath(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));
                job.waitForCompletion(true);
            }
        }

    35. >= 0.20.0 (20) Same WordCount listing as slide 34.
    36. API: JobConf → Configuration, JobClient → Job, Mapper/Reducer, OutputCollector,Reporter → Context, Key-Value Map Map
    37. (p29) → → Map Reduce →Map/Reduce Map
    38. (p29) JobTracker → TaskTracker → JobTracker
    39. Hadoop
    40. Reduce (p30)
    41. Reduce (p31) ※Reduce shuffle
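
When a job has more than one reduce task (slide 41), each map task has to decide which reducer receives each key, so that all values for one key end up at the same reducer. The following is a rough sketch of that partitioning step, assuming a hash-modulo scheme; `partition` and `partition_map_output` are hypothetical names, and `zlib.crc32` is only a stand-in hash chosen because it gives the same result on every run, not the function Hadoop uses.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer index for a key (stand-in hash, modulo reducer count)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_map_output(pairs, num_reducers):
    """Split one map task's output into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


# Hypothetical output of a single map task.
pairs = [("the", 1), ("via", 1), ("the", 1), ("to", 1), ("via", 1)]
buckets = partition_map_output(pairs, 2)

for index, bucket in enumerate(buckets):
    print(index, bucket)
```

Whatever hash is used, the property that matters is that equal keys always map to the same index.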
    42. (p31) Map→Reduce (max/min/average) Combiner Reduce (Shuffle Key-Value Map→Reduce
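
Slide 42 lists max, min, and average next to the Combiner. A combiner pre-aggregates each map task's output before it is sent to the reducer, which is only safe when aggregating partial results gives the same answer as aggregating everything at once. The short check below illustrates that under assumed, made-up values for two map tasks: max survives the pre-aggregation, a plain average does not.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical values emitted for the same key by two different map tasks.
map_output_1 = [0, 20, 10]
map_output_2 = [25, 15]
everything = map_output_1 + map_output_2

# max: combining per map task first gives the same result as one global pass.
max_direct = max(everything)
max_combined = max(max(map_output_1), max(map_output_2))

# mean: the mean of per-task means weights both tasks equally, although one
# task contributed three values and the other only two.
mean_direct = mean(everything)
mean_combined = mean([mean(map_output_1), mean(map_output_2)])

print(max_direct, max_combined)    # 25 25
print(mean_direct, mean_combined)  # 14.0 15.0
```

An average can still be combined if each map task forwards a (sum, count) pair instead of a finished mean, since sums and counts can be added safely.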
    43. Hadoop Streaming (p34) Map/Reduce C/Python/Ruby/Perl... Map/Reduce Hadoop hadoop-streaming.jar
    44. Hadoop Streaming Map:cat/Reduce:wc HDFS

        $ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
        $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper cat -reducer "wc -l"
        09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
        09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
        09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:00:31 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0%
        09/09/26 17:00:42 INFO streaming.StreamJob: map 100% reduce 0%
        09/09/26 17:00:44 INFO streaming.StreamJob: map 100% reduce 100%
        09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output
        $ hadoop dfs -cat /dfs/test/output/*
        8842

    45. Hadoop Streaming Python http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python

        $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper "python map.py" -reducer "python reduce.py"
        09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
        09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
        09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
        09/09/26 17:29:30 INFO streaming.StreamJob: map 0% reduce 0%
        09/09/26 17:29:33 INFO streaming.StreamJob: map 100% reduce 0%
        09/09/26 17:29:35 INFO streaming.StreamJob: map 100% reduce 100%
        09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output
        $ hadoop dfs -cat /dfs/test/output/*
        via 1942
        the 1476
        to 1394
        in 819
        a 816
        cutting) 740
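
The slides do not show the contents of `map.py` and `reduce.py`, so the following is only a guess at what a word-count pair for Hadoop Streaming could look like, not the presenter's scripts. The function names `map_lines` and `reduce_records` are assumptions; the tab-separated `key<TAB>value` line format and the fact that the reducer receives its input sorted by key are how Streaming passes data between the two steps.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: turn each input line into 'word<TAB>1' records."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_records(records):
    """Reducer: sum the counts per word; records must arrive sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local dry run: sorted() plays the role of Hadoop's sort between map and reduce.
sample = ["hadoop streaming via python", "via stdin via stdout"]
result = list(reduce_records(sorted(map_lines(sample))))
print(result)
```

To use this with the `-mapper` and `-reducer` options, each script would feed `sys.stdin` to its function and print every yielded record, one per line.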
    46. Hadoop Pipes (p38): the C++ interface to Hadoop MapReduce. The TaskTracker communicates with the C++ Map/Reduce process over sockets; JNI is not used.
