Reading MapReduce Code Through the Samples

I only got as far as the Mapper, but here it is for now.

Published in: Technology

  1. MapReduce @shot6
  2. [Diagram] Cloudera, Avro, Sqoop, Desktop, Pig, Hive, HBase, Chukwa, MapReduce, HDFS, ZooKeeper, Core
  3. [Diagram] Cloudera, Avro, Sqoop, Desktop, Pig, Hive, HBase, Chukwa, MapReduce, HDFS, ZooKeeper, Core
  4. • MapReduce
       – Mapper/Reducer
  5. MapReduce
     • WordCount
     •
       – Mapper/Reducer, Job execution
       – InputFormat/OutputFormat usage
       – HDFS (FileSystem)
       – Writable usage
  6. WordCount
     • Hadoop's "Hello World"
     • API (org.apache.hadoop.mapreduce)
     • API
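
To make the WordCount data flow concrete, here is a minimal sketch in plain Java. It deliberately does not use the Hadoop API: `WordCountSketch` and its methods are hypothetical names introduced for illustration, and the in-memory grouping stands in for the framework's shuffle and sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of WordCount; no Hadoop classes are used (assumption for illustration).
final class WordCountSketch {

    // "Map" step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // "Reduce" step: sum every count emitted for the same word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>(); // keys kept in sorted order
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map step over every line, then the reduce step over all emitted pairs.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hello hadoop", "hello world")));
    }
}
```

In a real job the map and reduce steps run in separate tasks and the framework groups the pairs by key between them; the sketch only shows what each step computes.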
  7. Grep
     • grep
       – Executes two jobs: grepJob and sortJob
       – JobConf/Mapper/Reducer usage
       – Mapper: RegexMapper is executed; <Text, Long>; SequenceFileFormat
       – sortJob
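
The two-job shape of the grep sample (grepJob, then sortJob) can be sketched in plain Java as two stages. This is a model under assumptions, not the sample's code: `GrepSketch`, `grepStage`, and `sortStage` are hypothetical names, the descending-count ordering is assumed to be what sortJob produces, and the tie-break on the matched text is added only to make the output deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java model of a two-stage grep; no Hadoop classes are used.
final class GrepSketch {

    // Stage 1 (models grepJob): count every substring that matches the pattern.
    static Map<String, Long> grepStage(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            while (matcher.find()) {
                counts.merge(matcher.group(), 1L, Long::sum);
            }
        }
        return counts;
    }

    // Stage 2 (models sortJob): order the (match, count) pairs by descending count.
    static List<Map.Entry<String, Long>> sortStage(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue()); // larger counts first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // assumed tie-break
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("dfs.name.dir dfs.data.dir", "dfs.name.dir");
        System.out.println(sortStage(grepStage(lines, "dfs\\.[a-z.]+")));
    }
}
```

Splitting the work this way mirrors why two jobs are needed: counting groups records by matched text, while sorting needs them keyed by count.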
  8. Grep
     • JobConf
     • Mapper
     • Reducer
  9. o.a.hadoop.mapred.JobConf
     •
       – mapred-default.xml
       – conf/mapred-site.xml
       – XML (DOM)
       – Child
         • JobConf child = new JobConf(Conf, jar);
  10. mapred-site.xml
      <configuration>
        <!-- -->
        <property>
          <name>mapred.job.tracker</name>
          <value>your-site:9001</value>
        </property>
      </configuration>
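
The layering described for JobConf (defaults from mapred-default.xml, overridden by conf/mapred-site.xml, with a child configuration derived from a parent) can be modeled with `java.util.Properties`. This is only a sketch of the override order: `LayeredConfSketch` is a hypothetical name, no XML is parsed, and the `local` default shown for `mapred.job.tracker` is an example value assumed for illustration.

```java
import java.util.Properties;

// Plain-Java model of configuration layering; not the JobConf implementation.
final class LayeredConfSketch {

    // Resources applied later win, the way site settings override defaults.
    static Properties layer(Properties defaults, Properties site) {
        Properties merged = new Properties();
        merged.putAll(defaults);
        merged.putAll(site);
        return merged;
    }

    // A child starts as a copy of its parent, so changes to the child do not leak back.
    static Properties child(Properties parent) {
        Properties copy = new Properties();
        copy.putAll(parent);
        return copy;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("mapred.job.tracker", "local"); // assumed example default
        Properties site = new Properties();
        site.setProperty("mapred.job.tracker", "your-site:9001"); // value from the slide
        Properties conf = layer(defaults, site);
        Properties childConf = child(conf);
        childConf.setProperty("mapred.job.name", "grep-sketch"); // hypothetical job name
        System.out.println(conf.getProperty("mapred.job.tracker"));
        System.out.println(childConf.getProperty("mapred.job.name"));
    }
}
```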
  11. o.a.hadoop.mapred.Mapper
      • Mapper
      • InputSplit, Mapper
      • MapTask/MapRunner
      • map(KEY, VALUE, COLLECTOR, REPORTER)
        – KEY: Map; VALUE: Map
        – COLLECTOR:
        – REPORTER: API
      • MapReduceBase
  12. o.a.hadoop.mapred.MapTask
      • Map
      • initialize (Task, Reducer)
        – State (o.a.h.mapred.TaskStatus.State)
          • RUNNING, SUCCEEDED, FAILED, UNASSIGNED, KILLED, COMMIT_PENDING, FAILED_UNCLEAN, KILLED_UNCLEAN
        – OutputCommitter creation
          • Task output, execution
          • Output
        – mapred.work.output.dir
  13. o.a.h.mapred.MapTask (cont.)
      • run, runOldMapper
      • JobClient, InputSplit
      • RecordReader
  14. o.a.h.mapred.MapTask (cont. 2)
      • Reduce
        – spill
          • $mapred.local.dir/taskTracker/jobcache/${taskid}/output/spill${spillNumber}.out
        – Reducer
          • Combiner: min.num.spills.for.combine, combiner
        – RecordWriter output
      • MapRunner
  15. o.a.h.mapred.MapRunner
      • MapRunnable
        – mapred.map.runner.class
        – Hadoop: PipeMapRunner
        – Map: MultiThreadedMapRunner
  16. o.a.h.mapred.MapRunner (cont.)
      • run(RecordReader, OutputCollector, Reporter)
        – RecordReader: InputFormat, Split, Reader (InputFormat/RecordReader)
      •
        – RecordReader
  17. [Sequence diagram] Participants: MapTask, MapRunner, Mapper, RecordReader, OutputCollector. Calls: InputSplit creation; Spill & SpillThread; run; createKey(); createValue(); next(key, value) until EOF; Map(key, value, outputCollector, reporter); Spill.
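
The call sequence in the diagram (createKey(), createValue(), then next(key, value) until EOF, calling map for each record) can be sketched as a small loop. The interfaces below are hypothetical stand-ins written for this example, not the Hadoop types: `RunnerHolder` models a reusable mutable key or value object, and the Reporter argument is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Mutable box standing in for a reusable key/value object (assumption).
final class RunnerHolder<T> { T value; }

// Hypothetical stand-ins for the reader, collector, and mapper roles.
interface RunnerRecordReader<K, V> {
    RunnerHolder<K> createKey();
    RunnerHolder<V> createValue();
    boolean next(RunnerHolder<K> key, RunnerHolder<V> value); // false at end of input
}

interface RunnerCollector<K, V> { void collect(K key, V value); }

interface RunnerMapper<K1, V1, K2, V2> {
    void map(K1 key, V1 value, RunnerCollector<K2, V2> output);
}

final class MapRunnerSketch {

    // Allocate key and value once, then refill them for every record until EOF.
    static <K1, V1, K2, V2> void run(RunnerRecordReader<K1, V1> input,
                                     RunnerMapper<K1, V1, K2, V2> mapper,
                                     RunnerCollector<K2, V2> output) {
        RunnerHolder<K1> key = input.createKey();
        RunnerHolder<V1> value = input.createValue();
        while (input.next(key, value)) {
            mapper.map(key.value, value.value, output);
        }
    }

    // In-memory reader for the demo: the key is the line index, the value is the line.
    static RunnerRecordReader<Integer, String> linesReader(List<String> lines) {
        Iterator<String> it = lines.iterator();
        return new RunnerRecordReader<Integer, String>() {
            private int index = 0;
            public RunnerHolder<Integer> createKey() { return new RunnerHolder<>(); }
            public RunnerHolder<String> createValue() { return new RunnerHolder<>(); }
            public boolean next(RunnerHolder<Integer> key, RunnerHolder<String> value) {
                if (!it.hasNext()) {
                    return false;
                }
                key.value = index++;
                value.value = it.next();
                return true;
            }
        };
    }

    public static void main(String[] args) {
        List<String> collected = new ArrayList<>();
        MapRunnerSketch.<Integer, String, String, Integer>run(
                linesReader(List.of("hello", "world")),
                (index, line, out) -> out.collect(line.toUpperCase(), index),
                (k, v) -> collected.add(k + "\t" + v));
        System.out.println(collected);
    }
}
```

Reusing one key object and one value object across iterations is the point of the createKey()/createValue() calls: the reader overwrites them instead of allocating per record.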
  18. 18. m(_ _)m
  19. • Mapper
        – JobConf
        – Mapper/MapRunner/MapTask
      •
        – Reducer
          • Reducer execution
          • Reducer execution
        – InputFormat/RecordReader
  20. o.a.h.mapred.Reducer
      • Reducer
      • InputSplit, Mapper
      • ReduceTask/ReduceRunner
      • reduce(KEY, Iterator<VALUE>, COLLECTOR, REPORTER)
        – KEY: ; Iterator<VALUE>:
        – COLLECTOR:
        – REPORTER: API
      • MapReduceBase
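
As an illustration of the reduce(KEY, Iterator<VALUE>, COLLECTOR, REPORTER) shape, here is a minimal summing reducer in plain Java. `SketchReducer` and `ReduceCollector` are hypothetical stand-in interfaces defined only for this example, and the Reporter parameter is omitted.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the collector and reducer roles (not the Hadoop interfaces).
interface ReduceCollector<K, V> { void collect(K key, V value); }

interface SketchReducer<K2, V2, K3, V3> {
    void reduce(K2 key, Iterator<V2> values, ReduceCollector<K3, V3> output);
}

// Sums all values seen for one key and emits a single (key, total) pair.
final class SumReducerSketch implements SketchReducer<String, Long, String, Long> {

    @Override
    public void reduce(String key, Iterator<Long> values, ReduceCollector<String, Long> output) {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        output.collect(key, sum);
    }

    public static void main(String[] args) {
        Map<String, Long> result = new LinkedHashMap<>();
        new SumReducerSketch().reduce("hello", List.of(1L, 1L, 1L).iterator(), result::put);
        System.out.println(result);
    }
}
```

The values arrive as an iterator rather than a list, so a reducer like this one can process a key with many values in a single pass without holding them all in memory.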
  21. o.a.h.mapred.ReduceTask
      • SHUFFLE
      • ReduceTask.ReduceCopier
        – fetchOutputs (Merger.MergeQueue)
          • Map × mapred.reduce.parallel.copies
        – MapOutputCopier
          • Map: executes LocalFSMerger
          • Executes InMemFSMergeThread
          • GetMapEventsThread
        – Map
        – < , MapOutputLocation(taskId, host, httpUrl)>
          • TaskTracker
  22. o.a.h.mapred.ReduceTask
      • run(RecordReader, OutputCollector, Reporter)
      • SORT
        – Memory, disk (creation)
          • RawKeyValueIterator
        – Reducer creation
        – RecordWriter creation
        – ReduceValuesIterator execution
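
The SORT step feeds the reducer one key at a time with all of that key's values. A minimal way to picture this is a k-way merge over already-sorted segments followed by grouping, sketched below in plain Java. This is a generic priority-queue merge, not the code of Merger.MergeQueue; `MergeSketch` and `Cursor` are hypothetical names, and every input segment is assumed to be sorted by key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Plain-Java model of merging sorted segments and grouping values by key.
final class MergeSketch {

    // Read position inside one key-sorted segment (for example, one fetched map output).
    private static final class Cursor {
        final List<Map.Entry<String, Long>> segment;
        int pos = 0;

        Cursor(List<Map.Entry<String, Long>> segment) {
            this.segment = segment;
        }

        Map.Entry<String, Long> head() {
            return segment.get(pos);
        }
    }

    // Merges the segments in key order and collects the values of equal keys together.
    static Map<String, List<Long>> mergeAndGroup(List<List<Map.Entry<String, Long>>> segments) {
        PriorityQueue<Cursor> queue =
                new PriorityQueue<>((a, b) -> a.head().getKey().compareTo(b.head().getKey()));
        for (List<Map.Entry<String, Long>> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment));
            }
        }
        Map<String, List<Long>> grouped = new LinkedHashMap<>(); // keeps merged key order
        while (!queue.isEmpty()) {
            Cursor cursor = queue.poll(); // segment whose next record has the smallest key
            Map.Entry<String, Long> record = cursor.head();
            grouped.computeIfAbsent(record.getKey(), k -> new ArrayList<>()).add(record.getValue());
            cursor.pos++;
            if (cursor.pos < cursor.segment.size()) {
                queue.add(cursor); // re-queue the segment with its next record
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> first = List.of(Map.entry("a", 1L), Map.entry("c", 1L));
        List<Map.Entry<String, Long>> second = List.of(Map.entry("a", 2L), Map.entry("b", 5L));
        System.out.println(mergeAndGroup(List.of(first, second)));
    }
}
```

Each entry of the resulting map corresponds to one reduce call: the key plus an iterator over its grouped values.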