Successfully reported this slideshow.
Your SlideShare is downloading. ×

サンプルから見るMapReduceコード

Ad

MapReduce
@shot6

Ad

Cloudera
                   Avro	
           Sqoop	
  
Desktop	
  



 Pig	
        Hive	
          HBase	
           Chuk...

Ad

Cloudera
                   Avro	
           Sqoop	
  
Desktop	
  



 Pig	
        Hive	
          HBase	
           Chuk...

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Upcoming SlideShare
Introduction to Apache Pig
Introduction to Apache Pig
Loading in …3
×

Check these out next

1 of 22 Ad
1 of 22 Ad
Advertisement

More Related Content

Advertisement
Advertisement

サンプルから見るMapReduceコード

  1. 1. MapReduce @shot6
  2. 2. Cloudera Avro   Sqoop   Desktop   Pig   Hive   HBase   Chukwa   Map Zoo HDFS   Reduce   Keeper   Core  
  3. 3. Cloudera Avro   Sqoop   Desktop   Pig   Hive   HBase   Chukwa   Map Zoo HDFS   Reduce   Keeper   Core  
  4. 4. •  MapReduce –  Mapper/Reducer • 
  5. 5. MapReduce •  WordCount •  •  – Mapper/Reducer Job ⾏行行 – InputFormat/OutputFormat ⽅方 – HDFS(FileSystem) –  Writable ⽅方
  6. 6. WordCount •  Hadoop Hello World •  API (org.apache.hadoop.mapreduce) •  API
  7. 7. Grep •  grep – grepJob/sortJob 2 ⾏行行 – JobConf/Mapper/Reducer ⽅方 – Mapper RegexMapper ⾏行行 <Text, Long> SequenceFileFormat – sortJob –  ⼒力力 – 
  8. 8. Grep - •  JobConf •  Mapper •  Reducer
  9. 9. o.a.hadoop.mapred.JobConf •  –  mapred-default.xml –  conf/mapred-site.xml – XML ⾝身 DOM – ⾃自 ⽬目 ⼿手 –  ⼦子 •  JobConf child = new JobConf( Conf, jar );
  10. 10. mapred-site.xml <configuration> <!– --> <property> <key>mapred.job.tracker</key> <value>your-site:9001</value> </property> </configuration>
  11. 11. o.a.hadoop.mapred.Mapper •  Mapper •  InputSplit Mapper •  MapTask/MapRunner •  map(KEY, VALUE, COLLECTOR, REPORTER) – KEY:Map VALUE:Map – COLLECTOR: – REPORTER: API •  MapReduceBase
  12. 12. o.a.hadoop.mapred.MapTask •  Map •  initiazlize (Task Reducer ) –  ⽣生 –  (o.a.h.mapred.TaskStatus.State) •  RUNNING, SUCCEEDED, FAILED, UNASSIGNED, KILLED, COMMIT_PENDING, FAILED_UNCLEAN, KILLED_UNCLEAN – OutputCommiter ⽣生 •  Task ⼒力力 ⾏行行 •  ⼒力力 – mapred.work.output.dir
  13. 13. o.a.h.mapred.MapTask cont •  run runOldMapper •  JobClient InputSplit •  RecordReader
  14. 14. o.a.h.mapred.MapTask cont2 •  Reduce –  spill (* ) •  $mapred.local.dir/taskTracker/jobcache/$ {taskid}/output/spill${spillNumber}.out – Reducer ⼒力力 •  Combiner min.num.spills.for.combine combiner –  RecordWriter ⼒力力 •  MapRunner
  15. 15. o.a.h.mapred.MapRunner •  MapRunnable – mapred.map.runner.class – Hadoop PipeMapRunner –  Map MultiThreadedMapRunner
  16. 16. o.a.h.mapred.MapRunner cont •  run(RecordReader, OutputCollector, Reporter) – RecordReader: InputFormat Split Reader(InputFormat/RecordReader ) •  – RecordReader –  ⾝身 – 
  17. 17. MapTask MapRunner Mapper Record Output Reader Collector Input Split⽣生   Spill & run createKey() SpillThread createValue()   next(key, value) EOF   Map(key, value, Spill outputCollector, reporter)
  18. 18. m(_ _)m
  19. 19. •  Mapper – JobConf – Mapper/MapRunner/MapTask •  – Reducer •  Reducer ⾏行行 •  Reducer ⾏行行 – InputFormat/RecordReader
  20. 20. o.a.h.mapred.Reducer •  Reducer •  InputSplit Mapper •  ReduceTask/ReduceRunner •  reduce(KEY, Iterator<VALUE>, COLLECTOR, REPORTER) – KEY: Iterator<VALUE>: – COLLECTOR: – REPORTER: API •  MapReduceBase
  21. 21. o.a.h.mapred.ReduceTask •  SHUFFLE •  ReduceTask.ReduceCopier – fetchOutputs( Merger.MergeQueue) •  Map x mapred.reduce.parallel.copies – MapOutputCopier •  Map ⾏行行 LocalFSMerger •  ⾏行行 InMemFSMergeThread •  GetMapEventsThread – Map – < , MapOutputLocation(taskId, host, httpUrl)> •  ⼀一 TaskTracker ⼯工
  22. 22. o.a.h.mapred.ReduceTask •  run(RecordReader, OutputCollector, Reporter) •  SORT – Memory, disk ⽣生 •  RowKeyValueItetator – Reducer ⽣生 – RecordWriter ⽣生 – ReduceValuesIterator ⾏行行

×