MapReduce introduction

Introduction to the MapReduce framework

  1. MapReduce. Data Infrastructure Team, Jongyoul Lee. Friday, September 27, 13
  2. Agenda: MapReduce?; Data types, input/output format; Mapper, reducer; Combiner; Hadoop streaming; Next...
  3. https://github.com/madeng/mrintro.git
  4. (Same as slide 2.)
  5. Structure overview (diagram): a Client, one JobTracker, six TaskTrackers, and six DataNodes. The client submits jobs to the JobTracker, which assigns map and reduce tasks to the TaskTrackers; the DataNodes store the HDFS blocks those tasks read and write.
  6. A high-level (!!) architecture for distributed processing. You don't think about how the data flows; you only need to think about keys and values. It cannot solve every problem.
  7. Data flow (diagram):
     Input → TextInputFormat → (k1, v1)
     Mapper: (k1, v1) → (k2, v2)
     Combiner: (k2, list(v2)) → (k2, v2')
     Partitioner: (k2, v2', #reducer) → #partition
     Shuffle/sort → (k2, list(v2'))
     Reducer: (k2, list(v2')) → (k3, v3)
     TextOutputFormat → Output
  8. (Same diagram as slide 7.)
  9. Two packages: org.apache.hadoop.mapred and org.apache.hadoop.mapreduce. mapreduce is the newer package, but the old one is still widely used (Cascading...).
  10. (Same as slide 2.)
  11. Hadoop Writable types:
      Class name        Data type
      BooleanWritable   boolean
      ByteWritable      byte
      DoubleWritable    double
      FloatWritable     float
      IntWritable       int
      LongWritable      long
      Text              UTF-8 string
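Slide 11 only lists the Writable classes. The Python sketch below, which is not Hadoop code, mimics the bytes those classes write when keys and values are serialized. It assumes each numeric Writable calls the matching java.io.DataOutput method (big-endian), and that Text writes its length with WritableUtils.writeVInt followed by the UTF-8 bytes. All function names are hypothetical.

```python
import struct

# Hypothetical helpers that mimic the byte layout of Hadoop Writables.
# Java's DataOutput writes multi-byte numbers big-endian, hence ">" in struct formats.

def boolean_writable(v: bool) -> bytes:
    return b"\x01" if v else b"\x00"   # writeBoolean: one byte

def byte_writable(v: int) -> bytes:
    return struct.pack(">b", v)        # writeByte: one signed byte

def int_writable(v: int) -> bytes:
    return struct.pack(">i", v)        # writeInt: 4 bytes

def long_writable(v: int) -> bytes:
    return struct.pack(">q", v)        # writeLong: 8 bytes

def float_writable(v: float) -> bytes:
    return struct.pack(">f", v)        # writeFloat: 4-byte IEEE 754

def double_writable(v: float) -> bytes:
    return struct.pack(">d", v)        # writeDouble: 8-byte IEEE 754

def write_vlong(i: int) -> bytes:
    """Variable-length integer, following the logic of WritableUtils.writeVLong."""
    if -112 <= i <= 127:
        return struct.pack(">b", i)    # small values fit in a single byte
    length = -112
    if i < 0:
        i = ~i                         # one's complement, like i ^ -1 in Java
        length = -120
    tmp = i
    while tmp != 0:                    # count the payload bytes
        tmp >>= 8
        length -= 1
    out = bytearray(struct.pack(">b", length))  # first byte encodes sign and size
    n = -(length + 120) if length < -120 else -(length + 112)
    for idx in range(n, 0, -1):        # payload, most significant byte first
        out.append((i >> ((idx - 1) * 8)) & 0xFF)
    return bytes(out)

def text(s: str) -> bytes:
    data = s.encode("utf-8")
    return write_vlong(len(data)) + data  # vint length prefix, then UTF-8 bytes

print(int_writable(1).hex())   # 00000001
print(text("hello").hex())     # 0568656c6c6f
print(write_vlong(300).hex())  # 8e012c
```

Fixed-width types always cost the same number of bytes, while Text pays only one length byte for strings shorter than 128 bytes.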
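To make the data flow on slides 7 and 8 concrete, here is a single-process Python simulation of a word-count job. None of this is the Hadoop API and every function name is hypothetical: each stage is a plain function with the signature shown on the slide. The partition function uses zlib.crc32 as a stand-in for Hadoop's default HashPartitioner (which uses the key's hashCode()), and offsets are counted per split rather than per file.

```python
import zlib
from itertools import groupby
from operator import itemgetter

def text_input_format(lines):
    # (k1, v1) = (byte offset, line); offsets here are relative to the split
    offset = 0
    for line in lines:
        yield offset, line
        offset += len(line.encode("utf-8")) + 1  # +1 for the newline

def mapper(offset, line):
    # (k1, v1) -> (k2, v2): emit (word, 1) for every word
    for word in line.split():
        yield word, 1

def combiner(word, counts):
    # (k2, list(v2)) -> (k2, v2'): pre-sum on the map side
    yield word, sum(counts)

def partitioner(word, count, num_reducers):
    # (k2, v2', #reducer) -> #partition; crc32 is a stand-in for key.hashCode()
    return zlib.crc32(word.encode("utf-8")) % num_reducers

def reducer(word, counts):
    # (k2, list(v2')) -> (k3, v3)
    yield word, sum(counts)

def sort_and_group(pairs):
    # Sort by key, then collect all values for each key: (k, [v, v, ...])
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [v for _, v in group]

def text_output_format(records):
    # One "key<TAB>value" line per record
    return "".join(f"{k}\t{v}\n" for k, v in records)

def run_job(splits, num_reducers=2):
    partitions = [[] for _ in range(num_reducers)]
    for split in splits:  # one map task per input split
        map_out = [kv for k1, v1 in text_input_format(split) for kv in mapper(k1, v1)]
        for k2, values in sort_and_group(map_out):
            for k, v in combiner(k2, values):
                partitions[partitioner(k, v, num_reducers)].append((k, v))
    output = []
    for part in partitions:  # one reduce task per partition
        for k2, values in sort_and_group(part):  # shuffle/sort
            output.extend(reducer(k2, values))
    return output

splits = [["the quick brown fox", "the lazy dog"], ["the fox"]]
print(text_output_format(run_job(splits)), end="")
```

Reusing the summing logic as the combiner is safe only because addition is associative and commutative. Hadoop may run a combiner zero, one, or several times per key, so a combiner must never change the final result.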
