The document provides an introduction to MapReduce, describing the motivation for it: a framework that simplifies large-scale data processing across distributed systems. It outlines MapReduce's programming model and main features, including automatic parallelization, fault tolerance, and locality. The document also provides a detailed example of counting letter frequencies in a large file to illustrate how MapReduce works.
This document provides a high-level overview of MapReduce and Hadoop. It begins with an introduction to MapReduce, describing it as a distributed computing framework that decomposes work into parallelized map and reduce tasks. Key concepts like mappers, reducers, and job tracking are defined. The structure of a MapReduce job is then outlined, showing how input is divided and processed by mappers, then shuffled and sorted before being combined by reducers. Example map and reduce functions for a word counting problem are presented to demonstrate how a full MapReduce job works.
Hadoop MapReduce is an open-source framework for distributed processing of large datasets across clusters of computers. It allows parallel processing of large datasets by dividing the work across nodes. The framework handles scheduling, fault tolerance, and distribution of work. MapReduce consists of two main phases: the map phase, where the data is processed into key-value pairs, and the reduce phase, where the outputs of the map phase are aggregated together. It provides an easy programming model for developers to write distributed applications for large-scale processing of structured and unstructured data.
Large Scale Data Analysis with Map/Reduce, part I (Marin Dimitrov)
This document provides an overview of large scale data analysis using distributed computing frameworks like MapReduce. It describes MapReduce and related frameworks like Dryad, and open source MapReduce tools including Hadoop, Cloud MapReduce, Elastic MapReduce, and MR.Flow. Example MapReduce algorithms for tasks like graph analysis, text indexing and retrieval are also outlined. The document is the first part of a series on large scale data analysis using distributed frameworks.
MapReduce examples, starting from the basic WordCount and moving up to a more complex K-means algorithm. The code contained in these slides is available at https://github.com/andreaiacono/MapReduce
The document provides an overview of MapReduce, including:
1) MapReduce is a programming model and implementation that allows for large-scale data processing across clusters of computers. It handles parallelization, distribution, and reliability.
2) The programming model involves mapping input data to intermediate key-value pairs and then reducing by key to output results.
3) Example uses of MapReduce include word counting and distributed searching of text.
MapReduce is a programming model for processing large datasets in a distributed system. It involves a map step that performs filtering and sorting, and a reduce step that performs summary operations. Hadoop is an open-source framework that supports MapReduce. It orchestrates tasks across distributed servers, manages communications and fault tolerance. Main steps include mapping of input data, shuffling of data between nodes, and reducing of shuffled data.
The document introduces MapReduce, describing how it allows for parallel processing of large datasets. MapReduce works by splitting data into smaller chunks that are processed (mapped) in parallel by worker nodes, and then combining (reducing) the results. The document outlines the Map and Reduce functions, and discusses how Hadoop is an open-source implementation of MapReduce that allows distributed processing of semi-structured data across clusters of machines.
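Since several of the summaries above describe the same map/shuffle/reduce flow, a minimal single-process Java sketch of the recurring word-count formulation may help fix the model; the class and method names and the in-memory "shuffle" are illustrative assumptions, not any particular framework's API.

import java.util.*;

public class WordCountModel {
    // map: for each input record, emit intermediate (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // reduce: sum all values grouped under the same key
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("the quick brown fox", "the lazy dog");
        // shuffle: group intermediate pairs by key (the framework's job in real MapReduce)
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : input) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getKey(), e.getValue()));
        }
    }
}

Everything a real framework adds (splitting the input, running maps in parallel, moving grouped pairs to reducers, recovering from failures) replaces the in-memory TreeMap and the sequential loops of this sketch.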
Map Reduce is a parallel and distributed approach developed by Google for processing large data sets. It has two key components - the Map function which processes input data into key-value pairs, and the Reduce function which aggregates the intermediate output of the Map into a final result. Input data is split across multiple machines which apply the Map function in parallel, and the Reduce function is applied to aggregate the outputs.
MapReduce provides a programming model for processing large datasets in a distributed, parallel manner. It involves two main steps - the map step where the input data is converted into intermediate key-value pairs, and the reduce step where the intermediate outputs are aggregated based on keys to produce the final results. Hadoop is an open-source software framework that allows distributed processing of large datasets across clusters of computers using MapReduce.
This document provides an overview of MapReduce in Hadoop. It defines MapReduce as a distributed data processing paradigm designed for batch processing large datasets in parallel. The anatomy of MapReduce is explained, including the roles of mappers, shufflers, and reducers, and how a MapReduce job runs from submission to completion. It is well suited to batch processing and long-running applications, while its weaknesses include iterative algorithms, ad hoc queries, and algorithms that depend on previously computed values or shared global state.
Here is how you can solve this problem using MapReduce and Unix commands:
Map step:
grep -o 'Blue\|Green' input.txt | wc -l > output
This uses grep to search the input file for the strings "Blue" or "Green" and print only the matches. The matches are piped to wc which counts the lines (matches).
Reduce step:
cat output
This isn't really needed, as there is only one mapper. cat prints the contents of the output file, which holds the combined count of Blue and Green matches.
So MapReduce has been simulated using grep for the map and cat for the reduce functionality. The key aspects are: grep extracts the relevant data (map), and wc and cat aggregate it into the final count (reduce).
The document provides an introduction to MapReduce, including:
- MapReduce is a framework for executing parallel algorithms across large datasets using commodity computers. It is based on map and reduce functions.
- Mappers process input key-value pairs in parallel, and outputs are sorted and grouped by the reducers.
- Examples demonstrate how MapReduce can be used for tasks like building indexes, joins, and iterative algorithms (a minimal index-building sketch follows below).
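To illustrate the index-building task from the list above, here is a minimal single-process Java sketch of an inverted index phrased as map and reduce steps; the toy documents, their IDs, and the in-memory grouping are illustrative assumptions.

import java.util.*;

public class InvertedIndexModel {
    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "map reduce splits work");
        docs.put("doc2", "reduce aggregates map output");

        // map phase: emit (word, docId); the shuffle groups pairs by word
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().split("\\s+")) {
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
            }
        }

        // reduce phase: output each word with its posting list
        for (Map.Entry<String, Set<String>> e : index.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}

The reduce output, one posting list per word, is exactly the structure a search engine consults at query time.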
• What is MapReduce?
• What are MapReduce implementations?
Facing these questions, I did some personal research and put together a synthesis, which helped me clarify some ideas. The attached presentation does not intend to be exhaustive on the subject, but it could perhaps bring you some useful insights.
In this presentation, I provide in-depth information about how MapReduce works. It contains many details about the execution steps, fault tolerance, and master/worker responsibilities.
MapReduce is a programming model for processing large datasets in parallel. It works by breaking the dataset into independent chunks which are processed by the map function, and then grouping the output of the maps into partitions to be processed by the reduce function. Hadoop provides fault tolerance by restarting failed tasks, which the JobTracker detects by monitoring the TaskTrackers. MapReduce programs can be written in languages other than Java using Hadoop Streaming.
This document provides performance optimization tips for Hadoop jobs, including recommendations around compression, speculative execution, number of maps/reducers, block size, sort size, JVM tuning, and more. It suggests how to configure properties like mapred.compress.map.output, mapred.map/reduce.tasks.speculative.execution, and dfs.block.size based on factors like cluster size, job characteristics, and data size. It also identifies antipatterns to avoid like processing thousands of small files or using many maps with very short runtimes.
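As one concrete illustration, several of the properties named above can be set programmatically on an old-API JobConf; this is a sketch with placeholder values to be adapted per cluster and job, not tuning advice from the document.

import org.apache.hadoop.mapred.JobConf;

public class TuningSketch {
    // apply a few of the tuning properties discussed above to a job
    static JobConf configure(JobConf conf) {
        // compress intermediate map output to cut shuffle I/O
        conf.setBoolean("mapred.compress.map.output", true);
        // speculative execution: re-run unusually slow tasks on other nodes
        conf.setBoolean("mapred.map.tasks.speculative.execution", true);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        // size the number of reducers to the cluster (placeholder value)
        conf.setNumReduceTasks(16);
        return conf;
    }
}

(dfs.block.size, by contrast, is an HDFS-side setting fixed when a file is written, so it is configured on the cluster or at file-creation time rather than per job.)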
This document summarizes Hadoop MapReduce, including its goals of distribution and reliability. It describes the roles of mappers, reducers, and other system components like the JobTracker and TaskTracker. Mappers process input splits in parallel and generate intermediate key-value pairs. Reducers sort and group the outputs by key before processing each unique key. The JobTracker coordinates jobs while the TaskTracker manages tasks on each node.
MapReduce is a programming model for processing large datasets in a distributed manner across clusters of machines. It involves two functions - Map and Reduce. The Map function processes input key-value pairs to generate intermediate key-value pairs, and the Reduce function merges all intermediate values associated with the same intermediate key. This allows for distributed processing that hides complexity and provides fault tolerance. An example is counting word frequencies, where the Map function emits word counts and the Reduce function sums the counts for each word.
The document describes Hadoop MapReduce and its key concepts. It discusses how MapReduce allows for parallel processing of large datasets across clusters of computers using a simple programming model. It provides details on the MapReduce architecture, including the JobTracker master and TaskTracker slaves. It also gives examples of common MapReduce algorithms and patterns like counting, sorting, joins and iterative processing.
This document provides an overview of MapReduce, including:
- MapReduce is a programming model for processing large datasets in parallel across clusters of computers.
- It works by breaking the processing into map and reduce functions that can be run on many machines.
- Examples are given like word counting, distributed grep, and analyzing web server logs.
This document introduces MapReduce, including its architecture, advantages, frameworks for writing MapReduce programs, and an example WordCount MapReduce program. It also discusses how to compile, deploy, and run MapReduce programs using Hadoop and Eclipse.
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations (scottcrespo)
Mastering Hadoop Map Reduce was a presentation I gave to Orlando Data Science on April 23, 2015. The presentation provides a clear overview of how Hadoop Map Reduce works, and then dives into more advanced topics of how to optimize runtime performance and implement custom data types.
The examples are written in Python and Java, and the presentation walks through how to create an n-gram count map reduce program using custom data types.
You can get the full source code for the examples on my Github! http://www.github.com/scottcrespo/ngrams
This document describes the MapReduce programming model for processing large datasets in a distributed manner. MapReduce allows users to write map and reduce functions that are automatically parallelized and run across large clusters. The input data is split and the map tasks run in parallel, producing intermediate key-value pairs. These are shuffled and input to the reduce tasks, which produce the final output. The system handles failures, scheduling and parallelization transparently, making it easy for programmers to write distributed applications.
The document presents an introduction to MapReduce. It discusses how MapReduce provides an easy framework for distributed computing by allowing programmers to write simple map and reduce functions without worrying about complex distributed systems issues. It outlines Google's implementation of MapReduce and how it uses the Google File System for fault tolerance. Alternative open-source implementations like Apache Hadoop are also covered. The document discusses how MapReduce has been widely adopted by companies to process massive amounts of data and analyzes some criticism of MapReduce from database experts. It concludes by noting trends in using MapReduce as a parallel database and for multi-core processing.
The document outlines the anatomy of MapReduce applications, including common phases like input splitting, mapping, shuffling, and reducing. It then provides high-level and low-level views of how a word-counting MapReduce job works, explaining that it takes a text corpus as input, maps each word to a count of 1, shuffles to group the counts by word, and outputs final word counts. The map and reduce functions are explained at a high level, and then implementation details like MapRunner, RecordReader, and OutputCollector are described at a lower level.
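To connect those low-level pieces, here is a simplified sketch of how the old-API MapRunner drives a map task: the RecordReader parses the input split into records, and each record is fed to the user's map(), which emits pairs through the OutputCollector. Counters, error handling, and other details of the real Hadoop source are omitted.

import java.io.IOException;
import org.apache.hadoop.mapred.*;

public class MapLoopSketch<K1, V1, K2, V2> {
    void run(RecordReader<K1, V1> input, Mapper<K1, V1, K2, V2> mapper,
             OutputCollector<K2, V2> output, Reporter reporter) throws IOException {
        K1 key = input.createKey();         // reusable key/value objects
        V1 value = input.createValue();
        while (input.next(key, value)) {    // read the next record from the split
            mapper.map(key, value, output, reporter);
        }
        mapper.close();
    }
}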
- Dremel is an interactive analysis tool from Google that allows for near real-time analysis of trillion record, multi-terabyte datasets using a SQL-like query language.
- It uses a columnar storage format for better compression and to only access the columns needed for a query. It also uses a serving tree architecture to parallelize query processing.
- Experiments show Dremel can perform interactive analysis jobs over petabyte-scale datasets over 10 times faster than an equivalent MapReduce job, due to its columnar storage and serving tree architecture.
This document provides an overview of MapReduce, a programming model developed by Google for processing and generating large datasets in a distributed computing environment. It describes how MapReduce abstracts away the complexities of parallelization, fault tolerance, and load balancing to allow developers to focus on the problem logic. Examples are given showing how MapReduce can be used for tasks like word counting in documents and joining datasets. Implementation details and usage statistics from Google demonstrate how MapReduce has scaled to process exabytes of data across thousands of machines.
This is a deck of slides from a recent meetup of AWS Usergroup Greece, presented by Ioannis Konstantinou from the National Technical University of Athens.
The presentation gives an overview of the Map Reduce framework and a description of its open source implementation (Hadoop). Amazon's own Elastic Map Reduce (EMR) service is also mentioned. With the growing interest in Big Data, this is a good introduction to the subject.
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2skCodH
This CloudxLab Understanding MapReduce tutorial helps you to understand MapReduce in detail. Below are the topics covered in this tutorial:
1) Thinking in Map / Reduce
2) Understanding Unix Pipeline
3) Examples to understand MapReduce
4) Merging
5) Mappers & Reducers
6) Mapper Example
7) Input Split
8) mapper() & reducer() Code
9) Example - Count number of words in a file using MapReduce
10) Example - Compute Max Temperature using MapReduce (a minimal sketch follows this list)
11) Hands-on - Count number of words in a file using MapReduce on CloudxLab
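For topic 10, here is a minimal old-API sketch of the max-temperature job's logic; the input format (a year and a temperature reading per line) is an assumption, since the tutorial's actual dataset is not shown here.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class MaxTemperature {
    // map: parse "year temperature" lines and emit (year, temperature)
    public static class MaxTempMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            String[] parts = value.toString().trim().split("\\s+");
            output.collect(new Text(parts[0]), new IntWritable(Integer.parseInt(parts[1])));
        }
    }

    // reduce: keep the maximum temperature seen for each year
    public static class MaxTempReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int max = Integer.MIN_VALUE;
            while (values.hasNext()) max = Math.max(max, values.next().get());
            output.collect(key, new IntWritable(max));
        }
    }
}

Because max is associative, MaxTempReducer can also be registered as the combiner to shrink the shuffle.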
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters (Xiao Qin)
An increasing number of popular applications become data-intensive in nature. In the past decade, the World Wide Web has been adopted as an ideal platform for developing data-intensive applications, since the communication paradigm of the Web is sufficiently open and powerful. Data-intensive applications like data mining and web indexing need to access ever-expanding data sets ranging from a few gigabytes to several terabytes or even petabytes. Google leverages the MapReduce model to process approximately twenty petabytes of data per day in a parallel fashion. In this talk, we introduce Google's MapReduce framework for processing huge datasets on large clusters. We first outline the motivations of the MapReduce framework. Then, we describe the dataflow of MapReduce. Next, we show a couple of example applications of MapReduce. Finally, we present our research project on the Hadoop Distributed File System.
The current Hadoop implementation assumes that computing nodes in a cluster are homogeneous in nature. Data locality has not been taken into account for launching speculative map tasks, because it is assumed that most maps are data-local. Unfortunately, both the homogeneity and data-locality assumptions are not satisfied in virtualized data centers. We show that ignoring the data-locality issue in heterogeneous environments can noticeably reduce MapReduce performance. In this paper, we address the problem of how to place data across nodes in a way that each node has a balanced data processing load. Given a data-intensive application running on a Hadoop MapReduce cluster, our data placement scheme adaptively balances the amount of data stored in each node to achieve improved data-processing performance. Experimental results on two real data-intensive applications show that our data placement strategy can always improve MapReduce performance by rebalancing data across nodes before performing a data-intensive application in a heterogeneous Hadoop cluster.
This document provides a summary of MapReduce algorithms. It begins with background on the author's experience blogging about MapReduce algorithms in academic papers. It then provides an overview of MapReduce concepts including the mapper and reducer functions. Several examples of recently published MapReduce algorithms are described for tasks like machine learning, finance, and software engineering. One algorithm is examined in depth for building a low-latency key-value store. Finally, recommendations are provided for designing MapReduce algorithms including patterns, performance, and cost/maintainability considerations. An appendix lists additional MapReduce algorithms from academic papers in areas such as AI, biology, machine learning, and mathematics.
MapReduce is a programming model for processing large datasets in parallel across clusters of machines. It involves splitting the input data into independent chunks which are processed by the "map" step, and then grouping the outputs of the maps together and inputting them to the "reduce" step to produce the final results. The MapReduce paper presented Google's implementation which ran on a large cluster of commodity machines and used the Google File System for fault tolerance. It demonstrated that MapReduce can efficiently process very large amounts of data for applications like search, sorting and counting word frequencies.
Map reduce - simplified data processing on large clusters (Cleverence Kombe)
The document describes MapReduce, a programming model and software framework for processing large datasets in a distributed computing environment. It discusses how MapReduce allows users to specify map and reduce functions to parallelize tasks across large clusters of machines. It also covers how MapReduce handles parallelization, fault tolerance, and load balancing transparently through an easy-to-use programming interface.
Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It provides reliable storage through HDFS and distributed processing via MapReduce. HDFS handles storage and MapReduce provides a programming model for parallel processing of large datasets across a cluster. The MapReduce framework consists of a mapper that processes input key-value pairs in parallel, and a reducer that aggregates the output of the mappers by key.
The document discusses distributed computing and the MapReduce programming model. It provides examples of how Folding@home and PS3s contribute significantly to distributed computing projects. It then explains challenges with inter-machine parallelism like communication overhead and load balancing. The document outlines Google's MapReduce model which handles these issues and makes programming distributed systems easier through its map and reduce functions.
This document provides an introduction to MapReduce and Hadoop, including an overview of computing PageRank using MapReduce. It discusses how MapReduce addresses challenges of parallel programming by hiding details of distributed systems. It also demonstrates computing PageRank on Hadoop through parallel matrix multiplication and implementing custom file formats.
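A single-process sketch of one PageRank iteration phrased in map/reduce terms may make the idea concrete; the toy graph, the 0.85 damping factor, and the in-memory grouping are illustrative assumptions (the slides' actual Hadoop approach, parallel matrix multiplication with custom file formats, differs).

import java.util.*;

public class PageRankIteration {
    public static void main(String[] args) {
        // toy link graph: page -> pages it links to
        Map<String, List<String>> links = new HashMap<>();
        links.put("A", Arrays.asList("B", "C"));
        links.put("B", Arrays.asList("C"));
        links.put("C", Arrays.asList("A"));

        Map<String, Double> rank = new HashMap<>();
        for (String page : links.keySet()) rank.put(page, 1.0 / links.size());
        double d = 0.85;  // damping factor (assumed)

        // "map": each page distributes rank/outdegree to the pages it links to
        Map<String, List<Double>> contributions = new HashMap<>();
        for (Map.Entry<String, List<String>> e : links.entrySet()) {
            double share = rank.get(e.getKey()) / e.getValue().size();
            for (String target : e.getValue())
                contributions.computeIfAbsent(target, k -> new ArrayList<>()).add(share);
        }

        // "reduce": each page sums the contributions it received
        for (String page : links.keySet()) {
            double sum = 0.0;
            for (double c : contributions.getOrDefault(page, Collections.emptyList())) sum += c;
            rank.put(page, (1 - d) / links.size() + d * sum);
        }
        System.out.println(rank);  // run the map/reduce pair repeatedly until ranks converge
    }
}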
The WordCount and Sort examples demonstrate basic MapReduce algorithms in Hadoop. WordCount counts the frequency of words in a text document by having mappers emit (word, 1) pairs and reducers sum the counts. Sort uses an identity mapper and reducer to simply sort the input files by key. Both examples read from and write to HDFS, and can be run on large datasets to benchmark a Hadoop cluster's sorting performance.
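The Sort example works because the framework itself sorts intermediate keys between the map and reduce phases, so identity functions suffice. A minimal old-API configuration along those lines might look like this (Hadoop's IdentityMapper and IdentityReducer are real classes; the job and class names around them are assumptions):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SortJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SortJob.class);
        conf.setJobName("sort");
        // identity map and reduce: records pass through unchanged,
        // and the shuffle's sort-by-key does the actual sorting
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setInputFormat(KeyValueTextInputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}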
A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system.
Brief introduction on Hadoop, Dremel, Pig, FlumeJava and Cassandra (Somnath Mazumdar)
This document provides an overview of several big data technologies including MapReduce, Pig, Flume, Cascading, and Dremel. It describes what each technology is used for, how it works, and example applications. MapReduce is a programming model for processing large datasets in a distributed environment, while Pig, Flume, and Cascading build upon MapReduce to provide higher-level abstractions. Dremel is an interactive query system for nested and complex datasets that uses a column-oriented data storage format.
The document discusses query execution in database management systems. It begins with an example query on a City, Country database and represents it in relational algebra. It then discusses different query execution strategies like table scan, nested loop join, sort merge join, and hash join. The strategies are compared based on their memory and disk I/O requirements. The document emphasizes that query execution plans can be optimized for parallelism and pipelining to improve performance.
Applying Stratosphere for big data analytics (Avinash Pandu)
Stratosphere is a next-generation data analytics platform that can perform complex operations like JOIN, CROSS, and GROUPS more efficiently than traditional MapReduce. It uses MapReduce as its basic building block but introduces optimizations that reduce computational time. Stratosphere supports a query language called Meteor and can execute analytical tasks formulated as Meteor queries using its distributed processing capabilities.
The document describes MapReduce, a programming model and associated implementation for processing large datasets across distributed systems. MapReduce allows users to specify map and reduce functions to process key-value pairs. The runtime system automatically parallelizes and distributes the computation across clusters, handling failures and communication. Hundreds of programs have been implemented using MapReduce at Google to process terabytes of data on thousands of machines.
2004 map reduce simplified data processing on large clusters (mapreduce) (anh tuan)
The document describes MapReduce, a programming model and associated implementation for processing large datasets across distributed systems. It allows users to specify map and reduce functions to process key-value pairs. The runtime system handles parallelization across machines, partitioning data, scheduling execution, and handling failures. Hundreds of programs have been implemented using MapReduce at Google to process terabytes of data on thousands of machines.
15. Example: Count # of Each Letter in a Big File. 1) The master splits the 640 MB input file into 10 pieces of 64 MB each. The user has set R = 4 output files, there are 26 different key letters in the range [a..z], and eight workers sit idle.
16. 2) The master assigns map and reduce tasks to the idle workers: some become mappers, the others reducers.
17. 3) The map workers read their assigned splits of the input (Map Tasks 1-4 in progress; the reduce tasks remain idle).
18. 4) Each map task processes its data in memory, emitting a (letter, 1) pair for every letter it reads, e.g. Map Task 1 emits (a,1) (b,1) (a,1) (m,1) (o,1) (p,1) (r,1) (y,1). A partition function assigns each letter of the alphabet to one of the regions R1-R4 by alphabetical range.
19. 5) A combiner function is applied to the in-memory pairs: the two (a,1) pairs become (a,2), leaving (a,2) (b,1) (m,1) (o,1) (p,1) (r,1) (y,1).
20. 6) The combined results are stored on the local disk of Machine 1, partitioned into the regions R1-R4.
21. 7) The map task informs the master of the location of its intermediate results on local disk.
22. 8) The master assigns the next task (Map Task 5) to the worker that has just become free.
23. 9) The master forwards the location of Map Task 1's intermediate results to the reducers (the R1 location to Reduce Task 1, and so on for each region).
24. Reduce Task 1 is responsible for region R1, covering the letters a through g. The intermediate pairs for that region, produced by all map tasks, are (a,2) (b,1) (e,1) (d,1) (c,1) (e,1) (g,1) (e,1) (a,3) (c,1) (c,1) (a,1) (b,1) (a,2) (f,1) (e,1) (a,2) (e,1) (c,1) (e,1).
25. 10) Reduce Task 1 reads the data stored in region 1 from each map task.
26. 11) Reduce Task 1 sorts the data by key: (a,2) (a,3) (a,1) (a,2) (a,2) (b,1) (b,1) (c,1) (c,1) (c,1) (c,1) (d,1) (e,1) (e,1) (e,1) (e,1) (e,1) (e,1) (f,1) (g,1).
27. 12) It then passes each key and its set of intermediate values to the user's reduce function: (a,{2,3,1,2,2}) (b,{1,1}) (c,{1,1,1,1}) (d,{1}) (e,{1,1,1,1,1,1}) (f,{1}) (g,{1}).
28. 13) Finally, after executing the user's reduce function, it writes output file 1 of R: (a,10) (b,2) (c,4) (d,1) (e,6) (f,1) (g,1).
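The walkthrough above compresses into a short program. Below is a minimal single-process Java simulation of it; the split count and the letters-to-regions partition (ranges of seven letters, per the deck's notes) follow the slides, while the input string, class and variable names, and in-memory data structures are illustrative assumptions.

import java.util.*;

public class LetterCountSimulation {
    // partition function from the slides: 26 letters into R = 4 regions
    // (a-g -> region 1, h-n -> region 2, o-u -> region 3, v-z -> region 4)
    static int partition(char letter) {
        return (letter - 'a') / 7;
    }

    public static void main(String[] args) {
        String bigFile = "atbomaprreducegoooogleapimacacabraarrozfeijaotomatecruimessol";
        int numSplits = 10, R = 4;

        // map phase: each split emits (letter, 1), combined locally into (letter, n),
        // and the pairs are partitioned into R regions on the mapper's local disk
        List<List<Map<Character, Integer>>> regions = new ArrayList<>();
        for (int r = 0; r < R; r++) regions.add(new ArrayList<>());

        int splitLen = (bigFile.length() + numSplits - 1) / numSplits;
        for (int s = 0; s < bigFile.length(); s += splitLen) {
            String split = bigFile.substring(s, Math.min(s + splitLen, bigFile.length()));
            List<Map<Character, Integer>> local = new ArrayList<>();
            for (int r = 0; r < R; r++) local.add(new TreeMap<>());
            for (char c : split.toCharArray()) {
                local.get(partition(c)).merge(c, 1, Integer::sum);  // map + combiner
            }
            for (int r = 0; r < R; r++) regions.get(r).add(local.get(r));
        }

        // reduce phase: reduce task r reads region r from every map task,
        // groups by letter, and sums the combined counts
        for (int r = 0; r < R; r++) {
            Map<Character, Integer> result = new TreeMap<>();
            for (Map<Character, Integer> fromOneMapper : regions.get(r)) {
                fromOneMapper.forEach((k, v) -> result.merge(k, v, Integer::sum));
            }
            System.out.println("output file " + (r + 1) + ": " + result);
        }
    }
}

Note how the combiner runs inside each map task, shrinking (a,1) (a,1) to (a,2) before anything is written out, exactly as in step 5 of the slides.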
41. Hadoop: Word Count Example

public class WordCount extends Configured implements Tool {
    ...
    public static class MapClass extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        ... // Map Task Definition
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        ... // Reduce Task Definition
    }

    public int run(String[] args) throws Exception {
        ... // Job Configuration
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }
}
42. Hadoop: Job Configuration

public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName("wordcount");
    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(MapClass.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    conf.setInputPath(new Path(args[0]));    // args is a String[]; the slide's args.get(0) would not compile
    conf.setOutputPath(new Path(args[1]));
    JobClient.runJob(conf);
    return 0;
}
43. Hadoop: Map Class

public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // map(WritableComparable, Writable, OutputCollector, Reporter)
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);   // emit (word, 1) for every token
        }
    }
}
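The Reduce class referenced by the skeleton and the job configuration above is not among the extracted slides; a minimal version consistent with them, standard for this WordCount example, would be:

public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    // reduce(WritableComparable, Iterator, OutputCollector, Reporter)
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();  // add up the counts emitted for this word
        }
        output.collect(key, new IntWritable(sum));
    }
}

Because addition is associative and commutative, the same class can serve as the combiner, which is exactly how the job configuration above registers it.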
There are libraries for programming clusters, such as PVM (Parallel Virtual Machine) and MPI (Message Passing Interface).
It runs on top of GFS.
Given a collection of Shapes, we split this collection into 2 parts and send each part to a grid node. Each node counts the number of Shapes it received and returns the count to the caller. The caller then adds the results received from the remote nodes and returns the reduced result to the user (the counts are displayed next to every shape).
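A minimal single-machine analogue of this split/count/aggregate flow, with Java threads standing in for grid nodes; the shape data and the two-way split follow the description, everything else is an illustrative assumption.

import java.util.*;
import java.util.concurrent.*;

public class ShapeCountDemo {
    public static void main(String[] args) throws Exception {
        List<String> shapes = Arrays.asList("circle", "square", "circle", "triangle");

        // split the collection into 2 parts, one per "node"
        List<String> part1 = shapes.subList(0, shapes.size() / 2);
        List<String> part2 = shapes.subList(shapes.size() / 2, shapes.size());

        ExecutorService nodes = Executors.newFixedThreadPool(2);
        // each node counts the shapes it received
        Future<Integer> count1 = nodes.submit(part1::size);
        Future<Integer> count2 = nodes.submit(part2::size);

        // the caller reduces the partial results into the final answer
        System.out.println("total shapes: " + (count1.get() + count2.get()));
        nodes.shutdown();
    }
}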
Simplify the parallelization and distribution of massive data processing.
So, in order to achieve this goal, MR provides...
The input is a list of web page URLs.
Let's look at an example of how the MapReduce framework works. The example program counts the number of occurrences of each letter appearing in a large file and classifies the letters into 4 distinct output files by ranges of seven letters (e.g., A through G, H through N, and so on). For this, a user-defined partition function is used, which is usually a hash function.
The master assigns each worker a specific task. In this case we assume that the workers on the left will be assigned map tasks and those on the right reduce tasks.
The next step assigns to each map task its corresponding part of the input file.
Using the partition function, task 1 classifies the letters into regions as follows: 'a' goes to region 1, 'y' to region 4, and so on.
Completed map tasks need to be re-executed because their results are stored on the local disks of the failed machines and are therefore inaccessible. Completed reduce tasks don't need to be re-executed because their results are stored in GFS, which provides fault tolerance through data replication.