
Map Reduce: An Introduction to How It Works and to Algorithm Design



  1. Aggregate Data Analysis
  2. [Diagram: input data blocks distributed across many mappers]
  3. [Diagram garbled in extraction: a worked map/reduce example over sample text ("Welcome to My HomePage. Thank you. Where is your house? ....")]
  4. [Diagram: Big Data split across mappers]
  5. map: (k1, v1) → [(k2, v2)]

     // word count
     class Mapper
       method Map(docid a, doc d)
         for all term t ∈ doc d do
           Emit(term t, count 1)
  6. > require 'msgpack'
     > msg = [1,2,3].to_msgpack   #=> "\x93\x01\x02\x03"
     > MessagePack.unpack(msg)    #=> [1, 2, 3]
  7. // word count
     class Combiner
       method Combine(string t, counts [c1, c2, ...])
         sum ← 0
         for all count c ∈ counts [c1, c2, ...] do
           sum ← sum + c
         Emit(string t, count sum)
  8. reduce: (k2, [v2]) → [(k3, v3)]

     // word count
     class Reducer
       method Reduce(term t, counts [c1, c2, ...])
         sum ← 0
         for all count c ∈ counts [c1, c2, ...] do
           sum ← sum + c
         Emit(term t, count sum)
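Taken together, the map/combine/reduce pseudocode above forms the full word-count pipeline. A minimal in-memory Python sketch of that pipeline (illustrative only, not a Hadoop API; the shuffle is simulated with a dict):

```python
# Word count as Mapper -> Combiner -> shuffle -> Reducer, in memory.
from collections import defaultdict

def map_phase(docid, doc):
    # Map(docid a, doc d): emit (term, 1) for every term
    return [(term, 1) for term in doc.split()]

def combine(pairs):
    # Local aggregation on the mapper side: sum counts per term
    sums = defaultdict(int)
    for term, count in pairs:
        sums[term] += count
    return list(sums.items())

def reduce_phase(term, counts):
    # Reduce(term t, counts [c1, c2, ...]) -> (term, sum)
    return (term, sum(counts))

docs = {1: "to be or not to be", 2: "to do is to be"}

# Simulated shuffle: group combiner output by key, then reduce
grouped = defaultdict(list)
for docid, doc in docs.items():
    for term, partial in combine(map_phase(docid, doc)):
        grouped[term].append(partial)
result = dict(reduce_phase(t, cs) for t, cs in grouped.items())
```

The combiner here halves the pairs shuffled for repeated terms; removing it changes nothing in `result`, only in traffic.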
  9. [Figure (p. 30, Chapter 2, "MapReduce Basics"), garbled in extraction: input key-value pairs flow through mappers, combiners, and partitioners; "Shuffle and Sort: aggregate values by keys" then delivers the grouped values to the reducers]
  10. Hadoop Tutorial Series, Issue #2: Getting Started With (Customized) Partitioning
  11. Hadoop Tutorial Series, Issue #2: Getting Started With (Customized) Partitioning
  12. Hadoop Tutorial Series, Issue #2: Getting Started With (Customized) Partitioning
  13. package com.philippeadjiman.hadooptraining;

      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.Partitioner;

      public class MyPartitioner implements Partitioner<IntWritable, Text> {
        @Override
        public int getPartition(IntWritable key, Text value, int numPartitions) {
          /* Pretty ugly hard coded partitioning function. Don't do that in
             practice, it is just for the sake of understanding. */
          int nbOccurences = key.get();
          if (nbOccurences < 3)
            return 0;
          else
            return 1;
        }

        @Override
        public void configure(JobConf arg0) {
        }
      }

      Hadoop Tutorial Series, Issue #2: Getting Started With (Customized) Partitioning
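The same routing logic, sketched in Python (illustrative, not a Hadoop API): one function analogous to Hadoop's default hash partitioner, and one mirroring the threshold logic of `getPartition` above.

```python
def hash_partition(key, num_partitions):
    # Analogous to Hadoop's default HashPartitioner:
    # hash of the key, modulo the number of reducers.
    return hash(key) % num_partitions

def my_partition(nb_occurrences, num_partitions=2):
    # Mirror of MyPartitioner.getPartition: keys occurring fewer
    # than 3 times go to reducer 0, everything else to reducer 1.
    return 0 if nb_occurrences < 3 else 1
```

As the Java comment warns, a hard-coded split like this is only for understanding; a skewed key distribution would leave one reducer doing almost all the work.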
  14. x + y = y + x        x × y = y × x        (commutative)
      (x + y) + z = x + (y + z)        (x × y) × z = x × (y × z)        (associative)
  15. class Mapper {
        buffer

        init() {
          buffer = HashMap.new
        }

        map(id, data) {
          elements = process(data)
          for each element {
            ....
            check_and_put(buffer, k2, v2)
          }
        }

      // Designing algorithms for Map Reduce
  16.   check_and_put(buffer, k2, v2) {
          if buffer.full {
            for each k in buffer.keys {
              emit(k, buffer[k])
            }
            buffer.clear()
          }
          buffer.incrby(k2, v2)   // H[k2] += v2
        }

        close() {
          for each k2 in buffer.keys {
            emit(k2, buffer[k2])
          }
        }
      }

      Designing algorithms for Map Reduce
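The in-mapper-combining pattern above, sketched as runnable Python (class and helper names are illustrative): partial sums accumulate in a bounded dict that is spilled when full and again at close().

```python
class Mapper:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = {}        # partial sums: k2 -> running count
        self.emitted = []       # stands in for emit(k2, v2)

    def check_and_put(self, k2, v2):
        # Spill all partial sums when a new key would overflow the buffer
        if k2 not in self.buffer and len(self.buffer) >= self.capacity:
            self.flush()
        self.buffer[k2] = self.buffer.get(k2, 0) + v2   # H[k2] += v2

    def flush(self):
        for k2, v2 in self.buffer.items():
            self.emitted.append((k2, v2))
        self.buffer.clear()

    def close(self):
        # Final spill at end of input, as in close() above
        self.flush()

m = Mapper(capacity=2)
for k in ["a", "b", "a", "c", "a"]:
    m.check_and_put(k, 1)
m.close()
```

Downstream, a reducer summing per key recovers the same totals as without the buffer; the buffer only reduces how many pairs are shuffled.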
  17. [Figure text garbled in extraction. Recoverable captions: Figure 1, "Distributed execution plan for MapReduce when reduce cannot be decomposed to perform partial aggregation"; Figure 2, "Distributed execution plan for MapReduce when reduce supports partial aggregation". The surrounding text explains that, with a user-defined merge function and the grouping operators provided by the system, the computation has exactly two phases: the first executes a Map function on the inputs to extract keys from records, then partitions the map outputs on those keys; the second collects and groups the records. The implementation of GroupBy in the first stage may differ between the two plans.]
  18. Def. 1  x: data items; x1 ⊕ x2: concatenation of x1 and x2.
      H is decomposable if there exist two functions I and C such that:
        1) ∀x1, x2 : H(x1 ⊕ x2) = C(I(x1 ⊕ x2)) = C(I(x1) ⊕ I(x2))
        2) ∀x1, x2 : I(x1 ⊕ x2) = I(x2 ⊕ x1)
        3) ∀x1, x2 : C(x1 ⊕ x2) = C(x2 ⊕ x1)

      Def. 2  H is associative-decomposable if, in addition to 1)-3) of Def. 1, C satisfies:
        4) ∀x1, x2, x3 : C(C(x1 ⊕ x2) ⊕ x3) = C(x1 ⊕ C(x2 ⊕ x3))   (i.e. C is associative)
  19. class Combiner {
        share_space

        init(share_space_info) {
          share_space = conn(share_space_info)
        }

        combine(key, elements) {
          sum = 0
          for each element {
            ...
            sum += v
          }
  20.     share_space.incrby(key, sum)
          emit(key, share_space_info)
        }  // end combine()
      }

      class Reducer {
        reduce(key, list_of_share_space_info) {
          for each share_space_info {
            share_space = conn(share_space_info)
            sum = 0
            elements = share_space.hget(key)
            for each element {
              ...
            }
          }
        }
  21. partition(key) {
        range = (KEY_MAX - KEY_MIN) / NUM_OF_REDUCERS
        reducer_no = (key - KEY_MIN) / range
        return reducer_no
      }

      Designing algorithms for Map Reduce
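The range partitioner above, in runnable Python. The concrete values of `KEY_MIN`, `KEY_MAX`, and `NUM_OF_REDUCERS` are assumptions for the example; the only addition is clamping `key == KEY_MAX` into the last reducer.

```python
# Illustrative constants: keys in [0, 100], four reducers.
KEY_MIN, KEY_MAX, NUM_OF_REDUCERS = 0, 100, 4

def partition(key):
    # Split the key space into equal-width ranges, one per reducer
    rng = (KEY_MAX - KEY_MIN) / NUM_OF_REDUCERS
    reducer_no = int((key - KEY_MIN) / rng)
    # Clamp the boundary case key == KEY_MAX into the last reducer
    return min(reducer_no, NUM_OF_REDUCERS - 1)
```

Unlike hash partitioning, this keeps each reducer's output internally sorted by key range, at the cost of load imbalance if keys are not uniformly distributed.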
  22. (t1, m1, r80521), (t1, m2, r14209), (t1, m3, r76042),
      (t2, m1, r21823), (t2, m2, r66508), (t2, m3, r98347), ...

      map: m1 → (t1, r80521)

      // reduce-side values are not sorted by t1, t2, t3, ...
      (m1) → [(t1, r80521), (t3, r146925), (t2, r21823)]
      (m2) → [(t2, r66508), (t1, r14209), (t3, r14720)]
  23. map: (m1, t1) → r80521

      // grouped and sorted by t1, t2, t3, ...
      (m1, t1) → [(r80521)]
      (m1, t2) → [(r21823)]
      (m1, t3) → [(r146925)]
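This value-to-key conversion can be sketched in Python by letting an ordinary sort stand in for the framework's shuffle (the records are the sample readings from the previous slide):

```python
# (timestamp, machine, reading) records arriving in no useful order
records = [("t2", "m1", "r21823"), ("t1", "m1", "r80521"),
           ("t3", "m1", "r146925"), ("t1", "m2", "r14209")]

# map: (m, t) -> r  -- move the sort attribute t into a composite key
pairs = [((m, t), r) for (t, m, r) in records]

# The shuffle's sort on the full composite key (here: sorted()) now
# delivers each machine's readings already in time order.
ordered = sorted(pairs)
```

With the plain key `m`, the reducer would have to buffer and sort every value itself; with the composite key, the framework's sort does that work.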
  24. class Mapper {
        buffer

        map(id, number) {
          buffer.append(number)
          if (buffer.is_full) {
            max = compute_max(buffer)
            emit(1, max)
            buffer.clear()
          }
        }
      }

      Designing algorithms for Map Reduce
  25. class Reducer {
        reduce(key, list_of_local_max) {
          global_max = 0
          for local_max in list_of_local_max {
            if local_max > global_max {
              global_max = local_max
            }
          }
          emit(1, global_max)
        }
      }

      Designing algorithms for Map Reduce
  26. class Combiner {
        combine(key, list_of_local_max) {
          local_max = maximum(list_of_local_max)
          emit(1, local_max)
        }
        // Max() is associative and commutative, so it is safe in a combiner
      }

      Designing algorithms for Map Reduce
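The three classes above, condensed into a runnable Python sketch (illustrative): every mapper emits a local maximum under the single key 1, and the combiner and reducer both just take `max`, which is safe precisely because max is associative and commutative.

```python
def local_max_mapper(buffer):
    # emit(1, max) once a mapper's buffer is full
    return (1, max(buffer))

def max_combiner(key, local_maxes):
    # Same operation as the reducer; applying it early cannot
    # change the result because max is associative and commutative.
    return (key, max(local_maxes))

def max_reducer(key, local_maxes):
    return (key, max(local_maxes))

chunks = [[3, 9, 4], [12, 7], [5, 11]]     # one buffer per mapper
local_maxes = [local_max_mapper(c)[1] for c in chunks]
global_max = max_reducer(1, local_maxes)
```

The single shared key means all local maxima meet at one reducer; that reducer handles only one value per mapper, not the whole data set.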
  27. class Mapper {
        map(id, data) {
          key, value = process(data)
          if rand() < 0.1 {   // rand() ∈ [0.0, 1.0)
            emit(key, value)
          }
        }
      }
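The sampling mapper as runnable Python (illustrative; the `rng` parameter is injected only so the sketch can be exercised deterministically):

```python
import random

def sample_mapper(records, rate=0.1, rng=None):
    # Each record passes with probability `rate`, giving an
    # approximately 10% uniform (Bernoulli) sample of the input.
    rng = rng or random.random      # rng() in [0.0, 1.0)
    out = []
    for key, value in records:
        if rng() < rate:
            out.append((key, value))
    return out
```

Because each record is kept or dropped independently, the sample size is only approximate (binomially distributed), but no coordination between mappers is needed.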
  28. Map Reduce and Stream Processing
  29. # Call at each hit record
      map(k1, hitRecord) {
        site = hitRecord.site
        # Look up the current slice for this key (= site)
        slice = lookupSlice(site)
        if (slice.time - now > 60.minutes) {
          # Notify reducer whole slice of site is sent
          advance(site, slice)
          slice = lookupSlice(site)
        }
        emitIntermediate(site, slice, 1)
      }

      Map Reduce and Stream Processing
  30. combine(site, slice, countList) {
        hitCount = 0
        for count in countList {
          hitCount += count
        }
        # Send the message to the downstream node
        emitIntermediate(site, slice, hitCount)
      }

      Map Reduce and Stream Processing
  31. # Sum the counts sent for each mapper's slice
      reduce(site, slice, countList) {
        hitCount = 0
        for count in countList {
          hitCount += count
        }
        sv = SliceValue.new
        sv.hitCount = hitCount
        return sv
      }

      Map Reduce and Stream Processing
  32. # Window initialization
      init(slice) {
        rangeValue = RangeValue.new
        rangeValue.hitCount = 0
        return rangeValue
      }

      # Fold a completed slice into the window's range value
      merge(rangeValue, slice, sliceValue) {
        rangeValue.hitCount += sliceValue.hitCount
      }

      # Remove a slice that slides out of the window
      unmerge(rangeValue, slice, sliceValue) {
        rangeValue.hitCount -= sliceValue.hitCount
      }

      Map Reduce and Stream Processing
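The init/merge/unmerge trio above can be sketched as a slice-based sliding window in Python (illustrative; slices are modeled as plain per-slice hit counts):

```python
from collections import deque

class SlidingWindow:
    def __init__(self, size_slices):
        self.size = size_slices
        self.slices = deque()    # per-slice hit counts in the window
        self.hit_count = 0       # rangeValue.hitCount (init)

    def advance(self, slice_hits):
        # merge: fold the newly completed slice into the range value
        self.slices.append(slice_hits)
        self.hit_count += slice_hits
        # unmerge: subtract the slice that slides out of the window
        if len(self.slices) > self.size:
            self.hit_count -= self.slices.popleft()

w = SlidingWindow(size_slices=3)
for hits in [5, 2, 7, 4]:
    w.advance(hits)
```

Because merge has an inverse (unmerge), the window total is updated in O(1) per slice instead of being recomputed from all slices each time it slides.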
  33. 33. 5&4.)1*,!,);3-00+*0-1*,!&/*+!*-58!.-$*9!-$%!@+&22,!).A!18*! -!:2*=#;2*!'-<!1&!4&$#1&+!,1+*-4#$0!%-1-6!! .-$*3-00+*0-1*,! 1&! 5&4.)1*! '#$%&'3-00+*0-1*,6! >)+! *=3 R)++*$1!.+&.&,-2,!:&+!*/-2)-1#$0!,2#%#$03'#$%&'!-00+*0-1*! .*+#4*$1-2! ,1)%<! ,8&',! 18-1! ),#$0! .-$*,! 8-,! ,#0$#:#5-$1! ()*+#*,!;)::*+!*-58!#$.)1!1).2*!)$1#2!#1!#,!$&!2&$0*+!$**%*%! .*+:&+4-$5*!;*$*:#1,6!! INP6! D#$5*! *-58! #$.)1! 1).2*! ;*2&$0,! 1&! 4)21#.2*! '#$%&',9! ,)58!-..+&-58*,!;)::*+!-!1).2*!)$1#2!#1!#,!.+&5*,,*%!:&+!18*! '(# )*+,-./0+1-*2 -00+*0-1*! &/*+! 18*! 2-,1! '#$%&'! 1&! '8#58! #1! ;*2&$0,6! -58! B-$<! -..2#5-1#&$,! $**%! 1&! .+&5*,,! ,1+*-4,9! :&+! *=-4.2*9! #$.)1! 1).2*! #,! -55*,,*%! 4)21#.2*! 1#4*,9! &$5*! :&+! *-58! '#$3 :#$-$5#-2! %-1-! -$-2<,#,9! $*1'&+C! 1+-::#5! 4&$#1&+#$09! -$%! %&'!18-1!#1!.-+1#5#.-1*,!#$6!!! 1*2*5&44)$#5-1#&$! 4&$#1&+#$06! D*/*+-2! %-1-;-,*! +*,*-+58! 0+&).,! -+*! ;)#2%#$0! --1-! .1+*-4! /-$-0*4*$1! .<,1*4,! "*! ,**! 1'&! .+&;2*4,! '#18! ,)58! -..+&-58*,6! W#+,1! 18*! EFDBDG!,&!18-1!-..2#5-1#&$,!5-$!#,,)*!()*+#*,!1&!0*1!1#4*2<! ;)::*+!,#H*!+*()#+*%!#,!)$;&)$%*%T!Q1!-$<!1#4*!#$,1-$19!-22! #$:&+4-1#&$! :+&4! ,1+*-4,6! B-$-0#$0! -$%! .+&5*,,#$0! 1).2*,! 5&$1-#$*%! #$! 18*! 5)++*$1! '#$%&'! -+*! #$! 18*! ;)::*+9! ,1+*-4,!0#/*,!+#,*!1&!58-22*$0*,!18-1!8-/*!;**$!*=1*$,#/*2<! -$%!,&!18*!,#H*!&:!18*!+*()#+*%!;)::*+,!#,!%*1*+4#$*%!;<!18*! %#,5),,*%!-$%!+*5&0$#H*%!IJ9!K9!L9!M9!NOP6!! '#$%&'!+-$0*!-$%!18*!%-1-!-++#/-2!+-1*6!D*5&$%9!.+&5*,,#$0! *-58!#$.)1!1).2*!4)21#.2*!1#4*,!2*-%,!1&!-!8#08!5&4.)1-1#&$! Q$!#4.&+1-$1!52-,,!&:!()*+#*,!&/*+!%-1-!,1+*-4,!#,!,2#%#$03 5&,16!W&+!*=-4.2*!#$!X)*+<!N9!*-58!#$.)1!1).2*!#,!.+&5*,,*%! '#$%&'!-00+*0-1*!()*+#*,6!R&$,#%*+!-$!&$2#$*!-)51#&$!,<,3 :&)+!1#4*,6!Q,!18*!+-1#&!&:!YQZ[!&/*+!D]7F!#$5+*-,*,9! 1*4!#$!'8#58!;#%,!&$!-)51#&$!#1*4,!-+*!,1+*-4*%!#$1&!-!5*$3 ,&!%&*,!18*!$)4;*+!&:!1#4*,!*-58!1).2*!#,!.+&5*,,*%6!R&$3 1+-2!-)51#&$!.+&5*,,#$0!,<,1*46!S8*!,58*4-!&:!*-58!;#%!#,T! 
,#%*+#$0!18*!2-+0*!/&2)4*!-$%!:-,1!-++#/-2!+-1*!&:!,1+*-4#$0! U#1*43#%9! ;#%3.+#5*9! 1#4*,1-4.V6! W&+! *-,*! &:! .+*,*$1-1#&$9! %-1-9!+*%)5#$0!18*!-4&)$1!&:!+*()#+*%!;)::*+!,.-5*!E#%*-22<! '*!-,,)4*!18-1!;#%,!-++#/*!#$!&+%*+!&$!18*#+!1#4*,1-4.!-13 1&!-!5&$,1-$1!;&)$%G!-$%!5&4.)1-1#&$!1#4*!#,!-$!#4.&+1-$1! 1+#;)1*6! E"*! -+*! -51#/*2<! #$/*,1#0-1#$0! .+&5*,,#$0! %#,&+3 %*+*%!%-1-!,1+*-4,G!X)*+<!N!,8&',!-$!*=-4.2*!&:!-!,2#%#$03 '#$%&'!-00+*0-1*!()*+<6! 3/4,52'T!@W#$%!18*!4-=#4)4!;#%!.+#5*!:&+!18*!.-,1!K!4#$3 )1*,!-$%!).%-1*!18*!+*,)21!*/*+<!N!4#$)1*6A! !"#"$%&'()*+,-./0,123& 4567&+,-89:;%%5&<,'28<('/& &&&&&&&&&&5;=>"&?&',@A<28& &&&&&&&&&&!#BC"&D&',@A<2E& 7$! 18*! ()*+<! -;&/*9! '*! #$1+&%)5*! -! '#$%&'! ,.*5#:#5-1#&$! '#18!18+**!.-+-4*1*+,T!YQZ[!,.*5#:#*,!18*!'#$%&'!,#H*9! D]7F! ,.*5#:#*,! 8&'! 18*! '#$%&'! 4&/*,9! -$%! "QSSY! ,.*5#:#*,! 18*! '#$%&'#$0! -11+#;)1*! &$! '8#58! 18-1! 18*! YQZ[! -$%! D]7F! .-+-4*1*+,! -+*! %*:#$*%6! S8*! '#$%&'! ,.*5#:#5-1#&$! &:! X)*+<! N! ;+*-C,! 18*! ;#%! ,1+*-4! #$1&! &/*+3 2-..#$0!K34#$)1*!,);3,1+*-4,!18-1!,1-+1!*/*+<!4#$)1*9!'#18! +*,.*51! 1&! 18*! 1#4*,1-4.! -11+#;)1*6! S8*,*! &/*+2-..#$0! ,);3 ,1+*-4,!-+*!5-22*%!!"#$#%&0(#%$)(!6!X)*+<!N!5-25)2-1*,!18*! 617/,42'8291*.-:;2&-<=-;4.2->26-/,2?@*4;2 No Pane, No Gain: Efficient Evaluation of Sliding-Window Aggregates over Data Streams
  34. K-Means Clustering in Map Reduce
  35. Figure 2: MapReduce Classifier Training and Evaluation Procedure (from "A Comparison of Approaches for Large-Scale Data Mining")
  36. Google Pregel Graph Processing
  37. Google Pregel Graph Processing
