Map Reduce ~Continuous Map Reduce Design~


  1. MAPREDUCE BASICS (figure; the original text is not legible in the export): mappers emit intermediate key/value pairs, combiners perform local aggregation, partitioners assign intermediate keys to reducers, and the shuffle-and-sort phase aggregates values by key before the reducers run.
  2. MAPREDUCE BASICS: the same figure, repeated.
  3. MapReduce data flow: Input Splits → Map (emits <K, V> pairs) → Shuffle (groups them into <K, list(V)>) → Reduce (emits list(V)) → Output Files.
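      To make the <K, V> → <K, list(V)> flow above concrete, here is a minimal in-memory word-count sketch (illustrative only; the helper names and the toy runner are assumptions, not the Hadoop API):

          # Toy illustration of the Map -> Shuffle -> Reduce flow above.
          from collections import defaultdict

          def map_fn(record):
              # Map: emit <K, V> pairs, here (word, 1) for each word in a line
              for word in record.split():
                  yield word, 1

          def reduce_fn(key, values):
              # Reduce: collapse list(V) for one key into a single output value
              return key, sum(values)

          def run(records):
              # Shuffle: group all emitted values by key into <K, list(V)>
              groups = defaultdict(list)
              for record in records:
                  for k, v in map_fn(record):
                      groups[k].append(v)
              return [reduce_fn(k, vs) for k, vs in groups.items()]

          print(run(["a b a", "b c"]))  # [('a', 2), ('b', 2), ('c', 1)]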
  4. Figure 1.1: Log processing with the store-first-query-later model; Apache Hadoop [3] is used as an example. The diagram shows log servers feeding a distributed file system (HDFS) and a data processing framework (Hadoop MapReduce) that answers user queries, alongside a second diagram of a continuous-MapReduce framework in which users query the cloud servers holding the logs directly and receive results. In the traditional store-first-query-later model [17], companies migrate log data from the source nodes to an append-only distributed file system such as GFS [18] or HDFS [3]. The distributed file system replicates the log data for availability and fault tolerance. Once the data is placed in the file system, users can execute queries using bulk-processing frameworks and retrieve results from the distributed file system. Figure 1.1 illustrates this model.
  5. Figure 1: The in-situ MapReduce architecture avoids the cost and latency of the store-first-query-later design by moving processing onto the data sources. The slide excerpts the iMR paper (the text column is cut off in the export): the map function is called on each input record, and the reduce function processes a list of values v[] that share the same key. For queries that are either highly selective or have reduce functions that are distributive or algebraic aggregates [14], users can supply a MapReduce combiner, allowing nodes to merge values of a single key and distribute processing overhead; the combiner allows iMR to process windows and further reduce data volumes through aggregation. The in-situ MapReduce (iMR) architecture builds on previous work in stream processing [5, 7, 9]: unlike standard MapReduce jobs, iMR jobs emit a stream of results over continuous input, e.g., server log files, and, like other stream processors [7], iMR bounds computation over (perhaps infinite) data streams by processing windows of data.
  6. Map Reduce and Stream Processing
  7. 7. ",-./#"0-1.2 ! !( !)% !*+ !() E7F/!.:7!2# "#$%& 3014!5 >.GH@0E8. => ?@A => ?@A => ?@A => ?@A => ?@A => ?@A % & & & B B 10,!# %& ( ( )% *+ () 6!7819:7-;,<./,</10<. 10<.# C+$ =>%?@//A =>&?@//A C%$ =>&?@//A =>B?@//A CD CD CD CD +/3,< )+/3,< %&+/3,<
  8. 8. !"#$%" ",-. E7F !"#$" %#&()*!"#$$%&!()*+,-./01 >.)01201*%$$, )*+,-." )*+,-." %#&()* %#&()* &( &( +&,-+.#",&#/0 +&,-+.#",&#/0 )*+,-." )*+,-." )*+,-." )*+,-." %#&()* %#&()* %#&()* %#&()* &( &( &( &( Figure+&,-+.#",&#/0 +&,-+.#",&#/0 +&,-+.#",&#/0 +&,-+.#",&#/0 sub-wi have a
  9. Figure 3: iMR nodes process local log files to produce windows or panes; the system assumes log records have a logical timestamp and arrive in order. Figure 4: iMR aggregates individual panes Pi in the network; to produce a result, the root may either combine ... (the rest of the caption is cut off in the export).
  10. # Called once for each hit record
      map(k1, hitRecord) {
        timestamp = hitRecord.time
        # Look up the pane id from the timestamp
        paneId = lookupPane(timestamp)
        if (paneId.endFlag == True) {
          # Notify that all data for this pane has been emitted
          notify(paneId)
        }
        emitIntermediate(paneId, 1, timestamp)
      }
      Map Reduce and Stream Processing
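      The slides do not define lookupPane; a minimal sketch of the idea, assuming fixed-width time panes (the pane width, the names, and the omitted endFlag bookkeeping are illustrative assumptions):

          # Hypothetical pane lookup: bucket a timestamp into a fixed-width pane.
          PANE_WIDTH_SEC = 60  # assumed pane width

          def lookup_pane(timestamp):
              # Identify a pane by the start time of its bucket; detecting that a
              # pane's input is complete (the endFlag on the slide) is omitted here.
              return int(timestamp // PANE_WIDTH_SEC) * PANE_WIDTH_SEC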
  11. combine(paneId, countList) {
        hitCount = 0
        for count in countList {
          hitCount += count
        }
        # Send the message to the downstream node
        emitIntermediate(paneId, hitCount)
      }
      Map Reduce and Stream Processing
  12. # if node == root of aggregation tree
      reduce(paneId, countList) {
        hitCount = 0
        for count in countList {
          hitCount += count
        }
        sv = SlideValue.new(paneId)
        sv.hitCount = hitCount
        return sv
      }
      Map Reduce and Stream Processing
  13. # Window slide
      init(slide) {
        rangeValue = RangeValue.new
        rangeValue.hitCount = 0
        return rangeValue
      }
      # Reduce: merge an incoming slide (pane) into the window
      merge(rangeValue, slideValue) {
        rangeValue.hitCount += slideValue.hitCount
      }
      # Slide the window: remove the expired slide (pane)
      unmerge(rangeValue, slideValue) {
        rangeValue.hitCount -= slideValue.hitCount
      }
      Map Reduce and Stream Processing
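      A toy driver, not from the slides, showing how merge/unmerge keep a sliding-window count as per-pane counts arrive (the window size and the dict-based values standing in for RangeValue/SlideValue are assumptions):

          # Illustrative sliding-window driver: merge the newest pane, then
          # unmerge the pane that falls out of the window.
          from collections import deque

          WINDOW_PANES = 3  # assumed window size, measured in panes

          def merge(range_value, slide_value):
              range_value["hitCount"] += slide_value["hitCount"]

          def unmerge(range_value, slide_value):
              range_value["hitCount"] -= slide_value["hitCount"]

          window = {"hitCount": 0}  # init()
          recent = deque()          # panes currently inside the window

          for pane_count in [5, 3, 7, 2, 9]:  # per-pane hit counts from reduce()
              slide = {"hitCount": pane_count}
              recent.append(slide)
              merge(window, slide)
              if len(recent) > WINDOW_PANES:
                  unmerge(window, recent.popleft())
              print(window["hitCount"])  # prints 5, 8, 15, 12, 18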
  14. K-Means Clustering in Map Reduce
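      This is a section-title slide; as an illustration that is not taken from the deck, one k-means iteration can be written in the same map/shuffle/reduce shape (all names below are assumptions):

          # Illustrative single k-means iteration in MapReduce style: map assigns
          # each point to its nearest centroid, reduce averages each group.
          from collections import defaultdict

          def nearest(point, centroids):
              # Index of the centroid closest to the point (squared distance)
              return min(range(len(centroids)),
                         key=lambda i: sum((p - c) ** 2
                                           for p, c in zip(point, centroids[i])))

          def kmeans_iteration(points, centroids):
              groups = defaultdict(list)          # shuffle: centroid id -> points
              for point in points:                # map: emit (centroid id, point)
                  groups[nearest(point, centroids)].append(point)
              new_centroids = list(centroids)
              for cid, pts in groups.items():     # reduce: new centroid = mean
                  new_centroids[cid] = [sum(xs) / len(pts) for xs in zip(*pts)]
              return new_centroids

          points = [[0.0, 0.0], [0.0, 1.0], [9.0, 9.0], [10.0, 10.0]]
          print(kmeans_iteration(points, [[0.0, 0.0], [10.0, 10.0]]))
          # [[0.0, 0.5], [9.5, 9.5]]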
  15. Figure 2: MapReduce Classifier Training and Evaluation Procedure. A Comparison of Approaches for Large-Scale Data Mining.
  16. Google Pregel Graph Processing
  17. Google Pregel Graph Processing
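      These are title/figure slides; as a hedged illustration that is not taken from the deck, Pregel's vertex-centric model can be sketched with the classic maximum-value propagation example (the toy single-process runner below is an assumption):

          # Toy illustration of Pregel-style supersteps: each vertex receives
          # messages, updates its value, sends to neighbors if it changed, and
          # otherwise votes to halt. Every vertex converges to the maximum value.
          def pregel_max(values, edges):
              # Superstep 0: every vertex sends its value to its neighbors.
              inbox = {v: [] for v in values}
              for v, nbrs in edges.items():
                  for n in nbrs:
                      inbox[n].append(values[v])
              # Later supersteps: adopt the max received value; only vertices
              # that changed send again, so the loop ends when no messages flow.
              while any(inbox.values()):
                  outbox = {v: [] for v in values}
                  for v, msgs in inbox.items():
                      if msgs and max(msgs) > values[v]:
                          values[v] = max(msgs)
                          for n in edges.get(v, []):
                              outbox[n].append(values[v])
                  inbox = outbox
              return values

          print(pregel_max({"a": 3, "b": 6, "c": 1},
                           {"a": ["b"], "b": ["a", "c"], "c": ["b"]}))
          # {'a': 6, 'b': 6, 'c': 6}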
