
Map Reduce ~Continuous Map Reduce Design~


  1. 1. MAPREDUCE BASICS: diagram of the mapper, combiner, partitioner, and reducer stages; the shuffle-and-sort step aggregates the values of each key before the reducers run.
  2. 2. MAPREDUCE BASICS: the same diagram as slide 1.
  3. 3. Input Splits → Map → <K, V> → Shuffle → <K, list(V)> → Reduce → <list(V)> → Output Files (a word-count sketch of this flow appears after the slide list).
  4. 4. Figure 1.1: Log processing with the store-first-query-later model; Apache Hadoop [3] is used as an example. Log servers feed a distributed file system (HDFS), users submit queries to the Hadoop MapReduce framework, and results are read back; a companion diagram shows a continuous-MapReduce data processing framework in which users query the cloud servers holding the logs directly. In the traditional store-first-query-later model [17], companies migrate log data from the source nodes to an append-only distributed file system such as GFS [18] or HDFS [3]. The distributed file system replicates the log data for availability and fault tolerance. Once the data is placed in the file system, users can execute queries using bulk-processing frameworks and retrieve results from the distributed file system. Figure 1.1 illustrates this model.
  5. 5. Figure 1: The in-situ MapReduce (iMR) architecture avoids the cost and latency of the store-first-query-later design by moving processing onto the data sources, which matters for workloads such as the speed of social network updates or the accuracy of ad targeting. iMR builds on previous work in stream processing [5, 7, 9]: jobs run over continuous input such as server log files, emit a stream of results, and bound computation over (perhaps infinite) data streams by processing windows of data. The map function is called on each input record, and the reduce function on the list of values v[] that share the same key; the design targets queries that are either highly selective or whose reduce functions are distributive or algebraic aggregates [14]. Users are therefore expected to supply a MapReduce combiner, allowing iMR to merge the values of a single key, distribute processing overhead, and further reduce data volumes. The only non-standard (but optional) interface that MapReduce jobs may implement is described in Section 2.3.2.
  6. 6. Map Reduce and Stream Processing
  7. 7. ",-./#"0-1.2 ! !( !)% !*+ !() E7F/!.:7!2# "#$%& 3014!5 >.GH@0E8. => ?@A => ?@A => ?@A => ?@A => ?@A => ?@A % & & & B B 10,!# %& ( ( )% *+ () 6!7819:7-;,<./,</10<. 10<.# C+$ =>%?@//A =>&?@//A C%$ =>&?@//A =>B?@//A CD CD CD CD +/3,< )+/3,< %&+/3,<
  8. 8. !"#$%" ",-. E7F !"#$" %#&()*!"#$$%&!()*+,-./01 >.)01201*%$$, )*+,-." )*+,-." %#&()* %#&()* &( &( +&,-+.#",&#/0 +&,-+.#",&#/0 )*+,-." )*+,-." )*+,-." )*+,-." %#&()* %#&()* %#&()* %#&()* &( &( &( &( Figure+&,-+.#",&#/0 +&,-+.#",&#/0 +&,-+.#",&#/0 +&,-+.#",&#/0 sub-wi have a
  9. 9. 3: iMR nodes process local log files to produce dows or panes. The system assumes log recordsogical timestamp and arrive in order. !#5 !# & !$ 67 !#5 84 9 !4 & !$ " % " % " % !4&!4 !#&!# !$&!$ (()*("+*,-".*,-")+/"0,1"02*3 :;/0< " " % % :;/0< !# !$ !# !$ =: iMR aggregates individual panes Pi in the net-o produce a result, the root may either combine
  10. 10. Map Reduce and Stream Processing: counting hits per pane in the map function.
      # Called for each hit record
      map(k1, hitRecord) {
        timestamp = hitRecord.time
        # Look up the paneId from the timestamp
        paneId = lookupPane(timestamp)
        if (paneId.endFlag == True) {
          # Notify that all data for this pane has been sent
          notify(paneId)
        }
        emitIntermediate(paneId, 1, timestamp)
      }
  11. 11. Map Reduce and Stream Processing: the combine function pre-aggregates the per-pane counts.
      combine(paneId, countList) {
        hitCount = 0
        for count in countList {
          hitCount += count
        }
        # Send the message to the downstream node
        emitIntermediate(paneId, hitCount)
      }
  12. 12. Map Reduce and Stream Processing: the reduce function, run when the node is the root of the aggregation tree.
      reduce(paneId, countList) {
        hitCount = 0
        for count in countList {
          hitCount += count
        }
        sv = SlideValue.new(paneId)
        sv.hitCount = hitCount
        return sv
      }
  13. 13. Map Reduce and Stream Processing: window initialization plus the merge/unmerge operators that slide the window (a consolidated Python sketch of slides 10-13 follows the slide list).
      # Initialize the value for a window slide
      init(slide) {
        rangeValue = RangeValue.new
        rangeValue.hitCount = 0
        return rangeValue
      }
      # Reduce: add a slide's count into the window total
      merge(rangeValue, slideValue) {
        rangeValue.hitCount += slideValue.hitCount
      }
      # Slide the window: remove an expired slide's count
      unmerge(rangeValue, slideValue) {
        rangeValue.hitCount -= slideValue.hitCount
      }
  14. 14. K-Means Clustering in Map Reduce
  15. 15. Figure 2: MapReduce Classifier Training and Evaluation Procedure. A Comparison of Approaches for Large-Scale Data Mining.
  16. 16. Google Pregel Graph Processing
  17. 17. Google Pregel Graph Processing
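The <K, V> / <K, list(V)> flow on slide 3 can be tried end to end with a tiny in-memory word count. The following is a minimal Python sketch; the function names (map_fn, reduce_fn), the sample splits, and the use of a dictionary in place of a real shuffle are illustrative assumptions, not part of the original deck or of Hadoop's API.

    # Minimal in-memory imitation of the slide-3 pipeline:
    # input splits -> map -> <K, V> -> shuffle -> <K, list(V)> -> reduce -> output
    from collections import defaultdict

    def map_fn(line):
        # Emit an intermediate <word, 1> pair for every token in the split.
        for word in line.split():
            yield word, 1

    def reduce_fn(key, values):
        # Aggregate the list of values that share the same key.
        return key, sum(values)

    if __name__ == "__main__":
        splits = ["map reduce basics", "map shuffle reduce"]  # assumed sample input

        # Map: produce intermediate <K, V> pairs from every split.
        intermediate = [pair for line in splits for pair in map_fn(line)]

        # Shuffle: group values by key into <K, list(V)>.
        grouped = defaultdict(list)
        for key, value in intermediate:
            grouped[key].append(value)

        # Reduce: one output record per key.
        for key in sorted(grouped):
            print(reduce_fn(key, grouped[key]))  # e.g. ('map', 2)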
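Slides 10 through 13 give the per-pane map, combine, and reduce functions and the merge/unmerge operators that maintain a sliding window, but only as isolated pseudocode fragments. The sketch below shows one way the pieces could fit together in plain Python; the pane size, the window width, the lookup_pane helper, and the RangeValue class are assumptions introduced for illustration and are not taken from the slides or from the iMR system itself.

    # Hedged sketch: pane-based hit counting with a sliding window.
    from collections import defaultdict

    PANE_SIZE = 60      # seconds covered by one pane (assumed)
    WINDOW_PANES = 5    # number of panes per sliding window (assumed)

    def lookup_pane(timestamp):
        # Assumed helper: map a record timestamp onto a pane id.
        return timestamp // PANE_SIZE

    def map_hit(hit_record):
        # Slide 10: emit <paneId, 1> for every hit record.
        yield lookup_pane(hit_record["time"]), 1

    def reduce_pane(pane_id, counts):
        # Slides 11-12: sum the per-pane counts. In the slides a combiner does
        # this locally on each node before the aggregation-tree root performs
        # the final sum; here both steps collapse into one call.
        return pane_id, sum(counts)

    class RangeValue:
        # Slide 13: running total for the current window.
        def __init__(self):
            self.hit_count = 0
        def merge(self, pane_count):       # a new pane enters the window
            self.hit_count += pane_count
        def unmerge(self, pane_count):     # an expired pane leaves the window
            self.hit_count -= pane_count

    if __name__ == "__main__":
        # Assumed sample log: hit timestamps in seconds.
        hits = [{"time": t} for t in (3, 65, 70, 121, 190, 260, 320, 330)]

        # Map + shuffle: group the emitted <paneId, 1> pairs by pane.
        grouped = defaultdict(list)
        for record in hits:
            for pane_id, one in map_hit(record):
                grouped[pane_id].append(one)

        # Combine/reduce: one hit count per pane.
        pane_counts = dict(reduce_pane(p, v) for p, v in grouped.items())

        # Slide a WINDOW_PANES-wide window across the panes.
        window, recent = RangeValue(), []
        for pane_id in sorted(pane_counts):
            window.merge(pane_counts[pane_id])
            recent.append(pane_id)
            if len(recent) > WINDOW_PANES:
                window.unmerge(pane_counts[recent.pop(0)])
            print(f"window ending at pane {pane_id}: {window.hit_count} hits")

The merge/unmerge pair is what keeps each window update incremental: only the newest and the oldest pane change when the window slides, so the window total never has to be recomputed from scratch.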
