# Map Reduce ~Continuous Map Reduce Design~


1. Chapter 2, "MapReduce Basics". (Figure: the full MapReduce pipeline, in which mappers emit key-value pairs, combiners pre-aggregate them locally, and partitioners assign keys to reducers; the shuffle-and-sort phase aggregates values by key.)
2. (The same "MapReduce Basics" figure, repeated.)
3. Dataflow: Input Splits → Map (many parallel tasks) → <K, V> → Shuffle → <K, list(V)> → Reduce (many parallel tasks) → <list(V)> → Output Files
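To make that dataflow concrete, here is a minimal word-count sketch in plain Python (not from the slides): map emits <K, V> pairs from each split, a shuffle step groups them into <K, list(V)>, and reduce folds each group into the final output.

```python
from collections import defaultdict

def map_fn(split):
    # Emit a <K, V> pair for every word in the input split.
    for word in split.split():
        yield word, 1

def shuffle(pairs):
    # Group intermediate pairs into <K, list(V)>.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_fn(key, values):
    # Fold the value list for one key into a single output record.
    return key, sum(values)

splits = ["map reduce basics", "continuous map reduce design"]
pairs = [kv for split in splits for kv in map_fn(split)]
output = [reduce_fn(k, vs) for k, vs in shuffle(pairs).items()]
# [('map', 2), ('reduce', 2), ('basics', 1), ('continuous', 1), ('design', 1)]
```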
4. Figure 1.1: Log processing with the store-first-query-later model; Apache Hadoop [3] is used as an example. (The figure shows log servers feeding HDFS, with users querying a Hadoop MapReduce data processing framework, alongside a continuous-MapReduce variant.) In the traditional store-first-query-later model [17], companies migrate log data from the source nodes to an append-only distributed file system such as GFS [18] or HDFS [3]. The distributed file system replicates the log data for availability and fault tolerance. Once the data is placed in the file system, users can execute queries with bulk-processing frameworks and retrieve results from the distributed file system.
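A minimal sketch of that store-first-query-later flow, with plain local files standing in for a replicated DFS such as HDFS (all names hypothetical):

```python
import glob, os

def store(log_line, path="logs/part-0.log"):
    # Store first: in the real model this append goes to a replicated,
    # append-only distributed file system (GFS/HDFS), not a local file.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a") as f:
        f.write(log_line + "\n")

def bulk_query(pattern="logs/*.log"):
    # Query later: the batch job sees only data already stored, so results
    # lag the live log stream by the store-then-process delay.
    return sum(1 for part in glob.glob(pattern)
                 for _ in open(part))
```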
5. Figure 1: The in-situ MapReduce architecture avoids the cost and latency of the store-first-query-later design by moving processing onto the data sources. The map function is called for each input record, and the reduce function processes the list of values v[] that share the same key. iMR targets queries that are either highly selective or whose reduce functions are distributive or algebraic aggregates [14]; users are therefore expected to supply a MapReduce combiner, which lets nodes merge the values of a single key, distribute processing overhead, process windows incrementally, and further reduce data volumes through in-network aggregation. The only non-standard (but optional) interface that MapReduce jobs may implement is described in Section 2.3.2. Unlike conventional jobs, iMR jobs emit a stream of results over continuous input such as server log files, and, like stream processors [7], iMR bounds computation over (perhaps infinite) data streams by processing windows of data. The in-situ MapReduce (iMR) architecture builds on previous work in stream processing [5, 7, 9].
6. Map Reduce and Stream Processing
7. 7. ",-./#"0-1.2 ! !( !)% !*+ !() E7F/!.:7!2# "#\$%& 3014!5 >.GH@0E8. => ?@A => ?@A => ?@A => ?@A => ?@A => ?@A % & & & B B 10,!# %& ( ( )% *+ () 6!7819:7-;,<./,</10<. 10<.# C+\$ =>%?@//A =>&?@//A C%\$ =>&?@//A =>B?@//A CD CD CD CD +/3,< )+/3,< %&+/3,<
8. 8. !"#\$%" ",-. E7F !"#\$" %#&()*!"#\$\$%&!()*+,-./01 >.)01201*%\$\$, )*+,-." )*+,-." %#&()* %#&()* &( &( +&,-+.#",&#/0 +&,-+.#",&#/0 )*+,-." )*+,-." )*+,-." )*+,-." %#&()* %#&()* %#&()* %#&()* &( &( &( &( Figure+&,-+.#",&#/0 +&,-+.#",&#/0 +&,-+.#",&#/0 +&,-+.#",&#/0 sub-wi have a
9. 9. 3: iMR nodes process local log ﬁles to produce dows or panes. The system assumes log recordsogical timestamp and arrive in order. !#5 !# & !\$ 67 !#5 84 9 !4 & !\$ " % " % " % !4&!4 !#&!# !\$&!\$ (()*("+*,-".*,-")+/"0,1"02*3 :;/0< " " % % :;/0< !# !\$ !# !\$ =: iMR aggregates individual panes Pi in the net-o produce a result, the root may either combine
10. Map Reduce and Stream Processing (map):

```
# Called for each hit record
map(k1, hitRecord) {
  timestamp = hitRecord.time
  # Look up the pane id from the timestamp
  paneId = lookupPane(timestamp)
  if (paneId.endFlag == True) {
    # Notify that all data for this pane has been sent
    notify(paneId)
  }
  emitIntermediate(paneId, 1, timestamp)
}
```
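The slide leaves lookupPane and the pane's endFlag undefined. The sketch below (hypothetical names, Python) assumes fixed-width time panes and uses the in-order arrival guarantee stated on slide 9 to detect when a pane is complete.

```python
PANE_WIDTH = 60  # assumed pane width in seconds

def lookup_pane(timestamp):
    # Records whose timestamps fall into the same bucket share a pane id.
    return int(timestamp) // PANE_WIDTH

last_seen = None

def pane_ended(pane_id):
    # Because records arrive in timestamp order, the previous pane is known
    # to be complete as soon as a record from a later pane shows up; this
    # plays the role of the slide's paneId.endFlag.
    global last_seen
    ended = last_seen is not None and pane_id > last_seen
    last_seen = pane_id
    return ended
```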
11. Map Reduce and Stream Processing (combine):

```
combine(paneId, countList) {
  hitCount = 0
  for count in countList {
    hitCount += count
  }
  # Send the merged count to the downstream node
  emitIntermediate(paneId, hitCount)
}
```
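Because this combiner is a plain associative sum, partial counts for the same pane can be merged at any level of iMR's aggregation tree before reaching the root. A small Python illustration (hypothetical values):

```python
def combine(pane_id, count_list):
    # Same logic as the slide: collapse a list of partial counts into one.
    return pane_id, sum(count_list)

# Two log servers each pre-aggregate pane 7 locally...
_, left  = combine(7, [1, 1, 1])   # 3 hits on server A
_, right = combine(7, [1, 1])      # 2 hits on server B
# ...and an interior node merges the partial results on the way up.
print(combine(7, [left, right]))   # (7, 5)
```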
12. Map Reduce and Stream Processing (reduce):

```
# if node == root of aggregation tree
reduce(paneId, countList) {
  hitCount = 0
  for count in countList {
    hitCount += count
  }
  sv = SlideValue.new(paneId)
  sv.hitCount = hitCount
  return sv
}
```
13. Map Reduce and Stream Processing (window slide):

```
# Window slide
init(slide) {
  rangeValue = RangeValue.new
  rangeValue.hitCount = 0
  return rangeValue
}

# Reduce
merge(rangeValue, slideValue) {
  rangeValue.hitCount += slideValue.hitCount
}

# Slide window
unmerge(rangeValue, slideValue) {
  rangeValue.hitCount -= slideValue.hitCount
}
```
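The point of init/merge/unmerge is that the window's running total is maintained incrementally: each pane is added as it enters the window and subtracted as it slides out, instead of re-summing the whole window every slide. A Python rendering with hypothetical pane counts:

```python
class RangeValue:
    # Running aggregate for the current window, as in the slide's init().
    def __init__(self):
        self.hit_count = 0

def merge(range_value, pane_count):       # a pane enters the window
    range_value.hit_count += pane_count

def unmerge(range_value, pane_count):     # a pane slides out of the window
    range_value.hit_count -= pane_count

WINDOW = 3                                # window size in panes (assumed)
pane_counts = [5, 2, 7, 4, 1]             # per-pane hit counts (example data)
window, results = RangeValue(), []
for i, count in enumerate(pane_counts):
    merge(window, count)
    if i >= WINDOW:
        unmerge(window, pane_counts[i - WINDOW])
    if i >= WINDOW - 1:
        results.append(window.hit_count)  # [14, 13, 12]
```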
14. K-Means Clustering in Map Reduce
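The slides do not include the K-means code itself; as a sketch, one iteration maps each point to its nearest centroid and reduces each group to its mean, with a driver re-broadcasting the updated centroids until they converge (all names are illustrative):

```python
def kmeans_map(point, centroids):
    # Emit <centroid id, point> for the centroid nearest to this point.
    nearest = min(range(len(centroids)),
                  key=lambda i: sum((p - c) ** 2
                                    for p, c in zip(point, centroids[i])))
    yield nearest, point

def kmeans_reduce(centroid_id, points):
    # Recompute the centroid as the mean of its assigned points.
    n = len(points)
    return centroid_id, [sum(dim) / n for dim in zip(*points)]

def kmeans_step(points, centroids):
    groups = {}
    for point in points:
        for cid, p in kmeans_map(point, centroids):
            groups.setdefault(cid, []).append(p)      # the shuffle phase
    return [kmeans_reduce(cid, ps)[1] for cid, ps in sorted(groups.items())]

points = [[0.0, 0.0], [0.5, 0.2], [9.0, 9.0], [8.5, 9.5]]
print(kmeans_step(points, [[0.0, 0.0], [9.0, 9.0]]))
# [[0.25, 0.1], [8.75, 9.25]]
```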
15. Figure 2: MapReduce Classifier Training and Evaluation Procedure (from "A Comparison of Approaches for Large-Scale Data Mining")
16. Google Pregel Graph Processing
17. Google Pregel Graph Processing
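Pregel computes over graphs in vertex-centric supersteps: each active vertex consumes its incoming messages, updates its value, sends messages along its edges, and votes to halt when nothing changes. A compact Python sketch of the classic maximum-value propagation example (illustrative, not Google's API):

```python
def propagate_max(values, neighbors):
    # values: vertex -> initial value; neighbors: vertex -> adjacent vertices.
    active, inbox, first = set(values), {v: [] for v in values}, True
    while active:
        outbox, next_active = {v: [] for v in values}, set()
        for v in values:
            if v not in active and not inbox[v]:
                continue                       # vertex has voted to halt
            new_val = max([values[v]] + inbox[v])
            if first or new_val > values[v]:
                values[v] = new_val
                for u in neighbors[v]:
                    outbox[u].append(new_val)  # delivered next superstep
                next_active.add(v)
        inbox, active, first = outbox, next_active, False
    return values

# Every vertex converges to the maximum value in its connected component.
print(propagate_max({1: 3, 2: 6, 3: 2}, {1: [2], 2: [1, 3], 3: [2]}))
# {1: 6, 2: 6, 3: 6}
```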