Figure 1.1: Log processing with the store-first-query-later model. Apache Hadoop [3]
is used as an example.
frameworks in a traditional store-first-query-later model [17]. Companies migrate log data from the source nodes to an append-only distributed file system such as GFS [18] or HDFS [3]. The distributed file system replicates the log data for availability and fault-tolerance. Once the data is placed in the file system, users can execute queries using bulk-processing frameworks and retrieve results from the distributed file system. Figure 1.1 illustrates this model.
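To make the latency cost of this model concrete, the sketch below imitates the workflow with a local directory standing in for HDFS and a full scan standing in for the bulk-processing job. The store path, record format, and function names are illustrative assumptions, not part of Hadoop.

    # Minimal sketch of store-first-query-later: logs are first appended
    # to a shared store, and only later does a batch job scan everything.
    import os
    from collections import Counter

    STORE = "log_store"  # local stand-in for an HDFS directory (assumption)

    def ingest(server_id, lines):
        """Step 1: migrate log data from a source node into the store."""
        os.makedirs(STORE, exist_ok=True)
        with open(os.path.join(STORE, f"{server_id}.log"), "a") as f:
            f.writelines(line + "\n" for line in lines)

    def batch_query():
        """Step 2: a bulk job reads the whole store and counts log levels;
        results are only as fresh as the last completed ingest."""
        counts = Counter()
        for name in os.listdir(STORE):
            with open(os.path.join(STORE, name)) as f:
                for line in f:
                    counts[line.split()[0]] += 1
        return counts

    ingest("web-01", ["ERROR disk full", "INFO request served"])
    ingest("web-02", ["INFO request served"])
    print(batch_query())  # Counter({'INFO': 2, 'ERROR': 1})

Nothing is queryable until migration finishes, which is exactly the delay the in-situ design below sets out to remove.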
The map function is applied to each input record, and the reduce function processes the list of values, v[], that share the same key. iMR is designed for queries that are either highly selective or use reduce functions that are distributive or algebraic aggregates [14]. Thus we expect that users supply a MapReduce combiner, allowing the system to merge values of a single key to reduce data volumes and distribute processing overhead across nodes. The combiner allows iMR to process windows incrementally and further reduce data volumes; the only non-standard (but optional) function that MapReduce jobs may implement is described in Section 2.3.2.

Figure 1: The in-situ MapReduce architecture.

The in-situ MapReduce (iMR) architecture avoids the cost and latency of the store-first-query-later design by moving processing onto the data sources, where delays would otherwise limit the speed of social network updates or the accuracy of ad targeting. iMR builds on previous work in stream processing [5, 7, 9] to support MapReduce jobs over continuous input, e.g., server log files. The primary way in which iMR jobs differ from traditional ones is that they emit a stream of results. Like other stream processors [7], iMR bounds computations over continuous (perhaps infinite) data streams by processing sliding windows of data.
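The reliance on distributive or algebraic aggregates above is what makes early merging safe: partial results can be combined in any grouping without changing the final answer, so the combiner can run on the sources and at interior nodes of the aggregation tree. A minimal sketch, where combine is an illustrative name rather than iMR's API:

    # Counting is a distributive aggregate: partial counts over any
    # disjoint split of the values merge into the same final answer.
    def combine(partials):
        """Merge partial counts for one key; safe at any tree level."""
        return sum(partials)

    values = [1] * 6                         # six occurrences of one key
    node_a, node_b = values[:4], values[4:]  # values split across two nodes
    merged = combine([combine(node_a), combine(node_b)])
    assert merged == combine(values) == 6    # early merging changes nothing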
Figure 3: iMR nodes process local log files to produce results as a series of windows or panes. The system assumes log records have a logical timestamp and arrive in order.

Figure 4: iMR aggregates individual panes Pi in the network. To produce a result, the root may either combine the constituent panes or update the prior window.
# Called for each hit record
map(k1, hitRecord) {
  timestamp = hitRecord.time
  # Look up the pane that covers this timestamp
  paneId = lookupPane(timestamp)
  if (paneId.endFlag == True) {
    # Notify downstream that all data for this pane has been sent
    notify(paneId)
  }
  # Emit a count of 1 keyed by the pane
  emitIntermediate(paneId, 1)
}
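The map function above leans on lookupPane to turn a timestamp into a pane id and to flag when a pane is complete. A minimal Python sketch of one way that could work, assuming fixed-width panes and in-order records; PANE_WIDTH and the end-of-pane rule are assumptions, not something the pseudocode specifies:

    PANE_WIDTH = 60   # seconds per pane (assumed)
    _last_pane = None

    def lookup_pane(timestamp):
        """Map a timestamp to a pane id and report when the previous pane
        is complete (possible only because records arrive in order)."""
        global _last_pane
        pane = timestamp // PANE_WIDTH
        end_of_previous = _last_pane is not None and pane > _last_pane
        _last_pane = pane
        return pane, end_of_previous

    print(lookup_pane(30))   # (0, False)  first pane still open
    print(lookup_pane(65))   # (1, True)   pane 0 is now complete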
combine(paneId, countList) {
  hitCount = 0
  for count in countList {
    hitCount += count
  }
  # Send the partial count for this pane to the downstream node
  emitIntermediate(paneId, hitCount)
}
# Runs only at the root of the aggregation tree
reduce(paneId, countList) {
  hitCount = 0
  for count in countList {
    hitCount += count
  }
  # Wrap the pane's total in a slide value for window assembly
  sv = SlideValue.new(paneId)
  sv.hitCount = hitCount
  return sv
}
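Read together, the three functions form a per-pane counting pipeline: map emits a 1 per hit keyed by pane, combine folds partial counts on worker nodes, and the root's reduce merges node partials into one total per pane. A compact Python simulation of that flow over two nodes; the pane width, window size, and data are illustrative, and the window assembly is modeled on the pseudocode rather than taken from a real framework:

    from collections import defaultdict

    PANE_WIDTH = 60    # seconds per pane (assumed, as above)
    WINDOW_PANES = 3   # sliding window spans 3 consecutive panes (assumed)

    def map_hits(timestamps):
        """map(): emit (paneId, 1) per hit record."""
        for ts in timestamps:
            yield ts // PANE_WIDTH, 1

    def combine(pairs):
        """combine(): fold partial counts per pane on a worker node."""
        counts = defaultdict(int)
        for pane, c in pairs:
            counts[pane] += c
        return counts

    def reduce_at_root(per_node_counts):
        """reduce(): merge node partials into one total per pane."""
        totals = defaultdict(int)
        for counts in per_node_counts:
            for pane, c in counts.items():
                totals[pane] += c
        return dict(totals)

    node_a = combine(map_hits([5, 30, 70, 130]))    # hits seen by node A
    node_b = combine(map_hits([10, 65, 125, 140]))  # hits seen by node B
    panes = reduce_at_root([node_a, node_b])
    print(panes)  # {0: 3, 1: 2, 2: 3}

    # A window is just the sum of its constituent panes, which is why
    # panes can be shipped through the tree instead of raw records.
    window = sum(panes.get(p, 0) for p in range(0, WINDOW_PANES))
    print(window)  # 8 hits across panes 0..2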