C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
Paper presentation in DB reading group

  • Repeatedly invoking a Phoenix++ MapReduce job over a stream results in many redundant computations at both the Map and Reduce operators. C-MR processes each input datum only once at the Map operator, and its Combine operator significantly decreases the redundant work performed at the Reduce operator.
  • 1. Data is often generated from a source that can potentially produce an unbounded stream. 2. A stream's contents can only be accessed sequentially. Traditional queries are composed of relational operators that assume a finite data source that can be accessed randomly.

Presentation Transcript

  • C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors. Speaker: LIN Qian (http://www.comp.nus.edu.sg/~linqian)
  • Problem
    • Stream applications are often time-critical
    • Enabling stream support for MapReduce jobs: simple for the Map operations, hard for the Reduce operations
    • Continuously executing MapReduce workflows requires a great deal of coordination
  • C-MR Workflow
    • Windows (sketched below): temporal subdivisions of a stream, described by
      – size (the amount of the stream each window spans)
      – slide (the interval between consecutive windows)
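To make the size/slide description above concrete, here is a minimal sketch assuming time-based windows with half-open interval semantics; `WindowSpec` and `in_window` are illustrative names, not part of the C-MR API.

```cpp
// Hypothetical sketch (not the actual C-MR API): a window described by its
// size and slide, plus a helper testing whether a timestamped tuple belongs
// to the window starting at window_start.
#include <cstdint>
#include <iostream>

struct WindowSpec {
    uint64_t size;   // amount of the stream each window spans (e.g., seconds)
    uint64_t slide;  // interval between the starts of consecutive windows
};

// A tuple with timestamp ts falls into the window [start, start + size).
bool in_window(uint64_t ts, uint64_t start, const WindowSpec& w) {
    return ts >= start && ts < start + w.size;
}

int main() {
    WindowSpec w{60, 10};  // 60-second windows, sliding every 10 seconds
    std::cout << in_window(65, 10, w) << "\n";  // 1: 65 lies inside [10, 70)
    std::cout << in_window(75, 10, w) << "\n";  // 0: 75 lies outside [10, 70)
}
```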
  • C-MR Programming Interface
    • Map/Reduce operations (a rough sketch follows below)
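The slide does not transcribe the actual operation signatures, so the following is only a guess at what continuous Map and Reduce operations could look like for a word-count job; the function names and types here are assumptions, not C-MR's interface.

```cpp
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Map: one input tuple (a line of text) -> zero or more key/value pairs.
std::vector<std::pair<std::string, int>> word_count_map(const std::string& line) {
    std::vector<std::pair<std::string, int>> out;
    size_t pos = 0, next;
    while ((next = line.find(' ', pos)) != std::string::npos) {
        if (next > pos) out.emplace_back(line.substr(pos, next - pos), 1);
        pos = next + 1;
    }
    if (pos < line.size()) out.emplace_back(line.substr(pos), 1);
    return out;
}

// Reduce: all values observed for one key within a window -> one aggregate.
int word_count_reduce(const std::string& /*key*/, const std::vector<int>& counts) {
    int sum = 0;
    for (int c : counts) sum += c;
    return sum;
}

int main() {
    auto pairs = word_count_map("to be or not to be");
    std::cout << pairs.size() << " pairs emitted\n";  // 6
}
```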
  • C-MR Programming Interface (cont.1)
    • Input/Output streams (illustrated below)
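As a rough illustration of what an input or output stream amounts to on a shared-memory multi-core machine, here is a small thread-safe queue of timestamped tuples; `Stream`, `push`, and `pop` are hypothetical names, not C-MR's real stream types.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
struct Stream {
    struct Tuple { uint64_t ts; T value; };

    void push(uint64_t ts, T value) {               // producer side
        { std::lock_guard<std::mutex> lk(m_); q_.push({ts, std::move(value)}); }
        cv_.notify_one();
    }
    Tuple pop() {                                    // consumer side; blocks until data arrives
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        Tuple t = q_.front(); q_.pop();
        return t;
    }

private:
    std::queue<Tuple> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

int main() {
    Stream<int> in;
    in.push(1, 42);
    return in.pop().value == 42 ? 0 : 1;
}
```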
  • C-MR Programming Interface (cont.2)
    • Create workflows of continuous MapReduce jobs (example below)
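Again as an assumption rather than the documented interface, composing a workflow could look like wiring the output stream of one continuous MapReduce job into the input of the next; `Job` and `Workflow` are illustrative placeholders.

```cpp
#include <string>
#include <utility>
#include <vector>

struct Job {
    std::string name;
    // In a real system this would also carry the Map/Reduce operations and window spec.
};

struct Workflow {
    std::vector<Job> stages;
    Workflow& then(Job next) {        // the downstream job consumes the upstream job's output stream
        stages.push_back(std::move(next));
        return *this;
    }
};

int main() {
    Workflow wf;
    wf.then({"tokenize+count"})       // continuous MapReduce job 1
      .then({"top-k per window"});    // job 2 consumes job 1's output stream
}
```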
  • C-MR vs. MapReduce
    • MapReduce computing nodes receive a set of Map or Reduce tasks, and each node must wait for all other nodes to complete their tasks before being allocated additional tasks.
    • C-MR uses pull-based data acquisition (sketched below), allowing computing nodes to execute any Map or Reduce workload as they are able. Thus, straggling nodes will not hinder the progress of the other nodes if there is data available to process elsewhere in the workflow.
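A minimal sketch of the pull-based idea, assuming a single shared work queue: worker threads repeatedly pull whatever Map or Reduce workload is currently available instead of waiting at a barrier for a whole task set to finish.

```cpp
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

std::queue<std::function<void()>> work;   // mixed Map and Reduce workloads
std::mutex work_mutex;

std::optional<std::function<void()>> pull() {
    std::lock_guard<std::mutex> lk(work_mutex);
    if (work.empty()) return std::nullopt;
    auto task = std::move(work.front());
    work.pop();
    return task;
}

void worker() {
    // A node never blocks on other nodes: if any data is ready anywhere in
    // the workflow, the corresponding task can be pulled and executed.
    while (auto task = pull()) (*task)();
}

int main() {
    for (int i = 0; i < 8; ++i)
        work.push([] { /* a Map or Reduce workload over one window */ });
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```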
  • C-MR Architecture
  • Stream and Window Management
    • The merged output streams are not guaranteed to retain their original orderings.
    • Solution: replicating window-bounding punctuations
  • Stream and Window Management (cont.1): A node consumes the punctuation from the sorted input stream buffer.
  • Stream and Window Management (cont.2): Replicate that punctuation to the other nodes.
  • Stream and Window Management (cont.3): After all replicas are received at the intermediate buffer, collect the data whose timestamps fall into the applicable interval and materialize them as a window (sketched below).
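Under the assumption that tuples sit in an intermediate buffer with explicit timestamps, window materialization after the punctuation replicas arrive could look roughly like this; the `materialize` helper is illustrative, not C-MR's actual code.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

struct Tuple { uint64_t ts; int value; };

// Returns the materialized window [window_start, punct_ts) and erases those
// tuples from the intermediate buffer.
std::vector<Tuple> materialize(std::vector<Tuple>& buffer,
                               uint64_t window_start, uint64_t punct_ts) {
    std::vector<Tuple> window, rest;
    for (const Tuple& t : buffer) {
        if (t.ts >= window_start && t.ts < punct_ts) window.push_back(t);
        else rest.push_back(t);
    }
    buffer.swap(rest);
    return window;
}

int main() {
    std::vector<Tuple> buffer{{1, 10}, {4, 20}, {7, 30}};
    auto w = materialize(buffer, 0, 5);   // a punctuation closes the window [0, 5)
    std::cout << w.size() << " tuples in window, "
              << buffer.size() << " still buffered\n";  // 2 in window, 1 buffered
}
```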
  • Operator Scheduling
    • Scheduling framework
      – Execute multiple policies simultaneously
      – Transition between policies based on resource availability
    • Scheduling policies (a switching sketch follows below)
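The slides name the policies but not the transition rule, so the following sketch simply assumes a memory-pressure threshold for switching between an oldest-data-first (ODF) policy and a memory-saving (MEM) policy; the 80% threshold is invented for illustration.

```cpp
#include <cstddef>
#include <iostream>

enum class Policy { ODF, MEM };

Policy choose_policy(std::size_t bytes_in_use, std::size_t budget) {
    // Plenty of headroom: optimize latency by processing the oldest data first.
    // Close to the budget: prefer work that frees intermediate state.
    return (bytes_in_use < budget * 8 / 10) ? Policy::ODF : Policy::MEM;
}

int main() {
    std::cout << (choose_policy(700, 1000) == Policy::ODF) << "\n";  // 1: memory headroom left
    std::cout << (choose_policy(950, 1000) == Policy::MEM) << "\n";  // 1: memory is tight
}
```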
  • Incremental Computation
    Output1 = d1 + d2 + d3 + ... + dn
    Output2 = d2 + d3 + d4 + ... + d(n+1)
    Output3 = d3 + d4 + d5 + ... + d(n+2)
    Output4 = d4 + d5 + d6 + ... + d(n+3)
    Share the common data subset across these computations (worked example below).
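A worked example of the sharing idea above: consecutive window sums overlap in all but one element, so each output can be derived from the previous one by subtracting the tuple that left the window and adding the one that entered, instead of recomputing the whole sum.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> d{3, 1, 4, 1, 5, 9, 2, 6};
    const std::size_t n = 4;                              // window size (in tuples)

    long long output = 0;
    for (std::size_t i = 0; i < n; ++i) output += d[i];   // Output1 = d1 + ... + dn
    std::cout << "Output1 = " << output << "\n";

    for (std::size_t i = 1; i + n <= d.size(); ++i) {
        output += d[i + n - 1] - d[i - 1];                // reuse the shared subset
        std::cout << "Output" << i + 1 << " = " << output << "\n";
    }
}
```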
  • Evaluation
    • Continuously executing a MapReduce job
      – Compare with Phoenix++
  • Evaluation (cont.1)
    • Operator scheduling
      – Oldest data first (ODF)
      – Best memory trade-off (MEM)
      – Hybrid utilization of both policies
  • Evaluation (cont.2)
    • Workflow optimization
  • Evaluation (cont.3)
    • Workflow optimization
      – Latency and throughput
  • Thank you
  • Two Properties of Streams
    • Unbounded
    • Accessed sequentially
    Such streams are hard to handle using a traditional DBMS.
  • Query Operators
    • Unbounded stateful operators: maintain state with no upper bound in size, so they can run out of memory
    • Blocking operators: read an entire input before emitting a single output, so they might never produce a result
      – Never use them, or
      – Use them under a refactoring
  • Punctuations
    • Mark the end of substreams, allowing us to view an infinite stream as a mixture of finite streams