SlideShare a Scribd company logo
1 of 22
C-MR: Continuously Executing
MapReduce Workflows on Multi-
       Core Processors

         Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian
Problem
• Stream applications are often time-critical
• Enabling stream support for MapReduce
  jobs
  – Simple for the Map operations
  – Hard for the Reduce operations
• Continuously executing MapReduce
  workflows requires a great deal of
  coordination
                                                1
C-MR Workflow




• Windows: temporal subdivisions of a stream
 described by
  – size (the amount of the stream spanning)
  – slide (the interval between windows)
                                               2
C-MR Programming Interface
• Map/Reduce operations
C-MR Programming Interface (cont.1)
• Input/Output streams
C-MR Programming Interface (cont.2)
• Create workflows of continuous
  MapReduce jobs
C-MR vs. MapReduce
• MapReduce computing nodes receive a set of
  Map or Reduce tasks and each node must wait
  for all other nodes to complete their tasks
  before being allocated additional tasks.
• C-MR uses pull-based data acquisition allowing
  computing nodes to execute any Map or
  Reduce workload as they are able. Thus,
  straggling nodes will not hinder the progress of
  the other nodes if there is data available to
  process elsewhere in the workflow.
                                                     6
C-MR Architecture




                    7
Stream and Window Management
• The merged output streams are not
  guaranteed to retain their original
  orderings.
• Solution: Replicating window-bounding
  punctuations
Stream and Window Management (cont.1)




 A node consumes the punctuation from the sorted input
 stream-buffer
                                                         9
Stream and Window Management (cont.2)




 Replicate that punctuation to the other nodes
Stream and Window Management (cont.3)




 After all replicas are received at the intermediate buffer,
 collect data whose timestamps fall into the applicable
 interval and materialize them as a window
Operator Scheduling
• Scheduling framework
  – Execute multiple policies simultaneously
  – Transition between policies based on
    resource availability
• Scheduling policies
Incremental Computation

Output1 = d1 + d2 + d3 + ... + dn
Output2 = d2 + d3 + d4 + ... + dn+1
Output3 = d3 + d4 + d5 + ... + dn+2
Output4 = d4 + d5 + d6 + ... + dn+3

Share the common data subset of computation
Evaluation
• Continuously executing a MapReduce job
  – Compare with Phoenix++




                                           14
Evaluation (cont.1)
• Operator scheduling
  – Oldest data first (ODF)
  – Best memory trade-off (MEM)
  – Hybrid utilization of both policies




                                          15
Evaluation (cont.2)
• Workflow optimization




                                16
Evaluation (cont.3)
• Workflow optimization
  – Latency and throughput




                                 17
Thank you




            18
Two Properties of Streams
• Unbounded
• Accessed sequentially



   Hard to be handled using traditional DBMS




                                               19
Query Operators
• Unbounded stateful operators
  – maintain state with no upper bound in size
   run out of memory
• Blocking operators
  – read an entire input before emitting a
    single output
   might never produce a result
 • Never use them, or
 • Use them under a refactoring
                                             20
Punctuations
• Mark the end of substreams
  – allowing us to view an infinite stream as a
    mixture of finite streams




                                                  21

More Related Content

What's hot

Applications of FME in a Consultant Environment
Applications of FME in a Consultant EnvironmentApplications of FME in a Consultant Environment
Applications of FME in a Consultant EnvironmentSterling Geo
 
Memory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsMemory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsSepidehShirkhanzadeh
 
Assignment of Different-Sized Inputs in MapReduce
Assignment of Different-Sized Inputs in MapReduceAssignment of Different-Sized Inputs in MapReduce
Assignment of Different-Sized Inputs in MapReduceShantanu Sharma
 
Memory management based on MCA
Memory management  based on MCAMemory management  based on MCA
Memory management based on MCAAbhiSaxena16
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce scriptHaripritha
 
Computer center lab
Computer center labComputer center lab
Computer center labManoj Jhawar
 
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SET
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SETA BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SET
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SETcsandit
 
Hadoop Map Reduce OS
Hadoop Map Reduce OSHadoop Map Reduce OS
Hadoop Map Reduce OSVedant Mane
 

What's hot (15)

Applications of FME in a Consultant Environment
Applications of FME in a Consultant EnvironmentApplications of FME in a Consultant Environment
Applications of FME in a Consultant Environment
 
02 Map Reduce
02 Map Reduce02 Map Reduce
02 Map Reduce
 
Hadoop map reduce v2
Hadoop map reduce v2Hadoop map reduce v2
Hadoop map reduce v2
 
Memory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsMemory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware Accelerators
 
2D_BitBlt_Scale
2D_BitBlt_Scale2D_BitBlt_Scale
2D_BitBlt_Scale
 
Assignment of Different-Sized Inputs in MapReduce
Assignment of Different-Sized Inputs in MapReduceAssignment of Different-Sized Inputs in MapReduce
Assignment of Different-Sized Inputs in MapReduce
 
Vector computing
Vector computingVector computing
Vector computing
 
Memory management based on MCA
Memory management  based on MCAMemory management  based on MCA
Memory management based on MCA
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce script
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Unit3 MapReduce
 
Aca2 08 new
Aca2 08 newAca2 08 new
Aca2 08 new
 
Computer center lab
Computer center labComputer center lab
Computer center lab
 
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SET
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SETA BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SET
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SET
 
Aca2 09 new
Aca2 09 newAca2 09 new
Aca2 09 new
 
Hadoop Map Reduce OS
Hadoop Map Reduce OSHadoop Map Reduce OS
Hadoop Map Reduce OS
 

Viewers also liked

In-situ MapReduce for Log Processing
In-situ MapReduce for Log ProcessingIn-situ MapReduce for Log Processing
In-situ MapReduce for Log ProcessingQian Lin
 
Kineograph: Taking the Pulse of a Fast-Changing and Connected World
Kineograph: Taking the Pulse of a Fast-Changing and Connected WorldKineograph: Taking the Pulse of a Fast-Changing and Connected World
Kineograph: Taking the Pulse of a Fast-Changing and Connected WorldQian Lin
 
C-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the CloudC-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the CloudQian Lin
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
 
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse MatricesPresto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse MatricesQian Lin
 
Optimizing Virtual Machines Using Hybrid Virtualization
Optimizing Virtual Machines Using Hybrid VirtualizationOptimizing Virtual Machines Using Hybrid Virtualization
Optimizing Virtual Machines Using Hybrid VirtualizationQian Lin
 
Trinity: A Distributed Graph Engine on a Memory Cloud
Trinity: A Distributed Graph Engine on a Memory CloudTrinity: A Distributed Graph Engine on a Memory Cloud
Trinity: A Distributed Graph Engine on a Memory CloudQian Lin
 

Viewers also liked (7)

In-situ MapReduce for Log Processing
In-situ MapReduce for Log ProcessingIn-situ MapReduce for Log Processing
In-situ MapReduce for Log Processing
 
Kineograph: Taking the Pulse of a Fast-Changing and Connected World
Kineograph: Taking the Pulse of a Fast-Changing and Connected WorldKineograph: Taking the Pulse of a Fast-Changing and Connected World
Kineograph: Taking the Pulse of a Fast-Changing and Connected World
 
C-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the CloudC-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the Cloud
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse MatricesPresto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
 
Optimizing Virtual Machines Using Hybrid Virtualization
Optimizing Virtual Machines Using Hybrid VirtualizationOptimizing Virtual Machines Using Hybrid Virtualization
Optimizing Virtual Machines Using Hybrid Virtualization
 
Trinity: A Distributed Graph Engine on a Memory Cloud
Trinity: A Distributed Graph Engine on a Memory CloudTrinity: A Distributed Graph Engine on a Memory Cloud
Trinity: A Distributed Graph Engine on a Memory Cloud
 

Similar to C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors

Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukAndrii Vozniuk
 
High Performance Computer Architecture
High Performance Computer ArchitectureHigh Performance Computer Architecture
High Performance Computer ArchitectureSubhasis Dash
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5RojaT4
 
The Legion Programming Model for HPC
The Legion Programming Model for HPCThe Legion Programming Model for HPC
The Legion Programming Model for HPCinside-BigData.com
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
Pregel reading circle
Pregel reading circlePregel reading circle
Pregel reading circlecharlingual
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance IssuesAntonios Katsarakis
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduceM Baddar
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Vincenzo Gulisano
 
IBM MQ Disaster Recovery
IBM MQ Disaster RecoveryIBM MQ Disaster Recovery
IBM MQ Disaster RecoveryMarkTaylorIBM
 
Lec 4 (program and network properties)
Lec 4 (program and network properties)Lec 4 (program and network properties)
Lec 4 (program and network properties)Sudarshan Mondal
 
Leveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersLeveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersRan Ziv
 
Taming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsTaming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsEMC
 
Deterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System ArchitectureDeterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System ArchitectureHeechul Yun
 

Similar to C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors (20)

Disco workshop
Disco workshopDisco workshop
Disco workshop
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
High Performance Computer Architecture
High Performance Computer ArchitectureHigh Performance Computer Architecture
High Performance Computer Architecture
 
try
trytry
try
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5
 
The Legion Programming Model for HPC
The Legion Programming Model for HPCThe Legion Programming Model for HPC
The Legion Programming Model for HPC
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Pregel reading circle
Pregel reading circlePregel reading circle
Pregel reading circle
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Hadoop and Spark
Hadoop and SparkHadoop and Spark
Hadoop and Spark
 
Architectures for parallel
Architectures for parallelArchitectures for parallel
Architectures for parallel
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
 
IBM MQ Disaster Recovery
IBM MQ Disaster RecoveryIBM MQ Disaster Recovery
IBM MQ Disaster Recovery
 
Lec 4 (program and network properties)
Lec 4 (program and network properties)Lec 4 (program and network properties)
Lec 4 (program and network properties)
 
Leveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersLeveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive Clusters
 
Pregel
PregelPregel
Pregel
 
Taming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsTaming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data Analytics
 
Deterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System ArchitectureDeterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System Architecture
 

Recently uploaded

Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 

Recently uploaded (20)

Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 

C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors

  • 1. C-MR: Continuously Executing MapReduce Workflows on Multi- Core Processors Speaker: LIN Qian http://www.comp.nus.edu.sg/~linqian
  • 2. Problem • Stream applications are often time-critical • Enabling stream support for MapReduce jobs – Simple for the Map operations – Hard for the Reduce operations • Continuously executing MapReduce workflows requires a great deal of coordination 1
  • 3. C-MR Workflow • Windows: temporal subdivisions of a stream described by – size (the amount of the stream spanning) – slide (the interval between windows) 2
  • 4. C-MR Programming Interface • Map/Reduce operations
  • 5. C-MR Programming Interface (cont.1) • Input/Output streams
  • 6. C-MR Programming Interface (cont.2) • Create workflows of continuous MapReduce jobs
  • 7. C-MR vs. MapReduce • MapReduce computing nodes receive a set of Map or Reduce tasks and each node must wait for all other nodes to complete their tasks before being allocated additional tasks. • C-MR uses pull-based data acquisition allowing computing nodes to execute any Map or Reduce workload as they are able. Thus, straggling nodes will not hinder the progress of the other nodes if there is data available to process elsewhere in the workflow. 6
  • 9. Stream and Window Management • The merged output streams are not guaranteed to retain their original orderings. • Solution: Replicating window-bounding punctuations
  • 10. Stream and Window Management (cont.1) A node consumes the punctuation from the sorted input stream-buffer 9
  • 11. Stream and Window Management (cont.2) Replicate that punctuation to the other nodes
  • 12. Stream and Window Management (cont.3) After all replicas are received at the intermediate buffer, collect data whose timestamps fall into the applicable interval and materialize them as a window
  • 13. Operator Scheduling • Scheduling framework – Execute multiple policies simultaneously – Transition between policies based on resource availability • Scheduling policies
  • 14. Incremental Computation Output1 = d1 + d2 + d3 + ... + dn Output2 = d2 + d3 + d4 + ... + dn+1 Output3 = d3 + d4 + d5 + ... + dn+2 Output4 = d4 + d5 + d6 + ... + dn+3 Share the common data subset of computation
  • 15. Evaluation • Continuously executing a MapReduce job – Compare with Phoenix++ 14
  • 16. Evaluation (cont.1) • Operator scheduling – Oldest data first (ODF) – Best memory trade-off (MEM) – Hybrid utilization of both policies 15
  • 18. Evaluation (cont.3) • Workflow optimization – Latency and throughput 17
  • 19. Thank you 18
  • 20. Two Properties of Streams • Unbounded • Accessed sequentially Hard to be handled using traditional DBMS 19
  • 21. Query Operators • Unbounded stateful operators – maintain state with no upper bound in size  run out of memory • Blocking operators – read an entire input before emitting a single output  might never produce a result • Never use them, or • Use them under a refactoring 20
  • 22. Punctuations • Mark the end of substreams – allowing us to view an infinite stream as a mixture of finite streams 21

Editor's Notes

  1. Repeatedly invoking a Phoenix++ MapReduce job over a stream results in many redundant computations (at both Map and Reduce operations). C-MR allows data to be processed only once by Map and the inclusion of the Combine operator significantly decreases redundant work performed at the Reduce operator.
  2. 1. Data is often generated from a source that can potentially produce an unbounded stream.2. A stream’s contents can only be accessed sequentially.Traditional queries are comprised of relational operators that assume a finite data source that can be accessed randomly.