C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors

         Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian
Problem
• Stream applications are often time-critical
• Enabling stream support for MapReduce
  jobs
  – Simple for the Map operations
  – Hard for the Reduce operations
• Continuously executing MapReduce
  workflows requires a great deal of
  coordination
C-MR Workflow




• Windows: temporal subdivisions of a stream,
  described by
  – size (how much of the stream each window spans)
  – slide (the interval between the starts of consecutive windows)
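The size/slide definition can be illustrated with a small sketch. It assumes integer timestamps and half-open intervals; the function names `window_bounds` and `materialize` are hypothetical and are not part of the C-MR interface.

```python
from typing import Iterator, List, Tuple


def window_bounds(start: int, end: int, size: int, slide: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [lo, hi) intervals of length `size`, spaced `slide` apart."""
    lo = start
    while lo + size <= end:
        yield (lo, lo + size)
        lo += slide


def materialize(records: List[Tuple[int, str]], size: int, slide: int) -> List[List[str]]:
    """Group (timestamp, value) records, sorted by timestamp, into windows."""
    if not records:
        return []
    first = records[0][0]
    last = records[-1][0] + 1  # exclusive upper bound of the observed stream
    return [
        [value for ts, value in records if lo <= ts < hi]
        for lo, hi in window_bounds(first, last, size, slide)
    ]


records = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e"), (5, "f")]
windows = materialize(records, size=4, slide=2)
```

With size 4 and slide 2, consecutive windows overlap by two time units, so "c" and "d" appear in both windows.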
C-MR Programming Interface
• Map/Reduce operations
C-MR Programming Interface (cont.1)
• Input/Output streams
C-MR Programming Interface (cont.2)
• Create workflows of continuous
  MapReduce jobs
C-MR vs. MapReduce
• MapReduce computing nodes receive a set of
  Map or Reduce tasks and each node must wait
  for all other nodes to complete their tasks
  before being allocated additional tasks.
• C-MR uses pull-based data acquisition allowing
  computing nodes to execute any Map or
  Reduce workload as they are able. Thus,
  straggling nodes will not hinder the progress of
  the other nodes if there is data available to
  process elsewhere in the workflow.
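The pull-based idea can be sketched with a shared work queue: each worker thread takes the next available task, Map or Reduce, as soon as it is free, so a slow worker never forces the others to wait at a barrier. This is a minimal illustration using Python threads; `run_pull_based` and the task tuple layout are assumptions made for the example, not C-MR's actual implementation.

```python
import queue
import threading


def run_pull_based(tasks, num_workers=4):
    """Run (kind, function, argument) tasks; each worker pulls work when idle."""
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                kind, fn, arg = work.get_nowait()  # pull the next task, if any
            except queue.Empty:
                return  # no work left anywhere in the queue
            out = fn(arg)
            with lock:
                results.append((kind, out))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


tasks = [("map", lambda x: x * 2, i) for i in range(5)] + [("reduce", sum, [1, 2, 3])]
results = run_pull_based(tasks)
```

The order of `results` depends on thread timing, which is exactly the property that later makes window-bounding punctuations necessary.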
C-MR Architecture




Stream and Window Management
• The merged output streams are not
  guaranteed to retain their original
  orderings.
• Solution: Replicating window-bounding
  punctuations
Stream and Window Management (cont.1)




 A node consumes the punctuation from the sorted input
 stream-buffer
Stream and Window Management (cont.2)




 Replicate that punctuation to the other nodes
Stream and Window Management (cont.3)




 After all replicas are received at the intermediate buffer,
 collect data whose timestamps fall into the applicable
 interval and materialize them as a window
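The three steps above can be sketched as follows. The sketch is single-threaded and makes several simplifying assumptions: windows are non-overlapping, timestamps start at 0, and punctuations complete in timestamp order. The class name `IntermediateBuffer` and its methods are hypothetical.

```python
from collections import defaultdict


class IntermediateBuffer:
    """Collects out-of-order data and materializes a window once every
    node has delivered its replica of the bounding punctuation."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.data = []                     # (timestamp, value) in arrival order
        self.replicas = defaultdict(set)   # punctuation timestamp -> node ids seen
        self.last_bound = 0                # assumption: the stream starts at time 0
        self.windows = []

    def add_data(self, ts, value):
        self.data.append((ts, value))

    def add_punctuation(self, ts, node_id):
        self.replicas[ts].add(node_id)
        if len(self.replicas[ts]) < self.num_nodes:
            return  # some node may still emit data older than this punctuation
        lo, hi = self.last_bound, ts
        # Collect data whose timestamps fall into [lo, hi) and restore their order.
        self.windows.append(sorted(item for item in self.data if lo <= item[0] < hi))
        self.data = [item for item in self.data if item[0] >= hi]
        self.last_bound = hi
        del self.replicas[ts]


buf = IntermediateBuffer(num_nodes=2)
buf.add_data(3, "c")
buf.add_data(1, "a")
buf.add_punctuation(5, node_id=0)   # first replica: window not yet complete
pending = list(buf.windows)
buf.add_data(4, "d")                # late data from the slower node
buf.add_data(7, "x")
buf.add_punctuation(5, node_id=1)   # last replica: window [0, 5) is materialized
```

Waiting for all replicas is what guarantees that the late item with timestamp 4 still lands in the correct window.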
Operator Scheduling
• Scheduling framework
  – Execute multiple policies simultaneously
  – Transition between policies based on
    resource availability
• Scheduling policies
Incremental Computation

Output_1 = d_1 + d_2 + d_3 + ... + d_n
Output_2 = d_2 + d_3 + d_4 + ... + d_{n+1}
Output_3 = d_3 + d_4 + d_5 + ... + d_{n+2}
Output_4 = d_4 + d_5 + d_6 + ... + d_{n+3}

Share the computation over the data subset common to consecutive outputs
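A minimal sketch of the sharing idea for the sums above: each output reuses the previous one, adding the newest element and removing the oldest, instead of re-adding all n elements. This relies on addition being invertible, which is an assumption of this example; it is not necessarily the mechanism C-MR uses, and the function name `sliding_sums` is hypothetical.

```python
def sliding_sums(data, n):
    """Return the sums of every length-n window (slide 1), computed incrementally."""
    if n <= 0 or len(data) < n:
        return []
    total = sum(data[:n])            # first output: computed from scratch
    outputs = [total]
    for i in range(n, len(data)):
        total += data[i] - data[i - n]   # reuse the shared part d_{i-n+1} .. d_{i-1}
        outputs.append(total)
    return outputs


outputs = sliding_sums([1, 2, 3, 4, 5], n=3)
```

Each output after the first costs two arithmetic operations regardless of the window size.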
Evaluation
• Continuously executing a MapReduce job
  – Compare with Phoenix++




Evaluation (cont.1)
• Operator scheduling
  – Oldest data first (ODF)
  – Best memory trade-off (MEM)
  – Hybrid utilization of both policies
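The policies named above can be illustrated with a rough sketch. Everything specific here is an assumption for illustration: the `Operator` fields, the way "memory trade-off" is quantified, and the threshold rule used to switch between policies are not taken from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operator:
    name: str
    oldest_ts: int            # timestamp of the oldest item waiting at this operator
    mem_freed_per_ms: float   # hypothetical metric: memory released per ms of work


def pick_odf(ops: List[Operator]) -> Operator:
    """Oldest data first: favor the operator holding the oldest pending item."""
    return min(ops, key=lambda op: op.oldest_ts)


def pick_mem(ops: List[Operator]) -> Operator:
    """Best memory trade-off: favor the operator that frees memory fastest."""
    return max(ops, key=lambda op: op.mem_freed_per_ms)


def pick_hybrid(ops: List[Operator], mem_used: float, mem_limit: float,
                threshold: float = 0.8) -> Operator:
    """Assumed switching rule: use ODF until memory runs short, then MEM."""
    if mem_used >= threshold * mem_limit:
        return pick_mem(ops)
    return pick_odf(ops)


ops = [
    Operator("map", oldest_ts=10, mem_freed_per_ms=0.5),
    Operator("reduce", oldest_ts=4, mem_freed_per_ms=0.1),
    Operator("combine", oldest_ts=7, mem_freed_per_ms=2.0),
]
relaxed = pick_hybrid(ops, mem_used=100, mem_limit=1000)
pressured = pick_hybrid(ops, mem_used=900, mem_limit=1000)
```

Under low memory pressure the sketch picks the operator with the oldest data (favoring latency); under high pressure it picks the one that frees memory fastest.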




Evaluation (cont.2)
• Workflow optimization




Evaluation (cont.3)
• Workflow optimization
  – Latency and throughput




Thank you




Two Properties of Streams
• Unbounded
• Accessed sequentially

→ Hard to handle with a traditional DBMS




Query Operators
• Unbounded stateful operators
  – maintain state with no upper bound in size
  → may run out of memory
• Blocking operators
  – read an entire input before emitting a
    single output
  → might never produce a result
• Either never use them, or
• Use them only after refactoring
Punctuations
• Mark the end of substreams
  – allowing us to view an infinite stream as a
    mixture of finite streams
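As a small illustration of this view, the generator below cuts a possibly infinite stream into finite substreams at each punctuation. The sentinel `PUNCT` and the function name are hypothetical and chosen only for this sketch.

```python
import itertools

PUNCT = object()  # hypothetical sentinel marking the end of a substream


def split_on_punctuation(stream):
    """Lazily yield one finite list per punctuation-terminated substream."""
    current = []
    for item in stream:
        if item is PUNCT:
            yield current
            current = []
        else:
            current.append(item)


def endless():
    """An unbounded stream: 0, 1, PUNCT, 2, 3, PUNCT, ..."""
    for i in itertools.count():
        yield i
        if i % 2 == 1:
            yield PUNCT


first_three = list(itertools.islice(split_on_punctuation(endless()), 3))
```

Because the generator is lazy, a downstream operator can consume each finite substream as soon as its punctuation arrives, even though the source never ends.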





Editor's Notes

  1. Repeatedly invoking a Phoenix++ MapReduce job over a stream results in many redundant computations (in both the Map and Reduce operations). C-MR lets each data item be processed only once by Map, and the inclusion of the Combine operator significantly decreases the redundant work performed at the Reduce operator.
  2. (1) Data is often generated from a source that can potentially produce an unbounded stream. (2) A stream’s contents can only be accessed sequentially. Traditional queries are composed of relational operators that assume a finite data source that can be accessed randomly.