SlideShare a Scribd company logo
Nam-Luc Tran
Sabri Skhiri
Arthur Lesuisse
Esteban Zimanyi
Presented by: Raminder Kaur
Wayne State University
 Introduction
 Current Programming models
 Framework
 Architecture of AROM
 Features and Execution
 Result
 Future Work
 Conclusion
 References
Wayne State University
This paper proposes a better tradeoff between implicit
distributed programming, job efficiency and openness in the
design of the processing algorithm.
AROM, a new open source distributed processing framework
is introduced in which jobs are defined using directed acyclic
graphs.
Wayne State University
 The Map Reduce model
- Pros and Cons
 The DFG model
- Pros and Cons
 Pipelining Considerations
Wayne State University
Composed of phases:
 Map phase - Map function executed on every key/value
pair of input data which further produces intermediate
key/value pairs.
 Shuffle phase – Intermediate key/value pairs are sorted
and grouped by the key.
 Reduce phase – Reduce function is applied individually
on each key and the associated values of key. This
phase produces zero or one output.
 Intermediate outputs between each phase are stored on
the filesystem.
Wayne State University
 Data Flow Graph is a directed acyclic graph where each vertex
represents a program and edges represent data channels.
 At runtime the vertices of the DFG will be scheduled to run
on the available machines of the cluster.
 Edges represent the flow of the processed data through the
vertices.
 Data communication between the vertices is abstracted from
the user and can be physically implemented in several ways
(e.g. network sockets, local filesystem, asynchronous I/O
mechanisms...).
 No data model is usually imposed, it is up to the user to
handle the input and output formats for each vertex program.
Wayne State University
MapReduce model:
Pros:
 Convenient to work with
 Simple for most users to implement
Cons:
 Sort phase is not necessarily required for each job
 Shuffle phase represents one of the most restrictive weakness of the
MapReduce model.
 Joins in particular have proven to be cumbersome to express.
back
Wayne State University
DFG Model
Pros:
 Provides the opportunity to implement jobs that are not constrained by a strict
Map-Shuffle-Reduce schema.
 As vertex programs are able receive multiple inputs, it is possible to implement
relational operations such as joins in a natural and distributed fashion.
 Since the jobs are not framed in a particular sequence of phases, intermediate
output data do not have to be stored into the filesystem.
 DFG model is less constrained than the MapReduce model. It is more
convenient to compile a job from a higher-level language.
Cons:
 DFG model is more general than the MapReduce model which can be seen as a
restriction on the DFG model.
go back
Wayne State University
Even if we could implement pipelined jobs in MapReduce, this is not as natural
nor efficient when compared to a pipeline defined using DFG where each stage
can be represented by a group of vertices.
 The stages of the the pipelines would consist of Map- only jobs.
 The mandatory shuffle phase may not even be useful for the pipeline stages.
 The back and forth coming of the intermediate data on the distributed
filesystem can also cause a significant performance penalty.
 Additional code is also required for coordinating and chaining the separate
MapReduce stages and in the end the iterations are diluted, making the code
less obvious to understand.
 Each stage of the pipeline in MapReduce needs to wait for the completion of
the previous stage in order to begin the processing.
Wayne State University
The requirements of the framework are the following:
1) Provide a coherent API to express jobs as DFG and use a
programming paradigm which favors reusable and generic
operators.
2) Base the architecture on asynchronous actors and event-
driven constructs which make it suitable for large scale
deployment on Cloud Computing environments.
The vision behind AROM is to provide a generic-purpose
environment for testing and prototyping on distributed parallel
processing jobs and models.
Wayne State University
 Job Definition
 Operators and Data Model
 Processing Stages
 Enforcing Genericity and Reusability of Operators
 Operator Types
 Job Scheduling and Execution
 Specificities
next
Wayne State University
 Job Definition:
- Jobs described by Data Flow Graphs
- A vertex (also called operator), receives data on its input, processes it and emits the
resulting data to the next operator.
- Edges between the vertices depict the flow of the data between the operators.
 Operators and Data Model:
- An operator can receive and emit data on multiple inputs.
- The data form is (i,d) indicating the data “d” is handed through the incoming
edge(i) connected to the upstream operator.
 Processing Stages:
- Two Principal phases : Process and Finish
- Process phase is first phase in which resides the logic. This logic is executed on
the data units as they arrive
- Final phase triggers once all the entries have been received and processed.
- Users can define their own logic and functions.
Wayne State University
 Enforcing Genericity and Reusability of Operators:
- AROM permits to develop generic and reusable operators.
- The generic filtering operator is used to apply a user defined function on the input to
determine if the data should be transmitted to downstream operator.
- Only Predicate Function is required to return TRUE or FALSE depending on the input
data.
 Operator Types:
- Different operator interfaces are available, each differing in their process cycle and in the
way they handle the incoming data.
- Asynchronous: processes the entries as they come from the upstream operators, in a first-
come, first-served basis.
- Synchronous: processes only once when there are entries present at each of its upstream
inputs.
- Vector: A vector operator operates on batches of entries provided by the upstream
operator.
- Scalar: A scalar operator operates on entries one at the time.
Wayne State University
 Job Scheduling and Execution:
- Master-Slave Architecture.
- The scheduling of the operators is based on:
- the synchronous/asynchronous nature of the operator
- the source nature of the operators
- Master : - Select the slave node among those available.
-Send the code of the operator to the selected slave.
-Update all the slaves running a predecessor operator with location of new operator.
 Specificities :
- AROM is implemented using Scala, a functional programming language which compiles to
the Java JVM.
- Enables the usage of anonymous and higher order functions.
- Scala binaries enables to include Java code and reuse the existing libraries.
- Scala also enables the use of the powerful Scala collections API which comprises natural
handling of tuples.
Wayne State University
 Better performance for the AROM framework.
 The MapReduce model is in fact a restriction of the more general DFG
model.
 Using DFG for defining the job leaves more possibility for optimization.
 Compared to the MapReduce version, it was possible to implement at least
two optimizations.
 A third possible optimization would be to directly propagate the count of
the inbound articles from the first iteration to all the following iterations in
order to transmit the results of the computation earlier for each stage.
Wayne State University
 Migrate the implementation to a more scalable stack.
 Plans to further develop the scheduling.
 Add speculative executions scheduling in order to minimize the
impact of an operator failure on performance.
 Scheduler should be able to react to additional resources available in
the pool of workers and extend or reduce the capacity of the
scheduling in scenarios of scaling out and scaling in.
 DFG execution plan should be modifiable at runtime.
Wayne State University
 AROM implementation of a distributed parallel
processing framework using directed acyclic graphs.
 The primary goal of the framework is to provide a
playground for testing and developing on distributed
parallel processing.
 Model is based on the general DFG Processing model.
 Paradigms from functional programming are used.
Wayne State University
 http://doi.acm.org/10.1145/1807167.1807273
 http://doi.acm.org/10.1145/1272998.1273005
 http://doi.acm.org/10.1145/1376616.1376726
 http://dl.acm.org/citation.cfm?id=2008928.2008952
 http://doi.acm.org/10.1145/1809028.1806638
 http: //doi.acm.org/10.1145/1247480.1247602
 http://dl.acm.org/citation.cfm?id=1855741.1855742
 http://doi.acm.org/10.1145/1559845.1559962
 http://dl.acm.org/citation.cfm?id=1863103.1863113
 http://ilpubs.stanford.edu:8090/422/
Wayne State University
Thanks !!!

More Related Content

What's hot

MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
NilaNila16
 
program flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architectureprogram flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architecture
Pankaj Kumar Jain
 
An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)
Yu Liu
 
Systolic, Transposed & Semi-Parallel Architectures and Programming
Systolic, Transposed & Semi-Parallel Architectures and ProgrammingSystolic, Transposed & Semi-Parallel Architectures and Programming
Systolic, Transposed & Semi-Parallel Architectures and Programming
Sandip Jassar (sandipjassar@hotmail.com)
 
Raising Abstraction in Timing Analysis for Vehicular Embedded Systems through...
Raising Abstraction in Timing Analysis for Vehicular Embedded Systems through...Raising Abstraction in Timing Analysis for Vehicular Embedded Systems through...
Raising Abstraction in Timing Analysis for Vehicular Embedded Systems through...
Alessio Bucaioni
 
Flow control in computer
Flow control in computerFlow control in computer
Flow control in computer
rud_d_rcks
 
Application Profiling and Mapping on NoC-based MPSoC Emulation Platform on Re...
Application Profiling and Mapping on NoC-based MPSoC Emulation Platform on Re...Application Profiling and Mapping on NoC-based MPSoC Emulation Platform on Re...
Application Profiling and Mapping on NoC-based MPSoC Emulation Platform on Re...
TELKOMNIKA JOURNAL
 
Modeling distribution networks with neplan
Modeling distribution networks with neplanModeling distribution networks with neplan
Modeling distribution networks with neplan
Yusuf A. KHALIL
 
I012255862
I012255862I012255862
I012255862
IOSR Journals
 
Spark Stream and SEEP
Spark Stream and SEEPSpark Stream and SEEP
Spark Stream and SEEP
Amir Payberah
 
Unit 3 part2
Unit 3 part2Unit 3 part2
Unit 3 part2
Karthik Vivek
 
Taming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsTaming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data Analytics
EMC
 
Traffic Simulator
Traffic SimulatorTraffic Simulator
Traffic Simulator
gystell
 
Parallel Programming Primer
Parallel Programming PrimerParallel Programming Primer
Parallel Programming PrimerSri Prasanna
 
Taking your machine learning workflow to the next level using Scikit-Learn Pi...
Taking your machine learning workflow to the next level using Scikit-Learn Pi...Taking your machine learning workflow to the next level using Scikit-Learn Pi...
Taking your machine learning workflow to the next level using Scikit-Learn Pi...
Philip Goddard
 
Grds conferences icst and icbelsh (5)
Grds conferences icst and icbelsh (5)Grds conferences icst and icbelsh (5)
Grds conferences icst and icbelsh (5)
Global R & D Services
 
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarks
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarksVauhkonenVohraMadaan-ProjectDeepLearningBenchMarks
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarksMumtaz Hannah Vauhkonen
 

What's hot (18)

MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
program flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architectureprogram flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architecture
 
An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)
 
Systolic, Transposed & Semi-Parallel Architectures and Programming
Systolic, Transposed & Semi-Parallel Architectures and ProgrammingSystolic, Transposed & Semi-Parallel Architectures and Programming
Systolic, Transposed & Semi-Parallel Architectures and Programming
 
Raising Abstraction in Timing Analysis for Vehicular Embedded Systems through...
Raising Abstraction in Timing Analysis for Vehicular Embedded Systems through...Raising Abstraction in Timing Analysis for Vehicular Embedded Systems through...
Raising Abstraction in Timing Analysis for Vehicular Embedded Systems through...
 
Flow control in computer
Flow control in computerFlow control in computer
Flow control in computer
 
Application Profiling and Mapping on NoC-based MPSoC Emulation Platform on Re...
Application Profiling and Mapping on NoC-based MPSoC Emulation Platform on Re...Application Profiling and Mapping on NoC-based MPSoC Emulation Platform on Re...
Application Profiling and Mapping on NoC-based MPSoC Emulation Platform on Re...
 
Modeling distribution networks with neplan
Modeling distribution networks with neplanModeling distribution networks with neplan
Modeling distribution networks with neplan
 
I012255862
I012255862I012255862
I012255862
 
Spark Stream and SEEP
Spark Stream and SEEPSpark Stream and SEEP
Spark Stream and SEEP
 
Unit 3 part2
Unit 3 part2Unit 3 part2
Unit 3 part2
 
Taming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data AnalyticsTaming Latency: Case Studies in MapReduce Data Analytics
Taming Latency: Case Studies in MapReduce Data Analytics
 
Traffic Simulator
Traffic SimulatorTraffic Simulator
Traffic Simulator
 
Parallel Programming Primer
Parallel Programming PrimerParallel Programming Primer
Parallel Programming Primer
 
Dependencies
DependenciesDependencies
Dependencies
 
Taking your machine learning workflow to the next level using Scikit-Learn Pi...
Taking your machine learning workflow to the next level using Scikit-Learn Pi...Taking your machine learning workflow to the next level using Scikit-Learn Pi...
Taking your machine learning workflow to the next level using Scikit-Learn Pi...
 
Grds conferences icst and icbelsh (5)
Grds conferences icst and icbelsh (5)Grds conferences icst and icbelsh (5)
Grds conferences icst and icbelsh (5)
 
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarks
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarksVauhkonenVohraMadaan-ProjectDeepLearningBenchMarks
VauhkonenVohraMadaan-ProjectDeepLearningBenchMarks
 

Similar to Raminder kaur presentation_two

Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Qualcomm Developer Network
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Yahoo Developer Network
 
WS-VLAM workflow
WS-VLAM workflowWS-VLAM workflow
WS-VLAM workflowguest6295d0
 
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
IJCI JOURNAL
 
Performance comparison on java technologies a practical approach
Performance comparison on java technologies   a practical approachPerformance comparison on java technologies   a practical approach
Performance comparison on java technologies a practical approach
csandit
 
PERFORMANCE COMPARISON ON JAVA TECHNOLOGIES - A PRACTICAL APPROACH
PERFORMANCE COMPARISON ON JAVA TECHNOLOGIES - A PRACTICAL APPROACHPERFORMANCE COMPARISON ON JAVA TECHNOLOGIES - A PRACTICAL APPROACH
PERFORMANCE COMPARISON ON JAVA TECHNOLOGIES - A PRACTICAL APPROACH
cscpconf
 
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...
Sara Alvarez
 
Concurrent Matrix Multiplication on Multi-core Processors
Concurrent Matrix Multiplication on Multi-core ProcessorsConcurrent Matrix Multiplication on Multi-core Processors
Concurrent Matrix Multiplication on Multi-core Processors
CSCJournals
 
Enhance the energy awareness with ant colony optimazation in cloud computing
Enhance the energy awareness with ant colony optimazation in cloud computingEnhance the energy awareness with ant colony optimazation in cloud computing
Enhance the energy awareness with ant colony optimazation in cloud computing
jaygovindchauhan
 
PAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph ComputationPAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph Computation
1crore projects
 
E5 05 ijcite august 2014
E5 05 ijcite august 2014E5 05 ijcite august 2014
E5 05 ijcite august 2014
ijcite
 
Pegasus-Poster-2016-final-v2
Pegasus-Poster-2016-final-v2Pegasus-Poster-2016-final-v2
Pegasus-Poster-2016-final-v2Samrat Jha
 
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docxCS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
faithxdunce63732
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
Robert Grossman
 
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET Journal
 

Similar to Raminder kaur presentation_two (20)

Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
 
WS-VLAM workflow
WS-VLAM workflowWS-VLAM workflow
WS-VLAM workflow
 
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
BARRACUDA, AN OPEN SOURCE FRAMEWORK FOR PARALLELIZING DIVIDE AND CONQUER ALGO...
 
Performance comparison on java technologies a practical approach
Performance comparison on java technologies   a practical approachPerformance comparison on java technologies   a practical approach
Performance comparison on java technologies a practical approach
 
PERFORMANCE COMPARISON ON JAVA TECHNOLOGIES - A PRACTICAL APPROACH
PERFORMANCE COMPARISON ON JAVA TECHNOLOGIES - A PRACTICAL APPROACHPERFORMANCE COMPARISON ON JAVA TECHNOLOGIES - A PRACTICAL APPROACH
PERFORMANCE COMPARISON ON JAVA TECHNOLOGIES - A PRACTICAL APPROACH
 
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...
Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterog...
 
Concurrent Matrix Multiplication on Multi-core Processors
Concurrent Matrix Multiplication on Multi-core ProcessorsConcurrent Matrix Multiplication on Multi-core Processors
Concurrent Matrix Multiplication on Multi-core Processors
 
Sqlmr
SqlmrSqlmr
Sqlmr
 
Sqlmr
SqlmrSqlmr
Sqlmr
 
Sqlmr
SqlmrSqlmr
Sqlmr
 
Sqlmr
SqlmrSqlmr
Sqlmr
 
Enhance the energy awareness with ant colony optimazation in cloud computing
Enhance the energy awareness with ant colony optimazation in cloud computingEnhance the energy awareness with ant colony optimazation in cloud computing
Enhance the energy awareness with ant colony optimazation in cloud computing
 
PAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph ComputationPAGE: A Partition Aware Engine for Parallel Graph Computation
PAGE: A Partition Aware Engine for Parallel Graph Computation
 
p850-ries
p850-riesp850-ries
p850-ries
 
E5 05 ijcite august 2014
E5 05 ijcite august 2014E5 05 ijcite august 2014
E5 05 ijcite august 2014
 
Pegasus-Poster-2016-final-v2
Pegasus-Poster-2016-final-v2Pegasus-Poster-2016-final-v2
Pegasus-Poster-2016-final-v2
 
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docxCS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
 
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
 

Recently uploaded

HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
SupreethSP4
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
BrazilAccount1
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 

Recently uploaded (20)

HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 

Raminder kaur presentation_two

  • 1. Nam-Luc Tran Sabri Skhiri Arthur Lesuisse Esteban Zimanyi Presented by: Raminder Kaur Wayne State University
  • 2.  Introduction  Current Programming models  Framework  Architecture of AROM  Features and Execution  Result  Future Work  Conclusion  References Wayne State University
  • 3. This paper proposes a better tradeoff between implicit distributed programming, job efficiency and openness in the design of the processing algorithm. AROM, a new open source distributed processing framework is introduced in which jobs are defined using directed acyclic graphs. Wayne State University
  • 4.  The Map Reduce model - Pros and Cons  The DFG model - Pros and Cons  Pipelining Considerations Wayne State University
  • 5. Composed of phases:  Map phase - Map function executed on every key/value pair of input data which further produces intermediate key/value pairs.  Shuffle phase – Intermediate key/value pairs are sorted and grouped by the key.  Reduce phase – Reduce function is applied individually on each key and the associated values of key. This phase produces zero or one output.  Intermediate outputs between each phase are stored on the filesystem. Wayne State University
  • 6.
  • 7.  Data Flow Graph is a directed acyclic graph where each vertex represents a program and edges represent data channels.  At runtime the vertices of the DFG will be scheduled to run on the available machines of the cluster.  Edges represent the flow of the processed data through the vertices.  Data communication between the vertices is abstracted from the user and can be physically implemented in several ways (e.g. network sockets, local filesystem, asynchronous I/O mechanisms...).  No data model is usually imposed, it is up to the user to handle the input and output formats for each vertex program. Wayne State University
  • 8. MapReduce model: Pros:  Convenient to work with  Simple for most users to implement Cons:  Sort phase is not necessarily required for each job  Shuffle phase represents one of the most restrictive weakness of the MapReduce model.  Joins in particular have proven to be cumbersome to express. back Wayne State University
  • 9. DFG Model Pros:  Provides the opportunity to implement jobs that are not constrained by a strict Map-Shuffle-Reduce schema.  As vertex programs are able receive multiple inputs, it is possible to implement relational operations such as joins in a natural and distributed fashion.  Since the jobs are not framed in a particular sequence of phases, intermediate output data do not have to be stored into the filesystem.  DFG model is less constrained than the MapReduce model. It is more convenient to compile a job from a higher-level language. Cons:  DFG model is more general than the MapReduce model which can be seen as a restriction on the DFG model. go back Wayne State University
  • 10.
  • 11. Even if we could implement pipelined jobs in MapReduce, this is not as natural nor efficient when compared to a pipeline defined using DFG where each stage can be represented by a group of vertices.  The stages of the the pipelines would consist of Map- only jobs.  The mandatory shuffle phase may not even be useful for the pipeline stages.  The back and forth coming of the intermediate data on the distributed filesystem can also cause a significant performance penalty.  Additional code is also required for coordinating and chaining the separate MapReduce stages and in the end the iterations are diluted, making the code less obvious to understand.  Each stage of the pipeline in MapReduce needs to wait for the completion of the previous stage in order to begin the processing. Wayne State University
  • 12. The requirements of the framework are the following: 1) Provide a coherent API to express jobs as DFG and use a programming paradigm which favors reusable and generic operators. 2) Base the architecture on asynchronous actors and event- driven constructs which make it suitable for large scale deployment on Cloud Computing environments. The vision behind AROM is to provide a generic-purpose environment for testing and prototyping on distributed parallel processing jobs and models. Wayne State University
  • 13.  Job Definition  Operators and Data Model  Processing Stages  Enforcing Genericity and Reusability of Operators  Operator Types  Job Scheduling and Execution  Specificities next Wayne State University
  • 14.  Job Definition: - Jobs described by Data Flow Graphs - A vertex (also called operator), receives data on its input, processes it and emits the resulting data to the next operator. - Edges between the vertices depict the flow of the data between the operators.  Operators and Data Model: - An operator can receive and emit data on multiple inputs. - The data form is (i,d) indicating the data “d” is handed through the incoming edge(i) connected to the upstream operator.  Processing Stages: - Two Principal phases : Process and Finish - Process phase is first phase in which resides the logic. This logic is executed on the data units as they arrive - Final phase triggers once all the entries have been received and processed. - Users can define their own logic and functions. Wayne State University
  • 15.  Enforcing Genericity and Reusability of Operators: - AROM permits to develop generic and reusable operators. - The generic filtering operator is used to apply a user defined function on the input to determine if the data should be transmitted to downstream operator. - Only Predicate Function is required to return TRUE or FALSE depending on the input data.  Operator Types: - Different operator interfaces are available, each differing in their process cycle and in the way they handle the incoming data. - Asynchronous: processes the entries as they come from the upstream operators, in a first- come, first-served basis. - Synchronous: processes only once when there are entries present at each of its upstream inputs. - Vector: A vector operator operates on batches of entries provided by the upstream operator. - Scalar: A scalar operator operates on entries one at the time. Wayne State University
  • 16.  Job Scheduling and Execution: - Master-Slave Architecture. - The scheduling of the operators is based on: - the synchronous/asynchronous nature of the operator - the source nature of the operators - Master : - Select the slave node among those available. -Send the code of the operator to the selected slave. -Update all the slaves running a predecessor operator with location of new operator.  Specificities : - AROM is implemented using Scala, a functional programming language which compiles to the Java JVM. - Enables the usage of anonymous and higher order functions. - Scala binaries enables to include Java code and reuse the existing libraries. - Scala also enables the use of the powerful Scala collections API which comprises natural handling of tuples. Wayne State University
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.  Better performance for the AROM framework.  The MapReduce model is in fact a restriction of the more general DFG model.  Using DFG for defining the job leaves more possibility for optimization.  Compared to the MapReduce version, it was possible to implement at least two optimizations.  A third possible optimization would be to directly propagate the count of the inbound articles from the first iteration to all the following iterations in order to transmit the results of the computation earlier for each stage. Wayne State University
  • 22.  Migrate the implementation to a more scalable stack.  Plans to further develop the scheduling.  Add speculative executions scheduling in order to minimize the impact of an operator failure on performance.  Scheduler should be able to react to additional resources available in the pool of workers and extend or reduce the capacity of the scheduling in scenarios of scaling out and scaling in.  DFG execution plan should be modifiable at runtime. Wayne State University
  • 23.  AROM implementation of a distributed parallel processing framework using directed acyclic graphs.  The primary goal of the framework is to provide a playground for testing and developing on distributed parallel processing.  Model is based on the general DFG Processing model.  Paradigms from functional programming are used. Wayne State University
  • 24.  http://doi.acm.org/10.1145/1807167.1807273  http://doi.acm.org/10.1145/1272998.1273005  http://doi.acm.org/10.1145/1376616.1376726  http://dl.acm.org/citation.cfm?id=2008928.2008952  http://doi.acm.org/10.1145/1809028.1806638  http: //doi.acm.org/10.1145/1247480.1247602  http://dl.acm.org/citation.cfm?id=1855741.1855742  http://doi.acm.org/10.1145/1559845.1559962  http://dl.acm.org/citation.cfm?id=1863103.1863113  http://ilpubs.stanford.edu:8090/422/ Wayne State University