Parallel Execution of Model Management Programs
Sina Madani
supported through CROSSMINER EC project
supervised by Dr. Dimitris Kolovos and Prof. Richard Paige
Introduction
• Motivation for the project
• Brief literature review
• Proposed Solution
• Work so far
• Expected contributions
• Plan for Evaluation
Problem and Motivation
• Model-Driven Engineering (MDE) used in industrial contexts
• e.g. automotive, civil engineering, reverse-engineered code
• Current tools not built to handle Very Large Models (VLMs)
• “Very Large” meaning millions of elements
• Scalability is a widely cited concern with MDE
• Execution of programs with VLMs as input is very slow
• Almost all mainstream MDE tools are single-threaded in execution
“Model Management”
• Simply doing things with models!
• Querying
• Validation
• Model-to-Model Transformations (M2M)
• Model-to-Text Transformations (M2T) / code generation
• Comparison
• Merging
• Migration
Related Works – overview
• Increasing interest in improving performance of MDE tools
• Main approaches:
• Incrementality – only computing the delta of changes (caching)
• Laziness – delaying computation until it can no longer be avoided
• Reactivity – event-driven computations (incremental and lazy)
• Parallelism – splitting the computation across multiple threads
• Distribution – using multiple computers to perform the computation
Incrementality
• The most explored solution (especially in graph transformations)
• Querying / Pattern matching (Bergmann, Varró, Ráth...)
• Model-to-Model transformations (Jouault & Tisi, 2010)
• Model-to-Text transformations (Ogunyomi, 2016)
• Model Validation (Cabot & Teniente, 2006)
• Partial evaluation (Razavi & Kontogiannis, 2012)
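The caching idea behind incrementality can be sketched in Java as follows. This is only a minimal illustration under stated assumptions, not how any of the cited tools work: `IncrementalChecker`, `markChanged` and the evaluation counter are hypothetical names introduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: caches one validation result per element and
// re-evaluates only elements that were reported as changed.
class IncrementalChecker<E> {
    private final Predicate<E> constraint;
    private final Map<E, Boolean> cache = new HashMap<>();
    private int evaluations = 0;

    IncrementalChecker(Predicate<E> constraint) {
        this.constraint = constraint;
    }

    // Returns the cached result if present; otherwise evaluates and stores it.
    boolean check(E element) {
        Boolean cached = cache.get(element);
        if (cached != null) {
            return cached;
        }
        evaluations++;
        boolean result = constraint.test(element);
        cache.put(element, result);
        return result;
    }

    // Drops the cached result so that the next check recomputes it.
    void markChanged(E element) {
        cache.remove(element);
    }

    // Number of times the constraint was actually executed.
    int evaluations() {
        return evaluations;
    }
}
```

Checking the same unchanged element twice runs the constraint once; only a call to `markChanged` triggers recomputation.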
Lazy and Reactive approaches
• Lazy ATL (Tisi et al., 2011)
• Navigation (source) and Generation (target)
• Lazy OCL (Tisi et al., 2015)
• Includes lazy collections based on iterators
• Reactive ATL (Pérez et al., 2015)
• Automated model transformations
• VIATRA-3 (Bergmann et al., 2015)
• Event-driven incremental model transformations
• Suitable for real-time applications
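To make "lazy collections based on iterators" concrete, here is a hedged Java sketch of a lazy select operation. It is not taken from Lazy OCL; the class name `LazySelect` is an assumption made for illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch of a lazy "select": elements are tested only when the
// consumer asks for the next result, so stopping early skips the rest.
class LazySelect<E> implements Iterable<E> {
    private final Iterable<E> source;
    private final Predicate<E> predicate;

    LazySelect(Iterable<E> source, Predicate<E> predicate) {
        this.source = source;
        this.predicate = predicate;
    }

    @Override
    public Iterator<E> iterator() {
        final Iterator<E> it = source.iterator();
        return new Iterator<E>() {
            private E next;
            private boolean ready = false;

            @Override
            public boolean hasNext() {
                // Advance the source only until the next match is found.
                while (!ready && it.hasNext()) {
                    E candidate = it.next();
                    if (predicate.test(candidate)) {
                        next = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public E next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                return next;
            }
        };
    }
}
```

A consumer that needs only the first match pays for the elements up to that match and nothing beyond it.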
Parallel and Distributed
• A number of works on parallel graph transformations
• LinTra (Burgueño et al., 2013 / 2015)
• Concurrent model transformation by streaming from tuple space
• Parallel ATL (Tisi et al., 2013)
• Task-parallel approach, 2.2x speedup with 4 cores / threads
• Distributed ATL based on MapReduce (Benelallam et al., 2015)
• Data-parallel approach, ~3x speedup with 8 nodes
• Efficient Model Partitioning (Benelallam et al., 2016)
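One way to compare the quoted speedups is parallel efficiency, i.e. speedup divided by the number of processing units. The sketch below is only arithmetic on the figures above, treating "~3x" as exactly 3.0 (an assumption); `SpeedupMath` is a hypothetical helper name.

```java
// Parallel efficiency = speedup / number of processing units.
class SpeedupMath {
    static double efficiency(double speedup, int units) {
        if (units <= 0) {
            throw new IllegalArgumentException("units must be positive");
        }
        return speedup / units;
    }

    public static void main(String[] args) {
        // Figures quoted above; "~3x" is treated as exactly 3.0 here.
        System.out.println("Parallel ATL, 4 cores: " + efficiency(2.2, 4));
        System.out.println("Distributed ATL, 8 nodes: " + efficiency(3.0, 8));
    }
}
```

Under that reading, 2.2x on 4 cores corresponds to an efficiency of about 0.55, and 3x on 8 nodes to 0.375.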
Un(der)-explored areas
• Most work focused on model-to-model transformations
• Distributed execution on multi-threaded machines
• Combining parallelism with incrementality/laziness
• Using GPUs for accelerating execution
Columns: Incremental, Lazy, Parallel, Distributed, GPGPU, Reactive
Querying: Too many to name!, ATL, ATL, ATL-MR, IncQuery-D, Reactive-ATL, Viatra3
M2M: Too many to name!, ATL, ATL, LinTra, ATL, LinTra, ATL, Viatra3
Validation: OCL, OCL, OCL, EVL (partial)
M2T: EGL
Comparison: (none listed)
Dark Green: Main focus areas of research
Light Green: Potential interest / contributions of research (if time permits)
Grey: Unlikely to be within research scope
Proposed Solution – overview
• Concurrent execution of model management tasks
• Using Epsilon as implementation testbed
• Offers DSLs for various tasks based on a common language
• Hybrid (imperative/declarative) DSLs
• Convenient for generalising concurrency concerns across tasks
• Open-source Eclipse project (intend to merge our changes)
• Minimising impact on the existing codebase is a concern
Epsilon at a glance
http://eclipse.org/epsilon
Proposed Solution – challenges
• Concurrent modification
• Dependencies pose a problem – require synchronisation or duplication
• Parallelisation of imperative constructs
• EOL potentially allows execution of any Java program!
• Can limit scope to pure functions on collections
• GPU acceleration requires a limited programming model
• Low-level APIs
• Only primitives and one-dimensional arrays as data
• No / limited branching logic
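The restriction to primitives and one-dimensional arrays can be illustrated with a small Java sketch that flattens one attribute of each element into an `int[]` and then applies a pure function to it. This is an assumption-laden illustration, not GPU code: `Person`, `movieCounts` and `exceeds` are hypothetical, and a parallel stream stands in for a GPU kernel.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: copies one numeric attribute of each model element into
// a flat int[] so that a check becomes a simple loop over primitives.
class FlattenForGpu {
    // Stand-in for a model element; not an Epsilon or EMF type.
    static final class Person {
        final String name;
        final int movieCount;

        Person(String name, int movieCount) {
            this.name = name;
            this.movieCount = movieCount;
        }
    }

    // Object graph -> one-dimensional primitive array.
    static int[] movieCounts(List<Person> people) {
        int[] counts = new int[people.size()];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = people.get(i).movieCount;
        }
        return counts;
    }

    // Pure function over the array: 1 where the value exceeds the threshold, else 0.
    // The result keeps the order of the input array.
    static int[] exceeds(int[] values, int threshold) {
        return Arrays.stream(values).parallel().map(v -> v > threshold ? 1 : 0).toArray();
    }
}
```

Because `exceeds` reads only its inputs and writes only its own result array, it needs no synchronisation, which is the property the "pure functions on collections" restriction is after.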
Preliminary Work
• Focused on Epsilon Validation Language (EVL)
• The model is read-only (except for fixes), which simplifies concurrency
• Element (data) and constraint (task) parallel
• Implemented and tested multiple parallel approaches
• ThreadPoolExecutor, concurrent collections... (minimal synchronisation)
• Equivalence testing with original implementation
• Results show promising speed-ups (~3x with 4 threads)
• Tests seem to be passing so far
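A minimal Java sketch of the element-parallel idea, using a fixed thread pool and a concurrent collection, might look like the following. It is not the Parallel EVL implementation: `ParallelValidator`, its method names and the one-minute timeout are assumptions made for illustration, and the model is assumed to be read-only during validation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

class ParallelValidator {
    // Submits one job per element; violations are collected in a thread-safe
    // queue, so no explicit locking is needed. Result order is not guaranteed.
    static <E> Collection<E> unsatisfied(List<E> elements, Predicate<E> check, int threads) {
        ConcurrentLinkedQueue<E> violations = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (E element : elements) {
            pool.execute(() -> {
                if (!check.test(element)) {
                    violations.add(element);
                }
            });
        }
        pool.shutdown();
        try {
            // Assumed timeout; a real engine would choose its own policy.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("validation timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return violations;
    }

    // Sequential baseline used for equivalence testing.
    static <E> Collection<E> unsatisfiedSequential(List<E> elements, Predicate<E> check) {
        List<E> violations = new ArrayList<>();
        for (E element : elements) {
            if (!check.test(element)) {
                violations.add(element);
            }
        }
        return violations;
    }
}
```

Comparing the two results as sets gives a simple equivalence test between the concurrent and the original sequential behaviour.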
EVL example – couples in movies
context Couple {
  constraint twoDifferentPeople {
    guard: self.commonMovies.size() > 5
    check {
      if (self.p1.name == self.p2.name) {
        return not self.p1.movies.includesAll(self.p2.movies);
      }
      return true;
    }
    message: "Couple contains the same person!"
  }
}
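For readers unfamiliar with EVL, the constraint above can be paraphrased in Java roughly as follows. This is a hedged sketch and not how Epsilon evaluates EVL: the `Person` and `Couple` classes are stand-ins, and computing `commonMovies` as a set intersection is an assumption, since the slide does not show how that property is defined.

```java
import java.util.HashSet;
import java.util.Set;

class CoupleConstraint {
    static final class Person {
        final String name;
        final Set<String> movies;

        Person(String name, Set<String> movies) {
            this.name = name;
            this.movies = movies;
        }
    }

    static final class Couple {
        final Person p1;
        final Person p2;

        Couple(Person p1, Person p2) {
            this.p1 = p1;
            this.p2 = p2;
        }

        // Assumption: commonMovies is the intersection of both movie sets.
        Set<String> commonMovies() {
            Set<String> common = new HashSet<>(p1.movies);
            common.retainAll(p2.movies);
            return common;
        }
    }

    // Mirrors the guard: the constraint applies only above five common movies.
    static boolean guard(Couple c) {
        return c.commonMovies().size() > 5;
    }

    // Mirrors the check block: same name and p1 covering all of p2's movies fails.
    static boolean check(Couple c) {
        if (c.p1.name.equals(c.p2.name)) {
            return !c.p1.movies.containsAll(c.p2.movies);
        }
        return true;
    }

    // True when the constraint applies to the couple and its check fails.
    static boolean violated(Couple c) {
        return guard(c) && !check(c);
    }
}
```

Both `guard` and `check` only read the couple, which is why each `Couple` can be evaluated independently of the others.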
Future Work & Expected Contributions
• Concurrent implementation of EVL, ECL, EPL, EGX, and likely ETL, to compare our results and approach with other work
• Investigation into GPU acceleration of model management programs
• Investigation into distributed parallelism of model management tasks
• Combining parallel execution with laziness and incrementality
Evaluation Plan
• Two primary aspects:
• Correctness – does our implementation behave as it should?
• Performance – how much faster is our approach/implementation?
• Testing on very large models and complex programs
• Though finding them is proving to be a challenge!
• Equivalence testing of concurrent against non-concurrent implementations
• Also comparing with other tools for consistency
• Requires writing semantically identical scripts in other languages
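The two evaluation aspects can be sketched as small Java helpers: an order-insensitive comparison of results for correctness and a speedup ratio for performance. The names `EquivalenceCheck`, `sameResults` and `speedup` are hypothetical, and treating results as a multiset is an assumption about what counts as equivalent.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

class EquivalenceCheck {
    // Counts how often each item occurs, ignoring order.
    static <T> Map<T, Integer> counts(Collection<T> items) {
        Map<T, Integer> result = new HashMap<>();
        for (T item : items) {
            result.merge(item, 1, Integer::sum);
        }
        return result;
    }

    // Two runs are considered equivalent if they report the same items the
    // same number of times; concurrent runs may report them in any order.
    static <T> boolean sameResults(Collection<T> expected, Collection<T> actual) {
        return counts(expected).equals(counts(actual));
    }

    // Speedup of the parallel run relative to the sequential baseline.
    static double speedup(long sequentialNanos, long parallelNanos) {
        return (double) sequentialNanos / parallelNanos;
    }
}
```

Elapsed times for `speedup` could be taken with `System.nanoTime()` around each run, ideally averaged over several repetitions.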
Current Status
• Started in January
• Mostly literature review
• Finalising tests for Parallel EVL v1.0
• More scripts and models which can be re-used for evaluating other tasks
• Testing with lots of cores / threads on computing cluster
• Looking into distributed processing (Spark, Hadoop, Kafka...)
• Next tasks: Pattern-matching (EPL), Model-to-Text (EGX)
• Intend to see how much of Parallel EVL approach can be re-used
Summary
• Current model management execution engines are inefficient
• Mostly single-threaded, sequential
• Laziness and incrementality are desirable but insufficient
• Concurrency is hard!
Questions?
Thank you for listening!
Contact:
Sina Madani
sm1748@york.ac.uk
[Diagram labels: Batch jobs, Submit, ExecutorService, Thread 1, Thread 2, ..., Thread N, Concurrent Collections, Merge, Results]
Proposed Solution – justification
• Why parallelism?
• Single-threaded CPU performance relatively stagnant
• All general-purpose CPUs now multi-core (increasing thread counts)
• Distributed / cloud computing resources more ubiquitous
• Data-parallel approach (SIMD)
• Scalability concerns are with the number of model elements
• Allows for partial distribution of data per processing unit
• Minimises synchronisation / contention -> better performance
• Suitable for stream processing (e.g. GPUs, various frameworks)
• Avoids rule dependencies
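The data-parallel argument can be sketched in Java by giving each thread its own contiguous slice of the elements and merging the per-slice results at the end. This is an illustrative sketch only: `BatchedValidator` and `ranges` are hypothetical names, and one batch per thread is an assumed partitioning policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

class BatchedValidator {
    // Splits [0, size) into at most `batches` contiguous {start, end} ranges.
    static List<int[]> ranges(int size, int batches) {
        if (batches <= 0) {
            throw new IllegalArgumentException("batches must be positive");
        }
        List<int[]> out = new ArrayList<>();
        int chunk = (size + batches - 1) / batches; // ceiling division
        for (int start = 0; start < size; start += chunk) {
            out.add(new int[] { start, Math.min(start + chunk, size) });
        }
        return out;
    }

    // Each job works on its own slice and its own local list, so threads
    // share no mutable state until the final merge.
    static <E> List<E> unsatisfied(List<E> elements, Predicate<E> check, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<E>>> jobs = new ArrayList<>();
            for (int[] range : ranges(elements.size(), threads)) {
                jobs.add(() -> {
                    List<E> local = new ArrayList<>();
                    for (E element : elements.subList(range[0], range[1])) {
                        if (!check.test(element)) {
                            local.add(element);
                        }
                    }
                    return local;
                });
            }
            // invokeAll returns futures in submission order, so the merged
            // list keeps the original element order.
            List<E> merged = new ArrayList<>();
            for (Future<List<E>> future : pool.invokeAll(jobs)) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping results thread-local until the merge is one way to minimise contention; the trade-off is that uneven batches can leave some threads idle.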

Parallel Execution of Model Management Programs (STAF 2017)

  • 1. Parallel Execution of Model Management Programs Sina Madani supported through CROSSMINER EC project supervised by Dr. Dimitris Kolovos and Prof. Richard Paige
  • 2. Introduction  Motivation for the project  Brief literature review  Proposed Solution  Work so far  Expected contributions  Plan for Evaluation
  • 3. Problem and Motivation  Model-Driven Engineering (MDE) used in industrial contexts  e.g. automotive, civil engineering, reverse-engineered code  Current tools not built to handle Very Large Models (VLMs)  “Very Large” meaning millions of elements  Scalability is a widely cited concern with MDE  Execution of programs with VLMs as input is very slow  Almost all mainstream MDE tools are single-threaded in execution
  • 4. “Model Management”  Simply doing things with models!  Querying  Validation  Model-to-Model Transformations (M2M)  Model-to-Text Transformations (M2T) / code generation  Comparison  Merging  Migration
  • 5. Related Works – overview  Increasing interest in improving performance of MDE tools  Main approaches:  Incrementality – only computing the delta of changes (caching)  Laziness – delaying computation until it can no longer be avoided  Reactivity – event-driven computations (incremental and lazy)  Parallelism – splitting the computation across multiple threads  Distribution – using multiple computers to perform the computation
  • 6. Incrementality  The most explored solution (especially in graph transformations)  Querying / Pattern matching (Bergmann, Varró, Ráth...)  Model-to-Model transformations (Jouault & Tisi, 2010)  Model-to-Text transformations (Ogunyomi, 2016)  Model Validation (Cabot & Teniente, 2006)  Partial evaluation (Razavi & Kontogiannis, 2012)
  • 7. Lazy and Reactive approaches  Lazy ATL (Tisi et al., 2011)  Navigation (source) and Generation (target)  Lazy OCL (Tisi et al., 2015)  Includes lazy collections based on iterators  Reactive ATL (Pérez et al., 2015)  Automated model transformations  VIATRA-3 (Bergmann et al., 2015)  Event-driven incremental model transformations  Suitable for real-time applications
  • 8. Parallel and Distributed  A number of works on parallel graph transformations  LinTra (Burgueño et al., 2013 / 2015)  Concurrent model transformation by streaming from tuple space  Parallel ATL (Tisi et al., 2013)  Task-parallel approach, 2.2x speedup with 4 cores / threads  Distributed ATL based on MapReduce (Benelallam et al., 2015)  Data-parallel approach, ~3x speedup with 8 nodes  Efficient Model Partitioning (Benelallam et al., 2016)
  • 9. Un(der)-explored areas
      - Most work focused on model-to-model transformations
      - Distributed execution on multi-threaded machines
      - Combining parallelism with incrementality/laziness
      - Using GPUs for accelerating execution
  • 10. Columns: Incremental | Lazy | Parallel | Distributed | GPGPU | Reactive
      Rows (entries in slide order):
      - Querying: Too many to name!, ATL, ATL, ATL-MR, IncQuery-D, Reactive-ATL, Viatra3
      - M2M: Too many to name!, ATL, ATL, LinTra, ATL, LinTra, ATL, Viatra3
      - Validation: OCL, OCL, OCL, EVL (partial)
      - M2T: EGL
      - Comparison: (no entries)
      Legend:
      - Dark Green: Main focus areas of research
      - Light Green: Potential interest / contributions of research (if time permits)
      - Grey: Unlikely to be within research scope
  • 11. Proposed Solution – overview
      - Concurrent execution of model management tasks
      - Using Epsilon as implementation testbed
      - Offers DSLs for various tasks based on a common language
      - Hybrid (imperative/declarative) DSLs
      - Convenient for generalising concurrency concerns across tasks
      - Open-source Eclipse project (intend to merge our changes)
      - Minimising impact to existing codebase is a concern
  • 12. Epsilon at a glance http://eclipse.org/epsilon
  • 13. Proposed Solution – challenges
      - Concurrent modification
      - Dependencies pose a problem – require synchronisation or duplication
      - Parallelisation of imperative constructs
      - EOL potentially allows execution of any Java program!
      - Can limit scope to pure functions on collections
      - GPU acceleration requires a limited programming model
      - Low-level APIs
      - Only primitives and one-dimensional arrays as data
      - No / limited branching logic
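The idea of limiting scope to pure functions on collections can be sketched in plain Java. This is an illustration only, not Epsilon or EOL code: `PureFunctionSketch` and `nameLength` are hypothetical names. Because the mapped function reads only its argument and modifies nothing shared, the elements can be processed in any order on any thread without synchronisation or duplication.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class PureFunctionSketch {
    // A pure function: the result depends only on the argument and
    // no shared state is read or modified.
    static int nameLength(String name) {
        return name.length();
    }

    // Safe to parallelise: each element is mapped independently and the
    // stream framework assembles the result in encounter order.
    static List<Integer> mapInParallel(List<String> names) {
        return names.parallelStream()
                .map(PureFunctionSketch::nameLength)
                .collect(Collectors.toList());
    }

    // Sequential reference implementation, useful for equivalence checks.
    // Note: appending to this ArrayList from several threads instead would
    // be unsafe, which is the concurrent modification problem in miniature.
    static List<Integer> mapSequentially(List<String> names) {
        List<Integer> result = new ArrayList<>();
        for (String name : names) {
            result.add(nameLength(name));
        }
        return result;
    }
}
```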
  • 14. Preliminary Work
      - Focused on Epsilon Validation Language (EVL)
      - Read-only model (except fixes), so simplifies concurrency
      - Element (data) and constraint (task) parallel
      - Implemented and tested multiple parallel approaches
      - ThreadPoolExecutor, concurrent collections... (minimal synchronization)
      - Equivalence testing with original implementation
      - Results show promising speed-ups (~3x with 4 threads)
      - Tests seem to be passing so far
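A minimal sketch of the element-parallel idea, assuming a read-only model and a single constraint expressed as a Java `Predicate`. This is not the Parallel EVL implementation: `ParallelValidationSketch` and `findViolations` are hypothetical names, and one job per element is the simplest possible granularity rather than a tuned one. The fixed thread pool is a `ThreadPoolExecutor` underneath, and the concurrent set lets workers record results without explicit locks.

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

final class ParallelValidationSketch {
    // Checks every element against the constraint on a fixed-size thread pool
    // and returns the elements that violate it.
    static <T> Set<T> findViolations(Collection<T> elements, Predicate<T> constraint, int threads) {
        // Concurrent set: worker threads add results without explicit locking.
        Set<T> violations = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (T element : elements) {
            // One job per element; the model is only read, never modified.
            pool.execute(() -> {
                if (!constraint.test(element)) {
                    violations.add(element);
                }
            });
        }
        pool.shutdown(); // accept no new jobs, let the submitted ones finish
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("validation did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return violations;
    }
}
```

Since the result is a set, an equivalence test against a sequential run can compare the two sets directly, regardless of the order in which threads finished.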
  • 15. EVL example – couples in movies
        context Couple {
          constraint twoDifferentPeople {
            guard: self.commonMovies.size() > 5
            check {
              if (self.p1.name == self.p2.name) {
                return not self.p1.movies.includesAll(self.p2.movies);
              }
              return true;
            }
            message: "Couple contains the same person!"
          }
        }
  • 16. Future Work & Expected Contributions
      - Concurrent implementation of EVL, ECL, EPL, EGX
      - and likely ETL for comparing our results and approach to other works
      - Investigation into GPU acceleration of model management programs
      - Investigation into distributed parallelism of model management tasks
      - Combining parallel execution with laziness and incrementality
  • 17. Evaluation Plan
      - Two primary aspects:
      - Correctness – does our implementation behave as it should?
      - Performance – how much faster is our approach/implementation?
      - Testing on very large models and complex programs
      - Though finding them is proving to be a challenge!
      - Equivalence testing concurrent / non-concurrent implementations
      - Also comparing with other tools for consistency
      - Requires writing semantically identical scripts in other languages
  • 18. Current Status
      - Started in January
      - Mostly literature review
      - Finalising tests for Parallel EVL v1.0
      - More scripts and models which can be re-used for evaluating other tasks
      - Testing with lots of cores / threads on computing cluster
      - Looking into distributed processing (Spark, Hadoop, Kafka...)
      - Next tasks: Pattern-matching (EPL), Model-to-Text (EGX)
      - Intend to see how much of Parallel EVL approach can be re-used
  • 19. Summary
      - Current model management execution engines are inefficient
      - Mostly single-threaded, sequential
      - Laziness and incrementality are desirable but insufficient
      - Concurrency is hard!
  • 20. Questions? Thank you for listening! Contact: Sina Madani, sm1748@york.ac.uk
      [Figure labels: Batch jobs, Submit, ExecutorService, Thread 1, Thread 2, Thread N, Concurrent Collections, Merge, Results]
  • 21. Proposed Solution – justification
      - Why parallelism?
      - Single-threaded CPU performance relatively stagnant
      - All general-purpose CPUs now multi-core (increasing thread counts)
      - Distributed / cloud computing resources more ubiquitous
      - Data-parallel approach (SIMD)
      - Scalability concerns are with # of model elements
      - Allows for partial distribution of data per processing unit
      - Minimises synchronisation / contention -> better performance
      - Suitable for stream processing (e.g. GPUs, various frameworks)
      - Avoids rule dependencies
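The data-parallel approach, where each processing unit receives part of the data and runs the same operation on it, can be sketched as batch jobs submitted to an `ExecutorService`. This is an assumption-laden illustration: `BatchParallelSketch`, `split` and `mapInBatches` are hypothetical names, and equal-sized contiguous batches are the simplest partitioning, not necessarily the one used in the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class BatchParallelSketch {
    // Splits the list into at most `batches` contiguous sublists of near-equal size.
    static <T> List<List<T>> split(List<T> elements, int batches) {
        List<List<T>> result = new ArrayList<>();
        int size = elements.size();
        int chunk = Math.max(1, (size + batches - 1) / batches); // ceiling division
        for (int from = 0; from < size; from += chunk) {
            result.add(elements.subList(from, Math.min(size, from + chunk)));
        }
        return result;
    }

    // Applies the same function to every batch (data parallelism) and merges
    // the partial results in batch order.
    static <T, R> List<R> mapInBatches(List<T> elements, Function<T, R> fn, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<R>>> jobs = new ArrayList<>();
            for (List<T> batch : split(elements, threads)) {
                // Each job writes only to its own local list, so no locking is needed.
                jobs.add(() -> {
                    List<R> partial = new ArrayList<>(batch.size());
                    for (T element : batch) {
                        partial.add(fn.apply(element));
                    }
                    return partial;
                });
            }
            List<R> merged = new ArrayList<>(elements.size());
            // invokeAll returns the futures in the same order as the jobs.
            for (Future<List<R>> future : pool.invokeAll(jobs)) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The only point of contention is the final merge, which runs on the calling thread; this is one way to read "minimises synchronisation / contention".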

Editor's Notes

  1. Models allow for an appropriate level of abstraction. Would prefer to perform as many operations at model level as possible, as opposed to on lower-level artifacts.
  2. Parallel ATL
       - Implicit (task-level) parallelism
       - Parallelisation made possible by declarative, independent nature of ATL language constraints:
           - Outputs of transformations to targets are immediate when a rule is matched – cannot be used as intermediate data
           - OCL expressions cannot navigate the target model
           - Single-valued properties are “final”
           - Multi-valued properties can only be added to
           - OCL expressions in guard or bindings don’t have side-effects
       - Parallelisation of ATL rules and of OCL expressions are completely independent/orthogonal, i.e. can have one without the other
       - Two sub-problems: decomposition and synchronization. Need to deal with concurrent access of shared memory
       - Note: a “Match” is a set of expression evaluations over source model elements
     Decomposition
       - Simply executing rule applications for each match has too much overhead for VLMs
       - Task Parallelism – MISD (https://en.wikipedia.org/wiki/MISD)
           - Each task executes a different rule
           - Works over full source and target models
           - Each task can be further sub-divided into matching and rule application, but these sub-tasks are not independent – can’t apply the rule without having something to apply it to!
       - Synchronization is needed in/between:
           - CRUD on target model elements (and properties of the elements)
           - CRUD on trace link operations and target model
           - Match and apply
           - Jobs and trace links
           - Apply phase only happens when all matchers have finished executing
           - Other runtime data and/or engine implementation internals
       - Further optimizations for reducing synchronization need to be done on the framework (i.e. EMF) side
       - Speedup of 1.5 – 2.5 times
       - Performance gains larger for smaller models?!
       - Future work: static analysis of rule dependencies to prevent re-introducing locks on data access
     Parallel LinTra
       - One idea is to use a high-level MT language and a lower-level language like Java for handling the distribution and concurrency, based on Linda
       - Stream model elements from tuple space (could be distributed), perform transformation, write to output tuple space. All of this done using multiple threads concurrently
       - Need to find a way to estimate workload on each thread
           - Some threads could get computationally “light” rules
           - Distribution of model elements across threads
       - Focused on evolving models (in-place)
           - Source and target conform to same metamodel
           - Particularly applicable to migration, e.g. reverse-engineering a system to create a model from it
       - Extending LinTra to be in-place
       - Uses XAP Elastic Caching Edition from GigaSpaces Technologies
           - Allows for multiple distributed tuple spaces that can hold (serializable) Java objects
           - XAP internally deals with concurrency (transparent to the user)
           - Multiple threads can access tuple space(s)
           - Can query tuple space using SQL-like syntax
       - Meta-modelled in Java
           - Entities have unique identifiers
           - Relationships established by storing target entity IDs in source entities
       - Out-place MT means building target model from scratch using the transformation rules
       - In-place MT means evolving the input (source) model to get to the output (target) model
           - Recursive in-place will apply rules one-by-one to target model. It is stateful
           - Non-recursive makes a “leap” from input to output model, without considering intermediate steps (similar to out-place transformations)
           - LinTra uses non-recursive in-place MT
       - Out-place models may have dependencies between transformation rules
       - LinTra uses Master-Slave pattern
           - Master co-ordinates transformation and creates slaves
           - Slaves run transformation rules on sub-models (partitions) of the source model, as if they were independent
           - Dependencies can be retrieved when needed – see Blackboard paradigm
       - Handling of relationships after CUD events
       - Handling of rule conflicts – confluence (i.e. when multiple rules alter the same part of model, so order of execution matters)
       - “Encapsulates all the concurrent mechanisms needed for parallel execution of model transformations”
       - Achieved 2-3 times speed-up (average 2.57) on a 16-core system compared to ATL
       - Experiment only on classical Class2Relational transformation
       - In-place MUCH faster than out-place (speed-up of up to 955x compared to sequential ATL!)
       - Future research:
           - Use a higher-level concurrent language or framework on top of Java for implementing transformations
           - Higher-order transformations automatically generating parallelisable code?
           - Optimising number of threads and the work each thread does based on hardware configuration (perhaps at runtime)
           - E.g. executing same transformation over subset of model (i.e. SIMD)? Could be good for distributed GPGPU
     Distributed ATL
       - “Nice” properties of ATL:
           - Locality
           - Single assignment on target properties
           - Non-recursive rule application (single match)
           - Target model cannot be navigated
       - Rules are not as entangled in ATL, so more amenable to parallelisation
       - Each map worker runs the full transformation on a subset of model elements (“Local match-apply”)
       - Intelligent assignment not considered – could increase data locality for further gains in performance but requires static analysis
       - Upon completion, each map worker sends the Set<ModelElement> it created and tracing information to the reduce function
       - Trace information used to resolve exact binding to target elements
       - “Global resolve” phase brings together the partial models and updates properties of unresolved bindings
       - At the beginning of this phase, all target elements are created and local bindings are resolved
       - Sometimes source and target elements may not be transformed in the same node during the mapping phase; thus, trace links are used to defer this to the reduce (“Global resolve”) phase
       - Trace metamodel extended to include additional properties required for resolving bindings in the reduce phase
       - ATL VM on top of Hadoop
       - Each node runs its own VM but handles either the map or reduce phase
       - Optimize number of “splits” (how many model elements per worker?): should be set to # of elements divided by number of workers – ideally a one-to-one mapping
       - XMI is not thread-safe and has to be fully loaded into memory
       - Evaluation:
           - Two nodes minimum to get same speed as sequential (non-distributed)
           - 8 nodes results in 2.5 to 3 times speedup over sequential ATL
           - Speedup of up to 6x with 8 nodes
           - Speedup improves with model size
       - Future work:
           - Parallelise global resolve to reduce I/O bottlenecks
           - Efficient load balancing using static analysis
           - Pipelining transformations on MapReduce
     Efficient Model Partitioning
       - Model transformations are not “flat” structures (which would be optimal for MapReduce)
       - Computational complexity in pattern matching / exploring structure
       - Data access is critical
       - Inefficient distribution of data can lead to severe (I/O or network-bound) bottlenecks
       - In a declarative relational transformation language like ATL, efficient distribution can be found using static analysis
       - “Transformation footprints” used to compute dependencies
       - Cost of computing efficient distribution for models with millions of elements can outweigh the benefits – need a fast heuristic!
       - Computing full dependency graph and solving linear programming optimization problem would take too long
       - Model is divided up into “splits”; equal to the number of machines
       - Each machine has a set of elements assigned to it
       - Each model element (per split) is assigned to one (and only one) set
       - Need to balance making use of all machines whilst minimising dependencies
       - For example, if all rules depend on a single element, then it wouldn’t be efficient to have the whole transformation assigned to one machine
       - Want to minimize elements per machine and maximize dependency overlap in each machine’s load
       - “Footprints” represent an abstract view of a rule application’s navigation
           - Constructed from OCL guards and bindings
           - AST recursively traversed to build dependency
       - Solution uses a stream of model elements which are assigned by an algorithm to each machine
           - Order of arrival can affect performance
           - Uses a buffer to (partially) alleviate this
           - High-priority elements (which can affect dependency graph) are assigned first
           - Order of arrival not optimal, so can get lots of low-priorities in one lump
       - Dependency graph and assignment happen on-the-fly
       - Efficient partitioning depends almost entirely on quality of dependency graph approximation
       - In future, can exploit meta-model / typical model topology to estimate dependencies
       - Solution assumes nice properties of underlying framework:
           - Thread-safety (concurrent read/write)
           - On-demand partial loading of models
           - Fast look-up of elements (cached/index)
     Distributed Pattern Matching
       - Models are usually labelled, directed graphs, so transformation can be done using graph rewriting
       - Rules consist of LHS and RHS
       - Rule application tries to find pattern in LHS and replace it with pattern in RHS
       - This approach requires finding isomorphic subgraphs – an NP-complete problem, i.e. given two graphs G and H, does G contain a similarly structured subgraph to H?
       - Transformation-level parallelism applies the rewriting in parallel
           - Inefficient if there are lots of dependencies
       - Rule-based parallelism searches for the patterns in parallel
           - Difficult to implement
           - Hard to tell which rules are in conflict (i.e. affect each other’s output), as patterns are defined by metamodel elements
           - Order of execution of rules which are not in metaconflict doesn’t matter
           - Transformations executed in “Independence blocks”
           - Uses heuristics to minimise conflicts
           - Unclear how conflicts are actually resolved
       - With rule-level parallelism, the best-case is O(1) and worst-case is O(n^k) where n is elements in host and k is elements in target (lots of math/algorithm)…
       - Composed approach:
           - “Master” co-ordinates execution of transformations
           - “Primary Workers” are responsible for applying rewriting rules
           - “Secondary Workers” compute a match for a rewriting rule using a pseudo-random function
           - Master can have several Primary Workers
           - Primary Worker can have several Secondary Workers but each secondary worker has one primary worker parent
           - Transformation-level parallelism handled by Master with Primary Workers as clients
           - Rule-level parallelism handled by Primary Workers with Secondary Workers as clients
       - Master and Workers are different computers connected over network
       - Communicate using a modified UDP
           - Guaranteed delivery
           - Preserves order
     IncQuery-D
       - Uses incremental (graph) pattern matching by applying RETE algorithm (https://en.wikipedia.org/wiki/Rete_algorithm)
       - Graph patterns represent conditions (or constraints) that have to be fulfilled by a part of the model space in order to execute some manipulation steps on the model
       - A model (i.e. part of the model space) satisfies a graph pattern, if the pattern can be matched to a subgraph of the model using a generalized graph pattern matching technique
       - Distributed, scalable incremental model querying
       - Database technologies are not well-adapted to handle complex queries as needed in MDE
       - Designed to scale-out memory intensive incremental queries
       - Storage and indexing solutions are decoupled
       - Can use various different persistence back-ends
       - Model Access Adapter
           - Provides a mechanism for uniquely identifying model elements in the entire distributed repositories
           - Provides graph-like API to the user which translates user operations to the back-end query language; forwarding it to the underlying storage
           - Provides a façade for propagating change notifications (in the models) to the underlying storage
       - Distributed indexer
           - Common queries like “MyType.allInstances()” are cached automatically
           - The cache itself is distributed
       - Adapted Rete algorithm for distributed environment
           - Input, Worker and Productions nodes handle the processing
           - Co-ordinator node used to keep Rete nodes updated and to start operations
           - Uses acknowledgement messages as its termination protocol for retrieving query results in a consistent state
       - Compared prototype implementation to state-of-the-art non-incremental distributed query engine
           - Overhead of constructing Rete network makes it less efficient than non-incremental engine for smaller models
           - Cost outweighs benefits for medium-size models
           - Near-instantaneous query evaluation (after caching) even for models with well over 10 million elements
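The two phases described for Distributed ATL ("Local match-apply" on each map worker, then "Global resolve" in the reduce step) can be sketched in a single JVM. This is a simplified illustration and not ATL-MR itself: the `Source`, `Target` and `Partial` classes are hypothetical, the "transformation" just upper-cases a name, the workers are plain method calls rather than Hadoop tasks, and every binding is deferred to the resolve phase, whereas the notes say local bindings are resolved earlier.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MatchApplyResolveSketch {
    // Hypothetical source element: refId is the id of another source element, or -1.
    static final class Source {
        final int id;
        final String name;
        final int refId;
        Source(int id, String name, int refId) {
            this.id = id;
            this.name = name;
            this.refId = refId;
        }
    }

    // Hypothetical target element; `ref` is bound during the global resolve phase.
    static final class Target {
        final String name;
        Target ref;
        Target(String name) {
            this.name = name;
        }
    }

    // Output of one map worker: trace links plus the bindings it deferred.
    static final class Partial {
        final Map<Integer, Target> trace = new HashMap<>();      // source id -> created target
        final Map<Target, Integer> unresolved = new HashMap<>(); // target -> referenced source id
    }

    // "Local match-apply": run the whole transformation on one split of the source model.
    static Partial localMatchApply(List<Source> split) {
        Partial partial = new Partial();
        for (Source s : split) {
            Target t = new Target(s.name.toUpperCase());
            partial.trace.put(s.id, t);
            if (s.refId >= 0) {
                // The referenced element may live in another split, so defer the binding.
                partial.unresolved.put(t, s.refId);
            }
        }
        return partial;
    }

    // "Global resolve": merge all trace links, then bind the deferred references.
    static Map<Integer, Target> globalResolve(List<Partial> partials) {
        Map<Integer, Target> trace = new HashMap<>();
        for (Partial p : partials) {
            trace.putAll(p.trace);
        }
        for (Partial p : partials) {
            p.unresolved.forEach((target, refId) -> target.ref = trace.get(refId));
        }
        return trace;
    }
}
```

Because each call to `localMatchApply` touches only its own split and its own `Partial`, the calls are independent and could run on separate threads or machines; only `globalResolve` needs to see everything.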
  3. Comparing methodologies / implementations using M2M. Want to fulfil tasks which are (mostly) ignored by the literature. All these approaches are orthogonal and could complement each other well.
  4. Give examples of concurrency issues?
  5. Explain the algorithm briefly: Context applies to element? Constraint applies to element (constraint guard)? Constraint is satisfied?
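The three questions in this note can be written out as a sequential loop. The sketch below is plain Java under stated assumptions, not EVL's actual engine: `ValidationAlgorithmSketch` and `Constraint` are hypothetical, "context applies" is reduced to a type check, and each unsatisfied constraint is reported as a simple string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class ValidationAlgorithmSketch {
    // Hypothetical stand-in for a constraint: a guard deciding applicability and a check.
    static final class Constraint<T> {
        final String name;
        final Predicate<T> guard;
        final Predicate<T> check;
        Constraint(String name, Predicate<T> guard, Predicate<T> check) {
            this.name = name;
            this.guard = guard;
            this.check = check;
        }
    }

    // Returns "constraintName:element" for every unsatisfied constraint.
    static <T> List<String> validate(List<Object> model, Class<T> contextType,
                                     List<Constraint<T>> constraints) {
        List<String> unsatisfied = new ArrayList<>();
        for (Object element : model) {
            // 1. Does the context apply to this element?
            if (!contextType.isInstance(element)) {
                continue;
            }
            T typed = contextType.cast(element);
            for (Constraint<T> constraint : constraints) {
                // 2. Does the constraint apply to this element (constraint guard)?
                if (!constraint.guard.test(typed)) {
                    continue;
                }
                // 3. Is the constraint satisfied?
                if (!constraint.check.test(typed)) {
                    unsatisfied.add(constraint.name + ":" + typed);
                }
            }
        }
        return unsatisfied;
    }
}
```

Both loops are candidates for parallelisation: the outer one over elements (data parallel) and the inner one over constraints (task parallel), provided the guard and check only read the model.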
  6. Test on Java models, e.g. “code smell” warnings for EVL; ISBN validation for complex logic (in DBLP models); IMDB for simple logic.