This document discusses parallelizing model management programs to improve scalability for very large models. It proposes making Epsilon, an open-source model management tool, concurrent by executing validation, transformation, querying and other tasks across multiple threads. Previous work on parallelism focused mainly on model-to-model transformations, while this research aims to enable concurrent execution of different Epsilon languages. Preliminary work implemented a parallel validation language with promising speed-ups. Future work includes investigating GPU acceleration and distributed parallelism, while combining parallelism with incremental and lazy techniques.
Parallel Execution of Model Management Programs (STAF 2017)
1. Parallel Execution of Model
Management Programs
Sina Madani
supported through CROSSMINER EC project
supervised by Dr. Dimitris Kolovos and Prof. Richard Paige
2. Introduction
Motivation for the project
Brief literature review
Proposed Solution
Work so far
Expected contributions
Plan for Evaluation
3. Problem and Motivation
Model-Driven Engineering (MDE) used in industrial contexts
e.g. automotive, civil engineering, reverse-engineered code
Current tools not built to handle Very Large Models (VLMs)
“Very Large” meaning millions of elements
Scalability is a widely cited concern with MDE
Execution of programs with VLMs as input is very slow
Almost all mainstream MDE tools are single-threaded in execution
5. Related Work – overview
Increasing interest in improving performance of MDE tools
Main approaches:
Incrementality – only computing the delta of changes (caching)
Laziness – delaying computation until it can no longer be avoided
Reactivity – event-driven computations (incremental and lazy)
Parallelism – splitting the computation across multiple threads
Distribution – using multiple computers to perform the computation
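Two of the approaches above, laziness and incrementality, can be illustrated together in a minimal Java sketch (illustrative only; not how any of the cited tools implement it): a derived value is computed only when first requested (lazy), and the cached result is reused until a model change invalidates it (incremental).

```java
import java.util.function.Supplier;

/** Minimal sketch of laziness + incrementality: the computation runs only
 *  on first access, and the cached result is reused until invalidated. */
class CachedDerivedValue<T> {
    private final Supplier<T> computation;  // the (potentially expensive) query
    private T cached;
    private boolean valid = false;

    CachedDerivedValue(Supplier<T> computation) {
        this.computation = computation;
    }

    T get() {                 // lazy: nothing is computed until this is called
        if (!valid) {
            cached = computation.get();
            valid = true;
        }
        return cached;
    }

    void invalidate() {       // incremental: called when the model changes
        valid = false;
    }
}
```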
7. Lazy and Reactive approaches
Lazy ATL (Tisi et al., 2011)
Navigation (source) and Generation (target)
Lazy OCL (Tisi et al., 2015)
Includes lazy collections based on iterators
Reactive ATL (Pérez et al., 2015)
Automated model transformations
VIATRA-3 (Bergmann et al., 2015)
Event-driven incremental model transformations
Suitable for real-time applications
8. Parallel and Distributed
A number of works on parallel graph transformations
LinTra (Burgueño et al., 2013 / 2015)
Concurrent model transformation by streaming from tuple space
Parallel ATL (Tisi et al., 2013)
Task-parallel approach, 2.2x speedup with 4 cores / threads
Distributed ATL based on MapReduce (Benelallam et al., 2015)
Data-parallel approach, ~3x speedup with 8 nodes
Efficient Model Partitioning (Benelallam et al., 2016)
9. Un(der)-explored areas
Most work focused on model-to-model transformations
Distributed execution on multi-threaded machines
Combining parallelism with incrementality/laziness
Using GPUs for accelerating execution
10. Comparison of related work by task type
Task       | Incremental       | Lazy | Parallel           | Distributed        | GPGPU | Reactive
Querying   | Too many to name! | ATL  | ATL                | ATL-MR, IncQuery-D |       | Reactive-ATL, Viatra3
M2M        | Too many to name! | ATL  | ATL, LinTra        | ATL, LinTra        |       | ATL, Viatra3
Validation | OCL               | OCL  | OCL, EVL (partial) |                    |       |
M2T        | EGL               |      |                    |                    |       |
Comparison
Dark Green: Main focus areas of research
Light Green: Potential interest / contributions of research (if time permits)
Grey: Unlikely to be within research scope
11. Proposed Solution – overview
Concurrent execution of model management tasks
Using Epsilon as implementation testbed
Offers DSLs for various tasks based on a common language
Hybrid (imperative/declarative) DSLs
Convenient for generalising concurrency concerns across tasks
Open-source Eclipse project (intend to merge our changes)
Minimising impact to existing codebase is a concern
13. Proposed Solution – challenges
Concurrent modification
Dependencies pose a problem – require synchronisation or duplication
Parallelisation of imperative constructs
EOL potentially allows execution of any Java program!
Can limit scope to pure functions on collections
GPU acceleration requires a limited programming model
Low-level APIs
Only primitives and one-dimensional arrays as data
No / limited branching logic
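The restriction to pure functions on collections can be shown in plain Java (the `Element` class is a hypothetical stand-in for a model element, not Epsilon's actual API): a side-effect-free select over model elements writes no shared state, so it is safe to evaluate on any number of threads.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PureSelect {
    // Hypothetical stand-in for a model element; not Epsilon's API.
    record Element(String type, int size) {}

    /** A pure select: the predicate reads but never writes shared state,
     *  so the filter can run on many threads without synchronisation. */
    static List<Element> largeElements(List<Element> model) {
        return model.parallelStream()
                    .filter(e -> e.size() > 1000)   // pure predicate
                    .collect(Collectors.toList());
    }
}
```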
14. Preliminary Work
Focused on Epsilon Validation Language (EVL)
Read-only model (except fixes), so simplifies concurrency
Element (data) and constraint (task) parallel
Implemented and tested multiple parallel approaches
ThreadPoolExecutor, concurrent collections... (minimal synchronization)
Equivalence testing with original implementation
Results show promising speed-ups (~3x with 4 threads)
Tests seem to be passing so far
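The element-parallel approach described above can be sketched as follows (all names are illustrative; the actual Parallel EVL implementation lives in Epsilon): element checks are submitted to an ExecutorService and unsatisfied constraints are gathered in a lock-free concurrent collection, so worker threads need no explicit locking.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

public class ParallelValidator {
    /** Checks one constraint against every element in parallel and returns
     *  the elements that violate it. Illustrative sketch only. */
    static <E> Queue<E> validate(List<E> elements, Predicate<E> constraint, int threads)
            throws InterruptedException {
        Queue<E> unsatisfied = new ConcurrentLinkedQueue<>();  // lock-free result sink
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (E element : elements) {
            executor.submit(() -> {
                if (!constraint.test(element)) {
                    unsatisfied.add(element);  // thread-safe add, no locking needed
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return unsatisfied;
    }
}
```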
15. EVL example – couples in movies
context Couple {
  constraint twoDifferentPeople {
    guard: self.commonMovies.size() > 5
    check {
      if (self.p1.name == self.p2.name) {
        return not self.p1.movies.includesAll(self.p2.movies);
      }
      return true;
    }
    message: "Couple contains the same person!"
  }
}
16. Future Work & Expected Contributions
Concurrent implementation of EVL, ECL, EPL, EGX
and likely ETL for comparing our results and approach to other works
Investigation into GPU acceleration of model management programs
Investigation into distributed parallelism of model management tasks
Combining parallel execution with laziness and incrementality
17. Evaluation Plan
Two primary aspects:
Correctness – does our implementation behave as it should?
Performance – how much faster is our approach/implementation?
Testing on very large models and complex programs
Though finding them is proving to be a challenge!
Equivalence testing concurrent / non-concurrent implementations
Also comparing with other tools for consistency
Requires writing semantically identical scripts in other languages
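The equivalence testing step can be sketched with a small harness (hypothetical; not Epsilon's actual test suite): since the concurrent engine may report violations in any order, both result lists are normalised by sorting before comparison.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class EquivalenceCheck {
    /** Order-insensitive comparison of sequential vs. parallel results.
     *  Duplicates still matter, so a sorted-list comparison is used
     *  rather than a set comparison. Illustrative sketch only. */
    static boolean equivalent(List<String> sequential, List<String> parallel) {
        List<String> a = new ArrayList<>(sequential);
        List<String> b = new ArrayList<>(parallel);
        a.sort(Comparator.naturalOrder());
        b.sort(Comparator.naturalOrder());
        return a.equals(b);
    }
}
```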
18. Current Status
Started in January
Mostly literature review
Finalising tests for Parallel EVL v1.0
More scripts and models which can be re-used for evaluating other tasks
Testing with lots of cores / threads on computing cluster
Looking into distributed processing (Spark, Hadoop, Kafka...)
Next tasks: Pattern-matching (EPL), Model-to-Text (EGX)
Intend to see how much of Parallel EVL approach can be re-used
19. Summary
Current model management execution engines are inefficient
Mostly single-threaded, sequential
Laziness and incrementality are desirable but insufficient
Concurrency is hard!
20. Questions?
Thank you for listening!
Contact:
Sina Madani
sm1748@york.ac.uk
[Diagram: an ExecutorService submits batch jobs to Thread 1...Thread N; each thread's results are merged into shared concurrent collections.]
21. Proposed Solution – justification
Why parallelism?
Single-threaded CPU performance relatively stagnant
All general-purpose CPUs now multi-core (increasing thread counts)
Distributed / cloud computing resources more ubiquitous
Data-parallel approach (SIMD)
Scalability concerns are with # of model elements
Allows for partial distribution of data per processing unit
Minimises synchronisation / contention -> better performance
Suitable for stream processing (e.g. GPUs, various frameworks)
Avoids rule dependencies
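The data-parallel splitting argued for above can be sketched as chunking the element list into per-worker batches (illustrative only; batch sizing in a real engine would account for workload balance):

```java
import java.util.ArrayList;
import java.util.List;

public class Batches {
    /** Splits model elements into roughly equal batches, one per worker.
     *  Each batch can then be processed independently, minimising
     *  synchronisation and contention between threads. */
    static <E> List<List<E>> partition(List<E> elements, int workers) {
        List<List<E>> batches = new ArrayList<>();
        int size = (elements.size() + workers - 1) / workers;  // ceiling division
        for (int i = 0; i < elements.size(); i += size) {
            batches.add(elements.subList(i, Math.min(i + size, elements.size())));
        }
        return batches;
    }
}
```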
Editor's Notes
Models allow for appropriate level of abstraction
Would prefer to perform as many operations at model-level as possible, as opposed to lower-level artifacts
Parallel ATL
Implicit (task-level) parallelism
Parallelisation made possible by declarative, independent nature of ATL language constraints:
Outputs of transformations to targets are immediate when a rule is matched – cannot be used as intermediate data
OCL expressions cannot navigate the target model
Single-valued properties are “final”
Multi-valued properties can only be added to.
OCL expressions in guard or bindings don’t have side-effects.
Parallelisation of ATL rules and OCL expressions are completely independent/orthogonal
i.e. can have one without the other.
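The write discipline behind these guarantees (single-valued properties are final, multi-valued ones add-only, targets never navigated) can be sketched with a hypothetical target-element class:

```python
class TargetElement:
    """Illustrative sketch of ATL's target-side write discipline:
    single-valued properties are write-once ("final") and
    multi-valued properties can only be appended to."""
    def __init__(self):
        self._single = {}
        self._multi = {}

    def set(self, name, value):
        # Single-valued: a second assignment is an error.
        if name in self._single:
            raise ValueError(f"single-valued property {name!r} already set")
        self._single[name] = value

    def add(self, name, value):
        # Multi-valued: append-only, no removal or overwrite.
        self._multi.setdefault(name, []).append(value)

    def get(self, name):
        return self._single.get(name, self._multi.get(name))

elem = TargetElement()
elem.set("name", "A")
elem.add("columns", "id")
elem.add("columns", "value")
```

Under these rules, two rule applications can never observe or undo each other's writes, which is what makes lock-free parallel application plausible.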
Two sub-problems: decomposition and synchronization.
Need to deal with concurrent access of shared memory.
Note: a “Match” is a set of expression evaluations over source model elements
Decomposition
Simply executing rule applications for each match has too much overhead for VLMs.
Task Parallelism – MISD (https://en.wikipedia.org/wiki/MISD)
Each task executes a different rule
Works over full source and target models
Each task can be further sub-divided into matching and rule application
but these sub-tasks are not independent – can’t apply the rule without having something to apply it to!
Synchronization is needed in/between:
CRUD on target model elements (and properties of the elements)
CRUD on trace link operations and target model
Match and apply
Jobs and trace links
Apply phase only happens when all matchers have finished executing
Other runtime data and/or engine implementation internals
Further optimizations for reducing synchronization need to be done on the framework (i.e. EMF) side.
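A toy sketch of the match/apply barrier: all matching completes before any rule application begins (illustrative names, not Parallel ATL's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def transform_two_phase(elements, guard, apply_rule, n_threads=4):
    """Two-phase execution sketch: matching runs in parallel, and the
    apply phase only starts once every matcher has finished."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Phase 1: matching. Consuming pool.map blocks until all match
        # tasks are done, acting as the barrier between the phases.
        matches = [e for e, ok in zip(elements, pool.map(guard, elements))
                   if ok]
    # Phase 2: rule application, over the completed match set only.
    return [apply_rule(e) for e in matches]

result = transform_two_phase([1, 2, 3, 4],
                             lambda x: x % 2 == 0,   # matcher
                             lambda x: x * 10)       # rule application
```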
Speedup of 1.5 – 2.5 times
Performance gains larger for smaller models?!
Future work: static analysis of rule dependencies to prevent re-introducing locks on data access
Parallel LinTra
One idea is to use a high-level MT language and a lower-level language like Java for handling the distribution and concurrency, based on Linda.
Stream model elements from tuple space (could be distributed)
Perform transformation
Write to output tuple space
All of this done using multiple threads concurrently
Need to find a way to estimate workload on each thread
Some threads could get computationally “light” rules
Distribution of model elements across threads
Focused on evolving models (in-place)
Source and target conform to same metamodel
Particularly applicable to migration
E.g. reverse-engineering a system to create a model from it
Extending LinTra to be in-place
Uses XAP Elastic Caching Edition from GigaSpaces Technologies
Allows for multiple distributed tuple spaces that can hold (serializable) Java objects
XAP internally deals with concurrency (transparent to the user)
Multiple threads can access tuple space(s)
Can query tuple space using SQL-like syntax
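A minimal Linda-style tuple space can be sketched as follows; this illustrates the concept only and is not XAP's actual API:

```python
import threading

class TupleSpace:
    """Minimal Linda-style tuple space sketch: threads write tuples
    and take the next tuple matching a predicate. A condition variable
    stands in for XAP's internal (transparent) concurrency handling."""
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def write(self, tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def take(self, match):
        """Remove and return the first tuple satisfying `match`,
        blocking until one becomes available."""
        with self._cond:
            while True:
                for i, t in enumerate(self._tuples):
                    if match(t):
                        return self._tuples.pop(i)
                self._cond.wait()

space = TupleSpace()
space.write(("Class", 1, "Person"))
space.write(("Attribute", 2, "name"))
```

In the LinTra setting, worker threads would stream source elements out of one space, transform them, and write results into an output space.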
Meta-modelled in Java
Entities have unique identifiers
Relationships established by storing target entity IDs in source entities
Out-place MT means building target model from scratch using the transformation rules
In-place MT means evolving the input (source) model to get to the output (target) model
Recursive in-place will apply rules one-by-one to target model. It is stateful.
Non-recursive makes a “leap” from input to output model, without considering intermediate steps (similar to out-place transformations)
LinTra uses non-recursive in-place MT
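The difference between the two in-place styles can be illustrated with a toy rule (hypothetical guard/rule functions; LinTra's engine is far more involved):

```python
def nonrecursive_inplace(model, guard, rule):
    """Non-recursive in-place sketch: every rule application reads the
    ORIGINAL model state, and the output replaces it in one 'leap'
    with no intermediate states."""
    return [rule(e) if guard(e) else e for e in model]

def recursive_inplace(model, guard, rule):
    """Recursive (stateful) in-place sketch: rules are applied
    one-by-one to the evolving model until no element matches."""
    model = list(model)
    changed = True
    while changed:
        changed = False
        for i, e in enumerate(model):
            if guard(e):
                model[i] = rule(e)
                changed = True
    return model

# Toy rule: increment any element below 3.
source = [1, 2, 3]
leap = nonrecursive_inplace(source, lambda e: e < 3, lambda e: e + 1)
fixpoint = recursive_inplace(source, lambda e: e < 3, lambda e: e + 1)
```

The non-recursive form is what makes the transformation embarrassingly parallel: since rules never see intermediate states, elements can be processed in any order on any thread.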
Out-place models may have dependencies between transformation rules
LinTra uses Master-Slave pattern
Master co-ordinates transformation and creates slaves
Slaves run transformation rules on sub-models (partitions) of the source model, as if they were independent
Dependencies can be retrieved when needed – see Blackboard paradigm
Handling of relationships after CUD events
Handling of rule conflicts - confluence (i.e. when multiple rules alter the same part of model – so order of execution matters)
“Encapsulates all the concurrent mechanisms needed for parallel execution of model transformations”
Achieved 2-3 times speed-up (average 2.57) on a 16-core system compared to ATL
Experiment only on classical Class2Relational transformation
In-place MUCH faster than out-place (speed-up of up to 955x compared to sequential ATL!)
Future research:
Use a higher-level concurrent language or framework on top of Java for implementing transformations
Higher-order transformations automatically generating parallelisable code?
Optimising number of threads and the work each thread does based on hardware configuration (perhaps at runtime)
E.g. executing same transformation over subset of model (i.e. SIMD)?
Could be good for distributed GPGPU
Distributed ATL
“Nice” properties of ATL:
Locality
Single assignment on target properties
Non-recursive rule application (single match)
Target model cannot be navigated
Rules are not as entangled in ATL, so more amenable to parallelisation
Each map worker runs the full transformation on a subset of model elements (“Local match-apply”)
Intelligent assignment not considered – could increase data locality for further gains in performance
but requires static analysis
Upon completion, each map worker sends the Set<ModelElement> it created and tracing information to the reduce function.
Trace information used to resolve exact binding to target elements
“Global resolve” phase brings together the partial models and updates properties of unresolved bindings
At the beginning of this phase, all target elements are created and local bindings are resolved
Sometimes source and target elements may not be transformed in the same node during the mapping phase
thus, trace links used to defer this to reduce (“Global resolve”) phase.
Trace metamodel extended to include additional properties required for resolving bindings in the reduce phase.
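The local match-apply / global resolve split might be sketched like this, with trace links deferring cross-partition references to the reduce phase (a dictionary-based toy model, not the Distributed ATL implementation):

```python
def local_match_apply(partition, trace):
    """Map phase sketch: each worker transforms its subset of source
    elements, recording a trace link source_id -> target element.
    References are left unresolved as ('ref', source_id) placeholders
    because the referenced element may live on another worker."""
    targets = []
    for elem in partition:
        tgt = {"id": elem["id"],
               "ref": ("ref", elem["ref"]) if elem.get("ref") else None}
        trace[elem["id"]] = tgt
        targets.append(tgt)
    return targets

def global_resolve(all_targets, trace):
    """Reduce phase sketch: use the merged trace information to bind
    unresolved placeholders to the actual target elements."""
    for tgt in all_targets:
        if isinstance(tgt["ref"], tuple):
            tgt["ref"] = trace[tgt["ref"][1]]
    return all_targets

# Two "workers", each holding one source element; "a" references "b".
trace = {}
part1 = [{"id": "a", "ref": "b"}]
part2 = [{"id": "b", "ref": None}]
targets = local_match_apply(part1, trace) + local_match_apply(part2, trace)
resolved = global_resolve(targets, trace)
```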
ATL VM on top of Hadoop
Each node runs its own VM but handles either the map or reduce phase.
The number of "splits" (model elements per worker) should be set to the number of elements divided by the number of workers – ideally a one-to-one mapping.
XMI is not thread-safe and has to be fully loaded into memory
Evaluation:
Two nodes minimum to get same speed as sequential (non-distributed)
8 nodes results in 2.5 to 3 times speedup over sequential ATL
Speedup of up to 6x with 8 nodes
Speedup improves with model size
Future work:
Parallelise global resolve to reduce I/O bottlenecks
Efficient load balancing using static analysis
Pipelining transformations on MapReduce
Efficient Model Partitioning
Model transformations are not “flat” structures (which would be optimal for MapReduce)
Computational complexity in pattern matching / exploring structure
Data access is critical
Inefficient distribution of data can lead to severe (I/O or network-bound) bottlenecks
In a declarative relation transformation language like ATL, efficient distribution can be found using static analysis
“Transformation footprints” used to compute dependencies
Cost of computing efficient distribution for models with millions of elements can outweigh the benefits – need a fast heuristic!
Computing full dependency graph and solving linear programming optimization problem would take too long
Model is divided up into “splits”; equal to the number of machines
Each machine has a set of elements assigned to it
Each model element (per split) is assigned to one (and only one) set
Need to balance making use of all machines whilst minimising dependencies
For example, if all rules depend on a single element, then it wouldn’t be efficient to have the whole transformation assigned to one machine
Want to minimize elements per machine and
Maximize dependency overlap in each machine’s load.
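A toy greedy heuristic for this trade-off: favour the machine with the most dependency overlap, breaking ties towards the least-loaded one (illustrative only, not the paper's actual algorithm):

```python
def assign(element_deps, machines):
    """Greedy streaming assignment sketch: each arriving element (with
    its approximate dependency set) goes to the machine maximising
    dependency overlap with that machine's current load; ties go to
    the machine with fewer elements, to balance utilisation."""
    loads = {m: set() for m in machines}   # deps already on each machine
    counts = {m: 0 for m in machines}      # elements per machine
    placement = {}
    for elem, deps in element_deps.items():
        best = max(machines,
                   key=lambda m: (len(loads[m] & deps), -counts[m]))
        placement[elem] = best
        loads[best] |= deps
        counts[best] += 1
    return placement

# e1 and e2 share dependency "b", so they should co-locate;
# e3 is independent and goes to the emptier machine.
placement = assign({"e1": {"a", "b"}, "e2": {"b", "c"}, "e3": {"x"}},
                   ["m1", "m2"])
```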
“Footprints” represent an abstract view of a rule application’s navigation
Constructed from OCL guards and bindings
AST recursively traversed to build the dependency graph
Solution uses a stream of model elements which are assigned by an algorithm to each machine
Order of arrival can affect performance
Uses a buffer to (partially) alleviate this
High-priority elements (which can affect dependency graph) are assigned first
Order of arrival not optimal, so can get lots of low-priority elements in one lump
Dependency graph and assignment happen on-the-fly
Efficient partitioning depends almost entirely on quality of dependency graph approximation
In future, can exploit meta-model / typical model topology to estimate dependencies
Solution assumes nice properties of underlying framework
Thread-safety (concurrent read/write)
On-demand partial loading of models
Fast look-up of elements (cached/index)
Distributed Pattern Matching
Models are usually labelled, directed graphs, so transformation can be done using graph rewriting
Rules consist of LHS and RHS
Rule application tries to find pattern in LHS and replace it with pattern in RHS
This approach requires finding isomorphic subgraphs – an NP-complete problem
i.e. given two graphs G and H, does G contain a subgraph isomorphic to H?
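A brute-force check makes the cost visible; it is exponential in the pattern size, which is why real engines rely on search-plan heuristics:

```python
from itertools import permutations

def has_subgraph(g_nodes, g_edges, h_nodes, h_edges):
    """Brute-force subgraph isomorphism sketch for directed graphs:
    does G contain a subgraph isomorphic to H? Tries every injective
    mapping of H's nodes onto G's nodes."""
    for mapping in permutations(g_nodes, len(h_nodes)):
        m = dict(zip(h_nodes, mapping))
        # The mapping is a match if every pattern edge has an image in G.
        if all((m[u], m[v]) in g_edges for (u, v) in h_edges):
            return True
    return False
```

For example, a directed triangle contains a single directed edge as a subgraph, but not a 2-cycle (an edge in both directions).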
Transformation-level parallelism applies the rewriting in parallel
Inefficient if there are lots of dependencies
Rule-based parallelism searches for the patterns in parallel
Difficult to implement
Hard to tell which rules are in conflict (i.e. affect each other’s output), as patterns are defined by metamodel elements
Order of execution of rules which are not in metaconflict doesn’t matter
Transformations executed in “Independence blocks”
Uses heuristics to minimise conflicts
Unclear how conflicts are actually resolved
With rule-level parallelism, the best-case is O(1) and worst-case is O(n^k) where n is elements in host and k is elements in target
(lots of math/algorithm)…
Composed approach:
“Master” co-ordinates execution of transformations
“Primary Workers” are responsible for applying rewriting rules
“Secondary Workers” compute a match for a rewriting rule using a pseudo-random function
Master can have several Primary Workers
Primary Worker can have several Secondary Workers
but each secondary worker has one primary worker parent
Transformation-level parallelism handled by Master
with Primary Workers as clients
Rule-level parallelism handled by Primary Workers
with Secondary Workers as clients
Master and Workers are different computers connected over network
Communicate using a modified UDP
Guaranteed delivery
Preserves order
IncQuery-D
Uses incremental (graph) pattern matching by applying RETE algorithm
https://en.wikipedia.org/wiki/Rete_algorithm
Graph patterns represent conditions (or constraints) that have to be fulfilled by a part of the model space in order to execute some manipulation steps on the model. A model (i.e. part of the model space) satisfies a graph pattern, if the pattern can be matched to a subgraph of the model using a generalized graph pattern matching technique.
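The incremental idea can be sketched in miniature: keep the match set materialised and update it on every model change, instead of re-running the query (illustrative only, not IncQuery-D's Rete node network):

```python
class IncrementalMatcher:
    """Tiny Rete-flavoured sketch: the match set for a pattern is kept
    materialised and maintained on each change notification, so query
    evaluation after the initial build is a constant-time lookup."""
    def __init__(self, predicate):
        self.predicate = predicate
        self.matches = set()

    def on_add(self, elem):
        if self.predicate(elem):
            self.matches.add(elem)

    def on_remove(self, elem):
        self.matches.discard(elem)

# Pattern: "even elements". Build once, then maintain incrementally.
matcher = IncrementalMatcher(lambda e: e % 2 == 0)
for e in [1, 2, 3, 4]:
    matcher.on_add(e)      # initial (expensive) build of the match set
matcher.on_remove(2)       # cheap incremental maintenance on change
matcher.on_add(6)
```

In IncQuery-D this cache is itself distributed across nodes, which is why the memory-intensive Rete network can scale out.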
Distributed, scalable incremental model querying
Database technologies are not well-adapted to handle complex queries as needed in MDE
Designed to scale-out memory intensive incremental queries
Storage and indexing solutions are decoupled
Can use various different persistence back-ends
Model Access Adapter
Provides a mechanism for uniquely identifying model elements across the entire distributed repository
Provides a graph-like API to the user, translating user operations to the back-end query language and forwarding them to the underlying storage
Provides a façade for propagating change notifications (in the models) to the underlying storage
Distributed indexer
Common queries like “MyType.allInstances()” are cached automatically
The cache itself is distributed
Adapted Rete algorithm for distributed environment
Input, Worker and Productions nodes handle the processing
Co-ordinator node used to keep Rete nodes updated and to start operations
Uses acknowledgement messages as its termination protocol for retrieving query results in a consistent state
Compared prototype implementation to state-of-the-art non-incremental distributed query engine
Overhead of constructing Rete network makes it less efficient than non-incremental engine for smaller models
Cost outweighs benefits for medium-size models
Near-instantaneous query evaluation (after caching) even for models with well over 10 million elements
Comparing methodologies / implementations using M2M
Want to fulfil tasks which are (mostly) ignored by the literature
All these approaches are orthogonal and could complement each other well
Give examples of concurrency issues?
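One example worth giving is the lost-update race on a shared counter. It is replayed here deterministically rather than with real threads, so the problematic interleaving is explicit (a hypothetical sketch, not Epsilon code):

```python
def lost_update_demo():
    """Deterministic replay of the classic lost-update race:
    'counter += 1' is really read-modify-write, so if two threads both
    read before either writes back, one increment is lost. With a lock,
    or by merging thread-local results as in the batch/merge design,
    the answer would be 2."""
    counter = 0
    read_a = counter        # thread A reads 0
    read_b = counter        # thread B also reads 0, before A writes back
    counter = read_a + 1    # A writes 1
    counter = read_b + 1    # B overwrites with 1; A's update is lost
    return counter
```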
Explain the algorithm briefly:
Context applies to element?
Constraint applies to element (constraint guard)?
Constraint is satisfied?
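These three checks can be sketched as a simple validation loop (illustrative structure, not the EVL engine's API):

```python
def validate(model, contexts):
    """EVL-style check loop sketch: for each element, find the contexts
    that apply to it, then for each constraint run its guard and, if
    the guard holds, its check. Returns (element, constraint) pairs
    for every unsatisfied constraint."""
    failures = []
    for elem in model:
        for ctx in contexts:
            if not ctx["applies"](elem):       # context applies to element?
                continue
            for con in ctx["constraints"]:
                if not con["guard"](elem):     # constraint guard holds?
                    continue
                if not con["check"](elem):     # constraint satisfied?
                    failures.append((elem, con["name"]))
    return failures

# Toy context over integers: non-zero numbers must be positive.
contexts = [{
    "applies": lambda e: isinstance(e, int),
    "constraints": [{
        "name": "positive",
        "guard": lambda e: e != 0,   # zero is exempt via the guard
        "check": lambda e: e > 0,
    }],
}]
failures = validate([3, 0, -2, "x"], contexts)
```

Since each element's checks are independent of the others, the outer loop over elements is the natural unit to farm out across threads.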
Test on Java models
e.g. “code smell” warnings for EVL
ISBN validation for complex logic (in DBLP models)
IMDB for simple logic