Distributed model-to-model transformation can be computationally expensive for very large models or complex transformations. The authors present an approach that distributes ATL model transformations over MapReduce: mappers perform the local match and apply phases in parallel, and reducers perform a global resolve step that combines the local results. An evaluation on Amazon EMR shows near-linear speedup for models of up to 100,000 lines of code. Open challenges include load balancing, a persistence layer supporting concurrent reads and writes, and parallelizing all transformation phases.
1. Distributed Model-to-Model Transformation with ATL on MapReduce
Jordi CABOT (ICREA, Universitat Oberta de Catalunya)
Amine BENELALLAM, Abel GOMEZ, and Massimo TISI (AtlanMod team: Inria, Mines Nantes, Lina)
The 8th ACM SIGPLAN International Conference on Software Language Engineering (co-located with SPLASH), Oct 26, 2015, Pittsburgh, USA
5. Scalability issues in MTs
Complex transformations taking hours to run.
Very Large Models (VLMs) not fitting into the memory of a single machine.
6. Increasing complexity of data & systems
● Frequent increase in scope between releases
● +900 meta-classes & thousands of properties
● Models growing up to several GBs
8. Why not use a GPL?
Using a General-Purpose Language (GPL) for distributed MT:
1. Requires familiarity with concurrency theory
○ not common among MDE application developers
2. Introduces a new class of errors w.r.t. sequential programming
○ e.g. errors linked to task synchronization and shared data access
3. Requires complex analysis for performance optimization
10. Case Study: Analysis of Data-Flow in Java Programs (TTC13 [1])
[1] T. Horn. The TTC 2013 Flowgraphs Case. arXiv preprint, arXiv:1312.0341, 2013.
11. Case Study: Analysis of Data-Flow in Java Programs

int fact (int a) {
    int r = 1;
    while (a>0) {
        r *= a--;
    }
    return r;
}

(Figure: (a) the Java code above; (b) its control-flow graph, in which cfNext edges link each flow instruction to its possible successors; (c) the data-flow graph to be computed, in which dfNext edges link each definition (def) of a and r to its uses (use).)
12. Atlanmod Transformation Language (ATL)

module ControlFlow2DataFlow;
create OUT : DataFlow from IN : ControlFlow;

rule SimpleStatment {
    from
        s : ControlFlow!SimpleStmt (                       -- input pattern
            not (s.def->isEmpty() and s.use->isEmpty())    -- guard
        )
    to
        t : DataFlow!SimpleStmt (                          -- output pattern
            txt <- s.txt,                                  -- binding
            dfNext <- s.computeNextDataFlows()             -- calls an ATL helper
        )
}
[...]
13. ATL Helper

helper context ControlFlow!FlowInstr def : computeNextDataFlows() : Sequence(ControlFlow!FlowInstr) =
    self.def->collect(d | self.users(d)
        ->reject(fi | if fi = self then not fi.isInALoop else false endif)
        ->select(fi | thisModule.isDefinedBy(fi, Sequence{fi}, self, Sequence{},
                                             self.definers(d)->excluding(self))))
        ->flatten();

helper def : isDefinedBy(start : ControlFlow!FlowInstr, input : Sequence(ControlFlow!FlowInstr),
                         end : ControlFlow!FlowInstr, visited : Sequence(ControlFlow!FlowInstr),
                         forbidden : Sequence(ControlFlow!FlowInstr)) : Boolean =
    if input->exists(i | i = end) then
        true
    else
        let newInput : Sequence(ControlFlow!FlowInstr) =
            input->collect(i | i.cfPrev)->flatten()
                 ->reject(i | visited->exists(v | v = i) or forbidden->exists(f | f = i)) in
        if newInput->isEmpty() then
            false
        else
            thisModule.isDefinedBy(start, newInput, end,
                                   visited->union(newInput)->asSet()->asSequence(), forbidden)
        endif
    endif;
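The two helpers above implement a reaching-definitions-style backwards search over cfPrev links. As a rough illustration, here is a hypothetical Java port of their logic (the names Node, demo, and the whole class are invented for this sketch, not taken from the paper), run on the control-flow graph of fact() from slide 11:

```java
import java.util.*;

public class DataFlowDemo {

    public static class Node {
        final String txt;
        final Set<String> def, use;
        final List<Node> cfPrev = new ArrayList<>();
        final boolean inLoop;
        Node(String txt, Set<String> def, Set<String> use, boolean inLoop) {
            this.txt = txt; this.def = def; this.use = use; this.inLoop = inLoop;
        }
    }

    // computeNextDataFlows: for every variable d this node defines, keep each
    // user of d from which this node is reachable backwards along cfPrev
    // without crossing another definer of d.
    public static Set<Node> computeNextDataFlows(Node self, List<Node> all) {
        Set<Node> result = new LinkedHashSet<>();
        for (String d : self.def) {
            Set<Node> forbidden = new HashSet<>();   // definers(d)->excluding(self)
            for (Node n : all) if (n != self && n.def.contains(d)) forbidden.add(n);
            for (Node fi : all) {
                if (!fi.use.contains(d)) continue;       // users(d)
                if (fi == self && !fi.inLoop) continue;  // the reject clause
                if (isDefinedBy(Set.of(fi), self, new HashSet<>(), forbidden))
                    result.add(fi);
            }
        }
        return result;
    }

    // isDefinedBy: frontier walk backwards over cfPrev, mirroring the recursive
    // ATL helper (visited stops cycles, forbidden blocks redefinitions).
    static boolean isDefinedBy(Set<Node> input, Node end,
                               Set<Node> visited, Set<Node> forbidden) {
        if (input.contains(end)) return true;
        Set<Node> next = new HashSet<>();
        for (Node i : input) next.addAll(i.cfPrev);
        next.removeAll(visited);
        next.removeAll(forbidden);
        if (next.isEmpty()) return false;
        visited.addAll(next);
        return isDefinedBy(next, end, visited, forbidden);
    }

    // Builds the fact() control-flow model; returns txt -> set of dfNext txts.
    public static Map<String, Set<String>> demo() {
        Node m   = new Node("int fact(int a)", Set.of("a"), Set.of(), false);
        Node s1  = new Node("int r = 1;", Set.of("r"), Set.of(), false);
        Node w   = new Node("while (a>0)", Set.of(), Set.of("a"), false);
        Node s2  = new Node("r *= a--;", Set.of("r", "a"), Set.of("r", "a"), true);
        Node ret = new Node("return r;", Set.of(), Set.of("r"), false);
        s1.cfPrev.add(m); w.cfPrev.add(s1); w.cfPrev.add(s2);
        s2.cfPrev.add(w); ret.cfPrev.add(w);
        List<Node> all = List.of(m, s1, w, s2, ret);
        Map<String, Set<String>> df = new LinkedHashMap<>();
        for (Node n : all) {
            Set<String> txts = new LinkedHashSet<>();
            for (Node t : computeNextDataFlows(n, all)) txts.add(t.txt);
            df.put(n.txt, txts);
        }
        return df;
    }

    public static void main(String[] args) {
        demo().forEach((src, tgts) -> System.out.println(src + " --dfNext--> " + tgts));
    }
}
```

On this graph the sketch reproduces the data-flow edges of the TTC13 figure, e.g. "int r = 1;" flows to both "r *= a--;" and "return r;".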
14. ATL Execution Semantics: Match phase

(Figure: the control-flow model of fact() from slide 11; during the match phase every source element is matched by a rule, rule:Method for the method signature and rule:Stmnt for each of the four statements.)
15-18. ATL Execution Semantics: Apply phase
[Figure, animated over four slides: starting from the matched rule applications (rule:Method and four rule:Stmnt), the apply phase instantiates the output patterns one by one; the target elements int fact(int a), int r = 1;, while (a>0), r *= a--; and return r; appear successively until the complete data-flow model is built.]
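The match and apply phases illustrated above can be sketched in a few lines of plain Java (an illustrative simplification, not ATL's actual engine): match creates one target element and a trace link per matched source element; apply then initializes the target's features, resolving references to other source elements through the trace links.

```java
import java.util.*;

// Simplified sketch of ATL's two-phased execution. Source and target
// elements are plain strings; dfNext is modelled as a successor map.
class TwoPhaseSketch {
    static Map<String, List<String>> transform(List<String> sources,
                                               Map<String, List<String>> dfNextOnSources) {
        // Match phase: create targets and trace links only, no bindings yet.
        Map<String, String> trace = new LinkedHashMap<>();
        for (String s : sources) trace.put(s, "df:" + s);
        // Apply phase: set each target's dfNext, resolving every reference
        // to a source element through the trace links.
        Map<String, List<String>> target = new LinkedHashMap<>();
        trace.forEach((src, tgt) -> {
            List<String> succ = new ArrayList<>();
            for (String s : dfNextOnSources.getOrDefault(src, List.of()))
                succ.add(trace.get(s));   // source ref -> target ref
            target.put(tgt, succ);
        });
        return target;
    }
}
```

The two-step split is what makes the semantics distributable: apply never needs to re-run matching, only to look references up in the traces.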
20. Why MapReduce for ATL?
● Well-suited for Write Once Read Many (WORM) data
● Two-phased execution model
Also, MapReduce:
● supports different types of inputs (XML, DB, text)
● takes care of machine failures, efficient communication, and performance issues
23-27. Control-Flow to Data-Flow in MapReduce: Local Match/Apply
[Figure, animated over five slides: the source model of fact is partitioned between two mappers, map1 and map2; each mapper matches and applies the rules on its own elements, creating its local target elements, and records dfNext links that may point across the partition boundary.]
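The Local Match/Apply step shown above can be sketched as follows (an illustrative, single-JVM simplification; partitioning and naming are assumptions, not the authors' Hadoop code): the source model is split into disjoint partitions, one per mapper, and each mapper matches and applies the rules on its own elements only, producing a partial trace.

```java
import java.util.*;

// Each mapper transforms its own partition and records trace links
// (source id -> target id). References leaving the partition stay as
// source-id proxies, to be resolved in the Global Resolve step.
class LocalMatchApply {
    static List<Map<String, String>> run(List<String> sources, int mappers) {
        List<Map<String, String>> partial = new ArrayList<>();
        for (int m = 0; m < mappers; m++) partial.add(new LinkedHashMap<>());
        for (int i = 0; i < sources.size(); i++) {
            String src = sources.get(i);
            // Round-robin partitioning; match + apply happen locally.
            partial.get(i % mappers).put(src, "df:" + src);
        }
        return partial;
    }
}
```

Because mappers never read each other's partitions, this phase parallelizes without coordination; the price is the unresolved cross-partition references.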
28-30. Control-Flow to Data-Flow in MapReduce: Global Resolve
[Figure, animated over three slides: reducers red1 and red2 take the locally created target elements and resolve the remaining dfNext references globally, producing the final data-flow model of fact.]
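The Global Resolve step above can be sketched as follows (illustrative names, not the authors' reducer code): each mapper ships its partial trace links together with target elements whose dfNext still holds source ids as proxies, because the referenced element may live in another mapper's partition; the reducer merges the traces and rewires every proxy to the real target.

```java
import java.util.*;

// The reducer side: merge the mappers' partial traces, then replace each
// source-id proxy in dfNext with the target element created for it.
class GlobalResolve {
    static Map<String, List<String>> resolve(List<Map<String, String>> partialTraces,
                                             Map<String, List<String>> proxiedDfNext) {
        Map<String, String> trace = new HashMap<>();
        partialTraces.forEach(trace::putAll);          // merge mapper traces
        Map<String, List<String>> resolved = new HashMap<>();
        proxiedDfNext.forEach((tgt, proxies) -> {
            List<String> real = new ArrayList<>();
            for (String srcId : proxies) real.add(trace.get(srcId));
            resolved.put(tgt, real);
        });
        return resolved;
    }
}
```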
34. Experiment I: Speed-Up Curve
● 5 models extracted from automatically generated Java files:
○ similar size (~1,500 LOC)
○ sequential transformation times range from 620 s to 778 s
● Run on identical machines (m1.large) on Amazon Elastic MapReduce (EMR)
○ 10 runs for each number of nodes
○ 280 hours of computation in total
● Almost linear speed-up up to 8 nodes
○ ~3 times faster on 8 nodes
35. Experiment II: Size/Speed-Up Correlation
● 5 models extracted from automatically generated Java files:
○ increasing size (13,500 to 105,000 LOC)
○ sequential transformation times range from 319 s to 17,998 s (~5 h)
● Run on a cluster of 12 instances built on top of OpenVZ
○ 8 slaves
○ 4 machines orchestrating Hadoop/HBase
● Almost linear speed-up for large models
○ up to 6x faster on 8 nodes
● Speed-up increases with model size
37. Challenges in Distributing Model Transformation
● Fact I: models might be densely interconnected and unbalanced
○ rule applications might not have the same complexity
○ unable to guarantee a balanced workload; MapReduce's default scheduler is not enough
● Fact II: persistence backends are not suited for R/W concurrency
○ unable to parallelize the reduce phase
38. NeoEMF: an Extensible Persistence Backend [1]
● Lazy loading and unloading
○ enabling the transformation of big models
● Distributed storage and access
○ permitting the parallelization of the reduce phase
● Compliant with MapReduce
● Fail-safe (no data loss)
[Figure: the layered NeoEMF architecture: client code and model-based tools use the EMF Model Access API; a model manager, caching strategy and persistence manager sit behind it; interchangeable persistence backends (NeoEMF/Map on MapDB, NeoEMF/Graph on a graph database, NeoEMF/HBase on HBase with ZooKeeper) plug into a common backend API.]
[1] NeoEMF: http://www.neoemf.com
39. Future Work
1. Optimization of load balancing
○ efficient distribution of the input model over map workers
2. Parallelization of the Global Resolve phase and transformation of Very Large Models
○ integrating ATL-MR with NeoEMF/HBase
40. Conclusion
● We align rule-based model transformation with the MapReduce execution model
○ We introduce an execution semantics of ATL on top of MapReduce
○ We experimentally show the good scalability of our solution
● For ATL users: keep the same syntax and embrace the Cloud
● For MapReduce users: model transformation as yet another high-level language for MapReduce
41. Check us out on Github
https://github.com/atlanmod/ATL_MR