Distributed Model-to-Model
Transformation with ATL on MapReduce
Jordi CABOT
ICREA
Universitat Oberta de Catalunya
Amine BENELALLAM, Abel GOMEZ,
and Massimo TISI
AtlanMod team (Inria, Mines Nantes, Lina)
The 8th ACM SIGPLAN International Conference on Software Language Engineering (co-located with SPLASH), Oct 26 2015, Pittsburgh, USA
Context
Model Transformation
Transformation spec {
S::Square → T1::Triangle
S::Circle → T1::Octagon
....
}
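[Figure: the transformation consumes the source models and the transformation spec and produces a target model; the source and target models each contain elements numbered 1 to 6.]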
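As a rough illustration of what such a specification means, the sketch below applies type-to-type rules to a toy source model. Only the two rules (Square to Triangle, Circle to Octagon) come from the slide; the element ids, their types, and the `transform` function are assumptions made up for the example.

```python
# Rules taken from the slide: source type -> target type.
RULES = {"Square": "Triangle", "Circle": "Octagon"}

# Hypothetical source model (element id -> type); not taken from the slides.
source_model = {1: "Square", 2: "Circle", 3: "Square"}


def transform(model, rules):
    """Create one target element for every source element that a rule matches."""
    return {eid: rules[kind] for eid, kind in model.items() if kind in rules}


target_model = transform(source_model, RULES)
print(target_model)
```

Elements whose type has no rule are simply skipped in this sketch.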
Why Distribute Model Transformations?
Scalability issues in MTs
● Complex transformations taking hours to run
● Very Large Models (VLMs) not fitting into the memory of a single machine
Increasing complexity of data & systems
● Frequent increase in scope between releases
● 900+ meta-classes & thousands of properties
● Models grow to gigabytes in size
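Distributing Model Transformation
[Figure: the same transformation run in a distributed environment. The source model (elements 1 to 6) appears split across workers; each worker consumes the transformation spec and part of the source model, and together they produce a single target model.]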
Why not use a GPL?
Using a General Purpose Language (GPL) for distributed MT:
1. Requires familiarity with concurrency theory
○ not common among MDE application developers
2. Introduces a new class of errors w.r.t. sequential programming
○ e.g. linked to task synchronization and shared data access
3. Requires complex analysis for performance optimization
Meet ATL-MR
Case Study: Analysis of Data-Flow in
Java Programs (TTC13 [1])
[1] T. Horn. The TTC 2013 Flowgraphs Case. arXiv preprint, arXiv:1312.0341, 2013.
Case Study: Analysis of Data-Flow in
Java Programs
int fact (int a) {
  int r = 1;
  while (a>0) {
    r *= a--;
  }
  return r;
}
(a) Java code (b) Control-Flow (c) Data-Flow
[Figure: (b) shows the instructions of fact (int fact(int a), int r = 1;, while (a>0), r *= a--;, return r;) linked by cfNext, with def/use links to the variables a and r; (c) shows the same instructions linked by dfNext.]
Atlanmod Transformation Language
(ATL)
module ControlFlow2DataFlow;
create OUT : DataFlow from IN : ControlFlow;
rule SimpleStatment {
  from
    s : ControlFlow!SimpleStmt (
      not (s.def->isEmpty() and s.use->isEmpty())
    )
  to
    t : DataFlow!SimpleStmt (
      txt <- s.txt,
      dfNext <- s.computeNextDataFlows()
    )
}
[...]
[Annotations on the listing: module, rule, input pattern (with guard), output pattern (with binding), ATL helper.]
ATL Helper
helper context ControlFlow!FlowInstr def : computeNextDataFlows() : Sequence(ControlFlow!FlowInstr) =
  self.def->collect(d | self.users(d)
      ->reject(fi | if fi = self then not fi.isInALoop else false endif)
      ->select(fi | thisModule.isDefinedBy(fi, Sequence{fi}, self, Sequence{}, self.definers(d)->excluding(self))))
    ->flatten();

helper def : isDefinedBy(start : ControlFlow!FlowInstr, input : Sequence(ControlFlow!FlowInstr), end : ControlFlow!FlowInstr,
    visited : Sequence(ControlFlow!FlowInstr), forbidden : Sequence(ControlFlow!FlowInstr)) : Boolean =
  if input->exists(i | i = end) then true
  else let newInput : Sequence(ControlFlow!FlowInstr) = input->collect(i | i.cfPrev)->flatten()
      ->reject(i | visited->exists(v | v = i) or forbidden->exists(f | f = i)) in
    if newInput->isEmpty() then false
    else thisModule.isDefinedBy(start, newInput, end, visited->union(newInput)->asSet()->asSequence(), forbidden)
    endif
  endif;
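To make the intent of the helper more concrete, here is a small Python sketch of a def-use analysis on the fact example. It is not a translation of the ATL helper: it walks the control flow forward (the helper walks cfPrev backward), and the statement ids, the def/use sets, and the cf_next edges are assumptions reconstructed by hand from the Java code.

```python
from collections import deque

# Hand-written control-flow model of `fact`; ids and def/use sets are assumptions.
STMTS = {
    0: {"txt": "int fact(int a)", "def": {"a"}, "use": set()},
    1: {"txt": "int r = 1;", "def": {"r"}, "use": set()},
    2: {"txt": "while (a>0)", "def": set(), "use": {"a"}},
    3: {"txt": "r *= a--;", "def": {"r", "a"}, "use": {"r", "a"}},
    4: {"txt": "return r;", "def": set(), "use": {"r"}},
}
CF_NEXT = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: []}


def df_next(src):
    """Statements that use a variable defined by `src` before it is redefined."""
    result = set()
    for var in STMTS[src]["def"]:
        seen = set()
        queue = deque(CF_NEXT[src])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            if var in STMTS[node]["use"]:
                result.add(node)
            if var in STMTS[node]["def"]:
                continue  # the definition from `src` is overwritten here
            queue.extend(CF_NEXT[node])
    return result


for sid, stmt in STMTS.items():
    print(stmt["txt"], "->", sorted(df_next(sid)))
```

In this sketch the loop body `r *= a--;` ends up as its own data-flow successor, because its definitions of r and a reach it again through the loop.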
ATL Execution Semantics: Match phase
[Figure: each element of the control-flow model of fact is matched by a rule: rule:Method for the method and rule:Stmnt for each of the four statements.]
ATL Execution Semantics: Apply phase
[Figure, animation in four steps: the matched rules (rule:Method, rule:Stmnt) create the target elements one after another: int fact(int a), int r = 1;, while (a>0), r *= a--;, return r;.]
MapReduce
[Figure: counting example. The log records Log0 to Log8 are divided into three splits (SPLIT1 to SPLIT3), each processed by one mapper (map1 to map3) that emits <key,1> pairs for the keys +, * and X. The shuffle/sort step groups the pairs by key, and two reducers (red1, red2) sum them into <+,4>, <*,3> and <X,2>. The map phase covers the mappers; the reduce phase covers the reducers.]
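The counting example can be simulated in a few lines of plain Python. This is only an in-memory sketch of the map, shuffle/sort and reduce steps, not Hadoop code; the contents of the three splits are an assumption chosen so that the emitted pairs match those shown on the slide.

```python
from collections import defaultdict

# One list per split; each record is reduced to the key it contributes (assumption).
splits = [["+", "+", "*"], ["X", "+", "*"], ["X", "*", "+"]]


def map_task(records):
    """Map phase: emit a <key, 1> pair for every record of one split."""
    return [(record, 1) for record in records]


def shuffle(pairs):
    """Shuffle/sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_task(key, values):
    """Reduce phase: sum the values collected for one key."""
    return key, sum(values)


mapped = [pair for split in splits for pair in map_task(split)]
counts = dict(reduce_task(key, values) for key, values in shuffle(mapped).items())
print(counts)
```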
Why MapReduce for ATL?
● Well-suited for Write Once Read Many (WORM) data
● Two-phased execution model
Also MapReduce:
● Supports different types of inputs (XML, DB, Text)
● Handles machine failures, efficient communication, and performance issues
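ATL & MapReduce Alignment
Semantics Alignment
● Map (local match/apply): read model subset, local match/apply, create trace properties
● Reduce (global resolve): read traces, global resolve, save model
[Figure: the ATL match and apply phases aligned with the MapReduce map and reduce phases.]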
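The two-phase alignment can be sketched as follows. This is a simplified in-memory illustration, not the ATL-MR implementation: the source model, the precomputed data-flow successors, the trace dictionaries and the function names are all assumptions, and a single resolve step stands in for the reduce side.

```python
# Hypothetical source model: id -> (type, text, ids of data-flow successors).
SOURCE = {
    "m": ("Method", "int fact(int a)", ["s2", "s3"]),
    "s1": ("SimpleStmt", "int r = 1;", ["s3", "s4"]),
    "s2": ("SimpleStmt", "while (a>0)", []),
    "s3": ("SimpleStmt", "r *= a--;", ["s2", "s3", "s4"]),
    "s4": ("SimpleStmt", "return r;", []),
}
RULES = {"Method": "rule:Method", "SimpleStmt": "rule:Stmnt"}


def local_match_apply(split):
    """Map side: match each element of the split, create its target and a trace."""
    traces = []
    for sid in split:
        kind, txt, successors = SOURCE[sid]
        if kind not in RULES:
            continue
        traces.append({
            "rule": RULES[kind],
            "source": sid,
            "target": {"txt": txt, "dfNext": []},
            "pending": list(successors),  # references to resolve later
        })
    return traces


def global_resolve(traces):
    """Reduce side: with all traces at hand, replace source ids by target elements."""
    by_source = {t["source"]: t["target"] for t in traces}
    for t in traces:
        t["target"]["dfNext"] = [by_source[sid] for sid in t["pending"]]
    return [t["target"] for t in traces]


splits = [["m", "s1"], ["s2", "s3", "s4"]]  # one split per mapper
traces = [t for split in splits for t in local_match_apply(split)]
target_model = global_resolve(traces)
for element in target_model:
    print(element["txt"], "->", [n["txt"] for n in element["dfNext"]])
```

The point of the split is that a mapper can create target elements without seeing the rest of the model; only the cross-references need the global view provided by the traces.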
Control-Flow to Data-Flow in MapReduce: Local Match/Apply
[Figure, animation in five steps: the control-flow model of fact is split between two mappers (map1, map2). Each mapper matches its own elements (rule:Method, rule:Stmnt) and creates the corresponding target elements (int fact(int a), int r = 1;, while (a>0), r *= a--;, return r;); the last steps show dfNext references being recorded.]
Control-Flow to Data-Flow in MapReduce: Global Resolve
[Figure, animation in three steps: the reducers (red1, red2) resolve the dfNext references between the target elements created by the mappers, yielding the connected data-flow model of fact.]
Extended Tracing Model
ATL-MR in Action
Hadoop Distributed File System (HDFS)
[Figure: the transformation and the input data are loaded from HDFS. In LMA mode [1], map1 processes objectUID_1 to objectUID_4 and map2 processes objectUID_5 to objectUID_8; each mapper emits <rule, traceUID> pairs and saves traces and partial models. After shuffle/sort the pairs are grouped by rule (rule1: traceUID1, 4, 6, 7; rule2: traceUID2, 3, 5, 8) and passed to red1 and red2, which in GR mode [2] load the traces and partial models and save the models.]
[1] LMA: Local Match/Apply
[2] GR: Global Resolve
[3] ATL-MR: https://github.com/atlanmod/ATL_MR
Evaluation
Experiment I: Speed-up Curve
● 5 models extracted from automatically generated Java files:
○ similar size (~1,500 LOC)
○ sequential transformation ranges from 620s to 778s
● Run on an identical set of machines (m1.large) over Amazon Elastic MapReduce (EMR)
○ 10 times for each number of nodes
○ 280 hours of computation
● Almost linear speed-up up to 8 nodes
○ ~3 times faster on 8 nodes
Experiment II: Size/Speed-Up Correlation
● 5 models extracted from automatically generated Java files:
○ increasing size (13,500 to 105,000 LOC)
○ sequential transformation ranges from 319s to 17,998s (~5h)
● Run on a cluster of 12 instances built on top of OpenVZ
○ 8 slaves
○ 4 machines orchestrating Hadoop/HBase
● Almost-linear speed-up for large models
○ Up to 6X faster on 8 nodes
● Speed-up increases with model size
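For readers who want to relate these figures to each other, speed-up is the sequential time divided by the distributed time, and parallel efficiency is the speed-up divided by the number of nodes. The sketch below applies both definitions. It assumes, purely for illustration, that the 6X figure applies to the largest model; the slide does not say which model reaches it.

```python
def speed_up(sequential_seconds, distributed_seconds):
    """Speed-up: how many times faster the distributed run is."""
    return sequential_seconds / distributed_seconds


def efficiency(speedup, nodes):
    """Parallel efficiency: fraction of the ideal linear speed-up achieved."""
    return speedup / nodes


sequential = 17_998   # seconds, largest model (from the slide)
assumed_speedup = 6   # assumption: the "up to 6X" figure applied to this model
nodes = 8

distributed = sequential / assumed_speedup
print(round(distributed), "seconds on", nodes, "nodes")
print("efficiency:", efficiency(assumed_speedup, nodes))
```

Under that assumption, the largest transformation would take roughly 50 minutes instead of about 5 hours.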
Challenges
Challenges in Distributing Model Transformation
● Fact I: Models might be densely interconnected & unbalanced
● Rule applications might not have the same complexity
○ Unable to guarantee a balanced workload; the default MapReduce scheduler is not enough
● Fact II: Persistence backends are not suited for R/W concurrency
○ Unable to parallelize the reduce phase
NeoEMF: an Extensible Persistence Backend
● Lazy loading and unloading
○ enabling transformation of big models
● Distributed storage and access
○ permitting the parallelization of the reduce phase
● Compliant with MapReduce
● Fail-safe (no data loss)
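[Figure: NeoEMF architecture. Client code and model-based tools access models through the EMF Model Access API (Model Manager). The Persistence Manager (Persistence API, caching strategy) connects through the Backend API to the persistence backends NeoEMF/Map, NeoEMF/Graph and NeoEMF/HBase, which store data in MapDB, a GraphDB, and HBase with ZooKeeper.]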
[1] NeoEMF: http://www.neoemf.com
Future Work
1. Optimization of load balancing
○ efficient distribution of the input model over map workers
2. Parallelization of the Global Resolve phase and the transformation of Very Large Models
○ integrating ATL-MR with NeoEMF/HBase
Conclusion
● We align Rule-based Model Transformation with the MapReduce execution model
○ We introduce an execution semantics of ATL on top of MapReduce
○ We experimentally show the good scalability of our solution
● For ATL users: Keep the same syntax and embrace the Cloud
● For MapReduce users: Model Transformation as yet another high-level language for MapReduce
Check us out on GitHub
https://github.com/atlanmod/ATL_MR
Questions

More Related Content

What's hot

Technical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal WabbitTechnical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal Wabbit
jakehofman
 
Online learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and HadoopOnline learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and Hadoop
Héloïse Nonne
 
Understanding Garbage Collection
Understanding Garbage CollectionUnderstanding Garbage Collection
Understanding Garbage Collection
Doug Hawkins
 
Mahout scala and spark bindings
Mahout scala and spark bindingsMahout scala and spark bindings
Mahout scala and spark bindingsDmitriy Lyubimov
 
Terascale Learning
Terascale LearningTerascale Learning
Terascale Learningpauldix
 
Co-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and SparkCo-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and Sparksscdotopen
 
CRDTs and Redis
CRDTs and RedisCRDTs and Redis
CRDTs and Redis
Carlos Baquero
 
2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark
DB Tsai
 
Scaling out logistic regression with Spark
Scaling out logistic regression with SparkScaling out logistic regression with Spark
Scaling out logistic regression with Spark
Barak Gitsis
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
HONGJOO LEE
 
Cape2013 scilab-workshop-19Oct13
Cape2013 scilab-workshop-19Oct13Cape2013 scilab-workshop-19Oct13
Cape2013 scilab-workshop-19Oct13
Naren P.R.
 
A Brief History of Stream Processing
A Brief History of Stream ProcessingA Brief History of Stream Processing
A Brief History of Stream Processing
Aleksandr Kuboskin, CFA
 
Large scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using sparkLarge scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using spark
Mila, Université de Montréal
 
Speaker Diarization
Speaker DiarizationSpeaker Diarization
Speaker Diarization
HONGJOO LEE
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
DB Tsai
 
Recent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and BeyondRecent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and Beyond
Xiangrui Meng
 
Scilab-by-dr-gomez-june2014
Scilab-by-dr-gomez-june2014Scilab-by-dr-gomez-june2014
Scilab-by-dr-gomez-june2014
Ir. Dr. R.Badlishah Ahmad
 
Europy17_dibernardo
Europy17_dibernardoEuropy17_dibernardo
Europy17_dibernardo
GIUSEPPE DI BERNARDO
 
Experiments & Experiences with Scilab in Undergraduate Education
Experiments & Experiences with Scilab in Undergraduate Education Experiments & Experiences with Scilab in Undergraduate Education
Experiments & Experiences with Scilab in Undergraduate Education
Naren P.R.
 

What's hot (20)

Technical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal WabbitTechnical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal Wabbit
 
Online learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and HadoopOnline learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and Hadoop
 
Understanding Garbage Collection
Understanding Garbage CollectionUnderstanding Garbage Collection
Understanding Garbage Collection
 
Mahout scala and spark bindings
Mahout scala and spark bindingsMahout scala and spark bindings
Mahout scala and spark bindings
 
Terascale Learning
Terascale LearningTerascale Learning
Terascale Learning
 
Co-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and SparkCo-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and Spark
 
CRDTs and Redis
CRDTs and RedisCRDTs and Redis
CRDTs and Redis
 
2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark
 
Scaling out logistic regression with Spark
Scaling out logistic regression with SparkScaling out logistic regression with Spark
Scaling out logistic regression with Spark
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
 
Cape2013 scilab-workshop-19Oct13
Cape2013 scilab-workshop-19Oct13Cape2013 scilab-workshop-19Oct13
Cape2013 scilab-workshop-19Oct13
 
A Brief History of Stream Processing
A Brief History of Stream ProcessingA Brief History of Stream Processing
A Brief History of Stream Processing
 
Large scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using sparkLarge scale logistic regression and linear support vector machines using spark
Large scale logistic regression and linear support vector machines using spark
 
Speaker Diarization
Speaker DiarizationSpeaker Diarization
Speaker Diarization
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
 
Recent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and BeyondRecent Developments in Spark MLlib and Beyond
Recent Developments in Spark MLlib and Beyond
 
Os5
Os5Os5
Os5
 
Scilab-by-dr-gomez-june2014
Scilab-by-dr-gomez-june2014Scilab-by-dr-gomez-june2014
Scilab-by-dr-gomez-june2014
 
Europy17_dibernardo
Europy17_dibernardoEuropy17_dibernardo
Europy17_dibernardo
 
Experiments & Experiences with Scilab in Undergraduate Education
Experiments & Experiences with Scilab in Undergraduate Education Experiments & Experiences with Scilab in Undergraduate Education
Experiments & Experiences with Scilab in Undergraduate Education
 

Similar to SLE2015: Distributed ATL

Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
Intel® Software
 
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad FeinbergSpark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit
 
Task and Data Parallelism
Task and Data ParallelismTask and Data Parallelism
Task and Data Parallelism
Sasha Goldshtein
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
Vasia Kalavri
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathur
Siddharth Mathur
 
Making fitting in RooFit faster
Making fitting in RooFit fasterMaking fitting in RooFit faster
Making fitting in RooFit faster
Patrick Bos
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Data Con LA
 
FPGA_Logic.pdf
FPGA_Logic.pdfFPGA_Logic.pdf
FPGA_Logic.pdf
wafawafa52
 
Pregel
PregelPregel
Pregel
Weiru Dai
 
Integrative Parallel Programming in HPC
Integrative Parallel Programming in HPCIntegrative Parallel Programming in HPC
Integrative Parallel Programming in HPC
Victor Eijkhout
 
My Postdoctoral Research
My Postdoctoral ResearchMy Postdoctoral Research
My Postdoctoral Research
Po-Ting Wu
 
cb streams - gavin pickin
cb streams - gavin pickincb streams - gavin pickin
cb streams - gavin pickin
Ortus Solutions, Corp
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance Workflow
Daniel S. Katz
 
Exascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate AnalyticsExascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate Analytics
inside-BigData.com
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
Ram Sriharsha
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC cluster
Sudhang Shankar
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
NVIDIA Japan
 
Java 8
Java 8Java 8
Java 8
vilniusjug
 
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
asimkadav
 

Similar to SLE2015: Distributed ATL (20)

Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad FeinbergSpark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
 
Task and Data Parallelism
Task and Data ParallelismTask and Data Parallelism
Task and Data Parallelism
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathur
 
Making fitting in RooFit faster
Making fitting in RooFit fasterMaking fitting in RooFit faster
Making fitting in RooFit faster
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
FPGA_Logic.pdf
FPGA_Logic.pdfFPGA_Logic.pdf
FPGA_Logic.pdf
 
Pregel
PregelPregel
Pregel
 
Integrative Parallel Programming in HPC
Integrative Parallel Programming in HPCIntegrative Parallel Programming in HPC
Integrative Parallel Programming in HPC
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
My Postdoctoral Research
My Postdoctoral ResearchMy Postdoctoral Research
My Postdoctoral Research
 
cb streams - gavin pickin
cb streams - gavin pickincb streams - gavin pickin
cb streams - gavin pickin
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance Workflow
 
Exascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate AnalyticsExascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate Analytics
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC cluster
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Java 8
Java 8Java 8
Java 8
 
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
 

Recently uploaded

First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Hivelance Technology
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
KrzysztofKkol1
 

Recently uploaded (20)

First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
 

SLE2015: Distributed ATL

  • 1. Distributed Model-to-Model Transformation with ATL on MapReduce Jordi CABOT ICREA Universitat Oberta de Catalunya Amine BENELALLAM, Abel GOMEZ, and Massimo TISI AtlanMod team (Inria, Mines Nantes, Lina) The 8th ACM SIGPLAN International Conference on Software Language Engineering (co-located with SPLASH), Oct 26 2015, Pittsburgh, USA
  • 3. Model Transformation Transformation spec { S::Square → T1::Triangle S::Circle → T1::Octagon .... } [Figure: the transformation consumes the transformation spec and the source models (elements 1 to 6) and produces the target model (elements 1 to 6).]
  • 5. Scalability issues in MTs ● Complex transformations taking hours to run ● Very Large Models (VLMs) not fitting into the memory of a single machine
  • 6. Increasing complexity of data & systems ● Frequent increase in scope between releases ● 900+ meta-classes & thousands of properties ● Models go up to GBs
  • 7. Distributing Model Transformation [Figure: a distributed environment consumes the transformation spec and the source model (elements 1 to 6) and produces the target model (elements 1 to 6).]
  • 8. Why not use a GPL? Using a General Purpose Language (GPL) for distributed MT: 1. Requires familiarity with concurrency theory ○ not common among MDE application developers 2. Introduces a new class of errors w.r.t. sequential programming ○ e.g. linked to task synchronization and shared data access 3. Requires complex analysis for performance optimization
  • 10. Case Study: Analysis of Data-Flow in Java Programs (TTC13 [1]) [1] T. Horn. The TTC 2013 Flowgraphs Case. arXiv preprint, arXiv:1312.0341, 2013.
  • 11. Case Study: Analysis of Data-Flow in Java Programs (a) Java code: int fact (int a) { int r = 1; while (a>0) { r *= a--; } return r; } (b) Control-Flow and (c) Data-Flow [Figures: graphs over the instructions int fact(int a), int r = 1;, while (a>0), r *= a--;, return r; and the variables a and r; edge legend: def, use, cfNext/dfNext.]
  • 12. AtlanMod Transformation Language (ATL)
      module ControlFlow2DataFlow;
      create OUT : DataFlow from IN : ControlFlow;
      rule SimpleStatment {
        from s : ControlFlow!SimpleStmt (not (s.def->isEmpty() and s.use->isEmpty()))
        to t : DataFlow!SimpleStmt (txt <- s.txt, dfNext <- s.computeNextDataFlows())
      }
      [...]
    Slide annotations: module, rule, input pattern, output pattern, guard, binding, ATL helper.
  • 13. ATL Helper
      helper context ControlFlow!FlowInstr def : computeNextDataFlows() : Sequence(ControlFlow!FlowInstr) =
        self.def
          ->collect(d | self.users(d)
            ->reject(fi | if fi = self then not fi.isInALoop else false endif)
            ->select(fi | thisModule.isDefinedBy(fi, Sequence{fi}, self, Sequence{}, self.definers(d)->excluding(self))))
          ->flatten();

      helper def : isDefinedBy(start : ControlFlow!FlowInstr, input : Sequence(ControlFlow!FlowInstr), end : ControlFlow!FlowInstr, visited : Sequence(ControlFlow!FlowInstr), forbidden : Sequence(ControlFlow!FlowInstr)) : Boolean =
        if input->exists(i | i = end) then true
        else
          let newInput : Sequence(ControlFlow!FlowInstr) =
            input->collect(i | i.cfPrev)->flatten()
              ->reject(i | visited->exists(v | v = i) or forbidden->exists(f | f = i))
          in
            if newInput->isEmpty() then false
            else thisModule.isDefinedBy(start, newInput, end, visited->union(newInput)->asSet()->asSequence(), forbidden)
            endif
        endif;
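The two helpers on the ATL Helper slide amount to a backward search over cfPrev edges: a user of a variable receives a data-flow edge from a definer if the definer can be reached backwards without crossing another definer of the same variable. The Python sketch below is one possible reading of that logic, applied to the fact example from the case study. The FlowInstr class, the stored in_loop flag, and the hand-built model are assumptions made for illustration; the unused start parameter of isDefinedBy is dropped.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based equality, like model elements
class FlowInstr:
    txt: str
    defs: list = field(default_factory=list)     # variables written
    uses: list = field(default_factory=list)     # variables read
    cf_prev: list = field(default_factory=list)  # control-flow predecessors
    in_loop: bool = False                        # assumed to be precomputed

def is_defined_by(inputs, end, visited, forbidden):
    """Walk cfPrev backwards from inputs; True if end is reached
    without passing through a forbidden (re-defining) instruction."""
    if any(i is end for i in inputs):
        return True
    new_inputs = []
    for i in inputs:
        for p in i.cf_prev:
            if p not in visited and p not in forbidden and p not in new_inputs:
                new_inputs.append(p)
    if not new_inputs:
        return False
    return is_defined_by(new_inputs, end, visited + new_inputs, forbidden)

def compute_next_data_flows(instr, model):
    """Instructions that use a variable whose value was defined by instr."""
    result = []
    for d in instr.defs:
        users = [fi for fi in model if d in fi.uses]
        other_definers = [fi for fi in model if d in fi.defs and fi is not instr]
        for fi in users:
            if fi is instr and not fi.in_loop:
                continue  # a self-use only counts inside a loop
            if is_defined_by([fi], instr, [], other_definers):
                result.append(fi)
    return result

# Hand-built control-flow model of fact (an assumption of this sketch).
m = FlowInstr("int fact(int a)", defs=["a"])
s1 = FlowInstr("int r = 1;", defs=["r"])
s2 = FlowInstr("while (a>0)", uses=["a"], in_loop=True)
s3 = FlowInstr("r *= a--;", defs=["r", "a"], uses=["r", "a"], in_loop=True)
s4 = FlowInstr("return r;", uses=["r"])
for src, dst in [(m, s1), (s1, s2), (s2, s3), (s3, s2), (s2, s4)]:
    dst.cf_prev.append(src)
model = [m, s1, s2, s3, s4]

df_next = {i.txt: [t.txt for t in compute_next_data_flows(i, model)] for i in model}
```

In this sketch the parameter a flows from the method header to the loop condition and the loop body, while the initial value of r flows to the loop body and the return statement.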
  • 14. ATL Execution Semantics: Match phase [Figure: the control-flow model of fact (int fact(int a); int r = 1; while (a>0); r *= a--; return r;) with variables a and r; the method node is matched by rule:Method and each of the four statement nodes by rule:Stmnt.]
  • 15-18. ATL Execution Semantics: Apply phase [Figure, built up over four slides: the matches from the previous slide (one rule:Method, four rule:Stmnt) produce the target elements, starting with int fact(int a) and ending with all five: int fact(int a); int r = 1; while (a>0); r *= a--; return r;]
  • 20. Why MapReduce for ATL? ● Well-suited for Write Once Read Many (WORM) data ● Two-phased execution model Also MapReduce: ● Supports different types of inputs (XML, DB, Text) ● Handles machine failures, efficient communication, and performance issues
  • 23-27. Control-Flow to Data-Flow in MapReduce: Local Match/Apply [Figure, built up over five slides: the control-flow model of fact is split between two map workers, map1 and map2; each worker matches its local elements (rule:Method, rule:Stmnt) and creates the corresponding target elements int fact(int a); int r = 1; while (a>0); r *= a--; return r;. The last two slides add a dfNext reference.]
  • 28-30. Control-Flow to Data-Flow in MapReduce: Global Resolve [Figure, built up over three slides: two reduce workers, red1 and red2, process the matched elements (rule:Method, rule:Stmnt) and the target elements int fact(int a); int r = 1; while (a>0); r *= a--; return r;. The first slide shows the dfNext reference.]
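The split between Local Match/Apply and Global Resolve can be illustrated with a small simulation. In the sketch below, each map worker creates target elements and trace links for its own share of the source model and leaves references to other source elements pending; a resolve step then uses the collected traces to turn pending source ids into target elements. The data layout, the function names, the assignment of elements to workers, and the hard-coded dfNext source ids are all assumptions of this sketch, not ATL-MR's actual data structures, and the resolve step runs in a single function for brevity.

```python
# Source model: id -> (kind, text, ids of data-flow successors).
# The successor ids are hard-coded here; in the transformation they
# would come from the computeNextDataFlows helper.
SOURCE = {
    1: ("Method", "int fact(int a)", [3, 4]),
    2: ("Stmt", "int r = 1;", [4, 5]),
    3: ("Stmt", "while (a>0)", []),
    4: ("Stmt", "r *= a--;", [3, 4, 5]),
    5: ("Stmt", "return r;", []),
}

def local_match_apply(split):
    """Map side: match local elements, create targets and trace links.
    References to source elements stay unresolved (pending)."""
    partials = []
    for source_id, (kind, txt, df_next_ids) in split.items():
        rule = "Method" if kind == "Method" else "Stmnt"
        target = {"txt": txt, "dfNext": []}
        partials.append({"rule": rule, "source": source_id,
                         "target": target, "pending": list(df_next_ids)})
    return partials

def global_resolve(partials):
    """Reduce side: with all trace links available, replace pending
    source ids by the target elements they were transformed into."""
    traces = {p["source"]: p["target"] for p in partials}
    for p in partials:
        p["target"]["dfNext"] = [traces[sid] for sid in p["pending"]]
    return [p["target"] for p in partials]

# Two map workers, each seeing only part of the model (arbitrary split).
split1 = {k: SOURCE[k] for k in (1, 2)}
split2 = {k: SOURCE[k] for k in (3, 4, 5)}
partials = local_match_apply(split1) + local_match_apply(split2)
targets = global_resolve(partials)
method_successors = [t["txt"] for t in targets[0]["dfNext"]]
```

The point of the two phases is that a map worker may need to reference a target element created by another worker; deferring that lookup until all traces are known avoids any communication between map workers.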
  • 32. ATL-MR in Action [Figure: transformation data is loaded from the Hadoop Distributed File System (HDFS). LMA mode [1]: map1 processes objectUID_1 to objectUID_4 and emits <rule1,traceUID1> <rule2,traceUID2> <rule2,traceUID3> <rule1,traceUID4>; map2 processes objectUID_5 to objectUID_8 and emits <rule2,traceUID5> <rule1,traceUID6> <rule1,traceUID7> <rule2,traceUID8>; traces and partial models are saved. Shuffle/sort groups the pairs by rule: <rule2,traceUID2> <rule2,traceUID3> <rule2,traceUID5> <rule2,traceUID8> and <rule1,traceUID1> <rule1,traceUID4> <rule1,traceUID6> <rule1,traceUID7>. GR mode [2]: red1 and red2 load the traces and partial models and save the models.] [1] LMA: Local Match/Apply [2] GR: Global Resolve [3] ATL-MR: https://github.com/atlanmod/ATL_MR
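The shuffle/sort step in the ATL-MR in Action figure groups the <rule, trace> pairs emitted by the map workers by their rule key before they reach the reducers. A minimal Python sketch of that grouping, using the pairs shown on the slide, follows. The function name is hypothetical, and sorting the values inside each group is done here only to make the result deterministic; it is not a claim about Hadoop's default behaviour.

```python
from collections import defaultdict

# Key/value pairs as emitted by the two map workers on the slide.
map1_output = [("rule1", "traceUID1"), ("rule2", "traceUID2"),
               ("rule2", "traceUID3"), ("rule1", "traceUID4")]
map2_output = [("rule2", "traceUID5"), ("rule1", "traceUID6"),
               ("rule1", "traceUID7"), ("rule2", "traceUID8")]

def shuffle_sort(*map_outputs):
    """Group values from all map outputs by key, with keys in sorted order."""
    groups = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            groups[key].append(value)
    # Values are sorted only for a deterministic result in this sketch.
    return {key: sorted(values) for key, values in sorted(groups.items())}

grouped = shuffle_sort(map1_output, map2_output)
```

Each resulting group corresponds to the input of one reduce call, so every trace created by the same rule ends up at the same reducer.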
  • 34. Experiment I: Speed-up Curve ● 5 models extracted from automatically generated Java files: ○ similar size (~1,500 LOC) ○ sequential transformation time ranges from 620 s to 778 s ● Run on an identical set of machines (m1.large) on Amazon Elastic MapReduce (EMR) ○ 10 times for each number of nodes ○ 280 hours of computation ● Almost linear speed-up up to 8 nodes ○ ~3 times faster on 8 nodes
  • 35. Experiment II: Size/Speed-Up Correlation ● 5 models extracted from automatically generated Java files: ○ increasing size (13,500 to 105,000 LOC) ○ sequential transformation time ranges from 319 s to 17,998 s (~5 h) ● Run on a cluster of 12 instances built on top of OpenVC ○ 8 slaves ○ 4 machines orchestrating Hadoop/HBase ● Almost linear speed-up for large models ○ Up to 6 times faster on 8 nodes ● Speed-up increases with model size
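The figures reported for the two experiments can be related through the usual definitions of speed-up (sequential time divided by parallel time) and parallel efficiency (speed-up divided by number of nodes). The short Python sketch below applies them to the numbers on the slides. The function names are illustrative, and pairing the 6x speed-up with the largest model is an assumption; the slides only say that speed-up increases with model size.

```python
def parallel_time(sequential_seconds, speedup):
    """Parallel run time implied by a given speed-up factor."""
    return sequential_seconds / speedup

def efficiency(speedup, nodes):
    """Fraction of ideal linear speed-up achieved on the given node count."""
    return speedup / nodes

# Experiment I: ~3x on 8 nodes, sequential runs between 620 s and 778 s.
exp1_efficiency = efficiency(3, 8)
exp1_time_range = (parallel_time(620, 3), parallel_time(778, 3))

# Experiment II: up to 6x on 8 nodes; the largest model takes 17,998 s
# sequentially (assumed here to be the model that reaches 6x).
exp2_efficiency = efficiency(6, 8)
exp2_largest_time = parallel_time(17998, 6)
```

Under these assumptions the largest model would drop from about five hours to about fifty minutes on 8 nodes.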
  • 37. Challenges in Distributing Model Transformation ● Fact I: Models might be densely interconnected & unbalanced; rule applications might not have the same complexity ○ Unable to guarantee a balanced workload; the MapReduce default scheduler is not enough ● Fact II: Persistence backends are not suited for R/W concurrency ○ Unable to parallelize the reduce phase
  • 38. NeoEMF: an Extensible Persistence Backend ● Lazy loading and unloading ○ enabling transformation of big models ● Distributed storage and access ○ permitting the parallelization of the reduce phase ● Compliant with MapReduce ● Fail-safe (no data loss) [Figure: layered architecture. Client code and model-based tools access EMF's Model Manager through the Model Access API; NeoEMF's Persistence Manager (/Map, /Graph, /HBase, with a caching strategy) is reached through the Persistence API; the Backend API connects it to the persistence backends MapDB, GraphDB, and HBase with ZooKeeper.] [1] NeoEMF: http://www.neoemf.com
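Lazy loading and unloading, the first property listed for NeoEMF, can be pictured as a bounded cache in front of the backing store: elements are fetched only when first accessed, and the least recently used ones are dropped when the cache is full. The Python sketch below illustrates the idea only. It does not use or reflect NeoEMF's API; the LazyModel class, the dictionary standing in for the backend, and the capacity value are all hypothetical.

```python
from collections import OrderedDict

class LazyModel:
    """Loads elements on first access and unloads the least recently used."""

    def __init__(self, backend, capacity):
        self.backend = backend        # dict standing in for a persistent store
        self.capacity = capacity      # maximum number of elements kept in memory
        self.cache = OrderedDict()
        self.loads = 0                # number of backend reads, for inspection

    def get(self, uid):
        if uid in self.cache:
            self.cache.move_to_end(uid)      # mark as most recently used
            return self.cache[uid]
        element = self.backend[uid]          # lazy load from the backend
        self.loads += 1
        self.cache[uid] = element
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # unload least recently used
        return element

backend = {f"objectUID_{i}": {"id": i} for i in range(1, 9)}
lazy_model = LazyModel(backend, capacity=2)
for uid in ["objectUID_1", "objectUID_2", "objectUID_1", "objectUID_3"]:
    lazy_model.get(uid)
```

With a capacity of two, the model never holds more than two elements in memory, which is the property that lets a transformation walk a model larger than the available memory.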
  • 39. Future Work 1. Optimization of load balancing ○ efficient distribution of the input model over map workers 2. Parallelization of the Global Resolve phase and the transformation of Very Large Models ○ integrating ATL-MR with NeoEMF/HBase
  • 40. Conclusion ● We align Rule-based Model Transformation with the MapReduce execution model ○ We introduce an execution semantics of ATL on top of MapReduce ○ We experimentally show the good scalability of our solution ● For ATL users: Keep the same syntax and embrace the Cloud ● For MapReduce users: Model Transformation as yet another high-level language for MapReduce
  • 41. Check us out on GitHub https://github.com/atlanmod/ATL_MR