Efficient Distributed Model Transformation Partitioning

Efficient Model Partitioning for Distributed Model Transformations
SLE’16, 1 Nov. 2016, Amsterdam, Netherlands
Amine Benelallam, Massimo Tisi (AtlanMod team, Nantes, France)
Jesús Sánchez Cuadrado, Juan de Lara (Universidad Autónoma de Madrid, Spain)
Jordi Cabot (ICREA, Open University of Catalonia, Spain)
1

2
[Figure: architecture for distributed model transformation. Task nodes (workers) and data nodes perform concurrent reads/writes through a distributed (MOF-compliant) model access and persistence API, under a coordination layer. Phases shown: data distribution (Split 1, Split 2), parallel local transformation, parallel global composition.]
System assumptions
● On-demand loading, so that only the needed elements are loaded
● Concurrent read/write access to the persistence backend
● Fast look-up of already loaded elements through caching and/or indexing mechanisms
A. Benelallam et al.: Distributed model-to-model transformation with ATL on MapReduce. In Proceedings of the 2015 ACM SIGPLAN Int. Conf. on SLE

3
What makes it different from other distributed applications?
Model Partitioning for Distributed MTs

4
I need an example!
Class2Relational

AtlanMod Transformation Language (ATL)
module Class2Relational;
create OUT : Relational from IN : Class;

rule Class2Table {
  from
    c : Class!Class (not c.isAbstract)
  to
    out : Relational!Table (
      col <- Sequence {key}
               ->union(c.attr->select(e | not e.multiValued))
               ->union(c.assoc->select(e | not e.mvalued)),
      keys <- Sequence {key}
               ->union(c.assoc->select(e | not e.mvalued))
    ),
    key : Relational!Column (
      name <- c.name + 'objectId',
      type <- thisModule.getObjectIdType
    )
}
[ … ]
Parts annotated on the slide: module, rule, input pattern, output pattern, guard, ATL helper, binding

Running example
7
Model elmt. Dependencies
p1 {p1, c1, a1, c2, t1}
c1 {c1, a1, att1, t1}
a1 {a1, c2}
att1 {att1, t1}
c2 {c2, att2, att3, t1}
att2 {att2, t1}
att3 {att3, t1}
t1 {t1}
Model elements: p1 : Package, c1 : Class, c2 : Class, t1 : Type, a1 : Assoc, att1 : Attribute, att2 : Attribute, att3 : Attribute
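
The dependency sets in the table can be read as the elements reached from each model element by following the properties its rules navigate. The sketch below illustrates that reading. The `MODEL` dictionary is a hypothetical encoding inferred from the table (the slides do not give the links explicitly), and the navigation paths for `p1` (`classes.assoc` and `types`) and for `a1` (`type`) are taken from the footprints listed later in the deck.

```python
# Hypothetical encoding of the running example: element -> {property: targets}.
# The links are an assumption inferred from the dependency table.
MODEL = {
    "p1": {"classes": ["c1", "c2"], "types": ["t1"]},
    "c1": {"assoc": ["a1"], "attr": ["att1"]},
    "c2": {"assoc": [], "attr": ["att2", "att3"]},
    "a1": {"type": ["c2"]},
    "att1": {"type": ["t1"]},
    "att2": {"type": ["t1"]},
    "att3": {"type": ["t1"]},
    "t1": {},
}


def reachable(start, path):
    """Return every element visited while following `path` from `start`."""
    visited = set()
    frontier = [start]
    for prop in path:
        # Move one navigation step: collect the targets of `prop`.
        frontier = [t for e in frontier for t in MODEL[e].get(prop, [])]
        visited.update(frontier)
    return visited


def dependencies(element, paths):
    """An element depends on itself plus everything its paths reach."""
    deps = {element}
    for path in paths:
        deps |= reachable(element, path)
    return deps


print(sorted(dependencies("p1", [["classes", "assoc"], ["types"]])))
print(sorted(dependencies("a1", [["type"]])))
```

Under these assumptions the computed sets for `p1` and `a1` agree with the corresponding rows of the table.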

Partitioning: Scenario I
8
Distributed (MOF-Compliant) model access API
[Figure: the input model (p1, c1, c2, a1, att1, att2, att3, t1) is split across two task nodes, using the dependency table of the running example.]
Task node 1 loads: p1, att1, a1, c1, t1, att3, att2, c2 (8 elements)
Task node 2 loads: att1, a1, c1, t1, att3, att2, c2 (7 elements)
Total loaded elements: 8 + 7 = 15
Partitioning: Scenario II
11
[Figure: the same input model split differently across the two task nodes, using the dependency table of the running example.]
Task node 1 loads: p1, att1, a1, c1, t1, c2 (6 elements)
Task node 2 loads: c2, t1, att3, att2 (4 elements)
Total loaded elements: 6 + 4 = 10 (33% fewer than in Scenario I)
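
The totals of the two scenarios can be checked with a short sketch that counts, for each task node, the union of the dependency sets of the elements assigned to it. The dependency table is copied from the running example. The two assignments are assumptions: the slides only show what each node loads, so `scenario_1` and `scenario_2` are hypothetical splits chosen to be consistent with the 8 + 7 and 6 + 4 totals.

```python
# Dependency table from the running example.
DEPENDENCIES = {
    "p1": {"p1", "c1", "a1", "c2", "t1"},
    "c1": {"c1", "a1", "att1", "t1"},
    "a1": {"a1", "c2"},
    "att1": {"att1", "t1"},
    "c2": {"c2", "att2", "att3", "t1"},
    "att2": {"att2", "t1"},
    "att3": {"att3", "t1"},
    "t1": {"t1"},
}


def loaded_elements(split):
    """Elements a task node must load to transform the elements in `split`."""
    loaded = set()
    for element in split:
        loaded |= DEPENDENCIES[element]
    return loaded


def total_cost(splits):
    """Total number of element loads over all task nodes."""
    return sum(len(loaded_elements(s)) for s in splits)


def improvement(baseline, new):
    """Relative reduction in loaded elements, in percent."""
    return 100.0 * (baseline - new) / baseline


# Hypothetical assignments consistent with the totals on the slides.
scenario_1 = [["p1", "att1", "att2", "att3"], ["c1", "c2", "a1", "t1"]]
scenario_2 = [["p1", "c1", "a1", "att1"], ["c2", "att2", "att3", "t1"]]

for name, splits in (("I", scenario_1), ("II", scenario_2)):
    print(name, [len(loaded_elements(s)) for s in splits], total_cost(splits))
print(round(improvement(total_cost(scenario_1), total_cost(scenario_2))))
```

The second split keeps elements close to their dependencies, which is why fewer elements are loaded twice.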

# 1 Dense structure
● Even though models are structured:
○ Their density is often high and irregular
○ The structure of the computation is only known at runtime
# 2 Varying complexity
● Graph computations are often data-driven and dictated by the structure of the graph
● Irregular computation structure => irregular computation cost
12
[Figure: example models ranging from simple to complex and highly dense]

Model-data partitioning
13
● Access patterns tend to have poor data locality
● High data-access-to-computation ratio
Difficult to:
● Guarantee a balanced computational load
● Ensure good data locality

Proposal
14
● Existing graph-data partitioning approaches are not suitable, because they either:
a. Assume that the dependency graph already exists, or
b. Reason only on vertex connectivity
● We propose a two-step approach:
I- Footprint extraction
II- Model partitioning

I- Footprint extraction
15
● Extract access patterns as sequences of steps
● Resulting footprints have the form:
[sourceType][.'('?[propertyName][')'?*]?]+
● Parse the OCL expressions in guards, bindings, and helpers
● Visit the OCL AST and perform one of the following unary or binary operations:
○ ⊲ : chain the navigationCallExp
○ ⊕ : decouple the LHS and RHS into two separate footprints (e.g. conditional expression)
○ Ⓧ : if the RHS is accessible from the LHS then ⊲, otherwise ⊕ (e.g. select)
● Organize footprints by sourceType
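
The three operations can be sketched as small functions over a footprint value. This is an illustrative sketch, not the authors' implementation: the `Footprint` class, the `var` helper, and the function names are hypothetical, and two simplifications are assumed. Attribute reads such as `isAbstract` leave the footprint unchanged, and "accessible" is reduced to a type-equality check between the end of the LHS path and the source type of the RHS.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Footprint:
    source_type: str          # type the path starts from
    steps: Tuple[str, ...]    # navigated property names
    end_type: str             # type of the elements the path ends at

    def __str__(self) -> str:
        return ".".join((self.source_type,) + self.steps)


def var(type_name: str) -> Footprint:
    """Footprint of a variable expression: an empty path on its type."""
    return Footprint(type_name, (), type_name)


def chain(fp: Footprint, prop: str, target_type: str = None) -> Footprint:
    """Chain (⊲): append a navigation step.

    Assumption: attribute reads (no target_type) add no step.
    """
    if target_type is None:
        return fp
    return Footprint(fp.source_type, fp.steps + (prop,), target_type)


def decouple(lhs: Footprint, rhs: Footprint) -> List[Footprint]:
    """Decouple (⊕): keep both sides as separate footprints."""
    return [lhs, rhs]


def chain_or_decouple(lhs: Footprint, rhs: Footprint) -> List[Footprint]:
    """Ⓧ: chain when the RHS starts where the LHS ends, else decouple."""
    if rhs.source_type == lhs.end_type:
        return [Footprint(lhs.source_type, lhs.steps + rhs.steps, rhs.end_type)]
    return decouple(lhs, rhs)


# p.classes->reject(c | c.isAbstract)
#          ->collect(cc | cc.assoc->select(a | a.mValued))->flatten()
classes = chain(var("Package"), "classes", "Class")
rejected, = chain_or_decouple(classes, chain(var("Class"), "isAbstract"))
assoc = chain(var("Class"), "assoc", "Association")
selected, = chain_or_decouple(assoc, chain(var("Association"), "mValued"))
collected, = chain_or_decouple(rejected, selected)
print(collected)
```

Under these assumptions the expression reduces to a single footprint rooted at `Package`, which is then filed under that source type.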

16
I- Footprint extraction
Example expression:
p.classes
  -> reject (c | c.isAbstract)
  -> collect (cc | cc.assoc
    -> select (a | a.mValued))
  -> flatten();
Its AST:
:OpCallExp (flatten)
  :IteratorExp (collect)
    :IteratorExp (reject)
      :NavCallExp (classes)
        :VarExp (p)
      :AttrCallExp (isAbstract)
        :VarExp (c)
    :IteratorExp (select)
      :NavCallExp (assoc)
        :VarExp (cc)
      :AttrCallExp (multiValued)
        :VarExp (a)

17
I- Footprint extraction: p.classes
VarExp (p): FP = { Package(p) }
NavCallExp (classes), by ⊲: FP = { Package.classes }

18
I- Footprint extraction: reject
VarExp (c): FP = { Class(c) }
AttrCallExp (isAbstract): ⊲
IteratorExp (reject): Ⓧ

19
I- Footprint extraction: collect and select
VarExp (cc): FP = { Class(cc) }
NavCallExp (assoc), by ⊲: FP = { Class.assoc }
VarExp (a): FP = { Attribute(a) }
AttrCallExp (multiValued): ⊲
IteratorExp (select), by Ⓧ: FP = { Class.assoc }
IteratorExp (collect), by Ⓧ: FP = { Package.classes.assoc }

20
I- Footprint extraction: fully annotated AST
The complete traversal applies ⊲ four times and Ⓧ three times.

I- Resulting footprints
21
Rules and their footprints:
Package2Schema: Package.classes.assoc, Package.types
Class2Table: Class.assoc, Class.attr, DataType.allInstances
Attribute2Column: Attribute.type
MVAttribute2Column: Attribute.type, Attribute.owner
Association2Column: DataType.allInstances
MVAssociation2Column: Association.type
Types and their footprints:
Package: Package.classes.assoc, Package.types
Class: Class.assoc, Class.attr
Attribute: Attribute.type, Attribute.owner
Association: Association.type
Type: Ø
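
Regrouping the per-rule footprints by source type, as in the second table, amounts to a group-by on the first path segment. The sketch below shows one way to do it. The function name is hypothetical, and skipping `allInstances` footprints is an assumption made only because the type table on the slide does not list them.

```python
from collections import defaultdict

# Per-rule footprints, as listed on the slide.
RULE_FOOTPRINTS = {
    "Package2Schema": ["Package.classes.assoc", "Package.types"],
    "Class2Table": ["Class.assoc", "Class.attr", "DataType.allInstances"],
    "Attribute2Column": ["Attribute.type"],
    "MVAttribute2Column": ["Attribute.type", "Attribute.owner"],
    "Association2Column": ["DataType.allInstances"],
    "MVAssociation2Column": ["Association.type"],
}


def footprints_by_type(rule_footprints):
    """Group footprints by their source type, dropping duplicates."""
    by_type = defaultdict(list)
    for footprints in rule_footprints.values():
        for fp in footprints:
            source_type, first_step = fp.split(".")[:2]
            # Assumption: allInstances footprints are left out, as in the
            # type table on the slide.
            if first_step == "allInstances":
                continue
            if fp not in by_type[source_type]:
                by_type[source_type].append(fp)
    return dict(by_type)


for source_type, footprints in footprints_by_type(RULE_FOOTPRINTS).items():
    print(source_type, footprints)
```

Types with no entry in the result, such as `Type`, correspond to the Ø row of the table.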

II- Model partitioning
22
● Greedy, bi-objective algorithm
a. Maximizing data locality
b. Balancing the machine load
● On-the-fly approximation of the dependency graph in the form of <machine-id,nextStep>
● A buffer to delay the processing of elements that do not participate in the construction of the approximate dependency graph
● Instant assignment based on a score function
24
II- Model partitioning
● Parameters
● avgSize = 4
● var = 2
● buffCap = 2
Input stream (as drawn): p1, att3, a1, c1, c2, t1, att1, att2
Per-machine dependencies (initially empty): c1, c2, t1, att2, a1
Types and footprints:
Package: Package.types
Class: Class.assoc, Class.attr
Attribute: Attribute.owner
Type: Ø
Buffer: empty

25
II- Model partitioning: p1 has been taken from the stream
Remaining input stream (as drawn): att3, a1, c1, c2, t1, att1, att2
Per-machine dependencies:
c1 {<1,assoc>}
c2 {<1,assoc>}

26
II- Model partitioning: t1 has been taken from the stream
Remaining input stream (as drawn): att3, a1, c1, c2, att1, att2
Per-machine dependencies: unchanged

27
II- Model partitioning: att1 and att2 have been taken from the stream
Remaining input stream (as drawn): att3, a1, c1, c2
Per-machine dependencies:
c1 {<1,assoc>; <2,Ø>}
c2 {<1,assoc>}
t1 {<2,Ø>}

28
II- Model partitioning: c2 has been taken from the stream
Remaining input stream (as drawn): att3, a1, c1
Per-machine dependencies:
c1 {<1,assoc>; <2,Ø>}
c2 {<1,assoc>}
t1 {<2,Ø>}
att2 {<1,Ø>}
att3 {<1,Ø>}

29
II- Model partitioning: the stream is empty
Per-machine dependencies:
c1 {<1,assoc>; <2,Ø>}
c2 {<1,assoc>}
t1 {<2,Ø>; <1,Ø>; <1,Ø>; <1,Ø>}
att2 {<1,Ø>}
att3 {<1,Ø>}
a1 {<2,Ø>}

31
II- Model partitioning: result
[Figure: final assignment of the eight model elements to the two machines]
Total loaded elements: 7 + 5 = 12 (20% fewer than the 15 of Scenario I)
Per-machine dependencies:
c1 {<1,assoc>; <2,Ø>}
c2 {<1,assoc>}
t1 {<2,Ø>; <1,Ø>; <1,Ø>; <1,Ø>}
att2 {<1,Ø>}
att3 {<1,Ø>}
a1 {<2,Ø>}

Evaluation
32
[Figure: experimental setup. Technology stack: Eclipse Modeling Framework, NeoEMF/HBase, HDFS, XML Metadata Interchange, ATL-MR. Cluster roles: ATL-MR Master, ATL-MR Slaves, Hadoop task nodes, Hadoop data nodes.]
1. Distribute input
2. Monitor
3. Return output

34
Limitations
● The performance of our approach may be reduced when:
a. elements have a large number of dependencies, sometimes exceeding the average size of a split (e.g. the allInstances() operation)
b. the approximate graph contains false-positive dependencies (e.g. from select() or reject())
c. the elements are streamed in an unfavourable order

Conclusion
● We presented our solution for efficient partitioning of distributed MTs as a greedy algorithm.
○ We introduced an algorithm for footprint extraction
○ We presented our greedy algorithm for streaming model partitioning
○ We showed experimentally the scalability of our solution (up to 16% on average)
● In future work we plan to:
○ Extend our work to balanced edge partitioning and conduct a more exhaustive study of the impact of model density on the partitioning strategy
○ Improve the distribution of the intermediate transformation data (tracing information)
35

Questions
Check us out on GitHub
https://github.com/atlanmod/ATL_MR
36
