The document describes a two-step approach for partitioning models to enable efficient distributed model transformations. The first step extracts access patterns from transformation rules as sequences of model element footprints. The second step uses these footprints to partition the input model elements across worker nodes in a way that maximizes data locality while balancing computational load. The partitioning algorithm uses an online approximation of the model's dependency graph and buffering to assign elements to machines.
Efficient Distributed Model Transformation Partitioning
1. Efficient Model Partitioning for Distributed Model Transformations
SLE’16, 1 Nov. 2016, Amsterdam, Netherlands
Amine Benelallam, Massimo Tisi (AtlanMod team, Nantes, France)
Jesús Sánchez Cuadrado, Juan de Lara (Universidad Autónoma de Madrid, Spain)
Jordi Cabot (ICREA, Open University of Catalonia, Spain)
2. [Figure: distributed model transformation in three phases: data distribution, parallel local transformation, parallel global composition. A sample model with elements a to h is divided into Split1 and Split2. Labels: Coordination; Task node (worker); Data node; Concurrent Read/Write; Distributed (MOF-Compliant) model access and persistence API.]
System assumptions
● On-demand loading, to ensure that only needed elements are loaded
● Concurrent read/write access to the persistence backend
● Fast look-up of already loaded elements, using caching and/or indexing mechanisms
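The first and third assumptions can be illustrated with a minimal Python sketch. It is not the persistence API used by the authors: `LazyModelStore`, `fetch` and the toy `BACKEND` dictionary are hypothetical names, and concurrent access is deliberately left out.

```python
from typing import Callable, Dict


class LazyModelStore:
    """Loads a model element on first access and serves it from a cache afterwards."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch              # hypothetical backend read (assumption)
        self._cache: Dict[str, dict] = {}
        self.backend_reads = 0           # counts how often the backend is hit

    def get(self, element_id: str) -> dict:
        if element_id not in self._cache:
            # On-demand loading: only elements that are requested get loaded.
            self.backend_reads += 1
            self._cache[element_id] = self._fetch(element_id)
        return self._cache[element_id]


# Toy backend standing in for the distributed persistence layer (assumption).
BACKEND = {"a": {"refs": ["b", "c", "d"]}, "b": {"refs": []}}

store = LazyModelStore(BACKEND.__getitem__)
store.get("a")
store.get("a")  # second access is answered by the cache, not the backend
```

Element `b` is never requested here, so it is never loaded.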
A. Benelallam et al.: Distributed model-to-model transformation with ATL on MapReduce. In Proceedings of the 2015 ACM SIGPLAN Int. Conf. on SLE
3. Model Partitioning for Distributed MTs
What makes it different from partitioning for other distributed applications?
5. Atlanmod Transformation Language (ATL)
module Class2Relational;
create OUT : Relational from IN : Class;

rule Class2Table {
  from
    c : Class!Class (not c.isAbstract)
  to
    out : Relational!Table (
      col <- Sequence {key}
               ->union(c.attr->select(e | not e.multiValued))
               ->union(c.assoc->select(e | not e.mvalued)),
      keys <- Sequence {key}
               ->union(c.assoc->select(e | not e.mvalued))
    ),
    key : Relational!Column (
      name <- c.name + 'objectId',
      type <- thisModule.getObjectIdType
    )
}
[ … ]
[Callouts on the listing: module, rule, input pattern, guard, output pattern, binding, ATL helper]
12. #1 Dense structure
● Even though models are structured:
● Their density is often high and irregular
● The structure of the computation is only known at runtime
#2 Varying complexity
● Graph computation is often data-driven and dictated by the structure of the graph
● Irregular computation structure => irregular computation cost
[Figure labels: Simple, Complex, Highly dense]
13. Model-data partitioning
● Access patterns tend to have poor data locality
● High ratio of data access to computation
Difficult to:
● Guarantee a balanced computational load
● Ensure good data locality
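The two goals can be made measurable. The sketch below is only an illustration: the function names, the dependency edges in `DEPS` and both assignments are hypothetical, and the slides do not say that the authors use these exact metrics.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def load_imbalance(assignment: Dict[str, int], machines: int) -> float:
    """Largest partition size divided by the ideal size (1.0 means perfectly balanced)."""
    sizes = Counter(assignment.values())
    ideal = len(assignment) / machines
    return max(sizes.get(m, 0) for m in range(machines)) / ideal


def locality(assignment: Dict[str, int], deps: Iterable[Tuple[str, str]]) -> float:
    """Fraction of dependencies whose two ends are stored on the same machine."""
    deps = list(deps)
    local = sum(1 for src, dst in deps if assignment[src] == assignment[dst])
    return local / len(deps)


# Hypothetical dependencies between eight elements a..h (assumption).
DEPS = [("a", "b"), ("a", "c"), ("a", "d"), ("e", "f"),
        ("e", "g"), ("e", "h"), ("a", "g"), ("e", "d")]

# Two ways of placing the elements on two machines.
by_structure = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1, "g": 1, "h": 1}
round_robin = {name: i % 2 for i, name in enumerate("abcdefgh")}
```

Both assignments are perfectly balanced, but they differ in locality, which is why load balance alone is not a sufficient objective.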
14. Proposal
● Existing graph-data partitioning approaches are not suitable; they either:
a. Assume that the dependency graph already exists
b. Reason only on vertex connectivity
● We propose a two-step approach: footprint extraction (I), then model partitioning (II)
15. I- Footprint extraction
● Extract access patterns as sequences of steps
● Resulting footprints have the form:
[sourceType][.'('?[propertyName][')'?*]?]+
● Parse OCL expressions in guards, bindings, and helpers
● Visit the OCL AST and apply one of the following unary or binary operations:
○ ⊲ : chain the navigationCallExp
○ ⊕ : decouple the LHS and RHS into two separate footprints (e.g. conditional expression)
○ Ⓧ : if the RHS is accessible from the LHS then ⊲, otherwise ⊕ (e.g. select)
● Organize footprints by sourceType
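The three operations can be sketched on a toy expression tree. This is a simplified illustration, not the authors' extractor: the node classes (`Var`, `It`, `Nav`, `If`, `Select`) are hypothetical stand-ins for OCL AST nodes, boolean operators and operation calls are omitted, and only navigation steps are tracked.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

ITERATOR = "<it>"  # placeholder source for the iterator variable of a select


@dataclass
class Var:          # typed variable, e.g. the input element of a rule
    type_name: str


@dataclass
class It:           # iterator variable inside a select body
    pass


@dataclass
class Nav:          # source.prop
    source: "Expr"
    prop: str


@dataclass
class If:           # if cond then ... else ...
    cond: "Expr"
    then: "Expr"
    orelse: "Expr"


@dataclass
class Select:       # source->select(e | body)
    source: "Expr"
    body: "Expr"


Expr = Union[Var, It, Nav, If, Select]


def extract(expr: Expr) -> List[List[str]]:
    """Return footprints as step lists, e.g. ['Package', 'classes', 'assoc']."""
    if isinstance(expr, Var):
        return [[expr.type_name]]
    if isinstance(expr, It):
        return [[ITERATOR]]
    if isinstance(expr, Nav):
        # Chain: append one navigation step to every footprint of the source.
        return [fp + [expr.prop] for fp in extract(expr.source)]
    if isinstance(expr, If):
        # Decouple: each sub-expression contributes its own footprints.
        return extract(expr.cond) + extract(expr.then) + extract(expr.orelse)
    if isinstance(expr, Select):
        sources = extract(expr.source)
        body = extract(expr.body)
        # Body paths starting at the iterator are reachable from the source: chain.
        chained = [s + b[1:] for b in body if b[0] == ITERATOR for s in sources]
        # Body paths starting elsewhere stay separate: decouple.
        separate = [b for b in body if b[0] != ITERATOR]
        return (chained or sources) + separate
    raise TypeError(f"unsupported node: {expr!r}")


def by_source_type(footprints: List[List[str]]) -> Dict[str, List[str]]:
    """Group dotted footprints by their source type, dropping duplicates."""
    table: Dict[str, List[str]] = {}
    for fp in footprints:
        text = ".".join(fp)
        if text not in table.setdefault(fp[0], []):
            table[fp[0]].append(text)
    return table


# Hypothetical expressions (assumption), written without their boolean parts.
e1 = Select(Nav(Var("Package"), "classes"), Nav(It(), "assoc"))
e2 = If(Nav(Var("Class"), "attr"), Nav(Var("Class"), "assoc"),
        Nav(Var("Package"), "types"))
table = by_source_type(extract(e1) + extract(e2))
```

Here `e1` yields the single footprint `Package.classes.assoc`, because the select body navigates from the iterator, while the conditional `e2` yields three independent footprints.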
22. II- Model partitioning
[Figure: example model with elements p1 : Package; c1, c2 : Class; t1 : Type; att1, att2, att3 : Attribute; a1 : Assoc]
● Greedy, bi-objective algorithm
a. Maximizing data locality
b. Balancing the machine load
● Online approximation of the dependency graph, in the form of <machine-id,nextStep> entries
● A buffer to delay the processing of elements that do not participate in the construction of the approximate dependency graph
● Instant assignment based on a score function
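A greedy streaming partitioner along these lines can be sketched as follows. The slides do not give the score function, so this sketch swaps in a score in the style of linear deterministic greedy streaming graph partitioning (locality hit damped by how full the machine is). It also tracks only direct dependencies rather than <machine-id,nextStep> entries. The function `partition`, its parameters, and the links in `DEPENDS_ON` are all assumptions.

```python
from typing import Dict, Iterable, List, Set


def partition(stream: Iterable[str], depends_on: Dict[str, List[str]],
              machines: int, capacity: int, buffer_cap: int) -> Dict[str, int]:
    load = [0] * machines
    assignment: Dict[str, int] = {}
    wanted_by: Dict[str, Set[int]] = {}  # element -> machines whose elements access it
    buffer: List[str] = []

    def assign(elem: str) -> None:
        hits = wanted_by.get(elem, set())
        # Assumed score: locality hit scaled by remaining capacity;
        # ties go to the least loaded machine.
        best = max(range(machines),
                   key=lambda m: ((m in hits) * (1 - load[m] / capacity), -load[m]))
        assignment[elem] = best
        load[best] += 1
        for dep in depends_on.get(elem, ()):
            wanted_by.setdefault(dep, set()).add(best)

    for elem in stream:
        if depends_on.get(elem) or elem in wanted_by:
            assign(elem)                  # enough information: assign instantly
        else:
            buffer.append(elem)           # nothing known yet: delay the decision
            if len(buffer) > buffer_cap:
                assign(buffer.pop(0))     # buffer full: release the oldest element
    for elem in buffer:
        assign(elem)                      # flush what is still waiting
    return assignment


# Hypothetical links for the example model (the slides show only element names).
DEPENDS_ON = {
    "p1": ["c1", "c2", "t1"], "c1": ["att1", "a1"], "c2": ["att2", "att3"],
    "att1": ["t1"], "att2": ["t1"], "att3": ["t1"], "a1": ["t1"], "t1": [],
}
STREAM = ["p1", "att3", "a1", "c1", "c2", "t1", "att1", "att2"]
result = partition(STREAM, DEPENDS_ON, machines=2, capacity=4, buffer_cap=2)

# An element with no dependencies that arrives early waits in the buffer
# until the element that accesses it has been placed.
delayed = partition(["d", "a", "c"], {"a": ["b"], "c": ["d"]},
                    machines=2, capacity=4, buffer_cap=2)
```

With these assumed links, `c1` and `c2` end up on the same machine as `p1`, and both machines hold four elements.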
24. II- Model partitioning
● Parameters
● avgSize = 4
● var = 2
● buffCap = 2
Input stream: p1, att3, a1, c1, c2, t1, att1, att2
Types        Footprints
Package      Package.classes.assoc, Package.types
Class        Class.assoc, Class.attr, DataType.allInstances
Attribute    Attribute.type, Attribute.owner
Association  Association.type, DataType.allInstances
Type         Ø
Elmt.  Per-machine dependencies
c1     -
c2     -
t1     -
att2   -
a1     -
Buffer: -
25. II- Model partitioning (parameters and footprints as on slide 24)
Taken from the stream: p1
Input stream: att3, a1, c1, c2, t1, att1, att2
Elmt.  Per-machine dependencies
c1     {<1,assoc>}
c2     {<1,assoc>}
t1     -
att2   -
a1     -
Buffer: -
26. II- Model partitioning (parameters and footprints as on slide 24)
Taken from the stream: p1, t1
Input stream: att3, a1, c1, c2, att1, att2
Elmt.  Per-machine dependencies
c1     {<1,assoc>}
c2     {<1,assoc>}
t1     -
att2   -
a1     -
Buffer: -
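The step from slide 24 to slide 25 can be reproduced with a small sketch of the dependency-approximation update. It assumes that p1 was assigned to machine 1 (as the <1,assoc> entries suggest) and that p1 references c1, c2 and t1; the links in `MODEL` and the function `register` are hypothetical, since the slides show only element names and types.

```python
from typing import Dict, List, Set, Tuple

# Package footprints from the slides, as navigation steps after the source type.
FOOTPRINTS: Dict[str, List[List[str]]] = {
    "Package": [["classes", "assoc"], ["types"]],
}

# Hypothetical links for the example model (assumption).
MODEL = {
    "p1": {"type": "Package", "classes": ["c1", "c2"], "types": ["t1"]},
}

Dependencies = Dict[str, Set[Tuple[int, str]]]


def register(elem: str, machine: int, deps: Dependencies) -> None:
    """Record, for each element reached by a footprint's first step,
    which machine will access it and which step comes next."""
    info = MODEL[elem]
    for steps in FOOTPRINTS.get(info["type"], []):
        first, rest = steps[0], steps[1:]
        for target in info.get(first, []):
            entry = deps.setdefault(target, set())
            if rest:
                # A further step remains: remember <machine-id, nextStep>.
                entry.add((machine, rest[0]))
            # Assumption: a target reached by the last step gets an empty entry.


deps: Dependencies = {}
register("p1", 1, deps)
```

After the call, `c1` and `c2` each carry the entry `(1, "assoc")` and the entry for `t1` is empty, matching the per-machine dependencies shown on slide 25.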
34. Limitations
● The performance of our approach may degrade when:
a. elements have a large number of dependencies, sometimes exceeding the average split size (e.g. the allInstances() operation)
b. the approximate graph contains false-positive dependencies (e.g. select() or reject())
c. elements are streamed in an unfavourable order
35. Conclusion
● We presented an efficient model partitioning solution for distributed MTs, based on a greedy algorithm.
○ We introduced an algorithm for footprint extraction
○ We presented our greedy algorithm for stream-based model partitioning
○ We experimentally showed the scalability of our solution (up to 16% on average)
● In future work we plan to:
○ Extend our work to balanced edge partitioning and conduct a more exhaustive study of the impact of model density on the partitioning strategy
○ Improve the distribution of the intermediate transformation data (tracing information)