The progressive industrial adoption of Model-Driven Engineering (MDE) is fostering the development of large tool ecosystems such as the Eclipse Modeling project. These tools are built on top of a set of base technologies that were primarily designed for small-scale scenarios, where models are developed by hand. In particular, efficient runtime manipulation of large-scale models is an under-studied problem, and this is hampering the application of MDE in several industrial scenarios.
In this paper we introduce and evaluate a map-based persistence model for MDE tools. We use this model to build a transparent persistence layer for modeling tools on top of a map-based database engine. The layer can be plugged into the Eclipse Modeling Framework, lowering execution times and memory consumption below those of existing approaches. Empirical tests are performed on a typical industrial scenario, model-driven reverse engineering, where very large software models originate from the analysis of massive code bases. The layer is freely distributed and can be used immediately to enhance the scalability of any existing Eclipse Modeling tool.
http://www.emn.fr/z-info/atlanmod/index.php/NeoEMF/Map
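A minimal sketch of the idea, assuming a hypothetical MapModelStore class (an illustration of map-based persistence in general, not the layer's actual API): each feature value of a model element is stored in a key/value map indexed by the element identifier and the feature name, so individual features can be read or written without loading the whole model into memory.

import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a map-based model store: feature values are kept in a
// single map keyed by (element id, feature name). A real implementation would
// back this map with an on-disk key/value engine instead of a HashMap.
public class MapModelStore {

    private final Map<Map.Entry<String, String>, Object> store = new HashMap<>();

    // Write the value of one feature of one element.
    public void set(String elementId, String featureName, Object value) {
        store.put(new SimpleImmutableEntry<>(elementId, featureName), value);
    }

    // Read the value of one feature of one element, without touching the rest
    // of the model: this is what enables lazy, element-level access.
    public Object get(String elementId, String featureName) {
        return store.get(new SimpleImmutableEntry<>(elementId, featureName));
    }

    public static void main(String[] args) {
        MapModelStore store = new MapModelStore();
        store.set("pkg.Class42", "name", "Invoice");
        store.set("pkg.Class42", "abstract", Boolean.FALSE);
        System.out.println(store.get("pkg.Class42", "name")); // prints "Invoice"
    }
}

In the real layer such a map would be backed by the map-based database engine mentioned above, so that only the entries actually touched need to reside in memory.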
UMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases (Gwendal Daniel)
UMLtoGraphDB presentation at ER2016. Related article available online at https://hal.archives-ouvertes.fr/hal-01344015/document
Related post on modeling-languages.com: http://modeling-languages.com/uml-to-nosql-graph-database/
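As a rough illustration of the kind of mapping involved (hypothetical names, not the tool's actual output): a UML class typically becomes a node label, its attributes become node properties, and its associations become edges between nodes. A minimal Java sketch using plain collections:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: representing instances of a UML conceptual schema as a
// property graph (nodes with labels and properties, edges for associations).
public class TinyPropertyGraph {

    static class Node {
        final String label;                                      // the UML class name
        final Map<String, Object> properties = new HashMap<>();  // the UML attributes
        Node(String label) { this.label = label; }
    }

    static class Edge {
        final Node source, target;
        final String type;                                       // the UML association name
        Edge(Node source, String type, Node target) {
            this.source = source; this.type = type; this.target = target;
        }
    }

    public static void main(String[] args) {
        List<Edge> edges = new ArrayList<>();

        Node client = new Node("Client");
        client.properties.put("name", "ACME");

        Node order = new Node("Order");
        order.properties.put("total", 99.90);

        // A UML association Client -> places -> Order becomes a typed edge.
        edges.add(new Edge(client, "places", order));

        System.out.println(edges.get(0).source.label + " -places-> "
                + edges.get(0).target.label);
    }
}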
PrefetchML: a Framework for Prefetching and Caching models (Jordi Cabot)
Speed up access to, and queries on, large models thanks to our language (and execution environment) for defining access plans for specific scenarios.
This talk received the Best Paper Award at the MODELS 2016 conference.
Read more at http://modeling-languages.com/prefetchml-dsl-prefetching-caching-emf-models/
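Purely to illustrate the underlying idea (this is not PrefetchML's actual language or API; the class and rule names below are hypothetical): an access plan can be thought of as a set of rules that, when a model element is accessed, eagerly load related elements into a cache so that subsequent queries hit memory instead of the backing store.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of rule-driven prefetching: when an element is accessed,
// registered rules compute the ids of related elements to load ahead of time.
public class PrefetchingCache {

    private final Map<String, Object> cache = new HashMap<>();
    private final Function<String, Object> loader;              // loads from the store
    private final List<Function<String, List<String>>> rules;   // access plan rules

    PrefetchingCache(Function<String, Object> loader,
                     List<Function<String, List<String>>> rules) {
        this.loader = loader;
        this.rules = rules;
    }

    public Object get(String id) {
        Object value = cache.computeIfAbsent(id, loader);
        // Apply each prefetching rule: warm the cache with related elements.
        for (Function<String, List<String>> rule : rules) {
            for (String related : rule.apply(id)) {
                cache.computeIfAbsent(related, loader);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Example rule: accessing a package also prefetches two (hypothetical) classes.
        PrefetchingCache cache = new PrefetchingCache(
                id -> "contents of " + id,
                List.of(id -> id.startsWith("pkg:")
                        ? List.of(id + "/ClassA", id + "/ClassB")
                        : List.of()));
        System.out.println(cache.get("pkg:invoicing")); // also warms ClassA and ClassB
    }
}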
Close encounters in MDD: when Models meet Code (lbergmans)
Model-Driven Development (MDD) promises a number of advantages, including the ability to work at higher abstraction levels, static reasoning about models, and generation of platform-specific code. To achieve this, a transformation-based approach is generally adopted, which generates code from models. In this presentation we discuss, in addition to the potential advantages, a number of possible misunderstandings and risks of MDD.
In particular, we address the risks of transformation-based software development, such as:
• It is rarely possible to generate the full functionality of a (sub-)system from models; as a result, it is necessary either to do additional 'manual coding' (which is a challenge to integrate with the generated code) or to annotate the model with smaller or larger fragments of executable code, which has several restrictions and practical consequences: for instance, it mingles abstraction levels and reduces the maintainability of code and models.
• MDD is particularly effective when various different models can be used, each optimized for a specific domain. However, when using transformation techniques, the combination of multiple models in an integrated application is far from trivial.
In this talk we propose, as a low-threshold approach, 'bottom-up' model-driven development. This means that the focus on domain-specific abstractions remains, as does the separation of platform-specific and platform-independent software. This approach, which is related to Domain-Driven Design and domain-specific languages (DSLs), aims to exploit the advantages of modeling in terms of abstractions while reducing the gap between models and code. This can be achieved by specifying the models in code, while separating platform-specific code from the model code. An important issue is the capability to combine several different models without getting into technical difficulties: we discuss existing approaches as well as a novel one, entitled Co-op, which aims to address this problem.
Finally, we discuss how the presented approach fits with the ‘scalable design’ approach for developing software that is scalable with respect to evolving requirements.
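As a small, hypothetical Java sketch of what 'specifying the models in code' while separating platform-specific code could look like (the names are illustrative, and this is not the Co-op approach itself): the domain model is expressed as plain, platform-independent classes, while persistence or transport concerns sit behind an interface implemented elsewhere.

// Hypothetical illustration of bottom-up MDD: the domain model is written as
// plain, platform-independent Java, and platform-specific concerns (storage,
// transport, UI) are kept behind an interface that other modules implement.
public class ModelsInCodeSketch {

    // Platform-independent domain model ("the model, in code").
    record Order(String id, double total) {
        boolean isLarge() { return total > 1000.0; }
    }

    // Platform-specific concern, separated behind an interface.
    interface OrderRepository {
        void save(Order order);
    }

    // One possible platform binding; a JPA- or REST-backed one would be another.
    static class InMemoryOrderRepository implements OrderRepository {
        private final java.util.List<Order> orders = new java.util.ArrayList<>();
        public void save(Order order) { orders.add(order); }
    }

    public static void main(String[] args) {
        OrderRepository repository = new InMemoryOrderRepository();
        Order order = new Order("o-1", 1250.0);
        repository.save(order);
        System.out.println(order.isLarge()); // domain logic, independent of the platform
    }
}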
Accelerating Deep Learning Inference on Mobile Systems (Darian Frajberg)
International Conference on AI and Mobile Services
Services Conference Federation (SCF)
San Diego, CA, USA
June 2019
Artificial Intelligence on the edge is of great importance for the enhancement of smart devices that rely on operations with real-time constraints. Despite the rapid growth of computational power in embedded systems such as smartphones, wearable devices, drones and FPGAs, the deployment of highly complex and considerably large models remains challenging. Optimized execution requires managing memory allocation efficiently, to avoid overloading, and exploiting the available hardware resources for acceleration, which is not trivial given the non-standardized access to such resources. We present PolimiDL, an open-source framework for the acceleration of Deep Learning inference on mobile and embedded systems with limited resources and heterogeneous architectures. Experimental results show competitive performance with respect to TensorFlow Lite for the execution of small models.
Directive-based approach to Heterogeneous Computing (Ruymán Reyes)
The document discusses a directive-based approach to heterogeneous computing. It describes how applications used in HPC centers commonly rely on the MPI and OpenMP programming models. It also discusses how complexity arises from mixing different Fortran dialects, and the need for faster ways to migrate code to new architectures, such as accelerators, without rewriting it. The document proposes using directives to enhance legacy code for heterogeneous systems in a portable way.
The document discusses Mirage, an operating system constructed using OCaml and designed to run on the cloud. It aims for end-to-end static type safety using OCaml and domain-specific languages. It describes Mirage's approach of using a simple single-threaded core, with the hypervisor dividing up the cores. It also summarizes Mirage's use of LWT for concurrency without magic, typed memory allocators, I/O implementation in pure OCaml, and its goal of being portable across environments such as POSIX, JavaScript, and Google AppEngine.
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho... (Jason Hearne-McGuiness)
The document summarizes a presentation on using data-parallelism in C++ to parallelize the SPEC2006 benchmark suite. It discusses the design of a data-flow library called Parallel Pixie Dust (PPD) and how it was used to parallelize some STL algorithms. Analysis of two SPEC2006 benchmarks found that only a small number of STL algorithm usages could be easily parallelized due to the functional decomposition of the code into small blocks and potential side effects. Larger functions and avoidance of side effects may enable more opportunities for parallelization.
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA” (Databricks)
NEC has developed a new vector processor called SX-Aurora TSUBASA to accelerate machine learning and data analytics workloads. They developed a middleware framework called Frovedis that provides Spark-like functionality and is optimized for SX-Aurora TSUBASA. Frovedis achieved 10-100x speedups on machine learning algorithms and SQL-like queries compared to Spark on CPUs. NEC has also opened a lab called VEDAC for external users to access SX-Aurora TSUBASA systems running Frovedis.
The document discusses Model Driven Architecture (MDA), which uses models to automate application construction. It defines a model as a simplified representation of a system or phenomenon. Code is considered a type of model that represents logical structure and hides implementation details through different levels of abstraction. Models can represent applications, services, and be executed in the same way code can through tools that support capabilities like model checking, transformation, testing, and debugging. Successful adoption of MDA depends on factors like development process, tool integration, and model lifecycle management.
NEO4EMF, a Neo4j-based model repository and persistence framework allowing on-demand loading, storage, and unloading of large-scale EMF models.
Check us at: https://neo4emf.com
Fork us at: https://github.com/neo4emf/Neo4EMF
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi (Spark Summit)
Mobility Insights at Swisscom analyzes collective mobility patterns in Switzerland using anonymized mobile network data. The presentation discusses:
1) Swisscom's big data architecture for processing mobility data using Spark, including techniques for preserving privacy and ensuring data flows comply with regulations.
2) Methods for classifying transportation modes using bursty cell patterns and machine learning to determine the proportion of trips associated with trains.
3) Streaming analytics applications including monitoring road conditions and selecting anonymous users moving along paths of interest.
4) Challenges of ensuring high quality ground truth data for modeling and validation purposes.
Mobility insights at Swisscom - Understanding collective mobility in Switzerland (François Garillot)
Swisscom is the leading mobile-service provider in Switzerland, with a market share high enough to enable us to model and understand the collective mobility in every area of the country. To accomplish that, we built an urban planning tool that helps cities better manage their infrastructure based on data-based insights, produced with Apache Spark, YARN, Kafka and a good dose of machine learning. In this talk, we will explain how building such a tool involves mining a massive amount of raw data (1.5E9 records/day) to extract fine-grained mobility features from raw network traces. These features are obtained using different machine learning algorithms. For example, we built an algorithm that segments a trajectory into mobile and static periods and trained classifiers that enable us to distinguish between different means of transport. As we sketch the different algorithmic components, we will present our approach to continuously run and test them, which involves complex pipelines managed with Oozie and fuelled with ground truth data. Finally, we will delve into the streaming part of our analytics and see how network events allow Swisscom to understand the characteristics of the flow of people on roads and paths of interest. This requires making a link between network coverage information and geographical positioning in the space of milliseconds and using Spark streaming with libraries that were originally designed for batch processing. We will conclude on the advantages and pitfalls of Spark involved in running this kind of pipeline on a multi-tenant cluster. Audiences should come back from this talk with an overall picture of the use of Apache Spark and related components of its ecosystem in the field of trajectory mining.
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin... (Intel® Software)
Integrated into Intel® Advisor, Cache-aware Roofline Modeling (CARM) provides insight into how an application behaves by helping to determine a) how optimally it works on a given hardware, b) the main factors that limit performance, c) if the workload is memory or compute-bound, and d) the right strategy to improve application performance.
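For reference, the roofline bound that underlies this kind of analysis is usually written as $P(I) = \min(P_{\mathrm{peak}},\; I \cdot B)$, where $P_{\mathrm{peak}}$ is the peak compute throughput, $B$ is the bandwidth of the memory level under consideration, and $I$ is the arithmetic intensity of the kernel (operations per byte moved). When $I \cdot B < P_{\mathrm{peak}}$ the kernel is memory-bound at that level; otherwise it is compute-bound.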
Deploying Cloud Native Red Team Infrastructure with Kubernetes, Istio and Envoy (Jeffrey Holden)
This document discusses deploying cloud native red team infrastructure using Kubernetes, Istio and Envoy. It introduces Larry Suto and Jeff Holden and their backgrounds. It then covers the goals of being automated, portable and scriptable. Key points include using Kubernetes for its infrastructure-as-code capabilities. It discusses concepts like Docker, Kubernetes, Kops, External DNS and SSL Cert Manager, along with recipes for containerizing tools like Cobalt Strike and Merlin and configuring deployments.
We build AI and HPC solutions. Expertise: highly optimized AI Engines and HPC Apps.
• HPC: accelerating time to results and adapting complex algorithms to GPU, FPGA, many-CPU architectures.
Leverage byteLAKE expertise in complex algorithms adaptation and optimization for NVIDIA GPUs, Xilinx Alveo FPGAs, Intel, AMD and ARM solutions. From single nodes to clusters.
More: www.byteLAKE.com/en/Alveo
The document provides an overview and introduction to crash dump analysis on Nexenta systems. It discusses core dumps, crash dumps, the panic process, and basic crash dump analysis using mdb. Key topics include process and thread terminology, interrupts and traps, hangs vs crashes vs panics, forensic data sources like console logs and crash dumps, and C language basics relevant to crash analysis like data types and functions. Examples of panic strings, stack traces, and thread lists from crash dumps are also provided, as well as guidance on determining if an issue is hardware, firmware, or software-related.
ParaForming - Patterns and Refactoring for Parallel Programming (khstandrews)
Despite Moore's "law", uniprocessor clock speeds have now stalled. Rather than single processors running at ever higher clock speeds, it is common to find dual-, quad- or even hexa-core processors, even in consumer laptops and desktops. Future hardware will not be slightly parallel, however, as in today's multicore systems, but will be massively parallel, with manycore and perhaps even megacore systems becoming mainstream.
This means that programmers need to start thinking parallel. To achieve this they must move away from traditional programming models where parallelism is a bolted-on afterthought. Rather, programmers must use languages where parallelism is deeply embedded into the programming model from the outset.
By providing a high level model of computation, without explicit ordering of computations, declarative languages in general, and functional languages in particular, offer many advantages for parallel programming. One of the most fundamental advantages of the functional paradigm is purity. In a purely functional language, as exemplified by Haskell, there are simply no side effects: it is therefore impossible for parallel computations to conflict with each other in ways that are not well understood.
ParaForming aims to radically improve the process of parallelising purely functional programs through a comprehensive set of high-level parallel refactoring patterns for Parallel Haskell, supported by advanced refactoring tools. By matching parallel design patterns with appropriate algorithmic skeletons using advanced software refactoring techniques and novel cost information, we will bridge the gap between fully automatic and fully explicit approaches to parallelisation, helping programmers "think parallel" in a systematic, guided way. This talk introduces the ParaForming approach, gives some examples and shows how effective parallel programs can be developed using advanced refactoring technology.
The document summarizes a technical workshop on wireless sensor networks. It provides an overview of the hardware and software used, including the Tmote Sky and EE sensor nodes, the iNode embedded PCs, and the TinyOS software platform. It also describes the Job scheduling system and iPlatform that are used to define and run experiments on the testbed.
A DSL-Based Approach for Cloud-Based Systems Elasticity Testing (Michel Albonico)
This document presents a DSL-based approach for elasticity testing of cloud systems. The approach aims to reduce tester effort by providing a DSL for specifying elasticity setup, driving cloud systems through elastic states, and executing elasticity-based tests. Preliminary results show that using the DSL requires less writing effort compared to native cloud provider CLIs, and allows adapting tests to different cloud providers with only small code increments. Future work includes adding automatic resource discovery and more validation of the approach.
So you've been deploying Java in the cloud and are wondering how to handle the new world of containers, microservices, and memory constraints. Cold starts got you down? Come to this session to learn about how OpenJ9, and the JVM in general, can help you on your Cloud Native journey.
This presentation is from the PBS User Group 2013, as presented by Greg Clifford.
"Cray scalable solutions are optimized for complex simulations, where parallel and massively concurrent applications need access to data, fast. This presentation will cover recent updates from Cray, showing how Cray continues to deliver the largest production supercomputers leveraging Altair PBS Works and HyperWorks software solutions."
Learn more: http://www.altairatc.com/page.aspx?region=na&name=agenda
Watch the presentation video: http://insidehpc.com/2013/10/04/cray-hpc-environments-leading-edge-simulation/
Container Attached Storage (CAS) with OpenEBS - SDC 2018 (OpenEBS)
The document discusses container attached storage (CAS), which aims to provide storage for containers in a container-native way. CAS is designed to run in containers for containers in user space, using the Kubernetes substrate. It addresses challenges like small working sets, ephemeral storage, and cloud lock-in by keeping data local to workloads and allowing per-workload optimization and migration. The document outlines the CAS design and implementation, including using an input/output container to handle storage IO in user space and leveraging technologies like SPDK, virtio, and Kubernetes custom resources.
This document introduces a course on concurrency concepts, models, and programming. It discusses how concurrency arises from multiple threads of control and how it is error-prone. It emphasizes using abstract models like finite state machines to design, analyze, and verify concurrent systems before implementation. Java is used for programming examples to demonstrate concepts and apply modeling techniques. The course aims to provide a sound understanding of concurrency principles, modeling, and programming practice.
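As a minimal illustration of that modeling style (a hypothetical example, not taken from the course material): a finite state machine can be written as an explicit transition table in Java and exercised or checked before any threading is introduced.

import java.util.Map;

// Hypothetical sketch: a finite state machine for a simple lock, written as an
// explicit transition table so its behavior can be inspected and tested.
public class LockStateMachine {

    enum State { UNLOCKED, LOCKED }
    enum Event { LOCK, UNLOCK }

    // Transition function: (state, event) -> next state. A missing entry means
    // the event is not allowed in that state.
    private static final Map<State, Map<Event, State>> TRANSITIONS = Map.of(
            State.UNLOCKED, Map.of(Event.LOCK, State.LOCKED),
            State.LOCKED, Map.of(Event.UNLOCK, State.UNLOCKED));

    private State current = State.UNLOCKED;

    public void fire(Event event) {
        State next = TRANSITIONS.getOrDefault(current, Map.of()).get(event);
        if (next == null) {
            throw new IllegalStateException(event + " not allowed in state " + current);
        }
        current = next;
    }

    public static void main(String[] args) {
        LockStateMachine lock = new LockStateMachine();
        lock.fire(Event.LOCK);
        lock.fire(Event.UNLOCK);
        System.out.println("final state: " + lock.current); // UNLOCKED
    }
}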
A model based approach for developing event-driven architectures with AsyncAPI (abgolla)
In this Internet of Things (IoT) era, our everyday objects have evolved into the so-called cyber-physical systems (CPS).
The use and deployment of CPS has especially penetrated the industry, giving rise to the Industry 4.0 or Industrial IoT (IIoT).
Typically, architectures in IIoT environments are distributed and asynchronous, communication being guided by events such as the publication of (and corresponding subscription to) messages.
While these architectures have some clear advantages (such as scalability and flexibility), they also raise interoperability challenges among the agents in the network. Indeed, the knowledge about the message content and its categorization (topics) gets diluted, leading to consistency problems, potential losses of information and complex processing requirements on the subscriber side to try to understand the received messages.
In this paper, we present our proposal relying on AsyncAPI to automate the design and implementation of these architectures using model-based techniques for the generation of (part of) event-driven infrastructures.
The prototype that implements this proposal as an open-source project is available at https://github.com/SOM-Research/asyncapi-toolkit
A Modeling Editor and Code Generator for AsyncAPI (abgolla)
Talk presented at the 1st AsyncAPI conference - https://www.asyncapiconf.com/
Talk summary:
In the new Internet of Things (IoT) era, our everyday objects have evolved into so-called cyber-physical systems (CPS). The use and deployment of CPS has especially penetrated the industry, giving rise to Industry 4.0 or the Industrial IoT (IIoT). Typically, architectures in IIoT environments are distributed and asynchronous, communication being guided by events such as the publication of (and corresponding subscription to) messages.
In this talk, we present AsyncAPI toolkit, our proposal relying on AsyncAPI to automate the design and implementation of these architectures using model-based techniques. AsyncAPI toolkit provides a set of editors and Eclipse-based tools which allow defining JSON-based specifications of message-driven APIs using AsyncAPI. From these specifications, the prototype is able to generate the Java code supporting the creation and serialization of JSON-based message payloads according to the modeled AsyncAPI, including nested JSON objects, as well as the necessary code to publish and subscribe to different topics. The initial prototype that implements this proposal as an open-source project is available at https://github.com/SOM-Research/asyncapi-toolkit
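The generated code itself is available in the linked repository; purely as a hypothetical sketch of the shape such publish/subscribe code takes (not the toolkit's actual output, and using an in-memory channel instead of a real broker client): a payload class derived from the message schema is paired with publish and subscribe operations on a topic.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of event-driven messaging around a typed payload and a
// topic, in the spirit of code generated from an AsyncAPI specification.
public class TemperatureChannelSketch {

    // Payload class that would be derived from the message schema.
    record TemperatureReading(String sensorId, double celsius) {
        String toJson() {
            return "{\"sensorId\":\"" + sensorId + "\",\"celsius\":" + celsius + "}";
        }
    }

    // In a real system this would wrap an MQTT/AMQP/Kafka client; here it is an
    // in-memory broker so the sketch stays self-contained.
    static class Channel {
        private final List<Consumer<String>> subscribers = new ArrayList<>();
        void subscribe(Consumer<String> handler) { subscribers.add(handler); }
        void publish(String json) { subscribers.forEach(s -> s.accept(json)); }
    }

    public static void main(String[] args) {
        Channel temperatureTopic = new Channel();
        temperatureTopic.subscribe(json -> System.out.println("received: " + json));
        temperatureTopic.publish(new TemperatureReading("s-17", 21.5).toJson());
    }
}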
Enabling Performance Modeling for the Masses: Initial Experiences (abgolla)
Performance problems such as sluggish response time or low throughput are especially annoying, frustrating and noticeable to users. Fixing performance problems after they occur results in unplanned expenses and time. Our vision is an MDE-intensive software development paradigm for complex systems in which software designers can evaluate performance early in development, when the analysis can have the greatest impact. We seek to empower designers to do the analysis themselves by automating the creation of performance models out of standard design models. Such performance models can be automatically solved, providing results meaningful to them. In our vision, this automation can be enabled by using model-to-model transformations: First, designers create UML design models embellished with the Modeling and Analysis of Real Time and Embedded systems (MARTE) design specifications; and secondly, such models are transformed to automatically solvable performance models by using QVT. This work reports on our first experiences when implementing these two initial activities.
See the full article at: https://dx.doi.org/10.1007/978-3-030-01042-3_7
TemporalEMF: A Temporal Metamodeling Framework (abgolla)
Existing modeling tools provide direct access to the most current version of a model but very limited support to inspect the model state in the past. This typically requires looking for a model version (usually stored in some kind of external versioning system like Git) roughly corresponding to the desired period and using it to manually retrieve the required data. This approximate answer is not enough in scenarios that require a more precise and immediate response to temporal queries like complex collaborative co-engineering processes or runtime models.
In this paper, we reuse well-known concepts from temporal languages to propose a temporal metamodeling framework, called TemporalEMF, that adds native temporal support for models. In our framework, models are automatically treated as temporal models and can be subjected to temporal queries to retrieve the model contents at different points in time. We have built our framework on top of the Eclipse Modeling Framework (EMF). Behind the scenes, the history of a model is transparently stored in a NoSQL database. We evaluate the resulting TemporalEMF framework with an Industry 4.0 case study about a production system simulator. The results show good scalability for storing and accessing temporal models without requiring changes to the syntax and semantics of the simulator.
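Purely as an illustration of what temporal queries over a model history mean (a hypothetical sketch, not TemporalEMF's API): each write to a feature is recorded with a timestamp, and a read "at time t" returns the latest value recorded at or before t.

import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of temporal model storage: every value written for a
// feature is kept with its timestamp, and queries can ask for the value the
// feature had at any point in the past.
public class TemporalFeatureStore {

    // feature key -> (timestamp -> value), ordered by timestamp.
    private final Map<String, TreeMap<Instant, Object>> history = new java.util.HashMap<>();

    public void set(String featureKey, Object value, Instant at) {
        history.computeIfAbsent(featureKey, k -> new TreeMap<>()).put(at, value);
    }

    // Value of the feature as of time t: the latest entry not after t.
    public Object getAt(String featureKey, Instant t) {
        TreeMap<Instant, Object> versions = history.get(featureKey);
        if (versions == null) return null;
        Map.Entry<Instant, Object> entry = versions.floorEntry(t);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TemporalFeatureStore store = new TemporalFeatureStore();
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        Instant t1 = Instant.parse("2024-06-01T00:00:00Z");
        store.set("machine42.status", "IDLE", t0);
        store.set("machine42.status", "RUNNING", t1);
        System.out.println(store.getAt("machine42.status", t0.plusSeconds(60))); // IDLE
    }
}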
On the Opportunities of Scalable Modeling Technologies: An Experience Report ... (abgolla)
Paper "On the Opportunities of Scalable Modeling Technologies: An Experience Report on Wind Turbines Control Applications Development" presented at ECMFA 2017, part of STAF, @ Marburg.
A tool to evaluate the performance of data-intensive applications (abgolla)
Data-intensive applications (DIAs) that use Big Data technologies are becoming an important part of the software development market. However, the techniques, and their automation, for assessing the quality of this kind of application are clearly insufficient.
The DICE H2020 project aims to define methodologies and create tools to develop and monitor DIAs using model-driven engineering techniques. In this article we present a key component of the DICE project: its simulation tool. This tool is able to evaluate the performance of DIAs by simulating their behavior with Petri net models. As a complement, a video showing the tool is available at http://tiny.cc/z1qzay.