
IncQuery-D: Incremental Queries in the Cloud

A presentation about IncQuery-D, our distributed, incremental graph query engine in the cloud. Presented at the STAF conference's Big MDE workshop.

  1. INCQUERY-D: INCREMENTAL QUERIES IN THE CLOUD
     Gábor Szárnyas, Benedek Izsó, István Ráth, Dániel Varró
     Budapest University of Technology and Economics, Department of Measurement and Information Systems, Fault Tolerant Systems Research Group
  2. Overview
     • Introduction
     • MDE scalability challenges for model queries
     • Overview: scaling out in the cloud
     • Evaluation: a feasibility study
     • Conclusions and future work
  3. SCALABILITY IN MDE
  4. Scalability challenges in MDE
     • Complex instance models and queries
     • Instance model complexity
       o Size
       o Structure
     • Query complexity
       o MDE workloads involve much more complex queries than typical data-driven applications (e.g. model validation, transformations, …)
     • Scalability challenges arise from the combination of the two
  5. Model sizes
     • Instance models with several million elements
       o AUTOSAR models [1]
       o Source code models
       o Sensor data
     Source: Markus Scheidgen, How Big are Models – An Estimation, 2012. [2]

     application          model size
     software models      0 – 10^9
     sensor data          10^9
     geo-spatial models   10^9 – 10^12

     [1] http://wiki.eclipse.org/Auto_IWG_WP2
     [2] http://hwl.hu-berlin.de/fileadmin/user_upload/documents/howbig_techreport.pdf
  6. EMF-IncQuery
     • State-of-the-art incremental graph query engine
     • Open source Eclipse project by BUTE and others
     • Typical use cases
       o Validation
       o Incremental model transformation
       o Model synchronization, view maintenance
  7. Single workstation limitations
     • Most tools work only for models with <1M elements due to algorithmic complexity
     • The best tools handle <10M model elements due to the JVM’s limitations
       o A JVM cannot handle 15+ GB of heap memory efficiently
       o Long GC pauses
       o Specialized JVMs (e.g. Azul Systems’ Zing)
         • Commercial, experimental
         • May require special hardware
     • Proposed solution
       o Scale out: distributed system
  8. OVERVIEW OF THE INCQUERY-D APPROACH
  9. Architecture (EMF-IncQuery)
     [Diagram: three layers connected by transactions]
     • In-memory storage: in-memory EMF model
     • Indexing: indexer layer
     • Production network: Rete net
       o Stores intermediate query results
       o Propagates changes
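
The two roles slide 9 gives the production network (storing intermediate query results and propagating changes) can be illustrated with a single join node. This is a minimal sketch in Python, not EMF-IncQuery code: the `JoinNode` class, its method names, and the tuple shapes are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class JoinNode:
    """Hypothetical Rete-style join node: caches both inputs and the matches."""

    def __init__(self):
        self.left = defaultdict(set)    # join key -> left tuples seen so far
        self.right = defaultdict(set)   # join key -> right tuples seen so far
        self.matches = set()            # current result set (intermediate result)

    def update(self, side, key, tup, added=True):
        """Apply one change on one input and return the delta of matches."""
        if side == "left":
            own, other = self.left, self.right
        else:
            own, other = self.right, self.left
        if added:
            own[key].add(tup)
        else:
            own[key].discard(tup)
        # Only partners with the same key are touched, never the whole model.
        delta = set()
        for partner in other[key]:
            delta.add((tup, partner) if side == "left" else (partner, tup))
        if added:
            self.matches |= delta
        else:
            self.matches -= delta
        return delta  # this delta is what would be propagated downstream


node = JoinNode()
node.update("left", "s1", ("route1", "s1"))
node.update("right", "s1", ("s1", "sensorA"))
print(node.matches)
```

Because each update only consults the cached tuples that share the join key, reading the current result set afterwards is a lookup rather than a fresh search over the model.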
  10. Architecture (EMF-IncQuery vs. IncQuery-D)
      [Diagram: the EMF-IncQuery layers (in-memory EMF model, indexer layer, Rete net) shown next to their distributed counterparts in IncQuery-D]
      • Distributed persistent storage: DB shards 0–3 on servers 0–3, replacing the in-memory storage
      • Distributed indexing, notification: indexer layer of the IncQuery-D middleware
      • Distributed production network: Rete net
        o Each intermediate node can be allocated to a different host
        o Remote internode communication
  11. Rete net
      • Asynchronous communication
      • Consistency guaranteed by a termination protocol
      [Diagram: DB shards 0–3, each feeding an indexer; the indexers feed the Rete net, which ends in a production node]
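
Slide 11 names a termination protocol but does not describe how it works. The sketch below shows one generic way to detect that an asynchronously propagated change has reached every node: count the messages that are still unacknowledged and treat zero as "propagation finished". It is a single-process simulation under that assumption, not the protocol used by IncQuery-D; the `propagate` function and the node names are hypothetical.

```python
from collections import deque


def propagate(graph, source, change):
    """Push a change through a node graph and detect when propagation has ended.

    graph maps a node name to the list of nodes it forwards messages to.
    Returns the delivery order and the number of messages processed.
    """
    pending = 1                       # messages sent but not yet acknowledged
    inbox = deque([(source, change)])
    delivered = []
    while pending > 0:
        node, msg = inbox.popleft()
        delivered.append(node)
        children = graph.get(node, [])
        for child in children:
            inbox.append((child, msg))
        # The node acknowledges its message and reports the ones it spawned.
        pending += len(children) - 1
    # pending == 0: no message is in flight, so results are consistent to read.
    return delivered, len(delivered)


layout = {"indexer0": ["join"], "join": ["production"]}
order, count = propagate(layout, "indexer0", {"deleted": "segment42"})
print(order, count)
```

In a real distributed setting the counter would live with a coordinator and the acknowledgements would travel over the network; the invariant (results are read only once nothing is in flight) is the part this sketch is meant to show.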
  12. IncQuery-D
      • Scaling out by…
        o Sharding the data
        o Sharding the pattern matcher network → avoids the memory bottleneck
      • Further advantages
        o Agnostic to the representation of the graph
          • Property graph, (EMF, RDF)
          • Information from the metamodel is only used for indexing
        o Query layer decoupled from the data storage
          • Storage layer freely exchangeable
          • Indexing is independent of storage features
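
Slide 12 does not say how the data is sharded (the sharding strategy is listed under future work). As an illustration only, the sketch below assumes the simplest option, hash-based assignment of element identifiers to four shards, matching the four DB shards in the architecture diagram. The functions `shard_of` and `partition` and the identifier format are hypothetical.

```python
import zlib


def shard_of(element_id: str, shard_count: int = 4) -> int:
    """Map an element identifier to a shard index (assumed hash-based policy)."""
    # crc32 is used because it is stable across processes and machines,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(element_id.encode("utf-8")) % shard_count


def partition(element_ids, shard_count: int = 4):
    """Group element identifiers by the shard they would be stored on."""
    shards = {i: [] for i in range(shard_count)}
    for element_id in element_ids:
        shards[shard_of(element_id, shard_count)].append(element_id)
    return shards


shards = partition(f"node{i}" for i in range(1000))
print({shard: len(ids) for shard, ids in shards.items()})
```

A hash-based policy spreads load evenly but ignores graph structure, so neighbouring elements often land on different shards; a structure-aware policy would trade balance for less remote communication.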
  13. Scalability considerations
      • Construction process
        1. Shard the data in the storage layer
        2. Derive a Rete net layout from the query
        3. Allocate the middleware indexers
        4. Allocate the Rete nodes in the cloud
      • Design aspects for scalability
        o Local resource limitations
        o Load balancing
        o Minimize remote communication
      • Given problem characteristics, global resource requirements can be calculated
      • Approach intrinsically supports dynamic scaling
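
Step 4 of the construction process, combined with the first two design aspects (local resource limitations and load balancing), can be sketched as a placement problem. The slides do not specify an allocation algorithm, so the greedy heuristic below is an assumption, and the `allocate` function, node names, and memory estimates are hypothetical. It does not attempt the third design aspect, minimizing remote communication.

```python
def allocate(node_memory, host_capacity):
    """Greedily place Rete nodes on hosts.

    node_memory:   {rete_node: estimated memory need in GB}
    host_capacity: {host: memory available in GB}
    Returns (placement, load) or raises ValueError if a node fits nowhere.
    """
    load = {host: 0.0 for host in host_capacity}
    placement = {}
    # Place the largest nodes first so they are not squeezed out later.
    for node, need in sorted(node_memory.items(), key=lambda kv: -kv[1]):
        candidates = [h for h in load if load[h] + need <= host_capacity[h]]
        if not candidates:
            raise ValueError(f"no host can fit {node} ({need} GB)")
        host = min(candidates, key=lambda h: load[h])  # least-loaded host
        placement[node] = host
        load[host] += need
    return placement, load


placement, load = allocate(
    {"join1": 6, "join2": 5, "antijoin": 4, "production": 1},
    {"server0": 10, "server1": 10},
)
print(placement)
print(load)
```

The capacity check models the local resource limit and the least-loaded choice models load balancing; summing `node_memory` gives the global resource requirement the slide mentions.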
  14. EVALUATION
  15. Evaluation of IncQuery-D
      • Benchmark goal
        o Evaluate the feasibility of the concept
        o Measure the scalability characteristics
        o Workload profile similar to real-world model validation
      • Scenarios
        o Batch – “traditional” batch graph search
        o Incremental – Rete network
      • Operations
        o Simulates a user’s interaction with a model
        o Load and first validation; transformation; revalidation
  16. Batch graph scenario
      • Load and first validation: load the graph into the databases and execute the query
      • Transformation: query the graph and delete some elements
      • Revalidation: execute the query
      [Diagram: GraphML → DB shards → result set, for the phases load and first validation, transformation, revalidation]
  17. Incremental scenario – IncQuery-D
      • Load and first validation: load the graph into the databases, initialize the Rete net, and retrieve the results
      • Transformation: incrementally query the graph and delete some elements, propagate the changes
      • Revalidation: retrieve the results from the Rete net
      [Diagram: GraphML → DB shards → Rete net → result set, for the phases load and first validation, transformation, revalidation]
  18. Implementation
      [Diagram: the IncQuery-D architecture with DB shards 0–3 on servers 0–3, the indexer layer of the IncQuery-D middleware, and the Rete net]
      • Storage: Neo4j, accessed with Cypher through REST
      • Middleware and Rete net: Akka (asynchronous communication)
      • Hardware: 4 Ubuntu Linux servers, 16 GB RAM, 2×2.5 GHz Intel Xeon CPU
      • Detailed benchmark description: http://incquery.net/publications/incquery-d
  19. Load and first validation phase
      [Chart: time [s] (logarithmic axis, ticks from 1 to 4096) against model size [million elements / file size in GB], from 0.1 / 0.008 to 55.8 / 3.853; series: Neo4j/Cypher (batch) and IncQuery-D (incremental)]
      • Small overhead for the Rete network’s construction
      • 50M+: approx. 30 minutes
      • Parallel loading of the graph from a GraphML representation
  20. Transformation phase
      [Chart: time [s] (logarithmic axis, ticks from 1 to 4096) against model size [million elements / file size in GB], from 0.1 / 0.008 to 55.8 / 3.853; series: Neo4j/Cypher (batch) and IncQuery-D (incremental)]
      Steps: 1. Elementary model query; 2. Model manipulation
      • Neo4j/Cypher (batch)
        o Both steps implemented with Cypher
        o The query evaluation time dominates
      • IncQuery-D (incremental)
        o The query is supported by the Rete net
        o Only the manipulation is implemented with Cypher
        o Overhead due to change propagation is negligible
        o 1.5 orders of magnitude faster
        o Performs a transformation over a 55M model in one minute
  21. Revalidation phase
      [Chart: time [s] (logarithmic axis, ticks from 0.25 to 4096) against model size [million elements / file size in GB], from 0.1 / 0.008 to 55.8 / 3.853; series: Neo4j/Cypher (batch) and IncQuery-D (incremental)]
      • Near instant response time for very large models
      • Different characteristics: 4 orders of magnitude difference for the largest model
      • Revalidation time is independent of node size
  22. CONCLUSIONS
  23. Conclusions
      • Novel approach for the distributed execution of incremental graph queries
      • Distributed Rete network
        o Middleware for change propagation and indexing
        o Incremental query layer decoupled from a sharded graph database
      • Results
        o Working proof of concept
        o Near instantaneous query evaluation up to 50M+ model elements
        o Improves scalability of transformations significantly
  24. Future work
      • Tooling and automation
        o Evolve the prototype into a developer tool
      • Explore optimization possibilities
        o Allocation of Rete nodes
        o Dynamic reallocation of Rete nodes
        o Sharding strategy, resource usage, network communication overhead
      • Cloud readiness
      • Experiment with distributed EMF model stores
        o CDO, MongoEMF, Morsa, …
