Distributed Incremental 
Model Queries over the Cloud 
Budapest University of Technology and Economics 
Department of Measurement and Information Systems 
Dániel Varró 
Budapest University of Technology and Economics 
Fault Tolerant Systems Research Group
Outline of the Talk 
Motivation & Background: 
• Change detection in CPS 
•Design Space Exploration 
Incremental Model Queries: 
The EMF-IncQuery framework 
• Language - Execution 
Distributed Incremental 
Model Queries (IncQuery-D) 
•Architecture - 
Performance Benchmarks 
•Distributed model load 
• Incremental query evaluation 
 Main Contributors 
o István Ráth (lead) 
o Ákos Horváth 
o Gábor Bergmann 
o Ábel Hegedüs 
o Zoltán Ujhelyi 
o Benedek Izsó 
o Gábor Szárnyas 
o Csaba Debreceni 
o Dénes Harmath 
o József Makai 
o Dániel Stein
Challenges for IoT / CPS 
Cyber 
world 
Physical 
world 
Problem 
Solution 
scheme 
Deployment 
Service 
Solution 
pattern 
Component 
service 
offering 
Challenge: 
Detect changes 
• in system state 
• in environment 
Abstractions 
Design space 
exploration
CHANGE DETECTION BY 
INCREMENTAL QUERIES
Big Data Analytics for CPS 
Sensors / Services Data and Event sources Cloud based apps 
Data 
Store 
Data 
Store 
EvenCtlsoud based 
Computation 
Polling 
Events
Challenge: Change Detection in CPS 
Sensors / Services Data and Event sources Cloud based apps 
Data 
Store 
Data 
Store 
EvenCtlsoud based 
Computation 
Polling 
Events 
?
Change Detection in CPS by Incremental Queries 
Sensors / Services Data and Event sources Cloud based apps 
Polling 
Data 
Store 
Events 
Data 
Store 
UUnniiffiieedd CChhaannggee DDeetteeccttiioonn bbyy 
Distributed Incremental Queries 
Cloud based 
Computation
MOTIVATION FOR 
INCREMENTAL MODEL QUERIES
Motivation: Early validation of design rules 
SystemSignalGroup design rule (from AUTOSAR) 
o A SystemSignal and its group must be in the same IPdu 
o Challenge: find violations quickly in large models 
o New difficulties 
• reverse 
navigation 
• complex 
manual 
solution 
AUTOSAR: 
• standardized SW architecture 
of the automotive industry 
• now supported by modern modeling tools 
Design Rule/Well-formedness constraint: 
• each valid car architecture needs to respect 
• designers are immediately notified if violated 
Challenge: 
• 500 design rules in AUTOSAR tools 
• 1 million elements in AUTOSAR models 
• models constantly evolve by designers
Domain-Specific Modeling Languages 
Abstract 
Meta-model 
Model 
«type»
Validation of Well-formedness Constraints 
Meta-model 
Model 
Query 
pattern switchWOSignal(sw) { 
Switch(sw); 
neg find switchHasSignal(sw); 
} 
pattern switchHasSignal(sw) { 
Switch(sw); 
Signal(sig); 
Signal.mountedTo(sig, sw); 
} 
Modify 
User 
Result
Model sizes in practice 
 Models with 10M+ elements are common: 
o Car industry 
o Avionics 
o Source code analysis 
 Models evolve and change continuously 
Validation can take hours 
Application Model size 
System models 108 
Sensor data 109 
Geospatial models 1012 
Source: Markus Scheidgen, How Big are Models – An Estimation, 2012.
MODEL QUERIES AND 
GRAPH PATTERN MATCHING
What is a model query? 
 For a programmer: 
o A piece of code that searches for parts of the model 
 For the scientist: 
o Query = set of constraints that have to be satisfied by 
(parts of) the (graph) model 
o Result = set of model element tuples that satisfy the 
constraints of the query 
oMatch = bind constraint variables to model elements 
 A query engine: Supports 
o the definitionexecution 
of model queries 
Query(A,B)  ∧condi(Ai,Bi) 
• all tuples of model elements a,b 
• satisfying the query condition 
• along the match A=a and B=b 
• parameters A,B can be input/ output
Graph Pattern Matching for Queries 
route: Route sp: SwitchPosition 
routeDefinition 
sensor: Sensor switch: Switch 
 Match: 
o m: L G 
(graph morphism) 
o CSP: 
• Variables: Nodes of L 
• Constraints: Edges of L 
• Domain values: G 
o Complexity: |G|^|L| 
L 
G 
straight 
left 
switchPosition 
switch 
sensor 
All sensors with a switch that belongs to a route must directly be linked to the same route.
Graph Pattern Matching (Local Search) 
switchPosition 
route: Route sp: SwitchPosition 
switch 
routeDefinition 
sensor 
sensor: Sensor switch: Switch 
 Search Plan: 
o Select the first node 
to be matched 
o Define an ordering on 
graph pattern edges 
 Search is restarted from 
scratch each time 
1 
2 
0 
3 
4 
straight 
left
Incremental Graph Pattern Matching 
switchPosition 
route: Route sp: SwitchPosition 
switch 
routeDefinition 
sensor 
sensor: Sensor switch: Switch 
 Main idea: More space to less time 
o Cache matches of patterns 
o Instantly retrieve match (if valid) 
o Update caches upon model changes 
o Notify about relevant changes 
 Approaches: 
o TREAT, LEAPS, RETE, … 
o Tools: VIATRA, GROOVE, MoTE, TCore 
straight 
left 
route sp switch sensor 
r1 sp1 sw1
INCREMENTAL MODEL QUERIES: 
THE EMF-INCQUERY PROJECT
EMF-IncQuery: An Open Source Eclipse Project 
• Declarative graph query 
language 
• Transitive closure, 
Negative cond., etc. 
• Compositional, reusable 
Definition 
• Incremental evaluation 
• Cache result set 
•Maintain incrementally 
upon model change 
Execution 
• Derived features, 
• On-the-fly validation 
• View generation, 
Notifications, Soft links, 
Databinding, 
Features 
http://eclipse.org/incquery
The IncQuery (IQ) Graph Query Language 
route: Route sp: SwitchPosition 
routeDefinition 
sensor: Sensor Switch: Switch 
 IQ: declarative query language 
o Attribute constraints 
o Local + global queries 
o Compositionality+Reusabilility 
o Recursion, Negation, 
o Transitive Closure 
o Syntax: DATALOG style 
pattern routeSensor(sensor: Sensor) = { 
TrackElement.sensor(switch,sensor); 
Switch(switch); 
SwitchPosition. switch(sp, switch); 
SwitchPosition(sp); 
Route.switchPosition(route, sp); 
Route(route); 
neg find head(route, sensor); 
} 
pattern head(R, Sen) = { 
Route.routeDefinition(R, Sen); 
} 
ModelQuery(A,B): 
• tuples of model elements A, B 
• satisfying the query condition 
• enumerate 1 / all instances 
• A,B can be input or output 
switchPosition 
switch 
sensor
Incremental Query Evaluation by RETE 
 AUTOSAR well-formedness validation rule 
Communication 
channel 
Logical signal Mapping Physical signal 
 Instance model 
Invalid model fragment 
Valid model fragment
Incremental Query Evaluation by RETE 
Read the changes in the 
PFrFRioMlileplltoaahtddghea ietftwhyeineottphrrhkueeeetsm rucnhnlootaoddsndeeegeltsess 
result set (deltas) 
join 
join 
antijoin 
Result set 
Communication 
channel 
Logical signal Mapping Physical signal
Performance of EMF-INCQUERY 
 Incremental graph queries based on Rete 
 Built for the Eclipse Modeling Framework 
model size 
runtime 
batch 
queries 
incremental 
queries 
Runtime is proportional to 
the size of the modification.
Performance of EMF-INCQUERY 
Storing partial 
memory results 
consumption 
incremental 
queries 
batch 
queries 
memory 
limit 
model size
Selected Applications (EMF-IncQuery) 
• Complex traceability 
• Query driven views 
• Abstract models by 
derived objects 
Toolchain for 
IMA configs 
• Connect to Matlab 
Simulink model 
• Export: Matlab2EMF 
• Change model in EMF 
• Re-import: 
EMF2Matlab 
MATLAB-EMF 
Bridge 
• Live models 
(refreshed 25 
frame/s) 
• Complex event 
processing 
Gesture 
recognition 
• Experiments on open 
source Java projects 
• Local search vs. 
Incremental vs. 
Native Java code 
Detection of 
bad code smells 
• Rules for operations 
• Complex structural 
constraints (as GP) 
• Hints and guidance 
• Potentially infinite 
state space 
Design Space 
Exploration 
• Itemis (developer) 
• Embraer 
• Thales 
• ThyssenKrupp 
• CERN 
Known Users
INCQUERY-D: DISTRIBUTED 
INCREMENTAL MODEL QUERIES
Goals of INCQUERY-D 
 Objectives 
o Distributed incremental pattern matching 
o Adaptation of EMF-INCQUERY’s tooling to graph DBs 
o Executed over cloud infrastructure (COTS hardware) 
 Achieve scalability by avoiding memory bottleneck 
o Sharding separately 
• Data 
• Indexers 
• Query network 
o In memory: 
• Index + Query 
Assumptions 
• All Rete nodes fit on a server node 
• Indexers can be filled efficiently 
• Modification size ≪ model size 
• The application requires the complete result 
set of the query (opposed to just one match)
Dimensions of Scalability 
 Infrastructure 
o Number of machines 
o Available memory / CPU 
o Network performance 
o Number of concurrent users 
 Model 
o Model size 
o Model characteristics 
 Queries 
o Number of queries 
o Query complexity 
Metrics
INCQUERY-D Architecture 
EMF-INCQUERY INCQUERY-D 
Join 
Database 
shard 1 
Server 1 
Join 
Database 
shard 2 
Server 2 
Triple store (4store), 
Document DB (Mongo), 
RDF over Column family 
Database 
shard 3 
Server 3 
Transaction 
Database 
shard 0 
In-memory 
EMF model 
Server 0 
Antijoin 
Rete net 
Indexer 
layer 
Akka 
Distributed query evaluation network 
Distributed indexer Model access adapter Indexing Indexer Indexer Indexer Indexer 
In-memory storage 
Distributed indexing, 
notification 
Production network 
• Stores intermediate query results 
• Propagates changes 
Distributed persistent 
storage 
Distributed production network 
• Each intermediate node can be allocated 
to a different host 
• Remote internode (Cumulus) 
communication
Termination Protocol in INCQUERY-D 
Database 
shard 0 
When a production node reached 
an ACK message is sent back Stack added to each update msg 
Database 
shard 1 
Server 1 
• Registers the Rete nodes the 
message passes through 
Database 
shard 2 
Server 2 
User retrieves 
query result 
Database 
shard 3 
Server 3 
Transaction 
Server 0 
INCQUERY-D 
Join 
Join 
Antijoin 
Indexer Indexer Indexer Indexer
IncQuery-D Architectural Layers 
• Gremlin, Cypher 
• SPARQL 
• IQPL (IncQuery) 
High-Level 
Query Lang 
• Distributed Indexers 
(MONDIX) 
• SPARQL 
Low-Level 
Query Lang 
• Cayley 
• Titan 
• 4store 
Distributed 
Graph DB 
• MongoDB 
• Cassandra 
• 4store 
Native 
Storage 
• RDF 
• XMI / Ecore 
• Property Graphs 
Storage 
Format 
• Global queries 
• Complex navigations 
• Efficient element access by indices 
• Local queries 
• Can be transparent (via indexers) 
• Integrates popular graph storages 
• Efficient NoSQL storages 
• Triple stores 
• Standardized data formats 
• Popular interchange formats
Summary: Key Components of IncQuery-D 
Distributed 
Model Storage 
• Adaptable to 
different back-end 
storages 
• Agnostic to 
graph repres. 
• TripleStores 
(RDF), EMF, 
Property 
graph 
Model Access 
Adapter 
• Surrogate key 
to identify 
distibuted 
elements 
• Graph manip. 
API 
• Change 
notifications 
Distributed 
Indexer 
• Type-instance 
indices, etc. 
• Stored on 
multiple 
servers 
• Protects 
exceeding 
memory limits 
Distributed 
Query Evaluator 
• Distributed 
RETE network 
• Distributed 
termination 
protocol 
• Constructed 
and deployed 
by coordinator 
node 
Decouple and separately distribute Storage, Indexer and Query layers
USAGE PHASES
Load 
Model 
(1) Loading a Query 
Update 
Model 
Request 
Result 
Deploy 
RETE 
RETE 
Network 
Allocate 
RETE 
Cloud 
Infra-structure 
Construct 
RETE 
Load 
Query 
Construct RETE 
• From EMF-IncQuery specs 
• Should incorporate 
infrastructure constraints 
Deploy RETE 
• Managed by a 
coordinator node 
• Intelligent sharding of 
RETE nodes
Load 
Model 
(2) Loading a Model 
Update 
Model 
Request 
Result 
Model 
shards 
Deploy 
RETE 
RETE 
Network 
Allocate 
RETE 
Maintain 
Result Set 
Cloud 
Infra-structure 
Construct 
RETE 
Model 
Access 
Adapter 
Load 
Query 
Load model 
• Model traversal 
• Init indexers 
• Network 
communication
Load 
Model 
(3) Updating a Model 
Update 
Model 
Request 
Result 
Model 
shards 
Deploy 
RETE 
RETE 
Network 
Allocate 
RETE 
Maintain 
Result Set 
Cloud 
Infra-structure 
Construct 
RETE 
Model 
Access 
Adapter 
Load 
Query 
Model manipulation 
• Update messages 
• Create / Delete
(4) Requesting Query Result 
Load 
Model 
Update 
Model 
Request 
Result 
Model 
shards 
Deploy 
RETE 
RETE 
Network 
Allocate 
RETE 
Evaluate 
Query 
Maintain 
Result Set 
Cloud 
Infra-structure 
Construct 
RETE 
Model 
Access 
Adapter 
Load 
Query 
Evaluate query 
• Process incoming 
messages 
• Propagate along 
RETE network 
Retrieve results 
• instantly
(5) Monitoring and Reconfiguration 
Load 
Model 
Update 
Model 
Request 
Result 
Model 
shards 
Deploy 
RETE 
RETE 
Network 
Allocate 
RETE 
Evaluate 
Query 
Maintain 
Result Set 
Cloud 
Infra-structure 
Monitor  Manage 
Construct 
RETE 
Model 
Access 
Adapter 
Load 
Query 
Visualized on a 
web-based dashboard 
OS metrics JVM metrics Akka metrics Rete metrics
DEPLOYMENT PROCESS FOR 
DISTRIBUTED RETE
RETE Deployment Process 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
route: Route sp: SwitchPosition 
routeDefinition 
sensor: Sensor Switch: Switch 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor 
pattern routeSensor(sensor: Sensor) = { 
TrackElement.sensor(switch,sensor); 
Switch(switch); 
SwitchPosition. switch(sp, switch); 
SwitchPosition(sp); 
Route.switchPosition(route, sp); 
Route(route); 
neg find head(route, sensor); 
} 
pattern head(R, Sen) = { 
Route.routeDefinition(R, Sen); 
} 
switchPosition 
switch 
sensor
Tooling: RDF Pattern Language 
import http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl 
pattern posLength(Segment, SegmentLength) { 
Segment(Segment); 
Segment.Segment_length(Segment, SegmentLength); 
check(SegmentLength = 0); 
} EMF-IncQuery syntax 
Vocabulary railway.rdf 
base http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl# 
pattern posLength(Segment, SegmentLength) { 
Segment(Segment); 
Segment_length(Segment, SegmentLength); 
check(SegmentLength = 0); 
} 
segment: Segment 
segment.length 0 
RDF-IncQuery syntax 
Xbase (compiles to Java) 
Javascript
RETE Deployment Process 
 Construct language-independent 
constraints 
 Resolution of 
o syntactic sugar 
o type information 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor 
Variables rroouuttee sspp sswwiittcchh 
PPaarraammeetteerr sensor 
Edge: SwitchPosition.switch 
Edge: TrackElement.sensor 
Edge: Route.switchPosition 
Negation: head 
Constraints
RETE Deployment Process 
 Construct RETE structure 
(platform independently) 
 Optimizations: 
o Model statistics 
o Expected usage profile 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor 
join 
join 
join
RETE Deployment Process 
 Architecture model 
(Cloud infrastructure) 
o Virtual Machines 
• Memory limits 
• CPU speed 
• Storage capacity 
o Communication Channels 
• Bandwidth 
 Specified by a textual DSL 
(Xtext) 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
1 2 
3 4 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor
RETE Deployment Process 
Machine Allocated Nodes 
1 In1, In2, Join2 
2 In3 
3 In4 
4 Join1, Join3 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
1 2 
3 4 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor 
Join1 
Join3 
Join2 
In1 In2 In3 In4
RETE Deployment Process 
 Configuration scripts for 
o Deployment 
o Communication 
middleware 
 Derived by automated 
code generation 
o Using Eclipse technology: 
EMF-IncQuery + Xtend 
Query 
Language 
Query 
Predicates 
RETE 
Structure 
Platform 
Description 
Allocation / 
Mapping 
Deployment 
Descriptor
DISTRIBUTED 
PERFORMANCE BENCHMARKS
The Train Benchmark 
 Model validation workload: 
o User edits the model 
o Instant validation of 
well-formedness constraints 
o Model is repaired accordingly 
 Scenario: 
o Load 
o Check 
o Edit 
o Re-Check 
 Models: 
o Randomly generated 
o Close to real world instances 
o Following different metrics 
o Customized distributions 
o Low number of violations 
 Queries: 
o Two simple queries 
(2 objects, attributes) 
o Two complex queries 
(4-7 joins, negation, etc.) 
o Validated match sets 
Incremental Batch validation validation 
Instance 
model 
Read Check ! Edit  ReCheck  
100x
Evaluation of distributed scalability 
 Extensions to previous work (single workstation) 
o Generation of large instance models 
o Distributed, parallel loading of models 
o Distributed transformation and validation 
Benchmark Distributed benchmark 
Model size 1K – 13M 1K – 88M 
Load method Batch Distributed, parallel 
Transformation and validation Single workstation Multiple servers
IncremBenattcahl sgcreanpahr isoc e–nIanrciQouery-D 
 Load and first validation: load the graph to the databases 
and execute initialize the the Rete query 
net and retrieve the results 
 Transformation: query the incrementally graph query and the delete graph some 
and 
delete elements 
some elements, propagate the changes 
 Revalidation: execute retrieve the query 
results from the Rete net 
Load and first 
validation 
GraphML Transformation Revalidation 
DB shards Result set 
Rete net 
DB shards Result set 
Rete net
Benchmark environment 
 Private cloud 
 Different DBMSs 
 Query 
o The DBMS’s own query language 
o IncQuery-D 
SPARQL Gremlin
4096 
1024 
256 
64 
16 
4 
1 
Runtime [s] 
Load and first validation 
55M model: approx. 15 minutes 
Rete network’s 
initialization 
overhead pays off 
Model size [million elements] 
4store IncQuery-D Titan IncQuery-D 4store
256 
128 
64 
32 
16 
8 
4 
2 
1 
Runtime [s] 
Model modification 
1. Elementary model query 
2. Model modification 
2 orders of 
magnitude 
– Query from the Rete network’s indexer 
– Propagation of modifications is fast 
Model size [million elements] 
4store IncQuery-D Titan IncQuery-D 4store
128,00 
32,00 
8,00 
2,00 
0,50 
0,13 
0,03 
0,01 
Runtime [s] 
Revalidation 
Different characteristics 
Sub-second response 
time for models with 
88M elements 
Model size [million elements] 
memory 
limit 
4store IncQuery-D Titan IncQuery-D 4store
Benchmarking Conclusions 
 Memory consumption 
o Single workstation: 13M model, 4 GB 
o Cloud of four servers: 55M model, 4×8 GB 
 Runtime 
o Same order of magnitude and similar characteristics to 
the single workstation tool 
INCQUERY-D is scalable and significantly more efficient for query 
evaluation than the native query engines in 4store, Titan and Neo4j
Applications of Distributed Incremental Queries 
• Query based optimistic locking 
• Queries for Attribute Based Access 
Control 
Collaborative 
Modeling (MONDO) 
• Events vs. Changes 
• Handle compound changes as events 
Complex Event 
Processing w/ 
compound changes 
• System evolves along operations 
• Cost / Objectives associated to 
• States + Environment + Trajectory 
Rule-based Design 
Space Exploration
CONFIGURATION SYNTHESIS BY 
DESIGN SPACE EXPLORATION
TRANS-IMA Project (Avionics) 
Goal: Allocate SW components to 
ARINC653 compliant IMA platform 
58 
Functional 
Architecture 
Component 
database 
Platform 
description 
Allocation 
Integrated 
System 
Model 
Inputs: 
• Platform Independent Model (PIM) 
(functional + nonfunc. reqs; Simulink) 
• Platform Description Model (PDM) 
for ARINC 653 (DSL) 
Output: 
• Integrated system model 
• Ready for simulation 
• End-to-end traceability
Designing ARINC653 configurations 
(critical + non-critical) 
Supply fresh air 
Supply hot air 
Monitor 
temperature 
Set 
temperature 
SW functionality 
Pack 
Controller 
Zone 
Controller 
3 
System 
Display 
AirCond 
Panel 
3 
Redundancy 
requirement
Job instances, Partitions, Modules 
SW functionality 
(critical + non-critical) 
Pack 
Controller 
Zone 
Controller 
3 
System 
Display 
AirCond 
Panel 
3 
Job instances 
1 
2 3 
4 
5 6 
7 
8 
Partitions 
Modules 
Constraints 
2 
5 
3 
4 
8 
8 
8 
8 
Memory needs 
+ constraints 
Do not mix critical 
and non-crit. jobs 
Do not mix 
instances of the 
same critical job 
Additional constraints 
• WCET, 
• scheduling, etc. 
• interfaces 
• datatypes
Allocating communication channels 
SW functionality 
Pack 
Controller 
Zone 
Controller 
3 
System 
Display 
AirCond 
Panel 
3 
1 
2 
3 
7 
4 
5 
6 
8 
Communication 
channels 
Humidity 
Pressure 
Temperature
Design Space Exploration 
Design Space Exploration 
62 
Design 
Alternative 1 
Design 
Alternative 2 
Design 
Alternative 3 
Design 
Alternative 4 
Objectives 
Global 
Constraints 
Initial Design 
Solvers 
• CLP solvers (Choco) 
• model finders (Alloy) 
• meta-heuristics + 
multi-objective optimization
Design Space Exploration (Example) 
63 
Consistency Analysis 
Design 
Alternative 1 
Design 
Alternative 2 
Design 
Alternative 3 
Design 
Alternative 4 
Objectives 
Global 
Constraints 
Initial Design 
A 
B 
x=2 
C 
A 
A A 
B 
x=? 
C 
I1 I2
Design Space Exploration (Example) 
 
 
64 
+ Filled 
Attributes 
Consistency Analysis 
Design 
Alternative 1 
Design 
Alternative 2 
Design 
Alternative 3 
Design 
Alternative 4 
Objectives 
Global 
Constraints 
Initial Design 
A 
B 
x=2 
C 
x=? 
A 
B 
x=5 
C 
C 
C 
A 
C 
O 
C1 
C2 
 
+ Objects 
I1 I2 
+ Relations
Rule Based Design Space Exploration 
Design Space Exploration 
65 
Design 
Alternative 1 
Design 
Alternative 2 
Design 
Alternative 3 
Design 
Alternative 4 
Objectives 
Global 
Constraints 
Operations 
Initial Design 
Special state space exploration 
• potentially infinite state space 
• „dense” solution space
Rule-Based Guided Design Space Exploration 
Design Space Exploration 
66 
Seq of Transf. 
Rules 1 
Seq of Transf. 
Rules 2 
Seq of Transf. 
Rules 3 
Seq of Transf. 
Rules 4 
Model queries 
as Objectives 
Model queries 
as Constraints 
Transf. Rules 
as Operations 
Initial 
Model 
Guidance for exploration: Hints 
• designer / end user 
• formal analysis
Conclusions

SERENE 2014 School: Daniel varro serene2014_school

  • 1.
    Distributed Incremental ModelQueries over the Cloud Budapest University of Technology and Economics Department of Measurement and Information Systems Dániel Varró Budapest University of Technology and Economics Fault Tolerant Systems Research Group
  • 2.
    Outline of theTalk Motivation & Background: • Change detection in CPS •Design Space Exploration Incremental Model Queries: The EMF-IncQuery framework • Language - Execution Distributed Incremental Model Queries (IncQuery-D) •Architecture - Performance Benchmarks •Distributed model load • Incremental query evaluation Main Contributors o István Ráth (lead) o Ákos Horváth o Gábor Bergmann o Ábel Hegedüs o Zoltán Ujhelyi o Benedek Izsó o Gábor Szárnyas o Csaba Debreceni o Dénes Harmath o József Makai o Dániel Stein
  • 3.
    Challenges for IoT/ CPS Cyber world Physical world Problem Solution scheme Deployment Service Solution pattern Component service offering Challenge: Detect changes • in system state • in environment Abstractions Design space exploration
  • 4.
    CHANGE DETECTION BY INCREMENTAL QUERIES
  • 5.
    Big Data Analyticsfor CPS Sensors / Services Data and Event sources Cloud based apps Data Store Data Store EvenCtlsoud based Computation Polling Events
  • 6.
    Challenge: Change Detectionin CPS Sensors / Services Data and Event sources Cloud based apps Data Store Data Store EvenCtlsoud based Computation Polling Events ?
  • 7.
    Change Detection inCPS by Incremental Queries Sensors / Services Data and Event sources Cloud based apps Polling Data Store Events Data Store UUnniiffiieedd CChhaannggee DDeetteeccttiioonn bbyy Distributed Incremental Queries Cloud based Computation
  • 8.
  • 9.
    Motivation: Early validationof design rules SystemSignalGroup design rule (from AUTOSAR) o A SystemSignal and its group must be in the same IPdu o Challenge: find violations quickly in large models o New difficulties • reverse navigation • complex manual solution AUTOSAR: • standardized SW architecture of the automotive industry • now supported by modern modeling tools Design Rule/Well-formedness constraint: • each valid car architecture needs to respect • designers are immediately notified if violated Challenge: • 500 design rules in AUTOSAR tools • 1 million elements in AUTOSAR models • models constantly evolve by designers
  • 10.
    Domain-Specific Modeling Languages Abstract Meta-model Model «type»
  • 11.
    Validation of Well-formednessConstraints Meta-model Model Query pattern switchWOSignal(sw) { Switch(sw); neg find switchHasSignal(sw); } pattern switchHasSignal(sw) { Switch(sw); Signal(sig); Signal.mountedTo(sig, sw); } Modify User Result
  • 12.
    Model sizes inpractice Models with 10M+ elements are common: o Car industry o Avionics o Source code analysis Models evolve and change continuously Validation can take hours Application Model size System models 108 Sensor data 109 Geospatial models 1012 Source: Markus Scheidgen, How Big are Models – An Estimation, 2012.
  • 13.
    MODEL QUERIES AND GRAPH PATTERN MATCHING
  • 14.
    What is amodel query? For a programmer: o A piece of code that searches for parts of the model For the scientist: o Query = set of constraints that have to be satisfied by (parts of) the (graph) model o Result = set of model element tuples that satisfy the constraints of the query oMatch = bind constraint variables to model elements A query engine: Supports o the definitionexecution of model queries Query(A,B) ∧condi(Ai,Bi) • all tuples of model elements a,b • satisfying the query condition • along the match A=a and B=b • parameters A,B can be input/ output
  • 15.
    Graph Pattern Matchingfor Queries route: Route sp: SwitchPosition routeDefinition sensor: Sensor switch: Switch Match: o m: L G (graph morphism) o CSP: • Variables: Nodes of L • Constraints: Edges of L • Domain values: G o Complexity: |G|^|L| L G straight left switchPosition switch sensor All sensors with a switch that belongs to a route must directly be linked to the same route.
  • 16.
    Graph Pattern Matching(Local Search) switchPosition route: Route sp: SwitchPosition switch routeDefinition sensor sensor: Sensor switch: Switch Search Plan: o Select the first node to be matched o Define an ordering on graph pattern edges Search is restarted from scratch each time 1 2 0 3 4 straight left
  • 17.
    Incremental Graph PatternMatching switchPosition route: Route sp: SwitchPosition switch routeDefinition sensor sensor: Sensor switch: Switch Main idea: More space to less time o Cache matches of patterns o Instantly retrieve match (if valid) o Update caches upon model changes o Notify about relevant changes Approaches: o TREAT, LEAPS, RETE, … o Tools: VIATRA, GROOVE, MoTE, TCore straight left route sp switch sensor r1 sp1 sw1
  • 18.
    INCREMENTAL MODEL QUERIES: THE EMF-INCQUERY PROJECT
  • 19.
    EMF-IncQuery: An OpenSource Eclipse Project • Declarative graph query language • Transitive closure, Negative cond., etc. • Compositional, reusable Definition • Incremental evaluation • Cache result set •Maintain incrementally upon model change Execution • Derived features, • On-the-fly validation • View generation, Notifications, Soft links, Databinding, Features http://eclipse.org/incquery
  • 20.
    The IncQuery (IQ)Graph Query Language route: Route sp: SwitchPosition routeDefinition sensor: Sensor Switch: Switch IQ: declarative query language o Attribute constraints o Local + global queries o Compositionality+Reusabilility o Recursion, Negation, o Transitive Closure o Syntax: DATALOG style pattern routeSensor(sensor: Sensor) = { TrackElement.sensor(switch,sensor); Switch(switch); SwitchPosition. switch(sp, switch); SwitchPosition(sp); Route.switchPosition(route, sp); Route(route); neg find head(route, sensor); } pattern head(R, Sen) = { Route.routeDefinition(R, Sen); } ModelQuery(A,B): • tuples of model elements A, B • satisfying the query condition • enumerate 1 / all instances • A,B can be input or output switchPosition switch sensor
  • 21.
    Incremental Query Evaluationby RETE AUTOSAR well-formedness validation rule Communication channel Logical signal Mapping Physical signal Instance model Invalid model fragment Valid model fragment
  • 22.
    Incremental Query Evaluationby RETE Read the changes in the PFrFRioMlileplltoaahtddghea ietftwhyeineottphrrhkueeeetsm rucnhnlootaoddsndeeegeltsess result set (deltas) join join antijoin Result set Communication channel Logical signal Mapping Physical signal
  • 23.
    Performance of EMF-INCQUERY Incremental graph queries based on Rete Built for the Eclipse Modeling Framework model size runtime batch queries incremental queries Runtime is proportional to the size of the modification.
  • 24.
    Performance of EMF-INCQUERY Storing partial memory results consumption incremental queries batch queries memory limit model size
  • 25.
    Selected Applications (EMF-IncQuery) • Complex traceability • Query driven views • Abstract models by derived objects Toolchain for IMA configs • Connect to Matlab Simulink model • Export: Matlab2EMF • Change model in EMF • Re-import: EMF2Matlab MATLAB-EMF Bridge • Live models (refreshed 25 frame/s) • Complex event processing Gesture recognition • Experiments on open source Java projects • Local search vs. Incremental vs. Native Java code Detection of bad code smells • Rules for operations • Complex structural constraints (as GP) • Hints and guidance • Potentially infinite state space Design Space Exploration • Itemis (developer) • Embraer • Thales • ThyssenKrupp • CERN Known Users
  • 26.
  • 27.
    Goals of INCQUERY-D Objectives o Distributed incremental pattern matching o Adaptation of EMF-INCQUERY’s tooling to graph DBs o Executed over cloud infrastructure (COTS hardware) Achieve scalability by avoiding memory bottleneck o Sharding separately • Data • Indexers • Query network o In memory: • Index + Query Assumptions • All Rete nodes fit on a server node • Indexers can be filled efficiently • Modification size ≪ model size • The application requires the complete result set of the query (opposed to just one match)
  • 28.
    Dimensions of Scalability Infrastructure o Number of machines o Available memory / CPU o Network performance o Number of concurrent users Model o Model size o Model characteristics Queries o Number of queries o Query complexity Metrics
  • 29.
    INCQUERY-D Architecture EMF-INCQUERYINCQUERY-D Join Database shard 1 Server 1 Join Database shard 2 Server 2 Triple store (4store), Document DB (Mongo), RDF over Column family Database shard 3 Server 3 Transaction Database shard 0 In-memory EMF model Server 0 Antijoin Rete net Indexer layer Akka Distributed query evaluation network Distributed indexer Model access adapter Indexing Indexer Indexer Indexer Indexer In-memory storage Distributed indexing, notification Production network • Stores intermediate query results • Propagates changes Distributed persistent storage Distributed production network • Each intermediate node can be allocated to a different host • Remote internode (Cumulus) communication
  • 30.
    Termination Protocol inINCQUERY-D Database shard 0 When a production node reached an ACK message is sent back Stack added to each update msg Database shard 1 Server 1 • Registers the Rete nodes the message passes through Database shard 2 Server 2 User retrieves query result Database shard 3 Server 3 Transaction Server 0 INCQUERY-D Join Join Antijoin Indexer Indexer Indexer Indexer
  • 31.
    IncQuery-D Architectural Layers • Gremlin, Cypher • SPARQL • IQPL (IncQuery) High-Level Query Lang • Distributed Indexers (MONDIX) • SPARQL Low-Level Query Lang • Cayley • Titan • 4store Distributed Graph DB • MongoDB • Cassandra • 4store Native Storage • RDF • XMI / Ecore • Property Graphs Storage Format • Global queries • Complex navigations • Efficient element access by indices • Local queries • Can be transparent (via indexers) • Integrates popular graph storages • Efficient NoSQL storages • Triple stores • Standardized data formats • Popular interchange formats
  • 32.
    Summary: Key Componentsof IncQuery-D Distributed Model Storage • Adaptable to different back-end storages • Agnostic to graph repres. • TripleStores (RDF), EMF, Property graph Model Access Adapter • Surrogate key to identify distibuted elements • Graph manip. API • Change notifications Distributed Indexer • Type-instance indices, etc. • Stored on multiple servers • Protects exceeding memory limits Distributed Query Evaluator • Distributed RETE network • Distributed termination protocol • Constructed and deployed by coordinator node Decouple and separately distribute Storage, Indexer and Query layers
  • 33.
  • 34.
    Load Model (1)Loading a Query Update Model Request Result Deploy RETE RETE Network Allocate RETE Cloud Infra-structure Construct RETE Load Query Construct RETE • From EMF-IncQuery specs • Should incorporate infrastructure constraints Deploy RETE • Managed by a coordinator node • Intelligent sharding of RETE nodes
  • 35.
    Load Model (2)Loading a Model Update Model Request Result Model shards Deploy RETE RETE Network Allocate RETE Maintain Result Set Cloud Infra-structure Construct RETE Model Access Adapter Load Query Load model • Model traversal • Init indexers • Network communication
  • 36.
    Load Model (3)Updating a Model Update Model Request Result Model shards Deploy RETE RETE Network Allocate RETE Maintain Result Set Cloud Infra-structure Construct RETE Model Access Adapter Load Query Model manipulation • Update messages • Create / Delete
  • 37.
    (4) Requesting QueryResult Load Model Update Model Request Result Model shards Deploy RETE RETE Network Allocate RETE Evaluate Query Maintain Result Set Cloud Infra-structure Construct RETE Model Access Adapter Load Query Evaluate query • Process incoming messages • Propagate along RETE network Retrieve results • instantly
  • 38.
    (5) Monitoring andReconfiguration Load Model Update Model Request Result Model shards Deploy RETE RETE Network Allocate RETE Evaluate Query Maintain Result Set Cloud Infra-structure Monitor Manage Construct RETE Model Access Adapter Load Query Visualized on a web-based dashboard OS metrics JVM metrics Akka metrics Rete metrics
  • 39.
    DEPLOYMENT PROCESS FOR DISTRIBUTED RETE
  • 40.
    RETE Deployment Process Query Language Query Predicates RETE Structure route: Route sp: SwitchPosition routeDefinition sensor: Sensor Switch: Switch Platform Description Allocation / Mapping Deployment Descriptor pattern routeSensor(sensor: Sensor) = { TrackElement.sensor(switch,sensor); Switch(switch); SwitchPosition. switch(sp, switch); SwitchPosition(sp); Route.switchPosition(route, sp); Route(route); neg find head(route, sensor); } pattern head(R, Sen) = { Route.routeDefinition(R, Sen); } switchPosition switch sensor
  • 41.
    Tooling: RDF PatternLanguage import http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl pattern posLength(Segment, SegmentLength) { Segment(Segment); Segment.Segment_length(Segment, SegmentLength); check(SegmentLength = 0); } EMF-IncQuery syntax Vocabulary railway.rdf base http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl# pattern posLength(Segment, SegmentLength) { Segment(Segment); Segment_length(Segment, SegmentLength); check(SegmentLength = 0); } segment: Segment segment.length 0 RDF-IncQuery syntax Xbase (compiles to Java) Javascript
  • 42.
    RETE Deployment Process Construct language-independent constraints Resolution of o syntactic sugar o type information Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor Variables rroouuttee sspp sswwiittcchh PPaarraammeetteerr sensor Edge: SwitchPosition.switch Edge: TrackElement.sensor Edge: Route.switchPosition Negation: head Constraints
  • 43.
    RETE Deployment Process Construct RETE structure (platform independently) Optimizations: o Model statistics o Expected usage profile Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor join join join
  • 44.
    RETE Deployment Process Architecture model (Cloud infrastructure) o Virtual Machines • Memory limits • CPU speed • Storage capacity o Communication Channels • Bandwidth Specified by a textual DSL (Xtext) Query Language Query Predicates RETE Structure 1 2 3 4 Platform Description Allocation / Mapping Deployment Descriptor
  • 45.
    RETE Deployment Process Machine Allocated Nodes 1 In1, In2, Join2 2 In3 3 In4 4 Join1, Join3 Query Language Query Predicates RETE Structure 1 2 3 4 Platform Description Allocation / Mapping Deployment Descriptor Join1 Join3 Join2 In1 In2 In3 In4
  • 46.
    RETE Deployment Process Configuration scripts for o Deployment o Communication middleware Derived by automated code generation o Using Eclipse technology: EMF-IncQuery + Xtend Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor
  • 47.
  • 48.
    The Train Benchmark Model validation workload: o User edits the model o Instant validation of well-formedness constraints o Model is repaired accordingly Scenario: o Load o Check o Edit o Re-Check Models: o Randomly generated o Close to real world instances o Following different metrics o Customized distributions o Low number of violations Queries: o Two simple queries (2 objects, attributes) o Two complex queries (4-7 joins, negation, etc.) o Validated match sets Incremental Batch validation validation Instance model Read Check ! Edit ReCheck 100x
  • 49.
    Evaluation of distributedscalability Extensions to previous work (single workstation) o Generation of large instance models o Distributed, parallel loading of models o Distributed transformation and validation Benchmark Distributed benchmark Model size 1K – 13M 1K – 88M Load method Batch Distributed, parallel Transformation and validation Single workstation Multiple servers
  • 50.
    IncremBenattcahl sgcreanpahr isoce–nIanrciQouery-D Load and first validation: load the graph to the databases and execute initialize the the Rete query net and retrieve the results Transformation: query the incrementally graph query and the delete graph some and delete elements some elements, propagate the changes Revalidation: execute retrieve the query results from the Rete net Load and first validation GraphML Transformation Revalidation DB shards Result set Rete net DB shards Result set Rete net
  • 51.
    Benchmark environment Private cloud Different DBMSs Query o The DBMS’s own query language o IncQuery-D SPARQL Gremlin
  • 52.
    4096 1024 256 64 16 4 1 Runtime [s] Load and first validation 55M model: approx. 15 minutes Rete network’s initialization overhead pays off Model size [million elements] 4store IncQuery-D Titan IncQuery-D 4store
  • 53.
    256 128 64 32 16 8 4 2 1 Runtime [s] Model modification 1. Elementary model query 2. Model modification 2 orders of magnitude – Query from the Rete network’s indexer – Propagation of modifications is fast Model size [million elements] 4store IncQuery-D Titan IncQuery-D 4store
  • 54.
    128,00 32,00 8,00 2,00 0,50 0,13 0,03 0,01 Runtime [s] Revalidation Different characteristics Sub-second response time for models with 88M elements Model size [million elements] memory limit 4store IncQuery-D Titan IncQuery-D 4store
  • 55.
    Benchmarking Conclusions Memory consumption o Single workstation: 13M model, 4 GB o Cloud of four servers: 55M model, 4×8 GB Runtime o Same order of magnitude and similar characteristics to the single workstation tool INCQUERY-D is scalable and significantly more efficient for query evaluation than the native query engines in 4store, Titan and Neo4j
  • 56.
    Applications of DistributedIncremental Queries • Query based optimistic locking • Queries for Attribute Based Access Control Collaborative Modeling (MONDO) • Events vs. Changes • Handle compound changes as events Complex Event Processing w/ compound changes • System evolves along operations • Cost / Objectives associated to • States + Environment + Trajectory Rule-based Design Space Exploration
  • 57.
    CONFIGURATION SYNTHESIS BY DESIGN SPACE EXPLORATION
  • 58.
    TRANS-IMA Project (Avionics) Goal: Allocate SW components to ARINC653 compliant IMA platform 58 Functional Architecture Component database Platform description Allocation Integrated System Model Inputs: • Platform Independent Model (PIM) (functional + nonfunc. reqs; Simulink) • Platform Description Model (PDM) for ARINC 653 (DSL) Output: • Integrated system model • Ready for simulation • End-to-end traceability
  • 59.
    Designing ARINC653 configurations (critical + non-critical) Supply fresh air Supply hot air Monitor temperature Set temperature SW functionality Pack Controller Zone Controller 3 System Display AirCond Panel 3 Redundancy requirement
  • 60.
    Job instances, Partitions,Modules SW functionality (critical + non-critical) Pack Controller Zone Controller 3 System Display AirCond Panel 3 Job instances 1 2 3 4 5 6 7 8 Partitions Modules Constraints 2 5 3 4 8 8 8 8 Memory needs + constraints Do not mix critical and non-crit. jobs Do not mix instances of the same critical job Additional constraints • WCET, • scheduling, etc. • interfaces • datatypes
  • 61.
    Allocating communication channels SW functionality Pack Controller Zone Controller 3 System Display AirCond Panel 3 1 2 3 7 4 5 6 8 Communication channels Humidity Pressure Temperature
  • 62.
    Design Space Exploration Design Space Exploration 62 Design Alternative 1 Design Alternative 2 Design Alternative 3 Design Alternative 4 Objectives Global Constraints Initial Design Solvers • CLP solvers (Choco) • model finders (Alloy) • meta-heuristics + multi-objective optimization
  • 63.
    Design Space Exploration(Example) 63 Consistency Analysis Design Alternative 1 Design Alternative 2 Design Alternative 3 Design Alternative 4 Objectives Global Constraints Initial Design A B x=2 C A A A B x=? C I1 I2
  • 64.
    Design Space Exploration(Example) 64 + Filled Attributes Consistency Analysis Design Alternative 1 Design Alternative 2 Design Alternative 3 Design Alternative 4 Objectives Global Constraints Initial Design A B x=2 C x=? A B x=5 C C C A C O C1 C2 + Objects I1 I2 + Relations
  • 65.
    Rule Based DesignSpace Exploration Design Space Exploration 65 Design Alternative 1 Design Alternative 2 Design Alternative 3 Design Alternative 4 Objectives Global Constraints Operations Initial Design Special state space exploration • potentially infinite state space • „dense” solution space
  • 66.
    Rule-Based Guided DesignSpace Exploration Design Space Exploration 66 Seq of Transf. Rules 1 Seq of Transf. Rules 2 Seq of Transf. Rules 3 Seq of Transf. Rules 4 Model queries as Objectives Model queries as Constraints Transf. Rules as Operations Initial Model Guidance for exploration: Hints • designer / end user • formal analysis
  • 67.