Incremental View Maintenance
for openCypher Queries
Gábor Szárnyas, József Marton
4th openCypher Implementers Meeting
MODEL-DRIVEN ENGINEERING
 Primarily for designing critical systems
 Models are first class citizens during development
o SysML / requirements, statecharts, etc.
o Validation and code generation techniques for correctness
Technology:
Eclipse Modeling Framework (EMF)
 Originally started at IBM as an implementation of the Object
Management Group’s (OMG) Meta Object Facility (MOF).
 i.e. an object-oriented model
 i.e. a property graph-like structure with a metamodel
MODEL VALIDATION
 Implemented with model queries
 Models are typed, attributed graphs
 Typical queries
o Get two components connected by a particular edge
MATCH (r:R)…(s:S) WHERE NOT (r)-[:E]->(s)
o Check if two objects are reachable
MATCH (r:R)…(s:S) WHERE NOT (r)-[:E1|E2*]->(s)
o Property checks
MATCH (r:R)-->(s:S) WHERE r.a = 'x' OR (s:Y)
Complex graph queries
1
switch
sensor C
sensor B
2
sensor A
segment
route
RAILWAY NETWORK MODEL
RAILWAY NETWORK MODEL
segment
segment
segmentswitch
sensor C
sensor B
sensor A
route 2
route 1
segment segment
sensor Csensor Bsensor A
route 2route 1
switch
segment
switchPosition
«diverging»
switchPosition
«straight»
RAILWAY NETWORK MODEL
:FOLLOWS
:MONITORED_BY
:TARGET:REQUIRES
MATCH (route:Route)
-[:FOLLOWS]->(swP:SwitchPosition)
-[:TARGET]->(sw:Switch)
-[:MONITORED_BY]->(sensor:Sensor)
WHERE NOT (route)-[:REQUIRES]->(sensor)
RETURN route, sensor, swP, sw
sw: Switchsensor: Sensor
route: Route swP: SwitchPosition
G. Szárnyas, B. Izsó, I. Ráth, D. Varró:
The Train Benchmark: cross-technology performance
evaluation of continuous model queries.
Software and Systems Modeling, 2017
segment segment
sensor Csensor Bsensor A
route 2route 1
switch
segment
switchPosition
«diverging»
switchPosition
«straight»
segment segment
sensor Csensor Bsensor A
route 2route 1
switch
segment
switchPosition
«diverging»
switchPosition
«straight»
segment segment
sensor Csensor Bsensor A
route 2route 1
switch
segment
switchPosition
«diverging»
switchPosition
«straight»
segment segment
sensor Csensor Bsensor A
route 2route 1
switch
segment
switchPosition
«diverging»
switchPosition
«straight»
segment segment
sensor Csensor Bsensor A
route 2route 1
switch
segment
switchPosition
«diverging»
switchPosition
«straight»
segment segment
sensor Csensor Bsensor A
route 2route 1
switch
segment
switchPosition
«diverging»
switchPosition
«straight»
segment segment
sensor Csensor Bsensor A
route 2route 1
switch
segment
switchPosition
«diverging»
switchPosition
«straight»
segment segment
sensor Csensor Bsensor A
route 2route 1
switch
segment
switchPosition
«diverging»
switchPosition
«straight»
INCREMENTAL VIEW MAINTENANCE (IVM)
In many use cases…
 queries are static
 data changes slowly
-> views can be maintained incrementally
Graph applications
 model validation
 simulation
 recommendation systems
 fraud detection
INGRAPH: IVM ON PROPERTY GRAPHS
Idea: map to relational algebra and use standard IVM techniques
 Challenging aspects
o Property graph data model
o Cypher language
 Formalise the language in relational algebra
 Use nested relational algebra -> closed on operations
Prototype tool: ingraph (OCIM1, OCIM2, GraphConnect talks)
Gábor Szárnyas, József Marton, Dániel Varró:
Formalising openCypher Graph Queries in Relational Algebra.
ADBIS 2017
INGRAPH / GRAPH TO NESTED RELATIONS
INGRAPH / NESTED RELATIONAL ALGEBRA OPS
INGRAPH
 ingraph uses a procedural IVM approach: the Rete algorithm.
o Build caches for each operator
o Maintain caches upon changes
o Supports 15+ out of 25 LDBC BI queries
o Details to be published in a conference paper
o Extensible, but very heavy on memory
 The rest of the talk focuses on the algebraic approach.
Gábor Szárnyas, József Marton et al.:
Incremental View Maintenance on Property Graphs.
arXiv preprint will be available on the 1st week of June
Delta Queries for openCypher
DELTA QUERIES AT A GLANCE
𝐺
evaluate query 𝑄
for each Δ𝐺
evaluate Δ𝑄
changes
Δ𝐺1, Δ𝐺2, …
𝑄(𝐺)
Δ𝑄(Δ𝐺1)
Δ𝑄(Δ𝐺2)
⇒ 𝑄(𝐺 + Δ𝐺1 + Δ𝐺2 + ⋯ )
𝑄 and Δ𝑄 are calculated by the same engine.
IMPLEMENTATION: TRIGGERS IN NEO4J
 Event-driven programming in databases
 Neo4j: TransactionEventHandler interface
o afterCommit(TransactionData data, T state)
o beforeCommit(TransactionData data)
o TransactionData contains Δ𝐺: createdNodes, deletedNodes, …
 Only the updated state of the graph is accessible.
 GraphAware framework: ImprovedTransactionData API
o Get properties and labels/types of deleted elements
Max de Marzi:
Triggers in Neo4j.
2015
Michal Bachman:
Neo4j Improved Transaction Event API.
2014
DERIVING DELTA QUERIES
a b
1 2
3 4
5 6
7 8
a b
1 2
5 6
7 8
𝑅 𝑚
Idea: given query 𝑄, derive delta queries Δ𝑄 and 𝛻𝑄, which
define positive and negative changes, respectively.
But: most IVM techniques are defined for relational algebra.
Notation:
 𝑅 relation
 Δ𝑅 positive changes
 𝛻𝑅 negative changes
 𝑅 𝑚
: maintained relation of 𝑅 ⇒ 𝑅 𝑚
= 𝑅 − 𝛻𝑅 + Δ𝑅
 “−” denotes set minus (∖), “+” denotes set union (∪)
a b
1 2
3 4
5 6
𝑅 𝛻𝑅
Δ𝑅
RELATIONAL ALGEBRA FOR CYPHER
 Query plans in Neo4j ≅ relational algebra + Expand/VarExpand.
 Expand is essentially a natural join.
 Natural join 𝑟 ⋈ 𝑠
 Semijoin 𝑟 ⋉ 𝑠 = 𝜋 𝑅 𝑟 ⋈ 𝑠
 Antijoin 𝑟 ഥ⋉ 𝑠 = 𝑟 ∖ 𝑟 ⋉ 𝑠
 Left outer join 𝑟 ⟕ 𝑠 ≅ 𝑟 ⋈ 𝑠 ∪ 𝑟 ഥ⋉ 𝑠 //plus nulls
Andrés Taylor:
Neo4j Cypher implementation.
First openCypher Implementers Meeting, 2017
v1 v2 v3
1 2 3
1 2 6
RELATIONAL ALGEBRA FOR CYPHER
Natural join: 𝑟 ⋈ 𝑠
MATCH (v1)-[:r]->(v2)-[:s]->(v3)
RETURN *
Semijoin: 𝑟 ⋉ 𝑠
MATCH (v1)-[:r]->(v2)
WHERE (v2)-[:s]->()
Antijoin: 𝑟 ഥ⋉ 𝑠
MATCH (v1)-[:r]->(v2)
WHERE NOT (v2)-[:s]->()
Left outer join: 𝑟 ⟕ 𝑠
MATCH (v1)-[:r]->(v2)
OPTIONAL MATCH (v2)-[:s]->(v3)
1 3
4
2
5 6
:r
:r
:s
:s
v1 v2
1 2
v1 v2
4 5
1 3
4
2
5 6
:r
:r
:s
:s
1 3
4
2
5 6
:r
:r
:s
:s
1 3
4
2
5 6
:r
:r
:s
:s
1 32
6
:r :s
:s
1 2:r
4 5
:r
1 3
4
2
5 6
:r
:r
:s
:s
v1 v2 v3
1 2 3
1 2 6
4 5 null
4 5
v1 v2 v3
1 2 3
1 2 6
T. Griffin, B. Kumar:
Algebraic Change Propagation for Semijoin and Outerjoin Queries.
SIGMOD Record 1998
DERIVING DELTA QUERIES
T. Griffin, L. Libkin:
Incremental Maintenance of Views with Duplicates.
SIGMOD 1995
T. Griffin, L. Libkin, H. Trickey: An Improved Algorithm for the
Incremental Recomputation of Active Relational Expressions.
TKDE 1997
X. Qian, G. Wiederhold:
Incremental Recomputation of Active Relational Expressions.
TKDE 1991
DELTA QUERIES
T. Griffin, L. Libkin:
Incremental Maintenance of Views with Duplicates.
SIGMOD 1995
 The seminal paper
 Δ/𝛻 delta queries
for joins, selections,
projections, etc.
 Bag semantics
DELTA QUERIES
 Semijoins, antijoins,
outer joins
 Set semantics
 Later publications,
e.g. Zhou-Larson’s
ICDE’07 paper
improved these
T. Griffin, B. Kumar:
Algebraic Change Propagation for Semijoin and Outerjoin Queries.
SIGMOD Record 1998
EXAMPLE QUERY #1
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
Relational algebra expression: 𝑎 ⋈ 𝑏 ⋈ 𝑐
Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐
= 𝑎 ⋈ 𝑏 𝑚
⋈ Δ𝑐 + Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 𝑚
= 𝑎 ⋈ 𝑏 𝑚
⋈ Δ𝑐 + 𝑎 𝑚
⋈ Δ𝑏 ⋈ 𝑐 𝑚
+ Δ𝑎 ⋈ 𝑏 𝑚
⋈ 𝑐 𝑚
= 𝑎 𝑚
⋈ 𝑏 𝑚
⋈ Δ𝑐 + 𝑎 𝑚
⋈ Δ𝑏 ⋈ 𝑐 𝑚
+ Δ𝑎 ⋈ 𝑏 𝑚
⋈ 𝑐 𝑚
Similarly to 𝛻 𝑎 ⋈ 𝑏 ⋈ 𝑐 .
v1 v2 v3
:b:a
 𝑎 𝑣1, 𝑣2
 𝑏 𝑣2, 𝑣3
 𝑐 𝑣3, 𝑣4
v4
:c
EXAMPLE QUERY #1
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐
= 𝑎 𝑚
⋈ 𝑏 𝑚
⋈ Δ𝑐 + 𝑎 𝑚
⋈ Δ𝑏 ⋈ 𝑐 𝑚
+ Δ𝑎 ⋈ 𝑏 𝑚
⋈ 𝑐 𝑚
UNWIND $pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
RETURN v1, v2, v3, v4
$pcs -> pass lists of nodes/edges as parameters
// This only works in embedded mode, see neo4j/issues/10239
v1 v2 v3
:b:a
 𝑎 𝑣1, 𝑣2
 𝑏 𝑣2, 𝑣3
 𝑐 𝑣3, 𝑣4
v4
:c
POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐
// r1 = a⋈b⋈Δc
UNWIND $pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
RETURN v1, v2, v3, v4
UNION ALL
// r2 = a⋈Δb⋈c
UNWIND $pbs AS pb
MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
UNION ALL
// r3 = Δa⋈b⋈c
UNWIND $pas AS pa
MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐
Long WITH chains are cumbersome -> patterns+list comprehensions.
WITH [pc IN $pcs | // r1 = a⋈b⋈Δc
[(v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) |
[v1, v2, v3, v4]]]
[pb IN $pbs | // r2 = a⋈Δb⋈c
[(v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4) |
[v1, v2, v3, v4]]] +
[pa IN $pas | // r3 = Δa⋈b⋈c
[(v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4) |
[v1, v2, v3, v4]]] AS r
RETURN
r[0] AS v1, r[1] AS v2, r[2] AS v3, r[3] AS v4
EXAMPLE QUERY #2
MATCH (route:Route)
-[:FOLLOWS]->(swP:SwitchPosition)
-[:TARGET]->(sw:Switch)
-[:MONITORED_BY]->(sensor:Sensor)
WHERE NOT (route)-[:REQUIRES]->(sensor)
RETURN route, sensor, swP, sw
MATCH (v1)
-[:a]->(v2)
-[:b]->(v3)
-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
RETURN v1, v2, v3, v4
v1 v2
v3v4
:b
:c
:d
:a
:FOLLOWS
:MONITORED_BY
:TARGET:REQUIRES
sw: Switchsensor: Sensor
route: Route
swP:
SwitchPosition
NEGATIVE CONDITIONS
v1 v2
v3v4
:b
:c
:d
:a
MATCH (v1)
-[:a]->(v2)
-[:b]->(v3)
-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
RETURN v1, v2, v3, v4
 𝑎 𝑣1, 𝑣2
 𝑏 𝑣2, 𝑣3
 𝑐 𝑣3, 𝑣4
 𝑑 𝑣1, 𝑣4
⇒ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑
DELTA QUERIES FOR JOINS AND ANTIJOINS
Natural join
 Δ 𝑆 ⋈ 𝑇 = Δ𝑆 ⋈ 𝑇 𝑚
+ 𝑆 𝑚
⋈ Δ𝑇
 𝛻 𝑆 ⋈ 𝑇 = 𝛻𝑆 ⋈ 𝑇 + 𝑆 ⋈ 𝛻𝑇
Antijoin
 Δ 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ 𝛻𝑇 ഥ⋉ 𝑇 𝑚
+ Δ𝑆 ഥ⋉ 𝑇 𝑚
 𝛻 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ Δ𝑇 ഥ⋉ 𝑇 + 𝛻𝑆 ഥ⋉ 𝑇
Expression 2Expression 1
Only 𝑆 𝑚 and 𝑇 𝑚
are available.
SUBEXPRESSIONS
1. Δ𝑇 ഥ⋉ 𝑇 = ?
 R1 ഥ⋉ 𝑅2, where 𝑅1 and 𝑅2 both have schema 𝑅.
 𝑅1 ഥ⋉ 𝜃 𝑅2 = 𝑅1 − 𝜋 𝑅 𝑅1 ⋈ 𝜃 𝑅2 = 𝑅1 − 𝑅1 ⋈ 𝜃 𝑅2
 If 𝜃 defines equality on all attributes of 𝑅, the theta join (⋈ 𝜃)
becomes a natural join, which is an intersection for relations
with the same schema.
 𝑅1 ⋈ 𝜃 𝑅2 = 𝑅1 ⋈ 𝑅2 = 𝑅1 ∩ 𝑅2
 𝑅1 − 𝑅1 ⋈ 𝜃 𝑅2 = 𝑅1 − 𝑅1 ∩ 𝑅2 = 𝑅1 − 𝑅2
⇒ ∗ 𝑅1 ഥ⋉ 𝜃 𝑅2 = 𝑅1 − 𝑅2
2. 𝑅 𝑚
= 𝑅 − 𝛻𝑅 + Δ𝑅 ⇒ ∗∗ 𝑅 = 𝑅 𝑚
− Δ𝑅 + 𝛻𝑅
DELTAS FOR ANTIJOINS
Based on Griffin-Kumar’s ’98 paper.
 Δ 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ 𝛻𝑇 ഥ⋉ 𝑇 𝑚
+ Δ𝑆 ഥ⋉ 𝑇 𝑚
 Δ 𝑆 ഥ⋉ 𝑇 =
∗
𝑆 − 𝛻𝑆 ⋉ 𝛻𝑇 + Δ𝑆 ഥ⋉ 𝑇 𝑚
 Δ 𝑆 ഥ⋉ 𝑇 =
∗∗
𝑆 𝑚
− Δ𝑆 ⋉ 𝛻𝑇 + Δ𝑆 ഥ⋉ 𝑇 𝑚
 Δ 𝑆 ഥ⋉ 𝑇 = 𝑆 𝑚
− Δ𝑆 ⋉ 𝛻𝑇 + Δ𝑆 ഥ⋉ 𝑇 𝑚
 𝛻 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ Δ𝑇 ഥ⋉ 𝑇 + 𝛻𝑆 ഥ⋉ 𝑇
 𝛻 𝑆 ഥ⋉ 𝑇 =
∗
𝑆 − 𝛻𝑆 ⋉ Δ𝑇 + 𝛻𝑆 ഥ⋉ 𝑇
 𝛻 𝑆 ഥ⋉ 𝑇 =
∗∗
𝑆 𝑚
− Δ𝑆 ⋉ Δ𝑇 + 𝛻𝑆 ഥ⋉ 𝑇 𝑚
− 𝛻𝑇
 𝛻 𝑆 ഥ⋉ 𝑇 = 𝑆 𝑚
− Δ𝑆 ⋉ Δ𝑇 + 𝛻𝑆 ഥ⋉ 𝑇 𝑚
− 𝛻𝑇
∗ 𝑅1 ഥ⋉ 𝜃 𝑅2 = 𝑅1 − 𝑅2
∗∗ 𝑅 = 𝑅 𝑚
− Δ𝑅 + 𝛻𝑅
NEGATIVE CONDITIONS  𝑎 𝑣1, 𝑣2
 𝑏 𝑣2, 𝑣3
 𝑐 𝑣3, 𝑣4
 𝑑 𝑣1, 𝑣4
Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑 = 𝑎 ⋈ 𝑏 ⋈ 𝑐 𝑚 − Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ⋉ 𝛻𝑑 + Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑 𝑚
𝑎 ⋈ 𝑏 ⋈ 𝑐 𝑚
⋉ 𝛻𝑑 − Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ⋉ 𝛻𝑑
Pushdown 𝛻𝑑:
𝑎 𝑚 ⋉ 𝛻𝑑 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 ⋉ 𝛻𝑑
𝑎 𝑚 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 ⋉ 𝛻𝑑 𝑎 𝑚 ⋈ 𝑏 𝑚 ⋈ Δ𝑐 + 𝑎 𝑚 ⋈ Δ𝑏 ⋈ 𝑐 𝑚 + Δ𝑎 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚
v1 v2
v3v4
:b
:c
:d
:a
NEGATIVE CONDITIONS
Δ ⋈ ⋈ ⋉
⋉ ⋈ ⋈ ⋉
⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉
⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉
v1 v2
v3v4
:b
:c
:d
:a  ,
 ,
 ,
 ,
⋉ ⋈ ⋈ ⋉
⋉ ⋈ ⋈ Δ ⋉
⋉ ⋈ Δ ⋈ ⋉
Δ ⋉ ⋈ ⋈ ⋉
⋈ ⋈ Δ ⋉
⋈ Δ ⋈ ⋉
Δ ⋈ ⋈ ⋉
NEGATIVE CONDITIONS
⋉ ∈ . , where ∩ is a
single vertex, because and represent edges.
Δ ⋈ ⋈ ⋉
⋉ ⋈ ⋈ ⋉
⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉
⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉
⋉ ∈ .
Δ ⋉ ∈ . Δ
⋉ ∈ .
Δ ⋉ ∈ . Δ
R1
R2
R3
R4
R5
R6
R7
⋉ ⋈ ⋈ ⋉
⋉ ⋈ ⋈ Δ ⋉
⋉ ⋈ Δ ⋈ ⋉
Δ ⋉ ⋈ ⋈ ⋉
⋈ ⋈ Δ ⋉
⋈ Δ ⋈ ⋉
Δ ⋈ ⋈ ⋉
S1
S2
S3
S4
v1 v2
v3v4
:b
:c
:d
:a  ,
 ,
 ,
 ,
WITH
[] AS pes, [] AS nes
WITH
[pe IN pes WHERE type(pe) = 'a'|pe] AS pas,
[pe IN pes WHERE type(pe) = 'b'|pe] AS pbs,
[pe IN pes WHERE type(pe) = 'c'|pe] AS pcs,
[pe IN pes WHERE type(pe) = 'd'|pe] AS pds,
[ne IN nes WHERE type(ne) = 'a'|ne] AS nas,
[ne IN nes WHERE type(ne) = 'b'|ne] AS nbs,
[ne IN nes WHERE type(ne) = 'c'|ne] AS ncs,
[ne IN nes WHERE type(ne) = 'd'|ne] AS nds
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
[nd IN nds | startNode(nd)] AS nd_v1s,
[nd IN nds | endNode(nd)] AS nd_v4s
// calculating s1s...s4s
// s1s: (𝑎⋉∇𝑑)
UNWIND nd_v1s AS v1
MATCH (v1)-[:a]->(v2)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s,
collect({v1: v1, v2: v2}) AS s1s
// s2s: (Δ𝑎⋉∇𝑑)
UNWIND pas AS pa
MATCH (v1)-[pa]->(v2)
WHERE v1 IN nd_v1s
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s,
s1s,
collect({v1: v1, v2: v2}) AS s2s
// s3s: (𝑐⋉∇𝑑)
UNWIND nd_v4s AS v4
MATCH (v3)-[:c]->(v4)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s,
s1s, s2s,note
collect({v3: v3, v4: v4}) AS s3s
// s4s: (Δ𝑐⋉∇𝑑)
UNWIND pcs AS pc
MATCH (v3)-[pc]->(v4)
WHERE v4 IN nd_v1s
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s,
s1s, s2s, s3s,
collect({v3: v3, v4: v4}) AS s4s
// calculating r1...r7
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s
// r1: (𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑)
UNWIND s1s AS s1
UNWIND s3s AS s3
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4
AS v4
WHERE (v2)-[:b]->(v3)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
collect([v1, v2, v3, v4]) AS r1
// r2: -(𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (Δ𝑐⋉∇𝑑)
UNWIND s1s AS s1
UNWIND s4s AS s4
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1,
s1.v1 AS v1, s1.v2 AS v2, s4.v3 AS v3, s4.v4
AS v4
WHERE (v2)-[:b]->(v3)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1,
collect([v1, v2, v3, v4]) AS r2
// r3: -(𝑎⋉∇𝑑) ⋈ Δ𝑏 ⋈ (𝑐⋉∇𝑑)
UNWIND s1s AS s1
UNWIND s3s AS s3
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2,
s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4
AS v4
MATCH (v2)-[b:b]->(v3)
WHERE b IN pbs
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2,
collect([v1, v2, v3, v4]) AS r3
// r4: -(Δ𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑)
UNWIND s2s AS s2
UNWIND s3s AS s3
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2, r3,
s2.v1 AS v1, s2.v2 AS v2, s3.v3 AS v3, s3.v4
AS v4
MATCH (v2)-[b:b]->(v3)
WHERE b IN pbs
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2, r3,
collect([v1, v2, v3, v4]) AS r4
// r5: 𝑎 ⋈ 𝑏 ⋈ Δ𝑐 ̅⋉ 𝑑
UNWIND pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:pc]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2, r3, r4,
collect([v1, v2, v3, v4]) AS r5
// r6: 𝑎 ⋈ Δ𝑏 ⋈ 𝑐 ̅⋉ 𝑑
UNWIND pbs AS pb
MATCH (v1)-[:a]->(v2)-[:pb]->(v3)-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2, r3, r4, r5,
collect([v1, v2, v3, v4]) AS r6
// r7: Δ𝑎 ⋈ 𝑏 ⋈ 𝑐 ̅⋉ 𝑑
UNWIND pas AS pa
MATCH (v1)-[:pa]->(v2)-[:b]->(v3)-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH
pas, nas, pbs, nbs, pcs, ncs, pds, nds,
nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
r1, r2, r3, r4, r5, r6,
collect([v1, v2, v3, v4]) AS r7
WITH
r1 + r5 + r6 + r7 AS rp,
r2 + r3 + r4 AS rn
RETURN
[r IN rp WHERE NOT r IN rn] AS results
POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑
POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑
 Workaround: knowing the change workload helps
o Only consider changes in Δ𝑑 and 𝛻𝑑
o Query is cleaner and much more efficient
UNWIND $nds AS nd
WITH
startNode(nd) AS v1,
endNode(nd) AS v4
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
 This can outperform recomputing the query from scratch.
LIST OPERATIONS IN CYPHER
Instead of subqueries, use chained queries and combine lists.
WITH [1, 1, 2, 2, 3] AS xs
UNWIND xs AS x
RETURN
collect(DISTINCT x) AS unique
WITH [1, 2, 3] AS xs, [2] AS ys
RETURN
xs + ys AS append,
[x IN xs WHERE NOT x IN ys] AS subtraction,
[x IN xs WHERE x IN ys] AS intersection
WITH [1, 1, 2, 2, 3] AS xs
RETURN
reduce(acc = [], x in xs |
acc + CASE x IN acc
WHEN false THEN [x]
ELSE []
END) AS unique
Get unique list in openCypher
DELTA QUERIES IN CYPHER
Delta queries are complex. Features that would be nice:
 Subqueries //pattern comprehensions go some length
 Named subqueries //help reusability
 Subtracting lists //related: CIR-2017-180
 Use collection elements or function results for matching
These are probably too much to ask.
-> recommended approach: compile directly to query plans.
MATCH (n)
WITH collect(n) AS ns
MATCH (ns[0])
RETURN *
MATCH (n)
WITH collect(n) AS ns
WITH ns[0] AS x
MATCH (x)
RETURN *

CHALLENGES FOR PROPERTY GRAPH QUERIES
Data model
 NF2 (Non-First Normal Form): maps, lists
 No schema (schema-optionality)
 Graph structure
Queries
 Nulls, antijoins and left outerjoins
 Updates on property values
 Aggregates on aggregates, non-distributive functions
 Ordering + skip/limit
 Reachability queries
CHALLENGES FOR PROPERTY GRAPH QUERIES
Data model
 NF2
 No schema
 Graph
Queries
 Nulls
 Updates
 Aggregates
 Ordering
 Reachability
A. Gupta, I. S. Mumick:
Materialized Views.
MIT Press, 1999
R. Chirkova, J. Yang:
Materialized Views.
Foundations and Trends
in Databases, 2012








Decades of research -> 2 long surveys
OUR SURVEY OF RELATED IVM TECHNIQUES
DBTOASTER
 Shows the scale of the problem
 Relational data model and SQL queries
 R&D for ~5 years @ EPFL, Johns Hopkins, Cornell, etc.
 Approach
o Queries over an algebraic ring
o Higher-order recursive IVM
 Compiler in OCaml
 Backend with code generation for C++, Scala/Spark
Christoph Koch et al.:
DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views.
VLDB Journal 2014
FUTURE DIRECTIONS
 Work out derivation rules for Expand/VarExpand, …
 Automate delta query derivation
 Integrate to Neo4j
 Run performance experiments
o Train Benchmark (set semantics)
o LDBC Social Network Benchmark’s BI workload (bag semantics)
Short news
LDBC BENCHMARKS
 Social Network Benchmark
o Business Intelligence workload published
o openCypher reference implementation
o Next goal: full conference paper
 Graphalytics
o Competition is online at graphalytics.org
o Neo4j implementation (using the Graph Algorithms library) WIP
graphalytics-platforms-neo4j/pull/6
Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton et al.:
An early look at the LDBC Social Network Benchmark’s BI Workload.
GRADES-NDA at SIGMOD, 2018
ldbc/ldbc_snb_implementations
GRAPH ANALYTICS ON THE PANAMA PAPERS
 Network science approach: multidimensional graph metrics
from social network analysis, biology, physics, etc.
 Our work originally targeted software and system models.
 Progress in 2018
o Q1: implemented adapters for Neo4j and CSV
o Q2 goal: analyse Panama papers and using metrics
Gábor Szárnyas, Zsolt Kővári, Ágnes Salánki, Dániel Varró:
Towards the Characterization of Realistic Models:
Evaluation of Multidisciplinary Graph Metrics,
MODELS 2016 ftsrg/model-analyzer
MAPPING CYPHER TO SQL
 Evaluate graph queries in an RDB - similar to ORM
 Approaches
o Cytosm: Cypher to SQL Mapper / gTop: graph topology
o GraphGen – extracting graphs from RDBs
o Ongoing work to map TCK to SQLite
B. A. Steer, A. Alnaimi, M. Lotz, F. Cuadrado, L. Vaquero, J. Varvenne:
Cytosm: Declarative Property Graph Queries Without Data Migration.
GRADES 2017
K. Xirogiannopoulos, V. Srinivas, A. Deshpande:
GraphGen: Adaptive Graph Processing using Relational Databases.
GRADES 2017
cytosm/cytosm
KonstantinosX/graphgen-project
NEO4J APOC LIBRARY
 CSV loader that follows the schema of the neo4j-import tool
 Goal
o Use headers to generate LOAD CSV commands.
o 1st pass: CALL apoc.import.csv.node(file, labels, …)
o 2nd pass: CALL apoc.import.csv.relationship(file, type, …)
 Result
o Many corner cases -> ~700 LOC + tests
o Covers most use cases, but is very slow
o APOC PR pending
neo4j-apoc-procedures/pull/581 neo4j-documentation/pull/121
SCOPING FOR OPENCYPHER
p
p
x
c
c
slizaa-opencypher-xtext/issues/7
 Xtext grammar for the Slizaa software analysis workbench
 Progress in 2018
o Q1: scope analyser implemented for Cypher grammar of M05
o Q2 goal: update to M10
Incremental View Maintenance for openCypher Queries

Incremental View Maintenance for openCypher Queries

  • 1.
    Incremental View Maintenance foropenCypher Queries Gábor Szárnyas, József Marton 4th openCypher Implementers Meeting
  • 2.
    MODEL-DRIVEN ENGINEERING  Primarilyfor designing critical systems  Models are first class citizens during development o SysML / requirements, statecharts, etc. o Validation and code generation techniques for correctness Technology: Eclipse Modeling Framework (EMF)  Originally started at IBM as an implementation of the Object Management Group’s (OMG) Meta Object Facility (MOF).  i.e. an object-oriented model  i.e. a property graph-like structure with a metamodel
  • 3.
    MODEL VALIDATION  Implementedwith model queries  Models are typed, attributed graphs  Typical queries o Get two components connected by a particular edge MATCH (r:R)…(s:S) WHERE NOT (r)-[:E]->(s) o Check if two objects are reachable MATCH (r:R)…(s:S) WHERE NOT (r)-[:E1|E2*]->(s) o Property checks MATCH (r:R)-->(s:S) WHERE r.a = 'x' OR (s:Y) Complex graph queries
  • 4.
    1 switch sensor C sensor B 2 sensorA segment route RAILWAY NETWORK MODEL
  • 5.
  • 6.
    segment segment sensor CsensorBsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight» RAILWAY NETWORK MODEL
  • 7.
    :FOLLOWS :MONITORED_BY :TARGET:REQUIRES MATCH (route:Route) -[:FOLLOWS]->(swP:SwitchPosition) -[:TARGET]->(sw:Switch) -[:MONITORED_BY]->(sensor:Sensor) WHERE NOT(route)-[:REQUIRES]->(sensor) RETURN route, sensor, swP, sw sw: Switchsensor: Sensor route: Route swP: SwitchPosition G. Szárnyas, B. Izsó, I. Ráth, D. Varró: The Train Benchmark: cross-technology performance evaluation of continuous model queries. Software and Systems Modeling, 2017
  • 8.
    segment segment sensor CsensorBsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 9.
    segment segment sensor CsensorBsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 10.
    segment segment sensor CsensorBsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 11.
    segment segment sensor CsensorBsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 12.
    segment segment sensor CsensorBsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 13.
    segment segment sensor CsensorBsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 14.
    segment segment sensor CsensorBsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 15.
    segment segment sensor CsensorBsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  • 16.
    INCREMENTAL VIEW MAINTENANCE(IVM) In many use cases…  queries are static  data changes slowly -> views can be maintained incrementally Graph applications  model validation  simulation  recommendation systems  fraud detection
  • 17.
    INGRAPH: IVM ONPROPERTY GRAPHS Idea: map to relational algebra and use standard IVM techniques  Challenging aspects o Property graph data model o Cypher language  Formalise the language in relational algebra  Use nested relational algebra -> closed on operations Prototype tool: ingraph (OCIM1, OCIM2, GraphConnect talks) Gábor Szárnyas, József Marton, Dániel Varró: Formalising openCypher Graph Queries in Relational Algebra. ADBIS 2017
  • 18.
    INGRAPH / GRAPHTO NESTED RELATIONS
  • 19.
    INGRAPH / NESTEDRELATIONAL ALGEBRA OPS
  • 20.
    INGRAPH  ingraph usesa procedural IVM approach: the Rete algorithm. o Build caches for each operator o Maintain caches upon changes o Supports 15+ out of 25 LDBC BI queries o Details to be published in a conference paper o Extensible, but very heavy on memory  The rest of the talk focuses on the algebraic approach. Gábor Szárnyas, József Marton et al.: Incremental View Maintenance on Property Graphs. arXiv preprint will be available on the 1st week of June
  • 21.
  • 22.
    DELTA QUERIES ATA GLANCE 𝐺 evaluate query 𝑄 for each Δ𝐺 evaluate Δ𝑄 changes Δ𝐺1, Δ𝐺2, … 𝑄(𝐺) Δ𝑄(Δ𝐺1) Δ𝑄(Δ𝐺2) ⇒ 𝑄(𝐺 + Δ𝐺1 + Δ𝐺2 + ⋯ ) 𝑄 and Δ𝑄 are calculated by the same engine.
  • 23.
    IMPLEMENTATION: TRIGGERS INNEO4J  Event-driven programming in databases  Neo4j: TransactionEventHandler interface o afterCommit(TransactionData data, T state) o beforeCommit(TransactionData data) o TransactionData contains Δ𝐺: createdNodes, deletedNodes, …  Only the updated state of the graph is accessible.  GraphAware framework: ImprovedTransactionData API o Get properties and labels/types of deleted elements Max de Marzi: Triggers in Neo4j. 2015 Michal Bachman: Neo4j Improved Transaction Event API. 2014
  • 24.
    DERIVING DELTA QUERIES ab 1 2 3 4 5 6 7 8 a b 1 2 5 6 7 8 𝑅 𝑚 Idea: given query 𝑄, derive delta queries Δ𝑄 and 𝛻𝑄, which define positive and negative changes, respectively. But: most IVM techniques are defined for relational algebra. Notation:  𝑅 relation  Δ𝑅 positive changes  𝛻𝑅 negative changes  𝑅 𝑚 : maintained relation of 𝑅 ⇒ 𝑅 𝑚 = 𝑅 − 𝛻𝑅 + Δ𝑅  “−” denotes set minus (∖), “+” denotes set union (∪) a b 1 2 3 4 5 6 𝑅 𝛻𝑅 Δ𝑅
  • 25.
    RELATIONAL ALGEBRA FORCYPHER  Query plans in Neo4j ≅ relational algebra + Expand/VarExpand.  Expand is essentially a natural join.  Natural join 𝑟 ⋈ 𝑠  Semijoin 𝑟 ⋉ 𝑠 = 𝜋 𝑅 𝑟 ⋈ 𝑠  Antijoin 𝑟 ഥ⋉ 𝑠 = 𝑟 ∖ 𝑟 ⋉ 𝑠  Left outer join 𝑟 ⟕ 𝑠 ≅ 𝑟 ⋈ 𝑠 ∪ 𝑟 ഥ⋉ 𝑠 //plus nulls Andrés Taylor: Neo4j Cypher implementation. First openCypher Implementers Meeting, 2017
  • 26.
    v1 v2 v3 12 3 1 2 6 RELATIONAL ALGEBRA FOR CYPHER Natural join: 𝑟 ⋈ 𝑠 MATCH (v1)-[:r]->(v2)-[:s]->(v3) RETURN * Semijoin: 𝑟 ⋉ 𝑠 MATCH (v1)-[:r]->(v2) WHERE (v2)-[:s]->() Antijoin: 𝑟 ഥ⋉ 𝑠 MATCH (v1)-[:r]->(v2) WHERE NOT (v2)-[:s]->() Left outer join: 𝑟 ⟕ 𝑠 MATCH (v1)-[:r]->(v2) OPTIONAL MATCH (v2)-[:s]->(v3) 1 3 4 2 5 6 :r :r :s :s v1 v2 1 2 v1 v2 4 5 1 3 4 2 5 6 :r :r :s :s 1 3 4 2 5 6 :r :r :s :s 1 3 4 2 5 6 :r :r :s :s 1 32 6 :r :s :s 1 2:r 4 5 :r 1 3 4 2 5 6 :r :r :s :s v1 v2 v3 1 2 3 1 2 6 4 5 null 4 5 v1 v2 v3 1 2 3 1 2 6
  • 27.
    T. Griffin, B.Kumar: Algebraic Change Propagation for Semijoin and Outerjoin Queries. SIGMOD Record 1998 DERIVING DELTA QUERIES T. Griffin, L. Libkin: Incremental Maintenance of Views with Duplicates. SIGMOD 1995 T. Griffin, L. Libkin, H. Trickey: An Improved Algorithm for the Incremental Recomputation of Active Relational Expressions. TKDE 1997 X. Qian, G. Wiederhold: Incremental Recomputation of Active Relational Expressions. TKDE 1991
  • 28.
    DELTA QUERIES T. Griffin,L. Libkin: Incremental Maintenance of Views with Duplicates. SIGMOD 1995  The seminal paper  Δ/𝛻 delta queries for joins, selections, projections, etc.  Bag semantics
  • 29.
    DELTA QUERIES  Semijoins,antijoins, outer joins  Set semantics  Later publications, e.g. Zhou-Larson’s ICDE’07 paper improved these T. Griffin, B. Kumar: Algebraic Change Propagation for Semijoin and Outerjoin Queries. SIGMOD Record 1998
  • 30.
    EXAMPLE QUERY #1 MATCH(v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4) RETURN v1, v2, v3, v4 Relational algebra expression: 𝑎 ⋈ 𝑏 ⋈ 𝑐 Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 = 𝑎 ⋈ 𝑏 𝑚 ⋈ Δ𝑐 + Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 𝑚 = 𝑎 ⋈ 𝑏 𝑚 ⋈ Δ𝑐 + 𝑎 𝑚 ⋈ Δ𝑏 ⋈ 𝑐 𝑚 + Δ𝑎 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 = 𝑎 𝑚 ⋈ 𝑏 𝑚 ⋈ Δ𝑐 + 𝑎 𝑚 ⋈ Δ𝑏 ⋈ 𝑐 𝑚 + Δ𝑎 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 Similarly to 𝛻 𝑎 ⋈ 𝑏 ⋈ 𝑐 . v1 v2 v3 :b:a  𝑎 𝑣1, 𝑣2  𝑏 𝑣2, 𝑣3  𝑐 𝑣3, 𝑣4 v4 :c
  • 31.
    EXAMPLE QUERY #1 MATCH(v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4) RETURN v1, v2, v3, v4 Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 = 𝑎 𝑚 ⋈ 𝑏 𝑚 ⋈ Δ𝑐 + 𝑎 𝑚 ⋈ Δ𝑏 ⋈ 𝑐 𝑚 + Δ𝑎 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 UNWIND $pcs AS pc MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) RETURN v1, v2, v3, v4 $pcs -> pass lists of nodes/edges as parameters // This only works in embedded mode, see neo4j/issues/10239 v1 v2 v3 :b:a  𝑎 𝑣1, 𝑣2  𝑏 𝑣2, 𝑣3  𝑐 𝑣3, 𝑣4 v4 :c
  • 32.
    POSITIVE DELTA QUERYFOR 𝑎 ⋈ 𝑏 ⋈ 𝑐 // r1 = a⋈b⋈Δc UNWIND $pcs AS pc MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) RETURN v1, v2, v3, v4 UNION ALL // r2 = a⋈Δb⋈c UNWIND $pbs AS pb MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4) RETURN v1, v2, v3, v4 UNION ALL // r3 = Δa⋈b⋈c UNWIND $pas AS pa MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4) RETURN v1, v2, v3, v4
  • 33.
    POSITIVE DELTA QUERYFOR 𝑎 ⋈ 𝑏 ⋈ 𝑐 Long WITH chains are cumbersome -> patterns+list comprehensions. WITH [pc IN $pcs | // r1 = a⋈b⋈Δc [(v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) | [v1, v2, v3, v4]]] [pb IN $pbs | // r2 = a⋈Δb⋈c [(v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4) | [v1, v2, v3, v4]]] + [pa IN $pas | // r3 = Δa⋈b⋈c [(v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4) | [v1, v2, v3, v4]]] AS r RETURN r[0] AS v1, r[1] AS v2, r[2] AS v3, r[3] AS v4
  • 34.
    EXAMPLE QUERY #2 MATCH(route:Route) -[:FOLLOWS]->(swP:SwitchPosition) -[:TARGET]->(sw:Switch) -[:MONITORED_BY]->(sensor:Sensor) WHERE NOT (route)-[:REQUIRES]->(sensor) RETURN route, sensor, swP, sw MATCH (v1) -[:a]->(v2) -[:b]->(v3) -[:c]->(v4) WHERE NOT (v1)-[:d]->(v4) RETURN v1, v2, v3, v4 v1 v2 v3v4 :b :c :d :a :FOLLOWS :MONITORED_BY :TARGET:REQUIRES sw: Switchsensor: Sensor route: Route swP: SwitchPosition
  • 35.
    NEGATIVE CONDITIONS v1 v2 v3v4 :b :c :d :a MATCH(v1) -[:a]->(v2) -[:b]->(v3) -[:c]->(v4) WHERE NOT (v1)-[:d]->(v4) RETURN v1, v2, v3, v4  𝑎 𝑣1, 𝑣2  𝑏 𝑣2, 𝑣3  𝑐 𝑣3, 𝑣4  𝑑 𝑣1, 𝑣4 ⇒ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑
  • 36.
    DELTA QUERIES FORJOINS AND ANTIJOINS Natural join  Δ 𝑆 ⋈ 𝑇 = Δ𝑆 ⋈ 𝑇 𝑚 + 𝑆 𝑚 ⋈ Δ𝑇  𝛻 𝑆 ⋈ 𝑇 = 𝛻𝑆 ⋈ 𝑇 + 𝑆 ⋈ 𝛻𝑇 Antijoin  Δ 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ 𝛻𝑇 ഥ⋉ 𝑇 𝑚 + Δ𝑆 ഥ⋉ 𝑇 𝑚  𝛻 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ Δ𝑇 ഥ⋉ 𝑇 + 𝛻𝑆 ഥ⋉ 𝑇 Expression 2Expression 1 Only 𝑆 𝑚 and 𝑇 𝑚 are available.
  • 37.
    SUBEXPRESSIONS 1. Δ𝑇 ഥ⋉𝑇 = ?  R1 ഥ⋉ 𝑅2, where 𝑅1 and 𝑅2 both have schema 𝑅.  𝑅1 ഥ⋉ 𝜃 𝑅2 = 𝑅1 − 𝜋 𝑅 𝑅1 ⋈ 𝜃 𝑅2 = 𝑅1 − 𝑅1 ⋈ 𝜃 𝑅2  If 𝜃 defines equality on all attributes of 𝑅, the theta join (⋈ 𝜃) becomes a natural join, which is an intersection for relations with the same schema.  𝑅1 ⋈ 𝜃 𝑅2 = 𝑅1 ⋈ 𝑅2 = 𝑅1 ∩ 𝑅2  𝑅1 − 𝑅1 ⋈ 𝜃 𝑅2 = 𝑅1 − 𝑅1 ∩ 𝑅2 = 𝑅1 − 𝑅2 ⇒ ∗ 𝑅1 ഥ⋉ 𝜃 𝑅2 = 𝑅1 − 𝑅2 2. 𝑅 𝑚 = 𝑅 − 𝛻𝑅 + Δ𝑅 ⇒ ∗∗ 𝑅 = 𝑅 𝑚 − Δ𝑅 + 𝛻𝑅
  • 38.
    DELTAS FOR ANTIJOINS Basedon Griffin-Kumar’s ’98 paper.  Δ 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ 𝛻𝑇 ഥ⋉ 𝑇 𝑚 + Δ𝑆 ഥ⋉ 𝑇 𝑚  Δ 𝑆 ഥ⋉ 𝑇 = ∗ 𝑆 − 𝛻𝑆 ⋉ 𝛻𝑇 + Δ𝑆 ഥ⋉ 𝑇 𝑚  Δ 𝑆 ഥ⋉ 𝑇 = ∗∗ 𝑆 𝑚 − Δ𝑆 ⋉ 𝛻𝑇 + Δ𝑆 ഥ⋉ 𝑇 𝑚  Δ 𝑆 ഥ⋉ 𝑇 = 𝑆 𝑚 − Δ𝑆 ⋉ 𝛻𝑇 + Δ𝑆 ഥ⋉ 𝑇 𝑚  𝛻 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ Δ𝑇 ഥ⋉ 𝑇 + 𝛻𝑆 ഥ⋉ 𝑇  𝛻 𝑆 ഥ⋉ 𝑇 = ∗ 𝑆 − 𝛻𝑆 ⋉ Δ𝑇 + 𝛻𝑆 ഥ⋉ 𝑇  𝛻 𝑆 ഥ⋉ 𝑇 = ∗∗ 𝑆 𝑚 − Δ𝑆 ⋉ Δ𝑇 + 𝛻𝑆 ഥ⋉ 𝑇 𝑚 − 𝛻𝑇  𝛻 𝑆 ഥ⋉ 𝑇 = 𝑆 𝑚 − Δ𝑆 ⋉ Δ𝑇 + 𝛻𝑆 ഥ⋉ 𝑇 𝑚 − 𝛻𝑇 ∗ 𝑅1 ഥ⋉ 𝜃 𝑅2 = 𝑅1 − 𝑅2 ∗∗ 𝑅 = 𝑅 𝑚 − Δ𝑅 + 𝛻𝑅
  • 39.
    NEGATIVE CONDITIONS 𝑎 𝑣1, 𝑣2  𝑏 𝑣2, 𝑣3  𝑐 𝑣3, 𝑣4  𝑑 𝑣1, 𝑣4 Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑 = 𝑎 ⋈ 𝑏 ⋈ 𝑐 𝑚 − Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ⋉ 𝛻𝑑 + Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑 𝑚 𝑎 ⋈ 𝑏 ⋈ 𝑐 𝑚 ⋉ 𝛻𝑑 − Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ⋉ 𝛻𝑑 Pushdown 𝛻𝑑: 𝑎 𝑚 ⋉ 𝛻𝑑 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 ⋉ 𝛻𝑑 𝑎 𝑚 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 ⋉ 𝛻𝑑 𝑎 𝑚 ⋈ 𝑏 𝑚 ⋈ Δ𝑐 + 𝑎 𝑚 ⋈ Δ𝑏 ⋈ 𝑐 𝑚 + Δ𝑎 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 v1 v2 v3v4 :b :c :d :a
  • 40.
    NEGATIVE CONDITIONS Δ ⋈⋈ ⋉ ⋉ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉ v1 v2 v3v4 :b :c :d :a  ,  ,  ,  , ⋉ ⋈ ⋈ ⋉ ⋉ ⋈ ⋈ Δ ⋉ ⋉ ⋈ Δ ⋈ ⋉ Δ ⋉ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋉ ⋈ Δ ⋈ ⋉ Δ ⋈ ⋈ ⋉
  • 41.
    NEGATIVE CONDITIONS ⋉ ∈. , where ∩ is a single vertex, because and represent edges. Δ ⋈ ⋈ ⋉ ⋉ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉ ⋉ ∈ . Δ ⋉ ∈ . Δ ⋉ ∈ . Δ ⋉ ∈ . Δ R1 R2 R3 R4 R5 R6 R7 ⋉ ⋈ ⋈ ⋉ ⋉ ⋈ ⋈ Δ ⋉ ⋉ ⋈ Δ ⋈ ⋉ Δ ⋉ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋉ ⋈ Δ ⋈ ⋉ Δ ⋈ ⋈ ⋉ S1 S2 S3 S4 v1 v2 v3v4 :b :c :d :a  ,  ,  ,  ,
  • 42.
    WITH [] AS pes,[] AS nes WITH [pe IN pes WHERE type(pe) = 'a'|pe] AS pas, [pe IN pes WHERE type(pe) = 'b'|pe] AS pbs, [pe IN pes WHERE type(pe) = 'c'|pe] AS pcs, [pe IN pes WHERE type(pe) = 'd'|pe] AS pds, [ne IN nes WHERE type(ne) = 'a'|ne] AS nas, [ne IN nes WHERE type(ne) = 'b'|ne] AS nbs, [ne IN nes WHERE type(ne) = 'c'|ne] AS ncs, [ne IN nes WHERE type(ne) = 'd'|ne] AS nds WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, [nd IN nds | startNode(nd)] AS nd_v1s, [nd IN nds | endNode(nd)] AS nd_v4s // calculating s1s...s4s // s1s: (𝑎⋉∇𝑑) UNWIND nd_v1s AS v1 MATCH (v1)-[:a]->(v2) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, collect({v1: v1, v2: v2}) AS s1s // s2s: (Δ𝑎⋉∇𝑑) UNWIND pas AS pa MATCH (v1)-[pa]->(v2) WHERE v1 IN nd_v1s WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, collect({v1: v1, v2: v2}) AS s2s // s3s: (𝑐⋉∇𝑑) UNWIND nd_v4s AS v4 MATCH (v3)-[:c]->(v4) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s,note collect({v3: v3, v4: v4}) AS s3s // s4s: (Δ𝑐⋉∇𝑑) UNWIND pcs AS pc MATCH (v3)-[pc]->(v4) WHERE v4 IN nd_v1s WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, collect({v3: v3, v4: v4}) AS s4s // calculating r1...r7 WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s // r1: (𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑) UNWIND s1s AS s1 UNWIND s3s AS s3 WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4 WHERE (v2)-[:b]->(v3) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, collect([v1, v2, v3, v4]) AS r1 // r2: -(𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (Δ𝑐⋉∇𝑑) UNWIND s1s AS s1 UNWIND s4s AS s4 WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, s1.v1 AS v1, s1.v2 AS v2, s4.v3 AS v3, s4.v4 AS v4 WHERE (v2)-[:b]->(v3) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, collect([v1, v2, v3, v4]) AS r2 // r3: -(𝑎⋉∇𝑑) ⋈ Δ𝑏 ⋈ (𝑐⋉∇𝑑) UNWIND s1s AS s1 UNWIND s3s AS s3 WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4 MATCH (v2)-[b:b]->(v3) WHERE b IN pbs WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, collect([v1, v2, v3, v4]) AS r3 // r4: -(Δ𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑) UNWIND s2s AS s2 UNWIND s3s AS s3 WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, s2.v1 AS v1, s2.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4 MATCH (v2)-[b:b]->(v3) WHERE b IN pbs WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, collect([v1, v2, v3, v4]) AS r4 // r5: 𝑎 ⋈ 𝑏 ⋈ Δ𝑐 ̅⋉ 𝑑 UNWIND pcs AS pc MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:pc]->(v4) WHERE NOT (v1)-[:d]->(v4) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4, collect([v1, v2, v3, v4]) AS r5 // r6: 𝑎 ⋈ Δ𝑏 ⋈ 𝑐 ̅⋉ 𝑑 UNWIND pbs AS pb MATCH (v1)-[:a]->(v2)-[:pb]->(v3)-[:c]->(v4) WHERE NOT (v1)-[:d]->(v4) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4, r5, collect([v1, v2, v3, v4]) AS r6 // r7: Δ𝑎 ⋈ 𝑏 ⋈ 𝑐 ̅⋉ 𝑑 UNWIND pas AS pa MATCH (v1)-[:pa]->(v2)-[:b]->(v3)-[:c]->(v4) WHERE NOT (v1)-[:d]->(v4) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4, r5, r6, collect([v1, v2, v3, v4]) AS r7 WITH r1 + r5 + r6 + r7 AS rp, r2 + r3 + r4 AS rn RETURN [r IN rp WHERE NOT r IN rn] AS results POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑
  • 43.
    POSITIVE DELTA QUERYFOR 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑  Workaround: knowing the change workload helps o Only consider changes in Δ𝑑 and 𝛻𝑑 o Query is cleaner and much more efficient UNWIND $nds AS nd WITH startNode(nd) AS v1, endNode(nd) AS v4 MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4) RETURN v1, v2, v3, v4  This can outperform recomputing the query from scratch.
  • 44.
    LIST OPERATIONS INCYPHER Instead of subqueries, use chained queries and combine lists. WITH [1, 1, 2, 2, 3] AS xs UNWIND xs AS x RETURN collect(DISTINCT x) AS unique WITH [1, 2, 3] AS xs, [2] AS ys RETURN xs + ys AS append, [x IN xs WHERE NOT x IN ys] AS subtraction, [x IN xs WHERE x IN ys] AS intersection WITH [1, 1, 2, 2, 3] AS xs RETURN reduce(acc = [], x in xs | acc + CASE x IN acc WHEN false THEN [x] ELSE [] END) AS unique Get unique list in openCypher
  • 45.
    DELTA QUERIES INCYPHER Delta queries are complex. Features that would be nice:  Subqueries //pattern comprehensions go some length  Named subqueries //help reusability  Subtracting lists //related: CIR-2017-180  Use collection elements or function results for matching These are probably too much to ask. -> recommended approach: compile directly to query plans. MATCH (n) WITH collect(n) AS ns MATCH (ns[0]) RETURN * MATCH (n) WITH collect(n) AS ns WITH ns[0] AS x MATCH (x) RETURN * 
  • 46.
    CHALLENGES FOR PROPERTYGRAPH QUERIES Data model  NF2 (Non-First Normal Form): maps, lists  No schema (schema-optionality)  Graph structure Queries  Nulls, antijoins and left outerjoins  Updates on property values  Aggregates on aggregates, non-distributive functions  Ordering + skip/limit  Reachability queries
  • 47.
    CHALLENGES FOR PROPERTYGRAPH QUERIES Data model  NF2  No schema  Graph Queries  Nulls  Updates  Aggregates  Ordering  Reachability A. Gupta, I. S. Mumick: Materialized Views. MIT Press, 1999 R. Chirkova, J. Yang: Materialized Views. Foundations and Trends in Databases, 2012         Decades of research -> 2 long surveys
  • 48.
    OUR SURVEY OFRELATED IVM TECHNIQUES
  • 49.
    DBTOASTER  Shows thescale of the problem  Relational data model and SQL queries  R&D for ~5 years @ EPFL, Johns Hopkins, Cornell, etc.  Approach o Queries over an algebraic ring o Higher-order recursive IVM  Compiler in OCaml  Backend with code generation for C++, Scala/Spark Christoph Koch et al.: DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views. VLDB Journal 2014
  • 50.
    FUTURE DIRECTIONS  Workout derivation rules for Expand/VarExpand, …  Automate delta query derivation  Integrate to Neo4j  Run performance experiments o Train Benchmark (set semantics) o LDBC Social Network Benchmark’s BI workload (bag semantics)
  • 51.
  • 52.
    LDBC BENCHMARKS  SocialNetwork Benchmark o Business Intelligence workload published o openCypher reference implementation o Next goal: full conference paper  Graphalytics o Competition is online at graphalytics.org o Neo4j implementation (using the Graph Algorithms library) WIP graphalytics-platforms-neo4j/pull/6 Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton et al.: An early look at the LDBC Social Network Benchmark’s BI Workload. GRADES-NDA at SIGMOD, 2018 ldbc/ldbc_snb_implementations
  • 53.
    GRAPH ANALYTICS ONTHE PANAMA PAPERS  Network science approach: multidimensional graph metrics from social network analysis, biology, physics, etc.  Our work originally targeted software and system models.  Progress in 2018 o Q1: implemented adapters for Neo4j and CSV o Q2 goal: analyse Panama papers and using metrics Gábor Szárnyas, Zsolt Kővári, Ágnes Salánki, Dániel Varró: Towards the Characterization of Realistic Models: Evaluation of Multidisciplinary Graph Metrics, MODELS 2016 ftsrg/model-analyzer
  • 54.
    MAPPING CYPHER TOSQL  Evaluate graph queries in an RDB - similar to ORM  Approaches o Cytosm: Cypher to SQL Mapper / gTop: graph topology o GraphGen – extracting graphs from RDBs o Ongoing work to map TCK to SQLite B. A. Steer, A. Alnaimi, M. Lotz, F. Cuadrado, L. Vaquero, J. Varvenne: Cytosm: Declarative Property Graph Queries Without Data Migration. GRADES 2017 K. Xirogiannopoulos, V. Srinivas, A. Deshpande: GraphGen: Adaptive Graph Processing using Relational Databases. GRADES 2017 cytosm/cytosm KonstantinosX/graphgen-project
  • 55.
    NEO4J APOC LIBRARY CSV loader that follows the schema of the neo4j-import tool  Goal o Use headers to generate LOAD CSV commands. o 1st pass: CALL apoc.import.csv.node(file, labels, …) o 2nd pass: CALL apoc.import.csv.relationship(file, type, …)  Result o Many corner cases -> ~700 LOC + tests o Covers most use cases, but is very slow o APOC PR pending neo4j-apoc-procedures/pull/581 neo4j-documentation/pull/121
  • 56.
    SCOPING FOR OPENCYPHER p p x c c slizaa-opencypher-xtext/issues/7 Xtext grammar for the Slizaa software analysis workbench  Progress in 2018 o Q1: scope analyser implemented for Cypher grammar of M05 o Q2 goal: update to M10