Incremental View Maintenance for openCypher Queries

Gábor Szárnyas, József Marton
4th openCypher Implementers Meeting

MODEL-DRIVEN ENGINEERING
• Primarily for designing critical systems
• Models are first-class citizens during development
  ◦ SysML / requirements, statecharts, etc.
  ◦ Validation and code generation techniques for correctness
Technology: Eclipse Modeling Framework (EMF)
• Originally started at IBM as an implementation of the Object Management Group's (OMG) Meta Object Facility (MOF)
• i.e. an object-oriented model
• i.e. a property graph-like structure with a metamodel

MODEL VALIDATION
• Implemented with model queries
• Models are typed, attributed graphs
• Typical queries
  ◦ Get two components connected by a particular edge
    MATCH (r:R)…(s:S) WHERE NOT (r)-[:E]->(s)
  ◦ Check if two objects are reachable
    MATCH (r:R)…(s:S) WHERE NOT (r)-[:E1|E2*]->(s)
  ◦ Property checks
    MATCH (r:R)-->(s:S) WHERE r.a = 'x' OR (s:Y)
Complex graph queries

RAILWAY NETWORK MODEL
[Figure: a railway network built up over three slides. Segments and a switch are monitored by sensors A, B and C; route 1 and route 2 follow the switch through a switchPosition «straight» and a switchPosition «diverging».]

[Figure: graph pattern over route: Route, swP: SwitchPosition, sw: Switch and sensor: Sensor, with edges :FOLLOWS, :TARGET, :MONITORED_BY and :REQUIRES]
MATCH (route:Route)
  -[:FOLLOWS]->(swP:SwitchPosition)
  -[:TARGET]->(sw:Switch)
  -[:MONITORED_BY]->(sensor:Sensor)
WHERE NOT (route)-[:REQUIRES]->(sensor)
RETURN route, sensor, swP, sw
G. Szárnyas, B. Izsó, I. Ráth, D. Varró:
The Train Benchmark: cross-technology performance
evaluation of continuous model queries.
Software and Systems Modeling, 2017


INCREMENTAL VIEW MAINTENANCE (IVM)
In many use cases…
• queries are static
• data changes slowly
-> views can be maintained incrementally
Graph applications
• model validation
• simulation
• recommendation systems
• fraud detection

INGRAPH: IVM ON PROPERTY GRAPHS
Idea: map to relational algebra and use standard IVM techniques
• Challenging aspects
  ◦ Property graph data model
  ◦ Cypher language
• Formalise the language in relational algebra
• Use nested relational algebra -> closed under its operations
Prototype tool: ingraph (OCIM1, OCIM2, GraphConnect talks)
Gábor Szárnyas, József Marton, Dániel Varró:
Formalising openCypher Graph Queries in Relational Algebra.
ADBIS 2017

INGRAPH / GRAPH TO NESTED RELATIONS

INGRAPH / NESTED RELATIONAL ALGEBRA OPS

INGRAPH
• ingraph uses a procedural IVM approach: the Rete algorithm.
  ◦ Build caches for each operator
  ◦ Maintain caches upon changes
  ◦ Supports 15+ out of 25 LDBC BI queries
  ◦ Details to be published in a conference paper
  ◦ Extensible, but very heavy on memory
• The rest of the talk focuses on the algebraic approach.
Gábor Szárnyas, József Marton et al.:
Incremental View Maintenance on Property Graphs.
arXiv preprint will be available in the first week of June

DELTA QUERIES AT A GLANCE
• Initial graph 𝐺: evaluate query 𝑄 to obtain 𝑄(𝐺)
• Changes Δ𝐺1, Δ𝐺2, …: for each Δ𝐺, evaluate Δ𝑄 to obtain Δ𝑄(Δ𝐺1), Δ𝑄(Δ𝐺2), …
⇒ 𝑄(𝐺), Δ𝑄(Δ𝐺1), Δ𝑄(Δ𝐺2), … together give 𝑄(𝐺 + Δ𝐺1 + Δ𝐺2 + ⋯)
𝑄 and Δ𝑄 are calculated by the same engine.
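The loop above can be sketched on a toy model. The following Python snippet is only an illustration under stated assumptions: the graph is a plain set of directed edges, the view is "all two-hop paths", only insertions occur, and the names `q` and `delta_q` are hypothetical and not part of any tool mentioned in the talk.

```python
# Toy model: the graph is a set of directed edges (source, target) and
# the view Q(G) is the set of two-hop paths (x, y, z).

def q(edges):
    """Full evaluation of Q: every two-hop path over the given edges."""
    return {(x, y, z) for (x, y) in edges for (y2, z) in edges if y == y2}

def delta_q(edges_after, added):
    """Positive delta: two-hop paths that use at least one added edge.

    edges_after is the graph state after the insertion was applied.
    """
    first_is_new = {(x, y, z)
                    for (x, y) in added
                    for (y2, z) in edges_after if y == y2}
    second_is_new = {(x, y, z)
                     for (x, y) in edges_after
                     for (y2, z) in added if y == y2}
    return first_is_new | second_is_new

graph = {(1, 2), (2, 3)}
view = q(graph)                        # evaluate Q once on G
for change in [{(3, 4)}, {(0, 1), (4, 5)}]:
    graph |= change                    # apply the change set
    view |= delta_q(graph, change)     # evaluate only the delta query

# The incrementally maintained view equals a full recomputation.
assert view == q(graph)
```

Deletions would need a matching negative delta query; they are left out to keep the sketch short.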

IMPLEMENTATION: TRIGGERS IN NEO4J
• Event-driven programming in databases
• Neo4j: TransactionEventHandler interface
  ◦ afterCommit(TransactionData data, T state)
  ◦ beforeCommit(TransactionData data)
  ◦ TransactionData contains Δ𝐺: createdNodes, deletedNodes, …
• Only the updated state of the graph is accessible.
• GraphAware framework: ImprovedTransactionData API
  ◦ Get properties and labels/types of deleted elements
Max de Marzi:
Triggers in Neo4j.
2015
Michal Bachman:
Neo4j Improved Transaction Event API.
2014

DERIVING DELTA QUERIES
Idea: given query 𝑄, derive delta queries Δ𝑄 and 𝛻𝑄, which
define positive and negative changes, respectively.
But: most IVM techniques are defined for relational algebra.
Notation:
• 𝑅: relation
• Δ𝑅: positive changes
• 𝛻𝑅: negative changes
• 𝑅ᵐ: maintained relation of 𝑅 ⇒ 𝑅ᵐ = 𝑅 − 𝛻𝑅 + Δ𝑅
• “−” denotes set minus (∖), “+” denotes set union (∪)
Example:
𝑅        𝛻𝑅       Δ𝑅       𝑅ᵐ
a b      a b      a b      a b
1 2      3 4      7 8      1 2
3 4                        5 6
5 6                        7 8
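The notation can be checked on the small relation from this slide. The snippet below is a minimal sketch that models relations as Python sets of tuples; the variable names are illustrative only.

```python
# Relations as sets of (a, b) tuples, following the slide's example.
R = {(1, 2), (3, 4), (5, 6)}
nabla_R = {(3, 4)}   # negative changes: tuples removed from R
delta_R = {(7, 8)}   # positive changes: tuples added to R

# Maintained relation: "-" is set minus, "+" is set union.
R_m = (R - nabla_R) | delta_R
```

With these inputs, `R_m` holds the rows (1, 2), (5, 6) and (7, 8).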

RELATIONAL ALGEBRA FOR CYPHER
• Query plans in Neo4j ≅ relational algebra + Expand/VarExpand.
• Expand is essentially a natural join.
• Natural join: 𝑟 ⋈ 𝑠
• Semijoin: 𝑟 ⋉ 𝑠 = 𝜋_𝑅(𝑟 ⋈ 𝑠)
• Antijoin: 𝑟 ⋉̅ 𝑠 = 𝑟 ∖ (𝑟 ⋉ 𝑠)
• Left outer join: 𝑟 ⟕ 𝑠 ≅ (𝑟 ⋈ 𝑠) ∪ (𝑟 ⋉̅ 𝑠) // plus nulls
Andrés Taylor:
Neo4j Cypher implementation.
First openCypher Implementers Meeting, 2017

RELATIONAL ALGEBRA FOR CYPHER
Example graph: (1)-[:r]->(2), (4)-[:r]->(5), (2)-[:s]->(3), (2)-[:s]->(6)
Natural join: 𝑟 ⋈ 𝑠
MATCH (v1)-[:r]->(v2)-[:s]->(v3)
RETURN *
Result (v1, v2, v3): (1, 2, 3), (1, 2, 6)
Semijoin: 𝑟 ⋉ 𝑠
MATCH (v1)-[:r]->(v2)
WHERE (v2)-[:s]->()
Result (v1, v2): (1, 2)
Antijoin: 𝑟 ⋉̅ 𝑠
MATCH (v1)-[:r]->(v2)
WHERE NOT (v2)-[:s]->()
Result (v1, v2): (4, 5)
Left outer join: 𝑟 ⟕ 𝑠
MATCH (v1)-[:r]->(v2)
OPTIONAL MATCH (v2)-[:s]->(v3)
Result (v1, v2, v3): (1, 2, 3), (1, 2, 6), (4, 5, null)
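The four operators can be reproduced on the example graph with plain set comprehensions. This is a sketch that assumes set semantics and encodes each edge type as a set of (source, target) pairs; it models the algebra, not how Neo4j evaluates the Cypher queries.

```python
# Edge relations of the example graph.
r = {(1, 2), (4, 5)}   # :r edges as (v1, v2)
s = {(2, 3), (2, 6)}   # :s edges as (v2, v3)

# Natural join on the shared attribute v2.
natural_join = {(v1, v2, v3) for (v1, v2) in r for (w, v3) in s if v2 == w}

# Semijoin: tuples of r that have at least one partner in s.
semijoin = {(v1, v2) for (v1, v2) in r if any(v2 == w for (w, _) in s)}

# Antijoin: tuples of r without any partner in s.
antijoin = r - semijoin

# Left outer join: the join plus the antijoin rows padded with None (null).
left_outer_join = natural_join | {(v1, v2, None) for (v1, v2) in antijoin}
```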

DERIVING DELTA QUERIES
X. Qian, G. Wiederhold:
Incremental Recomputation of Active Relational Expressions.
TKDE 1991
T. Griffin, L. Libkin:
Incremental Maintenance of Views with Duplicates.
SIGMOD 1995
T. Griffin, L. Libkin, H. Trickey:
An Improved Algorithm for the Incremental Recomputation of Active Relational Expressions.
TKDE 1997
T. Griffin, B. Kumar:
Algebraic Change Propagation for Semijoin and Outerjoin Queries.
SIGMOD Record 1998

DELTA QUERIES
• The seminal paper
• Δ/𝛻 delta queries for joins, selections, projections, etc.
• Bag semantics
T. Griffin, L. Libkin:
Incremental Maintenance of Views with Duplicates.
SIGMOD 1995

DELTA QUERIES
• Semijoins, antijoins, outer joins
• Set semantics
• Later publications, e.g. Zhou and Larson's ICDE'07 paper, improved these
T. Griffin, B. Kumar:
Algebraic Change Propagation for Semijoin and Outerjoin Queries.
SIGMOD Record 1998

EXAMPLE QUERY #1
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
Relational algebra expression: 𝑎 ⋈ 𝑏 ⋈ 𝑐, with
• 𝑎(𝑣1, 𝑣2)
• 𝑏(𝑣2, 𝑣3)
• 𝑐(𝑣3, 𝑣4)
Δ(𝑎 ⋈ 𝑏 ⋈ 𝑐)
= (𝑎 ⋈ 𝑏)ᵐ ⋈ Δ𝑐 + Δ(𝑎 ⋈ 𝑏) ⋈ 𝑐ᵐ
= (𝑎 ⋈ 𝑏)ᵐ ⋈ Δ𝑐 + (𝑎ᵐ ⋈ Δ𝑏) ⋈ 𝑐ᵐ + (Δ𝑎 ⋈ 𝑏ᵐ) ⋈ 𝑐ᵐ
= 𝑎ᵐ ⋈ 𝑏ᵐ ⋈ Δ𝑐 + 𝑎ᵐ ⋈ Δ𝑏 ⋈ 𝑐ᵐ + Δ𝑎 ⋈ 𝑏ᵐ ⋈ 𝑐ᵐ
𝛻(𝑎 ⋈ 𝑏 ⋈ 𝑐) is derived similarly.

EXAMPLE QUERY #1
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
Δ(𝑎 ⋈ 𝑏 ⋈ 𝑐) contains the term 𝑎ᵐ ⋈ 𝑏ᵐ ⋈ Δ𝑐, which can be evaluated as:
UNWIND $pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
$pcs -> pass lists of nodes/edges as parameters
// This only works in embedded mode, see neo4j/issues/10239
• 𝑎(𝑣1, 𝑣2)
• 𝑏(𝑣2, 𝑣3)
• 𝑐(𝑣3, 𝑣4)

POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐
// r1 = a⋈b⋈Δc
UNWIND $pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
RETURN v1, v2, v3, v4
UNION ALL
// r2 = a⋈Δb⋈c
UNWIND $pbs AS pb
MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
UNION ALL
// r3 = Δa⋈b⋈c
UNWIND $pas AS pa
MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
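The three-term positive delta can be sanity-checked outside the database. The sketch below assumes set semantics and insertions only, models each edge type as a set of (source, target) pairs, and uses a hypothetical helper `join3`; the sample data is made up for illustration.

```python
def join3(a, b, c):
    """a ⋈ b ⋈ c for edge relations a(v1, v2), b(v2, v3), c(v3, v4)."""
    return {(v1, v2, v3, v4)
            for (v1, v2) in a
            for (w2, v3) in b if w2 == v2
            for (w3, v4) in c if w3 == v3}

# State before the change and the inserted edges per type.
a, b, c = {(1, 2)}, {(2, 3)}, {(3, 4)}
pa, pb, pc = {(5, 2)}, {(2, 6)}, {(6, 7), (3, 8)}

# Maintained relations (state after the change).
am, bm, cm = a | pa, b | pb, c | pc

old_view = join3(a, b, c)
# r1 = a⋈b⋈Δc, r2 = a⋈Δb⋈c, r3 = Δa⋈b⋈c, all over the maintained state.
delta = join3(am, bm, pc) | join3(am, pb, cm) | join3(pa, bm, cm)

# Old result plus the delta equals a full recomputation.
assert old_view | delta == join3(am, bm, cm)
```

The three terms overlap (a tuple built from several new edges appears in more than one term), which is harmless under set semantics but would need care under bag semantics.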

POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐
Long WITH chains are cumbersome -> use pattern comprehensions and list comprehensions.
WITH [pc IN $pcs | // r1 = a⋈b⋈Δc
  [(v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) |
    [v1, v2, v3, v4]]] +
  [pb IN $pbs | // r2 = a⋈Δb⋈c
  [(v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4) |
    [v1, v2, v3, v4]]] +
  [pa IN $pas | // r3 = Δa⋈b⋈c
  [(v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4) |
    [v1, v2, v3, v4]]] AS rs
// rs holds one list of matches per changed edge: flatten it into rows
UNWIND rs AS matches
UNWIND matches AS r
RETURN
  r[0] AS v1, r[1] AS v2, r[2] AS v3, r[3] AS v4

EXAMPLE QUERY #2
MATCH (route:Route)
-[:FOLLOWS]->(swP:SwitchPosition)
-[:TARGET]->(sw:Switch)
-[:MONITORED_BY]->(sensor:Sensor)
WHERE NOT (route)-[:REQUIRES]->(sensor)
RETURN route, sensor, swP, sw
MATCH (v1)
-[:a]->(v2)
-[:b]->(v3)
-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
[Figure: the Train Benchmark pattern (route: Route, swP: SwitchPosition, sw: Switch, sensor: Sensor with :FOLLOWS, :TARGET, :MONITORED_BY, :REQUIRES) next to the abstract pattern (v1, v2, v3, v4 with :a, :b, :c, :d)]

NEGATIVE CONDITIONS
MATCH (v1)
  -[:a]->(v2)
  -[:b]->(v3)
  -[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
• 𝑎(𝑣1, 𝑣2)
• 𝑏(𝑣2, 𝑣3)
• 𝑐(𝑣3, 𝑣4)
• 𝑑(𝑣1, 𝑣4)
⇒ (𝑎 ⋈ 𝑏 ⋈ 𝑐) ⋉̅ 𝑑

DELTA QUERIES FOR JOINS AND ANTIJOINS
Natural join
• Δ(𝑆 ⋈ 𝑇) = Δ𝑆 ⋈ 𝑇ᵐ + 𝑆ᵐ ⋈ Δ𝑇
• 𝛻(𝑆 ⋈ 𝑇) = 𝛻𝑆 ⋈ 𝑇 + 𝑆 ⋈ 𝛻𝑇
Antijoin
• Δ(𝑆 ⋉̅ 𝑇) = (𝑆 − 𝛻𝑆) ⋉ (𝛻𝑇 ⋉̅ 𝑇ᵐ) + Δ𝑆 ⋉̅ 𝑇ᵐ
• 𝛻(𝑆 ⋉̅ 𝑇) = (𝑆 − 𝛻𝑆) ⋉ (Δ𝑇 ⋉̅ 𝑇) + 𝛻𝑆 ⋉̅ 𝑇
[Slide annotations mark two subexpressions as Expression 1 and Expression 2.]
Only 𝑆ᵐ and 𝑇ᵐ are available.

SUBEXPRESSIONS
1. Δ𝑇 ⋉̅ 𝑇 = ?
• 𝑅1 ⋉̅ 𝑅2, where 𝑅1 and 𝑅2 both have schema 𝑅.
• 𝑅1 ⋉̅_𝜃 𝑅2 = 𝑅1 − 𝜋_𝑅(𝑅1 ⋈_𝜃 𝑅2) = 𝑅1 − (𝑅1 ⋈_𝜃 𝑅2)
• If 𝜃 defines equality on all attributes of 𝑅, the theta join (⋈_𝜃) becomes a natural join, which is an intersection for relations with the same schema.
• 𝑅1 ⋈_𝜃 𝑅2 = 𝑅1 ⋈ 𝑅2 = 𝑅1 ∩ 𝑅2
• 𝑅1 − (𝑅1 ⋈_𝜃 𝑅2) = 𝑅1 − (𝑅1 ∩ 𝑅2) = 𝑅1 − 𝑅2
⇒ (∗) 𝑅1 ⋉̅_𝜃 𝑅2 = 𝑅1 − 𝑅2
2. 𝑅ᵐ = 𝑅 − 𝛻𝑅 + Δ𝑅 ⇒ (∗∗) 𝑅 = 𝑅ᵐ − Δ𝑅 + 𝛻𝑅
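Both identities can be checked on small sets. The sketch below assumes set semantics and, for (∗∗), that every deleted tuple was in 𝑅 and no inserted tuple was; the sample data is made up for illustration.

```python
# (*) With theta = equality on all attributes, the antijoin of two
# relations over the same schema is a plain set difference.
R1 = {(1, 2), (3, 4), (5, 6)}
R2 = {(3, 4), (7, 8)}
antijoin = {t for t in R1 if not any(t == u for u in R2)}
assert antijoin == R1 - R2

# (**) The old state can be recovered from the maintained relation,
# assuming nabla_R is a subset of R and delta_R is disjoint from R.
R = {(1, 2), (3, 4), (5, 6)}
nabla_R, delta_R = {(3, 4)}, {(7, 8)}
R_m = (R - nabla_R) | delta_R
recovered = (R_m - delta_R) | nabla_R
assert recovered == R
```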

DELTAS FOR ANTIJOINS
Based on Griffin-Kumar's '98 paper, using
(∗) 𝑅1 ⋉̅_𝜃 𝑅2 = 𝑅1 − 𝑅2
(∗∗) 𝑅 = 𝑅ᵐ − Δ𝑅 + 𝛻𝑅
Positive changes
• Δ(𝑆 ⋉̅ 𝑇) = (𝑆 − 𝛻𝑆) ⋉ (𝛻𝑇 ⋉̅ 𝑇ᵐ) + Δ𝑆 ⋉̅ 𝑇ᵐ
• by (∗): Δ(𝑆 ⋉̅ 𝑇) = (𝑆 − 𝛻𝑆) ⋉ 𝛻𝑇 + Δ𝑆 ⋉̅ 𝑇ᵐ
• by (∗∗): Δ(𝑆 ⋉̅ 𝑇) = (𝑆ᵐ − Δ𝑆) ⋉ 𝛻𝑇 + Δ𝑆 ⋉̅ 𝑇ᵐ
Negative changes
• 𝛻(𝑆 ⋉̅ 𝑇) = (𝑆 − 𝛻𝑆) ⋉ (Δ𝑇 ⋉̅ 𝑇) + 𝛻𝑆 ⋉̅ 𝑇
• by (∗): 𝛻(𝑆 ⋉̅ 𝑇) = (𝑆 − 𝛻𝑆) ⋉ Δ𝑇 + 𝛻𝑆 ⋉̅ 𝑇
• by (∗∗): 𝛻(𝑆 ⋉̅ 𝑇) = (𝑆ᵐ − Δ𝑆) ⋉ Δ𝑇 + 𝛻𝑆 ⋉̅ (𝑇ᵐ − Δ𝑇 + 𝛻𝑇)
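The final antijoin deltas can be compared against a full recomputation. The sketch below rests on several assumptions: set semantics, 𝑆 over (v1, v2, v3, v4), 𝑇 over (v1, v4) so that the join condition covers all attributes of 𝑇, deletions taken from the old state, and insertions disjoint from it. The old 𝑇 is recovered from 𝑇ᵐ via (∗∗). Helper names and data are hypothetical.

```python
def key(s):
    """Join attributes of an S tuple: (v1, v4)."""
    return (s[0], s[3])

def semijoin(S, T):
    return {s for s in S if key(s) in T}

def antijoin(S, T):
    return {s for s in S if key(s) not in T}

# Old state and change sets.
S_old = {(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)}
T_old = {(1, 4), (9, 12)}
nabla_S, delta_S = {(9, 10, 11, 12)}, {(13, 14, 15, 16), (1, 20, 21, 4)}
nabla_T, delta_T = {(1, 4)}, {(5, 8)}

# Maintained state: the only state a delta query can read.
S_m = (S_old - nabla_S) | delta_S
T_m = (T_old - nabla_T) | delta_T

# Delta queries written over S_m, T_m and the change sets only.
pos = semijoin(S_m - delta_S, nabla_T) | antijoin(delta_S, T_m)
neg = (semijoin(S_m - delta_S, delta_T)
       | antijoin(nabla_S, (T_m - delta_T) | nabla_T))

old_view = antijoin(S_old, T_old)
new_view = antijoin(S_m, T_m)
assert pos == new_view - old_view
assert neg == old_view - new_view
```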

NEGATIVE CONDITIONS
• 𝑎(𝑣1, 𝑣2)
• 𝑏(𝑣2, 𝑣3)
• 𝑐(𝑣3, 𝑣4)
• 𝑑(𝑣1, 𝑣4)
Δ((𝑎 ⋈ 𝑏 ⋈ 𝑐) ⋉̅ 𝑑) = ((𝑎 ⋈ 𝑏 ⋈ 𝑐)ᵐ − Δ(𝑎 ⋈ 𝑏 ⋈ 𝑐)) ⋉ 𝛻𝑑 + Δ(𝑎 ⋈ 𝑏 ⋈ 𝑐) ⋉̅ 𝑑ᵐ
First term: (𝑎 ⋈ 𝑏 ⋈ 𝑐)ᵐ ⋉ 𝛻𝑑 − Δ(𝑎 ⋈ 𝑏 ⋈ 𝑐) ⋉ 𝛻𝑑
Expansions:
• (𝑎 ⋈ 𝑏 ⋈ 𝑐)ᵐ ⋉ 𝛻𝑑 = (𝑎ᵐ ⋈ 𝑏ᵐ ⋈ 𝑐ᵐ) ⋉ 𝛻𝑑
• Δ(𝑎 ⋈ 𝑏 ⋈ 𝑐) = 𝑎ᵐ ⋈ 𝑏ᵐ ⋈ Δ𝑐 + 𝑎ᵐ ⋈ Δ𝑏 ⋈ 𝑐ᵐ + Δ𝑎 ⋈ 𝑏ᵐ ⋈ 𝑐ᵐ
Pushdown 𝛻𝑑: (𝑎ᵐ ⋉ 𝛻𝑑) ⋈ 𝑏ᵐ ⋈ (𝑐ᵐ ⋉ 𝛻𝑑)
[Figure: pattern 𝑣1 -[:a]-> 𝑣2 -[:b]-> 𝑣3 -[:c]-> 𝑣4 with the negative edge 𝑣1 -[:d]-> 𝑣4]

NEGATIVE CONDITIONS
• 𝑎(𝑣1, 𝑣2)
• 𝑏(𝑣2, 𝑣3)
• 𝑐(𝑣3, 𝑣4)
• 𝑑(𝑣1, 𝑣4)
Semijoins with 𝛻𝑑 reduce to a membership test on a single vertex, because both operands represent edges:
• S1 = 𝑎 ⋉ 𝛻𝑑: 𝑣1 ∈ 𝛻𝑑.𝑣1
• S2 = Δ𝑎 ⋉ 𝛻𝑑: 𝑣1 ∈ 𝛻𝑑.𝑣1
• S3 = 𝑐 ⋉ 𝛻𝑑: 𝑣4 ∈ 𝛻𝑑.𝑣4
• S4 = Δ𝑐 ⋉ 𝛻𝑑: 𝑣4 ∈ 𝛻𝑑.𝑣4
Terms of Δ((𝑎 ⋈ 𝑏 ⋈ 𝑐) ⋉̅ 𝑑):
• R1 = (𝑎 ⋉ 𝛻𝑑) ⋈ 𝑏 ⋈ (𝑐 ⋉ 𝛻𝑑)
• R2 = −(𝑎 ⋉ 𝛻𝑑) ⋈ 𝑏 ⋈ (Δ𝑐 ⋉ 𝛻𝑑)
• R3 = −(𝑎 ⋉ 𝛻𝑑) ⋈ Δ𝑏 ⋈ (𝑐 ⋉ 𝛻𝑑)
• R4 = −(Δ𝑎 ⋉ 𝛻𝑑) ⋈ 𝑏 ⋈ (𝑐 ⋉ 𝛻𝑑)
• R5 = (𝑎 ⋈ 𝑏 ⋈ Δ𝑐) ⋉̅ 𝑑
• R6 = (𝑎 ⋈ Δ𝑏 ⋈ 𝑐) ⋉̅ 𝑑
• R7 = (Δ𝑎 ⋈ 𝑏 ⋈ 𝑐) ⋉̅ 𝑑

POSITIVE DELTA QUERY FOR (𝑎 ⋈ 𝑏 ⋈ 𝑐) ⋉̅ 𝑑
// pes / nes: inserted / deleted edges (empty placeholders here)
// Caveat: UNWIND over an empty list yields no rows, which ends the chain early.
WITH
  [] AS pes, [] AS nes
// split the changes by edge type
WITH
  [pe IN pes WHERE type(pe) = 'a' | pe] AS pas,
  [pe IN pes WHERE type(pe) = 'b' | pe] AS pbs,
  [pe IN pes WHERE type(pe) = 'c' | pe] AS pcs,
  [pe IN pes WHERE type(pe) = 'd' | pe] AS pds,
  [ne IN nes WHERE type(ne) = 'a' | ne] AS nas,
  [ne IN nes WHERE type(ne) = 'b' | ne] AS nbs,
  [ne IN nes WHERE type(ne) = 'c' | ne] AS ncs,
  [ne IN nes WHERE type(ne) = 'd' | ne] AS nds
WITH
  pas, nas, pbs, nbs, pcs, ncs, pds, nds,
  [nd IN nds | startNode(nd)] AS nd_v1s,
  [nd IN nds | endNode(nd)] AS nd_v4s
// calculating s1s...s4s
// s1s: (𝑎⋉∇𝑑)
UNWIND nd_v1s AS v1
MATCH (v1)-[:a]->(v2)
WITH
  pas, pbs, pcs, nd_v1s, nd_v4s,
  collect({v1: v1, v2: v2}) AS s1s
// s2s: (Δ𝑎⋉∇𝑑)
UNWIND pas AS pa
MATCH (v1)-[pa]->(v2)
WHERE v1 IN nd_v1s
WITH
  pas, pbs, pcs, nd_v1s, nd_v4s,
  s1s,
  collect({v1: v1, v2: v2}) AS s2s
// s3s: (𝑐⋉∇𝑑)
UNWIND nd_v4s AS v4
MATCH (v3)-[:c]->(v4)
WITH
  pas, pbs, pcs, nd_v1s, nd_v4s,
  s1s, s2s,
  collect({v3: v3, v4: v4}) AS s3s
// s4s: (Δ𝑐⋉∇𝑑)
UNWIND pcs AS pc
MATCH (v3)-[pc]->(v4)
WHERE v4 IN nd_v4s
WITH
  pas, pbs, pcs,
  s1s, s2s, s3s,
  collect({v3: v3, v4: v4}) AS s4s
// calculating r1...r7
// r1: (𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑)
UNWIND s1s AS s1
UNWIND s3s AS s3
WITH
  pas, pbs, pcs, s1s, s2s, s3s, s4s,
  s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
WHERE (v2)-[:b]->(v3)
WITH
  pas, pbs, pcs, s1s, s2s, s3s, s4s,
  collect([v1, v2, v3, v4]) AS r1
// r2: -(𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (Δ𝑐⋉∇𝑑)
UNWIND s1s AS s1
UNWIND s4s AS s4
WITH
  pas, pbs, pcs, s1s, s2s, s3s, r1,
  s1.v1 AS v1, s1.v2 AS v2, s4.v3 AS v3, s4.v4 AS v4
WHERE (v2)-[:b]->(v3)
WITH
  pas, pbs, pcs, s1s, s2s, s3s, r1,
  collect([v1, v2, v3, v4]) AS r2
// r3: -(𝑎⋉∇𝑑) ⋈ Δ𝑏 ⋈ (𝑐⋉∇𝑑)
UNWIND s1s AS s1
UNWIND s3s AS s3
WITH
  pas, pbs, pcs, s2s, s3s, r1, r2,
  s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
MATCH (v2)-[b:b]->(v3)
WHERE b IN pbs
WITH
  pas, pbs, pcs, s2s, s3s, r1, r2,
  collect([v1, v2, v3, v4]) AS r3
// r4: -(Δ𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑)
UNWIND s2s AS s2
UNWIND s3s AS s3
WITH
  pas, pbs, pcs, r1, r2, r3,
  s2.v1 AS v1, s2.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
WHERE (v2)-[:b]->(v3)
WITH
  pas, pbs, pcs, r1, r2, r3,
  collect([v1, v2, v3, v4]) AS r4
// r5: (𝑎 ⋈ 𝑏 ⋈ Δ𝑐) ⋉̅ 𝑑
UNWIND pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH
  pas, pbs, r1, r2, r3, r4,
  collect([v1, v2, v3, v4]) AS r5
// r6: (𝑎 ⋈ Δ𝑏 ⋈ 𝑐) ⋉̅ 𝑑
UNWIND pbs AS pb
MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH
  pas, r1, r2, r3, r4, r5,
  collect([v1, v2, v3, v4]) AS r6
// r7: (Δ𝑎 ⋈ 𝑏 ⋈ 𝑐) ⋉̅ 𝑑
UNWIND pas AS pa
MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH
  r1, r2, r3, r4, r5, r6,
  collect([v1, v2, v3, v4]) AS r7
// positive terms minus negative terms
WITH
  r1 + r5 + r6 + r7 AS rp,
  r2 + r3 + r4 AS rn
RETURN
  [r IN rp WHERE NOT r IN rn] AS results

POSITIVE DELTA QUERY FOR (𝑎 ⋈ 𝑏 ⋈ 𝑐) ⋉̅ 𝑑
• Workaround: knowing the change workload helps
  ◦ Only consider changes in Δ𝑑 and 𝛻𝑑
  ◦ Query is cleaner and much more efficient
UNWIND $nds AS nd
WITH
  startNode(nd) AS v1,
  endNode(nd) AS v4
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
• This can outperform recomputing the query from scratch.
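The workaround can be modelled with sets to see why the result matches a full recomputation. The sketch assumes that only :d edges are deleted, that there is at most one :d edge between any pair of nodes, and that edges are (source, target) pairs; the function names and data are hypothetical.

```python
# Edge relations; a, b and c stay unchanged in this workload.
a = {(1, 2), (5, 6)}
b = {(2, 3), (6, 7)}
c = {(3, 4), (7, 8)}
d = {(1, 4), (5, 8)}

def view(d_edges):
    """Full evaluation of (a ⋈ b ⋈ c) antijoin d."""
    return {(v1, v2, v3, v4)
            for (v1, v2) in a
            for (w2, v3) in b if w2 == v2
            for (w3, v4) in c if w3 == v3 and (v1, v4) not in d_edges}

def positive_delta(nabla_d):
    """Start from the endpoints of each deleted :d edge, as the query does."""
    found = set()
    for (v1, v4) in nabla_d:
        for (x, v2) in a:
            if x != v1:
                continue
            for (w2, v3) in b:
                if w2 == v2 and (v3, v4) in c:
                    found.add((v1, v2, v3, v4))
    return found

nabla_d = {(1, 4)}
d_m = d - nabla_d
maintained = view(d) | positive_delta(nabla_d)
assert maintained == view(d_m)
```

The delta only explores paths between the endpoints of deleted edges, which is where the efficiency gain over recomputation comes from.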

LIST OPERATIONS IN CYPHER
Instead of subqueries, use chained queries and combine lists.
// unique elements via UNWIND and aggregation
WITH [1, 1, 2, 2, 3] AS xs
UNWIND xs AS x
RETURN
  collect(DISTINCT x) AS unique
// append, subtraction, intersection
WITH [1, 2, 3] AS xs, [2] AS ys
RETURN
  xs + ys AS append,
  [x IN xs WHERE NOT x IN ys] AS subtraction,
  [x IN xs WHERE x IN ys] AS intersection
// get unique list in openCypher
WITH [1, 1, 2, 2, 3] AS xs
RETURN
  reduce(acc = [], x IN xs |
    acc + CASE x IN acc
      WHEN false THEN [x]
      ELSE []
    END) AS unique

DELTA QUERIES IN CYPHER
Delta queries are complex. Features that would be nice:
• Subqueries // pattern comprehensions go some way
• Named subqueries // help reusability
• Subtracting lists // related: CIR-2017-180
• Use collection elements or function results for matching
These are probably too much to ask.
-> recommended approach: compile directly to query plans.
// not allowed: matching on a collection element
MATCH (n)
WITH collect(n) AS ns
MATCH (ns[0])
RETURN *
// workaround: bind the element to a variable first
MATCH (n)
WITH collect(n) AS ns
WITH ns[0] AS x
MATCH (x)
RETURN *


CHALLENGES FOR PROPERTY GRAPH QUERIES
Data model
• NF2 (Non-First Normal Form): maps, lists
• No schema (schema-optionality)
• Graph structure
Queries
• Nulls, antijoins and left outer joins
• Updates on property values
• Aggregates on aggregates, non-distributive functions
• Ordering + skip/limit
• Reachability queries

CHALLENGES FOR PROPERTY GRAPH QUERIES
Decades of research -> 2 long surveys
A. Gupta, I. S. Mumick:
Materialized Views.
MIT Press, 1999
R. Chirkova, J. Yang:
Materialized Views.
Foundations and Trends in Databases, 2012

OUR SURVEY OF RELATED IVM TECHNIQUES

DBTOASTER
• Shows the scale of the problem
• Relational data model and SQL queries
• R&D for ~5 years @ EPFL, Johns Hopkins, Cornell, etc.
• Approach
  ◦ Queries over an algebraic ring
  ◦ Higher-order recursive IVM
• Compiler in OCaml
• Backend with code generation for C++, Scala/Spark
Christoph Koch et al.:
DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views.
VLDB Journal 2014

FUTURE DIRECTIONS
• Work out derivation rules for Expand/VarExpand, …
• Automate delta query derivation
• Integrate with Neo4j
• Run performance experiments
  ◦ Train Benchmark (set semantics)
  ◦ LDBC Social Network Benchmark's BI workload (bag semantics)

LDBC BENCHMARKS
• Social Network Benchmark
  ◦ Business Intelligence workload published
  ◦ openCypher reference implementation: ldbc/ldbc_snb_implementations
  ◦ Next goal: full conference paper
• Graphalytics
  ◦ Competition is online at graphalytics.org
  ◦ Neo4j implementation (using the Graph Algorithms library) WIP: graphalytics-platforms-neo4j/pull/6
Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton et al.:
An early look at the LDBC Social Network Benchmark's BI Workload.
GRADES-NDA at SIGMOD, 2018

GRAPH ANALYTICS ON THE PANAMA PAPERS
• Network science approach: multidimensional graph metrics from social network analysis, biology, physics, etc.
• Our work originally targeted software and system models.
• Progress in 2018
  ◦ Q1: implemented adapters for Neo4j and CSV
  ◦ Q2 goal: analyse the Panama Papers using these metrics
Gábor Szárnyas, Zsolt Kővári, Ágnes Salánki, Dániel Varró:
Towards the Characterization of Realistic Models: Evaluation of Multidisciplinary Graph Metrics.
MODELS 2016
ftsrg/model-analyzer

MAPPING CYPHER TO SQL
• Evaluate graph queries in an RDB, similar to ORM
• Approaches
  ◦ Cytosm: Cypher to SQL Mapper / gTop: graph topology (cytosm/cytosm)
  ◦ GraphGen: extracting graphs from RDBs (KonstantinosX/graphgen-project)
  ◦ Ongoing work to map TCK to SQLite
B. A. Steer, A. Alnaimi, M. Lotz, F. Cuadrado, L. Vaquero, J. Varvenne:
Cytosm: Declarative Property Graph Queries Without Data Migration.
GRADES 2017
K. Xirogiannopoulos, V. Srinivas, A. Deshpande:
GraphGen: Adaptive Graph Processing using Relational Databases.
GRADES 2017

NEO4J APOC LIBRARY
• CSV loader that follows the schema of the neo4j-import tool
• Goal
  ◦ Use headers to generate LOAD CSV commands.
  ◦ 1st pass: CALL apoc.import.csv.node(file, labels, …)
  ◦ 2nd pass: CALL apoc.import.csv.relationship(file, type, …)
• Result
  ◦ Many corner cases -> ~700 LOC + tests
  ◦ Covers most use cases, but is very slow
  ◦ APOC PR pending
neo4j-apoc-procedures/pull/581 neo4j-documentation/pull/121

SCOPING FOR OPENCYPHER
• Xtext grammar for the Slizaa software analysis workbench
• Progress in 2018
  ◦ Q1: scope analyser implemented for Cypher grammar of M05
  ◦ Q2 goal: update to M10
slizaa-opencypher-xtext/issues/7
