Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Incremental View Maintenance for openCypher Queries

120 views

Published on

Presented at the Fourth openCypher Implementers Meeting

Numerous graph use cases require continuous evaluation of queries over a constantly changing data set, e.g. fraud detection in financial systems, recommendations, and checking integrity constraints. For relational systems, incremental view maintenance has been researched for three decades, resulting in a wide body of literature. The property graph data model and the openCypher language, however, are recent developments, and therefore lack established techniques to perform efficient view maintenance. In this talk, we give an overview of the view maintenance problem for property graphs, discuss why it is particularly difficult and present an approach that tackles a meaningful subset of the language.

Published in: Engineering
  • Be the first to comment

  • Be the first to like this

Incremental View Maintenance for openCypher Queries

  1. 1. Incremental View Maintenance for openCypher Queries Gábor Szárnyas, József Marton 4th openCypher Implementers Meeting
  2. 2. MODEL-DRIVEN ENGINEERING  Primarily for designing critical systems  Models are first class citizens during development o SysML / requirements, statecharts, etc. o Validation and code generation techniques for correctness Technology: Eclipse Modeling Framework (EMF)  Originally started at IBM as an implementation of the Object Management Group’s (OMG) Meta Object Facility (MOF).  i.e. an object-oriented model  i.e. a property graph-like structure with a metamodel
  3. 3. MODEL VALIDATION  Implemented with model queries  Models are typed, attributed graphs  Typical queries o Get two components connected by a particular edge MATCH (r:R)…(s:S) WHERE NOT (r)-[:E]->(s) o Check if two objects are reachable MATCH (r:R)…(s:S) WHERE NOT (r)-[:E1|E2*]->(s) o Property checks MATCH (r:R)-->(s:S) WHERE r.a = 'x' OR (s:Y) Complex graph queries
  4. 4. 1 switch sensor C sensor B 2 sensor A segment route RAILWAY NETWORK MODEL
  5. 5. RAILWAY NETWORK MODEL segment segment segmentswitch sensor C sensor B sensor A route 2 route 1
  6. 6. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight» RAILWAY NETWORK MODEL
  7. 7. :FOLLOWS :MONITORED_BY :TARGET:REQUIRES MATCH (route:Route) -[:FOLLOWS]->(swP:SwitchPosition) -[:TARGET]->(sw:Switch) -[:MONITORED_BY]->(sensor:Sensor) WHERE NOT (route)-[:REQUIRES]->(sensor) RETURN route, sensor, swP, sw sw: Switchsensor: Sensor route: Route swP: SwitchPosition G. Szárnyas, B. Izsó, I. Ráth, D. Varró: The Train Benchmark: cross-technology performance evaluation of continuous model queries. Software and Systems Modeling, 2017
  8. 8. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  9. 9. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  10. 10. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  11. 11. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  12. 12. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  13. 13. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  14. 14. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  15. 15. segment segment sensor Csensor Bsensor A route 2route 1 switch segment switchPosition «diverging» switchPosition «straight»
  16. 16. INCREMENTAL VIEW MAINTENANCE (IVM) In many use cases…  queries are static  data changes slowly -> views can be maintained incrementally Graph applications  model validation  simulation  recommendation systems  fraud detection
  17. 17. INGRAPH: IVM ON PROPERTY GRAPHS Idea: map to relational algebra and use standard IVM techniques  Challenging aspects o Property graph data model o Cypher language  Formalise the language in relational algebra  Use nested relational algebra -> closed on operations Prototype tool: ingraph (OCIM1, OCIM2, GraphConnect talks) Gábor Szárnyas, József Marton, Dániel Varró: Formalising openCypher Graph Queries in Relational Algebra. ADBIS 2017
  18. 18. INGRAPH / GRAPH TO NESTED RELATIONS
  19. 19. INGRAPH / NESTED RELATIONAL ALGEBRA OPS
  20. 20. INGRAPH  ingraph uses a procedural IVM approach: the Rete algorithm. o Build caches for each operator o Maintain caches upon changes o Supports 15+ out of 25 LDBC BI queries o Details to be published in a conference paper o Extensible, but very heavy on memory  The rest of the talk focuses on the algebraic approach. Gábor Szárnyas, József Marton et al.: Incremental View Maintenance on Property Graphs. arXiv preprint will be available on the 1st week of June
  21. 21. Delta Queries for openCypher
  22. 22. DELTA QUERIES AT A GLANCE 𝐺 evaluate query 𝑄 for each Δ𝐺 evaluate Δ𝑄 changes Δ𝐺1, Δ𝐺2, … 𝑄(𝐺) Δ𝑄(Δ𝐺1) Δ𝑄(Δ𝐺2) ⇒ 𝑄(𝐺 + Δ𝐺1 + Δ𝐺2 + ⋯ ) 𝑄 and Δ𝑄 are calculated by the same engine.
  23. 23. IMPLEMENTATION: TRIGGERS IN NEO4J  Event-driven programming in databases  Neo4j: TransactionEventHandler interface o afterCommit(TransactionData data, T state) o beforeCommit(TransactionData data) o TransactionData contains Δ𝐺: createdNodes, deletedNodes, …  Only the updated state of the graph is accessible.  GraphAware framework: ImprovedTransactionData API o Get properties and labels/types of deleted elements Max de Marzi: Triggers in Neo4j. 2015 Michal Bachman: Neo4j Improved Transaction Event API. 2014
  24. 24. DERIVING DELTA QUERIES a b 1 2 3 4 5 6 7 8 a b 1 2 5 6 7 8 𝑅 𝑚 Idea: given query 𝑄, derive delta queries Δ𝑄 and 𝛻𝑄, which define positive and negative changes, respectively. But: most IVM techniques are defined for relational algebra. Notation:  𝑅 relation  Δ𝑅 positive changes  𝛻𝑅 negative changes  𝑅 𝑚 : maintained relation of 𝑅 ⇒ 𝑅 𝑚 = 𝑅 − 𝛻𝑅 + Δ𝑅  “−” denotes set minus (∖), “+” denotes set union (∪) a b 1 2 3 4 5 6 𝑅 𝛻𝑅 Δ𝑅
  25. 25. RELATIONAL ALGEBRA FOR CYPHER  Query plans in Neo4j ≅ relational algebra + Expand/VarExpand.  Expand is essentially a natural join.  Natural join 𝑟 ⋈ 𝑠  Semijoin 𝑟 ⋉ 𝑠 = 𝜋 𝑅 𝑟 ⋈ 𝑠  Antijoin 𝑟 ഥ⋉ 𝑠 = 𝑟 ∖ 𝑟 ⋉ 𝑠  Left outer join 𝑟 ⟕ 𝑠 ≅ 𝑟 ⋈ 𝑠 ∪ 𝑟 ഥ⋉ 𝑠 //plus nulls Andrés Taylor: Neo4j Cypher implementation. First openCypher Implementers Meeting, 2017
  26. 26. v1 v2 v3 1 2 3 1 2 6 RELATIONAL ALGEBRA FOR CYPHER Natural join: 𝑟 ⋈ 𝑠 MATCH (v1)-[:r]->(v2)-[:s]->(v3) RETURN * Semijoin: 𝑟 ⋉ 𝑠 MATCH (v1)-[:r]->(v2) WHERE (v2)-[:s]->() Antijoin: 𝑟 ഥ⋉ 𝑠 MATCH (v1)-[:r]->(v2) WHERE NOT (v2)-[:s]->() Left outer join: 𝑟 ⟕ 𝑠 MATCH (v1)-[:r]->(v2) OPTIONAL MATCH (v2)-[:s]->(v3) 1 3 4 2 5 6 :r :r :s :s v1 v2 1 2 v1 v2 4 5 1 3 4 2 5 6 :r :r :s :s 1 3 4 2 5 6 :r :r :s :s 1 3 4 2 5 6 :r :r :s :s 1 32 6 :r :s :s 1 2:r 4 5 :r 1 3 4 2 5 6 :r :r :s :s v1 v2 v3 1 2 3 1 2 6 4 5 null 4 5 v1 v2 v3 1 2 3 1 2 6
  27. 27. T. Griffin, B. Kumar: Algebraic Change Propagation for Semijoin and Outerjoin Queries. SIGMOD Record 1998 DERIVING DELTA QUERIES T. Griffin, L. Libkin: Incremental Maintenance of Views with Duplicates. SIGMOD 1995 T. Griffin, L. Libkin, H. Trickey: An Improved Algorithm for the Incremental Recomputation of Active Relational Expressions. TKDE 1997 X. Qian, G. Wiederhold: Incremental Recomputation of Active Relational Expressions. TKDE 1991
  28. 28. DELTA QUERIES T. Griffin, L. Libkin: Incremental Maintenance of Views with Duplicates. SIGMOD 1995  The seminal paper  Δ/𝛻 delta queries for joins, selections, projections, etc.  Bag semantics
  29. 29. DELTA QUERIES  Semijoins, antijoins, outer joins  Set semantics  Later publications, e.g. Zhou-Larson’s ICDE’07 paper improved these T. Griffin, B. Kumar: Algebraic Change Propagation for Semijoin and Outerjoin Queries. SIGMOD Record 1998
  30. 30. EXAMPLE QUERY #1 MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4) RETURN v1, v2, v3, v4 Relational algebra expression: 𝑎 ⋈ 𝑏 ⋈ 𝑐 Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 = 𝑎 ⋈ 𝑏 𝑚 ⋈ Δ𝑐 + Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 𝑚 = 𝑎 ⋈ 𝑏 𝑚 ⋈ Δ𝑐 + 𝑎 𝑚 ⋈ Δ𝑏 ⋈ 𝑐 𝑚 + Δ𝑎 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 = 𝑎 𝑚 ⋈ 𝑏 𝑚 ⋈ Δ𝑐 + 𝑎 𝑚 ⋈ Δ𝑏 ⋈ 𝑐 𝑚 + Δ𝑎 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 Similarly to 𝛻 𝑎 ⋈ 𝑏 ⋈ 𝑐 . v1 v2 v3 :b:a  𝑎 𝑣1, 𝑣2  𝑏 𝑣2, 𝑣3  𝑐 𝑣3, 𝑣4 v4 :c
  31. 31. EXAMPLE QUERY #1 MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4) RETURN v1, v2, v3, v4 Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 = 𝑎 𝑚 ⋈ 𝑏 𝑚 ⋈ Δ𝑐 + 𝑎 𝑚 ⋈ Δ𝑏 ⋈ 𝑐 𝑚 + Δ𝑎 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 UNWIND $pcs AS pc MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) RETURN v1, v2, v3, v4 $pcs -> pass lists of nodes/edges as parameters // This only works in embedded mode, see neo4j/issues/10239 v1 v2 v3 :b:a  𝑎 𝑣1, 𝑣2  𝑏 𝑣2, 𝑣3  𝑐 𝑣3, 𝑣4 v4 :c
  32. 32. POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐 // r1 = a⋈b⋈Δc UNWIND $pcs AS pc MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) RETURN v1, v2, v3, v4 UNION ALL // r2 = a⋈Δb⋈c UNWIND $pbs AS pb MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4) RETURN v1, v2, v3, v4 UNION ALL // r3 = Δa⋈b⋈c UNWIND $pas AS pa MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4) RETURN v1, v2, v3, v4
  33. 33. POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐 Long WITH chains are cumbersome -> patterns+list comprehensions. WITH [pc IN $pcs | // r1 = a⋈b⋈Δc [(v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) | [v1, v2, v3, v4]]] [pb IN $pbs | // r2 = a⋈Δb⋈c [(v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4) | [v1, v2, v3, v4]]] + [pa IN $pas | // r3 = Δa⋈b⋈c [(v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4) | [v1, v2, v3, v4]]] AS r RETURN r[0] AS v1, r[1] AS v2, r[2] AS v3, r[3] AS v4
  34. 34. EXAMPLE QUERY #2 MATCH (route:Route) -[:FOLLOWS]->(swP:SwitchPosition) -[:TARGET]->(sw:Switch) -[:MONITORED_BY]->(sensor:Sensor) WHERE NOT (route)-[:REQUIRES]->(sensor) RETURN route, sensor, swP, sw MATCH (v1) -[:a]->(v2) -[:b]->(v3) -[:c]->(v4) WHERE NOT (v1)-[:d]->(v4) RETURN v1, v2, v3, v4 v1 v2 v3v4 :b :c :d :a :FOLLOWS :MONITORED_BY :TARGET:REQUIRES sw: Switchsensor: Sensor route: Route swP: SwitchPosition
  35. 35. NEGATIVE CONDITIONS v1 v2 v3v4 :b :c :d :a MATCH (v1) -[:a]->(v2) -[:b]->(v3) -[:c]->(v4) WHERE NOT (v1)-[:d]->(v4) RETURN v1, v2, v3, v4  𝑎 𝑣1, 𝑣2  𝑏 𝑣2, 𝑣3  𝑐 𝑣3, 𝑣4  𝑑 𝑣1, 𝑣4 ⇒ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑
  36. 36. DELTA QUERIES FOR JOINS AND ANTIJOINS Natural join  Δ 𝑆 ⋈ 𝑇 = Δ𝑆 ⋈ 𝑇 𝑚 + 𝑆 𝑚 ⋈ Δ𝑇  𝛻 𝑆 ⋈ 𝑇 = 𝛻𝑆 ⋈ 𝑇 + 𝑆 ⋈ 𝛻𝑇 Antijoin  Δ 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ 𝛻𝑇 ഥ⋉ 𝑇 𝑚 + Δ𝑆 ഥ⋉ 𝑇 𝑚  𝛻 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ Δ𝑇 ഥ⋉ 𝑇 + 𝛻𝑆 ഥ⋉ 𝑇 Expression 2Expression 1 Only 𝑆 𝑚 and 𝑇 𝑚 are available.
  37. 37. SUBEXPRESSIONS 1. Δ𝑇 ഥ⋉ 𝑇 = ?  R1 ഥ⋉ 𝑅2, where 𝑅1 and 𝑅2 both have schema 𝑅.  𝑅1 ഥ⋉ 𝜃 𝑅2 = 𝑅1 − 𝜋 𝑅 𝑅1 ⋈ 𝜃 𝑅2 = 𝑅1 − 𝑅1 ⋈ 𝜃 𝑅2  If 𝜃 defines equality on all attributes of 𝑅, the theta join (⋈ 𝜃) becomes a natural join, which is an intersection for relations with the same schema.  𝑅1 ⋈ 𝜃 𝑅2 = 𝑅1 ⋈ 𝑅2 = 𝑅1 ∩ 𝑅2  𝑅1 − 𝑅1 ⋈ 𝜃 𝑅2 = 𝑅1 − 𝑅1 ∩ 𝑅2 = 𝑅1 − 𝑅2 ⇒ ∗ 𝑅1 ഥ⋉ 𝜃 𝑅2 = 𝑅1 − 𝑅2 2. 𝑅 𝑚 = 𝑅 − 𝛻𝑅 + Δ𝑅 ⇒ ∗∗ 𝑅 = 𝑅 𝑚 − Δ𝑅 + 𝛻𝑅
  38. 38. DELTAS FOR ANTIJOINS Based on Griffin-Kumar’s ’98 paper.  Δ 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ 𝛻𝑇 ഥ⋉ 𝑇 𝑚 + Δ𝑆 ഥ⋉ 𝑇 𝑚  Δ 𝑆 ഥ⋉ 𝑇 = ∗ 𝑆 − 𝛻𝑆 ⋉ 𝛻𝑇 + Δ𝑆 ഥ⋉ 𝑇 𝑚  Δ 𝑆 ഥ⋉ 𝑇 = ∗∗ 𝑆 𝑚 − Δ𝑆 ⋉ 𝛻𝑇 + Δ𝑆 ഥ⋉ 𝑇 𝑚  Δ 𝑆 ഥ⋉ 𝑇 = 𝑆 𝑚 − Δ𝑆 ⋉ 𝛻𝑇 + Δ𝑆 ഥ⋉ 𝑇 𝑚  𝛻 𝑆 ഥ⋉ 𝑇 = 𝑆 − 𝛻𝑆 ⋉ Δ𝑇 ഥ⋉ 𝑇 + 𝛻𝑆 ഥ⋉ 𝑇  𝛻 𝑆 ഥ⋉ 𝑇 = ∗ 𝑆 − 𝛻𝑆 ⋉ Δ𝑇 + 𝛻𝑆 ഥ⋉ 𝑇  𝛻 𝑆 ഥ⋉ 𝑇 = ∗∗ 𝑆 𝑚 − Δ𝑆 ⋉ Δ𝑇 + 𝛻𝑆 ഥ⋉ 𝑇 𝑚 − 𝛻𝑇  𝛻 𝑆 ഥ⋉ 𝑇 = 𝑆 𝑚 − Δ𝑆 ⋉ Δ𝑇 + 𝛻𝑆 ഥ⋉ 𝑇 𝑚 − 𝛻𝑇 ∗ 𝑅1 ഥ⋉ 𝜃 𝑅2 = 𝑅1 − 𝑅2 ∗∗ 𝑅 = 𝑅 𝑚 − Δ𝑅 + 𝛻𝑅
  39. 39. NEGATIVE CONDITIONS  𝑎 𝑣1, 𝑣2  𝑏 𝑣2, 𝑣3  𝑐 𝑣3, 𝑣4  𝑑 𝑣1, 𝑣4 Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑 = 𝑎 ⋈ 𝑏 ⋈ 𝑐 𝑚 − Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ⋉ 𝛻𝑑 + Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑 𝑚 𝑎 ⋈ 𝑏 ⋈ 𝑐 𝑚 ⋉ 𝛻𝑑 − Δ 𝑎 ⋈ 𝑏 ⋈ 𝑐 ⋉ 𝛻𝑑 Pushdown 𝛻𝑑: 𝑎 𝑚 ⋉ 𝛻𝑑 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 ⋉ 𝛻𝑑 𝑎 𝑚 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 ⋉ 𝛻𝑑 𝑎 𝑚 ⋈ 𝑏 𝑚 ⋈ Δ𝑐 + 𝑎 𝑚 ⋈ Δ𝑏 ⋈ 𝑐 𝑚 + Δ𝑎 ⋈ 𝑏 𝑚 ⋈ 𝑐 𝑚 v1 v2 v3v4 :b :c :d :a
  40. 40. NEGATIVE CONDITIONS Δ ⋈ ⋈ ⋉ ⋉ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉ v1 v2 v3v4 :b :c :d :a  ,  ,  ,  , ⋉ ⋈ ⋈ ⋉ ⋉ ⋈ ⋈ Δ ⋉ ⋉ ⋈ Δ ⋈ ⋉ Δ ⋉ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋉ ⋈ Δ ⋈ ⋉ Δ ⋈ ⋈ ⋉
  41. 41. NEGATIVE CONDITIONS ⋉ ∈ . , where ∩ is a single vertex, because and represent edges. Δ ⋈ ⋈ ⋉ ⋉ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋈ Δ ⋈ Δ ⋈ ⋈ ⋉ ⋉ ∈ . Δ ⋉ ∈ . Δ ⋉ ∈ . Δ ⋉ ∈ . Δ R1 R2 R3 R4 R5 R6 R7 ⋉ ⋈ ⋈ ⋉ ⋉ ⋈ ⋈ Δ ⋉ ⋉ ⋈ Δ ⋈ ⋉ Δ ⋉ ⋈ ⋈ ⋉ ⋈ ⋈ Δ ⋉ ⋈ Δ ⋈ ⋉ Δ ⋈ ⋈ ⋉ S1 S2 S3 S4 v1 v2 v3v4 :b :c :d :a  ,  ,  ,  ,
  42. 42. WITH [] AS pes, [] AS nes WITH [pe IN pes WHERE type(pe) = 'a'|pe] AS pas, [pe IN pes WHERE type(pe) = 'b'|pe] AS pbs, [pe IN pes WHERE type(pe) = 'c'|pe] AS pcs, [pe IN pes WHERE type(pe) = 'd'|pe] AS pds, [ne IN nes WHERE type(ne) = 'a'|ne] AS nas, [ne IN nes WHERE type(ne) = 'b'|ne] AS nbs, [ne IN nes WHERE type(ne) = 'c'|ne] AS ncs, [ne IN nes WHERE type(ne) = 'd'|ne] AS nds WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, [nd IN nds | startNode(nd)] AS nd_v1s, [nd IN nds | endNode(nd)] AS nd_v4s // calculating s1s...s4s // s1s: (𝑎⋉∇𝑑) UNWIND nd_v1s AS v1 MATCH (v1)-[:a]->(v2) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, collect({v1: v1, v2: v2}) AS s1s // s2s: (Δ𝑎⋉∇𝑑) UNWIND pas AS pa MATCH (v1)-[pa]->(v2) WHERE v1 IN nd_v1s WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, collect({v1: v1, v2: v2}) AS s2s // s3s: (𝑐⋉∇𝑑) UNWIND nd_v4s AS v4 MATCH (v3)-[:c]->(v4) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s,note collect({v3: v3, v4: v4}) AS s3s // s4s: (Δ𝑐⋉∇𝑑) UNWIND pcs AS pc MATCH (v3)-[pc]->(v4) WHERE v4 IN nd_v1s WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, collect({v3: v3, v4: v4}) AS s4s // calculating r1...r7 WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s // r1: (𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑) UNWIND s1s AS s1 UNWIND s3s AS s3 WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4 WHERE (v2)-[:b]->(v3) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, collect([v1, v2, v3, v4]) AS r1 // r2: -(𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (Δ𝑐⋉∇𝑑) UNWIND s1s AS s1 UNWIND s4s AS s4 WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, s1.v1 AS v1, s1.v2 AS v2, s4.v3 AS v3, s4.v4 AS v4 WHERE (v2)-[:b]->(v3) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, collect([v1, v2, v3, v4]) AS r2 // r3: -(𝑎⋉∇𝑑) ⋈ Δ𝑏 ⋈ (𝑐⋉∇𝑑) UNWIND s1s AS s1 UNWIND s3s AS s3 WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4 MATCH (v2)-[b:b]->(v3) WHERE b IN pbs WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, collect([v1, v2, v3, v4]) AS r3 // r4: -(Δ𝑎⋉∇𝑑) ⋈ 𝑏 ⋈ (𝑐⋉∇𝑑) UNWIND s2s AS s2 UNWIND s3s AS s3 WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, s2.v1 AS v1, s2.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4 MATCH (v2)-[b:b]->(v3) WHERE b IN pbs WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, collect([v1, v2, v3, v4]) AS r4 // r5: 𝑎 ⋈ 𝑏 ⋈ Δ𝑐 ̅⋉ 𝑑 UNWIND pcs AS pc MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:pc]->(v4) WHERE NOT (v1)-[:d]->(v4) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4, collect([v1, v2, v3, v4]) AS r5 // r6: 𝑎 ⋈ Δ𝑏 ⋈ 𝑐 ̅⋉ 𝑑 UNWIND pbs AS pb MATCH (v1)-[:a]->(v2)-[:pb]->(v3)-[:c]->(v4) WHERE NOT (v1)-[:d]->(v4) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4, r5, collect([v1, v2, v3, v4]) AS r6 // r7: Δ𝑎 ⋈ 𝑏 ⋈ 𝑐 ̅⋉ 𝑑 UNWIND pas AS pa MATCH (v1)-[:pa]->(v2)-[:b]->(v3)-[:c]->(v4) WHERE NOT (v1)-[:d]->(v4) WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4, r5, r6, collect([v1, v2, v3, v4]) AS r7 WITH r1 + r5 + r6 + r7 AS rp, r2 + r3 + r4 AS rn RETURN [r IN rp WHERE NOT r IN rn] AS results POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑
  43. 43. POSITIVE DELTA QUERY FOR 𝑎 ⋈ 𝑏 ⋈ 𝑐 ഥ⋉ 𝑑  Workaround: knowing the change workload helps o Only consider changes in Δ𝑑 and 𝛻𝑑 o Query is cleaner and much more efficient UNWIND $nds AS nd WITH startNode(nd) AS v1, endNode(nd) AS v4 MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4) RETURN v1, v2, v3, v4  This can outperform recomputing the query from scratch.
  44. 44. LIST OPERATIONS IN CYPHER Instead of subqueries, use chained queries and combine lists. WITH [1, 1, 2, 2, 3] AS xs UNWIND xs AS x RETURN collect(DISTINCT x) AS unique WITH [1, 2, 3] AS xs, [2] AS ys RETURN xs + ys AS append, [x IN xs WHERE NOT x IN ys] AS subtraction, [x IN xs WHERE x IN ys] AS intersection WITH [1, 1, 2, 2, 3] AS xs RETURN reduce(acc = [], x in xs | acc + CASE x IN acc WHEN false THEN [x] ELSE [] END) AS unique Get unique list in openCypher
  45. 45. DELTA QUERIES IN CYPHER Delta queries are complex. Features that would be nice:  Subqueries //pattern comprehensions go some length  Named subqueries //help reusability  Subtracting lists //related: CIR-2017-180  Use collection elements or function results for matching These are probably too much to ask. -> recommended approach: compile directly to query plans. MATCH (n) WITH collect(n) AS ns MATCH (ns[0]) RETURN * MATCH (n) WITH collect(n) AS ns WITH ns[0] AS x MATCH (x) RETURN * 
  46. 46. CHALLENGES FOR PROPERTY GRAPH QUERIES Data model  NF2 (Non-First Normal Form): maps, lists  No schema (schema-optionality)  Graph structure Queries  Nulls, antijoins and left outerjoins  Updates on property values  Aggregates on aggregates, non-distributive functions  Ordering + skip/limit  Reachability queries
  47. 47. CHALLENGES FOR PROPERTY GRAPH QUERIES Data model  NF2  No schema  Graph Queries  Nulls  Updates  Aggregates  Ordering  Reachability A. Gupta, I. S. Mumick: Materialized Views. MIT Press, 1999 R. Chirkova, J. Yang: Materialized Views. Foundations and Trends in Databases, 2012         Decades of research -> 2 long surveys
  48. 48. OUR SURVEY OF RELATED IVM TECHNIQUES
  49. 49. DBTOASTER  Shows the scale of the problem  Relational data model and SQL queries  R&D for ~5 years @ EPFL, Johns Hopkins, Cornell, etc.  Approach o Queries over an algebraic ring o Higher-order recursive IVM  Compiler in OCaml  Backend with code generation for C++, Scala/Spark Christoph Koch et al.: DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views. VLDB Journal 2014
  50. 50. FUTURE DIRECTIONS  Work out derivation rules for Expand/VarExpand, …  Automate delta query derivation  Integrate to Neo4j  Run performance experiments o Train Benchmark (set semantics) o LDBC Social Network Benchmark’s BI workload (bag semantics)
  51. 51. Short news
  52. 52. LDBC BENCHMARKS  Social Network Benchmark o Business Intelligence workload published o openCypher reference implementation o Next goal: full conference paper  Graphalytics o Competition is online at graphalytics.org o Neo4j implementation (using the Graph Algorithms library) WIP graphalytics-platforms-neo4j/pull/6 Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton et al.: An early look at the LDBC Social Network Benchmark’s BI Workload. GRADES-NDA at SIGMOD, 2018 ldbc/ldbc_snb_implementations
  53. 53. GRAPH ANALYTICS ON THE PANAMA PAPERS  Network science approach: multidimensional graph metrics from social network analysis, biology, physics, etc.  Our work originally targeted software and system models.  Progress in 2018 o Q1: implemented adapters for Neo4j and CSV o Q2 goal: analyse Panama papers and using metrics Gábor Szárnyas, Zsolt Kővári, Ágnes Salánki, Dániel Varró: Towards the Characterization of Realistic Models: Evaluation of Multidisciplinary Graph Metrics, MODELS 2016 ftsrg/model-analyzer
  54. 54. MAPPING CYPHER TO SQL  Evaluate graph queries in an RDB - similar to ORM  Approaches o Cytosm: Cypher to SQL Mapper / gTop: graph topology o GraphGen – extracting graphs from RDBs o Ongoing work to map TCK to SQLite B. A. Steer, A. Alnaimi, M. Lotz, F. Cuadrado, L. Vaquero, J. Varvenne: Cytosm: Declarative Property Graph Queries Without Data Migration. GRADES 2017 K. Xirogiannopoulos, V. Srinivas, A. Deshpande: GraphGen: Adaptive Graph Processing using Relational Databases. GRADES 2017 cytosm/cytosm KonstantinosX/graphgen-project
  55. 55. NEO4J APOC LIBRARY  CSV loader that follows the schema of the neo4j-import tool  Goal o Use headers to generate LOAD CSV commands. o 1st pass: CALL apoc.import.csv.node(file, labels, …) o 2nd pass: CALL apoc.import.csv.relationship(file, type, …)  Result o Many corner cases -> ~700 LOC + tests o Covers most use cases, but is very slow o APOC PR pending neo4j-apoc-procedures/pull/581 neo4j-documentation/pull/121
  56. 56. SCOPING FOR OPENCYPHER p p x c c slizaa-opencypher-xtext/issues/7  Xtext grammar for the Slizaa software analysis workbench  Progress in 2018 o Q1: scope analyser implemented for Cypher grammar of M05 o Q2 goal: update to M10

×