Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Writing a Cypher Engine in Clojure
Dávid Szakállas (BME)
Gábor Szárnyas (BME, MTA)
3rd Budapest Clojure meetup 2018
AGENDA
Gábor:
 Graph databases
 Cypher language
 Query evaluation
Dávid:
 Graph exploration strategies
 Optimization
...
GRAPHS
𝐺 = 𝑉, 𝐸 𝐸 ⊆ 𝑉 × 𝑉
Textbook definition
Flavours:
 directed/undirected edges
 labels
 properties
GRAPH DATABASES
pattern
matching
NoSQL family
Data model: property graphs = vertices, edges, and properties
analytics
CYPHER
„Cypher is a declarative, SQL-inspired language for describing
patterns in graphs visually using an ascii-art synta...
DBMS POPULARITY BY DATA MODEL
version 2.0 introduces
the Cypher query language
OPENCYPHER SYSTEMS
 Goal: deliver a full and open specification of Cypher
 Relational databases:
o SAP HANA
o AGENS Grap...
USE CASES
 Analytics
o IT security
o Investigative journalism
 “Real-time” execution
o Fraud detection
o Recommendation ...
INGRAPH
 MTA-BME project
 PoC query engine for openCypher
 Primarily written in Scala
 Goals:
o Provide continuous eva...
GRAPH QUERY PROCESSING APPROACHES
 Relational: projection, selection, join
o SAP HANA
o Agens Graph (PostgreSQL + Cypher)...
RELATIONAL MODEL
person parent
Alice Sarah
Alice Bob
a b
Bob Sarah
Parent
Married
Person
{ name: Alice }
Person
{ name: Bo...
EXPLORATION BASED METHODS
 given this graph  and query
Person
{ name: Alice }
Person
{ name: Bob }
Person
{ name: Sarah ...
EXPLORATION BASED METHODS
Person
{ name: Alice }
Person
{ name: Bob }
Person
{ name: Sarah }
PARENT
PARENT
MARRIED
 DFT -...
EXPLORATION BASED METHODS
Goal: minimize state-space generated for a query.
 Cost based optimizations
 Constraint satisf...
EXPLORATION BASED METHODS
Goal: minimize state-space generated for a query.
 Iterative Dynamic Programming (IDP):
oGenera...
FULL VS GREEDY VS IDP
⋮
:e _ -[_]-> _
MATCH (a)-[:e]->(b)
RETURN a, b
FULL VS GREEDY VS IDP
⋮
:e _-[_]->_
MATCH (a)-[:e]->(b)
RETURN a, b
a-[_]->_ _-[_]->b
1,0 100,0
FULL VS GREEDY VS IDP
⋮
:e _-[_]->_
MATCH (a)-[:e]->(b)
RETURN a, b
a-[_]->_ _-[_]->b
1 100
a-[:e]->b a-[:e]->b
2 0.02
FULL VS GREEDY VS IDP
⋮
:e _-[_]->_
MATCH (a)-[:e]->(b)
RETURN a, b
a-[_]->_ _-[_]->b
1 100
_-[_]->_
a-[_]->_ _-[_]->b
1 1...
FULL VS GREEDY VS IDP
⋮
:e _-[_]->_
MATCH (a)-[:e]->(b)
RETURN a, b
a-[_]->_ _-[_]->b
1 100
_-[_]->_
a-[_]->_ _-[_]->b
1 1...
FULL VS GREEDY VS IDP
⋮
:e _-[_]->_
MATCH (a)-[:e]->(b)
RETURN a, b
a-[_]->_ _-[_]->b
1 100
a-[:e]->b
2
_-[_]->_
a-[_]->_ ...
FULL VS GREEDY VS IDP
⋮
:e _-[_]->_
MATCH (a)-[:e]->(b)
RETURN a, b
a-[_]->_ _-[_]->b
1 100
a-[:e]->b
2
_-[_]->_
a-[_]->_ ...
MODEL-SENSITIVE APPROACH
G. Varró, F. Deckwerth, M. Wieber, A. Schürr,
An algorithm for generating model-sensitive search ...
ADAPTING IT
Requirements
 Variables for edges and vertices
 Estimate based on the graph
 Map to specialized indexer ope...
CONSTRAINT DEFINITIONS
 Constraints and
operations are
separate
 Constraints can
imply other
constraints
OPERATIONS - CONSTRAINTS
OPERATIONS - CONSTRAINTS
IMMEDIATE
OPERATIONS – COSTS
 indexer support (context sensitive)
 Based on variables referenced in the constraints
 Given a (Has...
SIMPLE PATTERNS
Conjunction of positive terms
NEGATIVE PATTERNS
Compiles to a single constraint with inner constraints.
Negation needs at least one bound variable
BENCHMARKING
design
execution
CONCLUSION
 Difficult to adapt this OO algorithm to property graphs
 Clojure is well fitted to the problem
 Subpar perf...
Writing a Cypher Engine in Clojure
Writing a Cypher Engine in Clojure
Writing a Cypher Engine in Clojure
You’ve finished this document.
Download and read it offline.
Upcoming SlideShare
What to Upload to SlideShare
Next
Upcoming SlideShare
What to Upload to SlideShare
Next
Download to read offline and view in fullscreen.

Share

Writing a Cypher Engine in Clojure

Download to read offline

This talk shows how to run openCypher property graph queries with a search-based algorithm. Presented at the Budapest Clojure meetup.

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all
  • Be the first to like this

Writing a Cypher Engine in Clojure

  1. 1. Writing a Cypher Engine in Clojure Dávid Szakállas (BME) Gábor Szárnyas (BME, MTA) 3rd Budapest Clojure meetup 2018
  2. 2. AGENDA Gábor:  Graph databases  Cypher language  Query evaluation Dávid:  Graph exploration strategies  Optimization  How to compile patterns
  3. 3. GRAPHS 𝐺 = 𝑉, 𝐸 𝐸 ⊆ 𝑉 × 𝑉 Textbook definition Flavours:  directed/undirected edges  labels  properties
  4. 4. GRAPH DATABASES pattern matching NoSQL family Data model: property graphs = vertices, edges, and properties analytics
  5. 5. CYPHER „Cypher is a declarative, SQL-inspired language for describing patterns in graphs visually using an ascii-art syntax.” MATCH (pers:Person)-[:PRESENTERS]-> (:Presentation)<-[:TOPIC_OF]->(proj:Project) WHERE proj.name = 'ingraph' RETURN pers.name pers.name David Gabor
  6. 6. DBMS POPULARITY BY DATA MODEL version 2.0 introduces the Cypher query language
  7. 7. OPENCYPHER SYSTEMS  Goal: deliver a full and open specification of Cypher  Relational databases: o SAP HANA o AGENS Graph  Research prototypes: o Graphflow (Univesity of Waterloo) o ingraph (incremental graph engine) (Source: Keynote talk @ GraphConnect NYC 2017)
  8. 8. USE CASES  Analytics o IT security o Investigative journalism  “Real-time” execution o Fraud detection o Recommendation systems
  9. 9. INGRAPH  MTA-BME project  PoC query engine for openCypher  Primarily written in Scala  Goals: o Provide continuous evaluation (incremental view maintenance) o Cover standard openCypher constructs o Does not work efficiently for one-time query execution Szárnyas, G. et al., IncQuery-D: A distributed incremental model query framework in the cloud. MODELS, 2014, https://link.springer.com/chapter/10.1007/978-3-319-11653-2_40
  10. 10. GRAPH QUERY PROCESSING APPROACHES  Relational: projection, selection, join o SAP HANA o Agens Graph (PostgreSQL + Cypher) o ingraph  Relational + expand o Neo4j  Constraint satisfaction o VIATRA (BME project since 2002): “local search” engine o Our proposed openCypher engine (this talk)
  11. 11. RELATIONAL MODEL person parent Alice Sarah Alice Bob a b Bob Sarah Parent Married Person { name: Alice } Person { name: Bob } Person { name: Sarah } PARENT PARENT MARRIED PROPERTY GRAPH label properties edge type
  12. 12. EXPLORATION BASED METHODS  given this graph  and query Person { name: Alice } Person { name: Bob } Person { name: Sarah } PARENT PARENT MARRIED MATCH (u:Person) -[:PARENT]-> (f:Person) -[m:MARRIED]-> (m:Person) RETURN m, f find an execution plan
  13. 13. EXPLORATION BASED METHODS Person { name: Alice } Person { name: Bob } Person { name: Sarah } PARENT PARENT MARRIED  DFT -> small memory footprint  Small state-space -> less steps
  14. 14. EXPLORATION BASED METHODS Goal: minimize state-space generated for a query.  Cost based optimizations  Constraint satisfaction problems  SAT is an NP-full problem  Greedy algorithm unsatisfactory
  15. 15. EXPLORATION BASED METHODS Goal: minimize state-space generated for a query.  Iterative Dynamic Programming (IDP): oGeneralization of DP and greedy algorithms oPolynomial complexity oMultiple variants oWidely researched in the RDMS area -> use iterative dynamic programming
  16. 16. FULL VS GREEDY VS IDP ⋮ :e _ -[_]-> _ MATCH (a)-[:e]->(b) RETURN a, b
  17. 17. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1,0 100,0
  18. 18. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1 100 a-[:e]->b a-[:e]->b 2 0.02
  19. 19. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1 100 _-[_]->_ a-[_]->_ _-[_]->b 1 100 a-[:e]->b a-[:e]->b 2 0.02
  20. 20. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1 100 _-[_]->_ a-[_]->_ _-[_]->b 1 100 a-[:e]->b a-[:e]->b 2 0.02
  21. 21. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1 100 a-[:e]->b 2 _-[_]->_ a-[_]->_ _-[_]->b 1 100 a-[:e]->b a-[:e]->b 2 0.02
  22. 22. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1 100 a-[:e]->b 2 _-[_]->_ a-[_]->_ _-[_]->b 1 100 a-[:e]->b a-[:e]->b 2 0.02 IDP: Something in between...
  23. 23. MODEL-SENSITIVE APPROACH G. Varró, F. Deckwerth, M. Wieber, A. Schürr, An algorithm for generating model-sensitive search plans for pattern matching on EMF models, Software and Systems Modeling, 2013  Derives cost based on the graph elements  Object-oriented approach  Different goal: model validation Idea: adapt to property graphs.
  24. 24. ADAPTING IT Requirements  Variables for edges and vertices  Estimate based on the graph  Map to specialized indexer operations Difficulties  No type information  Original algorithm oblivious to edges with properties
  25. 25. CONSTRAINT DEFINITIONS  Constraints and operations are separate  Constraints can imply other constraints
  26. 26. OPERATIONS - CONSTRAINTS
  27. 27. OPERATIONS - CONSTRAINTS IMMEDIATE
  28. 28. OPERATIONS – COSTS  indexer support (context sensitive)  Based on variables referenced in the constraints  Given a (HasLabels a t) and a (Vertex a) constraints we can get the number of vertices for that label
  29. 29. SIMPLE PATTERNS Conjunction of positive terms
  30. 30. NEGATIVE PATTERNS Compiles to a single constraint with inner constraints. Negation needs at least one bound variable
  31. 31. BENCHMARKING design execution
  32. 32. CONCLUSION  Difficult to adapt this OO algorithm to property graphs  Clojure is well fitted to the problem  Subpar performance from the initial implementation  Looking for maintainers  Interested? Contact us: @szarnyasg @szdavid92 FTSRG/ingraph Dávid Szakállas: Evaluation of openCypher Graph Queries with a Local Search-Based Algorithm. Master’s thesis, Budapest University of Technology and Economics, 2017.

This talk shows how to run openCypher property graph queries with a search-based algorithm. Presented at the Budapest Clojure meetup.

Views

Total views

527

On Slideshare

0

From embeds

0

Number of embeds

1

Actions

Downloads

6

Shares

0

Comments

0

Likes

0

×