Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Writing a Cypher Engine in Clojure

145 views

Published on

This talk shows how to run openCypher property graph queries with a search-based algorithm. Presented at the Budapest Clojure meetup.

Published in: Engineering
  • Be the first to comment

  • Be the first to like this

Writing a Cypher Engine in Clojure

  1. 1. Writing a Cypher Engine in Clojure Dávid Szakállas (BME) Gábor Szárnyas (BME, MTA) 3rd Budapest Clojure meetup 2018
  2. 2. AGENDA Gábor:  Graph databases  Cypher language  Query evaluation Dávid:  Graph exploration strategies  Optimization  How to compile patterns
  3. 3. GRAPHS 𝐺 = 𝑉, 𝐸 𝐸 ⊆ 𝑉 × 𝑉 Textbook definition Flavours:  directed/undirected edges  labels  properties
  4. 4. GRAPH DATABASES pattern matching NoSQL family Data model: property graphs = vertices, edges, and properties analytics
  5. 5. CYPHER „Cypher is a declarative, SQL-inspired language for describing patterns in graphs visually using an ascii-art syntax.” MATCH (pers:Person)-[:PRESENTERS]-> (:Presentation)<-[:TOPIC_OF]->(proj:Project) WHERE proj.name = 'ingraph' RETURN pers.name pers.name David Gabor
  6. 6. DBMS POPULARITY BY DATA MODEL version 2.0 introduces the Cypher query language
  7. 7. OPENCYPHER SYSTEMS  Goal: deliver a full and open specification of Cypher  Relational databases: o SAP HANA o AGENS Graph  Research prototypes: o Graphflow (Univesity of Waterloo) o ingraph (incremental graph engine) (Source: Keynote talk @ GraphConnect NYC 2017)
  8. 8. USE CASES  Analytics o IT security o Investigative journalism  “Real-time” execution o Fraud detection o Recommendation systems
  9. 9. INGRAPH  MTA-BME project  PoC query engine for openCypher  Primarily written in Scala  Goals: o Provide continuous evaluation (incremental view maintenance) o Cover standard openCypher constructs o Does not work efficiently for one-time query execution Szárnyas, G. et al., IncQuery-D: A distributed incremental model query framework in the cloud. MODELS, 2014, https://link.springer.com/chapter/10.1007/978-3-319-11653-2_40
  10. 10. GRAPH QUERY PROCESSING APPROACHES  Relational: projection, selection, join o SAP HANA o Agens Graph (PostgreSQL + Cypher) o ingraph  Relational + expand o Neo4j  Constraint satisfaction o VIATRA (BME project since 2002): “local search” engine o Our proposed openCypher engine (this talk)
  11. 11. RELATIONAL MODEL person parent Alice Sarah Alice Bob a b Bob Sarah Parent Married Person { name: Alice } Person { name: Bob } Person { name: Sarah } PARENT PARENT MARRIED PROPERTY GRAPH label properties edge type
  12. 12. EXPLORATION BASED METHODS  given this graph  and query Person { name: Alice } Person { name: Bob } Person { name: Sarah } PARENT PARENT MARRIED MATCH (u:Person) -[:PARENT]-> (f:Person) -[m:MARRIED]-> (m:Person) RETURN m, f find an execution plan
  13. 13. EXPLORATION BASED METHODS Person { name: Alice } Person { name: Bob } Person { name: Sarah } PARENT PARENT MARRIED  DFT -> small memory footprint  Small state-space -> less steps
  14. 14. EXPLORATION BASED METHODS Goal: minimize state-space generated for a query.  Cost based optimizations  Constraint satisfaction problems  SAT is an NP-full problem  Greedy algorithm unsatisfactory
  15. 15. EXPLORATION BASED METHODS Goal: minimize state-space generated for a query.  Iterative Dynamic Programming (IDP): oGeneralization of DP and greedy algorithms oPolynomial complexity oMultiple variants oWidely researched in the RDMS area -> use iterative dynamic programming
  16. 16. FULL VS GREEDY VS IDP ⋮ :e _ -[_]-> _ MATCH (a)-[:e]->(b) RETURN a, b
  17. 17. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1,0 100,0
  18. 18. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1 100 a-[:e]->b a-[:e]->b 2 0.02
  19. 19. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1 100 _-[_]->_ a-[_]->_ _-[_]->b 1 100 a-[:e]->b a-[:e]->b 2 0.02
  20. 20. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1 100 _-[_]->_ a-[_]->_ _-[_]->b 1 100 a-[:e]->b a-[:e]->b 2 0.02
  21. 21. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1 100 a-[:e]->b 2 _-[_]->_ a-[_]->_ _-[_]->b 1 100 a-[:e]->b a-[:e]->b 2 0.02
  22. 22. FULL VS GREEDY VS IDP ⋮ :e _-[_]->_ MATCH (a)-[:e]->(b) RETURN a, b a-[_]->_ _-[_]->b 1 100 a-[:e]->b 2 _-[_]->_ a-[_]->_ _-[_]->b 1 100 a-[:e]->b a-[:e]->b 2 0.02 IDP: Something in between...
  23. 23. MODEL-SENSITIVE APPROACH G. Varró, F. Deckwerth, M. Wieber, A. Schürr, An algorithm for generating model-sensitive search plans for pattern matching on EMF models, Software and Systems Modeling, 2013  Derives cost based on the graph elements  Object-oriented approach  Different goal: model validation Idea: adapt to property graphs.
  24. 24. ADAPTING IT Requirements  Variables for edges and vertices  Estimate based on the graph  Map to specialized indexer operations Difficulties  No type information  Original algorithm oblivious to edges with properties
  25. 25. CONSTRAINT DEFINITIONS  Constraints and operations are separate  Constraints can imply other constraints
  26. 26. OPERATIONS - CONSTRAINTS
  27. 27. OPERATIONS - CONSTRAINTS IMMEDIATE
  28. 28. OPERATIONS – COSTS  indexer support (context sensitive)  Based on variables referenced in the constraints  Given a (HasLabels a t) and a (Vertex a) constraints we can get the number of vertices for that label
  29. 29. SIMPLE PATTERNS Conjunction of positive terms
  30. 30. NEGATIVE PATTERNS Compiles to a single constraint with inner constraints. Negation needs at least one bound variable
  31. 31. BENCHMARKING design execution
  32. 32. CONCLUSION  Difficult to adapt this OO algorithm to property graphs  Clojure is well fitted to the problem  Subpar performance from the initial implementation  Looking for maintainers  Interested? Contact us: @szarnyasg @szdavid92 FTSRG/ingraph Dávid Szakállas: Evaluation of openCypher Graph Queries with a Local Search-Based Algorithm. Master’s thesis, Budapest University of Technology and Economics, 2017.

×