Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Mapping Graph Queries to PostgreSQL
Gábor Szárnyas, József Marton,
Márton Elekes, János Benjamin Antal
Budapest DB Meetup ...
PROPERTY GRAPH DATABASES
NoSQL family
Data model:
 nodes
 edges
 properties
#1 query approach:
graph pattern matching
RANKINGS: POPULARITY CHANGES PER CATEGORY
CYPHER AND OPENCYPHER
Cypher: query language of the Neo4j graph database.
„Cypher is a declarative, SQL-inspired language ...
OPENCYPHER SYSTEMS
 Increasing adoption
 Relational databases:
o SAP HANA
o AGENS Graph
 Research prototypes:
o Graphfl...
PROPERTY GRAPHS
 Textbook graph: 𝐺 = 𝑉, 𝐸
o Dijkstra, Ford-Fulkerson, etc.
o Homogenous nodes
o Homogeneous edges
 Exten...
GRAPH VS. RELATIONAL DATABASES
 Graph databases
o Graph-based modelling is intuitive
o Concise query language
 Relationa...
PROPOSED APPROACH
To get the best of both worlds, map Cypher queries to SQL:
1. Formulate queries in Cypher
2. Execute ins...
GRAPH QUERIES IN CYPHER
 Subgraph matching and graph traversals
 Example: Alice’s colleagues and their colleagues
MATCH ...
Data and Query Mapping
MAPPING BETWEEN DATA MODELS
Class 2
Objects
:Class1
attr1: String
attr2: int
:Class 2
attr1: String
attr2: int
Graph
:Cour...
DATA MAPPING #1: GENERIC SCHEMA
 Useful for representing schema-free data sets
 Hopelessly slow
Bob
53
Carol
38
A Ltd.
D...
DATA MAPPING #2: CONCRETE SCHEMA
QUERY MAPPING
Cypher
query
graph
relational
algebra
SQL
algebra
SQL
query
MATCH (p:Person)
-[:COLL|:FRIEND]-()
…
RETURN p
...
A SIMPLE EXAMPLE
MATCH (node)
WITH node.name AS name
RETURN name ◯(𝑛𝑜𝑑𝑒,𝑛𝑜𝑑𝑒.𝑛𝑎𝑚𝑒)
𝜋 𝑛𝑜𝑑𝑒.𝑛𝑎𝑚𝑒→𝑛𝑎𝑚𝑒
𝜋 𝑛𝑎𝑚𝑒
SELECT "name"
F...
CHALLENGES #1
 Variable length paths: union of multiple subqueries
 Unbound: WITH RECURSIVE (fixpoint-based evaluation)
...
CHALLENGES #2
 Edges are directed
o Undirectedness is modelled in the query
o Union of both directions
MATCH (p1:Person …...
CHALLENGES #3
 Multiple tables as sources
MATCH (p1:Person …)-[:COLL|:FRIEND]-(p2:Person)
RETURN p2.name
Bob
53
Carol
38
...
CHALLENGES #1 #2 #3
Simple graph patterns turn to many subqueries
 Querying edges as undirected:
• Enumerate edges in bot...
A COMPLEX EXAMPLE
WITH
q0 AS
(-- GetVerticesWithGTop
SELECT
ROW(0, p_personid)::vertex_type AS "_e186#0",
"p_personid" AS ...
Benchmarks
BENCHMARKS: LINKED DATA BENCHMARK COUNCIL
LDBC is a non-profit organization dedicated to establishing benchmarks,
benchmar...
LDBC IN A NUTSHELL
Peter Boncz, Thomas Neumann, Orri Erling,
TPC-H Analyzed: Hidden Messages and Lessons Learned
from an I...
PERFORMANCE EXPERIMENTS
 LDBC Interactive workload
 Tools
o PostgreSQL (reference implementation)
o Cypher-to-SQL querie...
BENCHMARK RESULTS ON LDBC QUERIES
RELATED PROJECTS
 Cytosm
o Cypher to SQL Mapping
o HP Labs for Vertica
o Project abandoned in 2017
o gTop (graph topology...
SUMMARY
 Mapping property graph queries to SQL is challenging
o Similar to ORM
o + edge properties
o + reachability
 Ini...
RELATED RESOURCES
ingraph and C2S github.com/ftsrg/ingraph
Cypher for Apache Spark github.com/opencypher/cypher-for-apache...
Upcoming SlideShare
Loading in …5
×

Mapping Graph Queries to PostgreSQL

47 views

Published on

Talk on compiling Cypher queries to PostgreSQL, presented at the Budapest Database Meetup in November 2018

Published in: Engineering
  • Be the first to comment

  • Be the first to like this

Mapping Graph Queries to PostgreSQL

  1. 1. Mapping Graph Queries to PostgreSQL Gábor Szárnyas, József Marton, Márton Elekes, János Benjamin Antal Budapest DB Meetup — 2018/Nov/13
  2. 2. PROPERTY GRAPH DATABASES NoSQL family Data model:  nodes  edges  properties #1 query approach: graph pattern matching
  3. 3. RANKINGS: POPULARITY CHANGES PER CATEGORY
  4. 4. CYPHER AND OPENCYPHER Cypher: query language of the Neo4j graph database. „Cypher is a declarative, SQL-inspired language for describing patterns in graphs visually using an ascii-art syntax.” MATCH (d:Database)<-[:RELATED_TO]-(:Talk)-[:PRESENTED_AT]->(m:Meetup) WHERE m.date = 'Tuesday, November 13, 2018' RETURN d „The openCypher project aims to deliver a full and open specification of the industry’s most widely adopted graph database query language: Cypher.” (late 2015)
  5. 5. OPENCYPHER SYSTEMS  Increasing adoption  Relational databases: o SAP HANA o AGENS Graph  Research prototypes: o Graphflow (Univesity of Waterloo, Canada) o ingraph (incremental graph engine @ BME) (Source: Keynote talk @ GraphConnect NYC 2017)
  6. 6. PROPERTY GRAPHS  Textbook graph: 𝐺 = 𝑉, 𝐸 o Dijkstra, Ford-Fulkerson, etc. o Homogenous nodes o Homogeneous edges  Extensions o Labelled nodes o Typed edges o Properties  The schema is implicit  Very intuitive o Things and connections Bob 53 Carol 38 A Ltd. David 47 20182007 Erin 30 name Person, Student Company COLLEAGUE FRIEND WORKS_AT since name age Person
  7. 7. GRAPH VS. RELATIONAL DATABASES  Graph databases o Graph-based modelling is intuitive o Concise query language  Relational databases o Most common o Many legacy systems o Efficient and mature  No tools available to bridge the two o i.e. query data in RDBs as a graph o first you have to wrangle the graph out of the RDB Col1 Col2 1 A 2 B Tables Graph :Course name: 'Phys1' REQUIRES :Course name: 'Phys2'ENROLLED :Student name: 'John' ENROLLED
  8. 8. PROPOSED APPROACH To get the best of both worlds, map Cypher queries to SQL: 1. Formulate queries in Cypher 2. Execute inside an existing RDB 3. Return results as a graph relation
  9. 9. GRAPH QUERIES IN CYPHER  Subgraph matching and graph traversals  Example: Alice’s colleagues and their colleagues MATCH (p1:Person {name: 'Alice'})-[c:COLL*1..2]-(p2:Person) RETURN p2.name Bob 53 Carol 38 A Ltd. David 47 20182007 Erin 30 name Person, Student Company COLLEAGUE FRIEND WORKS_ATsince name age Person
  10. 10. Data and Query Mapping
  11. 11. MAPPING BETWEEN DATA MODELS Class 2 Objects :Class1 attr1: String attr2: int :Class 2 attr1: String attr2: int Graph :Course name: 'Phys1' REQUIRES :Course name: 'Phys2'ENROLLED :Student name: 'John' ENROLLED object- graph mapping object-relational mapping (ORM) e.g. JPA, Entity Framework graph-relational mapping Col1 Col2 1 A 2 B Tables
  12. 12. DATA MAPPING #1: GENERIC SCHEMA  Useful for representing schema-free data sets  Hopelessly slow Bob 53 Carol 38 A Ltd. David 47 20182007 Erin 30
  13. 13. DATA MAPPING #2: CONCRETE SCHEMA
  14. 14. QUERY MAPPING Cypher query graph relational algebra SQL algebra SQL query MATCH (p:Person) -[:COLL|:FRIEND]-() … RETURN p ◯ 𝑃𝑒𝑟𝑠𝑜𝑛 ⋈ 𝜎 𝑎=5 ◯ → ◯ COLL, FRIEND 𝜋 𝑝 ⋈ 𝑐𝑜𝑙1,𝑐𝑜𝑙2 𝜎 𝑎=5 𝜋 𝑝 compiler mapping code generation ◯ person ◯ → ◯ colleague ◯ → ◯ friend ⋃ SELECT … FROM … SELECT … FROM … SELECT … FROM … SELECT … FROM … SELECT … FROM … SELECT … FROM … SELECT … FROM … Gábor Szárnyas, József Marton, Dániel Varró: Formalising openCypher Graph Queries in Relational Algebra. ADBIS 2017
  15. 15. A SIMPLE EXAMPLE MATCH (node) WITH node.name AS name RETURN name ◯(𝑛𝑜𝑑𝑒,𝑛𝑜𝑑𝑒.𝑛𝑎𝑚𝑒) 𝜋 𝑛𝑜𝑑𝑒.𝑛𝑎𝑚𝑒→𝑛𝑎𝑚𝑒 𝜋 𝑛𝑎𝑚𝑒 SELECT "name" FROM ( SELECT "node.name" AS "name" FROM ( SELECT vertex_id AS "node", (SELECT value FROM vertex_property WHERE parent = vertex_id AND key = 'name') AS "node.name" FROM vertex ) ) Cypher query relational algebra tree ingraph mapping
  16. 16. CHALLENGES #1  Variable length paths: union of multiple subqueries  Unbound: WITH RECURSIVE (fixpoint-based evaluation)  WITH RECURSIVE was introduced in SQL:1999 but o PostgreSQL 8.4+ (since 2009) o SQLite 3.8.3+ (since 2014) o MySQL 8.0.1+ (since 2017) MATCH (p1:Person {name: 'Alice'})-[c:COLL*1..2]-(p2:Person) RETURN p2.name MATCH (p1:Person {name: 'Alice'})-[c:COLL*]-(p2:Person) RETURN p2.name
  17. 17. CHALLENGES #2  Edges are directed o Undirectedness is modelled in the query o Union of both directions MATCH (p1:Person …)-[:FRIEND]-(p2:Person), (p2)-[:WORKS_AT]->(c:Company) RETURN p2.name, c.name Bob 53 Carol 38 A Ltd. David 47 20182007 Erin 30 name Person, Student Company COLLEAGUE FRIEND WORKS_ATsince name age Person
  18. 18. CHALLENGES #3  Multiple tables as sources MATCH (p1:Person …)-[:COLL|:FRIEND]-(p2:Person) RETURN p2.name Bob 53 Carol 38 David 47 Erin 30 Person, Student COLLEAGUE FRIEND name age Person id name 1 Alice 2 Bob person p1 p2 1 2 1 3 colleague p1 p2 2 1 5 4 friend name age
  19. 19. CHALLENGES #1 #2 #3 Simple graph patterns turn to many subqueries  Querying edges as undirected: • Enumerate edges in both directions (2 ×)  Variable length paths (1..L, *): • Limited: enumerate 1, 2, … , 𝐿 → 𝐿 × • Unlimited: use WITH RECURSIVE  Multiple node labels/edge types: • Enumerate all source tables (𝑁 ×) Total: union of 2 × 𝐿 × 𝑁 subqueries in SQL Bob 53 Carol 38 David 47 Person, Student COLLEAGUE FRIEND name age Person MATCH (p1:Person …)-[:COLL|:FRIEND*1..2]-(p2:Person)
  20. 20. A COMPLEX EXAMPLE WITH q0 AS (-- GetVerticesWithGTop SELECT ROW(0, p_personid)::vertex_type AS "_e186#0", "p_personid" AS "_e186.id#0" FROM person), q1 AS (-- Selection SELECT * FROM q0 AS subquery WHERE ("_e186.id#0" = :personId)), q2 AS (-- GetEdgesWithGTop SELECT ROW(0, edgeTable."k_person1id")::vertex_type AS "_e186#0", ROW(0, edgeTable."k_person1id", edgeTable."k_person2id")::edge_type AS "_e187#0", ROW(0, edgeTable."k_person2id")::vertex_type AS "friend#2", toTable."p_personid" AS "friend.id#2", toTable."p_firstname" AS "friend.firstName#1", toTable."p_lastname" AS "friend.lastName#2" FROM "knows" edgeTable JOIN "person" toTable ON (edgeTable."k_person2id" = toTable."p_personid")), q3 AS (-- GetEdgesWithGTop SELECT ROW(0, edgeTable."k_person1id")::vertex_type AS "friend#2", ROW(0, edgeTable."k_person1id", edgeTable."k_person2id")::edge_type AS "_e187#0", ROW(0, edgeTable."k_person2id")::vertex_type AS "_e186#0", fromTable."p_personid" AS "friend.id#2", fromTable."p_firstname" AS "friend.firstName#1", fromTable."p_lastname" AS "friend.lastName#2" FROM "knows" edgeTable JOIN "person" fromTable ON (fromTable."p_personid" = edgeTable."k_person1id")), q4 AS (-- UnionAll SELECT "_e186#0", "_e187#0", "friend#2", "friend.id#2", "friend.firstName#1", "friend.lastName#2" FROM q2 UNION ALL SELECT "_e186#0", "_e187#0", "friend#2", "friend.id#2", "friend.firstName#1", "friend.lastName#2" FROM q3), q5 AS (-- EquiJoinLike SELECT left_query."_e186#0", left_query."_e186.id#0", right_query."friend#2", right_query."friend.id#2", right_query."_e187#0", right_query."friend.lastName#2", right_query."friend.firstName#1" FROM q1 AS left_query INNER JOIN q4 AS right_query ON left_query."_e186#0" = right_query."_e186#0"), q6 AS (-- GetEdgesWithGTop SELECT ROW(6, fromTable."m_messageid")::vertex_type AS "message#17", ROW(8, fromTable."m_messageid", fromTable."m_creatorid")::edge_type AS "_e188#0", ROW(0, fromTable."m_creatorid")::vertex_type AS "friend#2", fromTable."m_messageid" AS "message.id#2", fromTable."m_content" AS "message.content#2", fromTable."m_ps_imagefile" AS "message.imageFile#0", fromTable."m_creationdate" AS "message.creationDate#13" FROM "message" fromTable WHERE (fromTable."m_c_replyof" IS NULL)), q7 AS (-- GetEdgesWithGTop SELECT ROW(6, fromTable."m_messageid")::vertex_type AS "message#17", ROW(8, fromTable."m_messageid", fromTable."m_creatorid")::edge_type AS "_e188#0", ROW(0, fromTable."m_creatorid")::vertex_type AS "friend#2", fromTable."m_messageid" AS "message.id#2", fromTable."m_content" AS "message.content#2", fromTable."m_creationdate" AS "message.creationDate#13" FROM "message" fromTable WHERE (fromTable."m_c_replyof" IS NOT NULL)), q8 AS (-- UnionAll SELECT "message#17", "_e188#0", "friend#2", "message.id#2", "message.content#2", "message.imageFile#0", "message.creationDate#13" FROM q6 UNION ALL SELECT "message#17", "_e188#0", "friend#2", "message.id#2", "message.content#2", NULL AS "message.imageFile#0", "message.creationDate#13" FROM q7), q9 AS (-- EquiJoinLike SELECT left_query."_e186#0", left_query."_e186.id#0", left_query."_e187#0", left_query."friend#2", left_query."friend.id#2", left_query."friend.firstName#1", left_query."friend.lastName#2", right_query."message#17", right_query."message.id#2", right_query."message.imageFile#0", right_query."_e188#0", right_query."message.creationDate#13", right_query."message.content#2" FROM q5 AS left_query INNER JOIN q8 AS right_query ON left_query."friend#2" = right_query."friend#2"), q10 AS (-- AllDifferent SELECT * FROM q9 AS subquery WHERE is_unique(ARRAY[]::edge_type[] || "_e188#0" || "_e187#0")), q11 AS (-- Selection SELECT * FROM q10 AS subquery WHERE ("message.creationDate#13" <= :maxDate)), q12 AS (-- Projection SELECT "friend.id#2" AS "personId#0", "friend.firstName#1" AS "personFirstName#0", "friend.lastName#2" AS "personLastName#0", "message.id#2" AS "postOrCommentId#0", CASE WHEN ("message.content#2" IS NOT NULL = true) THEN "message.content#2" ELSE "message.imageFile#0" END AS "postOrCommentContent#0", "message.creationDate#13" AS "postOrCommentCreationDate#0" FROM q11 AS subquery), q13 AS (-- SortAndTop SELECT * FROM q12 AS subquery ORDER BY "postOrCommentCreationDate#0" DESC NULLS LAST, ("postOrCommentId#0")::BIGINT ASC NULLS FIRST LIMIT 20) SELECT "personId#0" AS "personId", "personFirstName#0" AS "personFirstName", "personLastName#0" AS "personLastName", "postOrCommentId#0" AS "postOrCommentId", "postOrCommentContent#0" AS "postOrCommentContent", "postOrCommentCreationDate#0" AS "postOrCommentCreationDate" FROM q13 AS subquery MATCH (:Person {id:$personId})-[:KNOWS]-(friend:Person)<- [:HAS_CREATOR]-(message:Message) WHERE message.creationDate <= $maxDate RETURN friend.id AS personId, friend.firstName AS personFirstName, friend.lastName AS personLastName, message.id AS postOrCommentId, CASE exists(message.content) WHEN true THEN message.content ELSE message.imageFile END AS postOrCommentContent, message.creationDate AS postOrCommentCreationDate ORDER BY postOrCommentCreationDate DESC, toInteger(postOrCommentId) ASC LIMIT 20
  21. 21. Benchmarks
  22. 22. BENCHMARKS: LINKED DATA BENCHMARK COUNCIL LDBC is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software. The Social Network Benchmark is an industrial and academic initiative, formed by principal actors in the field of graph-like data management.
  23. 23. LDBC IN A NUTSHELL Peter Boncz, Thomas Neumann, Orri Erling, TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark, TPCTC 2013 Gábor Szárnyas, József Marton, János Benjamin Antal et al.: An early look at the LDBC Social Network Benchmark’s BI Workload. GRADES-NDA at SIGMOD, 2018 Orri Erling et al., The LDBC Social Network Benchmark: Interactive Workload, SIGMOD 2015
  24. 24. PERFORMANCE EXPERIMENTS  LDBC Interactive workload  Tools o PostgreSQL (reference implementation) o Cypher-to-SQL queries on PostgreSQL o Semantic database (anonymized)  Geometric mean of 20+ executions
  25. 25. BENCHMARK RESULTS ON LDBC QUERIES
  26. 26. RELATED PROJECTS  Cytosm o Cypher to SQL Mapping o HP Labs for Vertica o Project abandoned in 2017 o gTop (graph topology) reused  Cypher for Apache Spark o Neo4j’s project o Executes queries in Spark o Read-only Tool Source Target OSS Updates Paths CAPS Cypher SparkSQL    Cytosm Cypher Vertica SQL    Cypher-to-SQL Cypher PostgreSQL   
  27. 27. SUMMARY  Mapping property graph queries to SQL is challenging o Similar to ORM o + edge properties o + reachability  Initial implementation: C2S o Moderate feature coverage o Poor performance o Needs some tweaks, e.g. work around CTE optimization fences Gábor Szárnyas, József Marton, János Maginecz, Dániel Varró: Incremental View Maintenance on Property Graphs. arXiv preprint 2018
  28. 28. RELATED RESOURCES ingraph and C2S github.com/ftsrg/ingraph Cypher for Apache Spark github.com/opencypher/cypher-for-apache-spark Cytosm github.com/cytosm/cytosm LDBC github.com/ldbc/ Thanks for the contributions to the whole ingraph team.

×