  1. Cypher Query Language Chicago Graph Database Meet-Up Max De Marzi
  2. What is Cypher? • Graph Query Language for Neo4j • Aims to make querying simple
  3. Why Cypher? • Existing Neo4j query mechanisms were not simple enough • Too verbose (Java API) • Too prescriptive (Gremlin)
  4. SQL? • Unable to express paths • these are crucial for graph-based reasoning • Neo4j is schema/table free
  5. SPARQL? • SPARQL designed for a different data model • namespaces • properties as nodes • high learning curve
  6. Design
  7. Design Decisions Declarative: most of the time, Neo4j knows better than you. Imperative: follow relationship; breadth-first vs depth-first; explicit algorithm. Declarative: specify starting point; specify desired outcome; algorithm adaptable based on query
  8-13. Design Decisions Pattern matching (diagram slides showing nodes A, B and C)
  14. Design Decisions ASCII-art patterns () --> ()
  15. Design Decisions Directed relationship (A) --> (B)
  16. Design Decisions Undirected relationship (A) -- (B)
  17. Design Decisions Specific relationships A -[:LOVES]-> B
  18. Design Decisions Joined paths A --> B --> C
  19. Design Decisions Multiple paths A --> B --> C, A --> C or, equivalently, A --> B --> C <-- A
  20. Design Decisions Variable length paths (one or more hops) A -[*]-> B
  21. Design Decisions Optional relationships A -[?]-> B
  22. Design Decisions Familiar for SQL users. SQL clauses: select, from, where, group by, order by. Cypher clauses: start, match, where, return, order by
  23. START SQL: SELECT * FROM Person WHERE firstName = "Max" Cypher: START max=node:persons(firstName = "Max") RETURN max
  24. MATCH SQL: SELECT skills.* FROM users JOIN skills ON users.id = skills.user_id WHERE users.id = 101 Cypher: START user = node(101) MATCH user --> skills RETURN skills
  25. Optional MATCH SQL: SELECT skills.* FROM users LEFT JOIN skills ON users.id = skills.user_id WHERE users.id = 101 Cypher: START user = node(101) MATCH user -[?]-> skills RETURN skills
  26. SELECT skills.*, user_skill.* FROM users JOIN user_skill ON users.id = user_skill.user_id JOIN skills ON user_skill.skill_id = skills.id WHERE users.id = 1
  27. START user = node(1) MATCH user -[user_skill]-> skill RETURN skill, user_skill
  28. Indexes Used as multiple starting points, not to speed up any traversals START a = node:nodes_index(type='User') MATCH a-[r:knows]-b RETURN ID(a), ID(b), r.weight
  30. Complicated Match Some UGLY recursive self join on the groups table START max=node:person(name="Max") MATCH group <-[:BELONGS_TO*]- max RETURN group
  31. Where SELECT person.* FROM person WHERE person.age >32 OR = "bald" START person = node:persons("name:*") WHERE person.age >32 OR = "bald" RETURN person
  32. Return SELECT, count(*) FROM Person GROUP BY ORDER BY START person=node:persons("name:*") RETURN, count(*) ORDER BY
  33. Order By, Parameters Same as SQL {node_id} expected as part of request START me = node({node_id}) MATCH (me)-[?:follows]->(friends)-[?:follows]->(fof)-[?:follows]->(fofof)-[?:follows]->others RETURN,,,, count(others) ORDER BY,,, count(others) DESC
  35. Graph Functions Some UGLY multiple recursive self and inner joins on the user and all related tables START lucy=node(1000), kevin=node(759) MATCH p = shortestPath( lucy-[*]-kevin ) RETURN p
  36. Aggregate Functions ID: get the neo4j assigned identifier Count: add up the number of occurrences Min: get the lowest value Max: get the highest value Avg: get the average of a numeric value Distinct: remove duplicates START me = node:nodes_index(type = 'user') MATCH (me)-[r?:wrote]-() RETURN ID(me),, count(r), min(, max(" ORDER BY ID(me)
  37. Functions Collect: put all values in a list START a = node:nodes_index(type='User') MATCH a-[:follows]->b RETURN, collect(
  39. Combine Functions Collect the ID of friends START me = node:nodes_index(type = 'user') MATCH (me)<-[r?:wrote]-(friends) RETURN ID(me),, collect(ID(friends)), collect( ORDER BY ID(me)
  41. Uses Recommend Friends START me = node({node_id}) MATCH (me)-[:friends]->(friend)-[:friends]->(foaf) RETURN
  42. Uses Six Degrees of Kevin Bacon Length: counts the number of relationships along a path Extract: gets the nodes/relationships from a path START me=node({start_node_id}), them=node({destination_node_id}) MATCH path = allShortestPaths( me-[?*]->them ) RETURN length(path), extract(person in nodes(path) :
  43. Uses Similar Users Users who rated same items within 2 points. Abs: gets absolute numeric value START me = node(user1) MATCH (me)-[myRating:RATED]->(i)<-[otherRating:RATED]-(u) WHERE abs(myRating.rating-otherRating.rating)<=2 RETURN u
  44. Boolean Operations Items with a rating > 7 that similar users rated, but I have not. And: this and that are true. Or: this or that is true. Not: this is false. START me=node(user1), similarUsers=node(3) (result received in the first query) MATCH (similarUsers)-[r:RATED]->(item) WHERE r.rating > 7 AND NOT((me)-[:RATED]->(item)) RETURN item
  45. Predicates ALL: closure is true for all items ANY: closure is true for any item NONE: closure is true for no items SINGLE: closure is true for exactly 1 item START london = node(1), moscow = node(2) MATCH path = london -[*]-> moscow WHERE all(city in nodes(path) where = true)
  46. Design Decisions Parsed, not an internal DSL Execution Semantics Serialisation Type System Portability
  47. Design Decisions Database vs Application Design Goal: single user interaction expressible as single query Queries have enough logic to find required data, not enough to process it
  48. Implementation
  49. Implementation • Recursive matching with backtracking START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b
  50. Implementation Execution Plan for start n=node(0) return n. Cypher is Pipes: lazily evaluated, pulling from pipes underneath. Plan: Parameters(), Nodes(n), Extract([n]), ColumnFilter([n])
  51. Implementation Execution Plan start n=node(0) match n-[*]-> b return, n, count(*) order by n.age Parameters() Nodes(n) PatternMatch(n-[*]->b) Extract([, n]) EagerAggregation( keys: [, n], aggregates: [count(*)]) Extract([n.age]) Sort(n.age ASC) ColumnFilter([,n,count(*)])
  52. Implementation Execution Plan start n=node(0) match n-[*]-> b return, n, count(*) order by Parameters() Nodes(n) PatternMatch(n-[*]->b) Extract([, n]) Sort( ASC,n ASC) EagerAggregation( keys: [, n], aggregates: [count(*)]) ColumnFilter([,n,count(*)])
  53. Thanks for Listening! Questions?
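Slide 20's A -[*]-> B matches paths of any length (one or more hops) from A to B. Below is a minimal in-memory Python sketch of that semantics on a small hypothetical graph; it is an illustration, not Neo4j's implementation. As a simplifying assumption it never revisits a node on a path, whereas Cypher's own rule is that a single match may not reuse a relationship.

```python
from typing import Dict, List

# Hypothetical graph: node -> targets of its outgoing relationships.
Graph = Dict[str, List[str]]


def variable_length_paths(graph: Graph, start: str, end: str) -> List[List[str]]:
    """Return every path of one or more hops from start to end that never
    revisits a node: the matches of start -[*]-> end in this sketch."""
    results: List[List[str]] = []

    def walk(path: List[str]) -> None:
        for nxt in graph.get(path[-1], []):
            if nxt in path:
                continue  # simplification: skip nodes already on the path
            if nxt == end:
                results.append(path + [nxt])  # reached B: record this path
            else:
                walk(path + [nxt])  # no upper bound on *, so keep expanding

    walk([start])
    return results


graph: Graph = {"A": ["B", "C"], "C": ["B"], "B": []}
print(variable_length_paths(graph, "A", "B"))  # [['A', 'B'], ['A', 'C', 'B']]
```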
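Slides 24 and 25 pair MATCH with an inner JOIN and the optional -[?]-> relationship with a LEFT JOIN. A minimal Python sketch of the difference, assuming hypothetical user ids and skill names held in a list rather than a database: the optional form keeps the user and fills the missing side with None, the way SQL fills it with NULL.

```python
from typing import List, Optional, Tuple

# Hypothetical relationships (user id, skill name); user 102 has no skills.
RELS: List[Tuple[int, str]] = [(101, "java"), (101, "cypher")]


def match(user: int) -> List[Tuple[int, str]]:
    """MATCH user --> skills: like an inner JOIN, a user without an
    outgoing relationship produces no rows at all."""
    return [(u, s) for (u, s) in RELS if u == user]


def optional_match(user: int) -> List[Tuple[int, Optional[str]]]:
    """MATCH user -[?]-> skills: like a LEFT JOIN, a user without a
    relationship still yields one row, with skills set to None (null)."""
    rows: List[Tuple[int, Optional[str]]] = list(match(user))
    return rows if rows else [(user, None)]


print(match(102))           # []
print(optional_match(102))  # [(102, None)]
```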
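On slide 30 the SQL version needs a recursive self join (in standard SQL, a recursive common table expression) because the nesting depth of groups is unknown, while BELONGS_TO* states the repetition directly. The sketch below, assuming hypothetical group names in a Python dict, computes what the pattern returns: every group reachable from Max over one or more BELONGS_TO relationships.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical BELONGS_TO relationships: member -> groups it belongs to directly.
BELONGS_TO: Dict[str, List[str]] = {
    "Max": ["Chicago Graph Users"],
    "Chicago Graph Users": ["Graph Users"],
    "Graph Users": ["Database Users"],
}


def all_groups(member: str) -> Set[str]:
    """Groups reachable over one or more BELONGS_TO hops, i.e. the matches
    of group <-[:BELONGS_TO*]- member. The seen set makes cycles in the
    group hierarchy harmless."""
    seen: Set[str] = set()
    queue = deque(BELONGS_TO.get(member, []))
    while queue:
        group = queue.popleft()
        if group in seen:
            continue
        seen.add(group)
        queue.extend(BELONGS_TO.get(group, []))  # follow the next hop up
    return seen


print(sorted(all_groups("Max")))
# ['Chicago Graph Users', 'Database Users', 'Graph Users']
```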
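The property names on slide 32 were lost in this transcript, so the sketch below works on a hypothetical list of values rather than guessing them. It illustrates the slide's point that Cypher needs no GROUP BY: any non-aggregated expression in RETURN becomes the grouping key for count(*), and ORDER BY then sorts the grouped rows.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical values of the grouped column, one per matched person node.
NAMES: List[str] = ["Max", "Ann", "Max", "Bob", "Ann", "Max"]


def count_per_value(values: List[str]) -> List[Tuple[str, int]]:
    """Mimic RETURN <column>, count(*) ORDER BY <column>: the non-aggregated
    column is the grouping key, so no GROUP BY clause is needed."""
    counts = Counter(values)       # one group per distinct value
    return sorted(counts.items())  # ORDER BY the key, ascending


print(count_per_value(NAMES))  # [('Ann', 2), ('Bob', 1), ('Max', 3)]
```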
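Slide 35 replaces a tangle of SQL joins with shortestPath( lucy-[*]-kevin ). As an illustration only, here is a plain breadth-first search over a hypothetical undirected friendship graph in Python; Neo4j's own shortest-path algorithm may differ. Because BFS visits nodes in order of hop count, the first time it reaches the goal it holds a path with the fewest relationships. That relationship count, one less than the number of nodes, is what length(path) reports.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical friendship graph. The slide's pattern lucy-[*]-kevin has no
# arrow, so relationships are stored in both directions here.
GRAPH: Dict[str, List[str]] = {
    "lucy": ["ann", "bob"],
    "ann": ["lucy", "kevin"],
    "bob": ["lucy", "cat"],
    "cat": ["bob", "kevin"],
    "kevin": ["ann", "cat"],
}


def shortest_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search from start; returns the node list of one
    shortest path to goal, or None if goal is unreachable."""
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk parent links back to start
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in GRAPH.get(node, []):
            if nxt not in parent:  # first visit is via a shortest route
                parent[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path("lucy", "kevin"))  # ['lucy', 'ann', 'kevin']
```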
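Slide 37's collect() turns all values for a grouping key into one list. A minimal Python sketch, assuming hypothetical follows relationships stored as (a, b) name pairs: each distinct a becomes one row, and the b values it follows are gathered in match order.

```python
from typing import Dict, List, Tuple

# Hypothetical follows relationships as (a, b) name pairs.
FOLLOWS: List[Tuple[str, str]] = [("max", "ann"), ("max", "bob"), ("ann", "bob")]


def collect_follows(rels: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Mimic RETURN a.<name>, collect(b.<name>): one row per a, with every
    b it follows gathered into a list in match order."""
    rows: Dict[str, List[str]] = {}
    for a, b in rels:
        rows.setdefault(a, []).append(b)  # a is the implicit grouping key
    return rows


print(collect_follows(FOLLOWS))  # {'max': ['ann', 'bob'], 'ann': ['bob']}
```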
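Slide 41 recommends friends of friends by walking two friends relationships. The sketch below, over a hypothetical in-memory friends dict, ranks each friend of a friend by how many of my friends lead to it. Excluding me and the people I already know is an added assumption: with relationships stored in both directions, the pattern as written on the slide could return them too.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical friends relationships, stored in both directions.
FRIENDS: Dict[str, List[str]] = {
    "me": ["ann", "bob"],
    "ann": ["me", "cat", "dan"],
    "bob": ["me", "cat"],
}


def recommend(person: str) -> List[Tuple[str, int]]:
    """Walk person -> friend -> foaf and count how many friends lead to
    each foaf, most frequent first."""
    direct = set(FRIENDS.get(person, []))
    counts: Counter[str] = Counter()
    for friend in FRIENDS.get(person, []):
        for foaf in FRIENDS.get(friend, []):
            if foaf != person and foaf not in direct:  # assumed filter
                counts[foaf] += 1
    return counts.most_common()


print(recommend("me"))  # [('cat', 2), ('dan', 1)]
```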
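Slide 43 finds users who rated at least one item that I rated, within 2 points. A minimal Python sketch, assuming hypothetical ratings held in a nested dict (user to item to rating). It returns a set, so it behaves like RETURN DISTINCT u; the slide's plain RETURN u would repeat a user once per qualifying shared item.

```python
from typing import Dict, Set

# Hypothetical RATED relationships: user -> {item: rating}.
RATINGS: Dict[str, Dict[str, int]] = {
    "user1": {"matrix": 9, "alien": 4},
    "u2": {"matrix": 8, "alien": 9},
    "u3": {"alien": 5},
    "u4": {"matrix": 3},
}


def similar_users(me: str) -> Set[str]:
    """Users u who rated an item that me also rated, with the two ratings
    at most 2 points apart (abs(...) <= 2 on the slide)."""
    mine = RATINGS[me]
    return {
        u
        for u, theirs in RATINGS.items()
        if u != me  # never report me as similar to myself
        for item, rating in theirs.items()
        if item in mine and abs(mine[item] - rating) <= 2
    }


print(sorted(similar_users("user1")))  # ['u2', 'u3']
```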
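Slide 44 combines a comparison with AND NOT over a pattern. A minimal Python sketch, assuming hypothetical ratings and a similar-user set standing in for the result of the previous query: keep items a similar user rated above 7 and that I have not rated at all.

```python
from typing import Dict, Set

# Hypothetical RATED relationships: user -> {item: rating}.
RATINGS: Dict[str, Dict[str, int]] = {
    "user1": {"matrix": 9},
    "u3": {"matrix": 8, "alien": 9, "heat": 6},
}


def items_to_try(me: str, similar: Set[str]) -> Set[str]:
    """Items a similar user rated above 7 (r.rating > 7) AND that me has
    NOT rated (NOT((me)-[:RATED]->(item)))."""
    mine = RATINGS.get(me, {})
    return {
        item
        for u in similar
        for item, rating in RATINGS.get(u, {}).items()
        if rating > 7 and item not in mine
    }


print(items_to_try("user1", {"u3"}))  # {'alien'}
```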
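Slide 45's predicates test a condition over every element of a collection, such as the nodes of a path. The property the slide tests was lost in this transcript, so the sketch below uses a hypothetical boolean property named visited on a hypothetical London to Moscow path; the four helpers are Python stand-ins, not Cypher's implementation.

```python
from typing import Callable, Dict, Iterable, List

City = Dict[str, object]
Pred = Callable[[City], bool]


def all_(items: Iterable[City], pred: Pred) -> bool:
    return all(pred(c) for c in items)  # ALL: true for every item


def any_(items: Iterable[City], pred: Pred) -> bool:
    return any(pred(c) for c in items)  # ANY: true for at least one item


def none_(items: Iterable[City], pred: Pred) -> bool:
    return not any(pred(c) for c in items)  # NONE: true for no item


def single_(items: Iterable[City], pred: Pred) -> bool:
    return sum(1 for c in items if pred(c)) == 1  # SINGLE: exactly one item


def is_visited(city: City) -> bool:
    return city["visited"] is True  # hypothetical property


PATH: List[City] = [
    {"name": "London", "visited": True},
    {"name": "Berlin", "visited": False},
    {"name": "Moscow", "visited": True},
]
print(all_(PATH, is_visited), any_(PATH, is_visited))  # False True
```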
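Slide 49 describes matching as recursive search with backtracking. Below is a minimal Python sketch of that idea for the slide's pattern, on a hypothetical six-relationship graph. It binds one pattern variable per step, tries each outgoing relationship in turn, and undoes the binding when a later part of the pattern fails. Ordering the pattern so that every relationship starts at an already-bound variable, and forbidding reuse of a relationship within one match, are assumptions of this sketch.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]

# The slide's pattern, given x: x-->y, x-->z, y-->z, z-->a-->b, z-->b
PATTERN: List[Edge] = [
    ("x", "y"), ("x", "z"), ("y", "z"), ("z", "a"), ("a", "b"), ("z", "b"),
]


def match(graph: List[Edge], start: str) -> List[Dict[str, str]]:
    """Bind pattern variables one relationship at a time, undoing each
    binding (backtracking) when a later relationship cannot be matched."""
    outgoing: Dict[str, List[str]] = {}
    for src, dst in graph:
        outgoing.setdefault(src, []).append(dst)

    binding: Dict[str, str] = {"x": start}
    used: Set[Edge] = set()  # each relationship at most once per match
    results: List[Dict[str, str]] = []

    def step(i: int) -> None:
        if i == len(PATTERN):
            results.append(dict(binding))  # every relationship matched
            return
        u, v = PATTERN[i]
        src = binding[u]  # assumed already bound, per the pattern order
        for dst in outgoing.get(src, []):
            edge = (src, dst)
            if edge in used or binding.get(v, dst) != dst:
                continue  # relationship taken, or v already bound elsewhere
            newly_bound = v not in binding
            binding[v] = dst
            used.add(edge)
            step(i + 1)
            used.discard(edge)  # backtrack: release the relationship
            if newly_bound:
                del binding[v]  # backtrack: unbind v

    step(0)
    return results


GRAPH: List[Edge] = [
    ("n0", "n1"), ("n0", "n2"), ("n1", "n2"),
    ("n2", "n3"), ("n3", "n4"), ("n2", "n4"),
]
print(match(GRAPH, "n0"))
# [{'x': 'n0', 'y': 'n1', 'z': 'n2', 'a': 'n3', 'b': 'n4'}]
```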
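Slide 50 says the execution plan is a stack of lazily evaluated pipes, each pulling rows from the pipe underneath. Python generators give the same pull-based behaviour, so here is a minimal sketch of the plan for start n=node(0) return n. The function names mirror the pipe names in the plan printout but are hypothetical, as is representing a node as a dict; nothing runs until the top pipe is iterated, which the LOG list records.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]
LOG: List[str] = []  # records when a pipe actually produces a row


def parameters(params: Row) -> Iterator[Row]:
    yield dict(params)  # bottom pipe: one row carrying the query parameters


def nodes(source: Iterator[Row], name: str, ids: List[int]) -> Iterator[Row]:
    for row in source:  # pull from the pipe underneath
        for node_id in ids:
            LOG.append(f"Nodes produced {name}={node_id}")
            yield {**row, name: {"id": node_id}}


def extract(source: Iterator[Row], exprs: Dict[str, Callable[[Row], Any]]) -> Iterator[Row]:
    for row in source:  # evaluate each RETURN expression as a column
        yield {**row, **{key: f(row) for key, f in exprs.items()}}


def column_filter(source: Iterator[Row], keep: List[str]) -> Iterator[Row]:
    for row in source:  # keep only the columns the query returns
        yield {key: row[key] for key in keep}


# start n=node(0) return n
plan = column_filter(
    extract(nodes(parameters({}), "n", [0]), {"n": lambda r: r["n"]}), ["n"]
)
print(LOG)         # [] : building the plan does no work
print(list(plan))  # [{'n': {'id': 0}}]
print(LOG)         # ['Nodes produced n=0']
```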

Editor's Notes

  1. A number of different ways to query a graph database already existed. Cypher aims to make querying easy and to produce queries that are readable. We looked at alternatives: SPARQL, SQL, Gremlin and other...