Cypher

Published in: Technology, Business

  1. Cypher Query Language
     Chicago Graph Database Meet-Up
     Max De Marzi
  2. What is Cypher?
     • Graph Query Language for Neo4j
     • Aims to make querying simple
  3. Why Cypher?
     • Existing Neo4j query mechanisms were not simple enough
     • Too verbose (Java API)
     • Too prescriptive (Gremlin)
  4. SQL?
     • Unable to express paths
     • these are crucial for graph-based reasoning
     • Neo4j is schema/table free
  5. SPARQL?
     • SPARQL designed for a different data model
     • namespaces
     • properties as nodes
     • high learning curve
  6. Design
  7. Design Decisions: Declarative
     Most of the time, Neo4j knows better than you
     Imperative                       Declarative
     follow relationship              specify starting point
     breadth-first vs depth-first     specify desired outcome
     explicit algorithm               algorithm adaptable based on query
  8. Design Decisions: Pattern matching [slides 8-13: figure slides; only the node labels A, B, C survive in the transcript]
  14. Design Decisions: ASCII-art patterns
      () --> ()
  15. Design Decisions: Directed relationship
      (A) --> (B)
  16. Design Decisions: Undirected relationship
      (A) -- (B)
  17. Design Decisions: Specific relationships
      A -[:LOVES]-> B
  18. Design Decisions: Joined paths
      A --> B --> C
  19. Design Decisions: Multiple paths
      A --> B --> C, A --> C
      A --> B --> C <-- A
  20. Design Decisions: Variable length paths
      A -[*]-> B
  21. Design Decisions: Optional relationships
      A -[?]-> B
  22. 22. Design Decisions Familiar for SQL users select start from match where where group by return order by
  23. START
      SQL:
        SELECT *
        FROM Person
        WHERE firstName = "Max"
      Cypher:
        START max=node:persons(firstName = "Max")
        RETURN max
  24. MATCH
      SQL:
        SELECT skills.*
        FROM users
        JOIN skills ON users.id = skills.user_id
        WHERE users.id = 101
      Cypher:
        START user = node(101)
        MATCH user --> skills
        RETURN skills
  25. Optional MATCH
      SQL:
        SELECT skills.*
        FROM users
        LEFT JOIN skills ON users.id = skills.user_id
        WHERE users.id = 101
      Cypher:
        START user = node(101)
        MATCH user -[?]-> skills
        RETURN skills
  26. SQL (join through a link table):
        SELECT skills.*, user_skill.*
        FROM users
        JOIN user_skill ON users.id = user_skill.user_id
        JOIN skills ON user_skill.skill_id = skills.id
        WHERE users.id = 1
  27. Cypher (the relationship itself plays the role of the link table):
        START user = node(1)
        MATCH user -[user_skill]-> skill
        RETURN skill, user_skill
  28. Indexes
      Used as multiple starting points, not to speed up any traversals
        START a = node:nodes_index(type = "User")
        MATCH a-[r:knows]-b
        RETURN ID(a), ID(b), r.weight
  29. http://maxdemarzi.com/2012/03/16/jung-in-neo4j-par
  30. Complicated Match
      SQL:
        Some UGLY recursive self join on the groups table
      Cypher:
        START max=node:person(name = "Max")
        MATCH group <-[:BELONGS_TO*]- max
        RETURN group
  31. Where
      SQL:
        SELECT person.*
        FROM person
        WHERE person.age > 32 OR person.hair = "bald"
      Cypher:
        START person = node:persons("name:*")
        WHERE person.age > 32 OR person.hair = "bald"
        RETURN person
  32. Return
      SQL:
        SELECT person.name, count(*)
        FROM Person
        GROUP BY person.name
        ORDER BY person.name
      Cypher:
        START person=node:persons("name:*")
        RETURN person.name, count(*)
        ORDER BY person.name
  33. Order By, Parameters
      Order By: same as SQL
      Parameters: {node_id} expected as part of request
        START me = node({node_id})
        MATCH (me)-[?:follows]->(friends)-[?:follows]->(fof)-[?:follows]->(fofof)-[?:follows]->others
        RETURN me.name, friends.name, fof.name, fofof.name, count(others)
        ORDER BY friends.name, fof.name, fofof.name, count(others) DESC
  34. http://maxdemarzi.com/2012/02/13/visualizing-a-netw
  35. Graph Functions
      SQL:
        Some UGLY multiple recursive self and inner joins on the user and all related tables
      Cypher:
        START lucy=node(1000), kevin=node(759)
        MATCH p = shortestPath( lucy-[*]-kevin )
        RETURN p
  36. Aggregate Functions
      ID: get the Neo4j-assigned identifier
      Count: add up the number of occurrences
      Min: get the lowest value
      Max: get the highest value
      Avg: get the average of a numeric value
      Distinct: remove duplicates
        START me = node:nodes_index(type = "user")
        MATCH (me)-[r?:wrote]-()
        RETURN ID(me), me.name, count(r), min(r.date), max(r.date)
        ORDER BY ID(me)
  37. Functions
      Collect: put all values in a list
        START a = node:nodes_index(type = "User")
        MATCH a-[:follows]->b
        RETURN a.name, collect(b.name)
  38. http://maxdemarzi.com/2012/02/02/graph-visualizatio
  39. Combine Functions
      Collect the ID of friends
        START me = node:nodes_index(type = "user")
        MATCH (me)<-[r?:wrote]-(friends)
        RETURN ID(me), me.name, collect(ID(friends)), collect(r.date)
        ORDER BY ID(me)
  40. http://maxdemarzi.com/2012/03/08/connections-in-time/
  41. Uses: Recommend Friends
        START me = node({node_id})
        MATCH (me)-[:friends]->(friend)-[:friends]->(foaf)
        RETURN foaf.name
  42. Uses: Six Degrees of Kevin Bacon
      Length: counts the number of relationships along a path
      Extract: gets the nodes/relationships from a path
        START me=node({start_node_id}), them=node({destination_node_id})
        MATCH path = allShortestPaths( me-[?*]->them )
        RETURN length(path), extract(person in nodes(path) : person.name)
  43. Uses: Similar Users
      Users who rated the same items within 2 points of each other.
      Abs: gets the absolute numeric value
        START me = node(user1)
        MATCH (me)-[myRating:RATED]->(i)<-[otherRating:RATED]-(u)
        WHERE abs(myRating.rating-otherRating.rating) <= 2
        RETURN u
  44. Boolean Operations
      Items with a rating > 7 that similar users rated, but I have not
      And: this and that are true
      Or: this or that is true
      Not: this is false
        START me=node(user1), similarUsers=node(3)
        MATCH (similarUsers)-[r:RATED]->(item)
        WHERE r.rating > 7 AND NOT((me)-[:RATED]->(item))
        RETURN item
      (similarUsers: result received in the first query)
      http://thought-bytes.blogspot.com/2012/02/similarity-based-recommendation
  45. Predicates
      ALL: closure is true for all items
      ANY: closure is true for any item
      NONE: closure is true for no items
      SINGLE: closure is true for exactly 1 item
        START london = node(1), moscow = node(2)
        MATCH path = london -[*]-> moscow
        WHERE all(city in nodes(path) where city.capital = true)
  46. Design Decisions: Parsed, not an internal DSL
      • Execution Semantics
      • Serialisation
      • Type System
      • Portability
  47. Design Decisions: Database vs Application
      Design Goal: single user interaction expressible as single query
      Queries have enough logic to find required data, not enough to process it
  48. Implementation
  49. Implementation
      • Recursive matching with backtracking
        START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b
  50. Implementation: Execution Plan
      Cypher is Pipes: lazily evaluated, pulling from pipes underneath
        start n=node(0)
        return n
      Plan:
        Parameters()
        Nodes(n)
        Extract([n])
        ColumnFilter([n])
  51. Implementation: Execution Plan
        start n=node(0)
        match n-[*]-> b
        return n.name, n, count(*)
        order by n.age
      Plan:
        Parameters()
        Nodes(n)
        PatternMatch(n-[*]->b)
        Extract([n.name, n])
        EagerAggregation(keys: [n.name, n], aggregates: [count(*)])
        Extract([n.age])
        Sort(n.age ASC)
        ColumnFilter([n.name,n,count(*)])
  52. Implementation: Execution Plan
        start n=node(0)
        match n-[*]-> b
        return n.name, n, count(*)
        order by n.name
      Plan:
        Parameters()
        Nodes(n)
        PatternMatch(n-[*]->b)
        Extract([n.name, n])
        Sort(n.name ASC,n ASC)
        EagerAggregation(keys: [n.name, n], aggregates: [count(*)])
        ColumnFilter([n.name,n,count(*)])
  53. Thanks for Listening! Questions?
      maxdemarzi.com
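
The "Recursive matching with backtracking" slide names the technique without showing it. Below is a minimal Python sketch of the general idea, not Neo4j's actual matcher. It assumes the graph is a plain adjacency dict and the pattern is a list of directed edges between variable names. The names `match`, `Graph`, `Pattern` and the node ids `n1`..`n5` are hypothetical, and relationship types, properties and uniqueness rules are ignored.

```python
from typing import Dict, Iterator, List, Set, Tuple

Graph = Dict[str, Set[str]]       # node id -> ids of nodes it points to (assumed model)
Pattern = List[Tuple[str, str]]   # directed pattern edges between variable names


def match(graph: Graph, pattern: Pattern, bindings: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every assignment of variables to nodes that satisfies all pattern edges."""
    if not pattern:
        # Every pattern edge has been satisfied: emit the complete binding.
        yield bindings
        return
    (src, dst), rest = pattern[0], pattern[1:]
    # A variable that is already bound has exactly one candidate node.
    src_candidates = [bindings[src]] if src in bindings else sorted(graph)
    for u in src_candidates:
        successors = graph.get(u, set())
        dst_candidates = [bindings[dst]] if dst in bindings else sorted(successors)
        for v in dst_candidates:
            if v not in successors:
                continue  # required edge is missing: abandon this branch
            if src == dst and u != v:
                continue  # one variable cannot be bound to two nodes
            # Recurse on the remaining edges; when the recursion is exhausted
            # the loop moves on to the next candidate, which is the backtracking step.
            yield from match(graph, rest, {**bindings, src: u, dst: v})


# Hypothetical data shaped like the slide's pattern:
# x-->y, x-->z, y-->z, z-->a-->b, z-->b
graph: Graph = {
    "n1": {"n2", "n3"},
    "n2": {"n3"},
    "n3": {"n4", "n5"},
    "n4": {"n5"},
    "n5": set(),
}
pattern: Pattern = [("x", "y"), ("x", "z"), ("y", "z"), ("z", "a"), ("a", "b"), ("z", "b")]
results = list(match(graph, pattern, {}))
```

On this sample graph the only binding found is x=n1, y=n2, z=n3, a=n4, b=n5. The second triangle (n3, n4, n5) is tried and dropped because n5 has no outgoing edge to bind `a`.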
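
The first "Execution Plan" slide describes Cypher as pipes that are lazily evaluated, each pulling from the pipe underneath. One way to picture this is with Python generators. This is a sketch under assumptions, not Neo4j's code: the function names mirror the plan steps on the slide (`Parameters`, `Nodes`, `Extract`, `ColumnFilter`), but their signatures and the `load_node` callback are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def parameters_pipe() -> Iterator[Row]:
    """Bottom of the plan: emits a single empty row that seeds the pipeline."""
    yield {}


def nodes_pipe(source: Iterable[Row], name: str, node_ids: List[int],
               load_node: Callable[[int], object]) -> Iterator[Row]:
    """Binds `name` to each start node, loading a node only when a row is pulled."""
    for row in source:
        for node_id in node_ids:
            yield {**row, name: load_node(node_id)}


def extract_pipe(source: Iterable[Row],
                 expressions: Dict[str, Callable[[Row], object]]) -> Iterator[Row]:
    """Evaluates the return expressions and adds them to the row as columns."""
    for row in source:
        yield {**row, **{column: fn(row) for column, fn in expressions.items()}}


def column_filter_pipe(source: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Keeps only the columns named in the RETURN clause."""
    for row in source:
        yield {column: row[column] for column in columns}


# Hypothetical node store that records which nodes were actually loaded.
loaded: List[int] = []


def load_node(node_id: int) -> Dict[str, int]:
    loaded.append(node_id)
    return {"id": node_id}


# Same shape as the slide's plan for "start n=node(0) return n",
# here with three start nodes so the laziness is visible.
plan = column_filter_pipe(
    extract_pipe(
        nodes_pipe(parameters_pipe(), "n", [0, 1, 2], load_node),
        {"n": lambda row: row["n"]},
    ),
    ["n"],
)
first_row = next(plan)  # pulls one row through the whole stack
```

Building `plan` loads nothing. Pulling the first row loads only node 0, so a consumer that stops early never pays for the remaining nodes.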
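
The "Graph Functions" and "Six Degrees of Kevin Bacon" slides rely on `shortestPath` and `length(path)`. As a rough illustration of what such a query computes, here is a breadth-first search over an undirected adjacency dict. It is a generic textbook BFS under assumed data, not the algorithm Neo4j uses. The people other than lucy and kevin, and the helper names `shortest_path` and `path_length`, are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def shortest_path(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return the nodes on one shortest path from start to goal, or None if unreachable."""
    previous: Dict[str, Optional[str]] = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(graph.get(node, set())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def path_length(path: List[str]) -> int:
    """Length of a path counted in relationships: one fewer than its nodes."""
    return len(path) - 1


# Hypothetical undirected friendship graph (each edge is listed in both directions).
friends: Dict[str, Set[str]] = {
    "lucy": {"ann", "bob"},
    "ann": {"lucy", "kevin"},
    "bob": {"lucy", "carl"},
    "carl": {"bob", "kevin"},
    "kevin": {"ann", "carl"},
}
path = shortest_path(friends, "lucy", "kevin")
```

Here `path` is lucy, ann, kevin: three nodes joined by two relationships, so `path_length(path)` is 2.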
