Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Tips and Tricks for Graph Data Modeling

662 views

Published on

As the only NoSQL database category that prioritizes relationships, graph databases provide all the flexibility of a NoSQL database with optimized performance for connected data. This webinar will walk you through how to model your data as a graph. We will demonstrate how to avoid pitfalls early on and how to optimize your model for answering questions as Cypher queries.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Tips and Tricks for Graph Data Modeling

  1. 1. Graph Modeling Tips and Tricks Mark Needham @markhneedham
  2. 2. Introducing our data set...
  3. 3. meetup.com’s recommendations
  4. 4. Recommendation queries ‣ Several different types • groups to join • topics to follow • events to attend ‣ As a user of meetup.com trying to find groups to join and events to attend
  5. 5. How will this talk be structured?
  6. 6. Find similar groups to Neo4j As a member of the Neo4j London group I want to find other similar meetup groups So that I can join those groups
  7. 7. What makes groups similar?
  8. 8. As a member of the Neo4j London group I want to find other similar meetup groups So that I can join those groups Find similar groups to Neo4j
  9. 9. As a member of the Neo4j London group I want to find other similar meetup groups So that I can join those groups Nodes
  10. 10. As a member of the Neo4j London group I want to find other similar meetup groups So that I can join those groups Relationships
  11. 11. As a member of the Neo4j London group I want to find other similar meetup groups So that I can join those groups Labels
  12. 12. As a member of the Neo4j London group I want to find other similar meetup groups So that I can join those groups Properties
  13. 13. Find similar groups to Neo4j MATCH (group:Group {name: "Neo4j - London User Group"}) -[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup) RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, COLLECT(topic.name) AS topics ORDER BY topicsInCommon DESC, otherGroup.name LIMIT 10
  14. 14. Find similar groups to Neo4j
  15. 15. Tip: Model incrementally ‣ Build the model for the question we need to answer now then move onto the next question
  16. 16. I’m already a member of these!
  17. 17. What other data can we get?
  18. 18. Exclude groups I’m a member of As a member of the Neo4j London group I want to find other similar meetup groups that I’m not already a member of So that I can join those groups
  19. 19. Exclude groups I’m a member of As a member of the Neo4j London group I want to find other similar meetup groups that I’m not already a member of So that I can join those groups
  20. 20. Exclude groups I’m a member of MATCH (group:Group {name: "Neo4j - London User Group"}) -[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup:Group) RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, EXISTS((:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(otherGroup)) AS alreadyMember, COLLECT(topic.name) AS topics ORDER BY topicsInCommon DESC LIMIT 10
  21. 21. Exclude groups I’m a member of
  22. 22. Exclude groups I’m a member of MATCH (group:Group {name: "Neo4j - London User Group"}) -[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup:Group) WHERE NOT( (:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(otherGroup) ) RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, COLLECT(topic.name) AS topics ORDER BY topicsInCommon DESC LIMIT 10
  23. 23. Exclude groups I’m a member of
  24. 24. Find my similar groups As a member of several meetup groups I want to find other similar meetup groups that I’m not already a member of So that I can join those groups
  25. 25. Find my similar groups As a member of several meetup groups I want to find other similar meetup groups that I’m not already a member of So that I can join those groups
  26. 26. Find my similar groups MATCH (member:Member {name: "Mark Needham"}) -[:INTERESTED_IN]->(topic), (member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic) WITH member, topic, COUNT(*) AS score MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:MEMBER_OF]->(otherGroup) RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score ORDER BY score DESC
  27. 27. Find my similar groups
  28. 28. Find Jonny’s similar groups
  29. 29. Oops...Jonny has no interests!
  30. 30. What is Jonny interested in? As a member of several meetup groups I want to find other similar meetup groups that I’m not already a member of So that I can join those groups INTERESTED_IN?
  31. 31. What is Jonny interested in? There’s an implicit INTERESTED_IN relationship between the topics of groups I belong to but don’t express an interest in. Let’s make it explicit P G T MEMBER_OF HAS_TOPIC P G T MEMBER_OF HAS_TOPIC INTERESTED_IN
  32. 32. What is Jonny interested in? MATCH (m:Member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic) WITH m, topic, COUNT(*) AS times WHERE times > 3 MERGE (m)-[:INTERESTED_IN]->(topic)
  33. 33. What is Jonny interested in?
  34. 34. Tip: Make the implicit explicit ‣ Fill in the missing links in the graph ‣ You could run this type of query once a day during a quiet period ‣ On bigger graphs we’d run it in batches to avoid loading the whole database into memory
  35. 35. Find next group people join As a member of a meetup group I want to find out which meetup groups other people join after this one So that I can join those groups
  36. 36. Find next group people join MATCH (group:Group {name: "Neo4j - London User Group"}) <-[membership:MEMBER_OF]-(member), (member)-[otherMembership:MEMBER_OF]->(otherGroup) WHERE membership.joined < otherMembership.joined WITH member, otherGroup ORDER BY otherMembership.joined WITH member, COLLECT(otherGroup)[0] AS nextGroup RETURN nextGroup.name, COUNT(*) AS times ORDER BY times DESC
  37. 37. Find next group people join
  38. 38. It feels a bit clunky... MATCH (group:Group {name: "Neo4j - London User Group"}) <-[membership:MEMBER_OF]-(member), (member)-[otherMembership:MEMBER_OF]->(otherGroup) WHERE membership.joined < otherMembership.joined WITH member, otherGroup ORDER BY otherMembership.joined WITH member, COLLECT(otherGroup)[0] AS nextGroup RETURN nextGroup.name, COUNT(*) AS times ORDER BY times DESC ‣ We have to scan through all the MEMBER_OF relationships to find the one we want ‣ It might make our lives easier if we made membership a first class citizen of the domain
  39. 39. Facts can become nodes
  40. 40. Facts can become nodes
  41. 41. Refactor to facts MATCH (member:Member)-[rel:MEMBER_OF]->(group) MERGE (membership:Membership {id: member.id + "_" + group.id}) SET membership.joined = rel.joined MERGE (member)-[:HAS_MEMBERSHIP]->(membership) MERGE (membership)-[:OF_GROUP]->(group)
  42. 42. Refactor to facts MATCH (member:Member)-[:HAS_MEMBERSHIP]->(membership) WITH member, membership ORDER BY member.id, membership.joined WITH member, COLLECT(membership) AS memberships UNWIND RANGE(0,SIZE(memberships) - 2) as idx WITH memberships[idx] AS m1, memberships[idx+1] AS m2 MERGE (m1)-[:NEXT]->(m2)
  43. 43. Find next group people join MATCH (group:Group {name: "Neo4j - London User Group"}) <-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership), (membership)<-[:HAS_MEMBERSHIP]-(member:Member) -[:HAS_MEMBERSHIP]->(nextMembership), (nextMembership)-[:OF_GROUP]->(nextGroup) RETURN nextGroup.name, COUNT(*) AS times ORDER BY times DESC
  44. 44. Comparing the approaches vs MATCH (group:Group {name: "Neo4j - London User Group"}) <-[membership:MEMBER_OF]-(member), (member)-[otherMembership:MEMBER_OF]->(otherGroup) WITH member, membership, otherMembership, otherGroup ORDER BY member.id, otherMembership.joined WHERE membership.joined < otherMembership.joined WITH member, membership, COLLECT(otherGroup)[0] AS nextGroup RETURN nextGroup.name, COUNT(*) AS times ORDER BY times DESC MATCH (group:Group {name: "Neo4j - London User Group"}) <-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership), (membership)<-[:HAS_MEMBERSHIP]-(member:Member) -[:HAS_MEMBERSHIP]->(nextMembership), (nextMembership)-[:OF_GROUP]->(nextGroup) RETURN nextGroup.name, COUNT(*) AS times ORDER BY times DESC
  45. 45. How do I profile a query? ‣ EXPLAIN • shows the execution plan without actually executing it or returning any results. ‣ PROFILE • executes the statement and returns the results along with profiling information. 45
  46. 46. Neo4j’s longest plan (so far…) 46
  47. 47. Neo4j’s longest plan (so far…) 47
  48. 48. Neo4j’s longest plan (so far…) 48
  49. 49. What is our goal? At a high level, the goal is simple: get the number of db hits down. 49
  50. 50. an abstract unit of storage engine work. What is a database hit? “ ” 50
  51. 51. Comparing the approaches Cypher version: CYPHER 2.3, planner: COST. 111656 total db hits in 330 ms. vs Cypher version: CYPHER 2.3, planner: COST. 23650 total db hits in 60 ms.
  52. 52. Tip: Profile your queries ‣ Spike the different models and see which one performs the best
  53. 53. Should we keep both models? We could but when we add, edit or remove a membership we’d have to keep both graph structures in sync.
  54. 54. Adding a group membership WITH "Mark Needham" AS memberName, "Neo4j - London User Group" AS groupName, timestamp() AS now MATCH (group:Group {name: groupName}) MATCH (member:Member {name: memberName}) MERGE (member)-[memberOfRel:MEMBER_OF]->(group) ON CREATE SET memberOfRel.time = now MERGE (membership:Membership {id: member.id + "_" + group.id}) ON CREATE SET membership.joined = now MERGE (member)-[:HAS_MEMBERSHIP]->(membership) MERGE (membership)-[:OF_GROUP]->(group)
  55. 55. Removing a group membership WITH "Mark Needham" AS memberName, "Neo4j - London User Group" AS groupName, timestamp() AS now MATCH (group:Group {name: groupName}) MATCH (member:Member {name: memberName}) MATCH (member)-[memberOfRel:MEMBER_OF]->(group) MATCH (membership:Membership {id: member.id + "_" + group.id}) MATCH (member)-[hasMembershipRel:HAS_MEMBERSHIP]->(membership) MATCH (membership)-[ofGroupRel:OF_GROUP]->(group) DELETE memberOfRel, hasMembershipRel, ofGroupRel, membership
  56. 56. Let’s delete MEMBER_OF then...
  57. 57. ...not so fast! As a member of several meetup groups I want to find other similar meetup groups that I’m not already a member of So that I can join those groups
  58. 58. Why not delete MEMBER_OF? MATCH (member:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic) WITH member, topic, COUNT(*) AS score MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:MEMBER_OF]->(otherGroup) RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score ORDER BY score DESC MATCH (member:Member {name: "Mark Needham"}) -[:HAS_MEMBERSHIP]->()-[:OF_GROUP]->(group:Group)-[:HAS_TOPIC]->(topic) WITH member, topic, COUNT(*) AS score MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:HAS_MEMBERSHIP]->(:Membership)-[:OF_GROUP]->(otherGroup:Group) RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score ORDER BY score DESC
  59. 59. Why not delete MEMBER_OF? MATCH (member:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic) WITH member, topic, COUNT(*) AS score MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:MEMBER_OF]->(otherGroup) RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score ORDER BY score DESC MATCH (member:Member {name: "Mark Needham"}) -[:HAS_MEMBERSHIP]->()-[:OF_GROUP]->(group:Group)-[:HAS_TOPIC]->(topic) WITH member, topic, COUNT(*) AS score MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:HAS_MEMBERSHIP]->(:Membership)-[:OF_GROUP]->(otherGroup:Group) RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score ORDER BY score DESC 433318 total db hits in 485 ms. 83268 total db hits in 117 ms.
  60. 60. Tip: Maintaining multiple models ‣ Different models perform better for different queries but worse for others ‣ Optimising for reads may mean we pay a write and maintenance penalty
  61. 61. What about events?
  62. 62. Events in my groups As a member of several meetup groups who has previously attended events I want to find other events hosted by those groups So that I can attend those events
  63. 63. Events in my groups As a member of several meetup groups who has previously attended events I want to find other events hosted by those groups So that I can attend those events
  64. 64. WITH 24.0*60*60*1000 AS oneDay MATCH (member:Member {name: "Mark Needham"}), (member)-[:MEMBER_OF]->(group), (group)-[:HOSTED_EVENT]->(futureEvent) WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent) RETURN group.name, futureEvent.name, round((futureEvent.time - timestamp()) / oneDay) AS days ORDER BY days LIMIT 10 Events in my groups
  65. 65. Events in my groups
  66. 66. + previous events attended WITH 24.0*60*60*1000 AS oneDay MATCH (member:Member {name: "Mark Needham"}) MATCH (futureEvent:Event) WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent) MATCH (futureEvent)<-[:HOSTED_EVENT]-(group) WITH oneDay, group, futureEvent, member, EXISTS((group)<-[:MEMBER_OF]-(member)) AS isMember OPTIONAL MATCH (member)-[rsvp:RSVPD {response: "yes"}]->(pastEvent)<-[:HOSTED_EVENT]-(group) WHERE pastEvent.time < timestamp() RETURN group.name, futureEvent.name, isMember, COUNT(rsvp) AS previousEvents, round((futureEvent.time - timestamp()) / oneDay) AS days ORDER BY days, previousEvents DESC
  67. 67. + previous events attended
  68. 68. RSVPD_YES vs RSVPD I was curious whether refactoring RSVPD {response: "yes"} to RSVPD_YES would have any impact as Neo4j is optimised for querying by unique relationship types.
  69. 69. Refactor to specific relationships MATCH (m:Member)-[rsvp:RSVPD {response:"yes"}]->(event) MERGE (m)-[rsvpYes:RSVPD_YES {id: rsvp.id}]->(event) ON CREATE SET rsvpYes.created = rsvp.created, rsvpYes.lastModified = rsvp.lastModified; MATCH (m:Member)-[rsvp:RSVPD {response:"no"}]->(event) MERGE (m)-[rsvpYes:RSVPD_NO {id: rsvp.id}]->(event) ON CREATE SET rsvpYes.created = rsvp.created, rsvpYes.lastModified = rsvp.lastModified;
  70. 70. RSVPD_YES vs RSVPD RSVPD {response: "yes"} vs RSVPD_YES Cypher version: CYPHER 2.3, planner: COST. 688635 total db hits in 232 ms. Cypher version: CYPHER 2.3, planner: COST. 559866 total db hits in 207 ms.
  71. 71. Why would we keep RSVPD? MATCH (m:Member)-[r:RSVPD]->(event)<-[:HOSTED_EVENT]-(group) WHERE m.name = "Mark Needham" RETURN event, group, r MATCH (m:Member)-[r:RSVPD_YES|:RSVPD_NO|:RSVPD_WAITLIST]->(event), (event)<-[:HOSTED_EVENT]-(group) WHERE m.name = "Mark Needham" RETURN event, group, r
  72. 72. Tip: Specific relationships ‣ Neo4j is optimised for querying by unique relationship types… ‣ ...but sometimes we pay a query maintenance cost to achieve this
  73. 73. + events my friends are attending There’s an implicit FRIENDS relationship between people who attended the same events. Let’s make it explicit. M E M RSVPD RSVPD FRIENDS M E M RSVPD RSVPD
  74. 74. + events my friends are attending MATCH (m1:Member) WHERE NOT m1:Processed WITH m1 LIMIT {limit} MATCH (m1)-[:RSVP_YES]->(event:Event)<-[:RSVP_YES]-(m2:Member) WITH m1, m2, COLLECT(event) AS events, COUNT(*) AS times WHERE times >= 5 WITH m1, m2, times, [event IN events | SIZE((event)<-[:RSVP_YES]-())] AS attendances WITH m1, m2, REDUCE(score = 0.0, a IN attendances | score + (1.0 / a)) AS score MERGE (m1)-[friendsRel:FRIENDS]-(m2) SET friendsRel.score = row.score
  75. 75. Bidirectional relationships ‣ You may have noticed that we didn’t specify a direction when creating the relationship MERGE (m1)-[:FRIENDS]-(m2) ‣ FRIENDS is a bidirectional relationship. We only need to create it once between two people. ‣ We ignore the direction when querying
  76. 76. + events my friends are attending WITH 24.0*60*60*1000 AS oneDay MATCH (member:Member {name: "Mark Needham"}) MATCH (futureEvent:Event) WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent) MATCH (futureEvent)<-[:HOSTED_EVENT]-(group) WITH oneDay, group, futureEvent, member, EXISTS((group)<-[:MEMBER_OF]-(member)) AS isMember OPTIONAL MATCH (member)-[rsvp:RSVPD {response: "yes"}]->(pastEvent)<-[:HOSTED_EVENT]-(group) WHERE pastEvent.time < timestamp() WITH oneDay, group, futureEvent, member, isMember, COUNT(rsvp) AS previousEvents OPTIONAL MATCH (futureEvent)<-[:HOSTED_EVENT]-()-[:HAS_TOPIC]->(topic)<-[:INTERESTED_IN]-(member) WITH oneDay, group, futureEvent, member, isMember, previousEvents, COUNT(topic) AS topics OPTIONAL MATCH (member)-[:FRIENDS]-(:Member)-[rsvpYes:RSVP_YES]->(futureEvent) RETURN group.name, futureEvent.name, isMember, round((futureEvent.time - timestamp()) / oneDay) AS days, previousEvents, topics, COUNT(rsvpYes) AS friendsGoing ORDER BY days, friendsGoing DESC, previousEvents DESC LIMIT 15
  77. 77. + events my friends are attending
  78. 78. Tip: Bidirectional relationships ‣ Some relationships are bidirectional in nature ‣ Neo4j always stores relationships with a direction but we can choose to ignore that when we query
  79. 79. tl;dr ‣ Model incrementally ‣ Always profile your queries ‣ Consider making the implicit explicit… • ...but beware the maintenance cost ‣ Be specific with relationship types ‣ Ignore direction for bidirectional relationships
  80. 80. That’s all for today! Questions? :-) Mark Needham @markhneedham https://github.com/neo4j-meetups/modeling-worked-example

×