As the only NoSQL database category that prioritizes relationships, graph databases provide all the flexibility of a NoSQL database with optimized performance for connected data. This webinar will walk you through how to model your data as a graph. We will demonstrate how to avoid pitfalls early on and how to optimize your model for answering questions as Cypher queries.
4. Recommendation queries
‣ Several different types
• groups to join
• topics to follow
• events to attend
‣ As a user of meetup.com trying to find
groups to join and events to attend
8. As a member of the Neo4j London group
I want to find other similar meetup groups
So that I can join those groups
Find similar groups to Neo4j
9. As a member of the Neo4j London group
I want to find other similar meetup groups
So that I can join those groups
Nodes
10. As a member of the Neo4j London group
I want to find other similar meetup groups
So that I can join those groups
Relationships
11. As a member of the Neo4j London group
I want to find other similar meetup groups
So that I can join those groups
Labels
12. As a member of the Neo4j London group
I want to find other similar meetup groups
So that I can join those groups
Properties
13. Find similar groups to Neo4j
MATCH (group:Group {name: "Neo4j - London User Group"})
-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
RETURN otherGroup.name,
COUNT(topic) AS topicsInCommon,
COLLECT(topic.name) AS topics
ORDER BY topicsInCommon DESC, otherGroup.name
LIMIT 10
18. Exclude groups I’m a member of
As a member of the Neo4j London group
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
19. Exclude groups I’m a member of
As a member of the Neo4j London group
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
20. Exclude groups I’m a member of
MATCH (group:Group {name: "Neo4j - London User Group"})
-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup:Group)
RETURN otherGroup.name,
COUNT(topic) AS topicsInCommon,
EXISTS((:Member {name: "Mark Needham"})
-[:MEMBER_OF]->(otherGroup)) AS alreadyMember,
COLLECT(topic.name) AS topics
ORDER BY topicsInCommon DESC
LIMIT 10
22. Exclude groups I’m a member of
MATCH (group:Group {name: "Neo4j - London User Group"})
-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup:Group)
WHERE NOT( (:Member {name: "Mark Needham"})
-[:MEMBER_OF]->(otherGroup) )
RETURN otherGroup.name,
COUNT(topic) AS topicsInCommon,
COLLECT(topic.name) AS topics
ORDER BY topicsInCommon DESC
LIMIT 10
24. Find my similar groups
As a member of several meetup groups
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
25. Find my similar groups
As a member of several meetup groups
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
26. Find my similar groups
MATCH (member:Member {name: "Mark Needham"})
-[:INTERESTED_IN]->(topic),
(member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)
RETURN otherGroup.name,
COLLECT(topic.name),
SUM(score) as score
ORDER BY score DESC
30. What is Jonny interested in?
As a member of several meetup groups
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
INTERESTED_IN?
31. What is Jonny interested in?
There’s an implicit INTERESTED_IN relationship
between the topics of groups I belong to but
don’t express an interest in. Let’s make it explicit
P
G
T
MEMBER_OF
HAS_TOPIC
P
G
T
MEMBER_OF
HAS_TOPIC
INTERESTED_IN
32. What is Jonny interested in?
MATCH (m:Member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH m, topic, COUNT(*) AS times
WHERE times > 3
MERGE (m)-[:INTERESTED_IN]->(topic)
34. Tip: Make the implicit explicit
‣ Fill in the missing links in the graph
‣ You could run this type of query once a day
during a quiet period
‣ On bigger graphs we’d run it in
batches to avoid loading the
whole database into memory
35. Find next group people join
As a member of a meetup group
I want to find out which meetup groups
other people join after this one
So that I can join those groups
36. Find next group people join
MATCH (group:Group {name: "Neo4j - London User Group"})
<-[membership:MEMBER_OF]-(member),
(member)-[otherMembership:MEMBER_OF]->(otherGroup)
WHERE membership.joined < otherMembership.joined
WITH member, otherGroup
ORDER BY otherMembership.joined
WITH member, COLLECT(otherGroup)[0] AS nextGroup
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
38. It feels a bit clunky...
MATCH (group:Group {name: "Neo4j - London User Group"})
<-[membership:MEMBER_OF]-(member),
(member)-[otherMembership:MEMBER_OF]->(otherGroup)
WHERE membership.joined < otherMembership.joined
WITH member, otherGroup
ORDER BY otherMembership.joined
WITH member, COLLECT(otherGroup)[0] AS nextGroup
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
‣ We have to scan through all the MEMBER_OF
relationships to find the one we want
‣ It might make our lives easier if we made
membership a first class citizen of the domain
41. Refactor to facts
MATCH (member:Member)-[rel:MEMBER_OF]->(group)
MERGE (membership:Membership {id: member.id + "_" + group.id})
SET membership.joined = rel.joined
MERGE (member)-[:HAS_MEMBERSHIP]->(membership)
MERGE (membership)-[:OF_GROUP]->(group)
42. Refactor to facts
MATCH (member:Member)-[:HAS_MEMBERSHIP]->(membership)
WITH member, membership ORDER BY member.id, membership.joined
WITH member, COLLECT(membership) AS memberships
UNWIND RANGE(0,SIZE(memberships) - 2) as idx
WITH memberships[idx] AS m1, memberships[idx+1] AS m2
MERGE (m1)-[:NEXT]->(m2)
43. Find next group people join
MATCH (group:Group {name: "Neo4j - London User Group"})
<-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership),
(membership)<-[:HAS_MEMBERSHIP]-(member:Member)
-[:HAS_MEMBERSHIP]->(nextMembership),
(nextMembership)-[:OF_GROUP]->(nextGroup)
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
44. Comparing the approaches
vs
MATCH (group:Group {name: "Neo4j - London User Group"})
<-[membership:MEMBER_OF]-(member),
(member)-[otherMembership:MEMBER_OF]->(otherGroup)
WITH member, membership, otherMembership, otherGroup
ORDER BY member.id, otherMembership.joined
WHERE membership.joined < otherMembership.joined
WITH member, membership, COLLECT(otherGroup)[0] AS nextGroup
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
MATCH (group:Group {name: "Neo4j - London User Group"})
<-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership),
(membership)<-[:HAS_MEMBERSHIP]-(member:Member)
-[:HAS_MEMBERSHIP]->(nextMembership),
(nextMembership)-[:OF_GROUP]->(nextGroup)
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
45. How do I profile a query?
‣ EXPLAIN
• shows the execution plan without actually
executing it or returning any results.
‣ PROFILE
• executes the statement and returns the results
along with profiling information.
45
49. What is our goal?
At a high level, the goal is
simple: get the number of
db hits down.
49
50. an abstract unit of storage
engine work.
What is a database hit?
“
”
50
51. Comparing the approaches
Cypher version: CYPHER 2.3,
planner: COST.
111656 total db hits in 330 ms.
vs
Cypher version: CYPHER 2.3,
planner: COST.
23650 total db hits in 60 ms.
52. Tip: Profile your queries
‣ Spike the different models and see which
one performs the best
53. Should we keep both models?
We could but when we add, edit or remove a membership
we’d have to keep both graph structures in sync.
54. Adding a group membership
WITH "Mark Needham" AS memberName,
"Neo4j - London User Group" AS groupName,
timestamp() AS now
MATCH (group:Group {name: groupName})
MATCH (member:Member {name: memberName})
MERGE (member)-[memberOfRel:MEMBER_OF]->(group)
ON CREATE SET memberOfRel.time = now
MERGE (membership:Membership {id: member.id + "_" + group.id})
ON CREATE SET membership.joined = now
MERGE (member)-[:HAS_MEMBERSHIP]->(membership)
MERGE (membership)-[:OF_GROUP]->(group)
55. Removing a group membership
WITH "Mark Needham" AS memberName,
"Neo4j - London User Group" AS groupName,
timestamp() AS now
MATCH (group:Group {name: groupName})
MATCH (member:Member {name: memberName})
MATCH (member)-[memberOfRel:MEMBER_OF]->(group)
MATCH (membership:Membership {id: member.id + "_" + group.id})
MATCH (member)-[hasMembershipRel:HAS_MEMBERSHIP]->(membership)
MATCH (membership)-[ofGroupRel:OF_GROUP]->(group)
DELETE memberOfRel, hasMembershipRel, ofGroupRel, membership
57. ...not so fast!
As a member of several meetup groups
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
58. Why not delete MEMBER_OF?
MATCH (member:Member {name: "Mark Needham"})
-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)
RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score
ORDER BY score DESC
MATCH (member:Member {name: "Mark Needham"})
-[:HAS_MEMBERSHIP]->()-[:OF_GROUP]->(group:Group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT (member)-[:HAS_MEMBERSHIP]->(:Membership)-[:OF_GROUP]->(otherGroup:Group)
RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score
ORDER BY score DESC
59. Why not delete MEMBER_OF?
MATCH (member:Member {name: "Mark Needham"})
-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)
RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score
ORDER BY score DESC
MATCH (member:Member {name: "Mark Needham"})
-[:HAS_MEMBERSHIP]->()-[:OF_GROUP]->(group:Group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT (member)-[:HAS_MEMBERSHIP]->(:Membership)-[:OF_GROUP]->(otherGroup:Group)
RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score
ORDER BY score DESC
433318 total db hits in 485 ms.
83268 total db hits in 117 ms.
60. Tip: Maintaining multiple models
‣ Different models perform better for
different queries but worse for others
‣ Optimising for reads may mean we pay a
write and maintenance penalty
62. Events in my groups
As a member of several meetup groups who
has previously attended events
I want to find other events hosted by those
groups
So that I can attend those events
63. Events in my groups
As a member of several meetup
groups who has previously
attended events
I want to find other events
hosted by those groups
So that I can attend those events
64. WITH 24.0*60*60*1000 AS oneDay
MATCH (member:Member {name: "Mark Needham"}),
(member)-[:MEMBER_OF]->(group),
(group)-[:HOSTED_EVENT]->(futureEvent)
WHERE futureEvent.time >= timestamp()
AND NOT (member)-[:RSVPD]->(futureEvent)
RETURN group.name, futureEvent.name,
round((futureEvent.time - timestamp()) / oneDay) AS days
ORDER BY days
LIMIT 10
Events in my groups
66. + previous events attended
WITH 24.0*60*60*1000 AS oneDay
MATCH (member:Member {name: "Mark Needham"})
MATCH (futureEvent:Event)
WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent)
MATCH (futureEvent)<-[:HOSTED_EVENT]-(group)
WITH oneDay, group, futureEvent, member, EXISTS((group)<-[:MEMBER_OF]-(member)) AS isMember
OPTIONAL MATCH (member)-[rsvp:RSVPD {response: "yes"}]->(pastEvent)<-[:HOSTED_EVENT]-(group)
WHERE pastEvent.time < timestamp()
RETURN group.name,
futureEvent.name,
isMember,
COUNT(rsvp) AS previousEvents,
round((futureEvent.time - timestamp()) / oneDay) AS days
ORDER BY days, previousEvents DESC
68. RSVPD_YES vs RSVPD
I was curious whether refactoring
RSVPD {response: "yes"} to RSVPD_YES would have
any impact as Neo4j is optimised for querying
by unique relationship types.
69. Refactor to specific relationships
MATCH (m:Member)-[rsvp:RSVPD {response:"yes"}]->(event)
MERGE (m)-[rsvpYes:RSVPD_YES {id: rsvp.id}]->(event)
ON CREATE SET rsvpYes.created = rsvp.created,
rsvpYes.lastModified = rsvp.lastModified;
MATCH (m:Member)-[rsvp:RSVPD {response:"no"}]->(event)
MERGE (m)-[rsvpYes:RSVPD_NO {id: rsvp.id}]->(event)
ON CREATE SET rsvpYes.created = rsvp.created,
rsvpYes.lastModified = rsvp.lastModified;
70. RSVPD_YES vs RSVPD
RSVPD {response: "yes"}
vs
RSVPD_YES
Cypher version: CYPHER 2.3,
planner: COST.
688635 total db hits in 232 ms.
Cypher version: CYPHER 2.3,
planner: COST.
559866 total db hits in 207 ms.
71. Why would we keep RSVPD?
MATCH (m:Member)-[r:RSVPD]->(event)<-[:HOSTED_EVENT]-(group)
WHERE m.name = "Mark Needham"
RETURN event, group, r
MATCH (m:Member)-[r:RSVPD_YES|:RSVPD_NO|:RSVPD_WAITLIST]->(event),
(event)<-[:HOSTED_EVENT]-(group)
WHERE m.name = "Mark Needham"
RETURN event, group, r
72. Tip: Specific relationships
‣ Neo4j is optimised for querying by unique
relationship types…
‣ ...but sometimes we pay a query
maintenance cost to achieve this
73. + events my friends are attending
There’s an implicit FRIENDS relationship
between people who attended the same events.
Let’s make it explicit.
M
E
M
RSVPD
RSVPD
FRIENDS
M
E
M
RSVPD
RSVPD
74. + events my friends are attending
MATCH (m1:Member)
WHERE NOT m1:Processed
WITH m1 LIMIT {limit}
MATCH (m1)-[:RSVP_YES]->(event:Event)<-[:RSVP_YES]-(m2:Member)
WITH m1, m2, COLLECT(event) AS events, COUNT(*) AS times
WHERE times >= 5
WITH m1, m2, times, [event IN events | SIZE((event)<-[:RSVP_YES]-())] AS attendances
WITH m1, m2, REDUCE(score = 0.0, a IN attendances | score + (1.0 / a)) AS score
MERGE (m1)-[friendsRel:FRIENDS]-(m2)
SET friendsRel.score = row.score
75. Bidirectional relationships
‣ You may have noticed that we didn’t specify a
direction when creating the relationship
MERGE (m1)-[:FRIENDS]-(m2)
‣ FRIENDS is a bidirectional relationship. We only
need to create it once between two people.
‣ We ignore the direction when querying
76. + events my friends are attending
WITH 24.0*60*60*1000 AS oneDay
MATCH (member:Member {name: "Mark Needham"})
MATCH (futureEvent:Event)
WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent)
MATCH (futureEvent)<-[:HOSTED_EVENT]-(group)
WITH oneDay, group, futureEvent, member, EXISTS((group)<-[:MEMBER_OF]-(member)) AS isMember
OPTIONAL MATCH (member)-[rsvp:RSVPD {response: "yes"}]->(pastEvent)<-[:HOSTED_EVENT]-(group)
WHERE pastEvent.time < timestamp()
WITH oneDay, group, futureEvent, member, isMember, COUNT(rsvp) AS previousEvents
OPTIONAL MATCH (futureEvent)<-[:HOSTED_EVENT]-()-[:HAS_TOPIC]->(topic)<-[:INTERESTED_IN]-(member)
WITH oneDay, group, futureEvent, member, isMember, previousEvents, COUNT(topic) AS topics
OPTIONAL MATCH (member)-[:FRIENDS]-(:Member)-[rsvpYes:RSVP_YES]->(futureEvent)
RETURN group.name, futureEvent.name, isMember, round((futureEvent.time - timestamp()) / oneDay) AS days,
previousEvents, topics, COUNT(rsvpYes) AS friendsGoing
ORDER BY days, friendsGoing DESC, previousEvents DESC
LIMIT 15
78. Tip: Bidirectional relationships
‣ Some relationships are bidirectional in nature
‣ Neo4j always stores relationships with a
direction but we can choose to ignore that
when we query
79. tl;dr
‣ Model incrementally
‣ Always profile your queries
‣ Consider making the implicit explicit…
• ...but beware the maintenance cost
‣ Be specific with relationship types
‣ Ignore direction for bidirectional relationships
80. That’s all for today!
Questions? :-)
Mark Needham @markhneedham
https://github.com/neo4j-meetups/modeling-worked-example