Graph Data Modelling
Neo4j Inc. All rights reserved 2024
Summary of Scenarios
Scenario 1:
● Does our problem involve understanding relationships between entities?
Scenario 2:
● Does the problem involve a lot of self-referencing to the same type of entity?
Scenario 3:
● Does the problem explore relationships of varying or unknown depth?
Scenario 4:
● Does our problem involve discovering lots of different routes or paths?
Relational Versus Graph Models
Path is starting point
What nodes are visited to find Ann’s residence?
MATCH (p:Person)-[:OWNS]->(r)
WHERE p.name = 'Ann'
RETURN r
[Figure: a property graph with node labels Person (twice), Location, and Residence; relationship types MARRIED, LIVES_AT (twice), and OWNS; properties name: ‘Dan’, born: 1975; name: ‘Ann’, born: 1977; address: ‘475 Broad Street’, postalCode: 28394; since: 2005-02-14; financed: TRUE]
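The example graph can be sketched in Cypher to make the traversal concrete. This is a minimal sketch, not the deck's own script: which node or relationship each property belongs to, and the choice to put both the Location and Residence labels on one node, are assumptions read off the figure labels.

```cypher
// Assumed layout of the example graph (property placement is a guess).
CREATE (dan:Person {name: 'Dan', born: 1975})
CREATE (ann:Person {name: 'Ann', born: 1977})
CREATE (home:Location:Residence {address: '475 Broad Street', postalCode: 28394})
CREATE (dan)-[:MARRIED {since: date('2005-02-14')}]->(ann)
CREATE (dan)-[:LIVES_AT]->(home)
CREATE (ann)-[:LIVES_AT]->(home)
CREATE (ann)-[:OWNS {financed: true}]->(home);

// The query from the slide: Neo4j first locates the anchor (Person nodes
// filtered by name), then expands only OWNS relationships from Ann.
// MARRIED and LIVES_AT relationships are never followed.
MATCH (p:Person)-[:OWNS]->(r)
WHERE p.name = 'Ann'
RETURN r.address;
```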
Hierarchy of Accessibility
For each data object, how much work must Neo4j do to evaluate if this is a “good” path or a “bad” one?
Most accessible (least processing required)
1. Anchor node label
   Anchor node properties (indexed)
   Anchor relationship type
   Anchor relationship properties (indexed)
2. Downstream node labels
   Downstream relationship types
3. Anchor node/relationship properties (non-indexed)
4. Downstream node/relationship properties
Least accessible (most processing required)
[Figure: Anchor Node and Downstream Nodes]
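One way to read the hierarchy is to index the property used to find the anchor and leave non-indexed properties for late filtering. The sketch below assumes Neo4j 4.4 or later index syntax and reuses the Person/OWNS example; the index name person_name and the Residence label on the target node are illustrative assumptions.

```cypher
// Level 1: an index on the anchor property lets Neo4j seek the starting
// node instead of scanning every Person node.
CREATE INDEX person_name IF NOT EXISTS
FOR (p:Person) ON (p.name);

// Anchor: label + indexed property, then the OWNS relationship type.
// Downstream: the Residence label narrows the expansion.
// r.financed is a non-indexed relationship property, so it is evaluated
// only after the relationship has been expanded.
MATCH (p:Person {name: 'Ann'})-[r:OWNS]->(home:Residence)
WHERE r.financed = true
RETURN home.address;

// PROFILE runs the query and reports the operators used and the work done,
// which is a practical way to check how much processing a pattern costs.
PROFILE
MATCH (p:Person {name: 'Ann'})-[:OWNS]->(home:Residence)
RETURN home.address;
```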
Graph data modelling
Generic Graph specific
Collaborative effort
Nodes
Identify Entities from Questions
Entities are the nouns in the application questions:
1. What ingredients are used in a recipe?
2. Who is married to this person?
● The generic nouns often become labels in the model
● Use domain knowledge to decide how to further group or differentiate entities
Best practice: Avoid complex types for properties
Relationships
Identify Connections between Entities
Connections are the verbs in the application questions:
● What ingredients are used in a recipe?
● Who is married to this person?
Qualifying a relationship
Use properties to describe the weight or quality of the relationship.
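As a small illustration, the MARRIED relationship from the earlier Person example can carry a since property that qualifies it. This is a sketch that assumes the two Person nodes already exist and that since belongs on MARRIED; the cut-off date in the filter is arbitrary.

```cypher
// Attach a qualifying property to the relationship itself.
MATCH (dan:Person {name: 'Dan'}), (ann:Person {name: 'Ann'})
MERGE (dan)-[m:MARRIED]->(ann)
SET m.since = date('2005-02-14');

// The property can then be returned or used to filter paths.
MATCH (:Person {name: 'Ann'})-[m:MARRIED]-(spouse:Person)
WHERE m.since < date('2010-01-01')
RETURN spouse.name, m.since;
```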
Graph Modelling patterns
Intermediate Nodes
Create intermediate nodes when you need to:
● Connect more than two nodes in a single context
● Relate something to a relationship
IN_ROLE
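A possible shape for an intermediate node is sketched below. Only IN_ROLE appears on the slide; the Employment, Company, and Role labels, the HAS_EMPLOYMENT and AT_COMPANY relationship types, and all property values are hypothetical names chosen for illustration.

```cypher
// A person, a company, and a role take part in one context (an employment),
// which a single relationship between two nodes cannot express.
CREATE (p:Person {name: 'Ann'})
CREATE (c:Company {name: 'Acme'})
CREATE (r:Role {title: 'Engineer'})
CREATE (e:Employment {startDate: date('2015-01-01')})
CREATE (p)-[:HAS_EMPLOYMENT]->(e)
CREATE (e)-[:AT_COMPANY]->(c)
CREATE (e)-[:IN_ROLE]->(r);

// The intermediate node ties the three together in one traversal.
MATCH (p:Person {name: 'Ann'})-[:HAS_EMPLOYMENT]->(e:Employment)-[:IN_ROLE]->(r:Role),
      (e)-[:AT_COMPANY]->(c:Company)
RETURN c.name, r.title, e.startDate;
```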
Using Intermediate Nodes
Intermediate Nodes: Sharing Context
Intermediate Nodes: Sharing Data
Before
After
Intermediate Nodes: Organizing Data
Linked Lists
Episodes of the Dr. Who series
DO NOT do this (doubly-linked list):
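The reason a doubly-linked list is unnecessary is that Neo4j can traverse a relationship in either direction, so a second, reversed relationship adds storage and maintenance without adding information. A minimal sketch, assuming a NEXT relationship type and a number property on Episode nodes (both illustrative):

```cypher
// A singly-linked list of three placeholder episodes.
CREATE (e1:Episode {number: 1})-[:NEXT]->(e2:Episode {number: 2})-[:NEXT]->(e3:Episode {number: 3});

// Next episode: follow NEXT forwards.
MATCH (:Episode {number: 2})-[:NEXT]->(next:Episode)
RETURN next.number;

// Previous episode: follow the same NEXT relationship backwards.
MATCH (:Episode {number: 2})<-[:NEXT]-(previous:Episode)
RETURN previous.number;
```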
Interleaved Linked List
Head and Tail of Linked List
Some possible use cases:
● Add episodes as they are broadcast
● Maintain pointers to the first and last episodes
● Find all broadcast episodes
● Find latest broadcast episode
Current item
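The use cases above could be sketched as follows. The structure is an assumption: a Series node holding FIRST and LAST pointers to the head and tail of a NEXT-linked chain of Episode nodes, with a number property on each episode. None of these names are confirmed by the slide.

```cypher
// Add an episode as it is broadcast: append after the tail, move the LAST pointer.
MATCH (s:Series {name: 'Dr. Who'})-[oldLast:LAST]->(tail:Episode)
CREATE (tail)-[:NEXT]->(e:Episode {number: tail.number + 1})
CREATE (s)-[:LAST]->(e)
DELETE oldLast;

// Find the latest broadcast episode: one hop from the series node.
MATCH (:Series {name: 'Dr. Who'})-[:LAST]->(latest:Episode)
RETURN latest.number;

// Find all broadcast episodes in order, starting from the head.
MATCH (:Series {name: 'Dr. Who'})-[:FIRST]->(head:Episode)
MATCH path = (head)-[:NEXT*0..]->(e:Episode)
RETURN e.number
ORDER BY length(path);
```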
Timeline Tree
Using Multiple Structures
Using The Timeline Tree
Using Intermediate Nodes
Using Linked Lists
Airline Flight Management
Exercise 1
● Create a graph data model
Exercise 1 Question
“Which airports can I fly to from Las Vegas airport?”
Our Data
from  to   airline  flightNumber  date        departure  arrival
LAS   LAX  WN       82            2021-03-01  1715       1820
LAS   ABQ  WN       500           2021-03-01  1445       1710
Exercise 1 Instructions
Steps:
1. Identify the entities and relationships based on the question.
   Remember: Use the Simplest Model Possible (SMP)!
2. Go to Arrows - https://arrows.app/
3. Create the graph data model using the sample data and the entities and relationships you have identified.
Question: “Which airports can I fly to from Las Vegas airport?”
from  to   airline  flightNumber  date        departure  arrival
LAS   LAX  WN       82            2021-03-01  1715       1820
LAS   ABQ  WN       500           2021-03-01  1445       1710
Exercise 1: Solution
Entities
Which airports can I fly to from Las Vegas airport?
Answer: Airport
Relationships
Which airports can I fly to from Las Vegas airport?
Answer: FLIES_TO
Exercise 1 Solution
Note: No extra properties aside from the airport ‘code’.
This is the simplest model that answers the question.
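Loading the two sample rows into this model might look like the sketch below. The inline row list and its src and dst keys are illustrative; only the Airport label, the code property, and the FLIES_TO relationship type come from the solution.

```cypher
// One row per flight in the sample data; only the airport codes are used.
UNWIND [
  {src: 'LAS', dst: 'LAX'},
  {src: 'LAS', dst: 'ABQ'}
] AS row
// MERGE keeps each airport unique even though LAS appears in both rows.
MERGE (origin:Airport {code: row.src})
MERGE (destination:Airport {code: row.dst})
MERGE (origin)-[:FLIES_TO]->(destination);
```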
Exercise 1 The Model
Exercise 1 Solution checks
● Can we answer our question with the model?
﹣ “Which airports can I fly to from Las Vegas airport?”
● Does the model answer other questions?
﹣ How many airports are connected to a given airport?
﹣ How many airports are there?
MATCH (:Airport {code: 'LAS'})-[:FLIES_TO]->(destination:Airport)
RETURN destination.code
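The two other questions can be answered with the same Airport and FLIES_TO model. A short sketch, treating "connected" as outbound FLIES_TO relationships (an assumption about what the question means):

```cypher
// How many airports are connected to a given airport?
MATCH (:Airport {code: 'LAS'})-[:FLIES_TO]->(destination:Airport)
RETURN count(DISTINCT destination) AS connectedAirports;

// How many airports are there?
MATCH (a:Airport)
RETURN count(a) AS airports;
```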
Exercise 2
● Apply best practices
Exercise 2 Question
“What are the origin and destination airports for a specific flight?”
Still Our Data
from  to   airline  flightNumber  date        departure  arrival
LAS   LAX  WN       82            2021-03-01  1715       1820
LAS   ABQ  WN       500           2021-03-01  1445       1710
Exercise 2 Instructions
Question: Given a flight number, find the origin and destination airports.
“What are the origin and destination airports for a specific flight?”
Steps:
1. Identify the (new!) entities and relationships based on the question.
   Remember: SMP!
2. Go to Arrows - https://arrows.app/
3. Update the graph data model using the sample data and the entities and relationships you have identified (think intermediate nodes).
from  to   airline  flightNumber  date        departure  arrival
LAS   LAX  WN       82            2021-03-01  1715       1820
LAS   ABQ  WN       500           2021-03-01  1445       1710
Exercise 2 Entities and Relationships
Entities
“What are the origin and destination airports for a specific flight?”
Nouns: airports, flight
Answer: Flight
Relationships
“What are the origin and destination airports for a specific flight?”
Connections: origin, destination
Answer: DEPARTS_FROM, ARRIVES_AT
Exercise 2 A solution
Exercise 2 A solution
Do we need these?
Exercise 2 A better solution?
Exercise 2 Model
Exercise 2 Solution checks
● Can we answer our questions with the model?
﹣ “Which airports can I fly to from Las Vegas airport?”
﹣ “What are the origin and destination airports for a specific flight?”
MATCH
(:Airport {code: 'LAS'})<-[:DEPARTS_FROM]-(f:Flight),
(f)-[:ARRIVES_AT]->(destination:Airport)
RETURN
destination.code
MATCH
(origin:Airport)<-[:DEPARTS_FROM]-(f:Flight),
(f)-[:ARRIVES_AT]->(destination:Airport)
WHERE
f.flightNumber = '500' AND f.departure = 1445
RETURN
origin.code, destination.code
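To try the two queries above, the sample rows can be loaded with Flight as an intermediate node. This is a sketch under assumptions: the set of properties stored on Flight is a guess, since the model diagram is not reproduced here. flightNumber is stored as a string and departure as an integer so that the WHERE clause above matches.

```cypher
UNWIND [
  {src: 'LAS', dst: 'LAX', airline: 'WN', flightNumber: '82',
   date: date('2021-03-01'), departure: 1715, arrival: 1820},
  {src: 'LAS', dst: 'ABQ', airline: 'WN', flightNumber: '500',
   date: date('2021-03-01'), departure: 1445, arrival: 1710}
] AS row
// Airports are shared between flights, so MERGE them.
MERGE (origin:Airport {code: row.src})
MERGE (destination:Airport {code: row.dst})
// Each row is a distinct flight, so CREATE a new Flight node per row.
CREATE (f:Flight {
  airline: row.airline,
  flightNumber: row.flightNumber,
  date: row.date,
  departure: row.departure,
  arrival: row.arrival
})
CREATE (f)-[:DEPARTS_FROM]->(origin)
CREATE (f)-[:ARRIVES_AT]->(destination);
```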
Exercise 3
● Graph refactoring
Exercise 3 Problem
Airports have too many flights in a day
- Heathrow: ~1,300 flights per day
- Totalling ~474,500 per year
- ~4.7 million over a decade
- Just 1 airport *
* https://www.heathrow.com/company/about-heathrow/performance/airport-operations/traffic-statistics
Solutions?
• Bigger machines
• Fewer flights
• Accept the time costs
• Remodel
  • Add AirportDay intermediate nodes (We’ve seen it before!)
[Figure: Dense Node]
Exercise 3 Instructions
● Open Arrows
● Refactor your model to add AirportDay nodes
Exercise 3 Solution
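One possible refactoring is sketched below. It assumes Flight nodes carry a date property and are connected to Airport nodes by DEPARTS_FROM and ARRIVES_AT. Only the AirportDay label comes from the exercise; the HAS_DAY relationship type, the key property, and its code_date format are illustrative assumptions.

```cypher
// Move each departure from the dense Airport node to a per-day node.
MATCH (f:Flight)-[old:DEPARTS_FROM]->(origin:Airport)
MERGE (day:AirportDay {key: origin.code + '_' + toString(f.date)})
MERGE (origin)-[:HAS_DAY]->(day)
MERGE (f)-[:DEPARTS_FROM]->(day)
DELETE old;

// Do the same for arrivals.
MATCH (f:Flight)-[old:ARRIVES_AT]->(destination:Airport)
MERGE (day:AirportDay {key: destination.code + '_' + toString(f.date)})
MERGE (destination)-[:HAS_DAY]->(day)
MERGE (f)-[:ARRIVES_AT]->(day)
DELETE old;

// Destinations from LAS on one day: only that day's flights are expanded,
// not every flight the airport has ever had.
MATCH (:Airport {code: 'LAS'})-[:HAS_DAY]->(:AirportDay {key: 'LAS_2021-03-01'})
      <-[:DEPARTS_FROM]-(f:Flight)-[:ARRIVES_AT]->(:AirportDay)
      <-[:HAS_DAY]-(destination:Airport)
RETURN DISTINCT destination.code;
```

The trade-off is one extra hop per traversal in exchange for a much smaller fan-out at each Airport node.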
Thank you!
Questions?

Neo4j Graph Data Modelling Session - GraphTalk
