© 2022 Neo4j, Inc. All rights reserved.
Graph Data Modeling Best Practices
Eric Monk, Principal Solutions Engineer
What is a Graph Data Model?
The structure and definition of the Nodes, Relationships, and Properties.
Structure = How Labeled Nodes are connected via Relationship Types.
Definition = Labels and Relationship Types used. Assignment of Properties to
Labeled Nodes and Relationship Types along with their data types.
Explicit Graph Data Modeling is OPTIONAL, but very useful.
Instance vs Model
Model: two :Person nodes connected by :KNOWS.
  Person = Node Label
  KNOWS = Relationship Type
  name: String, age: Integer = Property Definitions
Instance: alice :Person and bob :Person connected by :KNOWS.
  alice = Node with Person Node Label
  KNOWS = Relationship
  name: Bob, age: 42 = Properties with values
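The Model / Instance distinction can be sketched in a few lines of Python. This is only an illustration: the class names `NodeLabelDef`, `RelTypeDef`, and `Node` and the helper `conforms` are hypothetical, and Neo4j does not require or enforce a model like this by itself.

```python
from dataclasses import dataclass


# Model side: definitions only, no values.
@dataclass(frozen=True)
class NodeLabelDef:
    label: str
    properties: dict  # property name -> expected Python type


@dataclass(frozen=True)
class RelTypeDef:
    rel_type: str
    start_label: str
    end_label: str


# Instance side: a concrete node carrying property values.
@dataclass
class Node:
    label: str
    properties: dict


def conforms(node, label_def):
    """True if the node has the label and every defined property has a value of the defined type."""
    if node.label != label_def.label:
        return False
    return all(
        name in node.properties and isinstance(node.properties[name], expected)
        for name, expected in label_def.properties.items()
    )


# Model: Person label with name: String, age: Integer, and a KNOWS relationship type.
PERSON = NodeLabelDef("Person", {"name": str, "age": int})
KNOWS = RelTypeDef("KNOWS", "Person", "Person")

# Instance: bob with the values shown on the slide.
bob = Node("Person", {"name": "Bob", "age": 42})
# Hypothetical bad instance: age stored as a string.
bad = Node("Person", {"name": "Bob", "age": "42"})
```

Here `conforms(bob, PERSON)` holds while `conforms(bad, PERSON)` does not, which is the kind of check an explicit model makes possible.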
We are here:
Data Modeling
We need to iterate
+ More Questions
+ More Queries
+ More Data
+ Domain Expertise
+ Performance Concerns
Need to harmonize inputs from multiple sources
Types of Modeling
Whiteboard - graph domain information
Instance Model - graph example data to solve questions
Logical Model - use Node Labels and Relationship Types to define graph structure + some Property Definition
Physical Model - augment Logical Model with Property Definitions to support data load + constraints
Tuned Model - account for previous misconceptions and performance concerns
Types of Models
Columns: Model Type | Domain Expertise | Questions | Data Load | Cypher Queries | Performance | Constraints / Indexes
Whiteboard: Y Y
Instance Model: Y Y *
Logical Model: Y Y * *
Physical Model: Y Y Y Y * Y
Tuned Model: Y Y Y Y Y Y
* Partial support
Similar level of detail - slightly different approaches
Whiteboard Tips
Focus on concepts and how they are related.
See: Draw your model visually. Use a tool such as Arrows to create and view your model.
Know: Use domain-relevant language. Collaborate with domain experts to understand the real-world processes the data represents.
Limit: Initially limit the scope of your model. Only include concepts and relationships relevant to an initial set of questions.
Whiteboard Example
Instance Model Tips
Representative Data. Focus on answering a question.
Define: Define properties relevant to the nodes and relationships.
Fill: Create example nodes and relationships with representative data.
Answer: Trace paths to ensure questions can be answered. Write Cypher if needed.
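Tracing a path through example data can be done by hand, or with a small script like the sketch below. It assumes a tiny in-memory instance graph; `carol` and the function `trace_path` are hypothetical additions for illustration, and a real check would normally be written in Cypher against the database.

```python
from collections import deque

# Instance data as (start, relationship type, end) triples.
RELATIONSHIPS = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),  # carol is a hypothetical extra node
]


def trace_path(start, goal, rel_type, relationships):
    """Breadth-first search following rel_type in its stored direction.

    Returns the list of node names on a shortest path, or None if the
    question cannot be answered with the data as modeled.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for source, kind, target in relationships:
            if source == path[-1] and kind == rel_type and target not in visited:
                visited.add(target)
                queue.append(path + [target])
    return None


# Question: is carol reachable from alice through KNOWS relationships?
answer = trace_path("alice", "carol", "KNOWS", RELATIONSHIPS)
```

If `trace_path` returns `None` for a question the model is supposed to answer, that is a signal to revisit the example data or the structure before moving on.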
Instance Model Example
Node, Relationship, or Property?
Item/Action       | Definition                                | Node  | Relationship | Property
Business noun     | -                                         | Y (1) |              | Y (2)
How things relate | -                                         |       | Y            |
Anchor            | Starting point                            | Y (3) |              | Y (3)
Traverse          | Visit matching relationships and nodes    | Y     | Y            |
Degree            | Count of relationships per node           | Y     | Y (4)        |
Filter            | Remove paths by examining property values |       |              | Y
Decorator         | Return property value                     |       |              | Y
Node, Relationship, or Property? (Notes)
(1) Business noun - Node
    Domain concept
    Unique identifier
    Avoid supernodes (very high degree nodes)
(2) Business noun - Property
    Attribute of a Domain concept
    Small set of possible values
    Names, quantities, timestamps
(3) Anchor
    Both Node Label and Property are used to Anchor
(4) Degree
    Used to access Neo4j count store
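The actions in the table (Anchor, Traverse, Degree, Filter, Decorator) can be seen together in one small query over an in-memory graph. This is a sketch under stated assumptions: the node ids, the Alice and Carol values, and all function names are hypothetical, and the `degree` function simply counts relationships rather than modeling the Neo4j count store.

```python
# Nodes keyed by a hypothetical id; Alice and Carol values are made up.
NODES = {
    1: {"label": "Person", "name": "Alice", "age": 35},
    2: {"label": "Person", "name": "Bob", "age": 42},
    3: {"label": "Person", "name": "Carol", "age": 29},
}
# Relationships as (start id, relationship type, end id).
RELS = [(1, "KNOWS", 2), (1, "KNOWS", 3)]


def anchor(label, prop, value):
    """Anchor: find starting nodes by Node Label plus a property value."""
    return [nid for nid, node in NODES.items()
            if node["label"] == label and node.get(prop) == value]


def traverse(node_id, rel_type):
    """Traverse: follow outgoing relationships of one type to their end nodes."""
    return [end for start, kind, end in RELS
            if start == node_id and kind == rel_type]


def degree(node_id, rel_type):
    """Degree: count of outgoing relationships of one type for a node."""
    return len(traverse(node_id, rel_type))


def friends_older_than(name, min_age):
    """Anchor on a name, traverse KNOWS, filter on age, return names."""
    result = []
    for start in anchor("Person", "name", name):        # Anchor
        for friend in traverse(start, "KNOWS"):         # Traverse
            if NODES[friend]["age"] > min_age:          # Filter
                result.append(NODES[friend]["name"])    # Decorator
    return result
```

In this sketch the label and `name` property locate the start, the relationship type drives the traversal, `age` only removes paths, and the returned `name` is a pure decorator, which mirrors the Node / Relationship / Property columns above.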
Logical Model Tips
Informed Modeling. Focus on answering a question.
Define: Define properties relevant to the nodes and relationships.
Guided: Use node, relationship, and property guidance to decide proper graph structure.
Answer: Trace paths to ensure questions can be answered. Write Cypher if needed.
Logical Model Example
Data Discovery
• Identify the data sources for the nodes and properties in the model
• Get sample data
• Identify primary keys / node keys
• Can you correlate across data sources?
• Does your model map to existing sources? Any gaps?
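Two of the discovery questions above, whether a column can serve as a key and whether two sources can be correlated, lend themselves to a quick check on sample data. The sketch below assumes the samples are already loaded as lists of dictionaries; the source names, column names, and values are all hypothetical.

```python
def is_candidate_key(rows, column):
    """True if every row has a non-missing, unique value in the column."""
    values = [row.get(column) for row in rows]
    return None not in values and len(values) == len(set(values))


def correlation_gaps(rows_a, key_a, rows_b, key_b):
    """Compare key values across two sources and report what fails to match."""
    keys_a = {row[key_a] for row in rows_a}
    keys_b = {row[key_b] for row in rows_b}
    return {"only_in_a": keys_a - keys_b, "only_in_b": keys_b - keys_a}


# Hypothetical sample data from two sources.
crm_rows = [
    {"customer_id": "C1", "name": "Alice"},
    {"customer_id": "C2", "name": "Bob"},
    {"customer_id": "C3", "name": "Bob"},
]
billing_rows = [
    {"cust": "C1", "amount": 10},
    {"cust": "C2", "amount": 25},
    {"cust": "C9", "amount": 5},
]

gaps = correlation_gaps(crm_rows, "customer_id", billing_rows, "cust")
```

In this sample `customer_id` passes the key check while `name` does not, and `gaps` shows one id on each side with no counterpart, which is the kind of gap to resolve before the physical model is finalized.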
Physical Model Tips
Enable loading of data. Question(s) should still be answered.
Source: Use details from relevant data sources to inform your modeling.
Keys: Determine node keys, constraints, and indexes to prepare for data load.
Load: Load data. Confirm structures are loaded properly and data types are correct.
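As a rough illustration of the Keys and Load tips, the Python sketch below builds a uniqueness-constraint statement as a string and checks the data types of loaded records. The property `personId`, the constraint naming scheme, and both function names are assumptions made for this example. The generated statement uses the `FOR ... REQUIRE` form of Cypher constraint syntax; older Neo4j versions use a different form, so check the documentation for the version in use.

```python
def uniqueness_constraint(label, prop):
    """Build a Cypher uniqueness-constraint statement as a string.

    The statement is only constructed here, not executed.
    """
    name = f"{label.lower()}_{prop}_unique"  # hypothetical naming scheme
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
    )


def type_mismatches(records, expected_types):
    """Return (record index, property, actual type name) for each wrongly typed value."""
    problems = []
    for index, record in enumerate(records):
        for prop, expected in expected_types.items():
            value = record.get(prop)
            if not isinstance(value, expected):
                problems.append((index, prop, type(value).__name__))
    return problems


statement = uniqueness_constraint("Person", "personId")

# Hypothetical records read back after a load; the second age is a string by mistake.
loaded = [
    {"name": "Bob", "age": 42},
    {"name": "Alice", "age": "35"},
]
problems = type_mismatches(loaded, {"name": str, "age": int})
```

A non-empty `problems` list after a load usually points at a missing type conversion in the load step rather than at the model itself.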
Physical Model Example
Tuning Your Model
Focus on optimizing the model.
Iterate: Make adjustments as needed to tune model based on additional data added or new questions to be answered.
Performance: Use specific Relationship Types to assist traversals. Ensure constraints and indexes are used properly.
Validate: Validate the model to ensure data is loaded properly, questions are answered.
Relationship Types Example
images by OokamiKasumi and TransparentJiggly64
https://www.deviantart.com/ookamikasumi/art/Many-Doors-to-Wonderland-166917185
https://www.deviantart.com/transparentjiggly64/art/Champion-Link-Facing-Away-803623424
Imagine a room with many doors. Room = node, Door = relationship.
Our adventurer wants treasure.
Five doors, all labeled PATH_TO: Uhh… which way to go? Relationship traversal NOT optimal!
Five doors labeled PATH_TO_ROOM, PATH_TO_SHOP, PATH_TO_MONSTER, PATH_TO_TREASURE, PATH_TO_GROOSE: Ahh! I know which way to go. This WAY!
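The room analogy can be put into numbers with a short sketch. It assumes a start room with five doors; the node names (`attic`, `shop`, `cave`, `vault`, `nest`) and the function `find_by_end_label` are hypothetical, and the "inspected" count stands in for the end nodes a traversal would have to load and check.

```python
# Node Label for each node; node names are hypothetical.
LABELS = {
    "start": "Room",
    "attic": "Room",
    "shop": "Shop",
    "cave": "Monster",
    "vault": "Treasure",
    "nest": "Groose",
}
DESTINATIONS = ("attic", "shop", "cave", "vault", "nest")

# Generic model: every door has the same relationship type.
GENERIC_DOORS = [("start", "PATH_TO", node) for node in DESTINATIONS]

# Specific model: the relationship type names the kind of destination.
SPECIFIC_DOORS = [
    ("start", f"PATH_TO_{LABELS[node].upper()}", node) for node in DESTINATIONS
]


def find_by_end_label(doors, start, rel_type, wanted_label):
    """Follow rel_type from start, then filter on the end node's label.

    Returns (matching end nodes, number of end nodes inspected).
    """
    inspected = 0
    matches = []
    for source, kind, target in doors:
        if source == start and kind == rel_type:
            inspected += 1
            if LABELS[target] == wanted_label:
                matches.append(target)
    return matches, inspected


generic = find_by_end_label(GENERIC_DOORS, "start", "PATH_TO", "Treasure")
specific = find_by_end_label(SPECIFIC_DOORS, "start", "PATH_TO_TREASURE", "Treasure")
```

Both calls find the same node, but the generic model inspects all five end nodes while the specific model inspects one. With thousands of doors per room the difference grows accordingly.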
Other Optimizations and Validation
Tuning Optimizations
• Have specific Relationship Types that lead to distinct Node Types
◦ Avoid the issue of extra traversals + filtering on end Node Labels
• Relationship Locking on Writes
◦ Be cognizant of potential locking issues
◦ Improved in Neo4j 4.3, but could still be an issue
• Avoid traversals through high degree nodes
◦ Cardinality expansion will kill performance and increase memory consumption
• Smart Aggregation
◦ Build intermediate aggregate nodes or use GDS (Graph Data Science)
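The cardinality expansion behind the high degree node warning can be shown with simple arithmetic. The sketch below counts ordered two-hop paths that pass through a single hub node; the function names and the hub itself are hypothetical, and the count ignores any filtering a real query might apply.

```python
from itertools import permutations


def two_hop_paths(neighbors):
    """Enumerate ordered (a, hub, b) paths between distinct neighbors of one hub."""
    return [(a, "hub", b) for a, b in permutations(neighbors, 2)]


def two_hop_path_count(degree):
    """Closed form for the same count: each of `degree` neighbors pairs with every other."""
    return degree * (degree - 1)


# Small case: enumerate and compare with the closed form.
small = two_hop_paths(range(5))

# Large case: a hub with 100,000 relationships, counted without enumerating.
large = two_hop_path_count(100_000)
```

Five neighbors already yield 20 paths, and a hub with 100,000 relationships yields roughly ten billion, which is why a pattern that passes through such a node can exhaust time and memory even when the rest of the graph is small.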
Model Validation
• Sanity Checks
◦ Does your model answer your business problem(s)?
◦ Does your model ingest data properly?
• Usability
◦ During testing, does your model introduce usability issues?
• Performance
◦ During testing, does your model perform fast enough?
Summary
• Whiteboard to get started
• Use Instance modeling and Logical modeling to answer questions
• Use Physical modeling to load data
• Tune your model
• Iterate
Happy Modeling!
Contact us at
sales@neo4j.com
