The document discusses data modeling with Neo4j, focusing on graph structures and their components: nodes, relationships, properties, and labels. It emphasizes the advantages of graph databases in managing complex data and highlights the process of designing a data model based on user queries and application goals. Additionally, it contrasts graph models with traditional relational models and discusses best practices for graph data design.
Overview of Neo4j, its community, and the purpose of the webinar focusing on data modeling using graphs.
Graphs help address data complexity influenced by size, semi-structure, and connectedness.
Graphs are widely applicable in social networks, impact analysis, route finding, recommendations, logistics, access control, and fraud analysis.
Four primary components: Nodes, Relationships, Properties, and Labels in graph modeling. Explains what nodes and relationships are, how they function, and their importance in graph databases. Details on the variable structure of relationships and distinctions between aggregate and connected data models.
Differences between relational and graph models, emphasizing the latter's flexibility and suitability for complex data.
Explains how to design a graph model based on user stories and patterns, including query design.
Developing Cypher queries to match user questions to graph patterns and refine data models.
Challenges of creating effective nodes and relationships in graph databases, including normalization.
Emphasizes using relationships for efficient data access and querying structures in a graph.
Examples of how organizations utilize graph databases to solve real business problems across industries.
Encouragement to adopt graph thinking for various applications, highlighting graph databases as a potent data management solution.
Data Modeling with Neo4j
Michael Hunger, Neo Technology
@neo4j | michael@neo4j.org
Thanks to: Ian Robinson, Mark Needham, Alistair Jones
Saturday, 31 August 2013
Please ask questions
in the chat.
I'll answer them at the end.
Follow-up email with any missing answers,
video and slides.
This Webinar
๏ Graphs are everywhere
๏ Graph Model Building Blocks
๏ (NoSQL) Data Models
๏ Designing a Data Model
๏ Embrace the Paradigm
Nodes
๏ Used to represent entities in your domain
๏ Can contain properties
• Used to represent entity attributes and/or metadata
(e.g. timestamps, version)
• Key-value pairs
‣Java primitives
‣Arrays
‣null is not a valid value
• Every node can have different properties
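A minimal sketch of these property rules, assuming a plain in-memory Python class (`Node` and `set_property` are hypothetical names, not Neo4j's API). Python scalars and homogeneous lists stand in for Java primitives and arrays, and storing `None` is rejected because null is not a valid value.

```python
from typing import Any

# Stand-ins for Java primitives in this sketch (an assumption, not Neo4j's type list).
SCALAR_TYPES = (bool, int, float, str)


class Node:
    """Hypothetical in-memory node: a set of key-value properties."""

    def __init__(self, **properties: Any) -> None:
        self.properties: dict[str, Any] = {}
        for key, value in properties.items():
            self.set_property(key, value)

    def set_property(self, key: str, value: Any) -> None:
        if value is None:
            # null cannot be stored as a property value
            raise ValueError(f"{key!r}: null is not a valid value")
        if isinstance(value, list):
            # arrays must contain primitives, all of the same type
            if not all(isinstance(v, SCALAR_TYPES) for v in value):
                raise TypeError(f"{key!r}: arrays may only contain primitives")
            if len({type(v) for v in value}) > 1:
                raise TypeError(f"{key!r}: arrays must be homogeneous")
        elif not isinstance(value, SCALAR_TYPES):
            raise TypeError(f"{key!r}: unsupported type {type(value).__name__}")
        self.properties[key] = value


# Every node can have a different set of properties.
alice = Node(name="Alice", age=42)
acme = Node(name="ACME", tags=["retail", "b2b"], version=3)
```

Nothing forces `alice` and `acme` to share keys: each node carries only the attributes and metadata it needs.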
Relationships
๏ Every relationship has a name and a direction
• Add structure to the graph
• Provide semantic context for nodes
๏ Can contain properties
• Used to represent the quality or weight of a relationship,
or metadata
๏ Every relationship must have a start node and end node
• No dangling relationships
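A sketch of these relationship rules, assuming a toy in-memory Python store (`Graph`, `relate` and `delete_node` are hypothetical names, not Neo4j's API). A relationship can only be created between two existing nodes, carries a type, a direction and optional properties, and a node cannot be deleted while relationships still reference it.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)  # eq=False: nodes compare and hash by identity
class Node:
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass(eq=False)
class Relationship:
    type: str    # every relationship has a name...
    start: Node  # ...and a direction, from start to end
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)  # e.g. weight, metadata


class Graph:
    """Hypothetical minimal store that never allows dangling relationships."""

    def __init__(self) -> None:
        self.nodes: set[Node] = set()
        self.relationships: list[Relationship] = []

    def create_node(self, **props: Any) -> Node:
        node = Node(dict(props))
        self.nodes.add(node)
        return node

    def relate(self, start: Node, rel_type: str, end: Node, **props: Any) -> Relationship:
        # A relationship needs both a start node and an end node in the graph.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("start and end node must both exist")
        rel = Relationship(rel_type, start, end, dict(props))
        self.relationships.append(rel)
        return rel

    def delete_node(self, node: Node) -> None:
        # Deleting a node that still has relationships would leave them dangling.
        if any(node is r.start or node is r.end for r in self.relationships):
            raise ValueError("delete the node's relationships first")
        self.nodes.remove(node)


g = Graph()
alice = g.create_node(name="Alice")
acme = g.create_node(name="ACME")
works = g.relate(alice, "WORKS_FOR", acme, since=2010)
```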
Relationships (continued)
๏ Nodes can have more than one relationship
๏ Self relationships are allowed
๏ Nodes can be connected by more than one relationship
Variable Structure
๏ Relationships are defined with regard to node
instances, not classes of nodes
• Different nodes can be connected in different ways
• Allows for structural variation in the domain
• Contrast with relational schemas, where foreign key
relationships apply to all rows in a table
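To make the contrast concrete: a relational column such as a hypothetical `person.company_id` exists on every row, while in a graph each node instance has only the relationships it needs. The plain-Python sketch below uses invented names (`INVESTED_IN` and `HOLDS_SHARES_IN` are illustrative only) and also shows two relationships between the same pair of nodes and a self relationship.

```python
from collections import defaultdict

# Hypothetical instance-level data as (start, TYPE, end) triples.
relationships = [
    ("alice", "WORKS_FOR", "acme"),
    ("alice", "HAS_SKILL", "neo4j"),
    ("bob", "HAS_SKILL", "neo4j"),
    ("bob", "HAS_SKILL", "java"),
    ("carol", "WORKS_FOR", "acme"),
    ("carol", "INVESTED_IN", "acme"),     # second relationship between the same pair
    ("acme", "HOLDS_SHARES_IN", "acme"),  # self relationship
]


def outgoing_types() -> dict[str, set[str]]:
    """Collect the relationship types each node instance actually has."""
    shapes: dict[str, set[str]] = defaultdict(set)
    for start, rel_type, _end in relationships:
        shapes[start].add(rel_type)
    return dict(shapes)


for node, types in sorted(outgoing_types().items()):
    print(node, sorted(types))
```

Bob has no `WORKS_FOR` at all, instead of a null foreign key; no schema change is needed for Carol's extra relationship.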
Labels
๏ Every node can have zero or more labels attached
๏ Used to represent roles (e.g. user, product, company)
• Group nodes
• Allow us to associate indexes and constraints with
groups of nodes
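A sketch of how labels group nodes and scope indexes and constraints, assuming a hypothetical in-memory `LabeledStore` (not Neo4j's API). A uniqueness rule is declared per (label, property) pair, and lookups go through an index keyed by label, so the same email on a `Company` node would not clash with one on a `User` node.

```python
from collections import defaultdict
from typing import Any, Optional

Node = dict[str, Any]  # hypothetical node representation: labels + props


class LabeledStore:
    """Hypothetical sketch: labels group nodes and scope unique constraints."""

    def __init__(self) -> None:
        self.nodes: list[Node] = []
        self.by_label: dict[str, list[Node]] = defaultdict(list)
        self.unique_keys: set[tuple[str, str]] = set()  # (label, key)
        self.index: dict[tuple[str, str, Any], Node] = {}

    def add_unique_constraint(self, label: str, key: str) -> None:
        # Simplification: declare constraints before creating any nodes.
        self.unique_keys.add((label, key))

    def create(self, labels: set[str], **props: Any) -> Node:
        node: Node = {"labels": set(labels), "props": dict(props)}
        entries = [(label, key, props[key]) for label, key in self.unique_keys
                   if label in labels and key in props]
        for entry in entries:  # validate before changing any state
            if entry in self.index:
                raise ValueError(f"duplicate {entry[0]}.{entry[1]} = {entry[2]!r}")
        for entry in entries:
            self.index[entry] = node
        self.nodes.append(node)
        for label in labels:  # a node with zero labels joins no group
            self.by_label[label].append(node)
        return node

    def lookup(self, label: str, key: str, value: Any) -> Optional[Node]:
        """Indexed lookup scoped to one label (only constrained keys are indexed)."""
        return self.index.get((label, key, value))


store = LabeledStore()
store.add_unique_constraint("User", "email")
ann = store.create({"User"}, name="Ann", email="ann@example.com")
acme = store.create({"Company"}, name="ACME")
```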
Four Building Blocks
๏ Nodes
• Entities
๏ Relationships
• Connect entities and structure domain
๏ Properties
• Attributes and metadata
๏ Labels
• Group nodes by role
“There is a significant downside - the whole approach works
really well when data access is aligned with the aggregates, but
what if you want to look at the data in a different way? Order
entry naturally stores orders as aggregates, but analyzing
product sales cuts across the aggregate structure. The
advantage of not using an aggregate structure in the database
is that it allows you to slice and dice your data different ways
for different audiences.
This is why aggregate-oriented stores talk so much about map-
reduce.”
Martin Fowler
Aggregate Oriented Model
The connected data model is based on fine-grained elements
that are richly connected; the emphasis is on extracting many
dimensions and attributes as elements.
Connections are cheap and can be used not only for the
domain-level relationships but also for additional structures
that allow efficient access for different use cases. The
fine-grained model requires an external scope for mutating
operations that ensures Atomicity, Consistency, Isolation and
Durability (ACID), also known as transactions.
Michael Hunger
Connected Data Model
Method
1. Identify application/end-user goals
2. Figure out what questions to ask of the domain
3. Identify entities in each question
4. Identify relationships between entities in each
question
5. Convert entities and relationships to paths
These become the basis of the data model
6. Express questions as graph patterns
These become the basis for queries
From User Story to Model and Query
1. User story
2. Questions we want to ask
3. Entities and relationships
4. Paths
5. Data model
6. Query
1. Application/End-User Goals
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
2. Questions To Ask of the Domain
Which people, who work for the same
company as me, have similar skills to me?
3. Identify Entities
Person
Company
Skill
4. Identify Relationships Between Entities
Person WORKS_FOR Company
Person HAS_SKILL Skill
5. Convert to Cypher Paths
Person WORKS_FOR Company
Person HAS_SKILL Skill
(entities become labels, relationships become relationship types)
(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)
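The conversion in step 5 is mechanical enough to sketch. A hypothetical Python helper (`to_cypher_path` is not part of any Neo4j library) that turns each "Entity RELATIONSHIP Entity" sentence into a Cypher path pattern, assuming the direction runs from the first entity to the second:

```python
def to_cypher_path(start_label: str, rel_type: str, end_label: str) -> str:
    """Turn an 'Entity RELATIONSHIP Entity' sentence into a Cypher path pattern."""
    return f"(:{start_label})-[:{rel_type}]->(:{end_label})"


facts = [("Person", "WORKS_FOR", "Company"), ("Person", "HAS_SKILL", "Skill")]
paths = ",\n".join(to_cypher_path(*fact) for fact in facts)
print(paths)
# (:Person)-[:WORKS_FOR]->(:Company),
# (:Person)-[:HAS_SKILL]->(:Skill)
```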
6. Express Question as Graph Pattern
Which people, who work for the same
company as me, have similar skills to me?
Cypher Query
Which people, who work for the same
company as me, have similar skills to me?
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
(company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
count(skill) AS score,
collect(skill.name) AS skills
ORDER BY score DESC
Graph Pattern: the MATCH clause describes the shape to find
Anchor Pattern in Graph: WHERE me.name = {name} pins the pattern to one starting person
Create Projection of Results: the RETURN clause turns each colleague into a row of name, score and skills, ordered by score
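To see what the query above computes, here is a plain-Python sketch of the same logic over hypothetical sample data (the people, companies and skills are invented). It is not how Neo4j executes the query, just its semantics: same company, at least one shared skill, score = number of shared skills, highest first. In current Neo4j releases the parameter is written `$name` rather than `{name}`.

```python
# Hypothetical sample data, not from the webinar.
works_for = {"Ian": "Neo", "Jim": "Neo", "Mark": "Neo", "Emil": "Acme"}
has_skill = {
    "Ian": {"Java", "Cypher", "DDD"},
    "Jim": {"Java", "Cypher"},
    "Mark": {"Scala"},
    "Emil": {"Java", "Cypher", "DDD"},
}


def similar_colleagues(name: str) -> list[tuple[str, int, list[str]]]:
    """Rows of (name, score, skills), mirroring the Cypher query's RETURN clause."""
    company = works_for[name]       # anchor: me.name = {name}
    my_skills = has_skill[name]
    rows = []
    for colleague, their_company in works_for.items():
        # Same company; `me` never matches itself because the pattern would
        # need the same HAS_SKILL relationship twice.
        if colleague == name or their_company != company:
            continue
        shared = sorted(my_skills & has_skill.get(colleague, set()))
        if shared:  # the MATCH only finds colleagues with a common skill
            rows.append((colleague, len(shared), shared))
    # ORDER BY score DESC; ties broken by name here so the output is stable
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


print(similar_colleagues("Ian"))  # [('Jim', 2, ['Cypher', 'Java'])]
```

Mark shares no skill with Ian and Emil works elsewhere, so neither appears. The skill list is sorted here only for readability; `collect()` in Cypher does not guarantee an order.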
From User Story to Model and Query
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
? Which people, who work for the same company as me, have similar skills to me?
Person WORKS_FOR Company
Person HAS_SKILL Skill
(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
(company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
count(skill) AS score,
collect(skill.name) AS skills
ORDER BY score DESC
Anti-Pattern: Node represents multiple concepts
Person: name, age, position, company, department, project, skills
Normalize into separate concepts
Person: name, age
Company: name, number_of_employees
Skill: name
(:Person)-[:WORKS_FOR]->(:Company)
(:Person)-[:HAS_SKILL]->(:Skill)
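A sketch of the normalization step in plain Python, assuming hypothetical flat records shaped like the anti-pattern node. Each record is split into `Person`, `Company` and `Skill` nodes, shared values become one node each, and the former properties turn into `WORKS_FOR` and `HAS_SKILL` relationships.

```python
from typing import Any

NodeKey = tuple[str, str]           # (label, name)
Rel = tuple[NodeKey, str, NodeKey]  # (start, TYPE, end)

# Hypothetical flat records: one node carrying several concepts.
flat_people = [
    {"name": "Ann", "age": 34, "company": "ACME", "skills": ["Java", "Neo4j"]},
    {"name": "Bob", "age": 29, "company": "ACME", "skills": ["Neo4j"]},
]


def normalize(records: list[dict[str, Any]]) -> tuple[dict[NodeKey, dict[str, Any]], list[Rel]]:
    nodes: dict[NodeKey, dict[str, Any]] = {}
    rels: list[Rel] = []

    def node(label: str, name: str, **props: Any) -> NodeKey:
        # Reuse the node if it already exists, so ACME and Neo4j appear once.
        key = (label, name)
        nodes.setdefault(key, {"name": name}).update(props)
        return key

    for rec in records:
        person = node("Person", rec["name"], age=rec["age"])
        rels.append((person, "WORKS_FOR", node("Company", rec["company"])))
        for skill in rec["skills"]:
            rels.append((person, "HAS_SKILL", node("Skill", skill)))
    return nodes, rels


nodes, rels = normalize(flat_people)
```

Two people, one company and two skills give five nodes; the company and the `Neo4j` skill are each stored once and reached through relationships.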
Challenge: Property or Relationship?
๏ Can every property be replaced by a relationship?
• Hint: triple stores. Are they easy to use?
๏ Should every entity with the same property values be
connected?
Object Mapping
๏ Similar to how you would map objects to a relational
database using an ORM such as Hibernate
๏ Generally simpler and easier to reason about
๏ Examples
• Java: Spring Data Neo4j
• Ruby: Active Model
๏ Why Map?
• Do you use mapping because you are scared of SQL?
• Following DDD, could you write your repositories
directly against the graph API?
Relationships for querying
๏ As in other databases, one structure for different use cases
(OLTP and OLAP) doesn't work
• a graph lets you add more structures
๏ Relationships should be the primary means to access
nodes in the database
๏ Traversing relationships is cheap – that’s the whole
design goal of a graph database
๏ Use lookups only to find starting nodes for a query
Data modeling examples in the Neo4j Manual
Evolution: Relationship to Node
Before: Peter SENT_EMAIL Michael
After: an Email node linked to Peter (EMAIL_FROM), Michael (EMAIL_TO), Emil (EMAIL_CC) and Community (TAGGED) . . .
see Hyperedges
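A plain-Python sketch of this evolution, using the names from the slide. A single `SENT_EMAIL` relationship can only link one sender to one recipient; promoting the email to a node lets it carry any number of relationships. The function name `email_as_node`, the id `email-1` and the choice to point all relationships outward from the Email node are assumptions for illustration.

```python
Relationship = tuple[str, str, str]  # (start, TYPE, end)

# Before: one relationship can link exactly one sender to one recipient.
before: list[Relationship] = [("Peter", "SENT_EMAIL", "Michael")]


def email_as_node(email: str, sender: str, to: list[str], cc: list[str],
                  tags: list[str]) -> list[Relationship]:
    """Hypothetical: make the email a node so it can have many relationships."""
    rels = [(email, "EMAIL_FROM", sender)]
    rels += [(email, "EMAIL_TO", person) for person in to]
    rels += [(email, "EMAIL_CC", person) for person in cc]
    rels += [(email, "TAGGED", tag) for tag in tags]
    return rels


after = email_as_node("email-1", "Peter", to=["Michael"], cc=["Emil"], tags=["Community"])

# A question the single SENT_EMAIL relationship could not answer:
cc_of_email_1 = [end for start, rel, end in after if start == "email-1" and rel == "EMAIL_CC"]
print(cc_of_email_1)  # ['Emil']
```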
Combine multiple Domains in a Graph
๏ you start with a single domain
๏ add more connected domains as your system evolves
๏ more domains let you ask different queries
๏ one domain “indexes” the other
๏ Example: Facebook Graph Search
• social graph
• location graph
• activity graph
• favorite graph
• ...
Notes on the Graph Data Model
๏ Schema-free, but with constraints
๏ Model your graph with a whiteboard and a wise man
๏ Nodes are the main entities, but useless without connections
๏ Relationships are first-class citizens in the model and the database
๏ Normalize more than in a relational database
๏ Use meaningful relationship types, not generic ones like IS_
๏ Use in-graph structures to allow different access paths
๏ Evolve your graph to your needs: incremental growth
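One common in-graph structure is a linked list. A hypothetical sketch in plain Python: each user points to their latest post and each post points to the previous one via a `NEXT` relationship (`LATEST_POST` and `NEXT` are invented names), so "the latest N posts" is a short walk along relationships instead of a scan and sort.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    text: str
    next: Optional["Post"] = None  # NEXT relationship to the previous (older) post


class User:
    def __init__(self, name: str) -> None:
        self.name = name
        self.latest: Optional[Post] = None  # LATEST_POST relationship

    def post(self, text: str) -> Post:
        # The new post becomes the head and links to the previous head via NEXT.
        new_post = Post(text, self.latest)
        self.latest = new_post
        return new_post

    def recent(self, n: int) -> list[str]:
        # Walk at most n NEXT relationships from the head of the list.
        out: list[str] = []
        current = self.latest
        while current is not None and len(out) < n:
            out.append(current.text)
            current = current.next
        return out


ann = User("Ann")
for text in ["first", "second", "third"]:
    ann.post(text)
print(ann.recent(2))  # ['third', 'second']
```

The domain relationships stay untouched; the list is an extra access path added for one use case.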
Need to model the relationship
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri, language_code
What if the cardinality changes?
Language: language_code, language_name, word_count, country_code
Country: country_code, country_name, flag_uri
Or we go many-to-many?
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
LanguageCountry: language_code, country_code
Or we want to qualify the relationship?
Language: language_code, language_name, word_count
Country: country_code, country_name, flag_uri
LanguageCountry: language_code, country_code, primary
What’s different?
๏ Implementation of maintaining relationships is left up
to the database
๏ Artificial keys disappear or are unnecessary
๏ Relationships get an explicit name
• can be navigated in both directions
Keep on adding relationships
Language: name, word_count
Country: name, flag_uri
POPULATION_SPEAKS (population_fraction)
SIMILAR_TO, ADJACENT_TO
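A plain-Python sketch of the graph version of the many-to-many case. The `LanguageCountry` join table and its qualifying column collapse into one relationship type whose property qualifies it, and the same relationships are navigated in both directions. The country and language names are fictional, the fractions are made up, and the Country-to-Language direction is an assumption.

```python
from typing import NamedTuple


class PopulationSpeaks(NamedTuple):
    """One (:Country)-[:POPULATION_SPEAKS]->(:Language) relationship."""
    country: str
    language: str
    population_fraction: float  # the qualifier lives on the relationship


# Hypothetical, fictional data for illustration only.
speaks = [
    PopulationSpeaks("Freedonia", "Freedonian", 0.8),
    PopulationSpeaks("Freedonia", "Sylvanian", 0.2),
    PopulationSpeaks("Sylvania", "Sylvanian", 0.9),
]


def languages_of(country: str, min_fraction: float = 0.0) -> list[str]:
    # Navigate from Country to Language, filtering on the relationship property...
    return sorted(r.language for r in speaks
                  if r.country == country and r.population_fraction >= min_fraction)


def countries_speaking(language: str) -> list[str]:
    # ...or follow the same relationships the other way, with no join table.
    return sorted(r.country for r in speaks if r.language == language)


print(languages_of("Freedonia", min_fraction=0.5))  # ['Freedonian']
print(countries_speaking("Sylvanian"))              # ['Freedonia', 'Sylvania']
```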
[A] ACL from Hell
๏ Customer:
• leading consumer utility company with tons and
tons of users
๏ Goal:
• comprehensive access control administration
for customers
๏ Benefits:
• Flexible and dynamic architecture
• Exceptional performance
• Extensible data model supports new
applications and features
• Low cost
• A reliable access control administration system for
5 million customers, subscriptions and agreements
• Complex dependencies between groups, companies,
individuals, accounts, products, subscriptions, services and
agreements
• Broad and deep graphs (master customers with 1000s of
customers, subscriptions & agreements)
[Figure: example access-control graph with a person (Andreas), an account, subscriptions (sports, local), services (NFL, Ravens), an agreement (ultimate), a group (graphistas), a promotion (fall) and a company (Neo Technology), connected by relationships such as owns, subscribes to, has plan, includes, provides, member of, offered, discounts, works with and gets discount on]
[B] Timely Recommendations
๏ Customer:
• a professional social network
• 35 million users, adding 30,000+ each day
๏ Goal: up-to-date recommendations
• Scalable solution with real-time end-user
experience
• Low maintenance and reliable architecture
• 8-week implementation
๏ Problem:
• Real-time recommendations are imperative to attract new
users and maintain positive user retention
• Clustered MySQL solution not scalable or fast enough
to support real-time requirements
๏ Upgrade from running a batch job
• initial hour-long batch job
• but then success happened, and it became a day
• then two days
๏ With Neo4j: real-time recommendations
[Figure: example social graph of people (Andreas, Allison, Tobias, Peter, Emil, Stephen, Delia, Tiberius), each with a job property, connected by knows relationships]
[C] Collaboration on Global Scale
๏ Customer: a worldwide software leader
• highly collaborative end-users
๏ Goal: offer an online platform for global collaboration
• Highly flexible data analysis
• Sub-second results for large, densely-connected data
• User experience - competitive advantage
• Massive amounts of data tied to members, user
groups, member content, etc. all interconnected
• Infer collaborative relationships through user-
generated content
• Worldwide availability (Asia, North America, Europe)
Really, once you start
thinking in graphs,
it's hard to stop
Recommendations, MDM, Systems Management, Geospatial, Social computing, Business intelligence, Biotechnology, Making sense of all that data, your brain, access control, linguistics, catalogs, genealogy, routing, compensation, market vectors
What will you build?
A graph database...
NO: not for charts & diagrams, or vector artwork
YES: for storing data that is structured as a graph
remember linked lists, trees?
graphs are the general-purpose data structure
“A relational database may tell you the average age of everyone
in this place,
but a graph database will tell you who is most likely to buy you a
beer.”
Why Data Modeling
๏ What is modeling?
๏ Aren't we schema free?
๏ How does it work in a graph?
๏ Where should modeling happen: in the DB or in the application?
You traverse the graph
// look up the starting point in an index
START n=node:People(name = 'Andreas')
// then traverse to find results
START me=node:People(name = 'Andreas')
MATCH (me)-[:FRIEND]-(friend)-[:FRIEND]-(friend2)
RETURN friend2
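The query above follows the advice from earlier: use an index lookup only to find the starting node, then traverse relationships. A hypothetical plain-Python sketch of the same idea (names partly borrowed from the slides, friendships invented) uses a dictionary as the index and adjacency sets as the relationships. Unlike the Cypher query, which returns one row per path, it returns distinct names. Current Cypher no longer has START; the anchor is written as a MATCH on a label and property instead.

```python
from collections import defaultdict

# Hypothetical data: node id -> name, plus undirected FRIEND relationships.
names = {1: "Andreas", 2: "Peter", 3: "Emil", 4: "Allison", 5: "Tobias"}
friend_rels = [(1, 2), (2, 3), (1, 4), (4, 3), (4, 5)]

# Index: used only once, to find the starting node.
name_index = {name: node_id for node_id, name in names.items()}

# Adjacency sets: every hop after the start is a cheap neighbour lookup.
adjacency: dict[int, set[int]] = defaultdict(set)
for a, b in friend_rels:
    adjacency[a].add(b)
    adjacency[b].add(a)


def friends_of_friends(name: str) -> list[str]:
    me = name_index[name]                  # START me=node:People(name = ...)
    found = set()
    for friend in adjacency[me]:           # (me)-[:FRIEND]-(friend)
        for friend2 in adjacency[friend]:  # (friend)-[:FRIEND]-(friend2)
            # In this simple graph, reaching `me` again would reuse the same
            # relationship, which a single Cypher MATCH does not allow.
            if friend2 != me:
                found.add(names[friend2])
    return sorted(found)


print(friends_of_friends("Andreas"))  # ['Emil', 'Tobias']
```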