SlideShare a Scribd company logo
1 of 27
Data Modeling for Graph Data
Science
2
From Previous Session:
Developing the initial graph data model
1. Define high-level domain requirements
2. Create sample data for modeling purposes
3. Define the questions for the domain
4. Identify entities
5. Identify connections between entities
6. Test the model against the questions
7. Test scalability
What could a generic
data model look like for
Game of Thrones?
3
4
Game Of Thrones Data Model
Battle
King
Person
Knight
House
● Based on Network of Thrones, A song of
Math and Westeros
● Information from 5 books. Represents
characters, houses they belong to,
interactions between them and battles
etc..
5
Let’s look at some queries..
Find the Houses attacking or defending in “Battle of the Blackwater Bay”
and return names of Kings and Knights in those houses
MATCH (k) -
[:ATTACKER_KING |:DEFENDER_KING |:ATTACKER_COMMANDER
|:DEFENDER_COMMANDER]
-> (b:Battle {name: 'Battle of the Blackwater'})
WHERE 'King' in labels(k) or 'Knight' in labels(k)
MATCH (h:House) - [:ATTACKER |:DEFENDER] -> (b:Battle
{name: 'Battle of the Blackwater'})
RETURN (k) -[:BELONGS_TO] - (h)
6
Find all people who belonged to “House Lannister” and died between 300
and 600 and whom they interacted with from “House Martell”
Let’s look at some queries..
MATCH (p1:Person) - [:BELONGS_TO] ->
(h:House {name: 'Lannister'})
WHERE 300 <=p1.death_year <=600
MATCH p = (p1) -[:INTERACTS] - (:Person) -
[:BELONGS_TO] - (:House {name : 'Martell'})
RETURN p
7
Cluster or group battles based on Kings and Knights participated in them?
Let’s look at some queries..
…. this isn’t easy to answer with Cypher, but we’ve got Graph
Algorithms (Community detection) for that!
Objective: Understand data modeling for graph data
science.
● Considerations prior to schema design
● Monopartite, bipartite & multipartite graphs
8
In This Session..
The fundamentals don’t change
The white board model is
still your starting point
9
Things that break Neo4j still
break Neo4j for data
science
- Putting tons of attributes on
nodes and relationships
- Making all of your attributes into
nodes
- Labels that aren’t sufficiently
specific
- Super densely connected nodes
• Define your questions and objectives up front
• Are you running algorithms to answer a specific question? or
• Are you generating features for an ML pipeline?
10
Considerations prior to designing schema
• Which algorithms do you want to run?
• Different algorithms require differently shaped graphs
• How often do you want to run your workload
• Is the graph for one offs and experimentation?
• Is this part of a pipeline?
• Do you need to do data refreshes? Incremental updates?
• Are there any time constraints?
11
What is your objective?
Workload
Type
Performance
Expectation
Data Data Model
Experimental
Mixed Variable, no strict
requirements
The more the
merrier
Domain relevant, easy to
refactor
Model building
Primarily
Algorithms
Variable, usually
<24 hours
Reproducible,
timestamps,
test/train splits
Follow Best practices,
Easy for running algos
and updating data
Production ML
Primarily
Algorithms
< a few hours Only the data
relevant to your
workload
Mono- or bipartite
graphs, seeding
Production
Analytics
Mixed < 24 hours Only the data
relevant to your
workload
Optimized for algorithms
(mono- or bipartite
graphs, seeding) + key
queries
OBJECTIVES
CONSIDERATIONS
12
What are Graph Algorithms?
• Graph Algorithms (#Algorithms) solve a wide variety of problems
(#Problems) and are applicable in many areas of science and technology
(#Applications)
• Algorithms require differently shaped graphs. For example, spanning trees,
shortest path and community detection algorithms work on homogeneous
networks represented by monopartite graphs
• Similarity algorithms can work on heterogeneous networks represented
by bipartite or multipartite graphs with some strong assumptions
Real world networks are heterogeneous!
13
Monopartite, bipartite & multipartite
graphs
Most networks contain data with
multiple node and relationship
types. (multipartite)
A bipartite graph contains
nodes that can be divided into
two sets, such that
relationships only exists
between sets but not within
each set.
Node Similarity relies on this
type of graph.
A monopartite graph
contains one node label and
relationship. Most algorithms
rely on this type of graph.
14
Derived relationships and k-partite graphs
• One class of algorithm --
community detection is made to
answer this question, but it is
intended to work on monopartite
graphs
• Can we use cypher queries to
construct a subgraph with only
battles in it? That has information
about Kings/Knights that
participated in the battles?
Battle
King
Person
Knight
House
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
15
Derived relationships and k-partite graphs
• One class of algorithm --
community detection is made to
answer this question, but it is
intended to work on monopartite
graphs
• Can we use cypher queries to
construct a subgraph with only
battles in it? That has information
about Kings/Knights that
participated in the battles?
Battle
King
Person
Knight
House
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
16
Derived relationships and k-partite graphs
Monopartite = only 1 node label
- Connect battles to each
other
- Based on how many Knights
and Kings fought in the same
battle
Battle
King
Person
Knight
House
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
17
Derived relationships and k-partite graphs
MATCH
(b1:Battle)<-[]-(k)-[]->(b2:Battle)
WHERE k:King or k:Knight
RETURN b1.name AS source, b2.name
AS target, count(distinct k) AS
num_participants
Battle
King
Person
Knight
House
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
This gives you:
- for every pair of battles,
- how many kings or knights
fought in both
18
Derived relationships and k-partite graphs
MATCH
(b1:Battle)<-[]-(k)-[]->(b2:Battle)
WHERE k:King or k:Knight
WITH b1, b2 count(distinct k) AS
num_participants
MERGE (b1)-[r:shared_fighters
{count:num_participants}]->(b2)
RETURN b1,r,b2
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
We can add the new derived relationship to the
graph and extract the relevant subgraph
19
Derived relationships and k-partite graphs
Let’s revisit this question: “Cluster or group battles based on
Kings and Knights participated in them”
Now that we have the graph we can run
our algorithms on ….
we can use one of the Community
detection algorithms to cluster
battles.
Algorithms learn based on the structure of a graph, and make
assumptions about graph topology:
• Any variation in the distribution of relationships for nodes can
tell us some underlying property (how important is a node?
what community is it in?)
• It’s not possible to iterate heuristics over directed graphs
• There’s no external information relevant to the algorithm (like
node labels or relationship type)
20
Why can’t I run my algorithm on a
multipartite graph?
huh?
What if I try to an algorithm on this graph?
- How many relationships does each
person have?
- How many relationships does each book
have?
- What is the direction of the relationships
in this graph?
- Can I reach a person node from another
person node, following the directed
relationships?
21
Why can’t I run my algorithm on a
multipartite graph?
What if I try to an algorithm on this graph?
- How many relationships does each
person have? 1 or 2
- How many relationships does each book
have? 5 or 6
- What is the direction of the relationships
in this graph? Person-[:APPEARED_IN]->Book
- Can I reach a person node from another
person node, following the directed
relationships? No!
22
Why can’t I run my algorithm on a
multipartite graph?
What if I try to an algorithm on this graph?
- What would an algorithm that used the
number of edges each node has to
calculate centrality conclude?
- What would an algorithm that followed
directed relationships to find
communities conclude?
23
Why can’t I run my algorithm on a
multipartite graph?
What if I try to an algorithm on this graph?
- What would an algorithm that used the
number of edges each node has to
calculate centrality conclude?
Books are more important than people!
- What would an algorithm that followed
directed relationships to find
communities conclude?
There seven communities?
24
Why can’t I run my algorithm on a
multipartite graph?
If you want to find out:
- What person is the most important
- How many communities of people are
there, across all the books?
You need to reshape your graph!
25
Why can’t I run my algorithm on a
multipartite graph?
• Neo4j automates data
transformations
• Fast iterations & layering
• Production ready features,
parallelization & enterprise
support
Neo4j’s Graph Catalog solves this problem!   
We have procedures to let you reshape and subset your
transactional graph so you have the right data in the right shape.
Mutable In-Memory Workspace
Computational Graph
Native Graph Store
27
Next Session..
Deep dive into Graph Data Science Library
● Graph Catalog
● Graph Algorithms

More Related Content

More from Neo4j

BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosNeo4j
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Neo4j
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j
 
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...Neo4j
 
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AIDeloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AINeo4j
 
Ingka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by DesignIngka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by DesignNeo4j
 
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24Neo4j
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxNeo4j
 
Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...Neo4j
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 

More from Neo4j (20)

BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with Graph
 
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
SWIFT: Maintaining Critical Standards in the Financial Services Industry with...
 
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AIDeloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
Deloitte & Red Cross: Talk to your data with Knowledge-enriched Generative AI
 
Ingka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by DesignIngka Digital: Linked Metadata by Design
Ingka Digital: Linked Metadata by Design
 
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
Discover Neo4j Aura_ The Future of Graph Database-as-a-Service Workshop_3.13.24
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
 
Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...Identification of insulin-resistance genes with Knowledge Graphs topology and...
Identification of insulin-resistance genes with Knowledge Graphs topology and...
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 

Recently uploaded

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Neo4j Graph Data Science Training - June 9 & 10 - Slides #4

  • 1. Data Modeling for Graph Data Science
  • 2. 2 From Previous Session: Developing the initial graph data model 1. Define high-level domain requirements 2. Create sample data for modeling purposes 3. Define the questions for the domain 4. Identify entities 5. Identify connections between entities 6. Test the model against the questions 7. Test scalability
  • 3. What could a generic data model look like for Game of Thrones? 3
  • 4. 4 Game Of Thrones Data Model Battle King Person Knight House ● Based on Network of Thrones, A song of Math and Westeros ● Information from 5 books. Represents characters, houses they belong to, interactions between them and battles etc..
  • 5. 5 Let’s look at some queries.. Find the Houses attacking or defending in “Battle of the Blackwater Bay” and return names of Kings and Knights in those houses MATCH (k) - [:ATTACKER_KING |:DEFENDER_KING |:ATTACKER_COMMANDER |:DEFENDER_COMMANDER] -> (b:Battle {name: 'Battle of the Blackwater'}) WHERE 'King' in labels(k) or 'Knight' in labels(k) MATCH (h:House) - [:ATTACKER |:DEFENDER] -> (b:Battle {name: 'Battle of the Blackwater'}) RETURN (k) -[:BELONGS_TO] - (h)
  • 6. 6 Find all people who belonged to “House Lannister” and died between 300 and 600 and whom they interacted with from “House Martell” Let’s look at some queries.. MATCH (p1:Person) - [:BELONGS_TO] -> (h:House {name: 'Lannister'}) WHERE 300 <=p1.death_year <=600 MATCH p = (p1) -[:INTERACTS] - (:Person) - [:BELONGS_TO] - (:House {name : 'Martell'}) RETURN p
  • 7. 7 Cluster or group battles based on Kings and Knights participated in them? Let’s look at some queries.. …. this isn’t easy to answer with Cypher, but we’ve got Graph Algorithms (Community detection) for that!
  • 8. Objective: Understand data modeling for graph data science. ● Considerations prior to schema design ● Monopartite, bipartite & multipartite graphs 8 In This Session..
  • 9. The fundamentals don’t change The white board model is still your starting point 9 Things that break Neo4j still break Neo4j for data science - Putting tons of attributes on nodes and relationships - Making all of your attributes into nodes - Labels that aren’t sufficiently specific - Super densely connected nodes
  • 10. • Define your questions and objectives up front • Are you running algorithms to answer a specific question? or • Are you generating features for an ML pipeline? 10 Considerations prior to designing schema • Which algorithms do you want to run? • Different algorithms require differently shaped graphs • How often do you want to run your workload • Is the graph for one offs and experimentation? • Is this part of a pipeline? • Do you need to do data refreshes? Incremental updates? • Are there any time constraints?
  • 11. 11 What is your objective? Workload Type Performance Expectation Data Data Model Experimental Mixed Variable, no strict requirements The more the merrier Domain relevant, easy to refactor Model building Primarily Algorithms Variable, usually <24 hours Reproducible, timestamps, test/train splits Follow Best practices, Easy for running algos and updating data Production ML Primarily Algorithms < a few hours Only the data relevant to your workload Mono- or bipartite graphs, seeding Production Analytics Mixed < 24 hours Only the data relevant to your workload Optimized for algorithms (mono- or bipartite graphs, seeding) + key queries OBJECTIVES CONSIDERATIONS
  • 12. 12 What are Graph Algorithms? • Graph Algorithms (#Algorithms) solve a wide variety of problems (#Problems) and are applicable in many areas of science and technology (#Applications) • Algorithms require differently shaped graphs. For example, spanning trees, shortest path and community detection algorithms work on homogeneous networks represented by monopartite graphs • Similarity algorithms can work on heterogeneous networks represented by bipartite or multipartite graphs with some strong assumptions Real world networks are heterogeneous!
  • 13. 13 Monopartite, bipartite & multipartite graphs Most networks contain data with multiple node and relationship types. (multipartite) A bipartite graph contains nodes that can be divided into two sets, such that relationships only exists between sets but not within each set. Node Similarity relies on this type of graph. A monopartite graph contains one node label and relationship. Most algorithms rely on this type of graph.
  • 14. 14 Derived relationships and k-partite graphs • One class of algorithm -- community detection is made to answer this question, but it is intended to work on monopartite graphs • Can we use cypher queries to construct a subgraph with only battles in it? That has information about Kings/Knights that participated in the battles? Battle King Person Knight House Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them”
  • 15. 15 Derived relationships and k-partite graphs • One class of algorithm -- community detection is made to answer this question, but it is intended to work on monopartite graphs • Can we use cypher queries to construct a subgraph with only battles in it? That has information about Kings/Knights that participated in the battles? Battle King Person Knight House Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them”
  • 16. 16 Derived relationships and k-partite graphs Monopartite = only 1 node label - Connect battles to each other - Based on how many Knights and Kings fought in the same battle Battle King Person Knight House Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them”
  • 17. 17 Derived relationships and k-partite graphs MATCH (b1:Battle)<-[]-(k)-[]->(b2:Battle) WHERE k:King or k:Knight RETURN b1.name AS source, b2.name AS target, count(distinct k) AS num_participants Battle King Person Knight House Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them” This gives you: - for every pair of battles, - how many kings or knights fought in both
  • 18. 18 Derived relationships and k-partite graphs MATCH (b1:Battle)<-[]-(k)-[]->(b2:Battle) WHERE k:King or k:Knight WITH b1, b2 count(distinct k) AS num_participants MERGE (b1)-[r:shared_fighters {count:num_participants}]->(b2) RETURN b1,r,b2 Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them” We can add the new derived relationship to the graph and extract the relevant subgraph
  • 19. 19 Derived relationships and k-partite graphs Let’s revisit this question: “Cluster or group battles based on Kings and Knights participated in them” Now that we have the graph we can run our algorithms on …. we can use one of the Community detection algorithms to cluster battles.
  • 20. Algorithms learn based on the structure of a graph, and make assumptions about graph topology: • Any variation in the distribution of relationships for nodes can tell us some underlying property (how important is a node? what community is it in?) • It’s not possible to iterate heuristics over directed graphs • There’s no external information relevant to the algorithm (like node labels or relationship type) 20 Why can’t I run my algorithm on a multipartite graph? huh?
  • 21. What if I try to an algorithm on this graph? - How many relationships does each person have? - How many relationships does each book have? - What is the direction of the relationships in this graph? - Can I reach a person node from another person node, following the directed relationships? 21 Why can’t I run my algorithm on a multipartite graph?
  • 22. What if I try to an algorithm on this graph? - How many relationships does each person have? 1 or 2 - How many relationships does each book have? 5 or 6 - What is the direction of the relationships in this graph? Person-[:APPEARED_IN]->Book - Can I reach a person node from another person node, following the directed relationships? No! 22 Why can’t I run my algorithm on a multipartite graph?
  • 23. What if I try to an algorithm on this graph? - What would an algorithm that used the number of edges each node has to calculate centrality conclude? - What would an algorithm that followed directed relationships to find communities conclude? 23 Why can’t I run my algorithm on a multipartite graph?
  • 24. What if I try to an algorithm on this graph? - What would an algorithm that used the number of edges each node has to calculate centrality conclude? Books are more important than people! - What would an algorithm that followed directed relationships to find communities conclude? There seven communities? 24 Why can’t I run my algorithm on a multipartite graph?
  • 25. If you want to find out: - What person is the most important - How many communities of people are there, across all the books? You need to reshape your graph! 25 Why can’t I run my algorithm on a multipartite graph?
  • 26. • Neo4j automates data transformations • Fast iterations & layering • Production ready features, parallelization & enterprise support Neo4j’s Graph Catalog solves this problem!    We have procedures to let you reshape and subset your transactional graph so you have the right data in the right shape. Mutable In-Memory Workspace Computational Graph Native Graph Store
  • 27. 27 Next Session.. Deep dive into Graph Data Science Library ● Graph Catalog ● Graph Algorithms