Workshop Tel Aviv - Graph Data Science
1. Neo4j, Inc. All rights reserved 2021
What Graph Technology can do
with your Data…
GDS Workshop Tel Aviv
Kristof Neys,
Graph Data Science Specialist - Field Engineering EMEA/APAC
Jul 2022
Graph Data Science Workshop
● Get your Neo4j Engine up & running and register at:
https://neo4j.com/sandbox/
● Get the script to code along (copy and paste):
https://github.com/Kristof-Neys/Neo4j_demos
We will cover…
1. Knowledge Graphs
2. Why Graph Data Science
3. Neo4j Graph Data Science
4. Algo in Action
5. Graph Embeddings
6. Graph Machine Learning
7. Access & Integration
8. Use cases
9. Demo Time!
It’s all about Knowledge Graphs
Driving Intelligence into Data with Knowledge Graphs
● Data Graph: Dynamic Context
● Knowledge Graph: Deep Dynamic Context
From Data points to Knowledge Graph
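[Figure: two Person nodes, Dan (name: “Dan”, born: May 29, 1978, twitter: “@dan”) and Ann (name: “Ann”, born: Dec 5, 1979), connected by two LOVES relationships and a LIVES_WITH relationship, plus a Car node (brand: “Volvo”, model: “V90”) connected to them through DRIVES and OWNS relationships; one relationship carries the property since: Jan 10, 2021.]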
Graphs….Grow!
[Figure: User nodes linked to Website nodes by :VISITED and to IPLocation nodes by :USED; a new :SAME_AS relationship links two User nodes.]
Graphs allow you to make implicit relationships… explicit.
Graphs….Grow!
[Figure: the same User, Website and IPLocation graph, where the User nodes linked by :SAME_AS are grouped under PersonId: 1 and another User sits under PersonId: 2.]
…and can then group similar nodes… and create a new graph from the explicit relationships…
A graph grows organically, gaining insights and enriching your data.
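The grouping step can be pictured as a connected-components pass over the :SAME_AS relationships. A minimal plain-Python sketch, assuming that users linked directly or through a chain of :SAME_AS relationships belong to one person; the `group_same_as` helper and the user ids are hypothetical, and the idea is the same one behind the Weakly Connected Components algorithm rather than a specific Neo4j feature.

```python
def group_same_as(users, same_as_pairs):
    """Give every User a PersonId: users linked directly or through a chain of
    SAME_AS relationships share one id (union-find, i.e. connected components)."""
    parent = {u: u for u in users}

    def find(u):
        # Follow parent links to the group's root, halving the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    # Number groups 1, 2, ... in order of first appearance in `users`.
    person_ids, result = {}, {}
    for u in users:
        root = find(u)
        if root not in person_ids:
            person_ids[root] = len(person_ids) + 1
        result[u] = person_ids[root]
    return result


# Hypothetical users: u1, u2 and u3 are chained by SAME_AS, u4 stands alone.
print(group_same_as(["u1", "u2", "u3", "u4"], [("u1", "u2"), ("u2", "u3")]))
# {'u1': 1, 'u2': 1, 'u3': 1, 'u4': 2}
```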
1000x Performance @Unlimited Scale
[Figure: response time plotted against connectedness and size of data set.]
● Relational and other NoSQL databases: 0 to 2 hops, 0 to 3 degrees of separation, thousands of connections
● Neo4j: tens to hundreds of hops, thousands of degrees, billions of connections
● 1000x advantage at scale: “Minutes to milliseconds”
Knowledge Graphs in Credit Risk Analysis
We eat, sleep, drink… Knowledge Graphs…
And… we even published a booklet on it… get your free copy.
Why Graph Data Science?
Where DeepMind goes...
“We argue that combinatorial generalisation must be a top priority for AI to achieve human-like abilities, and that structured representations [i.e. Graphs] and computations are key to realizing this objective”
Everything is a Graph...
Graph Neural Networks are HOT!
And industry is picking it up too….
Why & how Neo4j Graph Data Science?
Neo4j’s Graph Data Science Framework
● Neo4j Graph Data Science Library: Scalable Graph Algorithms & Analytics Workspace
● Neo4j Database: Native Graph Creation & Persistence
● Neo4j Bloom: Visual Graph Exploration & Prototyping
Graphs & Data Science
● Knowledge Graphs: find the patterns you’re looking for in connected data.
● Graph Algorithms: use unsupervised machine learning techniques to identify associations, anomalies, and trends.
● Graph Native Machine Learning: use embeddings to learn the features in your graph that you don’t even know are important yet, and train in-graph supervised ML models to predict links, labels, and missing data.
60+ Graph Data Science Techniques in Neo4j
Pathfinding & Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• A* Shortest Path
• Yen’s K Shortest Path
• Minimum Weight Spanning Tree
• K-Spanning Tree (MST)
• Random Walk
• Breadth & Depth First Search
Centrality & Importance
• Degree Centrality
• Closeness Centrality
• Harmonic Centrality
• Betweenness Centrality & Approx.
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Hyperlink Induced Topic Search (HITS)
• Influence Maximization (Greedy, CELF)
Community Detection
• Triangle Count
• Local Clustering Coefficient
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• K-1 Coloring
• Modularity Optimization
• Speaker Listener Label Propagation
Supervised Machine Learning
• Node Classification
• Link Prediction
… and more!
Heuristic Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocation
• Same Community
• Total Neighbors
Similarity
• Node Similarity
• K-Nearest Neighbors (KNN)
• Jaccard Similarity
• Cosine Similarity
• Pearson Similarity
• Euclidean Distance
• Approximate Nearest Neighbors (ANN)
Graph Embeddings
• Node2Vec
• FastRP
• FastRPExtended
• GraphSAGE
Auxiliary Procedures
• Synthetic Graph Generation
• Scale Properties
• Collapse Paths
• One Hot Encoding
• Split Relationships
• Graph Export
• Pregel API (write your own algos)
Before we go any further - let’s
Quiz!
Which is the most important station…?
Betweenness Centrality (BC)
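● The BC score of a node is determined by the number of shortest paths between other pairs of nodes that run through it (when a pair has several shortest paths, each one counts fractionally).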
● The more shortest paths that go through a node, the higher the BC score
of that node.
● Betweenness centrality: “information flow”, “connector points”, “bridge”...
○ Estate agents
○ ...
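Brandes' algorithm computes these scores with one breadth-first search per node, followed by a backward pass that accumulates each node's share of the shortest paths. A textbook sketch for unweighted, undirected graphs, assuming the adjacency dict lists every edge in both directions; it returns unnormalised scores with each unordered pair counted once, and it is not the GDS implementation. The station names in the usage lines are hypothetical.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs.
    adj: {node: list of neighbours}, every edge listed in both directions."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and remember predecessors.
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair (s, t) was counted once from each end.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical network: branches A and B meet at Hub, which continues to C and D.
stations = {"A": ["Hub"], "B": ["Hub"], "Hub": ["A", "B", "C"],
            "C": ["Hub", "D"], "D": ["C"]}
print(betweenness_centrality(stations))
# {'A': 0.0, 'B': 0.0, 'Hub': 5.0, 'C': 3.0, 'D': 0.0}
```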
Example: Betweenness centrality
The need for speed…
● NetworkX vs Neo4j
● Graph:
○ 42k nodes / 126k edges
● Time to compute….
○ Neo4j: 10 secs
○ NetworkX: 15,284 secs
(that is 4 hours 15min...)
● How? Multi-Source Breadth-First Search
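The idea behind multi-source BFS is to run many breadth-first searches at once: every node carries a bitmask of the sources that have already reached it, so one pass over a node's neighbours advances all of those searches together. A minimal sketch using Python integers as bitsets; it only computes hop distances and shows the idea, not how GDS implements it. The `multi_source_bfs` name is hypothetical.

```python
def multi_source_bfs(adj, sources):
    """Hop distances from each source to every node it can reach, computed in
    one shared traversal. Bit i of every mask stands for sources[i]."""
    seen = {v: 0 for v in adj}    # sources that have already reached v
    visit = {v: 0 for v in adj}   # sources whose frontier currently sits at v
    dist = {s: {s: 0} for s in sources}
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        visit[s] |= 1 << i

    level = 0
    while any(visit.values()):
        level += 1
        visit_next = {v: 0 for v in adj}
        for v, frontier in visit.items():
            if not frontier:
                continue
            for w in adj[v]:
                # Sources reaching w for the first time, handled in one step.
                new = frontier & ~seen[w]
                if new:
                    visit_next[w] |= new
                    seen[w] |= new
                    for i, s in enumerate(sources):
                        if new >> i & 1:
                            dist[s][w] = level
        visit = visit_next
    return dist


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(multi_source_bfs(path, ["a", "d"]))
```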
How can they be used?
● Stand-alone solution: find significant patterns and optimal structures; use community detection and similarity scores for recommendations
● Machine learning pipeline: use the measures as features to train an ML model

1st node | 2nd node | Common neighbors | Preferential attachment | Label
1 | 2 | 4 | 15 | 1
3 | 4 | 7 | 12 | 1
5 | 6 | 1 | 1 | 0
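Feature columns like the two in this table come from simple neighbourhood heuristics, the same ones listed under Heuristic Link Prediction. A plain-Python sketch, assuming an undirected graph without self-loops stored as neighbour sets; `link_features` and `feature_rows` are hypothetical helpers, and the labels in the usage line are made up (in practice they would record whether the relationship exists in a held-out split).

```python
import math


def link_features(adj, u, v):
    """Heuristic link-prediction scores for two distinct nodes u and v.
    adj maps each node to the set of its neighbours (undirected graph)."""
    nu, nv = adj[u], adj[v]
    common = nu & nv
    return {
        "common_neighbors": len(common),
        "preferential_attachment": len(nu) * len(nv),
        "total_neighbors": len(nu | nv),
        # Shared neighbours with few connections of their own weigh more.
        # Each common neighbour has degree >= 2, so log(degree) > 0.
        "adamic_adar": sum(1 / math.log(len(adj[z])) for z in common),
        "resource_allocation": sum(1 / len(adj[z]) for z in common),
    }


def feature_rows(adj, labelled_pairs):
    """Rows of (1st node, 2nd node, common neighbours, preferential attachment, label)."""
    rows = []
    for u, v, label in labelled_pairs:
        f = link_features(adj, u, v)
        rows.append((u, v, f["common_neighbors"], f["preferential_attachment"], label))
    return rows


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(feature_rows(adj, [(1, 4, 0), (1, 2, 1)]))  # [(1, 4, 1, 2, 0), (1, 2, 1, 4, 1)]
```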
Graph Projection
- Focus time…
Monopartite graph
● is a single set of nodes that are interconnected
● is what you need for the majority of the graph algorithms
Bipartite graph
● two sets of nodes, with relationships running only between the two sets and never within a set
● great as input for algorithms (such as node similarity) that are used to
create a monopartite graph
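Turning a bipartite graph into a monopartite one can be pictured as counting shared neighbours: two nodes on one side become connected, weighted by how many nodes on the other side they both link to. A sketch of that idea only; the `project_bipartite` helper and the data are hypothetical, and Node Similarity scores such pairs with the Jaccard metric rather than with a raw count.

```python
from itertools import combinations


def project_bipartite(edges):
    """edges: (left, right) pairs, e.g. (person, item) for a LIKES relationship.
    Returns {(a, b): shared_count} for each pair of left-hand nodes (a < b)
    that share at least one right-hand neighbour."""
    lefts_by_right = {}
    for left, right in edges:
        lefts_by_right.setdefault(right, set()).add(left)
    weights = {}
    for lefts in lefts_by_right.values():
        for a, b in combinations(sorted(lefts), 2):
            weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights


likes = [("A", "x"), ("B", "x"), ("A", "y"), ("B", "y"), ("C", "y")]
print(project_bipartite(likes))  # {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 1}
```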
The Graph Catalog – from Data Model to Predictive Model
• Neo4j automates data transformations
• Experiment with different data sets, data models
• Fast iterations & layering
• Production-ready features, parallelization & enterprise support
• Ability to persist and version data
A graph-specific analytics workspace that’s mutable – integrated with a native-graph database
[Figure: Native Graph Store, Mutable In-Memory Workspace, Computational Graph]
Our Implementations are Fast - and Getting Faster
Datasets:
● LDBC100 (LDBC Social Network, Scale Factor 100): 300M+ nodes, 2B+ relationships
● LDBC100PKP (LDBC Social Network, Scale Factor 100): 500k nodes, 46M+ relationships
Hardware: AWS EC2 R5D16XLarge, Intel Xeon Platinum 8000 (Skylake-SP or Cascade Lake), 64 logical cores, 512GB memory, 600GB NVMe-SSD storage

Algorithm | Runtime
Node Similarity | 20min
Betweenness Centrality | 10min
Node2Vec | 2.8min
Label Propagation | 46sec
Weakly Connected Components | 36sec
Local Clustering Coefficient | 4.76min
FastRP | 1.33min
PageRank | 53sec
Louvain | 14.66min
Algo in action:
Node Similarity
Node similarity
● The Node similarity algorithm compares nodes based on the nodes they are connected to: two nodes are similar if they share the same neighbours.
● The algorithm takes in a bipartite, connected graph of two disjoint node
sets, for instance ‘Person’ and ‘Food_Drink’
● The GDS Node similarity algorithm is based on the Jaccard metric. For two nodes, A and B, with neighbour sets N(A) and N(B), we compute
$$J(A,B) = \frac{|N(A) \cap N(B)|}{|N(A) \cup N(B)|}$$
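The Jaccard score itself takes only a few lines. A minimal plain-Python sketch, assuming each node's neighbours are available as a set; the `jaccard` helper is hypothetical and the likes in the usage line are made up, not the workshop's data.

```python
def jaccard(neighbours_a, neighbours_b):
    """|A ∩ B| / |A ∪ B| for two neighbour sets (0.0 when both are empty)."""
    union = neighbours_a | neighbours_b
    if not union:
        return 0.0
    return len(neighbours_a & neighbours_b) / len(union)


# Hypothetical likes: one shared item out of three distinct items.
print(jaccard({"Sushi", "Frites"}, {"Sushi", "Mussels"}))  # 0.3333333333333333
```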
Node similarity - suppose...
Project the Graph…
CALL gds.graph.project(
  'myGraph',
  ['Person', 'Food_Drink'],
  {
    LIKES: {
      type: 'LIKES'
    }
  }
);
● Food_Drink: Trappist, Sausages, Green_Tea, Vegan_Burger, Frites,
Mussels, Sushi
● Person: Alicia, Stefan, Tom, Kristof
Simple call to a GDS procedure…
CALL gds.nodeSimilarity.stream('myGraph')
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Person1,
       gds.util.asNode(node2).name AS Person2,
       similarity
ORDER BY similarity DESCENDING, Person1
Node similarity
Use Cases:
● Customer classification based on purchases
● Movie recommendations...
● Similarity scores can be stored as a relationship property -> feature
engineering
● ...
Caveats:
● O(n^2): every pair of nodes is compared
● Keep the output manageable with the topK parameter, which limits how many similar nodes are returned per node
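Why the pair count matters, and what topK trims, shows up in a brute-force sketch: it compares all n(n-1)/2 pairs, then keeps only each node's top_k best matches with a positive score. The `node_similarity` function and `top_k` argument are hypothetical stand-ins for the idea, not the GDS implementation or its exact filtering rules.

```python
from itertools import combinations


def node_similarity(neighbours, top_k=10):
    """Brute-force Jaccard similarity between every pair of nodes
    (n * (n - 1) / 2 comparisons), keeping each node's top_k best matches
    with a score above zero."""
    matches = {node: [] for node in neighbours}
    for a, b in combinations(neighbours, 2):
        union = neighbours[a] | neighbours[b]
        score = len(neighbours[a] & neighbours[b]) / len(union) if union else 0.0
        if score > 0:
            matches[a].append((b, score))
            matches[b].append((a, score))
    # Highest score first; ties broken by node name so the output is stable.
    return {
        node: sorted(pairs, key=lambda pair: (-pair[1], pair[0]))[:top_k]
        for node, pairs in matches.items()
    }


likes = {"A": {"x", "y"}, "B": {"x", "y"}, "C": {"y", "z"}, "D": {"w"}}
print(node_similarity(likes, top_k=1))
# {'A': [('B', 1.0)], 'B': [('A', 1.0)], 'C': [('A', 0.3333333333333333)], 'D': []}
```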
Graph Embeddings:
From Chaos to Structure…
Node Embedding
What are node embeddings?
The representation of nodes as low-dimensional vectors that summarize their graph position, the structure of their local graph neighborhood as well as any possible node features
How?
Encoder - Decoder Framework
Node Embedding
Encode nodes such that similarity in the embedding space (e.g. cosine similarity) approximates similarity in the graph
Graph Embeddings in Neo4j
● Node2Vec: random walk based embedding that can encode structural similarity or topological proximity. Easy to understand, interpretable parameters, plenty of examples.
● GraphSAGE: inductive embedding that encodes properties of neighboring nodes when learning topology. Generalizes to unseen graphs; first method to incorporate properties.
● FastRP: a super fast linear algebra based approach to embeddings that can encode topology or properties. 75,000x faster than Node2Vec; extended to encode properties.
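Node2Vec's first stage, the biased random walk, can be sketched in a few lines. Each step from the current node weighs its neighbours by where they sit relative to the previous node: 1/p to step back, 1 to stay at distance one, 1/q to move further away. This sketch assumes an unweighted graph stored as neighbour sets and omits the second stage, in which the walks are fed to a word2vec-style skip-gram model to learn the vectors; the `node2vec_walk` name is hypothetical.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One Node2Vec-style biased random walk of up to `length` nodes.
    adj: {node: set of neighbours}. Unnormalised step weights relative to the
    previous node: 1/p to return to it, 1 to stay at distance one, 1/q to move away."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(adj[current])  # sorted for reproducible walks
        if not neighbours:
            break  # dead end
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is uniform
            continue
        previous = walk[-2]
        weights = [
            1 / p if nxt == previous          # step back
            else 1.0 if nxt in adj[previous]  # stay near the previous node
            else 1 / q                        # explore outward
            for nxt in neighbours
        ]
        walk.append(rng.choices(neighbours, weights=weights)[0])
    return walk


graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(node2vec_walk(graph, "a", 8, p=0.5, q=2.0, rng=random.Random(0)))
```

A small p keeps walks local (closer to breadth-first exploration), while a small q pushes them outward (closer to depth-first exploration).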
What is the intuition…?
Toy example on Embeddings & FastRP
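A toy version of FastRP shows the linear-algebra intuition: start every node with a very sparse random vector, repeatedly replace each node's vector by the normalised average of its neighbours' vectors, and add up the intermediate results with per-iteration weights. This is a simplified sketch loosely modelled on FastRP (no degree normalisation or property features; the `fastrp` name, dimension and default weights are assumptions), not the GDS implementation.

```python
import math
import random


def fastrp(adj, dim=64, iteration_weights=(0.0, 1.0, 1.0), seed=42):
    """Toy FastRP-style node embeddings.
    adj: {node: set of neighbours}; every node needs at least one neighbour."""
    rng = random.Random(seed)
    nodes = sorted(adj)

    def sparse_random_vector():
        # Very sparse random projection: +sqrt(3), 0 or -sqrt(3)
        # with probabilities 1/6, 2/3 and 1/6.
        return [math.sqrt(3) * rng.choices([1, 0, -1], weights=[1, 4, 1])[0]
                for _ in range(dim)]

    def normalise(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec

    current = {v: sparse_random_vector() for v in nodes}
    embedding = {v: [0.0] * dim for v in nodes}
    for weight in iteration_weights:
        # Each node takes the average of its neighbours' current vectors.
        current = {
            v: normalise([sum(current[u][k] for u in adj[v]) / len(adj[v])
                          for k in range(dim)])
            for v in nodes
        }
        # Add this iteration's vectors to the embedding with their weight.
        for v in nodes:
            embedding[v] = [e + weight * c for e, c in zip(embedding[v], current[v])]
    return embedding
```

On two 4-node cliques joined by a single bridge, for example, nodes in the same clique end up with clearly higher cosine similarity than nodes in different cliques, which is what a cosine check on the embeddings relies on.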
Recall, embeddings are simply vectors...
Use Cosine to do a check...
Step 4: Run basic Cosine similarity:
MATCH (p1:Person)
MATCH (p2:Person)
WHERE id(p2) > id(p1)
RETURN p1.name AS from,
       p2.name AS to,
       gds.alpha.similarity.cosine(p1.`FastRP-embed`, p2.`FastRP-embed`) AS similarity
ORDER BY similarity DESC
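The same check can be reproduced outside the database. A minimal plain-Python sketch, assuming the embeddings have been exported to a dict mapping each name to its vector; the `cosine` and `rank_pairs` helpers and the vectors in the usage lines are hypothetical. Like `WHERE id(p2) > id(p1)`, it scores each unordered pair once.

```python
import math
from itertools import combinations


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def rank_pairs(embeddings):
    """Score every unordered pair once, most similar first."""
    rows = [(a, b, cosine(embeddings[a], embeddings[b]))
            for a, b in combinations(sorted(embeddings), 2)]
    return sorted(rows, key=lambda row: row[2], reverse=True)


vectors = {"A": [1.0, 0.0], "B": [1.0, 0.1], "C": [0.0, 1.0]}
for a, b, sim in rank_pairs(vectors):
    print(a, b, round(sim, 3))
```

The least similar pairs, the "bottom 5" view, are simply `rank_pairs(vectors)[-5:]`.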
Et voilà… bottom 5
Conclusion: As the actual graph shows, ‘Elisa’ and ‘Dan’ are furthest removed from each other, with a shortest-path length of 3, which is reflected in their embeddings and hence in the cosine similarity computation. ‘Jeff’, ‘Annie’ and ‘Brie’ are near each other, which is reflected in their similarity scores.
Graph Machine Learning
Node Classification - in Neo4j
Node classification: predicting a node label or (categorical) property
● Load your in-memory graph with labels & features
● Use nodeClassification.train: specify the property you want to predict and the features for making that prediction
Neo4j Automates the Tricky Parts:
1. Splits data for train & test
2. Builds logistic regression models using the training data & specified parameters to predict the correct label
3. Evaluates the accuracy of the models using the test data
4. Returns the best performing model
The predictive model appears in the model catalog, ready to apply to new data
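The four automated steps can be illustrated end to end in plain Python for a binary label: a seeded train/test split, one logistic regression candidate per learning rate (standing in for the "specified parameters"), accuracy on the test split, and the best candidate returned. This is a simplified sketch under those assumptions, not the GDS training procedure; all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def train_logreg(X, y, lr, epochs=200):
    """Binary logistic regression fitted with batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(model, xi):
    w, b = model
    return int(sum(wj * xj for wj, xj in zip(w, xi)) + b > 0)


def accuracy(model, X, y):
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)


def train_and_select(X, y, learning_rates=(0.01, 0.1, 1.0), test_fraction=0.3, seed=0):
    """1. split, 2. train one candidate per learning rate, 3. evaluate on the
    test split, 4. return (accuracy, learning_rate, model) of the best one."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    X_train, y_train = [X[i] for i in train], [y[i] for i in train]
    X_test, y_test = [X[i] for i in test], [y[i] for i in test]
    candidates = []
    for lr in learning_rates:
        model = train_logreg(X_train, y_train, lr)
        candidates.append((accuracy(model, X_test, y_test), lr, model))
    return max(candidates, key=lambda c: c[0])
```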
Link Prediction - in Neo4j
Link Prediction: predicting unobserved edges or relationships that will form in the future
● Load your in-memory graph with labels & features
● Split your graph into train & test: splitRelationships.mutate
● Use linkPrediction.train
Neo4j Automates the Tricky Parts:
1. Builds logistic regression models using the training data & specified parameters to predict the correct label
2. Evaluates the accuracy of the models using the test data
3. Returns the best performing model
The predictive model appears in the model catalog, ready to apply to new data
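What a train/test split for link prediction produces can be sketched as follows: a fraction of the existing relationships is held out as positive test examples (and removed from the training graph), and an equal number of unconnected node pairs is sampled as negative examples. A simplified plain-Python sketch for undirected graphs; the function and its parameters are hypothetical and do not mirror the actual configuration of splitRelationships.mutate. It assumes the graph has at least as many unconnected pairs as held-out edges.

```python
import random


def split_relationships(edges, nodes, test_fraction=0.2, seed=0):
    """Hold out part of an undirected graph's edges for testing.
    Returns (train_edges, test_examples); test_examples holds (u, v, 1) for the
    held-out edges and (u, v, 0) for sampled pairs that are not connected."""
    rng = random.Random(seed)
    edges = sorted({tuple(sorted(e)) for e in edges})  # dedupe, fixed order
    rng.shuffle(edges)
    n_test = max(1, int(len(edges) * test_fraction))
    held_out, train = edges[:n_test], edges[n_test:]

    existing = set(edges)
    candidates = sorted(nodes)
    negatives = set()
    # Assumes at least n_test unconnected pairs exist; otherwise this never ends.
    while len(negatives) < n_test:
        u, v = sorted(rng.sample(candidates, 2))
        if (u, v) not in existing:
            negatives.add((u, v))

    test = [(u, v, 1) for u, v in held_out] + [(u, v, 0) for u, v in sorted(negatives)]
    return train, test
```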
Access & Integration
Access & deploy GDS
● In addition to the Neo4j Browser, you can access the GDS library through the Neo4j Drivers
Neo4j GDS - Pythonic way….
https://github.com/neo4j/graph-data-science-client
Integrating Neo4j in production workflow
One last thing….
It’s better with Transformers...
The Modern Supply Chain is a Graph
Bill of Materials
“We looked at the problem and had a diagram with circles coming off from lines, and it was a representation of our graph omen.”
Ann Grubbs, Chief Data Engineer, Lockheed Martin Space
● Challenge: Product data housed in various silos across the organization could not be easily integrated
● Solution: Knowledge graph of components and parts to build satellites and equipment to reduce costs and increase efficiency
● Why Neo4j: Flexible, Contextual Data Model
● Reduced Costs, Increased Efficiency
https://go.neo4j.com/rs/710-RRC-335/images/Neo4j-case-study-lockheed-martin-space-EN-US.pdf
Neo4j working with Banking Circle to detect Fraud
● Boston Scientific needed a more
efficient method for finding the root
causes of quality control problems.
● The heart of Boston Scientific’s graph data model consists of three node types representing a finished product, a part and an issue, with relationships that trace problems to parts and connect those to finished products.
● The Boston Scientific team analyzes its
graph and computes scores that rank
nodes based on their proximity to
failures, enriching its models with
insights derived from the graph.
NASA: A Knowledge Graph to identify trends that can prevent
disasters and incorporate lessons learned into new projects.
Getting to Mars Faster with a Knowledge Graph
David Meza, Chief Knowledge Architect at
NASA: "This has saved us at least
a year and over $2 million in
research and development
toward our Mission to Mars
planning."
Caterpillar
Preventative Maintenance
• Challenge: Focus on preventative maintenance to avoid costly post-failure remedial actions
• Solution: Text from 27 million warranty & service documents parsed into a knowledge graph that gives AI the context to learn “prime examples” and anticipate maintenance
• Results:
○ Proactive remedial action has reduced downtime & associated costs and increased productivity
Demo Time…! (but first some Cypher…)
Cypher: first we CREATE
CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
[Figure: the statement annotated with its parts: node labels (Person), node properties (name: ‘Dan’, name: ‘Ann’) and the relationship (LOVES).]
Cypher: and then we MATCH a pattern in the Graph
MATCH (p:Person {name: "Dan"})-[:MARRIED_TO]->(spouse)
RETURN p, spouse
[Figure: the pattern annotated with its parts: node label (Person), property (name: ‘Dan’), relationship type (MARRIED_TO) and the variables p and spouse.]
In Cypher you MATCH a pattern and then RETURN a result
MATCH (c:Country {name: "Finland"})
RETURN c;
Filtering is done with WHERE (this statement does exactly the same)
MATCH (c:Country)
WHERE c.name = "Finland"
RETURN c;
Thank you… OK, now let’s demo!