Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced RAG.pdf
1. Best of Both Worlds: Combine KG and Vector search for enhanced RAG
Data Innovation Summit 2024
Jonas El Reweny, Kristof Neys
Neo4j Field Engineering
2. Agenda
Neo4j Inc. All rights reserved 2023
1. Knowledge Graph
2. Graph Query Language
3. Graph Data Science
4. Vectors
5. Demo Time!
Notebook in Google Colab: tinyurl.com/disws24
Neo4j Sandbox: sandbox.neo4j.com
Prerequisites for the workshop:
● Laptop with internet access and no outbound restrictions on ports 80, 443, 7687
● Register an account and log in to https://sandbox.neo4j.com and select the "Blank Sandbox" project
● Register an account and log in to https://colab.research.google.com/
3. But… first a word from our sponsor…
4. Neo4j: The Graph Database & Analytics Leader
5. The first-ever graph database
Creator of the market category
Continued market leader
300
1B+ Enterprise customers
$500M in funding
170+ Global partner ecosystem
250K Community of developers and data pros
100M+ Downloads
7. The core graph object: a Knowledge Graph
8. Recap: a Knowledge Graph
A knowledge graph is a structured representation of facts, consisting of entities, relationships and semantic descriptions.
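To make the definition concrete, here is a minimal sketch in plain Python of the three ingredients: entities, relationships, and semantic descriptions (here, labels and relationship types). The class names, the `neighbours` helper and the example facts are assumptions made up for illustration; they are not taken from the workshop notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    label: str  # semantic description of what the entity is


@dataclass(frozen=True)
class Relationship:
    source: Entity
    type: str  # semantic description of how the two entities relate
    target: Entity


# Illustrative facts only (hypothetical modelling choices).
neo4j = Entity("Neo4j", "Company")
workshop = Entity("Best of Both Worlds", "Workshop")
summit = Entity("Data Innovation Summit 2024", "Event")

facts = [
    Relationship(neo4j, "PRESENTS", workshop),
    Relationship(workshop, "PART_OF", summit),
]


def neighbours(entity, facts):
    """Return (relationship type, target entity) pairs leaving `entity`."""
    return [(f.type, f.target) for f in facts if f.source == entity]


print(neighbours(neo4j, facts))
```

In a graph database the same facts would be stored as nodes and relationships rather than Python objects; the sketch only shows the shape of the data.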
9. From data points to a Knowledge Graph
18. Enhance your RAG with Graph Data Science
19. GDS evolution
● Knowledge graphs (local matching): Find the patterns you’re looking for in connected data.
● Graph analytics (global patterns): Use unsupervised machine learning techniques to identify associations, anomalies, and trends.
● Graph feature engineering (graph representations): Learn features in your graph that you don’t even know are important yet. Train in-graph supervised ML models to predict links, labels and missing data.
20. Before we go any further…let’s quiz!
21. Which of the colored nodes would be considered the most ‘important’?
23. 70+ Graph Data Science Techniques in Neo4j
Pathfinding & Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• A* Shortest Path
• Yen’s K Shortest Path
• Minimum Weight Spanning Tree
• K-Spanning Tree (MST)
• Random Walk
• Breadth & Depth First Search
Centrality & Importance
• Degree Centrality
• Closeness Centrality
• Harmonic Centrality
• Betweenness Centrality & Approx.
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Hyperlink Induced Topic Search (HITS)
• Influence Maximization (Greedy, CELF)
Community Detection
• Triangle Count
• Local Clustering Coefficient
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• K-1 Coloring
• Modularity Optimization
• Speaker Listener Label Propagation
Supervised Machine Learning
• Node Classification
• Link Prediction
Heuristic Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocation
• Same Community
• Total Neighbors
Similarity
• Node Similarity
• K-Nearest Neighbors (KNN)
• Jaccard Similarity
• Cosine Similarity
• Pearson Similarity
• Euclidean Distance
• Approximate Nearest Neighbors (ANN)
Graph Embeddings
• Node2Vec
• FastRP
• FastRPExtended
• GraphSAGE
… and more!
• Synthetic Graph Generation
• Scale Properties
• Collapse Paths
• One Hot Encoding
• Split Relationships
• Graph Export
• Pregel API (write your own algos)
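The quiz question about the most "important" node has no single answer: it depends on which centrality measure is used. The sketch below is plain Python, not the Graph Data Science library, and runs on a toy graph made up for illustration. It compares degree centrality with closeness centrality and assumes the graph is connected.

```python
from collections import deque

# Toy undirected graph (hypothetical): two hubs H and G, each with three
# leaves, joined through a single bridge node B.
edges = [("H", "h1"), ("H", "h2"), ("H", "h3"),
         ("G", "g1"), ("G", "g2"), ("G", "g3"),
         ("H", "B"), ("B", "G")]

graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def degree_centrality(graph):
    """Number of direct neighbours per node."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """(reachable nodes) / (sum of shortest-path distances), via BFS."""
    scores = {}
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[start] = (len(dist) - 1) / total if total else 0.0
    return scores


degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
for node in ("H", "B"):
    print(node, degree[node], round(closeness[node], 3))
```

Hub H has more direct connections than bridge B, yet B is on average closer to every other node, so the two measures disagree about which node matters most.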
24. It’s Better with Vectors…
25. What is a Vector?
26. What is a vector?
● Length
● Direction
● Components have meaning (horizontal, vertical)
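As a small worked example of the three properties, the sketch below computes the length and direction of a 2D vector from its horizontal and vertical components. The function names and the example vector are assumptions chosen for illustration.

```python
import math


def length(v):
    """Euclidean length (magnitude) of a vector."""
    return math.sqrt(sum(c * c for c in v))


def direction_degrees(v):
    """Angle of a 2D vector, counter-clockwise from the horizontal axis."""
    horizontal, vertical = v
    return math.degrees(math.atan2(vertical, horizontal))


v = (3.0, 4.0)  # horizontal component 3, vertical component 4
print(length(v))
print(direction_degrees(v))
```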
28. Kings and Queens
king − man + woman ≈ queen
[Figure, three panels: (1) king, man, woman; (2) king, man, woman; (3) queen?]
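The arithmetic can be tried on toy data. In the sketch below the 2D "embeddings" are invented by hand so that the first component loosely stands for royalty and the second for gender. Real embedding models produce learned vectors with far more dimensions, and the analogy only holds approximately.

```python
import math

# Hand-made toy vectors (hypothetical values, not from any model).
embeddings = {
    "king":  (0.9, 0.9),
    "man":   (0.1, 0.9),
    "woman": (0.1, 0.1),
    "queen": (0.9, 0.1),
    "apple": (0.0, 0.5),
}


def analogy(a, b, c, vocab):
    """Return the word closest to vocab[a] - vocab[b] + vocab[c]."""
    target = tuple(x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c]))
    # Exclude the input words, as is commonly done for analogy queries.
    candidates = [w for w in vocab if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(target, vocab[w]))


result = analogy("king", "man", "woman", embeddings)
print(result)
```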
29. What are vector embeddings?
● Same concepts, just “an arrow”
● 100s or 1000s of dimensions
30. Finding Similar vectors
● Cosine: direction / angle based
● Euclidean: distance based
[Figure: vector points, a query point and its nearest 4 neighbours]
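The two measures can rank the same points differently. The sketch below is a brute-force nearest-neighbour search in plain Python over a few made-up 2D points. The `nearest` helper and the point names are assumptions for illustration; a vector index would replace the exhaustive scan with an approximate method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(query, points, k, metric):
    """Names of the k points closest to `query` under the given metric."""
    if metric == "cosine":
        # Higher similarity is better, so sort on the negated value.
        key = lambda name: -cosine_similarity(query, points[name])
    elif metric == "euclidean":
        key = lambda name: math.dist(query, points[name])
    else:
        raise ValueError(f"unknown metric: {metric}")
    return sorted(points, key=key)[:k]


# Hypothetical stored vectors and query.
points = {"A": (5.0, 0.0), "B": (1.0, 0.5), "C": (0.0, 1.0), "D": (-1.0, 0.0)}
query = (1.0, 0.0)

print(nearest(query, points, 2, "cosine"))
print(nearest(query, points, 2, "euclidean"))
```

Point A points in exactly the same direction as the query but lies far away, so it ranks first by cosine and last by Euclidean distance.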
31. Why a Vector Store?
32. Why & What is a Vector Index?
● Data it applies to: embedding vectors (“raw” vectors) produced by embedding models, mainly from unstructured data such as text, audio and video.
● Main purpose: use approximate methods to perform similarity search at lower computational cost.
● Once an embedding vector has been stored as a node property, a vector index can be created across those properties.
● Indexing is an algorithm that maps the original vector to a data structure that enables faster search.
● Creating a vector index builds a query-optimized data structure at “store time” (as opposed to GDS similarity search, which runs at query time).
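As a rough sketch of creating an index across an embedding property, the Python helper below builds a Cypher `CREATE VECTOR INDEX` statement. It assumes a Neo4j 5.x release that supports this command, so check the documentation for your version. The index name, the `Chunk` label, the `embedding` property and the dimension count 1536 are hypothetical placeholders.

```python
def create_vector_index_statement(index_name, label, prop, dimensions,
                                  similarity="cosine"):
    """Build a Cypher statement that creates a vector index on label.prop.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    if similarity not in ("cosine", "euclidean"):
        raise ValueError(f"unsupported similarity function: {similarity}")
    return (
        f"CREATE VECTOR INDEX `{index_name}` IF NOT EXISTS\n"
        f"FOR (n:`{label}`) ON (n.`{prop}`)\n"
        "OPTIONS {indexConfig: {\n"
        f"  `vector.dimensions`: {int(dimensions)},\n"
        f"  `vector.similarity_function`: '{similarity}'\n"
        "}}"
    )


# Hypothetical names; the dimension must match the embedding model used.
statement = create_vector_index_statement(
    "chunk_embeddings", "Chunk", "embedding", 1536
)
print(statement)
```

The resulting string can be sent to the database with the official `neo4j` Python driver, for example through `session.run(statement)`.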
33. How is search performed?
● The query vector is any piece of unstructured data that has been converted into an embedding vector (the “raw” vector) and is mapped to the index using the same algorithm (e.g. Hierarchical Navigable Small World).
● The “key” vectors are the stored vectors that have been indexed.
● When a search is performed, a similarity function is applied between the query vector and the stored vectors.
● Several similarity measures can be used, including:
○ Cosine similarity
○ Euclidean similarity
○ Dot product
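A query against an existing vector index might look like the sketch below, which calls the `db.index.vector.queryNodes` procedure through a session object. The `vector_search` helper, the index name and the `FakeSession` stand-in are assumptions added so the example runs without a database. With the real `neo4j` Python driver you would pass an actual session instead.

```python
QUERY = """
CALL db.index.vector.queryNodes($index, $k, $vector)
YIELD node, score
RETURN node, score
"""


def vector_search(session, index_name, query_vector, k=4):
    """Return (node, score) pairs for the k stored vectors nearest the query."""
    params = {"index": index_name, "k": k, "vector": list(query_vector)}
    result = session.run(QUERY, params)
    return [(record["node"], record["score"]) for record in result]


class FakeSession:
    """Hypothetical stand-in for a driver session; returns canned rows."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = []

    def run(self, query, parameters=None):
        self.calls.append((query, parameters))
        return self.rows[: parameters["k"]]


session = FakeSession([
    {"node": "chunk-1", "score": 0.93},
    {"node": "chunk-2", "score": 0.88},
])
hits = vector_search(session, "chunk_embeddings", [0.1, 0.2, 0.3], k=1)
print(hits)
```

The query vector has to come from the same embedding model, and therefore have the same dimension, as the stored vectors.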
34. Neo4j and Vector Search
● Vector Similarity Search: Find relevant documents and content for user queries.
● Graph Traversals & Pattern Matching: Find entities associated to content and patterns in connected data.
● Knowledge Graph Inference & ML: Improve search relevance & insights by enhancing a Knowledge Graph. Use graph algorithms and ML to discover new relationships, entities, and groups.
Vector Search
Graph Database
36. What are node embeddings?
The representation of nodes as low-dimensional vectors that summarize their graph position, the structure of their local graph neighborhood as well as any possible node features.
38. 4 algorithms…and counting
• FastRP (Fast Random Projection) - Calculates embeddings extremely fast using probabilistic sampling and linear algebra.
• GraphSAGE (Graph SAmple and aggreGatE) - Trains a Graph Neural Network (GNN) to generate embeddings on old and new graph data. Uses batch sampling procedures for scalability.
• Node2Vec - Creates embeddings that represent nodes in similar neighborhoods and/or structural “roles” in the graph using adjustable random walks.
• HashGNN - Quickly generates embeddings on heterogeneous graphs. Like a GNN but much faster and simpler with comparable benchmarked performance. Leverages a clever application of hashing functions rather than training a model.
Graph Data Science Embeddings
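To give a feel for how a random-projection embedding captures neighbourhood structure, here is a heavily simplified sketch in the spirit of FastRP. It is not the Graph Data Science implementation: it uses dense Gaussian starting vectors and plain neighbour averaging, and the function name, weights and toy graph are assumptions for illustration.

```python
import math
import random


def normalize(v):
    """Scale v to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm else v


def simple_projection_embeddings(graph, dim=8, weights=(1.0, 0.5), seed=42):
    """Embed nodes by repeatedly averaging random neighbour vectors."""
    rng = random.Random(seed)
    # Step 1: one random starting vector per node.
    current = {n: [rng.gauss(0, 1) for _ in range(dim)] for n in graph}
    embedding = {n: [0.0] * dim for n in graph}
    # Step 2: each iteration averages the neighbours' vectors, and the
    # embedding is a weighted sum of the per-iteration results.
    for weight in weights:
        step = {}
        for node, nbrs in graph.items():
            if nbrs:
                mean = [sum(current[m][i] for m in nbrs) / len(nbrs)
                        for i in range(dim)]
            else:
                mean = [0.0] * dim
            step[node] = normalize(mean)
        for node in graph:
            embedding[node] = [e + weight * x
                               for e, x in zip(embedding[node], step[node])]
        current = step
    return embedding


# Toy graph (hypothetical): leaves "a" and "b" hang off the same hub.
toy_graph = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"],
             "c": ["hub", "d"], "d": ["c"]}
emb = simple_projection_embeddings(toy_graph)
print(emb["a"] == emb["b"], emb["a"] == emb["d"])
```

Nodes "a" and "b" have identical neighbourhoods and so receive identical vectors, while "d", which sits elsewhere in the graph, does not.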