5. 5
What and Why of Graph
Making Relationships a First Class Citizen
● ArangoDB turns the value of data relationships into actionable results
● Data relationships are the foundation of AI/ML models
SQL DB (e.g. Product Listing)
Product 1 | Price | Category | Description
Product 2 | Price | Category | Description
Product 3 | Price | Category | Description
Product 4 | Price | Category | Description
Graph/NoSQL DB (e.g. Co-Purchase Pattern)
[Diagram: Product 1, Product 2, Product 3, and Product 4 as nodes connected by co-purchase edges]
Rather than focusing on individual rows or products, a graph DB captures the dependencies and relationships between those products.
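A minimal sketch of how row-oriented order data could be turned into co-purchase relationships. The `orders` data and the function name `co_purchase_edges` are assumptions made for illustration; they do not come from the talk.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order data: each order lists the products bought together.
orders = [
    ["Product 1", "Product 2"],
    ["Product 1", "Product 2", "Product 4"],
    ["Product 2", "Product 3"],
    ["Product 1", "Product 4"],
]

def co_purchase_edges(orders):
    """Count how often each unordered pair of products shares an order."""
    counts = Counter()
    for order in orders:
        # sorted(set(...)) removes duplicates and fixes the pair orientation.
        for a, b in combinations(sorted(set(order)), 2):
            counts[(a, b)] += 1
    return counts

edges = co_purchase_edges(orders)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b} (bought together {weight}x)")
```

Each counted pair corresponds to one weighted edge between two product nodes, which is the information a plain product listing does not carry.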
6. 6
Graph Database
● Collection of nodes and edges
● Naturally describes relations in data
● Feasibly handles large joins/traversals
● Built-in graph algorithms (k-paths, shortest path, etc.)
● Use Cases:
○ Fraud Detection
○ Supply Chain Management
○ Recommendations
○ Customer 360
○ Network Management
○ Risk Management
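To make "shortest path" concrete, here is a small breadth-first search over an adjacency list. This is only a sketch of the idea; a graph database would run such a traversal as a built-in query. The `graph` data and the `shortest_path` helper are hypothetical.

```python
from collections import deque

# Hypothetical graph as an adjacency list (node -> neighbours).
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: returns one shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # goal is not reachable from start

print(shortest_path(graph, "A", "E"))
```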
7. 7
ML + Graph Databases
[Diagram: a GraphDB (Knowledge Graph, Metadata, Graph Analytics, GraphML Inferences) connects a Data Ecosystem (GraphQL, Cloud, …) with an ML Ecosystem (DGL, PyG, NetworkX, ...). Labels on the connections: "Graph data" and "Embeddings/Inferences".]
16. 16
Content-based Filtering
● Very personalized recommendation
● Uses existing data to offer predictions
● Typically requires domain knowledge
● Can be fast and ad-hoc
"Content-based filtering uses item features to recommend other items similar to what the user likes, based on their previous actions or explicit feedback." (Google)
17. 17
TF-IDF
Term Frequency: how often the word shows up in a document.
Inverse Document Frequency: how rare the word is across all documents (the more documents contain it, the lower its weight).
Attempts to rank information based on how informative the words are, not just how frequent.
tfidf(t, d, D) = tf(t,d) * idf(t, D)
( D: all documents, d: document, t: term )
https://en.wikipedia.org/wiki/Tf-idf
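The formula can be sketched in a few lines of Python. The toy corpus is made up, and the `idf` shown here is the plain log(N / document count) variant, which is one of several common weightings and an assumption on my part.

```python
import math
from collections import Counter

# Hypothetical toy corpus D; each document d is a list of terms t.
docs = [
    "space adventure with robots".split(),
    "romantic comedy in london".split(),
    "space station comedy".split(),
]

def tf(term, doc):
    """Term frequency: share of the document's tokens that equal `term`."""
    return Counter(doc)[term] / len(doc)

def idf(term, docs):
    """Inverse document frequency: log(N / number of docs containing `term`)."""
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing) if containing else 0.0

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

for term in ("space", "robots"):
    print(term, round(tfidf(term, docs[0], docs), 3))
```

In the first document "robots" scores higher than "space": both occur once, but "space" also appears in another document and is therefore less distinctive.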
19. 21
Storing it in the graph
[Diagram: a Movie/User node connected to another Movie/User node by an edge carrying { ML (Distance, Similarity, Embedding) }]
● Store ML outcomes on the edge
● Enrich new/existing data and queries
● Leverage benefits of ML
● Reduce complexity
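A sketch of what "store ML outcomes on the edge" might look like. `_from` and `_to` are the attribute names ArangoDB uses for edge documents; the `similarity` attribute, the embeddings, the threshold, and the helper names are assumptions for illustration. Writing the edges to a database is left out.

```python
import math

# Hypothetical movie embeddings from some ML model (values are made up).
embeddings = {
    "Movie/1": [0.9, 0.1, 0.0],
    "Movie/2": [0.8, 0.2, 0.1],
    "Movie/3": [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def similarity_edges(embeddings, threshold=0.5):
    """Build one edge document per pair whose similarity clears the threshold."""
    keys = sorted(embeddings)
    edges = []
    for i, src in enumerate(keys):
        for dst in keys[i + 1:]:
            score = cosine_similarity(embeddings[src], embeddings[dst])
            if score >= threshold:
                edges.append({"_from": src, "_to": dst, "similarity": round(score, 4)})
    return edges

for edge in similarity_edges(embeddings):
    print(edge)
```

Once the score lives on the edge, a recommendation becomes an ordinary traversal that filters or sorts by that attribute, with no model call at query time.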
21. 23
Collaborative Filtering
● Personalized recommendation
● Predictions based on combined external patterns
● Depends on existing patterns being accurate
● Can offer predictions with limited domain knowledge
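A minimal user-based collaborative filtering sketch: unseen movies are scored by the ratings of other users, weighted by how similar those users are. The `ratings` data and the scoring rule (a similarity-weighted sum) are assumptions chosen for brevity, not the method used in the talk's notebooks.

```python
import math

# Hypothetical user -> {movie: rating} data.
ratings = {
    "alice": {"Toy Story": 5, "Babe": 5, "Star Trek": 1},
    "bob": {"Toy Story": 4, "Babe": 5, "Love Actually": 4},
    "carol": {"Star Trek": 5, "GoldenEye": 5, "Toy Story": 1},
}

def cosine(u, v):
    """Cosine similarity over the movies both users rated (0.0 if none)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in shared))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in shared))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, top_n=3):
    """Rank movies the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("alice", ratings))
```

No item features are used here, only rating patterns, which is why this approach can work with limited domain knowledge but depends on those patterns being accurate.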
22. 24
Matrix Factorization
● Can be efficient or not
● Sparse matrix
● Dimensionality Reduction
● Combine with content-based
● Scale with FAISS
               User 1  User 2  User 3  User 4
Toy Story        5       ?       2       1
GoldenEye        ?       1       5       5
Love Actually    ?       5       ?       5
Babe             5       ?       1       ?
Star Trek        1       ?       5       5
SVD
A = UΣV^T
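Plain SVD needs a fully filled matrix, so the sketch below swaps it for a different technique: a low-rank factorization fitted by stochastic gradient descent on the known ratings only. It uses the ratings from the slide, with `None` for "?". The rank `k`, learning rate, regularization, and step count are assumptions, not values from the talk.

```python
import random

movies = ["Toy Story", "GoldenEye", "Love Actually", "Babe", "Star Trek"]
# Rows are movies, columns are User 1..User 4; None marks an unknown rating.
R = [
    [5, None, 2, 1],
    [None, 1, 5, 5],
    [None, 5, None, 5],
    [5, None, 1, None],
    [1, None, 5, 5],
]

def factorize(R, k=2, steps=5000, lr=0.01, reg=0.02, seed=0):
    """Fit R ~ P Q^T on observed entries only (not an exact SVD)."""
    rng = random.Random(seed)
    n_items, n_users = len(R), len(R[0])
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    observed = [(i, j, R[i][j]) for i in range(n_items)
                for j in range(n_users) if R[i][j] is not None]
    for _ in range(steps):
        for i, j, r in observed:
            err = r - sum(P[i][f] * Q[j][f] for f in range(k))
            for f in range(k):
                p, q = P[i][f], Q[j][f]
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, i, j):
    """Predicted rating of movie i by user j from the latent factors."""
    return sum(p * q for p, q in zip(P[i], Q[j]))

P, Q = factorize(R)
for i, movie in enumerate(movies):
    for j in range(len(R[0])):
        if R[i][j] is None:
            print(f"{movie}, User {j + 1}: {predict(P, Q, i, j):.1f}")
```

The printed values fill in the "?" cells from the reduced dimensions. With this little data they are illustrative only.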
25. 27
Graph Neural Networks
Sachin Sharma, ML Research Engineer @ArangoDB
● Develop Intelligent Products
● Former Machine Learning Scientist & Engineer @Define Media GmbH
● Former Research Intern @DFKI
● AI Blogger
● Interests: Graph ML, Vision, NLP.
Graph ML, NVIDIA Triton, and ArangoDB: Thinking Beyond Euclidean Space
https://www.arangodb.com/events/graphml-nvidia-triton-and-arangodb-thinking-beyond-euclidean-space/
26. 28
Graph (Node) Representation Learning
Image credits: Stanford
● Map network nodes to a d-dimensional embedding space
● Similar nodes in the network should remain close to each other in the embedding space
Similarity of (u, v) in the network ≈ dot product between their node embeddings
27. 29
Graph → Embedding
Embedding is the key to machine learning on graphs: each node is mapped into a coordinate system so that certain properties are maintained, e.g., different node types can easily be separated by a line, or neighbouring nodes are close to each other.
28. 30
Can we Apply CNNs on Graphs?
Fixed Number of Neighbors
(2D Grid - Euclidean Space)
Variable Number of Neighbors
(Graph - Non-Euclidean Space)
Image as 2D Grid
Text/Audio as 1D Sequence
29. 31
Graph Neural Networks
● Node classification
● Graph classification
● Link prediction
○ Predict links for users and movies
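Link prediction for users and movies can be sketched as scoring candidate (user, movie) pairs that are not yet connected. The embeddings below stand in for the output of a trained GNN. Their values, the `existing` link set, and the choice of a sigmoid over the dot product as the score are assumptions for illustration.

```python
import math

# Hypothetical embeddings, e.g. as produced by a trained GNN (made-up values).
user_emb = {"user/1": [1.0, 0.2], "user/2": [0.1, 0.9]}
movie_emb = {
    "movie/toy_story": [0.9, 0.1],
    "movie/star_trek": [0.2, 1.0],
    "movie/babe": [0.8, 0.3],
}
# Links that already exist in the graph (the user has rated the movie).
existing = {("user/1", "movie/toy_story"), ("user/2", "movie/star_trek")}

def link_score(u, v):
    """Sigmoid of the dot product: a value in (0, 1) read as link likelihood."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-dot))

def predict_links(user, top_n=2):
    """Rank movies the user is not yet linked to by their link score."""
    candidates = [m for m in movie_emb if (user, m) not in existing]
    ranked = sorted(
        candidates,
        key=lambda m: link_score(user_emb[user], movie_emb[m]),
        reverse=True,
    )
    return ranked[:top_n]

print(predict_links("user/1"))
```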
31. 33
ML + Graph Databases
● Knowledge graph serves data
● Graph naturally pairs with ML
● ML Ecosystem for graph interface
[Diagram: the Movie Knowledge Graph sends input data to the ML Ecosystem, which returns embeddings/inferences]
33. 35
NVIDIA Triton Meets ArangoDB
[Diagram: a Graph ML Model (GraphSAGE) is deployed from an AI Model Repository and used by a Front-End Client Application. The graph (nodes N1 to N6) is stored in ArangoDB and receives updates.]
Know Surroundings: retrieve the node embeddings of all neighbors of node 'N5' that are at 1-hop distance.
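A sketch of the 1-hop lookup for node N5, followed by a mean over the neighbor embeddings. Mean aggregation is one aggregator used by GraphSAGE-style models; the learned weights are omitted here. The edge list and embedding values are made up, and in the setup from the talk the neighborhood would come from a database query rather than a Python list.

```python
# Hypothetical undirected graph over the nodes N1..N6 (the edges are made up).
edges = [("N1", "N2"), ("N2", "N5"), ("N3", "N5"), ("N4", "N5"), ("N4", "N6")]

# Hypothetical 2-dimensional node embeddings.
embeddings = {
    "N1": [0.0, 1.0], "N2": [1.0, 0.0], "N3": [0.0, 2.0],
    "N4": [2.0, 1.0], "N5": [0.5, 0.5], "N6": [1.0, 1.0],
}

def one_hop_neighbours(node, edges):
    """All nodes that share an edge with `node`, in sorted order."""
    nbrs = set()
    for a, b in edges:
        if a == node:
            nbrs.add(b)
        elif b == node:
            nbrs.add(a)
    return sorted(nbrs)

def mean_neighbour_embedding(node, edges, embeddings):
    """Average the embeddings of the 1-hop neighbours (mean aggregation)."""
    nbrs = one_hop_neighbours(node, edges)
    if not nbrs:
        return list(embeddings[node])  # isolated node: fall back to its own vector
    dim = len(embeddings[node])
    return [sum(embeddings[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]

print(one_hop_neighbours("N5", edges))
print(mean_neighbour_embedding("N5", edges, embeddings))
```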
35. 37
Thank you!
● Notebooks: https://github.com/arangodb/interactive_tutorials
○ Collaborative Filtering with AQL
○ Content-based Recommendations with ArangoSearch and TFIDF
○ Content-based Recommendations with FAISS, TFIDF, and Python
○ Graph Neural Networks with PyTorch
○ Matrix Factorization
Test-drive ArangoDB and ArangoML using Oasis, 14 days for free
Register now at https://bit.ly/3blNaKR
https://github.com/arangoml/
Matrix factorization compresses the ratings matrix into something like this example. The data is sparse, but we can now attempt to offer predictions based on the reduced dimensions.
It learns features such as genre and how strongly an item expresses that genre (is it sci-fi and action, or sci-fi but more drama?).
Content-based filtering matches descriptions by keyword, and that isn't always enough.
Graph representation learning: information about the graph (a node's neighbors) must be encoded for message passing.
The challenge in moving beyond a fixed 2D Euclidean space: CNNs require data represented on a fixed Euclidean grid, whereas non-Euclidean data such as graphs has a variable number of neighbors per node.