Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022

1
Machine Learning + Graph Databases for Better Recommendations

2
Chris Woodward
Developer Relations Engineer @ArangoDB
● Training
● Development
● Community
● ArangoML
● Twitter: @cw00dw0rd
● Slack: Chris.ArangoDB
● Email: chris@arangodb.com

3
The Agenda
●ArangoFlix Project
●Graph Database
●Recommendations
●Machine Learning Techniques
○ Collaborative Filtering with AQL
○ Content-based Recommendations with ArangoSearch and TFIDF
○ Content-based Recommendations with FAISS, TFIDF, and Python
○ Matrix Factorization
○ Graph Neural Networks with PyTorch
●Graph Database + ML

4
What is ArangoFlix?
●Machine Learning + Graph Databases
●ArangoDB Oasis
●ArangoFlix Website

5
What and Why of Graph
Making Relationships a First-Class Citizen
● ArangoDB turns the value of data relationships into actionable results
● Data relationships are the foundation of AI/ML models
[Figure: SQL DB (e.g. a product listing: Product 1 with Price, Category, and Description columns) vs. graph/NoSQL DB (e.g. a co-purchase pattern linking Products 1, 2, 3, and 4)]
Rather than focusing on individual rows or products, a graph DB captures the dependencies and relationships between those products.
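A minimal sketch of the co-purchase idea, assuming a small, hypothetical list of orders (each order is a set of product names): count how often two products are bought together, treat each count as a weighted edge, and recommend a product's strongest neighbors. In a graph database these pairs would be stored as edges rather than held in a Python dict; the function names here are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each order is the set of products bought together.
orders = [
    {"Product 1", "Product 2"},
    {"Product 1", "Product 2", "Product 3"},
    {"Product 2", "Product 4"},
    {"Product 1", "Product 3"},
]

def co_purchase_edges(orders):
    """Count how often each unordered product pair appears in the same order."""
    edges = Counter()
    for order in orders:
        # sorted() gives each pair a stable (a, b) orientation
        for a, b in combinations(sorted(order), 2):
            edges[(a, b)] += 1
    return edges

def recommend(product, edges, k=2):
    """Return up to k products most often co-purchased with `product`."""
    neighbors = Counter()
    for (a, b), weight in edges.items():
        if a == product:
            neighbors[b] += weight
        elif b == product:
            neighbors[a] += weight
    return [p for p, _ in neighbors.most_common(k)]

edges = co_purchase_edges(orders)
print(recommend("Product 1", edges))
```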

6
Graph Database
●Collection of nodes and edges
●Naturally describes relations in data
●Handles large joins/traversals efficiently
●Built-in graph algorithms (K paths, shortest path, etc.)
●Use Cases:
○ Fraud Detection
○ Supply Chain Management
○ Recommendations
○ Customer 360
○ Network Management
○ Risk Management
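To illustrate what a built-in shortest-path algorithm computes, here is a minimal breadth-first-search sketch over a small, hypothetical user/movie edge list. ArangoDB runs this kind of traversal server-side (for example with AQL's SHORTEST_PATH), so this is only an illustration of the algorithm, not of ArangoDB's implementation.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected, unweighted edge list.

    Returns the vertices on one shortest path, or None if goal is unreachable.
    """
    # Build an adjacency list; each edge is traversable in both directions
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous = {start: None}  # predecessor links; also the visited set
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk the predecessor links back to the start
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in previous:
                previous[neighbor] = vertex
                queue.append(neighbor)
    return None

# Hypothetical user/movie graph: users are linked to movies they watched
edges = [("alice", "Toy Story"), ("bob", "Toy Story"), ("bob", "Babe"), ("carol", "Babe")]
print(shortest_path(edges, "alice", "carol"))
```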

7
ML + Graph Databases
[Figure: the graph DB (knowledge graph, metadata, graph analytics, GraphQL data ecosystem) provides graph data to the ML ecosystem (DGL, PyG, NetworkX, ...) in the cloud, and embeddings/inferences (GraphML inferences) flow back to the graph DB]

8
What is a Recommendation System?

9
Recommendation System/Engine
●Provides predictions to business/users
●Business Driven
●Data Quality
●Privacy/explainability Considerations
●Domain Specific
●Implementation Methods
○ Content-Based
○ Collaborative Filtering
○ Hybrid/Group/Other

10
Recommendation System - Use Cases
Domains and example companies
●Products: Amazon, Newegg, Instacart
●Jobs: Glassdoor, Indeed
●Destinations: Airbnb, Maps, Kayak
●People: Reddit, LinkedIn, Twitter
●Entertainment: Netflix, Xbox, Apple TV
●Research: Healthcare, Citation
●Search: Google, Bing (PageRank)

11
Data Movement within Organizations
[Figure: customer and LOB applications; operational data, data lake with big data processing, data warehouse, and knowledge graph, connected by ETL and business rules; machine learning with a feature store, ML metadata, and models, deployed and monitored in an intelligent app; roles: data engineer, data analyst, data scientist, ML engineer, developer]

12
Simplified Recommendation Flow
[Figure: Customer, Application, Backend/Storage, and Recommendation Logic]

13
ArangoFlix - Demo Site
ArangoDB Cloud
https://cloud.arangodb.com
Examples > Install > Demo
https://flix.arangodb.com

14
Stack
Recommendations
●ArangoSearch & AQL
●TFIDF - Content Based
○ scikit-learn
●Matrix Factorization - Collaborative Filtering
○ Surprise/SVD
●GNN
○ PyTorch Geometric
○ SBERT (Sentence-BERT)
Backend
●ArangoDB Cloud
○ Driver: python-arango
●Foxx Microservices
○ GraphQL Endpoint
Frontend
●Vue.js / Vuex
●Cytoscape
●PrimeVue

16
Content-based Filtering
● Very personalized recommendation
● Uses existing data to offer predictions
● Typically requires domain knowledge
● Can be fast and ad-hoc
"Content-based filtering uses item features to recommend other items similar to what the user likes, based on their previous actions or explicit feedback." (Google)
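A minimal content-based filtering sketch, assuming hypothetical binary genre flags as the item features: the user profile is the average feature vector of the movies the user liked, and unseen movies are ranked by cosine similarity to that profile. Real systems would use richer features (text, cast, tags), but the scoring step has the same shape.

```python
import math

# Hypothetical item features: 1 if the movie has the genre, else 0.
# Feature order: [animation, comedy, action, romance, sci-fi]
movies = {
    "Toy Story":     [1, 1, 0, 0, 0],
    "Babe":          [0, 1, 0, 0, 0],
    "GoldenEye":     [0, 0, 1, 0, 0],
    "Love Actually": [0, 1, 0, 1, 0],
    "Star Trek":     [0, 0, 1, 0, 1],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(liked, k=2):
    """Rank unseen movies by similarity to the average features of liked movies."""
    # User profile = mean feature vector of the movies the user liked
    profile = [sum(col) / len(liked) for col in zip(*(movies[m] for m in liked))]
    scores = {m: cosine(profile, f) for m, f in movies.items() if m not in liked}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend(["Toy Story"]))
```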

17
TFIDF
Term Frequency: How often the word shows up in a document.
Inverse Document Frequency: How rare the word is across all documents; words that appear in many documents get a low weight.
Ranks words by how informative they are, not just by how often they occur.
tfidf(t, d, D) = tf(t,d) * idf(t, D)
( D: all documents, d: document, t: term )
https://en.wikipedia.org/wiki/Tf-idf
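The formula above can be computed directly. This sketch assumes whitespace tokenization, tf as the raw count divided by document length, and idf(t, D) = log(|D| / number of documents containing t). Other variants exist: scikit-learn's TfidfVectorizer (listed in the stack) uses a smoothed idf and normalizes each vector by default, so its numbers will differ.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute tfidf(t, d, D) = tf(t, d) * idf(t, D) for every term of every doc.

    tf(t, d)  = count of t in d / number of terms in d
    idf(t, D) = log(|D| / number of docs containing t)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical one-line movie descriptions
docs = [
    "a toy comes to life",
    "a pig comes to herd sheep",
    "a spy saves the world",
]
for doc_scores in tfidf(docs):
    print(max(doc_scores, key=doc_scores.get), doc_scores)
```

A word that occurs in every document ("a") gets idf = log(1) = 0, so it contributes nothing, while words unique to one description get the highest weights.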

18
TFIDF
ArangoSearch: https://colab.research.google.com/github/arangodb/interactive_tutorials/blob/master/notebooks/arangoflix/similarMovie_TFIDF_AQL_Inference.ipynb
ML: b/master/notebooks/arangoflix/similarMovie_TFIDF_ML_Inference.ipynb
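A hedged sketch of TF-IDF ranking inside ArangoSearch, called from python-arango. It assumes a hypothetical ArangoSearch view named movies_view that indexes a description attribute with the built-in text_en analyzer; the query uses the AQL functions ANALYZER, TOKENS, and TFIDF. It is not necessarily what the linked notebook does.

```python
def build_query(text, limit=5):
    """Return (aql, bind_vars) ranking movies by TF-IDF relevance to `text`.

    `movies_view`, `description`, and `title` are hypothetical names.
    """
    aql = """
    FOR doc IN movies_view
      SEARCH ANALYZER(doc.description IN TOKENS(@text, "text_en"), "text_en")
      SORT TFIDF(doc) DESC
      LIMIT @limit
      RETURN { title: doc.title, score: TFIDF(doc) }
    """
    return aql, {"text": text, "limit": limit}

def find_similar(db, text, limit=5):
    """Execute the query; `db` is a python-arango database handle."""
    aql, bind_vars = build_query(text, limit)
    cursor = db.aql.execute(aql, bind_vars=bind_vars)
    return list(cursor)

# Usage (needs a running ArangoDB and `pip install python-arango`;
# the database name and credentials are placeholders):
# from arango import ArangoClient
# db = ArangoClient(hosts="http://localhost:8529").db(
#     "arangoflix", username="root", password="<password>")
# print(find_similar(db, "toys come to life"))
```

Passing the search text as a bind parameter (@text) rather than formatting it into the query string keeps user input out of the AQL itself.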

19
Storing it in the graph
[Figure: a Movie/User node linked to another Movie/User node by an edge that stores ML results { distance, similarity, embedding }]
● Store ML outcomes on the edge
● Enrich new/existing data and queries
● Leverage benefits of ML
● Reduce complexity
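A minimal sketch of storing ML outcomes on edges, assuming hypothetical collection names (a movies vertex collection and a similar_to edge collection) and similarity scores already computed elsewhere. Each edge document carries the score next to _from/_to, so a plain one-hop traversal can later sort by it.

```python
def similarity_edges(pairs, vertex_collection="movies"):
    """Turn (key_a, key_b, score) tuples into ArangoDB edge documents.

    `movies` is a hypothetical vertex collection name.
    """
    return [
        {
            "_from": f"{vertex_collection}/{a}",
            "_to": f"{vertex_collection}/{b}",
            "similarity": round(score, 4),  # the ML outcome lives on the edge
        }
        for a, b, score in pairs
    ]

def store_edges(db, edges, edge_collection="similar_to"):
    """Bulk-insert with python-arango; `db` is a database handle and
    `similar_to` is a hypothetical, already-created edge collection."""
    return db.collection(edge_collection).insert_many(edges)

# Hypothetical AQL that reads the stored scores back with a one-hop traversal
# (bind @start to a vertex id such as "movies/toy_story").
TOP_SIMILAR_AQL = """
FOR v, e IN 1..1 OUTBOUND @start similar_to
  SORT e.similarity DESC
  LIMIT 5
  RETURN { title: v.title, similarity: e.similarity }
"""

edges = similarity_edges([("toy_story", "babe", 0.7071), ("toy_story", "love_actually", 0.5)])
print(edges[0])
```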

21
Collaborative Filtering
● Personalized recommendation
● Predictions based on patterns combined across many users
● Depends on existing patterns being accurate
● Can offer predictions with limited domain knowledge
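The deck implements collaborative filtering in AQL; as a language-neutral illustration, here is a swapped-in plain-Python, user-based variant on the toy ratings from the Matrix Factorization slide. It treats unrated items as 0 when computing cosine similarity between users (a common simplification) and predicts a missing rating as the similarity-weighted average of other users' ratings.

```python
import math

# Toy ratings from the Matrix Factorization slide (missing = unrated)
ratings = {
    "User 1": {"Toy Story": 5, "Babe": 5, "Star Trek": 1},
    "User 2": {"GoldenEye": 1, "Love Actually": 5},
    "User 3": {"Toy Story": 2, "GoldenEye": 5, "Babe": 1, "Star Trek": 5},
    "User 4": {"Toy Story": 1, "GoldenEye": 5, "Love Actually": 5, "Star Trek": 5},
}
items = sorted({m for r in ratings.values() for m in r})

def cosine(u, v):
    """User-user cosine similarity, treating unrated items as 0."""
    a = [ratings[u].get(i, 0) for i in items]
    b = [ratings[v].get(i, 0) for i in items]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    neighbors = [(cosine(user, other), r[item])
                 for other, r in ratings.items() if other != user and item in r]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no similar user has rated this item
    return sum(sim * rating for sim, rating in neighbors) / total

def recommend(user):
    """Rank the user's unrated items by predicted rating."""
    preds = {i: predict(user, i) for i in items if i not in ratings[user]}
    preds = {i: p for i, p in preds.items() if p is not None}
    return sorted(preds, key=preds.get, reverse=True)

print(recommend("User 2"))
```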

22
Matrix Factorization
●Efficiency depends on the algorithm and the size of the data
●Works on a sparse user-item matrix
●Dimensionality Reduction
●Can be combined with content-based filtering
●Scale with FAISS
               User 1  User 2  User 3  User 4
Toy Story        5       ?       2       1
GoldenEye        ?       1       5       5
Love Actually    ?       5       ?       5
Babe             5       ?       1       ?
Star Trek        1       ?       5       5
SVD
A = UΣV^T
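A sketch of A = UΣV^T on the ratings table above, using numpy. SVD needs a complete matrix, so this assumes a naive fill of each unknown cell with that movie's mean known rating, then keeps the two strongest latent factors and reads predictions from the low-rank reconstruction. Note that the Surprise SVD algorithm listed in the stack works differently: it learns the factors by stochastic gradient descent on the known ratings only, so treat this as an illustration of the decomposition, not of Surprise.

```python
import numpy as np

# Ratings from the table (rows: movies, columns: users); np.nan = unknown
movies = ["Toy Story", "GoldenEye", "Love Actually", "Babe", "Star Trek"]
A = np.array([
    [5, np.nan, 2, 1],
    [np.nan, 1, 5, 5],
    [np.nan, 5, np.nan, 5],
    [5, np.nan, 1, np.nan],
    [1, np.nan, 5, 5],
])

# Naive imputation: fill each unknown with its movie's mean known rating
row_means = np.nanmean(A, axis=1, keepdims=True)
filled = np.where(np.isnan(A), row_means, A)

# Full SVD: filled = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(filled, full_matrices=False)

# Keep only the k strongest latent factors (dimensionality reduction)
k = 2
approx = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Read predictions for the originally unknown cells
for r, c in zip(*np.where(np.isnan(A))):
    print(f"{movies[r]}, User {c + 1}: {approx[r, c]:.2f}")
```

k = 2 is an arbitrary choice for a 5x4 matrix; a smaller k generalizes more from the known ratings, a larger k reproduces the filled matrix more closely.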

23
Matrix Factorization - Hybrid
b/master/notebooks/arangoflix/similarMovie_MF_ML_Inference.ipynb

25
Graph Neural Networks
Sachin Sharma
ML Research Engineer @ArangoDB
● Develop Intelligent Products
● Former Machine Learning Scientist & Engineer @Define Media GmbH
● Former Research Intern @DFKI
● AI Blogger
● Interests: Graph ML, Vision, NLP.
Graph ML, NVIDIA Triton, and ArangoDB: Thinking Beyond Euclidean Space
https://www.arangodb.com/events/graphml-nvidia-triton-and-arangodb-thinking-beyond-euclidean-space/

26
Graph (Node) Representation Learning
Image credits: Stanford
● Map network nodes to a d-dimensional embedding space
● Similar nodes in the network should remain close to each other in the embedding space
Goal: similarity(u, v) in the network ≈ dot product between the node embeddings z_u and z_v
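A minimal sketch of the similarity step: given node embeddings, rank other nodes by the dot product. The 3-dimensional vectors here are hand-picked, hypothetical values; in practice a representation-learning method (for example a GNN) learns them so that this dot product reflects similarity in the network.

```python
import numpy as np

# Hypothetical embeddings with d = 3 (hand-picked, not learned)
embeddings = {
    "Toy Story": np.array([0.9, 0.1, 0.0]),
    "Babe":      np.array([0.8, 0.2, 0.1]),
    "GoldenEye": np.array([0.0, 0.9, 0.4]),
    "Star Trek": np.array([0.1, 0.7, 0.8]),
}

def most_similar(node, k=2):
    """Rank other nodes by the dot product z_u . z_v of their embeddings."""
    z_u = embeddings[node]
    scores = {v: float(z_u @ z_v) for v, z_v in embeddings.items() if v != node}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(most_similar("Toy Story"))
```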

27
[Figure: a graph and its node embedding]
Mapping nodes to embeddings is the key to machine learning on graphs: each node is mapped into a coordinate system so that certain properties are maintained, e.g., different node types can easily be separated by a line, or neighbouring nodes are close to each other.

28
Can We Apply CNNs on Graphs?
● Image as 2D grid, text/audio as 1D sequence: fixed number of neighbors (Euclidean space)
● Graph: variable number of neighbors (non-Euclidean space)
Image credits: source
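A CNN kernel expects a fixed, ordered neighborhood, which graphs do not have. GNNs replace it with a permutation-invariant aggregation that accepts any number of neighbors. This is a simplified numpy sketch of one GraphSAGE-style layer with a mean aggregator, using random, untrained weights; real implementations (such as SAGEConv in PyTorch Geometric) add options like normalization and neighbor sampling.

```python
import numpy as np

def sage_mean_layer(features, neighbors, W_self, W_neigh):
    """One simplified GraphSAGE-style layer with a mean aggregator.

    features:  (num_nodes, d_in) node feature matrix
    neighbors: dict node -> list of neighbor indices (any length, even 0)
    Returns a (num_nodes, d_out) matrix of new node representations.
    """
    out = []
    for v in range(features.shape[0]):
        nbrs = neighbors.get(v, [])
        # Mean over however many neighbors exist; their order does not matter
        agg = features[nbrs].mean(axis=0) if nbrs else np.zeros(features.shape[1])
        h = features[v] @ W_self + agg @ W_neigh
        out.append(np.maximum(h, 0.0))  # ReLU
    return np.stack(out)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                            # 4 nodes, 3 input features
graph = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}   # degrees 3, 1, 2, 2
W_self = rng.normal(size=(3, 2))
W_neigh = rng.normal(size=(3, 2))
H = sage_mean_layer(X, graph, W_self, W_neigh)
print(H.shape)  # (4, 2)
```

Applying two weight matrices separately is equivalent to the concatenation form W · [h_v ; mean(neighbors)] with W split into two blocks.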

29
●Node classification
●Graph classification
●Link prediction
○ Predict links for users and movies
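For link prediction, a common decoder scores a candidate user-movie edge as sigmoid(z_user · z_movie) from the node embeddings a GNN produces. This numpy sketch shows only that scoring step, with hypothetical embeddings and watch history; the linked notebook's PyTorch model may use a different decoder.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical embeddings, e.g. as produced by a GNN encoder
user_emb = np.array([[1.0, 0.2], [0.1, 1.0]])               # 2 users
movie_emb = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])   # 3 movies
watched = {0: {0}, 1: set()}  # existing user -> movie links

def predict_links(user, k=2):
    """Score each unwatched movie as P(link) = sigmoid(z_user . z_movie)."""
    probs = sigmoid(movie_emb @ user_emb[user])
    candidates = [m for m in range(len(movie_emb)) if m not in watched[user]]
    ranked = sorted(candidates, key=lambda m: probs[m], reverse=True)
    return [(m, round(float(probs[m]), 3)) for m in ranked[:k]]

print(predict_links(0))
```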

30
b/master/notebooks/arangoflix/predict_Movie_Rating_GNN.ipynb

31
ML + Graph Databases
●Knowledge graph serves data
●Graph naturally pairs with ML
●ML Ecosystem for graph interface
[Figure: the movie knowledge graph supplies input data to the ML ecosystem, which returns embeddings/inferences]

32
ArangoML - Ecosystem
●NetworkX
●DGL
●cuGraph
●ArangoRDF
●ArangoML Pipeline
●PyTorch Geometric
●… more to come

33
NVIDIA Triton Meets ArangoDB
[Figure: a graph ML model (GraphSAGE) is deployed from an AI model repository; a front-end client application sends updates to ArangoDB, which stores a graph of nodes N1 to N6; to know a node's surroundings, it retrieves all node embeddings of the neighbors of node 'N5' at 1-hop distance]
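The "retrieve the embeddings of node N5's 1-hop neighbors" step can be expressed as an AQL traversal. A sketch with python-arango, assuming a hypothetical named graph movie_graph, a vertex collection nodes, and an embedding attribute on each vertex; ANY follows edges in both directions, which may or may not match the setup in the talk.

```python
NEIGHBOR_EMBEDDINGS_AQL = """
FOR v IN 1..1 ANY @start GRAPH @graph
  RETURN { key: v._key, embedding: v.embedding }
"""

def neighbor_embeddings(db, start="nodes/N5", graph="movie_graph"):
    """Fetch the embeddings of all 1-hop neighbors of `start`.

    `nodes`, `movie_graph`, and `embedding` are hypothetical names;
    `db` is a python-arango database handle.
    """
    cursor = db.aql.execute(
        NEIGHBOR_EMBEDDINGS_AQL, bind_vars={"start": start, "graph": graph}
    )
    return list(cursor)

# Usage (needs a running ArangoDB and `pip install python-arango`):
# from arango import ArangoClient
# db = ArangoClient(hosts="http://localhost:8529").db(
#     "arangoflix", username="root", password="<password>")
# print(neighbor_embeddings(db))
```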

34
Takeaway
●Graph Databases
●Recommendation Systems
●ML + Graph Databases
●Keep Learning:
○ Recommender Systems Specialization
○ Google ML Course
○ Singular Value Decomposition (SVD), Steve Brunton: YouTube | Website

35
Thank you!
●Notebooks
https://github.com/arangodb/interactive_tutorials
○ Collaborative Filtering with AQL
○ Content-based Recommendations with ArangoSearch and TFIDF
○ Content-based Recommendations with FAISS, TFIDF, and Python
○ Graph Neural Networks with PyTorch
○ Matrix Factorization
Test-drive ArangoDB and ArangoML using Oasis, free for 14 days
https://github.com/arangoml/
Register now at https://bit.ly/3blNaKR
