http://bit.ly/ArangoDBGraphAnalytics
tl;dr
Graph Analytics
Answer questions from Graph Data
Graph Embeddings and Graph Neural Networks
Learning Graphs
Graph-based Machine Learning Metadata
Utilizing Graphs for Operating ML Infrastructure
https://dzone.com/articles/graph-databases-machine-learning
Challenge...
Agenda
● ML Infrastructure & Metadata
● Graphs
● Graph Database
● Graph Analytics
● Graph Embeddings
● Graph Neural Networks (Part 2)
Jörg Schad, PhD
@joerg_schad
This workshop...
… is for you!
Please share
● Expectations
● Questions
● Feedback
● Ask for breaks if needed
● ….
… is also virtual!
● Let us work together in these times!
Who are you?
Background
Expectations
...
This workshop...
https://github.com/joerg84/Graph_Powered_ML_Workshop
Why should you care?
https://towardsdatascience.com/predictions-and-hopes-for-graph-ml-in-2021-6af2121c3e3d
What problems can we solve?
Graph Analytics
Answer questions from Graph
- Community Detection
- Recommendations
- Centrality
- Path Finding
- Fraud Detection
- Permission Management
- ...
Graph Embeddings and Graph Neural Networks
Learning Graphs
- Node/Link Classification
- Link Prediction
- Classification of Graphs
- ...
Graph-based Machine Learning Metadata
Utilizing Graphs for Operating ML Infrastructure
- Data Provenance
- Audit Trails
- Privacy (GDPR/CCPA)
- ...
Agenda
● ML Infrastructure & Metadata
● Graphs
● Graph Database
● Graph Analytics
● Graph Embeddings I
● Graph Neural Networks
Graph Analytics with ArangoDB
Graph Data Model
● Connections are first class citizens
● Vertices and Edges
● Native or built on top of other data models
Graph Properties
● (un)directed
○ Facebook friendships (undirected) vs. Twitter follows (directed)
● weighted
● Sparse/Dense
● (a)cyclic
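The properties above can be checked on a plain edge list. The following is a minimal sketch, assuming a graph stored as (source, target) pairs; the function names and the toy data are hypothetical and not part of the workshop notebooks.

```python
def is_undirected(edges):
    """True if every edge (u, v) has a matching reverse edge (v, u)."""
    edge_set = set(edges)
    return all((v, u) in edge_set for u, v in edge_set)


def density(num_vertices, edges):
    """Fraction of possible directed edges (no self-loops) that are present."""
    possible = num_vertices * (num_vertices - 1)
    return len(set(edges)) / possible if possible else 0.0


# Hypothetical toy data: friendships are mutual, follows are one-way.
friends = [("ann", "bob"), ("bob", "ann"), ("bob", "cat"), ("cat", "bob")]
follows = [("ann", "bob"), ("cat", "bob")]

print(is_undirected(friends))
print(is_undirected(follows))
print(density(3, follows))
```

A density close to 0 indicates a sparse graph, a density close to 1 a dense one.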
Graph Queries
● Traversals
● Search
● Graph Algorithms
Optional Lab: Graphs & Properties
https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Graph_properties.ipynb
Graph Databases
AQL - A Query Language That Feels Like Coding
● Common query language for all data models
● Aims to be human-readable
● Same language for all clients, no matter what programming language people use
● Easy to understand for anyone with an SQL background
FOR c IN company
  FILTER c.name == @companyName
  FOR department IN 1..6 INBOUND c isPartOf
    RETURN {
      c: c.name,
      department: department.name,
      ordered: (
        FOR o IN orders
          FILTER o.contact == department.contact
          RETURN {date: o.date, amount: o.amount}
      )
    }
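To illustrate what a depth-bounded INBOUND traversal such as `1..6 INBOUND c isPartOf` roughly does, here is a minimal Python sketch. It is not how ArangoDB implements traversals: it walks breadth-first and visits each vertex only once, which is a simplification, and the function name and sample edges are hypothetical.

```python
from collections import defaultdict, deque


def inbound_traversal(edges, start, min_depth, max_depth):
    """Return vertices reachable from `start` by following edges backwards,
    at a hop distance between min_depth and max_depth (breadth-first)."""
    incoming = defaultdict(list)          # target -> list of sources
    for source, target in edges:
        incoming[target].append(source)

    found, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth >= min_depth:
            found.append(vertex)
        if depth == max_depth:
            continue                      # do not expand beyond max_depth
        for source in incoming[vertex]:
            if source not in seen:
                seen.add(source)
                queue.append((source, depth + 1))
    return found


# Hypothetical isPartOf edges, written as (part, whole).
is_part_of = [("sales", "acme"), ("emea", "sales"), ("berlin", "emea")]
print(inbound_traversal(is_part_of, "acme", 1, 2))  # ['sales', 'emea']
```

With `max_depth=2` the traversal stops before reaching "berlin", which sits three hops away from "acme".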
ArangoSearch
ArangoSearch is a powerful search and similarity ranking engine natively integrated into ArangoDB. Combine search with any other data model.
FOR d IN v_imdb
  SEARCH ANALYZER(
    d.description IN TOKENS('amazing action world alien sci-fi science documental', 'text_en') ||
    BOOST(d.description IN TOKENS('galaxy', 'text_en'), 5),
    'text_en')
  SORT BM25(d) DESC
  LIMIT 10
  FOR vertex, edge, path IN 1..1 INBOUND d imdb_edges
    FILTER path.edges[0].$label == "DIRECTED"
    RETURN DISTINCT {
      "director" : vertex.name,
      "movie" : d.title
    }
Property-Graph-Model
● Languages
○ Tinkerpop/Gremlin
○ Cypher
○ AQL
○ ...
[Figure: Person (name: Max) -- born_in (year: 1984) --> City (location)]
RDF Triple Store
● subject, predicate, and object
● No internal structure of nodes/edges
● Languages
○ SPARQL
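The subject, predicate, object model can be sketched with plain tuples. This is a minimal illustration, not a real triple store: the `match` helper, the prefixes, and the sample identifiers (such as `:some_city`) are hypothetical.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Hypothetical data: attributes become triples of their own,
# because nodes and edges carry no internal structure.
triples = [
    (":max", "rdf:type", ":Person"),
    (":max", ":name", "Max"),
    (":max", ":born_in", ":some_city"),
]

print(match(triples, subject=":max", predicate=":name"))
```

Note that an edge attribute such as year: 1984 has no direct place in this model, since a plain triple cannot carry attributes of its own.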
Ontologies & Logic for Inference
https://w3c.github.io/rdf-star/
<<:bob foaf:age 23>> ex:certainty 0.9 .
SELECT ?p ?a ?c WHERE {
  <<?p foaf:age ?a>> ex:certainty ?c .
}
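The idea of a statement about a statement can be sketched in Python by using a tuple as the subject of another tuple. This is only an illustration of the data shape, assuming nested tuples; it is not an RDF-star implementation, and the second row and the function name are hypothetical.

```python
# A quoted triple is modelled as a tuple used in the subject position.
quoted = (":bob", "foaf:age", 23)
statements = [
    (quoted, "ex:certainty", 0.9),
    ((":alice", "foaf:age", 31), "ex:certainty", 0.5),   # hypothetical extra row
]


def certainty_of_ages(statements):
    """Mimic the query: bind ?p, ?a, ?c for <<?p foaf:age ?a>> ex:certainty ?c."""
    results = []
    for subject, predicate, obj in statements:
        if predicate != "ex:certainty" or not isinstance(subject, tuple):
            continue
        person, inner_predicate, age = subject
        if inner_predicate == "foaf:age":
            results.append((person, age, obj))
    return results


print(certainty_of_ages(statements))
```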
Support
- Convert to plain RDF (tool)
- Optimized storage/processing
- Conversion to PG (tool)
[Figure labels: Max, Job1, start, end, employer]
Lab: SPARQL
https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Sparql.ipynb
Graph Modelling
Edge Attribute
[Figure: Person (name: Max) -- rated (rating: 5) -- Movie: Free Solo]
Vertex Attribute
[Figure: Person (name: Max) -- gave -- Rating (rating: 5) -- rated_by -- Movie: Free Solo]
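The two modelling options can be compared side by side in a small Python sketch. The `_from`/`_to` keys are loosely modelled on ArangoDB edge documents, but the plain string identifiers, the direction of the rated_by edge, and the helper functions are assumptions made for illustration.

```python
# Option 1: the rating is an attribute on the edge.
rated_edges = [
    {"_from": "Max", "_to": "Free Solo", "rating": 5},
]

# Option 2: the rating is its own vertex, linked by two plain edges.
rating_vertices = {"rating1": {"rating": 5}}
gave_edges = [{"_from": "Max", "_to": "rating1"}]
rated_by_edges = [{"_from": "Free Solo", "_to": "rating1"}]  # direction is an assumption


def five_star_movies_edge_model(person):
    """One hop: filter directly on the edge attribute."""
    return [e["_to"] for e in rated_edges
            if e["_from"] == person and e["rating"] == 5]


def five_star_movies_vertex_model(person):
    """Two hops: person -> rating vertex -> movie."""
    ratings = {e["_to"] for e in gave_edges if e["_from"] == person}
    return [e["_from"] for e in rated_by_edges
            if e["_to"] in ratings and rating_vertices[e["_to"]]["rating"] == 5]


print(five_star_movies_edge_model("Max"))
print(five_star_movies_vertex_model("Max"))
```

The edge-attribute model needs fewer hops per query, while the vertex model lets other vertices or edges point at the rating itself.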
Lab: Property Graph Queries
https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Graphs_Queries.ipynb
http://btimmermans.com/2017/12/11/machine-learning-overview/
(Graph) Analytics
https://research.aimultiple.com/graph-analytics/
Why Graph?
Knowledge Graphs and Machine Learning
Graph Algorithms
● Search/Traversal
○ Find a node/edge
○ BFS/DFS (already covered)
● Pathfinding
○ How to get from a to b
● Centrality
○ What are the important nodes (e.g., influencers) in a network?
● Cycle Detection
○ Deadlock Detection
○ Network Analysis
● Community Detection
○ Are there subgroups?
Shortest Path
● Shortest Path
○ Dijkstra
○ Bellman-Ford
● K Shortest Paths
● Single-Source Shortest Path
● All-Pairs Shortest Path
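As an example of the single-source case, here is a minimal sketch of Dijkstra's algorithm using a binary heap. It assumes non-negative edge weights and an adjacency dict; the sample graph is hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest path distances for non-negative edge weights.

    graph: dict mapping vertex -> list of (neighbour, weight).
    """
    distances = {source: 0}
    heap = [(0, source)]
    while heap:
        dist, vertex = heapq.heappop(heap)
        if dist > distances.get(vertex, float("inf")):
            continue                      # stale heap entry
        for neighbour, weight in graph.get(vertex, []):
            candidate = dist + weight
            if candidate < distances.get(neighbour, float("inf")):
                distances[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return distances


graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": [("d", 5)]}
print(dijkstra(graph, "a"))  # {'a': 0, 'b': 3, 'c': 1, 'd': 8}
```

Bellman-Ford would be the choice when edge weights can be negative, at the cost of a higher running time.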
https://towardsdatascience.com/10-graph-algorithms-visually-explained-e57faa1336f3
Minimum Spanning Tree
● Network broadcast/routing
● Image segmentation
● Algorithms
○ Prim’s algorithm
■ Grow the tree from a random start vertex, always adding the cheapest edge that leaves it
○ Kruskal’s algorithm
■ Keep choosing the cheapest remaining edge as long as it doesn’t create a cycle
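Kruskal's rule can be sketched with a small union-find structure that tracks which vertices are already connected. This is a minimal sketch under the assumption of an undirected graph given as (weight, u, v) tuples; the sample edges are hypothetical.

```python
def kruskal(vertices, edges):
    """Return the edges of a minimum spanning tree (or forest).

    edges: iterable of (weight, u, v) for an undirected graph.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # adding the edge creates no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


edges = [(1, "a", "b"), (3, "a", "c"), (2, "b", "c"), (4, "c", "d")]
mst = kruskal("abcd", edges)
print(mst, sum(w for w, _, _ in mst))
```

The edge (3, "a", "c") is skipped because "a" and "c" are already connected through "b" when it is considered.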
https://towardsdatascience.com/10-graph-algorithms-visually-explained-e57faa1336f3
Minimum Spanning Tree
https://amortizedminds.wordpress.com/tag/algorithm-2/
Cycle Detection
● Deadlock Detection
● Network Analysis
● Algorithms
○ DFS
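○ Floyd’s algorithm (cycles in iterated sequences)
■ Tortoise and the hare algorithm
○ Brent’s algorithm (cycles in iterated sequences)
○ Johnson’s algorithm (enumerates all elementary cycles of a directed graph)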
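The DFS approach can be sketched with the usual three-colour marking: meeting a vertex that is still on the current DFS path means a back edge, and therefore a cycle. This is a minimal recursive sketch for directed graphs; the wait-for graph used as a deadlock example is hypothetical.

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph given as vertex -> list of successors."""
    WHITE, GREY, BLACK = 0, 1, 2            # unvisited, on current path, finished
    colour = {}

    def visit(vertex):
        colour[vertex] = GREY
        for successor in graph.get(vertex, []):
            state = colour.get(successor, WHITE)
            if state == GREY:               # back edge to the current path
                return True
            if state == WHITE and visit(successor):
                return True
        colour[vertex] = BLACK
        return False

    return any(colour.get(v, WHITE) == WHITE and visit(v) for v in graph)


# Hypothetical wait-for graph: t1 waits for t2, t2 for t3, t3 for t1.
deadlock = {"t1": ["t2"], "t2": ["t3"], "t3": ["t1"]}
no_deadlock = {"t1": ["t2"], "t2": ["t3"]}
print(has_cycle(deadlock), has_cycle(no_deadlock))
```

For very deep graphs an iterative version would avoid Python's recursion limit.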
https://towardsdatascience.com/10-graph-algorithms-visually-explained-e57faa1336f3
Community Detection
● Triangle Count
● (Strongly) Connected Components
○ Kosaraju’s algorithm
○ Tarjan’s algorithm
● Label Propagation
● Application
○ Social Networks
○ Clustering
○ …
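As the simplest of the techniques listed, connected components of an undirected graph can be found with a plain graph search. The sketch below assumes an edge list of (u, v) pairs and ignores isolated vertices; it does not cover the strongly connected case handled by Kosaraju's or Tarjan's algorithm.

```python
from collections import defaultdict


def connected_components(edges):
    """Group the vertices of an undirected graph into connected components."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            vertex = stack.pop()
            if vertex in component:
                continue
            component.add(vertex)
            stack.extend(neighbours[vertex] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical friendship edges forming two separate groups.
print(connected_components([("a", "b"), ("b", "c"), ("x", "y")]))
```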
https://networkx.github.io/documentation/stable/reference/algorithms/community.html
Topological Sort
● Applications
○ Dependencies
○ Scheduling
■ E.g., Makefiles
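For the dependency and scheduling use case, a topological order can be computed with Kahn's algorithm: repeatedly emit a task whose dependencies are all done. This is a minimal sketch; the Makefile-style targets are hypothetical.

```python
from collections import deque


def topological_sort(dependencies):
    """Order tasks so that each task comes after everything it depends on.

    dependencies: dict mapping task -> list of tasks it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    tasks = set(dependencies) | {d for deps in dependencies.values() for d in deps}
    remaining = {t: len(set(dependencies.get(t, []))) for t in tasks}
    dependants = {t: [] for t in tasks}
    for task, deps in dependencies.items():
        for dep in set(deps):
            dependants[dep].append(task)

    # Start with tasks that depend on nothing (sorted for a stable order).
    ready = deque(sorted(t for t in tasks if remaining[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for dependant in dependants[task]:
            remaining[dependant] -= 1
            if remaining[dependant] == 0:
                ready.append(dependant)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical Makefile-style targets.
build = {"app": ["main.o", "util.o"], "main.o": ["main.c"], "util.o": ["util.c"]}
print(topological_sort(build))
```

A topological order only exists for directed acyclic graphs, which is why the sketch raises an error when some tasks can never become ready.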
Maximum flow
Centrality
● Degree Centrality
○ Number of incoming/outgoing connections
● Closeness Centrality
○ Inverse of the average shortest-path distance to all other nodes
● Betweenness Centrality
○ Identifies nodes that connect subgroups
○ How often a node lies on the shortest paths between other nodes
● PageRank
○ Transitive Influence
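Two of the measures above are small enough to sketch directly. The code assumes an unweighted, undirected graph as an adjacency dict and computes closeness only over the vertices reachable from the given vertex; the star-shaped sample network is hypothetical.

```python
from collections import deque


def degree_centrality(graph):
    """Number of neighbours per vertex in an undirected adjacency dict."""
    return {vertex: len(neighbours) for vertex, neighbours in graph.items()}


def closeness_centrality(graph, vertex):
    """(n - 1) / sum of BFS hop distances to the vertices reachable from `vertex`."""
    distances = {vertex: 0}
    queue = deque([vertex])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0


# Hypothetical star-shaped network: "hub" connects everyone.
graph = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(degree_centrality(graph))
print(closeness_centrality(graph, "hub"), closeness_centrality(graph, "a"))
```

In the star network the hub scores highest on both measures, since it is one hop away from every other node.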
https://www.arangodb.com/docs/stable/graphs-pregel.html#vertex-centrality
https://networkx.github.io/
Graph ToolBox
● Load and store graphs
● Analyze network structure
● Build network models
● Design new network algorithms
● Visualize
● ...
(Optional) Lab: NetworkX
https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/NetworkX.ipynb
Lab: Graph Algorithms
https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Graph_properties.ipynb
Fraud Detection
Panama papers
Enterprise Hierarchies
Permission Management
Internet Of Things
Bill of Materials
Representation Learning ...
https://blog.dgraph.io/post/recommendation/
https://www.independent.co.uk/arts-entertainment/films/features/films-best-watch-coronavirus-isolation-quarantine-movies-classic-greatest-essential-list-a9394006.html
[Figure: User -- Rates --> Movie]
Collaborative Filtering
“Find highly rated movies, by people who also like movies I rated highly”
1. Find movies I rated with 5 stars
2. Find users who also rated these movies with 5 stars
3. Find additional movies also rated 5 stars by those users
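The three steps above can be sketched over a flat list of ratings. This is a minimal in-memory illustration, not the AQL query used in the lab; the function name, the sample ratings, and the choice to exclude every movie the user has already rated are assumptions.

```python
def recommend(ratings, me):
    """ratings: list of (user, movie, stars). Returns movies to suggest to `me`."""
    # 1. Movies I rated with 5 stars.
    my_favourites = {m for u, m, s in ratings if u == me and s == 5}
    # 2. Other users who also gave those movies 5 stars.
    similar_users = {u for u, m, s in ratings
                     if u != me and m in my_favourites and s == 5}
    # 3. Additional movies those users rated with 5 stars.
    already_rated = {m for u, m, s in ratings if u == me}
    return sorted({m for u, m, s in ratings
                   if u in similar_users and s == 5 and m not in already_rated})


# Hypothetical ratings.
ratings = [
    ("me", "Free Solo", 5),
    ("ann", "Free Solo", 5), ("ann", "The Dawn Wall", 5), ("ann", "Cats", 2),
    ("bob", "Free Solo", 3), ("bob", "Meru", 5),
]
print(recommend(ratings, "me"))  # ['The Dawn Wall']
```

In a graph database each step corresponds to one hop along the Rates edges: from me to movies, from movies to users, and from users to further movies.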
Lab: Graph Analytics
https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Graph_Analytics.ipynb
Fraud Detection
● Bank: Collection
● Branch: Collection
● Customer: Vertex Collection
● Account: Vertex Collection
● Transaction: Edge Collection
● AccountHolder: Edge Collection
Lab: Fraud Detection
https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Fraud_Detection.ipynb
PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
Google
https://en.wikipedia.org/wiki/PageRank
Goal: How likely is it that a random surfer ends up at a page?
- Random walk across link graph
- Iteratively distributing rank to neighbouring nodes
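The iterative rank distribution can be sketched as a simple power iteration. This is a minimal sketch rather than the Pregel implementation used in the lab: the damping factor of 0.85 is the commonly used default and an assumption here, as are the fixed iteration count, the handling of pages without outgoing links, and the sample link graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links: dict mapping page -> list of pages it links to.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every page passes its rank on in equal shares to the pages it links to.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical link graph: a and b link to c, c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page "c" ends up ranked highest because it receives links from both other pages, and "a" inherits most of that rank through the link from "c".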
https://en.wikipedia.org/wiki/PageRank
https://stanford.edu/~rezab/classes/cme323/S15/notes/lec8.pdf
https://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/
Lab: Pregel
https://colab.research.google.com/github/arangodb/interactive_tutorials/blob/master/notebooks/Pregel.ipynb
Thanks for listening!
Reach out with Feedback/Questions!
• @arangodb
• https://www.arangodb.com/
• docker pull arangodb
https://www.udemy.com/course/getting-started-with-arangodb/