Graph Exploration w/ Neo4j
1
https://s3.amazonaws.com/dev.assets.neo4j.com/wp-content/uploads/graph-data-technologies-graph-databases-for-beginners.png
Our Project Partners
2
3GRAPH EXPLORATION
Efficiently extracting knowledge from graph data
even if we do not know exactly what we are looking for
Graph Exploration: From Users to Large Graphs.
CIKM 2016, SIGMOD 2017, KDD 2018
Adaptive Databases
GRAPH EXPLORATION
4
Graph Exploration Stack
Intuitive queries
Interactive algorithms
Users
Graph
Adaptive Databases
Intuitive queries
Interactive algorithms
GRAPH EXPLORATION
5
Graph Exploration Stack
Users
Graph
This project is about ...
6
3,27 × 109
base pairs
Graph Exploration in Biology - Complex Graphs
http://jcs.biologists.org/content/joces/118/21/4947/F3.large.jpg
Graph Exploration in Biology - Status Quo
MATCH (p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype)
WHERE p1.name = 'foo1‘ AND p2.name = 'foo2p‘ AND a1.p < 0.01 AND a2.p < 0.01
WITH DISTINCT snpORDER BY snp.sid
RETURN collect(snp.sid)
7
MATCH
(p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype
)
WHERE p1.name = 'foo1'
AND p2.name = ‘foo2'
AND a1.p < 0.01 AND a2.p < 0.01
WITH DISTINCT snp
MATCH (snp)-[:IN]-(pw:PositionWindow)<-[:IN]-(l:Locus)--(g:Gene)
WHERE l.feature= 'gene'
RETURN collect(DISTINCT g.name)
MATCH
(p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype)
WHEREp1.name = 'foo1'
ANDp2.name = ‘foo2'
ANDa1.p < 0.01 AND a2.p < 0.01
WITH DISTINCT snp
MATCH(snp)-[:IN]-(pw:PositionWindow)<-[:IN]-(l:Locus)--(g:Gene)
WHERE l.feature= 'gene'
WITH DISTINCT g ORDER BY g.name
MATCH(g)-[:CODES]-(:Transcript)-[:CODES]-(p:Protein)-[:MEMBER]-(go:Goterm)
WHERE go.namespace= 'biological_process'
WITH DISTINCT go,p
RETURN go.name, count(p) ORDER BY count(p)DESC
LIMIT 10
MATCH
(p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype)
WHERE p1.name = 'foo1'
AND p2.name = ‘foo2'AND a1.p < 0.01 AND a2.p < 0.01
WITH DISTINCT snpMATCH (snp)-[:IN]-(pw:PositionWindow)<-[:IN]-(l:Locus)--(g:Gene)
WHERE l.feature= 'gene'
WITH DISTINCT gORDER BY g.name
MATCH (g)-[:CODES]-(:Transcript)-[:IS]-(ps:Probeset)-[:SIG]-(s:Sample)
WHERE s.name= 'mustafavi'
RETURN DISTINCT g.name
Can we do better?
8
Problem
● Given two node sets:
How similar are they in my understanding?
● Example
→ Set of movies I like
→ Set of movies I don’t know
→ Will I like the movies I don’t know?
9
As
Good
As It
Gets
Hell or
High
Water
Pulp
Fiction
The
Matrix
Skyfall
Avatar
?
What is a Knowledge Graph?
● (directed) graph G : ⟨V, E, φ, ψ⟩, where 
○ V is a set of nodes,
○ E ⊆ V × V is a set of edges,
○ φ : V → LV
 is an edge labeling function
and
○ ψ : E → LE
is a node labeling function
We refer to the elements of LV
and LE
as node
labels and edge labels
10
What are Meta-Paths?
Graph Path
Graph Schema Meta-Path
A meta-path for a path ⟨n1
, ..., nt
⟩, ni
∈ V , 1 ≤ i ≤ t is a sequence 
P : ⟨φ(n1
),ψ (n1
, n2
), ..., ψ (nt−1
, nt
), φ(nt
)⟩ that alternates node- and
edge-types along the path. 
11
Motivating Example
Q: How famous is Diane Kruger in America?
MATCH(n:Person)
WHERE n.name = “Diane Kruger”
RETURN n
MATCH(m:Movie)
WHERE m.location = “America”
RETURN m
Diane
Kruger
As
Good
As It
Gets
Stand
By MeTop
Gun
Pulp
Fiction A Few
Good
Men
The
Matrix
Up
12
How similar are they?
● Similarity depends on
○ expert knowledge
○ connections among nodes
13
Individualized
exploration
Extract ratingsCompute Meta-Paths
What does the System do and how?
Overview
✓✗
◎
Learn representation
for meta-paths
Calculate
similarity
14
Approximate Meta-Paths
Problem: How to compute all meta-paths fast?
Approx. Solution: Mine meta-paths using the graph’s schema!
Meta-Paths Computation
Compute schema
15
Approximate Meta-Paths
Problem: How to compute all meta-paths fast?
Approx. Solution: Mine meta-paths using the graph’s schema!
Meta-Paths Computation
Meta-Path
approximation
• Extract real and some
non-existent
meta-paths via DFS
on schema
Classifier
• Real meta-paths as
training data
• Predict probability of
meta-path
Exhaustive Meta-Path
computation
• Extract real
meta-paths via DFS
• Early termination
Training data Inference data
Neo4j Graph Algorithm Procedures
Use procedures for schema and meta-paths computation
16
Learning a Meta-Path Embedding
Problem: Vector representation required for active
learning and preference prediction.
Meta-Paths Embedding
?
(3 5 1)T
17
Learning a Meta-Path Embedding
Problem: Vector representation required for active
learning and preference prediction.
Solution: Embed meta-paths
→ Similar meta-paths should have similar vectors.
Challenges:
- Multiple similarity interpretations for meta-paths
- Large number of meta-paths.
- Shared parts between meta-paths.
Our method: Modified word2vec (and paragraph
vectors) method.
Meta-Paths Embedding
(3 5 1)T
18
Learn the Domain Value of all Meta-Paths
- Problem: Users don’t want to rate all meta-paths
→ too many
→ time-consuming
→ tedious and boring
- Solution: Label only a few, but very informative paths
→ explore the space of all meta-paths
→ sequentially query the most uncertain datapoint
⇒ use active learning strategy
Active Learning
✓✗
◎
19
Active Learning
20
Use Learned Preferences for Graph Exploration
Result Explanation
Icons made by Eucalyp from www.flaticon.com is
licensed by CC 3.0 BY
Graph (with
meta-paths)
Domain Knowledge
What is
important in
the graph?
Personalized Exploration
Tool
Similarity Measure
Related Nodes
Stats
21
Personalized Exploration Tool
Result Explanation
Transform Nodes to
Vectors
(Graph-Embedding)
Adapt Vectors Using
Domain-Knowledge
Personalized Vector
Space
precomputed
22
Personalized Exploration Tool
Result Explanation
23
Problem: Suggest similar nodes with regard to given node sets and
user preferences.
Solution: Learn personalized similarity metric between nodes →
Similar nodes should have similar vector space embeddings.
Challenges:
- Different view of abstraction between meta-path preferences and
node instances
- High computational costs but online adaption of embedding space
needed
Our method: Modified node2vec through bias on already
pretrained embeddings
Personalized Exploration Tool
Result Explanation
What nodes
are close to
my selection?How close are
my sets?
Find clusters!
What are
outliers?Personalized Vector
Space
24
System Architecture - How does it work with Neo4j?
Neo4j Graph Database
Neo4j Graph Algorithm Procedures
Containing Meta-Paths Computation
Python Backend Server
ReactJS Frontend
Meta-Path
Embedding
Active Learning Explanation
Node selection
Meta-Path
ordering
Result
visualization
25
● Easy to get your code running in neo4j.
● Neo4j-graph-algorithms: efficiency vs convenience.
● Sometimes no stack-trace when an error occurs.
● Great support and community. Always available.
● Cypher: Easy to begin with, hard to master.
(hpi)-[:LIKES]->(neo4j)
What about neo4j?
Meta-Paths Computation
26
Outlook
4 research topics:
● Efficient meta-paths computation
● Sensible meta-paths embedding
● Precise active learning of meta-paths ranking
● Explanation and exploration using ranked meta-paths
Open Source!
● Meta-paths computation in neo4j graph algos
● MetaExp Repository
● You can use and improve as well!
https://hpi.de/mueller/metaexp/ 27

Neo4j MeetUp - Graph Exploration with MetaExp

  • 1.
    Graph Exploration w/Neo4j 1 https://s3.amazonaws.com/dev.assets.neo4j.com/wp-content/uploads/graph-data-technologies-graph-databases-for-beginners.png
  • 2.
  • 3.
    3GRAPH EXPLORATION Efficiently extractingknowledge from graph data even if we do not know exactly what we are looking for Graph Exploration: From Users to Large Graphs. CIKM 2016, SIGMOD 2017, KDD 2018
  • 4.
    Adaptive Databases GRAPH EXPLORATION 4 GraphExploration Stack Intuitive queries Interactive algorithms Users Graph
  • 5.
    Adaptive Databases Intuitive queries Interactivealgorithms GRAPH EXPLORATION 5 Graph Exploration Stack Users Graph This project is about ...
  • 6.
    6 3,27 × 109 basepairs Graph Exploration in Biology - Complex Graphs http://jcs.biologists.org/content/joces/118/21/4947/F3.large.jpg
  • 7.
    Graph Exploration inBiology - Status Quo MATCH (p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype) WHERE p1.name = 'foo1‘ AND p2.name = 'foo2p‘ AND a1.p < 0.01 AND a2.p < 0.01 WITH DISTINCT snpORDER BY snp.sid RETURN collect(snp.sid) 7 MATCH (p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype ) WHERE p1.name = 'foo1' AND p2.name = ‘foo2' AND a1.p < 0.01 AND a2.p < 0.01 WITH DISTINCT snp MATCH (snp)-[:IN]-(pw:PositionWindow)<-[:IN]-(l:Locus)--(g:Gene) WHERE l.feature= 'gene' RETURN collect(DISTINCT g.name) MATCH (p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype) WHEREp1.name = 'foo1' ANDp2.name = ‘foo2' ANDa1.p < 0.01 AND a2.p < 0.01 WITH DISTINCT snp MATCH(snp)-[:IN]-(pw:PositionWindow)<-[:IN]-(l:Locus)--(g:Gene) WHERE l.feature= 'gene' WITH DISTINCT g ORDER BY g.name MATCH(g)-[:CODES]-(:Transcript)-[:CODES]-(p:Protein)-[:MEMBER]-(go:Goterm) WHERE go.namespace= 'biological_process' WITH DISTINCT go,p RETURN go.name, count(p) ORDER BY count(p)DESC LIMIT 10 MATCH (p1:Phenotype)-[:HAS]-(a1:Association)-[:HAS]-(snp:Snp)-[:HAS]-(a2:Association)-[:HAS]-(p2:Phenotype) WHERE p1.name = 'foo1' AND p2.name = ‘foo2'AND a1.p < 0.01 AND a2.p < 0.01 WITH DISTINCT snpMATCH (snp)-[:IN]-(pw:PositionWindow)<-[:IN]-(l:Locus)--(g:Gene) WHERE l.feature= 'gene' WITH DISTINCT gORDER BY g.name MATCH (g)-[:CODES]-(:Transcript)-[:IS]-(ps:Probeset)-[:SIG]-(s:Sample) WHERE s.name= 'mustafavi' RETURN DISTINCT g.name
  • 8.
    Can we dobetter? 8
  • 9.
    Problem ● Given twonode sets: How similar are they in my understanding? ● Example → Set of movies I like → Set of movies I don’t know → Will I like the movies I don’t know? 9 As Good As It Gets Hell or High Water Pulp Fiction The Matrix Skyfall Avatar ?
  • 10.
    What is aKnowledge Graph? ● (directed) graph G : ⟨V, E, φ, ψ⟩, where  ○ V is a set of nodes, ○ E ⊆ V × V is a set of edges, ○ φ : V → LV  is an edge labeling function and ○ ψ : E → LE is a node labeling function We refer to the elements of LV and LE as node labels and edge labels 10
  • 11.
    What are Meta-Paths? GraphPath Graph Schema Meta-Path A meta-path for a path ⟨n1 , ..., nt ⟩, ni ∈ V , 1 ≤ i ≤ t is a sequence  P : ⟨φ(n1 ),ψ (n1 , n2 ), ..., ψ (nt−1 , nt ), φ(nt )⟩ that alternates node- and edge-types along the path.  11
  • 12.
    Motivating Example Q: Howfamous is Diane Kruger in America? MATCH(n:Person) WHERE n.name = “Diane Kruger” RETURN n MATCH(m:Movie) WHERE m.location = “America” RETURN m Diane Kruger As Good As It Gets Stand By MeTop Gun Pulp Fiction A Few Good Men The Matrix Up 12
  • 13.
    How similar arethey? ● Similarity depends on ○ expert knowledge ○ connections among nodes 13
  • 14.
    Individualized exploration Extract ratingsCompute Meta-Paths Whatdoes the System do and how? Overview ✓✗ ◎ Learn representation for meta-paths Calculate similarity 14
  • 15.
    Approximate Meta-Paths Problem: Howto compute all meta-paths fast? Approx. Solution: Mine meta-paths using the graph’s schema! Meta-Paths Computation Compute schema 15
  • 16.
    Approximate Meta-Paths Problem: Howto compute all meta-paths fast? Approx. Solution: Mine meta-paths using the graph’s schema! Meta-Paths Computation Meta-Path approximation • Extract real and some non-existent meta-paths via DFS on schema Classifier • Real meta-paths as training data • Predict probability of meta-path Exhaustive Meta-Path computation • Extract real meta-paths via DFS • Early termination Training data Inference data Neo4j Graph Algorithm Procedures Use procedures for schema and meta-paths computation 16
  • 17.
    Learning a Meta-PathEmbedding Problem: Vector representation required for active learning and preference prediction. Meta-Paths Embedding ? (3 5 1)T 17
  • 18.
    Learning a Meta-PathEmbedding Problem: Vector representation required for active learning and preference prediction. Solution: Embed meta-paths → Similar meta-paths should have similar vectors. Challenges: - Multiple similarity interpretations for meta-paths - Large number of meta-paths. - Shared parts between meta-paths. Our method: Modified word2vec (and paragraph vectors) method. Meta-Paths Embedding (3 5 1)T 18
  • 19.
    Learn the DomainValue of all Meta-Paths - Problem: Users don’t want to rate all meta-paths → too many → time-consuming → tedious and boring - Solution: Label only a few, but very informative paths → explore the space of all meta-paths → sequentially query the most uncertain datapoint ⇒ use active learning strategy Active Learning ✓✗ ◎ 19
  • 20.
  • 21.
    Use Learned Preferencesfor Graph Exploration Result Explanation Icons made by Eucalyp from www.flaticon.com is licensed by CC 3.0 BY Graph (with meta-paths) Domain Knowledge What is important in the graph? Personalized Exploration Tool Similarity Measure Related Nodes Stats 21
  • 22.
    Personalized Exploration Tool ResultExplanation Transform Nodes to Vectors (Graph-Embedding) Adapt Vectors Using Domain-Knowledge Personalized Vector Space precomputed 22
  • 23.
    Personalized Exploration Tool ResultExplanation 23 Problem: Suggest similar nodes with regard to given node sets and user preferences. Solution: Learn personalized similarity metric between nodes → Similar nodes should have similar vector space embeddings. Challenges: - Different view of abstraction between meta-path preferences and node instances - High computational costs but online adaption of embedding space needed Our method: Modified node2vec through bias on already pretrained embeddings
  • 24.
    Personalized Exploration Tool ResultExplanation What nodes are close to my selection?How close are my sets? Find clusters! What are outliers?Personalized Vector Space 24
  • 25.
    System Architecture -How does it work with Neo4j? Neo4j Graph Database Neo4j Graph Algorithm Procedures Containing Meta-Paths Computation Python Backend Server ReactJS Frontend Meta-Path Embedding Active Learning Explanation Node selection Meta-Path ordering Result visualization 25
  • 26.
    ● Easy toget your code running in neo4j. ● Neo4j-graph-algorithms: efficiency vs convenience. ● Sometimes no stack-trace when an error occurs. ● Great support and community. Always available. ● Cypher: Easy to begin with, hard to master. (hpi)-[:LIKES]->(neo4j) What about neo4j? Meta-Paths Computation 26
  • 27.
    Outlook 4 research topics: ●Efficient meta-paths computation ● Sensible meta-paths embedding ● Precise active learning of meta-paths ranking ● Explanation and exploration using ranked meta-paths Open Source! ● Meta-paths computation in neo4j graph algos ● MetaExp Repository ● You can use and improve as well! https://hpi.de/mueller/metaexp/ 27