CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Two ways to ask questions during lecture:
 In-person (encouraged)
 On Ed:
▪ At the beginning of class, we will open a new
discussion thread dedicated to this lecture
▪ When to ask on Ed?
▪ If you are watching the livestream remotely
▪ If you have a minor clarifying question
▪ If we run out of time to get to your question live
▪ Otherwise, try raising your hand first!
▪ Class goes till 3pm (not 2:50pm, sorry)
9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
 Colabs 0 and 1 will be released on our course
website at 3pm today (Thu 9/23)
 Colab 0:
▪ Does not need to be handed in
▪ TAs will hold two recitations (on Zoom) to walk
through Colab 0 with you:
▪ Federico – Friday (9/24), 3-5pm PT
▪ Yige – Monday (9/27), 10am-12pm PT
▪ Links to Zoom will be posted on Ed
 Colabs 0 and 1 will be released on our course
website at 3pm today (Thu 9/23)
 Colab 1:
▪ Due on Thursday 10/07 (2 weeks from today)
▪ Submit written answers and code on Gradescope
▪ Will cover material from Lectures 1-4, but you can
get started right away!
CS224W: Machine Learning with Graphs
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
 Node-level prediction
 Link-level prediction
 Graph-level prediction
(Figure: an example graph with nodes A–H, marking node-level, link-level, and graph-level prediction targets with "?".)
 Design features for nodes/links/graphs
 Obtain features for all training data
(Figure: node features, link features, and graph features, each a vector in ℝ^𝐷.)
 Train an ML model:
▪ Random forest
▪ SVM
▪ Neural network, etc.
(Training data: feature vectors 𝒙𝟏, …, 𝒙𝑵 with labels 𝑦1, …, 𝑦𝑁.)
 Apply the model:
▪ Given a new node/link/graph, obtain its features 𝒙 and make a prediction 𝑦
 Using effective features over graphs is the key
to achieving good model performance.
 Traditional ML pipeline uses hand-designed
features.
 In this lecture, we overview the traditional
features for:
▪ Node-level prediction
▪ Link-level prediction
▪ Graph-level prediction
 For simplicity, we focus on undirected graphs.
Goal: Make predictions for a set of objects
Design choices:
 Features: d-dimensional vectors
 Objects: Nodes, edges, sets of nodes,
entire graphs
 Objective function:
▪ What task are we aiming to solve?
Example: Node-level prediction
 Given: a graph with a set of nodes
 Learn a function that assigns a prediction to each node
How do we learn the function?
(Figure: node classification. Given a graph where some nodes are labeled, machine learning predicts the labels of the remaining "?" nodes.)
ML needs features.
Goal: Characterize the structure and position of
a node in the network:
▪ Node degree
▪ Node centrality
▪ Clustering coefficient
▪ Graphlets
 The degree 𝑘𝑣 of node 𝑣 is the number of
edges (neighboring nodes) the node has.
 Treats all neighboring nodes equally.
Example: in the figure's graph, 𝑘𝐴 = 1, 𝑘𝐵 = 2, 𝑘𝐶 = 3, 𝑘𝐷 = 4.
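Computing node degrees takes a few lines. A minimal sketch; the edge list below is an assumed subgraph of the lecture's figure (𝑘𝐴, 𝑘𝐵, 𝑘𝐶 match the slide, while 𝑘𝐷 = 3 here because the full figure has extra edges):

```python
# Node degrees from an undirected edge list (edge list is an assumption).
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]

adj = {}
for u, v in edges:            # undirected graph: record both directions
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

degree = {v: len(nbrs) for v, nbrs in adj.items()}
print(degree)  # {'A': 1, 'C': 3, 'B': 2, 'D': 3, 'E': 1}
```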
 Node degree counts the neighboring nodes
without capturing their importance.
 Node centrality 𝑐𝑣 takes the node importance
in a graph into account.
 Different ways to model importance:
▪ Eigenvector centrality
▪ Betweenness centrality
▪ Closeness centrality
▪ and many others…
 Eigenvector centrality:
▪ A node 𝑣 is important if surrounded by important
neighboring nodes 𝑢 ∈ 𝑁(𝑣).
▪ We model the centrality of node 𝑣 as the sum of
the centrality of its neighboring nodes:
𝑐𝑣 = (1/𝜆) Σ_{𝑢∈𝑁(𝑣)} 𝑐𝑢,
where 𝜆 is a normalization constant (it will turn
out to be the largest eigenvalue of 𝑨).
▪ Notice that the above equation models centrality
in a recursive manner. How do we solve it?
 Eigenvector centrality:
▪ Rewrite the recursive equation in matrix form:
𝜆𝒄 = 𝑨𝒄,
where 𝑨 is the adjacency matrix (𝑨𝑢𝑣 = 1 if 𝑢 ∈ 𝑁(𝑣)), 𝒄 is the centrality vector, and the normalization constant 𝜆 is an eigenvalue of 𝑨.
▪ We see that centrality 𝒄 is an eigenvector of 𝑨!
▪ The largest eigenvalue 𝜆𝑚𝑎𝑥 is always positive and
unique (by the Perron-Frobenius theorem).
▪ The eigenvector 𝒄𝑚𝑎𝑥 corresponding to 𝜆𝑚𝑎𝑥 is
used for centrality.
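The leading eigenvector can be found by power iteration: repeatedly apply 𝒄 ← 𝑨𝒄 and renormalize. A minimal sketch; the edge list is a made-up example, not the lecture's figure:

```python
# Power iteration for eigenvector centrality (toy edge list, assumed).
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
nbrs = {}
for u, v in edges:
    nbrs.setdefault(u, set()).add(v)
    nbrs.setdefault(v, set()).add(u)

c = {v: 1.0 for v in nbrs}               # start from the all-ones vector
for _ in range(100):
    new = {v: sum(c[u] for u in nbrs[v]) for v in nbrs}   # c <- A c
    norm = sum(x * x for x in new.values()) ** 0.5
    c = {v: x / norm for v, x in new.items()}             # renormalize

# In this graph C and D are tied for most central, and their shared
# neighbor B ranks above the leaf nodes A and E.
ranking = sorted(c, key=c.get, reverse=True)
```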
 Betweenness centrality:
▪ A node is important if it lies on many shortest
paths between other nodes:
𝑐𝑣 = Σ_{𝑠≠𝑣≠𝑡} #(shortest paths between 𝑠 and 𝑡 that contain 𝑣) / #(shortest paths between 𝑠 and 𝑡)
▪ Example:
𝑐𝐴 = 𝑐𝐵 = 𝑐𝐸 = 0
𝑐𝐶 = 3 (A-C-B, A-C-D, A-C-D-E)
𝑐𝐷 = 3 (A-C-D-E, B-D-E, C-D-E)
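The example values above can be checked by brute force: for every pair (s, t), enumerate all shortest s-t paths and credit each interior node with the fraction of those paths it lies on. The edge list is an assumed reconstruction consistent with the example values:

```python
from collections import deque
from itertools import combinations

# Assumed edge list reproducing the slide's c_C = c_D = 3.
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
nbrs = {}
for u, v in edges:
    nbrs.setdefault(u, set()).add(v)
    nbrs.setdefault(v, set()).add(u)

def shortest_paths(s, t):
    dist = {s: 0}                       # BFS distances from s
    q = deque([s])
    while q:
        x = q.popleft()
        for y in nbrs[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    paths = []
    def extend(path):                   # only follow strictly increasing dist
        x = path[-1]
        if x == t:
            paths.append(path)
            return
        for y in nbrs[x]:
            if dist.get(y) == dist[x] + 1:
                extend(path + [y])
    extend([s])
    return paths

between = {v: 0.0 for v in nbrs}
for s, t in combinations(sorted(nbrs), 2):
    sps = shortest_paths(s, t)
    for p in sps:
        for v in p[1:-1]:               # interior nodes only
            between[v] += 1 / len(sps)

print(between)  # C and D each lie on 3 shortest paths
```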
 Closeness centrality:
▪ A node is important if it has small shortest path
lengths to all other nodes:
𝑐𝑣 = 1 / Σ_{𝑢≠𝑣} (shortest path length between 𝑢 and 𝑣)
▪ Example:
𝑐𝐴 = 1/(2 + 1 + 2 + 3) = 1/8 (paths A-C-B, A-C, A-C-D, A-C-D-E)
𝑐𝐷 = 1/(2 + 1 + 1 + 1) = 1/5 (paths D-C-A, D-B, D-C, D-E)
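Closeness reduces to one BFS per node. A sketch on the same assumed edge list as the worked example:

```python
from collections import deque

# Closeness centrality: 1 / (sum of shortest-path distances to all
# other nodes). Edge list reconstructed from the slide's example (assumed).
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
nbrs = {}
for u, v in edges:
    nbrs.setdefault(u, set()).add(v)
    nbrs.setdefault(v, set()).add(u)

def closeness(s):
    dist = {s: 0}
    q = deque([s])
    while q:                            # standard BFS from s
        x = q.popleft()
        for y in nbrs[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    return 1 / sum(dist.values())       # distance from s to itself is 0

print(closeness("A"))  # 0.125, i.e. 1/8 as in the example
print(closeness("D"))  # 0.2,   i.e. 1/5
```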
 Measures how connected 𝑣's neighboring
nodes are:
𝑒𝑣 = #(edges among neighboring nodes) / #(node pairs among 𝑘𝑣 neighboring nodes) ∈ [0, 1]
 Examples (node 𝑣 with 4 neighbors, so the denominator is 6, i.e. 4 choose 2):
𝑒𝑣 = 1 (all neighbor pairs connected), 𝑒𝑣 = 0.5, 𝑒𝑣 = 0 (no neighbor pairs connected)
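The three cases above can be reproduced directly from the definition. A sketch; the three toy graphs (a node "v" with 4 neighbors and varying links among them) are assumptions chosen to hit the slide's values:

```python
from itertools import combinations

# e_v = #(edges among v's neighbors) / C(k_v, 2)
def clustering(adj, v):
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def build(edges):
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj

hub = [("v", i) for i in range(4)]                 # v plus 4 neighbors
star = build(hub)                                  # no links among neighbors
ring = build(hub + [(0, 1), (1, 2), (2, 3)])       # 3 of 6 possible links
full = build(hub + [(a, b) for a, b in combinations(range(4), 2)])

print(clustering(star, "v"), clustering(ring, "v"), clustering(full, "v"))
# 0.0 0.5 1.0
```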
 Observation: the clustering coefficient counts the
#(triangles) in the ego-network of 𝑣.
▪ In the middle example, 𝑣's ego-network contains 3 triangles out of 6 node triplets, so 𝑒𝑣 = 0.5.
 We can generalize the above by counting
#(pre-specified subgraphs, i.e., graphlets).
 Goal: Describe network structure around node 𝑢
▪ Graphlets are small subgraphs that describe the
structure of node 𝑢’s network neighborhood
Analogy:
 Degree counts #(edges) that a node touches
 Clustering coefficient counts #(triangles) that a
node touches.
 Graphlet Degree Vector (GDV): Graphlet-based
features for nodes
▪ GDV counts #(graphlets) that a node touches
 Considering graphlets of size 2-5 nodes we get:
▪ A vector of 73 coordinates is a signature of a node
that describes the topology of the node's neighborhood
 Graphlet degree vector provides a measure of
a node’s local network topology:
▪ Comparing vectors of two nodes provides a more
detailed measure of local topological similarity than
node degrees or clustering coefficient.
 Def: Induced subgraph is another graph, formed
from a subset of vertices and all of the edges
connecting the vertices in that subset.
 Def: Graph Isomorphism
▪ Two graphs which contain the same number of nodes
connected in the same way are said to be isomorphic.
Isomorphic
Node mapping: (e2,c2), (e1, c5),
(e3,c4), (e5,c3), (e4,c1)
Non-Isomorphic
The right graph has cycles of length 3 but the left
graph does not, so the graphs cannot be isomorphic.
(Example: taking nodes {𝑢, B, C} together with all the edges between them yields an induced subgraph; keeping only some of those edges gives a subgraph that is not induced. Source: Mathoverflow)
Graphlets: Rooted connected
induced non-isomorphic subgraphs:
Przulj et al., Bioinformatics 2004
There are 73 different graphlets on up to 5 nodes. The graphlet id indicates the root, i.e., the "position" of node 𝑢, and each graphlet is obtained by taking some nodes and all the edges between them. In the example, node 𝑢 participates in graphlets 0, 1, 2, 3, 5, 10, 11, …
 Graphlet Degree Vector (GDV): A count
vector of graphlets rooted at a given node.
 Example:
▪ An automorphism "orbit" takes into account the
symmetries of the graph.
▪ The graphlet degree vector is a feature vector with
the frequency of the node in each orbit position. (Pedro Ribeiro)
▪ Example with graphlets up to size 3 (orbit positions 𝑎, 𝑏, 𝑐, 𝑑):
GDV of node 𝑢 = [2, 1, 0, 2], the counts of 𝑢's graphlet instances in orbits 𝑎, 𝑏, 𝑐, 𝑑.
 We have introduced different ways to obtain
node features.
 They can be categorized as:
▪ Importance-based features:
▪ Node degree
▪ Different node centrality measures
▪ Structure-based features:
▪ Node degree
▪ Clustering coefficient
▪ Graphlet count vector
 Importance-based features: capture the
importance of a node in a graph
▪ Node degree:
▪ Simply counts the number of neighboring nodes
▪ Node centrality:
▪ Models importance of neighboring nodes in a graph
▪ Different modeling choices: eigenvector centrality,
betweenness centrality, closeness centrality
 Useful for predicting influential nodes in a graph
▪ Example: predicting celebrity users in a social
network
 Structure-based features: Capture topological
properties of the local neighborhood around a node.
▪ Node degree:
▪ Counts the number of neighboring nodes
▪ Clustering coefficient:
▪ Measures how connected neighboring nodes are
▪ Graphlet degree vector:
▪ Counts the occurrences of different graphlets
 Useful for predicting a particular role a node
plays in a graph:
▪ Example: Predicting protein functionality in a
protein-protein interaction network.
Different ways to label nodes of the network:
(Figure 3 from node2vec: complementary visualizations of the Les Misérables co-appearance network, with label colors reflecting homophily (top) and structural equivalence (bottom).)
The node features defined so far would allow us to
distinguish nodes in the first labeling.
However, they would not allow us to distinguish
the second node labeling.
 The task is to predict new links based on the
existing links.
 At test time, node pairs (with no existing links)
are ranked, and top 𝐾 node pairs are predicted.
 The key is to design features for a pair of nodes.
(Figure: example graph with candidate links marked "?".)
Two formulations of the link prediction task:
 1) Links missing at random:
▪ Remove a random set of links and then
aim to predict them
 2) Links over time:
▪ Given 𝐺[𝑡0, 𝑡0′], a graph defined by edges
up to time 𝑡0′, output a ranked list L
of links (not in 𝐺[𝑡0, 𝑡0′]) that are
predicted to appear in 𝐺[𝑡1, 𝑡1′]
▪ Evaluation:
▪ n = |E_new|: # new edges that appear during
the test period [𝑡1, 𝑡1′]
▪ Take top n elements of L and count correct edges
 Methodology:
▪ For each pair of nodes (x,y) compute score c(x,y)
▪ For example, c(x,y) could be the # of common neighbors
of x and y
▪ Sort pairs (x,y) by the decreasing score c(x,y)
▪ Predict top n pairs as new links
▪ See which of these links actually
appear in 𝐺[𝑡1, 𝑡1′]
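The score-and-rank methodology can be sketched end to end. The toy graph and the common-neighbor score are illustrative assumptions, not the slide's figure:

```python
from itertools import combinations

# Score every non-adjacent node pair, rank by score, predict the top pairs.
edges = {("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")}
nbrs = {}
for u, v in edges:
    nbrs.setdefault(u, set()).add(v)
    nbrs.setdefault(v, set()).add(u)

def score(x, y):                       # c(x, y) = # of common neighbors
    return len(nbrs[x] & nbrs[y])

candidates = [(x, y) for x, y in combinations(sorted(nbrs), 2)
              if (x, y) not in edges and (y, x) not in edges]
ranked = sorted(candidates, key=lambda p: score(*p), reverse=True)
print(ranked[:2])                      # top-2 predicted new links
```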
 Distance-based feature
 Local neighborhood overlap
 Global neighborhood overlap
Shortest-path distance between two nodes
 Example: 𝑆𝐵𝐻 = 𝑆𝐵𝐸 = 𝑆𝐴𝐵 = 2, and 𝑆𝐵𝐺 = 𝑆𝐵𝐹 = 3.
 However, this does not capture the degree of
neighborhood overlap:
▪ Node pair (B, H) has 2 shared neighboring nodes,
while pairs (B, E) and (A, B) only have 1 such node.
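Shortest-path distance is a single BFS. A minimal sketch on a hypothetical toy graph (not the slide's 8-node figure):

```python
from collections import deque

# BFS shortest-path distance as a pairwise link feature.
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
nbrs = {}
for u, v in edges:
    nbrs.setdefault(u, set()).add(v)
    nbrs.setdefault(v, set()).add(u)

def distance(s, t):
    dist = {s: 0}
    q = deque([s])
    while q:
        x = q.popleft()
        if x == t:
            return dist[x]
        for y in nbrs[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    return None  # t is unreachable from s

print(distance("A", "E"))  # 3 (A-C-D-E)
```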
Captures # neighboring nodes shared between
two nodes 𝒗𝟏 and 𝒗𝟐:
 Common neighbors: |𝑁(𝑣1) ∩ 𝑁(𝑣2)|
▪ Example: |𝑁(𝐴) ∩ 𝑁(𝐵)| = |{𝐶}| = 1
 Jaccard's coefficient: |𝑁(𝑣1) ∩ 𝑁(𝑣2)| / |𝑁(𝑣1) ∪ 𝑁(𝑣2)|
▪ Example: |𝑁(𝐴) ∩ 𝑁(𝐵)| / |𝑁(𝐴) ∪ 𝑁(𝐵)| = |{𝐶}| / |{𝐶, 𝐷}| = 1/2
 Adamic-Adar index: Σ_{𝑢 ∈ 𝑁(𝑣1) ∩ 𝑁(𝑣2)} 1 / log(𝑘𝑢)
▪ Example: 1 / log(𝑘𝐶) = 1 / log 4
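All three overlap metrics fall out of set operations on neighbor sets. The edge list is assumed so that 𝑁(𝐴) ∩ 𝑁(𝐵) = {𝐶} and 𝑘𝐶 = 4, matching the worked example:

```python
from math import log

# Common neighbors, Jaccard coefficient, Adamic-Adar (edge list assumed).
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("C", "E"), ("D", "E")]
nbrs = {}
for u, v in edges:
    nbrs.setdefault(u, set()).add(v)
    nbrs.setdefault(v, set()).add(u)

def common_neighbors(x, y):
    return len(nbrs[x] & nbrs[y])

def jaccard(x, y):
    return len(nbrs[x] & nbrs[y]) / len(nbrs[x] | nbrs[y])

def adamic_adar(x, y):
    # High-degree common neighbors are down-weighted by 1/log(k_u).
    return sum(1 / log(len(nbrs[u])) for u in nbrs[x] & nbrs[y])

print(common_neighbors("A", "B"))  # 1 (just C)
print(jaccard("A", "B"))           # 0.5 = |{C}| / |{C, D}|
print(adamic_adar("A", "B"))       # 1 / log(4), since k_C = 4
```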
 Limitation of local neighborhood features:
▪ Metric is always zero if the two nodes do not have
any neighbors in common.
▪ However, the two nodes may still potentially be
connected in the future.
 Global neighborhood overlap metrics resolve
the limitation by considering the entire graph.
Example: 𝑁(𝐴) ∩ 𝑁(𝐸) = ∅, so |𝑁(𝐴) ∩ 𝑁(𝐸)| = 0, even though the two nodes may be connected in the future.
 Katz index: count the number of walks of all
lengths between a given pair of nodes.
 Q: How to compute #walks between two
nodes?
 Use powers of the graph adjacency matrix!
 Computing #walks between two nodes:
▪ Recall: 𝑨𝑢𝑣 = 1 if 𝑢 ∈ 𝑁(𝑣)
▪ Let 𝑷𝑢𝑣^(𝐾) = #walks of length 𝐾 between 𝑢 and 𝑣
▪ We will show 𝑷^(𝐾) = 𝑨^𝐾
▪ 𝑷𝑢𝑣^(1) = #walks of length 1 (direct neighborhood)
between 𝑢 and 𝑣 = 𝑨𝑢𝑣
▪ Example (4-node graph): 𝑷12^(1) = 𝑨12
 How to compute 𝑷𝑢𝑣^(2)?
▪ Step 1: Compute #walks of length 1 between each
of 𝑢's neighbors and 𝑣
▪ Step 2: Sum up these #walks across 𝑢's neighbors
▪ 𝑷𝑢𝑣^(2) = Σ𝑖 𝑨𝑢𝑖 ⋅ 𝑷𝑖𝑣^(1) = Σ𝑖 𝑨𝑢𝑖 ⋅ 𝑨𝑖𝑣 = 𝑨²𝑢𝑣
▪ Example: 𝑷12^(2) = 𝑨²12, a power of the adjacency matrix (#walks of length 1 between Node 1's neighbors and Node 2, summed over Node 1's neighbors)
 Katz index: count the number of walks of all
lengths between a pair of nodes.
 How to compute #walks between two nodes?
 Use adjacency matrix powers!
▪ 𝑨𝑢𝑣 specifies #walks of length 1 (direct
neighborhood) between 𝑢 and 𝑣.
▪ 𝑨²𝑢𝑣 specifies #walks of length 2 (neighbor of a
neighbor) between 𝑢 and 𝑣.
▪ And 𝑨^𝑙 𝑢𝑣 specifies #walks of length 𝑙.
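A quick numeric check of the powers-of-𝑨 claim, on a hypothetical 4-cycle (nodes 0-1-2-3-0), in pure Python:

```python
# #walks of length l between u and v = (A^l)_uv, demonstrated on a 4-cycle.
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A2 = matmul(A, A)   # walks of length 2
A3 = matmul(A2, A)  # walks of length 3

print(A2[0][0])  # 2: two length-2 walks from node 0 back to itself
print(A3[0][1])  # 4: e.g. 0-1-0-1, 0-3-0-1, 0-1-2-1, 0-3-2-1
```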
 Katz index between 𝑣1 and 𝑣2 is calculated as
𝑆𝑣1𝑣2 = Σ_{𝑙=1}^{∞} 𝛽^𝑙 ⋅ 𝑨^𝑙_{𝑣1𝑣2},
where 0 < 𝛽 < 1 is a discount factor, 𝑨^𝑙_{𝑣1𝑣2} is #walks of length 𝑙 between 𝑣1 and 𝑣2, and the sum runs over all walk lengths.
 The Katz index matrix is computed in closed form:
𝑺 = Σ_{𝑖=1}^{∞} 𝛽^𝑖 𝑨^𝑖 = (𝑰 − 𝛽𝑨)^{−1} − 𝑰,
by the geometric series of matrices.
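A pure-Python sketch that approximates the infinite sum by truncating it (𝛽 < 1 makes long walks contribute geometrically less); the 4-cycle matrix and 𝛽 = 0.1 are made-up choices:

```python
# Truncated Katz index: S ≈ sum_{l=1}^{50} beta^l * A^l (toy 4-cycle).
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
beta = 0.1
n = len(A)

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

S = [[0.0] * n for _ in range(n)]
Al, bl = A, beta                 # A^l and beta^l, starting at l = 1
for _ in range(50):              # truncate the infinite series at 50 terms
    for i in range(n):
        for j in range(n):
            S[i][j] += bl * Al[i][j]
    Al, bl = matmul(Al, A), bl * beta

# For small beta, directly linked nodes score higher than nodes two
# hops apart, and S is symmetric for an undirected graph.
print(S[0][1] > S[0][2] > 0)  # True
```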
 Distance-based features:
▪ Use the shortest path length between two nodes
but do not capture how neighborhoods overlap.
 Local neighborhood overlap:
▪ Captures how many neighboring nodes are shared
by two nodes.
▪ Becomes zero when no neighbor nodes are shared.
 Global neighborhood overlap:
▪ Uses global graph structure to score two nodes.
▪ Katz index counts #walks of all lengths between two
nodes.
 Goal: We want features that characterize the
structure of an entire graph.
 For example:
 Kernel methods are widely used in traditional
ML for graph-level prediction.
 Idea: Design kernels instead of feature vectors.
 A quick introduction to kernels:
▪ Kernel 𝐾(𝐺, 𝐺′) ∈ ℝ measures similarity between data points
▪ Kernel matrix 𝑲 = (𝐾(𝐺, 𝐺′))_{𝐺,𝐺′} must always be
positive semidefinite (i.e., have non-negative eigenvalues)
▪ There exists a feature representation 𝜙(⋅) such that
𝐾(𝐺, 𝐺′) = 𝜙(𝐺)ᵀ𝜙(𝐺′)
▪ Once the kernel is defined, an off-the-shelf ML model,
such as a kernel SVM, can be used to make predictions.
 Graph Kernels: Measure similarity between
two graphs:
▪ Graphlet Kernel [1]
▪ Weisfeiler-Lehman Kernel [2]
▪ Other kernels are also proposed in the literature
(beyond the scope of this lecture)
▪ Random-walk kernel
▪ Shortest-path graph kernel
▪ And many more…
[1] Shervashidze, Nino, et al. "Efficient graphlet kernels for large graph comparison." Artificial Intelligence and Statistics. 2009.
[2] Shervashidze, Nino, et al. "Weisfeiler-Lehman graph kernels." Journal of Machine Learning Research 12.9 (2011).
 Goal: Design graph feature vector 𝜙 𝐺
 Key idea: Bag-of-Words (BoW) for a graph
▪ Recall: BoW simply uses the word counts as
features for documents (no ordering considered).
▪ Naïve extension to a graph: Regard nodes as words.
▪ Since both example graphs have 4 (red) nodes, we get the
same feature vector for two different graphs: 𝜙(𝐺) = 𝜙(𝐺′).
▪ What if we instead use a bag of node degrees, counting the nodes of degree 1, 2, and 3?
 Both Graphlet Kernel and Weisfeiler-Lehman
(WL) Kernel use Bag-of-* representation of
graph, where * is more sophisticated than
node degrees!
For the two example graphs, the bag of node degrees already gives different features for different graphs: 𝜙(𝐺) = (#Deg1, #Deg2, #Deg3) = [1, 2, 1] and 𝜙(𝐺′) = [0, 2, 2].
 Key idea: Count the number of different
graphlets in a graph.
▪ Note: Definition of graphlets here is slightly
different from node-level features.
▪ The two differences are:
▪ Nodes in graphlets here do not need to be connected (allows for
isolated nodes)
▪ The graphlets here are not rooted.
▪ Examples in the next slide illustrate this.
Let 𝒢𝑘 = (𝑔1, 𝑔2, …, 𝑔𝑛𝑘) be a list of graphlets
of size 𝑘.
▪ For 𝑘 = 3, there are 4 graphlets: 𝑔1, 𝑔2, 𝑔3, 𝑔4.
▪ For 𝑘 = 4, there are 11 graphlets.
Shervashidze et al., AISTATS 2011
 Given a graph 𝐺 and a graphlet list 𝒢𝑘 = (𝑔1, 𝑔2, …, 𝑔𝑛𝑘), define the graphlet count vector 𝒇𝐺 ∈ ℝ^{𝑛𝑘} as
(𝒇𝐺)𝑖 = #(𝑔𝑖 ⊆ 𝐺) for 𝑖 = 1, 2, …, 𝑛𝑘.
 Example for 𝑘 = 3: counting 𝑔1, 𝑔2, 𝑔3, 𝑔4 in the example graph 𝐺 gives
𝒇𝐺 = (1, 3, 6, 0)ᵀ
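Size-3 graphlet counts can be enumerated directly over all 3-node induced subgraphs. The edge list is an assumed reconstruction of the example graph (a triangle with a two-edge tail), chosen because it reproduces 𝒇𝐺 = (1, 3, 6, 0), with the size-3 graphlets ordered as triangle, path, single edge, empty:

```python
from itertools import combinations

# Count size-3 graphlets by enumerating every 3-node induced subgraph.
edges = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)}  # assumed example graph
nodes = range(5)

def has_edge(a, b):
    return (a, b) in edges or (b, a) in edges

f = [0, 0, 0, 0]  # g1 = triangle, g2 = path, g3 = one edge, g4 = empty
for trip in combinations(nodes, 3):
    m = sum(has_edge(a, b) for a, b in combinations(trip, 2))
    f[3 - m] += 1  # 3 induced edges -> g1, 2 -> g2, 1 -> g3, 0 -> g4

print(f)  # [1, 3, 6, 0]
```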
 Given two graphs 𝐺 and 𝐺′, the graphlet kernel is
computed as
𝐾(𝐺, 𝐺′) = 𝒇𝐺ᵀ 𝒇𝐺′
 Problem: if 𝐺 and 𝐺′ have different sizes, that
will greatly skew the value.
 Solution: normalize each feature vector:
𝒉𝐺 = 𝒇𝐺 / Σ𝑖 (𝒇𝐺)𝑖, and 𝐾(𝐺, 𝐺′) = 𝒉𝐺ᵀ 𝒉𝐺′
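Given the count vectors, the normalized kernel is a one-liner. Here 𝒇𝐺 matches the earlier example; the second vector is made up for illustration:

```python
# Normalized graphlet kernel: h = f / sum(f), K = h_G . h_G'.
f_G = [1, 3, 6, 0]   # from the earlier example
f_H = [0, 2, 7, 1]   # illustrative, made-up counts

def normalize(f):
    s = sum(f)
    return [x / s for x in f]

def graphlet_kernel(f1, f2):
    h1, h2 = normalize(f1), normalize(f2)
    return sum(a * b for a, b in zip(h1, h2))

print(graphlet_kernel(f_G, f_H))  # ≈ 0.48
```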
Limitations: Counting graphlets is expensive!
 Counting size-𝑘 graphlets for a graph of size 𝑛
by enumeration takes time 𝑛^𝑘.
 This is unavoidable in the worst case, since the
subgraph isomorphism test (judging whether a
graph is a subgraph of another graph) is NP-hard.
 If a graph's node degree is bounded by 𝑑, an
𝑂(𝑛 𝑑^{𝑘−1}) algorithm exists to count all the
graphlets of size 𝑘.
Can we design a more efficient graph kernel?
 Goal: Design an efficient graph feature
descriptor 𝜙 𝐺
 Idea: Use neighborhood structure to
iteratively enrich the node vocabulary.
▪ Generalized version of Bag of node degrees since
node degrees are one-hop neighborhood
information.
 Algorithm to achieve this:
Color refinement
 Given: A graph 𝐺 with a set of nodes 𝑉.
▪ Assign an initial color 𝑐^(0)(𝑣) to each node 𝑣.
▪ Iteratively refine node colors by
𝑐^(𝑘+1)(𝑣) = HASH(𝑐^(𝑘)(𝑣), {𝑐^(𝑘)(𝑢)}_{𝑢∈𝑁(𝑣)}),
where HASH maps different inputs to different colors.
▪ After 𝐾 steps of color refinement, 𝑐^(𝐾)(𝑣)
summarizes the structure of the 𝐾-hop neighborhood.
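A minimal sketch of color refinement and the resulting bag-of-colors kernel. The toy graphs and the 2-step default are assumptions, not the lecture's figures; the shared hash table plays the role of HASH:

```python
from collections import Counter

table = {}                       # shared hash table: signature -> color id
def hash_sig(sig):
    if sig not in table:
        table[sig] = len(table) + 2   # new colors start at 2 (initial is 1)
    return table[sig]

def wl_bag(edges, steps=2):
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    colors = {v: 1 for v in nbrs}         # every node starts with color 1
    bag = Counter(colors.values())        # bag of colors over all steps
    for _ in range(steps):
        colors = {v: hash_sig((colors[v],
                               tuple(sorted(colors[u] for u in nbrs[v]))))
                  for v in nbrs}
        bag.update(colors.values())
    return bag

def wl_kernel(e1, e2, steps=2):
    b1, b2 = wl_bag(e1, steps), wl_bag(e2, steps)
    return sum(b1[c] * b2[c] for c in set(b1) | set(b2))

path = [(0, 1), (1, 2), (2, 3)]           # a 4-node path
same = [("a", "b"), ("b", "c"), ("c", "d")]   # the same path, relabeled
star = [(0, 1), (0, 2), (0, 3)]           # a 4-node star

print(wl_bag(path) == wl_bag(same))       # True: isomorphic graphs agree
print(wl_kernel(path, path) > wl_kernel(path, star))  # True
```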
Example of color refinement given two graphs 𝐺1 and 𝐺2:
▪ Assign initial colors: every node in both graphs starts with color 1.
▪ Aggregate neighboring colors: each node's new signature is its own color followed by the sorted colors of its neighbors:
▪ 𝐺1: (1,111), (1,11), (1,1111), (1,1), (1,1), (1,111)
▪ 𝐺2: (1,11), (1,111), (1,1111), (1,1), (1,1), (1,111)
▪ Hash the aggregated colors:
Hash table: (1,1) → 2; (1,11) → 3; (1,111) → 4; (1,1111) → 5
▪ 𝐺1 node colors become 4, 3, 5, 2, 2, 4
▪ 𝐺2 node colors become 3, 4, 5, 2, 2, 4
▪ Aggregate neighboring colors again:
▪ 𝐺1: (4,345), (3,44), (5,2244), (2,5), (2,5), (4,345)
▪ 𝐺2: (3,45), (4,345), (5,2344), (2,5), (2,4), (4,245)
▪ Hash the aggregated colors:
Hash table: (2,4) → 6; (2,5) → 7; (3,44) → 8; (3,45) → 9; (4,245) → 10; (4,345) → 11; (5,2244) → 12; (5,2344) → 13
▪ 𝐺1 node colors become 11, 8, 12, 7, 7, 11
▪ 𝐺2 node colors become 9, 11, 13, 7, 6, 10
After color refinement, the WL kernel counts the number
of nodes with a given color (over all refinement steps). For colors 1–13:
𝜙(𝐺1) = [6, 2, 1, 2, 1, 0, 2, 1, 0, 0, 0, 2, 1]
𝜙(𝐺2) = [6, 2, 1, 2, 1, 1, 1, 0, 1, 1, 1, 0, 1]
The WL kernel value is computed by the inner
product of the color count vectors:
𝐾(𝐺1, 𝐺2) = 𝜙(𝐺1)ᵀ𝜙(𝐺2) = 49
 WL kernel is computationally efficient
▪ The time complexity for color refinement at each step is
linear in #(edges), since it involves aggregating neighboring
colors.
 When computing a kernel value, only the colors
that appear in the two graphs need to be tracked.
▪ Thus, #(colors) is at most the total number of nodes.
 Counting colors takes time linear in #(nodes).
 In total, time complexity is linear in #(edges).
 Graphlet Kernel
▪ Graph is represented as Bag-of-graphlets
▪ Computationally expensive
 Weisfeiler-Lehman Kernel
▪ Apply 𝐾-step color refinement algorithm to enrich
node colors
▪ Different colors capture different 𝐾-hop neighborhood
structures
▪ Graph is represented as Bag-of-colors
▪ Computationally efficient
▪ Closely related to Graph Neural Networks (as we
will see!)
 Traditional ML Pipeline
▪ Hand-crafted feature + ML model
 Hand-crafted features for graph data
▪ Node-level:
▪ Node degree, centrality, clustering coefficient, graphlets
▪ Link-level:
▪ Distance-based feature
▪ local/global neighborhood overlap
▪ Graph-level:
▪ Graphlet kernel, WL kernel
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 

Recently uploaded (20)

Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 

02-tradition-ml.pdf

  • 1. CS224W: Machine Learning with Graphs Jure Leskovec, Stanford University http://cs224w.stanford.edu
  • 2. CS224W: Machine Learning with Graphs Jure Leskovec, Stanford University http://cs224w.stanford.edu
  • 3. Two ways to ask questions during lecture:  In-person (encouraged)  On Ed: ▪ At the beginning of class, we will open a new discussion thread dedicated to this lecture ▪ When to ask on Ed? ▪ If you are watching the livestream remotely ▪ If you have a minor clarifying question ▪ If we run out of time to get to your question live ▪ Otherwise, try raising your hand first! ▪ Class goes till 3pm (not 2:50pm, sorry) 9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu 3
  • 4.  Colabs 0 and 1 will be released on our course website at 3pm today (Thu 9/23)  Colab 0: ▪ Does not need to be handed-in ▪ TAs will hold two recitations (on Zoom) to walk through Colab 0 with you: ▪ Federico – Friday (9/24), 3-5pm PT ▪ Yige – Monday (9/27), 10am-12pm PT ▪ Links to Zoom will be posted on Ed 9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu 4
  • 5. 9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu 5  Colabs 0 and 1 will be released on our course website at 3pm today (Thu 9/23)  Colab 1: ▪ Due on Thursday 10/07 (2 weeks from today) ▪ Submit written answers and code on Gradescope ▪ Will cover material from Lectures 1-4, but you can get started right away!
  • 6. CS224W: Machine Learning with Graphs Jure Leskovec, Stanford University http://cs224w.stanford.edu
  • 7.  Node-level prediction  Link-level prediction  Graph-level prediction 9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu 7 C A B D E H F G Link-level ? Node-level ? ? Graph-level
  • 8.  Design features for nodes/links/graphs  Obtain features for all training data 9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu 8 C A B D E H F G Node features Graph features Link features ∈ ℝ𝐷 ∈ ℝ𝐷 ∈ ℝ𝐷
  • 9.  Train an ML model: ▪ Random forest ▪ SVM ▪ Neural network, etc. 9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu 9 𝒙𝟏 𝑦1 𝒙𝑵 𝑦𝑁  Apply the model: ▪ Given a new node/link/graph, obtain its features and make a prediction 𝒙 𝑦
  • 10.  Using effective features over graphs is the key to achieving good model performance.  The traditional ML pipeline uses hand-designed features.  In this lecture, we overview the traditional features for: ▪ Node-level prediction ▪ Link-level prediction ▪ Graph-level prediction  For simplicity, we focus on undirected graphs. 9/27/2021 10 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
  • 11. Goal: Make predictions for a set of objects Design choices:  Features: d-dimensional vectors  Objects: Nodes, edges, sets of nodes, entire graphs  Objective function: ▪ What task are we aiming to solve? 11 9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu
  • 12. Example: Node-level prediction  Given:  Learn a function: How do we learn the function? 12 9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu
  • 13. CS224W: Machine Learning with Graphs Jure Leskovec, Stanford University http://cs224w.stanford.edu
  • 14. 14 ? ? ? ? ? Machine Learning Node classification ML needs features. 9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu
  • 15. Goal: Characterize the structure and position of a node in the network: ▪ Node degree ▪ Node centrality ▪ Clustering coefficient ▪ Graphlets 9/27/2021 15 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu C A B D E H F G Node feature
  • 16.  The degree 𝑘𝑣 of node 𝑣 is the number of edges (neighboring nodes) the node has.  Treats all neighboring nodes equally. 9/27/2021 16 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu C A B D E H F G 𝑘𝐴 = 1 𝑘𝐵 = 2 𝑘𝐶 = 3 𝑘𝐷 = 4
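The degree feature can be computed directly from an adjacency list. A minimal pure-Python sketch; the edge list (A-C, B-C, B-D, C-D, D-E) is an assumed toy graph in the style of the lecture's figures, not the exact slide figure:

```python
# Node degree from an adjacency list; the edge list is an assumed toy graph
# (A-C, B-C, B-D, C-D, D-E), not the slide's exact figure.
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# k_v = number of neighboring nodes
degree = {v: len(nbrs) for v, nbrs in adj.items()}
# degree == {"A": 1, "B": 2, "C": 3, "D": 3, "E": 1}
```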
  • 17.  Node degree counts the neighboring nodes without capturing their importance.  Node centrality 𝑐𝑣 takes the node importance in a graph into account.  Different ways to model importance: ▪ Eigenvector centrality ▪ Betweenness centrality ▪ Closeness centrality ▪ and many others… 9/27/2021 17 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
  • 18.  Eigenvector centrality: ▪ A node 𝑣 is important if surrounded by important neighboring nodes 𝑢 ∈ 𝑁(𝑣). ▪ We model the centrality of node 𝑣 as the sum of the centralities of its neighboring nodes: 𝑐𝑣 = (1/𝜆) Σ_{𝑢 ∈ 𝑁(𝑣)} 𝑐𝑢, where 𝜆 is a normalization constant (it will turn out to be the largest eigenvalue of 𝑨). ▪ Notice that the above equation models centrality in a recursive manner. How do we solve it? 9/27/2021 18 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
  • 19.  Eigenvector centrality: ▪ Rewrite the recursive equation in matrix form: 𝜆𝒄 = 𝑨𝒄, where 𝑨 is the adjacency matrix (𝑨𝑢𝑣 = 1 if 𝑢 ∈ 𝑁(𝑣)), 𝒄 is the centrality vector, and 𝜆 is the eigenvalue (the normalization constant; the largest eigenvalue of 𝑨). ▪ We see that centrality 𝒄 is an eigenvector of 𝑨! ▪ The largest eigenvalue 𝜆𝑚𝑎𝑥 is always positive and unique (by the Perron-Frobenius theorem). ▪ The eigenvector 𝒄𝑚𝑎𝑥 corresponding to 𝜆𝑚𝑎𝑥 is used for centrality. 9/27/2021 19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
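The equation 𝜆𝒄 = 𝑨𝒄 can be solved by power iteration: repeatedly multiply a positive start vector by 𝑨 and renormalize, which converges to the leading eigenvector. A pure-Python sketch; the edge list is an assumed toy graph, not the slide's figure:

```python
# Power iteration for eigenvector centrality: lambda * c = A c, so c is the
# leading eigenvector of the adjacency matrix A.
# Toy edge list (an assumption, not the slide's exact figure):
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
nodes = sorted({v for e in edges for v in e})
idx = {v: i for i, v in enumerate(nodes)}
n = len(nodes)
A = [[0.0] * n for _ in range(n)]
for u, v in edges:
    A[idx[u]][idx[v]] = A[idx[v]][idx[u]] = 1.0

c = [1.0] * n                        # start from the all-ones vector
for _ in range(100):                 # repeatedly apply A and renormalize
    c = [sum(A[i][j] * c[j] for j in range(n)) for i in range(n)]
    m = max(c)
    c = [x / m for x in c]

centrality = dict(zip(nodes, c))
# The two "hub" nodes C and D come out most central.
```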
  • 20.  Betweenness centrality: ▪ A node is important if it lies on many shortest paths between other nodes. ▪ Example: 𝑐𝐴 = 𝑐𝐵 = 𝑐𝐸 = 0; 𝑐𝐶 = 3 (A-C-B, A-C-D, A-C-D-E); 𝑐𝐷 = 3 (A-C-D-E, B-D-E, C-D-E) 9/27/2021 20 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
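Betweenness can be checked by brute force on the 5-node example (edges A-C, B-C, B-D, C-D, D-E, assumed from the figure): for every node pair, enumerate all shortest paths and add up, for each intermediate node, the fraction of those paths passing through it. A sketch:

```python
from itertools import combinations

# Slide's 5-node example; edges assumed from the figure: A-C, B-C, B-D, C-D, D-E.
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
nodes = sorted(adj)

def all_shortest_paths(s, t):
    """Enumerate every shortest s-t path by expanding paths level by level."""
    paths, frontier = [], [[s]]
    while frontier and not paths:
        nxt = []
        for p in frontier:
            for w in adj[p[-1]]:
                if w in p:
                    continue
                (paths if w == t else nxt).append(p + [w])
        frontier = nxt
    return paths

betweenness = {v: 0.0 for v in nodes}
for s, t in combinations(nodes, 2):        # each unordered pair once
    paths = all_shortest_paths(s, t)
    for v in nodes:
        if v not in (s, t):
            betweenness[v] += sum(v in p for p in paths) / len(paths)

# Matches the slide: c_A = c_B = c_E = 0, c_C = c_D = 3
```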
  • 21.  Closeness centrality: 𝑐𝑣 = 1 / Σ_{𝑢≠𝑣} (shortest path length between 𝑢 and 𝑣) ▪ A node is important if it has small shortest path lengths to all other nodes. ▪ Example: 𝑐𝐴 = 1/(2 + 1 + 2 + 3) = 1/8 (A-C-B, A-C, A-C-D, A-C-D-E); 𝑐𝐷 = 1/(2 + 1 + 1 + 1) = 1/5 (D-C-A, D-B, D-C, D-E) 9/27/2021 21 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
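Closeness is a BFS away: compute unweighted shortest-path lengths from each node and invert their sum (the slide's unnormalized convention). A sketch on the same assumed 5-node example:

```python
from collections import deque

# Same assumed 5-node example as the slide: A-C, B-C, B-D, C-D, D-E.
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def bfs_distances(s):
    """Shortest-path lengths from s to every node (unweighted BFS)."""
    dist, q = {s: 0}, deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

# Slide convention: c_v = 1 / (sum of shortest-path lengths to all others).
closeness = {v: 1.0 / sum(bfs_distances(v).values()) for v in adj}
# c_A = 1/8 and c_D = 1/5, as computed on the slide.
```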
  • 22.  Clustering coefficient measures how connected 𝑣's neighboring nodes are: 𝑒𝑣 = #(edges among neighboring nodes) / (𝑘𝑣 choose 2) ∈ [0, 1] ▪ Examples: 𝑒𝑣 = 1, 𝑒𝑣 = 0.5, 𝑒𝑣 = 0 (in the examples below the denominator is 6, i.e., 4 choose 2). 9/27/2021 22 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
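The coefficient is a two-line computation: count linked pairs among the neighbors and divide by the number of possible pairs. A sketch with an assumed ego graph reproducing the slide's middle case (4 neighbors, 3 of 6 pairs linked):

```python
from itertools import combinations

# Ego graph assumed for the slide's middle example: v has 4 neighbors and
# 3 of the 6 neighbor pairs are linked, so e_v = 3/6 = 0.5.
edges = [("v", "a"), ("v", "b"), ("v", "c"), ("v", "d"),
         ("a", "b"), ("b", "c"), ("c", "d")]
adj = {}
for u, w in edges:
    adj.setdefault(u, set()).add(w)
    adj.setdefault(w, set()).add(u)

def clustering(v):
    nbrs, k = adj[v], len(adj[v])
    if k < 2:
        return 0.0
    linked = sum(1 for x, y in combinations(sorted(nbrs), 2) if y in adj[x])
    return linked / (k * (k - 1) / 2)   # edges among neighbors / possible pairs
```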
  • 23.  Observation: Clustering coefficient counts the #(triangles) in the ego-network  We can generalize the above by counting #(pre-specified subgraphs,i.e., graphlets). 9/27/2021 23 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu 𝑣 𝑣 3 triangles (out of 6 node triplets) 𝑒𝑣 = 0.5 𝑣 𝑣
  • 24.  Goal: Describe network structure around node 𝑢 ▪ Graphlets are small subgraphs that describe the structure of node 𝑢's network neighborhood Analogy:  Degree counts #(edges) that a node touches  Clustering coefficient counts #(triangles) that a node touches.  Graphlet Degree Vector (GDV): Graphlet-based features for nodes ▪ GDV counts #(graphlets) that a node touches 9/27/2021 24 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
  • 25.  Considering graphlets of size 2-5 nodes we get: ▪ Vector of 73 coordinates is a signature of a node that describes the topology of node's neighborhood  Graphlet degree vector provides a measure of a node’s local network topology: ▪ Comparing vectors of two nodes provides a more detailed measure of local topological similarity than node degrees or clustering coefficient. 9/27/2021 25 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu C A B 𝑢 E
  • 26.  Def: An induced subgraph is another graph, formed from a subset of vertices and all of the edges connecting the vertices in that subset. (Take some nodes and all the edges between them.)  Def: Graph isomorphism ▪ Two graphs which contain the same number of nodes connected in the same way are said to be isomorphic. ▪ Isomorphic example node mapping: (e2,c2), (e1,c5), (e3,c4), (e5,c3), (e4,c1). ▪ Non-isomorphic example: the right graph has cycles of length 3 but the left graph does not, so the graphs cannot be isomorphic. 9/27/2021 26 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu (Source: Mathoverflow)
  • 27. Graphlets: Rooted connected induced non-isomorphic subgraphs: 9/27/2021 27 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu Przulj et al., Bioinformatics2004 There are 73 different graphlets on up to 5 nodes Graphlet id (Root/ “position” of node 𝑢) C A B 𝑢 E 𝑢 has graphlets: 0, 1, 2, 3, 5, 10, 11, … Take some nodes and all the edges between them.
  • 28.  Graphlet Degree Vector (GDV): A count vector of graphlets rooted at a given node. An automorphism "orbit" takes into account the symmetries of the graph; the graphlet degree vector is a feature vector with the frequency of the node in each orbit position.  Example: for the possible graphlets up to size 3 with orbits 𝑎, 𝑏, 𝑐, 𝑑, the GDV of node 𝑢 over orbits (𝑎, 𝑏, 𝑐, 𝑑) is [2,1,0,2]. 9/27/2021 28 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
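For graphlets on up to 3 nodes there are four orbits: edge endpoint, end of an induced 2-path, middle of an induced 2-path, and triangle. A brute-force sketch on a tiny assumed graph (u-a, u-b, a-b, b-c; this is not the slide's figure, so the resulting vector differs from the slide's example value):

```python
from itertools import combinations

# Tiny assumed graph (u-a, u-b, a-b, b-c) -- not the slide's figure, so the
# resulting vector differs from the slide's example.
edges = [("u", "a"), ("u", "b"), ("a", "b"), ("b", "c")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

def gdv3(v):
    """GDV over the 4 orbits of graphlets on 2-3 nodes:
    [edge endpoint, end of induced 2-path, middle of induced 2-path, triangle]."""
    deg = len(adj[v])                    # orbit a: #edges touching v
    path_end = path_mid = tri = 0
    for x, y in combinations(set(adj) - {v}, 2):
        vx, vy, xy = v in adj[x], v in adj[y], x in adj[y]
        if vx and vy and xy:
            tri += 1                     # induced triangle {v, x, y}
        elif vx and vy:
            path_mid += 1                # x - v - y with x, y not adjacent
        elif xy and (vx or vy):
            path_end += 1                # v hangs off one end of the edge x-y
    return [deg, path_end, path_mid, tri]

# gdv3("u") -> [2, 1, 0, 1]: 2 edges, 1 path-end (u-b-c), 0 path-middles
# (a and b are adjacent, forming a triangle), 1 triangle (u-a-b).
```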
  • 29.  We have introduced different ways to obtain node features.  They can be categorized as: ▪ Importance-based features: ▪ Node degree ▪ Different node centrality measures ▪ Structure-based features: ▪ Node degree ▪ Clustering coefficient ▪ Graphlet count vector 9/27/2021 29 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu
  • 30.  Importance-based features: capture the importance of a node in a graph ▪ Node degree: ▪ Simply counts the number of neighboring nodes ▪ Node centrality: ▪ Models importance of neighboring nodes in a graph ▪ Different modeling choices: eigenvector centrality, betweenness centrality, closeness centrality  Useful for predicting influential nodes in a graph ▪ Example: predicting celebrity users in a social network 9/27/2021 30 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu
  • 31.  Structure-based features:Capture topological properties of local neighborhood around a node. ▪ Node degree: ▪ Counts the number of neighboring nodes ▪ Clustering coefficient: ▪ Measures how connected neighboring nodes are ▪ Graphlet degree vector: ▪ Counts the occurrences of different graphlets  Useful for predicting a particular role a node plays in a graph: ▪ Example: Predicting protein functionality in a protein-protein interaction network. 9/27/2021 31 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu
  • 32. Different ways to label nodes of the network: [Figure 3: Complementary visualizations of the Les Misérables co-appearance network generated by node2vec, with label colors reflecting homophily (top) and structural equivalence (bottom).] Node features defined so far would allow us to distinguish nodes in the top (homophily) example. However, the features defined so far would not allow for distinguishing the bottom (structural equivalence) node labelling. 9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 32
  • 33. CS224W: Machine Learning with Graphs Jure Leskovec, Stanford University http://cs224w.stanford.edu
  • 34.  The task is to predict new links based on the existing links.  At test time, node pairs (with no existing links) are ranked, and top 𝐾 node pairs are predicted.  The key is to design features for a pair of nodes. 9/27/2021 34 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu C A B D E H F G ? ?
  • 35. Two formulations of the link prediction task:  1) Links missing at random: ▪ Remove a random set of links and then aim to predict them  2) Links over time: ▪ Given 𝐺[𝑡0, 𝑡0′], a graph on edges up to time 𝑡0′, output a ranked list L of edges (not in 𝐺[𝑡0, 𝑡0′]) that are predicted to appear in 𝐺[𝑡1, 𝑡1′] ▪ Evaluation: ▪ n = |E_new|: # new edges that appear during the test period [𝑡1, 𝑡1′] ▪ Take top n elements of L and count correct edges 9/27/2021 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 35
  • 36.  Methodology: ▪ For each pair of nodes (x,y) compute score c(x,y) ▪ For example, c(x,y) could be the # of common neighbors of x and y ▪ Sort pairs (x,y) by the decreasing score c(x,y) ▪ Predict top n pairs as new links ▪ See which of these links actually appear in 𝐺[𝑡1, 𝑡1 ′ ] 9/27/2021 Jure Leskovec, Stanford CS224W:Machine Learning withGraphs, http://cs224w.stanford.edu 36 X
  • 37.  Distance-based feature  Local neighborhoodoverlap  Global neighborhoodoverlap 9/27/2021 37 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu C A B D E H F G Link feature
  • 38. Shortest-path distance between two nodes  Example:  However, this does not capture the degree of neighborhood overlap: ▪ Node pair (B, H) has 2 shared neighboring nodes, while pairs (B, E) and (A, B) only have 1 such node. 9/27/2021 38 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu 𝑆𝐵𝐻 = 𝑆𝐵𝐸 = 𝑆𝐴𝐵 = 2 C A B D E H F G 𝑆𝐵𝐺 = 𝑆𝐵𝐹 = 3
  • 39. Captures # neighboring nodes shared between two nodes 𝒗𝟏 and 𝒗𝟐:  Common neighbors: |𝑁(𝑣1) ∩ 𝑁(𝑣2)| ▪ Example: |𝑁(𝐴) ∩ 𝑁(𝐵)| = |{𝐶}| = 1  Jaccard's coefficient: |𝑁(𝑣1) ∩ 𝑁(𝑣2)| / |𝑁(𝑣1) ∪ 𝑁(𝑣2)| ▪ Example: |{𝐶}| / |{𝐶, 𝐷}| = 1/2  Adamic-Adar index: Σ_{𝑢 ∈ 𝑁(𝑣1) ∩ 𝑁(𝑣2)} 1/log(𝑘𝑢) ▪ Example: 1/log(𝑘𝐶) = 1/log 4 9/27/2021 39 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
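All three local-overlap scores are set operations on neighbor sets. A sketch; the edge list is an assumption chosen so that N(A) = {C}, N(B) = {C, D}, and k_C = 4 as in the slide's example, and the natural log in Adamic-Adar is also an assumption (the slide leaves the base unspecified):

```python
from math import log

# Assumed edge list reproducing the slide's counts: N(A) = {C}, N(B) = {C, D},
# and k_C = 4.
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("C", "E"), ("E", "F")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def common_neighbors(u, v):
    return len(adj[u] & adj[v])

def jaccard(u, v):
    return len(adj[u] & adj[v]) / len(adj[u] | adj[v])

def adamic_adar(u, v):
    # Shared low-degree neighbors count more than shared hubs
    # (natural log is an assumption here).
    return sum(1 / log(len(adj[w])) for w in adj[u] & adj[v])

# common_neighbors("A", "B") -> 1, jaccard("A", "B") -> 0.5,
# adamic_adar("A", "B") -> 1/log(4)
```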
  • 40.  Limitation of local neighborhood features: ▪ Metric is always zero if the two nodes do not have any neighbors in common. ▪ However, the two nodes may still potentially be connected in the future.  Global neighborhood overlap metrics resolve the limitation by considering the entire graph. 9/27/2021 40 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu Example: 𝑁𝐴 ∩ 𝑁𝐸 = ∅, so |𝑁𝐴 ∩ 𝑁𝐸| = 0
  • 41.  Katz index: count the number of walksof all lengths between a given pair of nodes.  Q: How to compute #walks between two nodes?  Use powers of the graph adjacency matrix! 9/27/2021 41 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu
  • 42.  Computing #walks between two nodes ▪ Recall: 𝑨_𝑢𝑣 = 1 if 𝑢 ∈ 𝑁(𝑣) ▪ Let 𝑷^(𝐾)_𝑢𝑣 = #walks of length 𝑲 between 𝒖 and 𝒗 ▪ We will show 𝑷^(𝐾) = 𝑨^𝐾 ▪ 𝑷^(1)_𝑢𝑣 = #walks of length 1 (direct neighborhood) between 𝑢 and 𝑣 = 𝑨_𝑢𝑣 ▪ Example: 𝑷^(1)_12 = 𝑨_12 9/27/2021 42 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
  • 43.  How to compute 𝑷^(2)_𝑢𝑣? ▪ Step 1: Compute #walks of length 1 between each of 𝒖's neighbors and 𝒗 ▪ Step 2: Sum up these #walks across 𝑢's neighbors ▪ 𝑷^(2)_𝑢𝑣 = Σ_𝑖 𝑨_𝑢𝑖 ⋅ 𝑷^(1)_𝑖𝑣 = Σ_𝑖 𝑨_𝑢𝑖 ⋅ 𝑨_𝑖𝑣 = (𝑨²)_𝑢𝑣 ▪ Example: 𝑷^(2)_12 = (𝑨²)_12, a power of the adjacency matrix 9/27/2021 43 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
  • 44.  Katz index: count the number of walksof all lengths between a pair of nodes.  How to compute #walks between two nodes?  Use adjacency matrix powers! ▪ 𝑨𝑢𝑣 specifies #walks of length 1 (direct neighborhood) between 𝑢 and 𝑣. ▪ 𝑨𝑢𝑣 𝟐 specifies #walks of length 2 (neighbor of neighbor) between 𝑢 and 𝑣. ▪ And, 𝑨𝑢𝑣 𝒍 specifies #walks of length 𝒍. 9/27/2021 44 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu
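The claim that (𝑨^K)_uv counts length-K walks can be checked numerically: the diagonal entry (𝑨²)_uu recovers the degree of u, and the off-diagonal (𝑨²)_uv counts common neighbors. A pure-Python sketch on an assumed 5-node toy graph:

```python
# Check numerically that (A^2)_uv counts length-2 walks: the diagonal entry
# (A^2)_uu is the degree of u, and (A^2)_uv is the number of common neighbors.
# Toy edge list assumed (A-C, B-C, B-D, C-D, D-E):
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
nodes = sorted({v for e in edges for v in e})
idx = {v: i for i, v in enumerate(nodes)}
n = len(nodes)
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[idx[u]][idx[v]] = A[idx[v]][idx[u]] = 1

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A2 = matmul(A, A)
# A2[C][C] = 3 (degree of C); A2[A][B] = 1 (one length-2 walk: A-C-B).
```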
  • 45.  Katz index between 𝑣1 and 𝑣2 is calculated as 𝑆_{𝑣1𝑣2} = Σ_{𝑙=1}^∞ 𝛽^𝑙 (𝑨^𝑙)_{𝑣1𝑣2}, summing over all walk lengths, where 0 < 𝛽 < 1 is a discount factor and (𝑨^𝑙)_{𝑣1𝑣2} is #walks of length 𝑙 between 𝑣1 and 𝑣2.  Katz index matrix is computed in closed form: 𝑺 = Σ_{𝑖=1}^∞ 𝛽^𝑖 𝑨^𝑖 = (𝑰 − 𝛽𝑨)^{−1} − 𝑰, by the geometric series of matrices. 9/27/2021 45 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu
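The series definition can be evaluated directly by truncating the sum; when 𝛽 is below 1/𝜆_max, the truncation converges quickly to the closed form (𝑰 − 𝛽𝑨)^{−1} − 𝑰. A sketch where 𝛽 = 0.1, the truncation length, and the toy edge list are all assumptions:

```python
# Katz index via its series definition S = sum_{l>=1} beta^l A^l, truncated at
# L terms; for beta < 1/lambda_max this converges to (I - beta*A)^{-1} - I.
# beta = 0.1 and the toy edge list are assumptions.
beta, L = 0.1, 30
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
nodes = sorted({v for e in edges for v in e})
idx = {v: i for i, v in enumerate(nodes)}
n = len(nodes)
A = [[0.0] * n for _ in range(n)]
for u, v in edges:
    A[idx[u]][idx[v]] = A[idx[v]][idx[u]] = 1.0

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

S = [[0.0] * n for _ in range(n)]
P = [row[:] for row in A]                 # P holds A^l, starting at l = 1
for l in range(1, L + 1):
    for i in range(n):
        for j in range(n):
            S[i][j] += beta ** l * P[i][j]
    P = matmul(P, A)

# Unlike local overlap, A and E get a nonzero score despite sharing no
# neighbors, because walks of length >= 3 connect them.
```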
  • 46.  Distance-basedfeatures: ▪ Uses the shortest path length between two nodes but does not capture how neighborhood overlaps.  Local neighborhood overlap: ▪ Captures how many neighboring nodes are shared by two nodes. ▪ Becomes zero when no neighbor nodes are shared.  Global neighborhood overlap: ▪ Uses global graph structure to score two nodes. ▪ Katz index counts #walks of all lengths between two nodes. 9/27/2021 46 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu
  • 47. CS224W: Machine Learning with Graphs Jure Leskovec, Stanford University http://cs224w.stanford.edu
  • 48.  Goal: We want features that characterize the structure of an entire graph.  For example: 9/27/2021 48 Jure Leskovec, Stanford CS224W: Machine Learning withGraphs, http://cs224w.stanford.edu C A B D E H F G
  • 49.  Kernel methods are widely used in traditional ML for graph-level prediction.  Idea: Design kernels instead of feature vectors.  A quick introduction to kernels: ▪ A kernel 𝐾(𝐺, 𝐺′) ∈ ℝ measures similarity between data points ▪ The kernel matrix 𝑲 = (𝐾(𝐺, 𝐺′))_{𝐺,𝐺′} must always be positive semidefinite (i.e., have non-negative eigenvalues) ▪ There exists a feature representation 𝜙(·) such that 𝐾(𝐺, 𝐺′) = 𝜙(𝐺)ᵀ𝜙(𝐺′) ▪ Once the kernel is defined, an off-the-shelf ML model, such as a kernel SVM, can be used to make predictions.
  • 50.  Graph kernels: Measure similarity between two graphs: ▪ Graphlet Kernel [1] ▪ Weisfeiler-Lehman Kernel [2] ▪ Other kernels are also proposed in the literature (beyond the scope of this lecture): ▪ Random-walk kernel ▪ Shortest-path graph kernel ▪ And many more… [1] Shervashidze, Nino, et al. "Efficient graphlet kernels for large graph comparison." Artificial Intelligence and Statistics, 2009. [2] Shervashidze, Nino, et al. "Weisfeiler-Lehman graph kernels." Journal of Machine Learning Research 12.9 (2011).
  • 51.  Goal: Design a graph feature vector 𝜙(𝐺)  Key idea: Bag-of-Words (BoW) for a graph ▪ Recall: BoW simply uses word counts as features for documents (no ordering considered). ▪ Naïve extension to a graph: Regard nodes as words. ▪ Since both example graphs have 4 red nodes, we get the same feature vector for two different graphs: 𝜙(𝐺₁) = 𝜙(𝐺₂)
  • 52. What if we use a Bag of node degrees? Counting nodes of each degree (Deg1, Deg2, Deg3): 𝜙(𝐺₁) = count(Deg1, Deg2, Deg3) = [1, 2, 1] and 𝜙(𝐺₂) = count(Deg1, Deg2, Deg3) = [0, 2, 2], so we obtain different features for different graphs!  Both the Graphlet Kernel and the Weisfeiler-Lehman (WL) Kernel use a Bag-of-* representation of a graph, where * is more sophisticated than node degrees!
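A bag-of-node-degrees feature vector can be sketched as follows. This is an illustration only; the adjacency-dict representation and the `max_deg` cutoff are assumptions, not from the slides:

```python
from collections import Counter

def bag_of_degrees(adj, max_deg=3):
    """Feature vector [#nodes of degree 1, ..., #nodes of degree max_deg]
    for a graph given as an adjacency dict {node: set of neighbors}."""
    c = Counter(len(nbrs) for nbrs in adj.values())
    return [c.get(d, 0) for d in range(1, max_deg + 1)]

# Path graph a-b-c-d: two degree-1 endpoints, two degree-2 middle nodes.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
features = bag_of_degrees(path)  # [2, 2, 0]
```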
  • 53.  Key idea: Count the number of different graphlets in a graph. ▪ Note: The definition of graphlets here is slightly different from the one used for node-level features. ▪ The two differences are: ▪ Nodes in graphlets here do not need to be connected (allowing isolated nodes) ▪ The graphlets here are not rooted ▪ Examples on the next slide illustrate this.
  • 54. Let 𝒢_k = (𝑔₁, 𝑔₂, … , 𝑔_{n_k}) be the list of graphlets of size 𝑘. ▪ For 𝑘 = 3, there are 4 graphlets: 𝑔₁, 𝑔₂, 𝑔₃, 𝑔₄. ▪ For 𝑘 = 4, there are 11 graphlets. (Shervashidze et al., AISTATS 2011)
  • 55.  Given a graph 𝐺 and a graphlet list 𝒢_k = (𝑔₁, 𝑔₂, … , 𝑔_{n_k}), define the graphlet count vector 𝒇_G ∈ ℝ^{n_k} as (𝒇_G)_i = #(𝑔_i ⊆ 𝐺) for 𝑖 = 1, 2, … , n_k.
  • 56.  Example for 𝑘 = 3: for the example graph 𝐺, 𝒇_G = (1, 3, 6, 0)ᵀ, i.e., 1 occurrence of 𝑔₁ (the triangle), 3 of 𝑔₂ (the 2-edge path), 6 of 𝑔₃ (the single edge), and 0 of 𝑔₄ (three isolated nodes).
  • 57.  Given two graphs 𝐺 and 𝐺′, the graphlet kernel is computed as 𝐾(𝐺, 𝐺′) = 𝒇_Gᵀ 𝒇_{G′}  Problem: if 𝐺 and 𝐺′ have different sizes, that will greatly skew the value.  Solution: normalize each feature vector by its total graphlet count, 𝒉_G = 𝒇_G / Σ_i (𝒇_G)_i, giving 𝐾(𝐺, 𝐺′) = 𝒉_Gᵀ 𝒉_{G′}
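A brute-force sketch of the graphlet kernel for 𝑘 = 3, enumerating all node triples. This is an illustration only (real implementations use the faster algorithms discussed on the next slide), and the edge-set graph representation is an assumption:

```python
import numpy as np
from itertools import combinations

def graphlet_counts_k3(nodes, edges):
    """Induced size-3 graphlet counts [g1, g2, g3, g4] =
    [triangle, 2-edge path, single edge + isolated node, 3 isolated nodes]."""
    E = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for trio in combinations(nodes, 3):
        m = sum(frozenset(pair) in E for pair in combinations(trio, 2))
        counts[3 - m] += 1  # 3 edges -> g1, 2 -> g2, 1 -> g3, 0 -> g4
    return np.array(counts, dtype=float)

def graphlet_kernel(fG, fGp):
    """Normalized graphlet kernel: h_G = f_G / sum(f_G), K = h_G . h_G'."""
    return (fG / fG.sum()) @ (fGp / fGp.sum())

triangle = graphlet_counts_k3([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
star = graphlet_counts_k3([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)])
```

The triangle graph yields counts [1, 0, 0, 0]; the 4-node star yields [0, 3, 0, 1] (every triple containing the center induces a 2-edge path; the triple of leaves has no edges).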
  • 58. Limitation: Counting graphlets is expensive!  Counting size-𝑘 graphlets of a graph with 𝑛 nodes by enumeration takes 𝑂(𝑛^𝑘) time.  This is unavoidable in the worst case, since the subgraph isomorphism test (judging whether one graph is a subgraph of another) is NP-hard.  If the graph's node degree is bounded by 𝑑, an 𝑂(𝑛·𝑑^{𝑘−1}) algorithm exists to count all graphlets of size 𝑘. Can we design a more efficient graph kernel?
  • 59.  Goal: Design an efficient graph feature descriptor 𝜙(𝐺)  Idea: Use neighborhood structure to iteratively enrich the node vocabulary. ▪ This is a generalized version of Bag of node degrees, since node degrees are one-hop neighborhood information.  Algorithm to achieve this: color refinement
  • 60.  Given: A graph 𝐺 with a set of nodes 𝑉. ▪ Assign an initial color 𝑐^(0)(𝑣) to each node 𝑣. ▪ Iteratively refine node colors by 𝑐^(𝑘+1)(𝑣) = HASH(𝑐^(𝑘)(𝑣), {𝑐^(𝑘)(𝑢)}_{𝑢∈𝑁(𝑣)}), where HASH maps different inputs to different colors. ▪ After 𝐾 steps of color refinement, 𝑐^(𝐾)(𝑣) summarizes the structure of the 𝐾-hop neighborhood.
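The refinement loop can be sketched as follows. A minimal illustration: HASH is simulated by interning (own color, sorted neighbor colors) signatures in a table, and the adjacency-dict input format is an assumption:

```python
def color_refinement(adj, K):
    """Run K rounds of color refinement on a graph given as an
    adjacency dict {node: list of neighbors}. Returns final node colors."""
    colors = {v: 0 for v in adj}  # uniform initial color
    for _ in range(K):
        table = {}  # plays the role of HASH: same signature -> same color
        new_colors = {}
        for v in adj:
            sig = (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            new_colors[v] = table.setdefault(sig, len(table))
        colors = new_colors
    return colors

# Path graph a-b-c-d: the two endpoints remain indistinguishable from each
# other, but are separated from the two middle nodes after one round.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
cols = color_refinement(path, K=2)
```

Note that to compare two graphs with a WL kernel, the hash table would have to be shared across both graphs, so that identical neighborhood structures receive identical colors.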
  • 61. Example of color refinement given two graphs 𝐺₁ and 𝐺₂ ▪ Assign initial colors: every node in both graphs starts with color 1. ▪ Aggregate neighboring colors: each node's signature becomes (own color, multiset of neighbor colors), e.g., (1, 111) for a degree-3 node or (1, 11) for a degree-2 node. [figure: the two example graphs annotated with per-node aggregated signatures]
  • 62. Example of color refinement given two graphs ▪ Hash the aggregated colors, using the hash table: (1, 1) → 2; (1, 11) → 3; (1, 111) → 4; (1, 1111) → 5. [figure: the two graphs with nodes relabeled by the hashed colors 2-5]
  • 63. Example of color refinement given two graphs ▪ Aggregate neighboring colors again, now over the hashed colors, producing signatures such as (4, 345), (3, 44), (5, 2244), and (2, 5). [figure: the two graphs annotated with second-round aggregated signatures]
  • 64. Example of color refinement given two graphs ▪ Hash the aggregated colors again, using the extended hash table: (2, 4) → 6; (2, 5) → 7; (3, 44) → 8; (3, 45) → 9; (4, 245) → 10; (4, 345) → 11; (5, 2244) → 12; (5, 2344) → 13. [figure: the two graphs with nodes relabeled by the hashed colors 6-13]
  • 65. After color refinement, the WL kernel counts the number of nodes with each color. Over colors 1-13: 𝜙(𝐺₁) = [6, 2, 1, 2, 1, 0, 2, 1, 0, 0, 0, 2, 1] and 𝜙(𝐺₂) = [6, 2, 1, 2, 1, 1, 1, 0, 1, 1, 1, 0, 1].
  • 66. The WL kernel value is computed as the inner product of the color count vectors: 𝐾(𝐺₁, 𝐺₂) = 𝜙(𝐺₁)ᵀ𝜙(𝐺₂) = 49
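The inner product can be checked directly from the color counts of the running example:

```python
import numpy as np

# Color count vectors over colors 1..13 from the running example
phi_G1 = np.array([6, 2, 1, 2, 1, 0, 2, 1, 0, 0, 0, 2, 1])
phi_G2 = np.array([6, 2, 1, 2, 1, 1, 1, 0, 1, 1, 1, 0, 1])

K = phi_G1 @ phi_G2  # inner product of the color counts -> 49
```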
  • 67.  The WL kernel is computationally efficient ▪ The time complexity of each color-refinement step is linear in #(edges), since it only aggregates neighboring colors.  When computing a kernel value, only colors that appear in the two graphs need to be tracked. ▪ Thus, #(colors) is at most the total number of nodes.  Counting colors takes time linear in #(nodes).  In total, the time complexity is linear in #(edges).
  • 68.  Graphlet Kernel ▪ Graph is represented as a Bag-of-graphlets ▪ Computationally expensive  Weisfeiler-Lehman Kernel ▪ Apply a 𝐾-step color refinement algorithm to enrich node colors ▪ Different colors capture different 𝐾-hop neighborhood structures ▪ Graph is represented as a Bag-of-colors ▪ Computationally efficient ▪ Closely related to Graph Neural Networks (as we will see!)
  • 69.  Traditional ML pipeline ▪ Hand-crafted features + ML model  Hand-crafted features for graph data ▪ Node-level: ▪ Node degree, centrality, clustering coefficient, graphlets ▪ Link-level: ▪ Distance-based features ▪ Local/global neighborhood overlap ▪ Graph-level: ▪ Graphlet kernel, WL kernel