Slide 12
• Text representation, e.g., word and document representation, …
• Deep learning has been attracting increasing attention …
• A future direction of deep learning is to integrate unlabeled data …
• The Skip-gram model is quite effective and efficient …
[Figure: word cloud of key terms: text, word, document, embedding, classification, network, node, edge, degree]
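The Skip-gram model is only named on the slide; as a concrete illustration, here is a minimal, hypothetical sketch of Skip-gram with negative sampling in PyTorch. The vocabulary size, embedding dimension, number of negatives, and the random toy batch are assumptions for illustration, not the slides' setup.

# Minimal Skip-gram with negative sampling (illustrative sketch, not the slides' code).
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, dim)    # center-word vectors
        self.out_embed = nn.Embedding(vocab_size, dim)   # context-word vectors

    def forward(self, center, context, negatives):
        v = self.in_embed(center)                        # (B, d)
        u_pos = self.out_embed(context)                  # (B, d)
        u_neg = self.out_embed(negatives)                # (B, k, d)
        pos = torch.log(torch.sigmoid((v * u_pos).sum(-1)) + 1e-10)
        neg = torch.log(torch.sigmoid(-(u_neg @ v.unsqueeze(-1)).squeeze(-1)) + 1e-10).sum(-1)
        return -(pos + neg).mean()                       # negative-sampling objective

# Toy usage: a vocabulary of 100 word ids, a batch of (center, context, 5 negatives).
model = SkipGram(vocab_size=100)
center = torch.randint(0, 100, (8,))
context = torch.randint(0, 100, (8,))
negatives = torch.randint(0, 100, (8, 5))
loss = model(center, context, negatives)
loss.backward()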
Slide 40
• Information networks encode the relationships between the data objects …
[Figure: a text information network linking words (text, information, network, word, …, classification) to documents doc_1, doc_2, doc_3, doc_4, …]
Slide 59
[Figure: the text information network extended with labels: words (text, information, network, word, …, classification) link both to documents doc_1 … doc_4 and to labels label_1, label_2, label_3; some documents carry labels while others are unlabeled (null)]
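To make the figure concrete, here is a minimal, hypothetical sketch (not from the slides) of assembling such word-document and word-label networks from a toy corpus in which some documents are unlabeled; the corpus, identifiers, and edge weighting by raw counts are assumptions.

# Illustrative sketch: build word-document and word-label bipartite networks
# from a toy, partially labeled corpus: (doc id, text, label or None).
from collections import Counter

corpus = [
    ("doc_1", "text information network word classification", "label_1"),
    ("doc_2", "word embedding text network", "label_2"),
    ("doc_3", "information network node edge", None),    # unlabeled ("null") document
    ("doc_4", "document classification text", None),
]

word_doc_edges = Counter()    # (word, doc)   -> term frequency
word_label_edges = Counter()  # (word, label) -> co-occurrence count

for doc_id, text, label in corpus:
    for word in text.split():
        word_doc_edges[(word, doc_id)] += 1
        if label is not None:         # only labeled documents contribute word-label edges
            word_label_edges[(word, label)] += 1

print(word_doc_edges[("text", "doc_1")])      # 1
print(word_label_edges[("word", "label_2")])  # 1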
Slide 101
Efficient graphlet kernels for large graph comparison
[Figure 2: All graphlets of size 4, denoted F1 through F11]
We now consider size 4 graphlets. Modulo isomorphism there are 11 graphlets of size 4 (see Figure 2). Let us denote these graphlets Fi and their counts |Fi|, i ∈ {1, 2, …, 11}. As in the previous case, we will first count all graphlets which contain at least one edge.
Assume we want to count subgraphs containing edge (v1, v2). As before, for v2 there are |N(v1)| choices and for each pair (v1, v2) we have 4 cases for the third node v3: v3 ∈ N(v1) ∩ N(v2), v3 ∈ N(v1) \ N(v2), …
… these graphlets by 2.

Table 1: Statistics on classification datasets
dataset    size    classes
MUTAG       188    2 (125 vs. 63)
PTC         344    2 (192 vs. 152)
Enzyme      600    6 (100 each)
D & D      1178    2 (691 vs. 487)
5 Experiments
In this section, we evaluate the graphlet kernel and compare it with state-of-the-art graph kernels in terms of runtime, scalability, and accuracy. Our baseline comparators are the random walk kernel of (Gärtner et al., 2004; Vishwanathan et al., …), which counts common walks in two graphs, and the shortest path kernel of (Borgwardt & Kriegel, …), which compares shortest path lengths in two graphs.
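To illustrate what a graphlet kernel computes, here is a small brute-force sketch (not the paper's counting scheme, which is designed to avoid exactly this enumeration), assuming NetworkX is available: it enumerates all induced 4-node subgraphs, groups them by isomorphism type via an edge-count plus degree-sequence signature (sufficient to separate the 11 size-4 graphlets), and takes the kernel value as the dot product of the normalized count vectors.

# Illustrative brute-force graphlet kernel (exponential in graph size).
from itertools import combinations
import networkx as nx

def graphlet_frequencies(G, k=4):
    """Brute-force counts of induced k-node subgraphs, grouped by isomorphism type."""
    counts = {}
    for nodes in combinations(G.nodes(), k):
        sub = G.subgraph(nodes)
        # Edge count + degree sequence is enough to separate the 11 graphlets of size 4.
        sig = (sub.number_of_edges(), tuple(sorted(d for _, d in sub.degree())))
        counts[sig] = counts.get(sig, 0) + 1
    total = sum(counts.values())
    return {sig: c / total for sig, c in counts.items()}   # normalized frequency vector

def graphlet_kernel(G1, G2, k=4):
    """Kernel value: dot product of the two normalized graphlet-frequency vectors."""
    f1, f2 = graphlet_frequencies(G1, k), graphlet_frequencies(G2, k)
    return sum(f1[s] * f2.get(s, 0.0) for s in f1)

# Toy usage on two small random graphs.
G1 = nx.gnp_random_graph(10, 0.3, seed=1)
G2 = nx.gnp_random_graph(10, 0.3, seed=2)
print(graphlet_kernel(G1, G2))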
Slide 114
• How to assemble by end-to-end learning?
• We can adapt deep learning methods developed for text
Slide 115

[Figure: end-to-end architecture. (a) Sample through random walks: from a graph on nodes A…F, K node sequences of T nodes each are drawn. (b) Sequence input and node embedding: a sampled sequence such as A B C F is mapped to node embeddings x1, x2, x3, x4. (c) Bi-directional GRU: the embedded sequence is encoded into hidden states h1, h2, h3, h4. (d) Attention followed by dense layers produces the output.]
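As a rough, hypothetical sketch of the pipeline in the figure (random-walk sampling, node embedding, bi-directional GRU, attention, dense output) in PyTorch; the toy graph, walk length, dimensions, and the attention-pooling details are assumptions rather than the original model.

# Minimal sketch: random-walk sampling -> node embedding -> bi-directional GRU
# -> attention pooling -> dense output. All hyperparameters are illustrative.
import random
import torch
import torch.nn as nn

def random_walk(adj, start, length):
    """Sample one node sequence of `length` nodes by a simple random walk."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

class WalkEncoder(nn.Module):
    def __init__(self, num_nodes, embed_dim=32, hidden_dim=64, out_dim=2):
        super().__init__()
        self.embed = nn.Embedding(num_nodes, embed_dim)            # node embedding
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                          bidirectional=True)                      # bi-directional GRU
        self.att = nn.Linear(2 * hidden_dim, 1)                    # attention scores
        self.dense = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim),
                                   nn.ReLU(), nn.Linear(hidden_dim, out_dim))

    def forward(self, walks):                                      # walks: (K, T) node ids
        h, _ = self.gru(self.embed(walks))                         # (K, T, 2*hidden)
        w = torch.softmax(self.att(h), dim=1)                      # attention over the T steps
        pooled = (w * h).sum(dim=1)                                # (K, 2*hidden)
        return self.dense(pooled)                                  # one output per sequence

# Toy graph on 6 nodes (0..5); sample K=4 walks of T=4 nodes and run the model.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
walks = torch.tensor([random_walk(adj, start=0, length=4) for _ in range(4)])
model = WalkEncoder(num_nodes=6)
print(model(walks).shape)  # torch.Size([4, 2])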