1. Fast Random Walk with Restart and Its Applications
Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan
ICDM 2006, Dec. 18-22, Hong Kong
2. 2
Motivating Questions
• Q: How to measure the relevance?
• A: Random walk with restart
• Q: How to do it efficiently?
• A: This talk tries to answer!
4. 4
Random walk with restart
Query node: 4
Node     Score
Node 1   0.13
Node 2   0.10
Node 3   0.13
Node 4   0.22
Node 5   0.13
Node 6   0.05
Node 7   0.05
Node 8   0.08
Node 9   0.04
Node 10  0.03
Node 11  0.04
Node 12  0.02
[Figure: the 12-node example graph, each node shaded by its score]
Ranking vector r_4
More red, more relevant
Nearby nodes, higher scores
7. 7
Test Image
[Figure: test image with Image, Region, and Keyword node layers]
Keywords: Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass
Test image: {Grass, Forest, Cat, Tiger}
11. 11
CePS: Example
[Figure: subgraph over authors, with numeric edge labels]
Authors shown: R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan, H.V. Jagadish, Laks V.S. Lakshmanan, Heikki Mannila, Christos Faloutsos, Padhraic Smyth, Corinna Cortes, Daryl Pregibon
12. 12
Other Applications
• Content-based Image Retrieval [He]
• Personalized PageRank [Jeh], [Widom], [Haveliwala]
• Anomaly Detection (for node; link) [Sun]
• Link Prediction [Getoor], [Jensen]
• Semi-supervised Learning [Zhu], [Zhou]
• …
13. 13
Roadmap
• Background
– RWR: Definitions
– RWR: Algorithms
• Basic Idea
• FastRWR
– Pre-Compute Stage
– On-Line Stage
• Experimental Results
• Conclusion
14. 14
Computing RWR
[Figure: the 12-node example graph]
$$ r_i = c W r_i + (1 - c) e_i $$
r_i: ranking vector (n x 1)
W: adjacency matrix (n x n), column-normalized
e_i: starting vector (n x 1)
1 - c: restart probability
Example for query node 4, with c = 0.9 and 1 - c = 0.1:
W (rows and columns are nodes 1 to 12) =
0 1/3 1/3 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 0 0 1/4 0 0 0 0
1/3 1/3 0 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 1/4 0 0 0 0 0 0 0
0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0
0 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0
0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0
0 0 0 0 0 0 0 1/4 0 1/3 0 0
0 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0 0 0 1/4 0 1/3 0 1/2
0 0 0 0 0 0 0 0 0 1/3 1/3 0
e_4 = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)^T
r_4 = (0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02)^T
16. 16
• Q: Given query i, how to solve it?
$$ r_i = 0.9 \, W r_i + 0.1 \, e_i $$
The ranking vector r_i, marked "?" on the slide, is the unknown on both sides; W is the 12 x 12 matrix of the example graph and e_i is the starting vector.
17. 17
OnTheFly:
[Figure: the 12-node example graph, re-shaded after each iteration]
Iterate r_i <- 0.9 W r_i + 0.1 e_i, starting from r_i = e_i (query node 4):
Node  start  iter 1  iter 2  iter 3  iter 4  iter 5  final
1     0      0.3     0.12    0.19    0.14    0.16    0.13
2     0      0       0.18    0.09    0.13    0.10    0.10
3     0      0.3     0.12    0.19    0.14    0.16    0.13
4     1      0.1     0.35    0.18    0.26    0.21    0.22
5     0      0.3     0.03    0.18    0.10    0.15    0.13
6     0      0       0.07    0.04    0.06    0.05    0.05
7     0      0       0.07    0.04    0.06    0.05    0.05
8     0      0       0.07    0.06    0.08    0.07    0.08
9     0      0       0       0.02    0.01    0.02    0.04
10    0      0       0       0       0.01    0.01    0.03
11    0      0       0       0.02    0.01    0.02    0.04
12    0      0       0       0       0       0.01    0.02
No pre-computation / light storage
Slow on-line response: O(mE)
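The OnTheFly iteration can be sketched in NumPy as follows. The edge list is read off the nonzero entries of the slide's 12 x 12 matrix; the function names, the tolerance, and the iteration cap are illustrative assumptions, not part of the talk.

```python
import numpy as np

# Undirected edges of the 12-node example graph (1-based node ids),
# read off the nonzero pattern of the slide's matrix.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]


def column_normalized_adjacency(edges, n):
    """Build the symmetric 0/1 adjacency matrix, then scale each column to sum to 1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0)


def rwr_on_the_fly(W, query, c=0.9, tol=1e-10, max_iter=1000):
    """Iterate r <- c W r + (1 - c) e from r = e until the L1 change drops below tol."""
    e = np.zeros(W.shape[0])
    e[query] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


W = column_normalized_adjacency(EDGES, 12)
r4 = rwr_on_the_fly(W, query=3)  # node 4 is index 3 (0-based)
print(np.round(r4, 2))
```

Each pass costs one sparse-style matrix-vector product, which is where the O(mE) on-line cost comes from when m passes are needed over E edges. Calling the function with `max_iter=1` and `max_iter=2` gives the first two iterates, which begin 0.3, 0, 0.3, 0.1, 0.3 and 0.12, 0.18, 0.12, 0.35 (after rounding).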
18. 18
PreCompute [Haveliwala]
R: the matrix whose columns are the ranking vectors r_1, r_2, ..., r_12, one per query node
[Figure: the 12-node example graph; the ranking vector for query node 4 is read off as one column of R]
19. 19
2.20 1.28 1.43 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.34
1.28 2.02 1.28 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.45
1.43 1.28 2.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.33
1.29 0.96 1.29 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.32
0.91 0.86 0.91 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.56
0.37 0.35 0.37 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22
0.37 0.35 0.37 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22
0.84 1.14 0.84 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.13
0.29 0.40 0.29 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.79
0.35 0.48 0.35 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.80
0.39 0.53 0.39 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.72
0.22 0.30 0.22 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05
PreCompute:
[Figure: the 12-node example graph shaded by the scores for query node 4: 0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02]
Fast on-line response
Heavy pre-computation/storage cost: O(n^3) / O(n^2)
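PreCompute amounts to solving the same equation for every query at once: rearranging r_i = c W r_i + (1 - c) e_i gives r_i = (1 - c)(I - cW)^{-1} e_i, so all ranking vectors are columns of one inverse. A minimal sketch on the example graph, assuming a dense inverse is affordable; the function names are hypothetical.

```python
import numpy as np

# Undirected edges of the 12-node example graph (1-based node ids),
# read off the nonzero pattern of the slide's matrix.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]


def column_normalized_adjacency(edges, n):
    """Build the symmetric 0/1 adjacency matrix, then scale each column to sum to 1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0)


def precompute_all(W, c=0.9):
    """Return R whose i-th column is the ranking vector for query node i.

    Dense inversion takes O(n^3) time and the result needs O(n^2) storage.
    """
    n = W.shape[0]
    return (1 - c) * np.linalg.inv(np.eye(n) - c * W)


W = column_normalized_adjacency(EDGES, 12)
R = precompute_all(W)
r4 = R[:, 3]  # on-line stage: a single column lookup for node 4
print(np.round(r4, 2))
```

The on-line step is just a column read, which is why the response is fast, while the inverse itself is what makes the method heavy for large n.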
22. 22
Basic Idea
[Figure: the 12-node example graph, redrawn at each step]
Find Community
Fix the remaining
Combine
24. 24
On-Line Query Stage
• Q: Efficiently recover one column of Q^{-1}
• A: A few, instead of MANY, matrix-vector multiplications
[Figure: the starting vector e_i goes in, the ranking vector r_i comes out]
26. 26
Pre-compute Stage
• P1: B_Lin Decomposition
– P1.1 partition
– P1.2 low-rank approximation
• P2: Q matrices
– P2.1 computing Q_1^{-1} (for each partition)
– P2.2 computing (for concept space)
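The slides do not spell out the formulas behind these steps, so the following is only a sketch of one standard way to realize them: keep within-partition links in W1, approximate the cross-partition links W2 by a truncated SVD, and combine the pieces with the Sherman-Morrison-Woodbury identity. The partition is hand-picked for the example graph, all names are hypothetical, and `Lam` is an assumed reading of the "concept space" matrix in P2.2.

```python
import numpy as np

EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
# Assumed partition of the example graph (0-based ids); not given in the slides.
PARTS = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10, 11]]


def column_normalized_adjacency(edges, n):
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0)


def precompute(W, parts, c=0.9, sv_tol=1e-10):
    # P1.1: split W into within-partition links W1 and cross-partition links W2.
    W1 = np.zeros_like(W)
    for p in parts:
        idx = np.ix_(p, p)
        W1[idx] = W[idx]
    W2 = W - W1
    # P1.2: low-rank approximation W2 ~ U diag(s) Vt (here: all nonzero singular values).
    U, s, Vt = np.linalg.svd(W2)
    t = int((s > sv_tol).sum())
    U, s, Vt = U[:, :t], s[:t], Vt[:t, :]
    # P2.1: invert I - c*W1 one diagonal block at a time.
    Q1_inv = np.zeros_like(W)
    for p in parts:
        idx = np.ix_(p, p)
        Q1_inv[idx] = np.linalg.inv(np.eye(len(p)) - c * W1[idx])
    # P2.2: small t x t matrix from the Woodbury identity.
    Lam = np.linalg.inv(np.diag(1.0 / s) - c * (Vt @ Q1_inv @ U))
    return Q1_inv, U, Lam, Vt


def query(pre, i, c=0.9):
    """On-line stage: a few matrix-vector products, no iteration."""
    Q1_inv, U, Lam, Vt = pre
    e = np.zeros(Q1_inv.shape[0])
    e[i] = 1.0
    r0 = Q1_inv @ e
    return (1 - c) * (r0 + c * (Q1_inv @ (U @ (Lam @ (Vt @ r0)))))


W = column_normalized_adjacency(EDGES, 12)
r4 = query(precompute(W, PARTS), 3)
print(np.round(r4, 2))
```

Because the sketch keeps every nonzero singular value of W2, the result matches the exact solution on this small graph; truncating to fewer singular values trades accuracy for less pre-computation and storage.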
32. 32
Comparing Q_1^{-1} and Q^{-1}
• Computing Time
– 100,000 nodes; 100 partitions
– Computing Q_1^{-1} is 10,000x faster!
• Storage Cost
– 100x saving!
$$ Q_1^{-1} = \mathrm{diag}\left(Q_{1,1}^{-1}, Q_{1,2}^{-1}, \ldots, Q_{1,k}^{-1}\right) $$
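The two ratios follow from simple counting, sketched below under two assumptions that the slide does not state explicitly: dense inversion costs on the order of n^3 operations, and the k partitions have equal size. The function name is hypothetical.

```python
def block_diagonal_savings(n, k):
    """Compare inverting one n x n matrix with inverting k blocks of size n/k.

    Assumes cubic-time dense inversion and k equal-size partitions.
    Returns (time ratio, storage ratio).
    """
    block = n // k
    full_time, full_storage = n ** 3, n ** 2
    block_time, block_storage = k * block ** 3, k * block ** 2
    return full_time / block_time, full_storage / block_storage


time_ratio, storage_ratio = block_diagonal_savings(100_000, 100)
print(time_ratio, storage_ratio)  # 10000.0 100.0
```

In general the time ratio is k^2 and the storage ratio is k, so more partitions help as long as the links cut between them stay few.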
33. 33
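• Q: How to fix the green portions?
[Figure: block view of the matrices, with some portions shown in green]
$$ \tilde{W} = \tilde{W}_1 + \tilde{W}_2 $$
$$ Q^{-1} = Q_1^{-1} + \; ? $$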
41. 41
Experimental Setup
• Dataset
– DBLP/authorship
– Author-Paper
– 315k nodes
– 1,800k edges
• Approx. Quality: Relative Accuracy
• Application: Center-Piece Subgraph
42. 42
Query Time vs. Pre-Compute Time
[Plot: log query time against log pre-compute time]
• Quality: 90%+
• On-line: up to 150x speedup
• Pre-computation: two orders of magnitude saving
43. 43
Query Time vs. Pre-Storage
[Plot: log query time against log storage]
• Quality: 90%+
• On-line: up to 150x speedup
• Pre-storage: three orders of magnitude saving
45. 45
Conclusion
• FastRWR
– Reasonable quality preservation (90%+)
– 150x speed-up: query time
– Orders of magnitude saving: pre-compute & storage
• More in the paper
– A variant of FastRWR and theoretical justification
– Implementation details
• normalization, low-rank approximation, sparse
– More experiments
• Other datasets, other applications