Speaker: Kijung Shin (Ph.D. student, CMU)
Date: January 2018
Graph data are everywhere: the Web, social media, review data, and more. Many such datasets are terabytes or larger and change continuously. They also carry rich side information and are therefore represented as tensors, i.e., multi-dimensional arrays.
This talk introduces algorithms for analyzing the structure of such large, dynamic graph and tensor data and for detecting anomalies in it. In particular, it focuses on a method that accurately estimates the number of triangles in a graph and a method that detects abnormally dense regions.
The algorithms use sampling and approximation techniques and are designed for distributed and streaming environments. They can be applied to detecting fake influencers on social media, fake reviews, network attacks, and more.
4. Properties of Real-world Graphs
• Large: many nodes, more edges
Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 4/87
[Examples: services with 2B+ users, 0.5B+ products, 0.5B+ users, and 0.3B+ buyers]
• Dynamic: additions/deletions of nodes and edges
5. Properties of Real-world Graphs
• Rich with Attributes: timestamps, scores, text, etc.
6. Matrices for Graphs
[Figure: a graph and its adjacency matrix]
7. Tensors for Rich Graphs
• Tensors: multi-dimensional arrays
◦ 3-order tensor (3-dimensional array)
◦ + stars → 4-order tensor
◦ + text → 5-order tensor
◦ …
8. Research Goals
• Goal: to understand Large Dynamic Graphs and Tensors
• Sub-Tasks:
◦ Structure Analysis
◦ Anomaly Detection
◦ Behavior Modeling (on user behavior)
9. Research Goals
[Diagram contrasting the three sub-tasks: Structure, Anomaly & Fraud, and User Behavior]
10. Overview of My Research
• Structure Analysis
◦ Graphs: Triangle Count [ICDM17], Degeneracy [ICDM16][KAIS18]
◦ Tensors: PARAFAC [ICDM14][TKDE17], Tucker [WSDM17]
• Anomaly Detection
◦ Graphs: Bipartite Graph [KDD16][TKDD17], Unipartite Graph [ICDM16][KAIS18]
◦ Tensors: Static Tensor [PKDD16][TKDD18][WSDM17][PKDD17], Dynamic Tensor [KDD17]
• Behavior Modeling
◦ Purchase [IJCAI17], Progression [WWW18]
• Other work: Similar Node Search [SIGMOD15][TODS16]
11. Roadmap
• Overview
• #1: Counting Triangles in Graph Stream <<
• #2: Detecting Dense Subtensors
• Conclusion
12. WRS: Waiting Room Sampling
for Accurate Triangle Counting
in Real Graph Streams
Kijung Shin
13. Triangles in a Graph
• Graphs are everywhere!
◦ Social Network, Web, Emails, etc.
• Triangles are a fundamental primitive
◦ 3 nodes connected to each other
• Counting triangles has many applications
◦ Community detection, Spam detection, Query optimization
14. Challenges in Real Graphs
• Many algorithms exist for small and fixed graphs
• However, real-world graphs are
◦ Large: may not fit in main memory
◦ Growing: new nodes and edges are added
• Need to consider realistic settings
15. Streaming Model
• Stream of edges
◦ edges are streamed one by one from sources
• Any or adversarial order (too strong an assumption in practice)
◦ edges can be in any order in the stream
• Limited memory
◦ not every edge can be stored in memory
16. Relaxed Streaming Model
• Stream of edges
◦ edges are streamed one by one from sources
• Chronological order
◦ natural for dynamic graphs
◦ edges are streamed when they are created
• Limited memory
◦ not every edge can be stored in memory
“What temporal patterns exist?”
“How can we exploit them for accurate triangle counting?”
18. Time Interval of a Triangle
• Time interval of a triangle:
(arrival order of its last edge) − (arrival order of its first edge)
[Figure: eight edges arriving in order 1–8; the interval spans from a triangle’s first edge to its last]
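The definition above can be checked offline against a full edge list. A minimal sketch (the helper name and the brute-force enumeration of node triples are illustrative, not from the talk):

```python
from itertools import combinations

def triangle_time_intervals(edges):
    """Time interval of each triangle: arrival order of its last
    edge minus arrival order of its first edge. `edges` is a list
    of (u, v) pairs in arrival order (1-indexed internally)."""
    first_seen = {}
    for order, (u, v) in enumerate(edges, start=1):
        first_seen.setdefault(frozenset((u, v)), order)
    nodes = {n for e in edges for n in e}
    intervals = []
    # brute force over node triples: fine for a sanity check only
    for u, v, w in combinations(sorted(nodes), 3):
        sides = [frozenset(p) for p in ((u, v), (v, w), (u, w))]
        if all(s in first_seen for s in sides):
            orders = [first_seen[s] for s in sides]
            intervals.append(max(orders) - min(orders))
    return intervals
```

For the stream (1, 2), (2, 3), (1, 3), (3, 4), the only triangle {1, 2, 3} has its edges at arrival orders 1, 2, 3, so its time interval is 2.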
19. Time Interval Distribution
• Temporal Locality: the average time interval is about 2× shorter in the chronological order than in a random order
[Plot: time-interval distributions under random vs. chronological arrival order]
20. Temporal Locality (cont.)
• One interpretation:
◦ edges are more likely to form triangles with edges close in time than with edges far in time
• Another interpretation:
◦ new edges are more likely to form triangles with recent edges than with old edges
“How can we exploit temporal locality
for accurate triangle counting?”
22. Problem Definition
• Given:
◦ a graph stream in the chronological order
◦ memory budget: 𝑘 (i.e., up to 𝑘 edges can be stored)
• Estimate: counts of global and local triangles
• To Minimize: estimation error
• Global triangles: all triangles in the graph
• Local triangles: the triangles incident to each node
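For intuition, here are exact global and local counts on a small in-memory graph (a brute-force sketch with illustrative names, only to make the two definitions concrete):

```python
from itertools import combinations
from collections import Counter

def exact_triangle_counts(edges):
    """Global triangle count plus per-node local counts, by brute force."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    local = Counter()
    global_count = 0
    # each triple u < v < w is examined exactly once
    for u, v, w in combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            global_count += 1
            local.update((u, v, w))   # local count: triangles incident to each node
    return global_count, dict(local)
```

On the complete graph over nodes 1–4, this returns a global count of 4, with each node incident to 3 triangles.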
23. Algorithm Overview
• General approach used in [LK15] [DERU16]
◦ ∆: our estimate of the global triangle count
◦ 𝑝_𝑢𝑣𝑤: probability that triangle (𝑢, 𝑣, 𝑤) is discovered
• When a new edge 𝑢 − 𝑣 arrives:
(1) Edge Arrival: the new edge 𝑢 − 𝑣 arrives while a sample of past edges is kept in memory
(2) Counting Step: for each common neighbor 𝑥 of 𝑢 and 𝑣 among the sampled edges, a triangle is discovered and ∆ ← ∆ + 1/𝑝_𝑢𝑣𝑥
(3) Sampling Step: decide whether to store the new edge in memory (to be explained)
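The three steps above can be sketched with plain uniform reservoir sampling, under which two specific earlier edges both survive in a size-𝑘 sample of 𝑡 − 1 edges with probability 𝑘(𝑘 − 1)/((𝑡 − 1)(𝑡 − 2)). [LK15] and [DERU16] refine the sampling scheme, so this is an illustrative baseline, not their exact algorithms:

```python
import random
from collections import defaultdict

def streaming_triangle_estimate(edge_stream, k, seed=0):
    """Estimate the global triangle count from an edge stream,
    keeping at most k edges via uniform reservoir sampling.
    Each discovered triangle is counted with weight 1/p, where p
    is the probability that both of its earlier edges survived."""
    rng = random.Random(seed)
    sample = []                   # reservoir of at most k edges
    neighbors = defaultdict(set)  # adjacency over sampled edges
    estimate, t = 0.0, 0
    for u, v in edge_stream:
        t += 1
        # (2) counting step: triangles closed by the new edge u-v
        p = 1.0 if t <= k + 1 else (k / (t - 1)) * ((k - 1) / (t - 2))
        for _w in neighbors[u] & neighbors[v]:
            estimate += 1.0 / p
        # (3) sampling step: standard reservoir sampling
        if len(sample) < k:
            sample.append((u, v))
            neighbors[u].add(v); neighbors[v].add(u)
        else:
            j = rng.randrange(t)
            if j < k:   # replace a uniformly random slot
                ou, ov = sample[j]
                neighbors[ou].discard(ov); neighbors[ov].discard(ou)
                sample[j] = (u, v)
                neighbors[u].add(v); neighbors[v].add(u)
    return estimate
```

With a budget larger than the stream, nothing is evicted (every p = 1), so the estimate equals the exact count.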
28. Bias and Variance Analyses
• Theorem (Unbiasedness of our estimate): Bias[∆] = E[∆] − (true count) = 0
• Theorem (Variance of our estimate): Var[∆] ≈ Σ_(𝑢,𝑣,𝑤) (1/𝑝_𝑢𝑣𝑤 − 1)
• To reduce variance, we should increase the discovering probability 𝑝_𝑢𝑣𝑤
◦ which depends on the sampling algorithm
29. Increasing Discovering Prob.
• Recall Temporal Locality:
◦ new edges are more likely to form triangles with recent edges than with old edges
• Waiting-Room Sampling (WRS)
◦ exploits temporal locality by treating recent edges better than old edges
“How can we increase
discovering probabilities of triangles?”
30. Waiting-Room Sampling (WRS)
• Divides memory space into two parts
◦ Waiting Room: stores latest edges
◦ Reservoir: stores samples from the remaining edges
[Figure: a new edge 𝑒80 enters the Waiting Room (FIFO, 𝛼% of the budget), currently holding 𝑒79 𝑒78 𝑒77 𝑒76; edges pushed out of it may enter the Reservoir (random replace, (100 − 𝛼)% of the budget), currently holding 𝑒61 𝑒7 𝑒18 𝑒25 𝑒40 𝑒1 𝑒28]
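The two-part memory layout can be sketched as follows, assuming “random replace” means that every edge pushed out of the waiting room enters the reservoir, evicting a uniformly random edge once the reservoir is full (the class and method names are illustrative; the paper’s exact eviction rule and the induced discovery probabilities are not reproduced here):

```python
import random
from collections import deque

class WaitingRoomSample:
    """Sketch of the WRS memory layout: a FIFO waiting room for the
    latest edges (an alpha fraction of budget k) plus a reservoir of
    older edges with random-replacement eviction."""
    def __init__(self, k, alpha=0.1, seed=0):
        self.room_cap = max(1, int(alpha * k))
        self.res_cap = k - self.room_cap
        self.room = deque()       # newest edges, FIFO order
        self.reservoir = []       # sample of the older edges
        self.rng = random.Random(seed)

    def add(self, edge):
        self.room.append(edge)
        if len(self.room) > self.room_cap:
            old = self.room.popleft()   # oldest edge leaves the room
            if len(self.reservoir) < self.res_cap:
                self.reservoir.append(old)
            else:  # random replace: evict a uniformly random edge
                self.reservoir[self.rng.randrange(self.res_cap)] = old

    def edges(self):
        return list(self.room) + self.reservoir
```

After streaming 100 edges with k = 10 and α = 0.4, the waiting room holds exactly the 4 latest edges and the reservoir a sample of 6 older ones, so recent edges are always available for the counting step.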
35. Experimental Settings
• Competitors: Mascot [YK15], Triest-IMPR [DERU16]
• Datasets: citation, email, and friendship networks
• Memory budget 𝑘: 10% of the edges
• Size of the waiting room: 10% of the memory budget
36. Distribution of Estimates
• Waiting Room Sampling (WRS) gives
◦ unbiased estimates
◦ with smallest variance
[Plot: distributions of estimates by WRS, Triest-IMPR, and MASCOT around the true count]
37. Discovering Probability
• WRS increases the discovering probability 𝑝_𝑢𝑣𝑤
• WRS discovers up to 3× more triangles
[Plot: discovering probabilities of WRS, Triest-IMPR, and MASCOT]
38. Estimation Errors
• WRS is most accurate
• WRS reduces estimation error by up to 47%
44. Motivation: Review Fraud
[Figure: a user (Alice) and review pages of restaurants (Alice’s, Bob’s, Carol’s)]
45. Fraud Forms Dense Blocks
[Figure: fraudulent accounts reviewing the same restaurants form a dense block in the Accounts × Restaurants adjacency matrix]
46. Problem: Natural Dense Blocks
• In the adjacency matrix, a suspicious dense block formed by fraudsters can look like a natural dense block (a core, a community, etc.)
• Question: how can we distinguish them?
47. Solution: Tensor Modeling
• Natural dense blocks are sparse along the time axis (formed gradually)
• Suspicious dense blocks are also dense along the time axis (due to synchronized behavior)
• Hence suspicious dense blocks are denser than natural dense blocks in the tensor model
[Figure: modeling reviews as an Accounts × Restaurants × Time tensor]
48. Solution: Tensor Modeling (Cont.)
• Any side information (e.g., IP address, keywords, number of stars) can be used instead of, or in addition to, time in review datasets
• Using multiple kinds of side information ⟹ high-order tensors
49. Anomaly/Fraud in Tensors
• Dense blocks signal anomalies/fraud in many tensor datasets:
◦ TCP dumps (Src IP × Dst IP)
◦ Wikipedia revision history (User × Page)
◦ time-evolving social networks (Src User × Dst User)
50. Research Question
“Given a large-scale high-order tensor,
how can we find dense subtensors in it?”
51. Road Map
• Overview
• #1: Counting Triangles in Graph Stream
• #2: Detecting Dense Subtensors
◦ Introduction
◦ Proposed Method: M-Zoom
Terminologies and Problem Definition <<
Algorithm
◦ Experiments
◦ Conclusion
• Conclusion
52. Terminologies
• Consider a block (subtensor) 𝑩 with side lengths 𝐼1, 𝐼2, 𝐼3 in a 3-way tensor 𝑹
◦ 𝑺𝒊𝒛𝒆(𝐵) = 𝐼1 + 𝐼2 + 𝐼3
◦ 𝑽𝒐𝒍(𝐵) = 𝑉𝑜𝑙𝑢𝑚𝑒(𝐵) = 𝐼1 × 𝐼2 × 𝐼3
◦ 𝑴𝒂𝒔𝒔(𝐵) = sum of the entries in 𝐵
53. Density Measures
• Density measures:
◦ Traditional density: ρ_𝑇(𝐵) = 𝑀𝑎𝑠𝑠(𝐵) / 𝑉𝑜𝑙(𝐵)
   (maximized by a single entry with the maximum value)
◦ Arithmetic avg. degree: ρ_𝐴(𝐵) = 𝑀𝑎𝑠𝑠(𝐵) / 𝑆𝑖𝑧𝑒(𝐵)
◦ Geometric avg. degree: ρ_𝐺(𝐵) = 𝑀𝑎𝑠𝑠(𝐵) / 𝑉𝑜𝑙(𝐵)^(1/3)
◦ Suspiciousness (Jiang et al. 2015): ρ_𝑆(𝐵, 𝑅) = 𝑉𝑜𝑙(𝐵) · 𝐷_𝐾𝐿(𝑀𝑎𝑠𝑠(𝐵)/𝑉𝑜𝑙(𝐵) ∥ 𝑀𝑎𝑠𝑠(𝑅)/𝑉𝑜𝑙(𝑅))
• Note that our method is not limited to specific density measures
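The measures above are easy to compute from a block’s mass and side lengths. A sketch, assuming the suspiciousness uses the Poisson-rate form of the KL divergence, 𝐷_𝐾𝐿(𝑎 ∥ 𝑏) = 𝑎·log(𝑎/𝑏) − 𝑎 + 𝑏 (an assumption, as are the function names):

```python
import math

def size(shape):    # Size(B) = I1 + I2 + I3
    return sum(shape)

def volume(shape):  # Vol(B) = I1 * I2 * I3
    return math.prod(shape)

def rho_T(mass, shape):  # traditional density
    return mass / volume(shape)

def rho_A(mass, shape):  # arithmetic average degree
    return mass / size(shape)

def rho_G(mass, shape):  # geometric average degree (N-th root of volume)
    return mass / volume(shape) ** (1 / len(shape))

def rho_S(mass_B, shape_B, mass_R, shape_R):
    """Suspiciousness (Jiang et al. 2015): Vol(B) times the KL
    divergence between block density and overall density."""
    a = mass_B / volume(shape_B)
    b = mass_R / volume(shape_R)
    return volume(shape_B) * (a * math.log(a / b) - a + b)
```

For a 2 × 2 × 2 block of mass 12: ρ_T = 1.5, ρ_A = 2.0, and ρ_G = 6.0, illustrating how the measures rank the same block differently.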
54. Problem Definition
• Given: (1) 𝑹: a tensor,
(2) 𝝆: a density measure,
(3) 𝒌: the number of blocks we aim to find
• Find: 𝒌 distinct dense blocks maximizing 𝝆
[Example: given a tensor 𝑹, 𝝆 = 𝝆_𝑨, and 𝒌 = 𝟑, return the three densest blocks]
55. Requirements
• Our goal is to design an approximation algorithm
• M-Zoom, our proposed method, satisfies all the
requirements
Scalable: runs in near-linear time
Accurate: provides an accuracy guarantee
Flexible: works well with various density metrics
Effective: produces meaningful results in practice
56. Road Map
• Overview
• #1: Counting Triangles in Graph Stream
• #2: Detecting Dense Subtensors
◦ Introduction
◦ Proposed Method: M-Zoom
Terminologies and Problem Definition
Algorithm <<
◦ Experiments
◦ Conclusion
• Conclusion
57. Single Dense Block Detection
• Greedy search method
• Starts from the entire tensor
5 3 0
4 6 1
2 0 0
1 0 1
(remaining entries are 0)
𝜌 = 2.9
58. Single Dense Block Detection (cont.)
• Remove a slice to maximize density 𝝆
5 3 0
4 6 1
2 0 0
𝜌 = 3
59. Single Dense Block Detection (cont.)
• Remove a slice to maximize density 𝝆
5 3
4 6
2 0
𝜌 = 3.3
60. Single Dense Block Detection (cont.)
• Remove a slice to maximize density 𝝆
5 3
4 6
2 0
𝜌 = 3.6
61. Single Dense Block Detection (cont.)
• Continue until all slices are removed (𝜌 = 0 at the end)
[Plot: density over iterations, rising to a peak before falling to 0]
62. Single Dense Block Detection (cont.)
• Output: the densest block seen so far
5 3
4 6
2 0
𝜌 = 3.6
63. Speeding Up Process
• Theorem 1 [Remove Minimum Mass First]
Among the slices in the same mode, removing the slice with the minimum mass is always best
[Example: slice masses 12 > 9 > 2, so the mass-2 slice is the candidate for removal]
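Combining the greedy loop with Theorem 1 gives a compact sketch for ρ_A = Mass/Size. The sparse-dict representation and names are illustrative, and this readable version is quadratic rather than the paper’s near-linear implementation:

```python
from collections import defaultdict

def single_dense_block(entries, n_modes=3):
    """Greedy single dense-block detection sketch with rho_A = Mass/Size.
    Among each mode's slices only the minimum-mass one needs to be
    considered (Theorem 1); the slice whose removal maximizes density
    is removed, and the densest block ever seen is returned."""
    block = dict(entries)  # maps index tuples to nonnegative values
    members = [{idx[m] for idx in block} for m in range(n_modes)]
    best_density, best_block = float("-inf"), dict(block)

    while any(members):
        mass = sum(block.values())
        size = sum(len(s) for s in members)
        if size and mass / size > best_density:
            best_density, best_block = mass / size, dict(block)
        best_choice = None
        for m in range(n_modes):
            if not members[m]:
                continue
            slice_mass = defaultdict(float)
            for i in members[m]:        # empty slices have mass 0
                slice_mass[i] = 0.0
            for idx, val in block.items():
                slice_mass[idx[m]] += val
            i = min(slice_mass, key=slice_mass.get)  # Theorem 1
            d = (mass - slice_mass[i]) / (size - 1) if size > 1 else 0.0
            if best_choice is None or d > best_choice[0]:
                best_choice = (d, m, i)
        _, m, i = best_choice
        members[m].discard(i)
        block = {idx: v for idx, v in block.items() if idx[m] != i}
    return best_block, best_density
```

On a toy tensor with a dense 2 × 2 × 1 block of mass 12 plus one stray entry, the sketch peels off the stray slices first and returns the dense block with ρ_A = 12/5 = 2.4.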
64. Accuracy Guarantee
• Theorem 2 [Approximation Guarantee]:
𝜌_𝐴(𝐵) ≥ (1/𝑁) · 𝜌_𝐴(𝐵∗),
where 𝐵 is the block M-Zoom returns, 𝐵∗ is the densest block, and 𝑁 is the order of the tensor
• This establishes the Accurate property of M-Zoom: it provides an accuracy guarantee
65. (Optional) Post Processing
• Local Search:
◦ grow (G) or shrink (S) the block until we reach a local optimum
66. Handling Multiple Blocks
• Deflation: Remove found blocks before finding others
[Figure: find & remove each block in turn, then restore all removed blocks]
68. Exp1. Scalability Test
• Goal: Measure scalability w.r.t. each input factor
• M-Zoom scales almost linearly!
[Plots: runtime of M-Zoom (with and without post-processing) vs. #non-zeros, order, and dimensionality (#slices)]
69. Exp1. Scalability Test (cont.)
• This confirms the Scalable property of M-Zoom: it runs in near-linear time
70. Exp2. Speed-Accuracy Test
• Goal: compare speed and accuracy of dense-block
detection methods with various density measures
• Methods Compared:
◦ M-Zoom: Proposed Method
◦ CPD: Tensor Decomposition
◦ CrossSpot (Jiang et al. 2015): Local Search Method
71. Exp2. Speed-Accuracy Test (cont.)
• Data: Music Review (User × Music × Time × Rate)
[Plots: speed vs. accuracy under ρ_𝐴, ρ_𝐺, and ρ_𝑆, for M-Zoom (with and without post-processing) and its competitors]
72. Exp2. Speed-Accuracy Test (cont.)
• Data: Wikipedia Revision History (User × Page × Time)
[Plots: speed vs. accuracy under ρ_𝐴, ρ_𝐺, and ρ_𝑆, for M-Zoom (with and without post-processing) and its competitors]
73. Exp2. Speed-Accuracy Test (cont.)
• M-Zoom is up to 100× faster than its competitors
• M-Zoom shows accuracy comparable to its competitors regardless of the density measure
• This confirms the Flexible property of M-Zoom: it works well with various density measures
74. Exp3. Discovery in Practice
First three blocks found by M-Zoom
11 users revised 10 pages
2,305 times within 16 hours
→ Edit wars
• Data: Korean Wikipedia Revision History
(User X Page X Time)
75. Exp3. Discovery in Practice (cont.)
First three blocks found by M-Zoom
8 accounts revised 12 pages
2.5 million times
→ Bots
• Data: English Wikipedia Revision History
(User X Page X Time)
76. Exp3. Discovery in Practice (cont.)
• Data: TCP Dump (7-order tensor)
First three blocks found by M-Zoom
→ Network attacks
77. Exp3. Discovery in Practice (cont.)
• Data: App Market Review Data (User × Product × Timestamp × Score)
First three blocks found by M-Zoom
9 accounts gave 369 reviews to 1 product in 22 hours
→ Spam reviews
78. Exp3. Discovery in Practice (cont.)
• This confirms the Effective property of M-Zoom: it produces meaningful results in practice
81. Related Work
• Disk-resident or Distributed Tensors [WSDM17]
[Plot: runtime vs. #non-zeros for the proposed method (serial and Hadoop versions) and M-Zoom]
82. Related Work
• Dynamic Tensors [KDD17]
[Figure: dense subtensors found in Wikipedia revision history, categorized as edit wars, bots, and vandalism; bursty edits range from informative to useless. Topics include territory, regionalism, idols, presidential and local elections, religion, disease, politicians, dictators, TV shows, anime, baseball, history, and bots for auto-classification]
83. Roadmap
• Overview
• #1: Counting Triangles in Graph Stream
• #2: Detecting Dense Subtensors
• Conclusion <<
84. Conclusion
• Goal:
to understand Large Dynamic Graphs and Tensors
• Sub-Tasks:
◦ Structure Analysis
◦ Anomaly Detection
◦ Behavior Modeling
• Approaches:
◦ Streaming Algorithms (Sampling)
◦ Approximation Algorithms
◦ Distributed Algorithms
85. Thank You
• Papers, software, datasets: http://kijungshin.com/
• Email: kijungs@cs.cmu.edu
• Thanks to:
86. References
• Structure Analysis (Graphs)
◦ Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos, “CoreScope: Graph Mining Using k-Core Analysis - Patterns,
Anomalies and Algorithms”, ICDM 2016
◦ Kijung Shin, “WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams”, ICDM 2017
◦ Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos, “Patterns and Anomalies in k-Cores of Real-world Graphs with
Applications”, KAIS 2018
• Structure Analysis (Tensors)
◦ Kijung Shin and U Kang, “Distributed Methods for High-dimensional and Large-scale Tensor Factorization”, ICDM
2014
◦ Kijung Shin, Lee Sael, and U Kang, “Fully Scalable Methods for Distributed Tensor Factorization”, TKDE 2017
◦ Jinoh Oh, Kijung Shin, Evangelos E. Papalexakis, Christos Faloutsos, and Hwanjo Yu, “S-HOT: Scalable High-Order
Tucker Decomposition”, WSDM 2017
• Anomaly Detection (Graphs)
◦ Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos, “FRAUDAR: Bounding Graph
Fraud in the Face of Camouflage”, KDD 2016
◦ Bryan Hooi, Kijung Shin, Hyun Ah Song, Alex Beutel, Neil Shah, and Christos Faloutsos, “Graph-Based Fraud
Detection in the Face of Camouflage”, TKDD 2017
◦ Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos, “CoreScope: Graph Mining Using k-Core Analysis - Patterns,
Anomalies and Algorithms”, ICDM 2016 [Duplicated]
◦ Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos, “Patterns and Anomalies in k-Cores of Real-world Graphs with
Applications”, KAIS 2018 [Duplicated]
87. References
• Anomaly Detection (Tensors)
◦ Kijung Shin, Bryan Hooi, and Christos Faloutsos, “M-Zoom: Fast Dense-Block Detection in Tensors with Quality Guarantees”, ECML/PKDD 2016
◦ Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos, “D-Cube: Dense-Block Detection in Terabyte-Scale
Tensors”, WSDM 2017
◦ Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos, “DenseAlert: Incremental Dense-Subtensor Detection in
Tensor Streams”, KDD 2017
◦ Hemank Lamba, Bryan Hooi, Kijung Shin, Christos Faloutsos, and Jürgen Pfeffer, “ZooRank: Ranking Suspicious
Entities in Time-Evolving Tensors”, ECML/PKDD 2017
◦ Kijung Shin, Bryan Hooi, and Christos Faloutsos, “Fast, Accurate and Flexible Algorithms for Dense Subtensor
Mining”, TKDD 2018
• Behavior Modeling
◦ Kijung Shin, Mahdi Shafiei, Myunghwan Kim, Aastha Jain, and Hema Raghavan, “Discovering Progression Stages in
Trillion-Scale Behavior Logs”, WWW 2018
◦ Kijung Shin, Euiwoong Lee, Dhivya Eswaran, and Ariel D. Procaccia, “Why You Should Charge Your Friends for
Borrowing Your Stuff”, IJCAI 2017
• Similar Node Search
◦ Kijung Shin, Jinhong Jung, Lee Sael, and U Kang, “BEAR: Block Elimination Approach for Random Walk with Restart
on Large Graphs”, SIGMOD 2015
◦ Jinhong Jung, Kijung Shin, Lee Sael, and U Kang, “Random Walk with Restart on Large Graphs Using Block
Elimination”, TODS 2016