Pei Lee, ICDE 2014, Chicago, IL, USA
Incremental Cluster Evolution Tracking
from Highly Dynamic Network Data
Pei Lee, Laks V.S. Lakshmanan
Computer Science Department
University of British Columbia
Vancouver, BC, Canada
Evangelos E. Milios
Computer Science Department
Dalhousie University
Halifax, NS, Canada
2014-4-16
Outline
- Motivation
  - Evolving network meets social event
- Incremental Computation Framework
  - Divide-and-conquer vs. incremental computation
- Post Network Construction
  - Combat noise
- Network and Cluster Evolution
  - Evolution operations
- Empirical Study
  - Examples
Evolving Network
- Networks change over time
- Examples:
  - Social network: add/remove friends or followers
  - Co-authorship/citation network: new collaborations/citations added every year
  - Email/calling graph: every edge has a timestamp
An illustration of an evolving co-authorship network
(Figure taken from http://wiki.cns.iu.edu/pages/viewpage.action?pageId=2199676)
Social Streams: Twitter, Facebook, etc.
Social Event Evolution Tracking
Event Evolution Patterns
(Figure: post network at time t and time t+1, with the corresponding event snapshots at time t and time t+1)
Evolution patterns: emerge, disappear, grow, decay, merge, split, evolve
Evolving Network and Social Events
- Model a social stream as an evolving network
Traditional Evolving Network Mining Approaches
- Divide and conquer:
  - decompose a dynamic network into a series of snapshots, one for each moment;
  - apply graph mining algorithms to each snapshot to find useful patterns;
  - match patterns between consecutive moments to generate a dynamic pattern sequence.
- Consider, for example, the task of finding evolving clusters.
Illustrating Divide-and-Conquer
(Figure: snapshots of an evolving network at Moments 1 to 5; taken from http://sydney.edu.au/engineering/it/~shhong/gallery.htm)
Divide-and-Conquer: Clustering in Evolving Networks
- C_t: a cluster found in the snapshot at time t
- C_{t+1}: a cluster found in the snapshot at time t+1
- How do we define “C_t evolves to C_{t+1}”?
- Heuristic:
  - If C_t and C_{t+1} overlap above a given threshold K, we say they are matched.
  - Formally, based on Jaccard similarity, C_t and C_{t+1} are matched if $\frac{|C_t \cap C_{t+1}|}{|C_t \cup C_{t+1}|} > K$
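The matching heuristic can be sketched in Python as follows. This is a minimal illustration, assuming clusters are given as sets of integer node ids keyed by hypothetical cluster names; `jaccard` and `match_clusters` are illustrative helpers, not code from the paper, and `k` plays the role of the threshold K.

```python
from typing import Dict, FrozenSet, List, Tuple

Cluster = FrozenSet[int]


def jaccard(a: Cluster, b: Cluster) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(
    prev: Dict[str, Cluster], curr: Dict[str, Cluster], k: float
) -> List[Tuple[str, str, float]]:
    """Return (prev_id, curr_id, similarity) for every pair whose Jaccard
    similarity exceeds k, i.e. every pair where 'C_t evolves to C_{t+1}'."""
    matches = []
    for pid, pc in prev.items():
        for cid, cc in curr.items():
            sim = jaccard(pc, cc)
            if sim > k:
                matches.append((pid, cid, sim))
    return matches


if __name__ == "__main__":
    prev = {"A": frozenset({1, 2, 3, 4}), "B": frozenset({7, 8})}
    curr = {"X": frozenset({2, 3, 4, 5}), "Y": frozenset({9, 10})}
    print(match_clusters(prev, curr, k=0.5))
```

Every pair of clusters from consecutive snapshots is compared, which is exactly where the choice of K and the per-snapshot matching cost come from.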
Drawbacks of Divide-and-Conquer
- Quality:
  - It is difficult to choose the threshold K.
  - Matching clusters between two consecutive snapshots loses accuracy.
- Performance:
  - Each snapshot must be clustered from scratch.
  - This causes a lot of redundant computation.
New Proposal: Incremental Computation for Dense Subgraph Mining
- Basic idea:
  - For the very first snapshot, mine the graph pattern set S_0 from scratch. This step is never applied again.
  - In the steady state, starting at t = 1:
    - obtain the graph update ΔG by comparing the network at moment t with the network at moment t-1;
    - derive S_t from S_{t-1} based on ΔG;
    - increase t to t+1 and repeat.
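The incremental loop can be sketched as follows. As a simplifying assumption, connected components stand in for the mined pattern set S (the paper targets dense clusters), each snapshot is a (nodes, edges) pair with every undirected edge stored as (u, v) with u < v, and `mine_from_scratch`, `derive` and `track` are hypothetical names. Only the components touched by ΔG are recomputed; all others are carried over unchanged from S_{t-1}.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[int, int]                  # undirected edge stored as (u, v) with u < v
Snapshot = Tuple[Set[int], Set[Edge]]   # (nodes, edges) at one moment
Components = List[FrozenSet[int]]       # stand-in for the pattern set S_t


def mine_from_scratch(nodes: Set[int], edges: Set[Edge]) -> Components:
    """Connected components by iterative depth-first search."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen: Set[int] = set()
    comps: Components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u] - seen:
                seen.add(w)
                stack.append(w)
        comps.append(frozenset(comp))
    return comps


def derive(prev: Snapshot, curr: Snapshot, comps: Components) -> Components:
    """Derive S_t from S_{t-1} using only the graph update ΔG."""
    (prev_nodes, prev_edges), (curr_nodes, curr_edges) = prev, curr
    # ΔG: nodes and edges that were added or removed between t-1 and t.
    touched = (prev_nodes ^ curr_nodes) | {v for e in prev_edges ^ curr_edges for v in e}
    affected = [c for c in comps if c & touched]
    kept = [c for c in comps if not c & touched]
    # Recompute only the region reached by ΔG; other components are reused as-is.
    region = (set().union(*affected) | touched) & curr_nodes
    local = {(u, v) for u, v in curr_edges if u in region and v in region}
    return kept + mine_from_scratch(region, local)


def track(snapshots: List[Snapshot]) -> List[Components]:
    comps = mine_from_scratch(*snapshots[0])   # initial step, applied only once
    history = [comps]
    for t in range(1, len(snapshots)):         # steady state
        comps = derive(snapshots[t - 1], snapshots[t], comps)
        history.append(comps)
    return history


if __name__ == "__main__":
    snapshots = [
        ({1, 2, 3, 4}, {(1, 2), (3, 4)}),
        ({1, 2, 3, 4, 5}, {(1, 2), (2, 3), (3, 4)}),
        ({1, 2, 3, 4, 5}, {(1, 2), (3, 4), (4, 5)}),
    ]
    for t, comps in enumerate(track(snapshots)):
        print(t, sorted(sorted(c) for c in comps))
```

A useful sanity check for any such scheme is that the incrementally derived S_t equals a from-scratch run on the snapshot at moment t, while the work per moment depends on the size of ΔG rather than on the whole network.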
Divide-and-Conquer vs. Incremental Computation
- Divide-and-conquer: steps 1, 2, 3, 4
- Incremental computation:
  - Initial step: 1
  - Steady state: 5
- Advantages of incremental computation:
  - avoids redundant computation
  - captures evolution patterns more accurately
Incremental Computation Framework
- Adjust the clusters at each moment as the network is updated
Post Network Construction
- A social stream is a FIFO queue of posts.
- Post similarity: combines the content similarity of two posts with the time distance between them.
- Post network:
  - Each post is a node.
  - An edge connects two posts if their similarity is higher than a given threshold.
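A post network under these definitions might be built as in the sketch below. The exact similarity formula is not reproduced in these slides, so the function is an assumption: Jaccard overlap of extracted terms multiplied by an exponential time decay with a hypothetical scale `tau`; `epsilon` is the hypothetical edge threshold, and `Post`, `similarity` and `build_post_network` are illustrative names.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Post:
    pid: int
    time: float                # hypothetical unit: hours since the stream started
    terms: FrozenSet[str]      # e.g. entities or keywords extracted from the text


def similarity(a: Post, b: Post, tau: float = 1.0) -> float:
    """Hypothetical post similarity: term overlap damped by time distance."""
    union = a.terms | b.terms
    content = len(a.terms & b.terms) / len(union) if union else 0.0
    return content * math.exp(-abs(a.time - b.time) / tau)


def build_post_network(posts: List[Post], epsilon: float, tau: float = 1.0) -> Set[Tuple[int, int]]:
    """One node per post; an undirected edge (u, v), u < v, when similarity > epsilon."""
    edges: Set[Tuple[int, int]] = set()
    for a, b in combinations(posts, 2):
        if similarity(a, b, tau) > epsilon:
            edges.add((min(a.pid, b.pid), max(a.pid, b.pid)))
    return edges


if __name__ == "__main__":
    posts = [
        Post(1, 0.0, frozenset({"apple", "iphone"})),
        Post(2, 0.5, frozenset({"apple", "iphone", "launch"})),
        Post(3, 0.2, frozenset({"coffee"})),
    ]
    print(build_post_network(posts, epsilon=0.3))
```

The all-pairs loop is quadratic in the number of posts in the window; it only illustrates the edge rule, and a streaming system would presumably use an index over terms to find candidate neighbors.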
Evolving Post Network
- We can build a post network for a user's daily timeline on Facebook/Twitter/LinkedIn.
- As posts stream in, the post network evolves very quickly.
- Challenges of evolving post network mining:
  - the rapid surge of post streams (speed)
  - the large number of noisy posts (quality)
  - the huge volume of posts (scalability)
Observing Time Window
- Len: time window length
- Δt: time window shifting size at each moment
- Notations:
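One way to maintain the observing time window is sketched below, assuming posts arrive in timestamp order: the window covers (end - Len, end], shifts by Δt at each moment, and old posts leave from the front of a FIFO queue. `SlidingWindow`, its fields and the `(timestamp, text)` post record are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Post = Tuple[float, str]   # hypothetical minimal post record: (timestamp, text)


class SlidingWindow:
    """Holds the posts with timestamps in (end - length, end].

    `length` plays the role of Len and `step` the role of Δt.
    Assumes each batch of arrivals is newer than every post already held.
    """

    def __init__(self, length: float, step: float, start: float = 0.0) -> None:
        self.length = length
        self.step = step
        self.end = start
        self.posts: Deque[Post] = deque()   # FIFO queue, oldest post on the left

    def advance(self, arrivals: Iterable[Post]) -> Tuple[List[Post], List[Post]]:
        """Shift the window by one step and return (added, expired) posts."""
        self.end += self.step
        added = sorted(arrivals)            # posts timestamped in the new step
        self.posts.extend(added)
        expired: List[Post] = []
        while self.posts and self.posts[0][0] <= self.end - self.length:
            expired.append(self.posts.popleft())
        return added, expired


if __name__ == "__main__":
    window = SlidingWindow(length=3.0, step=1.0)
    for batch in ([(0.5, "a")], [(1.5, "b")], [(2.5, "c")], [(3.2, "d")]):
        added, expired = window.advance(batch)
        print(window.end, added, expired)
```

The added and expired posts returned at each moment are the post-level changes from which the graph update ΔG would be computed.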
How to filter out noise?
- Noise is ubiquitous in social streams, e.g., “Good morning”, “thank you ^.^”
- About 40% of tweets make very little sense
How to filter out noise?
- Classify posts into three types: core, border, and noise
  - wt(p): the priority of post p at moment t
- For example, in a social network:
  - Core: a person with lots of friends
  - Border: not core, but a friend of a core person
  - Noise: neither core nor a friend of a core person
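The three-way classification can be sketched with a degree-based rule that mirrors the friendship analogy. The threshold `min_neighbors` and the use of plain neighbor counts are assumptions; the sketch does not model the priority wt(p), which may weight posts differently in the actual method, and `classify_posts` is a hypothetical name.

```python
from typing import Dict, Set


def classify_posts(adj: Dict[int, Set[int]], min_neighbors: int) -> Dict[int, str]:
    """Label each post as 'core', 'border' or 'noise'.

    Hypothetical rule: a post is core if it has at least `min_neighbors`
    neighbors in the post network ("a person with lots of friends").
    """
    core = {p for p, nbrs in adj.items() if len(nbrs) >= min_neighbors}
    labels: Dict[int, str] = {}
    for p, nbrs in adj.items():
        if p in core:
            labels[p] = "core"
        elif nbrs & core:
            labels[p] = "border"   # not core, but adjacent to a core post
        else:
            labels[p] = "noise"    # neither core nor adjacent to a core post
    return labels


if __name__ == "__main__":
    adj = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}, 5: {6}, 6: {5}}
    print(classify_posts(adj, min_neighbors=3))
```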
Skeletal Graph of a Post Network
- Skeletal graph:
  - a graph consisting of all core posts
  - a brief summary of the original post network
- Clusters can be derived from skeletal graphs
- Our algorithm monitors changes to the skeletal graphs
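A sketch of deriving clusters from a skeletal graph, assuming (hypothetically) that each connected component of the skeletal graph seeds one cluster and that each border post joins the cluster of one adjacent core post, here the one with the smallest id. The adjacency map must be symmetric, and `clusters_from_skeleton` is an illustrative name rather than the paper's procedure.

```python
from typing import Dict, List, Set


def clusters_from_skeleton(adj: Dict[int, Set[int]], core: Set[int]) -> List[Set[int]]:
    # Skeletal graph: the core posts plus the edges among them.
    comp_of: Dict[int, int] = {}
    clusters: List[Set[int]] = []
    for start in sorted(core):
        if start in comp_of:
            continue
        cid = len(clusters)
        clusters.append(set())
        comp_of[start] = cid
        stack = [start]
        while stack:
            u = stack.pop()
            clusters[cid].add(u)
            for w in adj[u] & core:
                if w not in comp_of:
                    comp_of[w] = cid
                    stack.append(w)
    # Attach each border post (non-core, adjacent to a core post) to the cluster
    # of its smallest-id core neighbor; noise posts join no cluster.
    for p, nbrs in adj.items():
        core_nbrs = nbrs & core
        if p not in core and core_nbrs:
            clusters[comp_of[min(core_nbrs)]].add(p)
    return clusters


if __name__ == "__main__":
    adj = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2}, 5: {6}, 6: {5, 7}, 7: {6}, 8: set()}
    print(clusters_from_skeleton(adj, core={1, 2, 6}))
```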
Network Evolution Operations
- Add a post
- Remove a post
Cluster Evolution Operations
- We define 6 cluster evolution patterns:
  - appear, disappear, grow, decay, merge, and split
Summary: Cluster Evolution
- Adding a post:
  - a new cluster may appear
  - an existing cluster may grow
  - multiple clusters may merge into a single one
- Deleting a post:
  - an existing cluster may disappear
  - an existing cluster may decay
  - an existing cluster may split into multiple clusters
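The summary can be read as a small decision rule. The sketch below is a simplification under stated assumptions: it ignores the fact that adding or deleting a post can also change the core status of its neighbors, and `on_add`, `on_delete` and their inputs are hypothetical.

```python
from typing import List, Optional, Set


def on_add(is_core: bool, neighbor_clusters: Set[int]) -> Optional[str]:
    """Pattern triggered by adding a post whose core neighbors lie in
    `neighbor_clusters` (a set of cluster ids)."""
    if not neighbor_clusters:
        return "appear" if is_core else None   # an isolated non-core post is noise
    if is_core and len(neighbor_clusters) > 1:
        return "merge"                          # a new core post bridges clusters
    return "grow"


def on_delete(remaining_parts: List[Set[int]]) -> str:
    """Pattern triggered in a cluster by deleting one of its posts, given the
    connected pieces of that cluster that survive the deletion."""
    if not remaining_parts:
        return "disappear"
    if len(remaining_parts) == 1:
        return "decay"
    return "split"
```

Each branch corresponds to one bullet of the summary, so the six patterns fall out of two counts: how many clusters the added post touches, and how many pieces a cluster breaks into after a deletion.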
Network Evolution to Cluster Evolution
- Cluster evolution when adding a post
Network Evolution to Cluster Evolution
- Cluster evolution when deleting a post
Bulk Updating
- Existing incremental computation on dynamic graphs usually processes node or edge additions/deletions one by one
- Since social posts arrive at high speed, post-by-post incremental updating leads to very poor performance
- Bulk updating: update subgraph by subgraph
  - a bulk = a post cluster
  - more details in Section VII of the paper
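A minimal sketch of forming bulks, assuming a bulk is a connected group of newly arrived posts (edges to older posts are ignored when grouping). Each returned bulk could then be applied to the clusters as a single update instead of post by post; `group_into_bulks` is a hypothetical name, and the paper's own bulk procedure (Section VII) may differ.

```python
from typing import Dict, Iterable, List, Set, Tuple


def group_into_bulks(new_posts: Iterable[int], edges: Iterable[Tuple[int, int]]) -> List[Set[int]]:
    """Partition new posts into connected groups of the subgraph they induce."""
    parent: Dict[int, int] = {p: p for p in new_posts}   # union-find forest

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if u in parent and v in parent:     # only edges among new posts
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    bulks: Dict[int, Set[int]] = {}
    for p in parent:
        bulks.setdefault(find(p), set()).add(p)
    return list(bulks.values())


if __name__ == "__main__":
    print(group_into_bulks([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 9), (9, 5)]))
```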
Proposed Algorithms
- ICM: Incremental Cluster Maintenance
- eTrack: Cluster Evolution Tracking
Twitter Technology Domain Data Sets
- Time span: 1 month
- Tech-Lite: all timelines of the users listed in the Technology category of “Who to follow”, plus the users they retweeted
  - streaming rate: about 11,700 tweets/day
- Tech-Full: all timelines followed by users in the Technology category
  - streaming rate: about 7,216 tweets/hour
Ground Truth
- Major events from news articles:
  - crawl news from major technology websites
  - treating the news article titles as posts, apply our approach to extract events
- Peaks in Google Trends
Precision and Recall
- HashtagPeaks: uses common hashtags to compute post similarity
- UnigramPeaks: uses common unigrams to compute post similarity
- Louvain: uses common entities to compute post similarity and applies the Louvain community detection algorithm
- eTrack: uses common entities to compute post similarity and applies our approach
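Precision and recall over detected events can be computed as below, assuming a caller-supplied predicate decides whether a detected event matches a ground-truth event. The matching criterion used in the evaluation is not given in these slides, so `match` and `precision_recall` are hypothetical.

```python
from typing import Callable, Sequence, Tuple, TypeVar

D = TypeVar("D")   # type of a detected event
G = TypeVar("G")   # type of a ground-truth event


def precision_recall(
    detected: Sequence[D], truth: Sequence[G], match: Callable[[D, G], bool]
) -> Tuple[float, float]:
    """Precision: share of detected events that match some ground-truth event.
    Recall: share of ground-truth events matched by some detected event."""
    hits_detected = sum(any(match(d, g) for g in truth) for d in detected)
    hits_truth = sum(any(match(d, g) for d in detected) for g in truth)
    precision = hits_detected / len(detected) if detected else 0.0
    recall = hits_truth / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    detected = [{"iphone"}, {"coffee"}]
    truth = [{"iphone", "apple"}, {"android"}]
    print(precision_recall(detected, truth, lambda d, g: bool(d & g)))
```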
Top 10 Social Events Detected by Different Methods
Running Time
- (a) Varying the time window length
- (b) Varying the step length
Cluster Evolution Examples
Conclusion
- Theoretical side:
  - We propose an incremental computation framework for cluster evolution tracking in highly dynamic networks.
- Application side:
  - We propose an efficient system for tracking event evolution patterns in social streams.
- Q & A
Post Network Mining
- A snapshot of the post network is constructed from the posts in the same time window.
- As social posts stream in, events (dense clusters) are identified.
Relationships between Post Network, Skeletal Graph and Clusters
- The skeletal graph is a sketch of the post network.
- Clusters can be generated from the skeletal graphs.


Editor's Notes
- Slide 2 (Outline): the problem, challenges, theory, experiments, conclusion (related work, theory proofs)