  1. FINDING BURSTY TOPICS FROM MICROBLOGS
     Qiming Diao, Jing Jiang, Feida Zhu, Ee-Peng Lim
     Living Analytics Research Centre, School of Information Systems, Singapore Management University
  2. Abstract
     To find topics that have bursty patterns on microblogs, we exploit two observations:
     1. Posts published around the same time are more likely to have the same topic.
     2. Posts published by the same user are more likely to have the same topic.
  3. Introduction
     Retrospective bursty event detection:
     • Burst detection: state machine
     • Topic discovery: LDA
     Two assumptions:
     1. If a post is about a global event, it is likely to follow a global topic distribution that is time-dependent.
     2. If a post is about a personal topic, it is likely to follow a personal topic distribution that is more or less stable over time.
  4. Method
     Preliminaries: each post d_i comes with a user u_i, a timestamp t_i, and words w_i,j.
     A bursty topic b is defined as a word distribution coupled with a bursty interval, denoted (ϕ_b, t_b^s, t_b^e).
     Our task: to find meaningful bursty topics from the input text stream.
     Our method: a topic discovery step followed by a burst detection step.
  5. Our Topic Model
     Assume:
     1. C (latent) topics in the text stream, where each topic c has a word distribution ϕ_c.
     2. A background word distribution ϕ_B.
     3. A single post is most likely to be about a single topic.
     4. A global topic distribution θ_t for each time point t.
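Under these assumptions, the generative process for one post can be sketched as follows. This is a minimal illustration, not the authors' exact model: the background-word probability, dimensions, and function names are assumptions of this sketch, and the personal/global switch is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

C, V, T = 5, 100, 10                      # topics, vocabulary size, time points
phi = rng.dirichlet(np.ones(V), size=C)   # phi_c: per-topic word distributions
phi_B = rng.dirichlet(np.ones(V))         # phi_B: background word distribution
theta = rng.dirichlet(np.ones(C), size=T) # theta_t: global topic mix per time point

def generate_post(t, n_words, p_background=0.3):
    """Sketch of generating one post at time t: a single topic per post,
    with each word drawn from either that topic or the background."""
    c = rng.choice(C, p=theta[t])          # one topic for the whole post
    words = []
    for _ in range(n_words):
        if rng.random() < p_background:    # background word
            words.append(rng.choice(V, p=phi_B))
        else:                              # topical word
            words.append(rng.choice(V, p=phi[c]))
    return c, words

c, words = generate_post(t=3, n_words=8)
```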
  6. Since our focus is on finding popular global events, we need to separate out "personal" posts.
     We add a time-independent topic distribution η_u for each user to capture her long-term topical interests.
  7. Learning
  8. Learning (continued)
  9. Burst Detection
     Assume: a series of counts (m_c,1, m_c,2, ..., m_c,T) representing the intensity of topic c at different time points.
     These counts are generated by two Poisson distributions corresponding to a bursty state and a normal state.
  10. Burst Detection (continued)
     We set σ_0 = 0.9 and σ_1 = 0.6 for all topics.
     Finally, a burst is marked by a consecutive subsequence of bursty states.
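A two-state Poisson state machine of this kind can be decoded with a small Viterbi-style dynamic program: pick, per time point, the state that maximizes Poisson log-likelihood minus a penalty for switching states. The rates and switch cost below are illustrative assumptions, not the paper's exact parameterization.

```python
import math

def detect_bursts(counts, rate_normal, rate_bursty, switch_cost=1.0):
    """Label each time point 0 (normal) or 1 (bursty) by maximizing
    Poisson log-likelihood minus a cost for switching states."""
    def loglik(m, lam):
        # log Poisson pmf up to log(m!), which is identical for both states
        return m * math.log(lam) - lam

    n = len(counts)
    best = [loglik(counts[0], rate_normal), loglik(counts[0], rate_bursty)]
    back = []                               # back[t][s] = best predecessor of state s
    for t in range(1, n):
        ll = [loglik(counts[t], rate_normal), loglik(counts[t], rate_bursty)]
        ptr, new = [], []
        for s in (0, 1):
            stay = best[s]
            switch = best[1 - s] - switch_cost
            if stay >= switch:
                ptr.append(s)
                new.append(stay + ll[s])
            else:
                ptr.append(1 - s)
                new.append(switch + ll[s])
        best, back = new, back + [ptr]
    # backtrack from the best final state
    s = 0 if best[0] >= best[1] else 1
    states = [s]
    for ptr in reversed(back):
        s = ptr[s]
        states.append(s)
    return list(reversed(states))

# A burst is then a maximal consecutive run of 1s in the state sequence.
states = detect_bursts([2, 3, 2, 15, 18, 16, 3, 2],
                       rate_normal=3.0, rate_bursty=15.0)
# → [0, 0, 0, 1, 1, 1, 0, 0]
```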
  11. Experiments
     Data Set: we sampled 2,892 users from this dataset and extracted their tweets between September 1 and November 30, 2011 (91 days in total).
     The final dataset contains 3,967,927 tweets and 24,280,638 tokens.
  12. Ground Truth Generation: we took the top-30 bursty topics from each model and asked two human judges to rate their quality by assigning a score of either 0 or 1.
     Evaluation: we set the number of topics C to 80, α to 50/C, and β to 0.01. Each model was run for 500 iterations of Gibbs sampling.
  13. Sample Results and Discussions
  14. Sample Results and Discussions (continued)
  15. Two case studies demonstrate the effectiveness of our model.
     Effectiveness of Temporal Models: both TimeLDA and TimeUserLDA tend to group posts published on the same day into the same topic.
  16. Two case studies (continued).
     Effectiveness of User Models: it is important to filter out users' "personal" posts in order to find meaningful global events.
  17. Conclusions
     • A new topic model that considers both the temporal information of microblog posts and users' personal interests.
     • A Poisson-based state machine to identify bursty periods from the topics discovered by our model.
  18. TM-LDA: EFFICIENT ONLINE MODELING OF THE LATENT TOPIC TRANSITIONS IN SOCIAL MEDIA
  19. ABSTRACT
     TM-LDA learns the transition parameters among topics by minimizing the prediction error on topic distributions in subsequent postings. We develop an efficient updating algorithm to adjust the transition parameters as new documents stream in.
  20. Challenges:
     1. To model and analyze latent topics in social textual data.
     2. To adaptively update the models as massive social content streams in.
     3. To facilitate temporally-aware applications of social media.
  21. Contributions
     • First, we propose a novel temporally-aware topic language model, TM-LDA, which captures the latent topic transitions in temporally-sequenced documents.
     • Second, we design an efficient algorithm to update TM-LDA, which enables it to be performed on large-scale data.
     • Finally, we evaluate TM-LDA against the static topic modeling method (LDA).
  22. METHODOLOGY
     TM-LDA Algorithm: if we define the space of topic distributions as X = {x ∈ R^n_+ : ||x||_1 = 1}, TM-LDA can be considered as a function f : X → X.
     TM-LDA is modeled as a non-linear mapping minimizing the prediction error.
  23. Error Function of TM-LDA
  24. Iterative Minimization of the Error Function
  25. Direct Minimization of the Error Function
  26. TM-LDA for Twitter Stream
  27. TM-LDA for Twitter Stream (continued)
     Let A = D(1; m) and B = D(2; m+1), the topic-distribution matrices of the first m tweets and of their immediate successors.
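Given these stacked matrices, the transition matrix can be fit by least squares on the Frobenius-norm error between predicted and observed successor distributions. The following is a minimal sketch: the clipping and row re-normalization in the prediction step are assumptions of this illustration, not the paper's exact procedure.

```python
import numpy as np

def fit_transition_matrix(A, B):
    """Fit T minimizing ||A @ T - B||_F via least squares, where row i
    of A is the topic distribution of tweet i and row i of B is that
    of the immediately following tweet."""
    T, _residuals, _rank, _sv = np.linalg.lstsq(A, B, rcond=None)
    return T

def predict_next(p_prev, T):
    """Predict the next tweet's topic distribution, re-projecting onto
    the probability simplex (an assumption of this sketch)."""
    p = p_prev @ T
    p = np.clip(p, 0.0, None)   # clip tiny negatives from least squares
    return p / p.sum()

rng = np.random.default_rng(0)
A = rng.dirichlet(np.ones(4), size=50)   # topic distributions of tweets 1..m
B = rng.dirichlet(np.ones(4), size=50)   # topic distributions of tweets 2..m+1
T = fit_transition_matrix(A, B)
p_next = predict_next(A[0], T)           # prediction for the tweet after tweet 1
```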
  28. UPDATING TRANSITION PARAMETERS
     Updating Transition Parameters with the Sherman-Morrison-Woodbury Formula
  29. Updating Transition Parameters with QR-factorization
     Suppose the QR-factorization of matrix A is A = QR, where Q′Q = I and R is an upper triangular matrix. Then the least-squares solution satisfies RT = Q′B.
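The QR route can be sketched as follows; it gives the same solution as the direct least-squares fit, and libraries such as SciPy additionally offer routines for updating an existing QR factorization as new rows arrive. Function names here are this illustration's own.

```python
import numpy as np

def solve_transition_qr(A, B):
    """Solve min_T ||A @ T - B||_F via QR: with A = QR (Q'Q = I,
    R upper triangular), the solution satisfies R @ T = Q' @ B."""
    Q, R = np.linalg.qr(A)             # reduced QR factorization
    return np.linalg.solve(R, Q.T @ B)

rng = np.random.default_rng(1)
A = rng.dirichlet(np.ones(4), size=50)
B = rng.dirichlet(np.ones(4), size=50)
T_qr = solve_transition_qr(A, B)
T_ls = np.linalg.lstsq(A, B, rcond=None)[0]
# both solve the same least-squares problem, so they agree
assert np.allclose(T_qr, T_ls)
```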
  30. EXPERIMENTS
     Dataset
     Using Perplexity as Evaluation Metric
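Perplexity is the standard held-out metric for topic models: the exponentiated negative average per-word log-likelihood, with lower values meaning the model is less "surprised" by unseen text. A minimal sketch (the per-word probabilities are illustrative inputs):

```python
import math

def perplexity(word_probs):
    """Perplexity = exp(-(1/N) * sum_w log p(w)) over N held-out words."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# a uniform model over a 100-word vocabulary has perplexity 100
assert round(perplexity([0.01] * 20)) == 100
```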
  31. Predicting Future Tweets
     TM-LDA first trains LDA on 7 days of historical tweets and computes the transition parameter matrix accordingly. Then, for each new tweet generated on the 8th day, it predicts the topic distribution of the following tweet.
  32. Three distributions are compared:
     • Estimated Topic Distributions of "Future" Tweets: the estimated topic distribution of tweet b.
     • LDA Topic Distributions of "Future" Tweets: the inferred topic distribution of tweet b.
     • LDA Topic Distributions of "Previous" Tweets: the inferred topic distribution of tweet a.
  33. Efficiency of Updating Transition Parameters
  34. Properties of Transition Parameters
     • T is a square matrix whose size is determined by the number of topics trained in LDA.
     • The row sum of T is always 1, which means that the overall weight emitted from a topic is 1.
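The row-stochastic property is easy to enforce or check numerically; a small sketch (the normalization step is this illustration's way of keeping T on the simplex):

```python
import numpy as np

def normalize_rows(T):
    """Rescale each row of T to sum to 1, making T row-stochastic:
    row c gives the transition weights emitted from topic c."""
    return T / T.sum(axis=1, keepdims=True)

T = normalize_rows(np.array([[2.0, 1.0, 1.0],
                             [1.0, 3.0, 0.0],
                             [0.5, 0.5, 1.0]]))
assert np.allclose(T.sum(axis=1), 1.0)
```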
  35. APPLYING TM-LDA FOR TREND ANALYSIS AND SENSEMAKING
  36. Changing Topic Transitions over Time
  37. Various Topic Transition Patterns by Cities
  38. CONCLUSIONS
     • A novel temporally-aware language model, TM-LDA, for efficiently modeling streams of social text such as a Twitter stream for an author.
     • An efficient model updating algorithm for TM-LDA.