2. Abstract
For some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets.
The proposed solution learns the underlying hidden state representation of the event via Hidden Markov Models.
3. Introduction
one-shot
events
Have “structure” or are long-running
(a)the most recent tweets could
be repeating the same information about
the event
(b)most users would be interested in a
summary of the occurrences in the game
so far.
4. Introduction
Our goal: to extract a few tweets that best describe the chain of interesting occurrences in that event.
A two-step process:
1. Segment the event time-line.
2. Pick key tweets to describe each segment.
5. Introduction
Challenges:
Events are typically "bursty".
Separate sub-events may not be temporally far apart.
Previous instances of similar events are available.
Tweets are noisy.
The approach achieves strong empirical results.
8. Characteristics of Sports Coverage in Tweets
Some issues of this data:
1. Sub-events are marked by increased frequency of tweets.
2. Boundaries of sub-events also result in a change in the vocabulary of tweets.
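The first observation (sub-events show up as spikes in tweet frequency) can be sketched as a simple rate-threshold burst detector. The window size and the threshold factor below are illustrative assumptions, not values from the paper:

```python
from collections import Counter

def detect_bursts(timestamps, window=60, factor=2.0):
    """Return start times of windows whose tweet count exceeds
    `factor` times the mean per-window count (illustrative heuristic)."""
    counts = Counter(t // window for t in timestamps)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(w * window for w, c in counts.items() if c > factor * mean)
```

A real system would also need the vocabulary-shift signal from the second observation; this sketch captures only the volume spike.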
9. Algorithms
Baseline: SUMMALLTEXT
Associate with each tweet a vector of the TF-logIDF of its constituent words.
Compare tweets using cosine distance.
Select those tweets which are closest to all other tweets from the event.
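The SUMMALLTEXT steps above can be sketched as follows. The scoring (sum of cosine similarities to all tweets) and the helper names are my illustrative reading of the slide, not the paper's exact implementation:

```python
import math
from collections import Counter

def tf_logidf_vectors(tweets):
    """Build sparse TF * log(IDF) vectors for a list of tokenized tweets."""
    n = len(tweets)
    df = Counter(w for t in tweets for w in set(t))  # document frequency
    return [{w: c * math.log(n / df[w]) for w, c in Counter(t).items()}
            for t in tweets]

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def summ_all_text(tweets, k=1):
    """Pick the k tweets with the highest total similarity to all others."""
    vecs = tf_logidf_vectors(tweets)
    scores = [sum(cosine(v, u) for u in vecs) for v in vecs]
    order = sorted(range(len(tweets)), key=lambda i: -scores[i])
    return [tweets[i] for i in order[:k]]
```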
12. Algorithms
Baseline: SUMMTIMEINT
1. Split up the duration into equal-sized time intervals.
2. Select the key tweets from each interval.
Two extra parameters:
1. a segmentation TS of the duration of the event into equal-time windows
2. the minimum activity threshold l
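A minimal sketch of SUMMTIMEINT with both parameters. The per-window "key tweet" scorer here (longest tweet) is a placeholder assumption; the paper would use a real tweet-selection criterion such as the SUMMALLTEXT centroid:

```python
def summ_time_int(tweets, duration, n_windows, min_activity):
    """tweets: list of (timestamp, text) pairs.
    Returns one key tweet per window with at least `min_activity` tweets."""
    width = duration / n_windows
    windows = [[] for _ in range(n_windows)]
    for ts, text in tweets:
        idx = min(int(ts / width), n_windows - 1)  # clamp the last timestamp
        windows[idx].append(text)
    # Placeholder scorer: pick the longest tweet in each active window.
    return [max(w, key=len) for w in windows if len(w) >= min_activity]
```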
15. Algorithms
Our Approach: SUMMHMM
BACKGROUND ON HMMS:
N states labeled S1, …, SN
A set of M observation symbols v1, …, vM
bi(k): probability of emitting symbol vk while in state Si
aij: probability of transitioning from state Si to state Sj
πi: probability of starting in state Si
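The three parameter sets (π, A for the aij, B for the bi(k)) fully determine the probability of an observation sequence, computable with the standard forward algorithm. The numbers below are toy values for a 2-state, 2-symbol HMM, not learned parameters:

```python
# Toy HMM: N = 2 states, M = 2 symbols (all numbers illustrative).
pi = [0.8, 0.2]                      # pi_i: initial state distribution
A = [[0.9, 0.1], [0.2, 0.8]]         # a_ij: transition probabilities
B = [[0.7, 0.3], [0.1, 0.9]]         # b_i(k): symbol emission probabilities

def forward_likelihood(obs):
    """P(obs | pi, A, B) via the forward algorithm."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(2)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[j][o]
                 for j in range(2)]
    return sum(alpha)
```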
16. Algorithms
Each
state: one class of sub-events
The symbols: the words used in tweets
The variation in symbol probabilities
across different states: the different
“language models” used by the Twitter
users
The transitions between states models the
chain of sub-events over time
17. Algorithms
Our
Modifications
OUTPUTS PER TIME – STEP: a multiset of
symbols
DETECTING BURSTS IN TWEET VOLUME:
COMBINING INFORMATION FROM
MULTIPLE EVENTS
18. Algorithms
three
sets of symbol probabilities:
(1)θ( s ) , which is specific to each state but
is the same for all events,
(2) θ( sg ) , which is specific to a particular
state for a particular game
(3) θ( bg ) , which is a background
distribution of symbols over all states
and games.
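One plausible way to combine the three distributions is a fixed-weight mixture per (state, game) pair; the paper's exact combination rule may differ, and the weights here are assumptions:

```python
def emission_prob(word, state, game, theta_s, theta_sg, theta_bg,
                  weights=(0.5, 0.3, 0.2)):
    """Word emission probability as a weighted mixture of the
    state-specific, state+game-specific, and background distributions.
    The mixture form and weights are illustrative assumptions."""
    ws, wsg, wbg = weights
    return (ws * theta_s[state].get(word, 0.0)
            + wsg * theta_sg[(state, game)].get(word, 0.0)
            + wbg * theta_bg.get(word, 0.0))
```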
19. Algorithms
Algorithm Summary
Input: multiple events of the same type.
Learns the model parameters that best fit the data (EM algorithm).
Finds the optimal segmentation (standard Viterbi algorithm).
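The segmentation step can be sketched with a plain Viterbi decoder over the learned parameters. This is the textbook algorithm with toy inputs, not the paper's trained model:

```python
def viterbi(obs, pi, A, B):
    """Most likely state sequence for observation sequence `obs`,
    given initial probs pi, transitions A, and emissions B."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: prev[i] * A[i][j])
            ptr.append(i_best)
            delta.append(prev[i_best] * A[i_best][j] * B[j][o])
        back.append(ptr)
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(back):   # backtrack through the stored pointers
        state = ptr[state]
        path.append(state)
    return path[::-1]
```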
23. Experiments
MANUAL
GROUND TRUTH CONSTRUCTION .
Each output tweet was matched with the
happenings in the game and labeled as
Comment-Play , Comment-Game , or
Comment-General .
31. ABSTRACT
Traditional
summarization techniques only
consider text information.
We study how user influence
models, which project user interaction
information onto a Twitter context
tree, can help Twitter context
summarization within a supervised
learning framework.
32. INTRODUCTION
A
Twitter context tree is defined as a tree
structure of tweets which are connected
with reply relationship, and the root of a
context tree is its original tweet.
two types of user influence models, called
pair-wise user influence model and global
user influence model.
Granger Causality influence model
PageRank algorithm
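A context tree can be built from (tweet_id, reply_to_id) pairs; the representation below (root id plus a children map) is an illustrative data structure, not the paper's:

```python
from collections import defaultdict

def build_context_tree(tweets):
    """tweets: list of (tweet_id, reply_to_id or None).
    Returns (root_id, children) where children maps id -> list of reply ids."""
    children = defaultdict(list)
    root = None
    for tid, parent in tweets:
        if parent is None:
            root = tid          # the original tweet has no parent
        else:
            children[parent].append(tid)
    return root, dict(children)
```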
34. TWITTER CONTEXT TREE ANALYSIS
Whether
the tree structure can help
the summarization task
35. USER INFLUENCE MODELS
Granger
Causality Influence Model
A time series data x is to Granger cause
another time series data y ,If and only if
regressing for y in terms of both past
values of y and x is statistically significantly
more accurate than regressing for y in
terms of past values of y only. Let
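The definition above can be sketched as a comparison of residual sums of squares between the two regressions. A proper test uses an F-statistic; the fixed improvement ratio below is a crude illustrative stand-in:

```python
import numpy as np

def granger_causes(x, y, lag=2, ratio=0.9):
    """Crude check: does adding lagged x materially improve the
    least-squares regression of y on its own lags?
    (`lag` and `ratio` are illustrative; a real test uses an F-test.)"""
    x, y = np.asarray(x), np.asarray(y)
    n = len(y)
    Y = y[lag:]
    lags_y = np.column_stack([y[lag - k:n - k] for k in range(1, lag + 1)])
    lags_xy = np.column_stack([lags_y] +
                              [x[lag - k:n - k] for k in range(1, lag + 1)])
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return float(r @ r)
    return rss(lags_xy) < ratio * rss(lags_y)
```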
36. USER INFLUENCE MODELS
Lasso-Granger method
Lag ( X,T )to denote the lagged version of
data X ;
FullyConnectedFeatureGraph ( X ) denotes
the fully connected graph defined over the
features;
Lasso ( y, Xlag )denotes the set of temporal
variables receiving a non-zero co-efficient by
the Lasso algorithm.
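The Lasso-Granger idea (series whose lagged values receive non-zero Lasso coefficients are taken as influencers) can be sketched with a basic ISTA solver. The solver, the regularization strength, and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def lasso_ista(X, y, lam, steps=1000):
    """Plain ISTA solver for 0.5*||Xb - y||^2 + lam*||b||_1 (illustrative)."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        z = beta - X.T @ (X @ beta - y) / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return beta

def lasso_granger(series, target, lag=2, lam=10.0):
    """series: dict name -> 1-D array. Returns names of series whose
    lagged values get a non-zero Lasso coefficient for `target`."""
    target = np.asarray(target)
    n = len(target)
    cols, names = [], []
    for name, s in series.items():
        s = np.asarray(s)
        for k in range(1, lag + 1):
            cols.append(s[lag - k:n - k])
            names.append(name)
    beta = lasso_ista(np.column_stack(cols), target[lag:], lam)
    return sorted({nm for nm, b in zip(names, beta) if abs(b) > 1e-6})
```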
38. USER INFLUENCE MODELS
Pagerank
Influence Model
For each user u , it has a directed edge to
each user v if u has a reply or a retweet to
v ’s tweet and we can have a global user
graph G .
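Running PageRank over G then gives each user a global influence score. A minimal power-iteration sketch (damping factor and iteration count are the usual defaults, assumed here):

```python
from collections import defaultdict

def pagerank(edges, d=0.85, iters=50):
    """edges: list of (u, v) meaning u replied/retweeted v.
    Returns a dict mapping each user to its PageRank score."""
    nodes = {n for e in edges for n in e}
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - d) / len(nodes) for n in nodes}
        for u in nodes:
            targets = out.get(u)
            if targets:
                share = d * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:                      # dangling node: spread mass evenly
                for v in nodes:
                    new[v] += d * rank[u] / len(nodes)
        rank = new
    return rank
```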
41. SUMMARIZATION METHOD
Temporal
1.
2.
Signals
fit the age of tweets in a context tree
into an exponential distribution.
for each tweet, we compute its
temporal signal as the likelihood of
sampling its age from the fitted
exponential distribution.
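The two steps above reduce to a maximum-likelihood fit (for an exponential, the rate is one over the mean age) followed by a density evaluation. A minimal sketch, assuming strictly positive ages:

```python
import math

def temporal_signals(ages):
    """MLE-fit an exponential to tweet ages, then return each tweet's
    signal as the fitted density at its age (newer tweets score higher)."""
    lam = len(ages) / sum(ages)                 # MLE rate = 1 / mean age
    return [lam * math.exp(-lam * a) for a in ages]
```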
43. EDITORIAL DATA SET
10
Twitter context trees from March 7th
to March 20th,2011
4 are initiated by Lady Gaga
6 are initiated by Justin Bieber
1. read the root tweet
2. Scans through all candidate tweets
3. Selects 5 to 10 tweets
54. CONCLUSION
User
influence information is very helpful to
generate a high quality summary for each
Twitter context tree.
All signals are converted into features, and
we cast Twitter context summarization into a
supervised learning problem.