2. Abstract
For some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets.
The proposed solution learns the underlying hidden state representation of the event via Hidden Markov Models.
3. Introduction
one-shot
events
Have “structure” or are long-running
(a)the most recent tweets could
be repeating the same information about
the event
(b)most users would be interested in a
summary of the occurrences in the game
so far.
4. Introduction
Our goal: to extract a few tweets that best describe the chain of interesting occurrences in that event.
A two-step process:
1. Segment the event time-line.
2. Pick key tweets to describe each segment.
5. Introduction
Challenges:
Events are typically "bursty".
Separate sub-events may not be temporally far apart.
Previous instances of similar events are available.
Tweets are noisy.
The approach achieves strong empirical results.
8. Characteristics of Sports Coverage in Tweets
Some issues of this data:
1. Sub-events are marked by increased frequency of tweets.
2. Boundaries of sub-events also result in a change in the vocabulary of tweets.
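The first observation (sub-events show up as spikes in tweet frequency) can be sketched as a simple rate-threshold burst detector. The window size and the threshold factor below are illustrative assumptions, not values from the paper:

```python
from collections import Counter

def detect_bursts(timestamps, window=60, factor=2.0):
    """Return start times of windows whose tweet count exceeds
    `factor` times the mean per-window count (illustrative heuristic)."""
    counts = Counter(t // window for t in timestamps)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(w * window for w, c in counts.items() if c > factor * mean)
```

A real system would also need the vocabulary-shift signal from the second observation; this sketch captures only the volume spike.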
9. Algorithms
Baseline: SUMMALLTEXT
Associate with each tweet a vector of the TF-logIDF of its constituent words.
Compare tweets using cosine distance.
Select those tweets which are closest to all other tweets from the event.
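The SUMMALLTEXT steps above can be sketched as follows. The scoring (sum of cosine similarities to all tweets) and the helper names are my illustrative reading of the slide, not the paper's exact implementation:

```python
import math
from collections import Counter

def tf_logidf_vectors(tweets):
    """Build sparse TF * log(IDF) vectors for a list of tokenized tweets."""
    n = len(tweets)
    df = Counter(w for t in tweets for w in set(t))  # document frequency
    return [{w: c * math.log(n / df[w]) for w, c in Counter(t).items()}
            for t in tweets]

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def summ_all_text(tweets, k=1):
    """Pick the k tweets with the highest total similarity to all others."""
    vecs = tf_logidf_vectors(tweets)
    scores = [sum(cosine(v, u) for u in vecs) for v in vecs]
    order = sorted(range(len(tweets)), key=lambda i: -scores[i])
    return [tweets[i] for i in order[:k]]
```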
12. Algorithms
Baseline: SUMMTIMEINT
1. Split up the duration into equal-sized time intervals.
2. Select the key tweets from each interval.
Two extra parameters:
1. a segmentation TS of the duration of the event into equal-time windows
2. the minimum activity threshold l
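A minimal sketch of SUMMTIMEINT with both parameters. The per-window "key tweet" scorer here (longest tweet) is a placeholder assumption; the paper would use a real tweet-selection criterion such as the SUMMALLTEXT centroid:

```python
def summ_time_int(tweets, duration, n_windows, min_activity):
    """tweets: list of (timestamp, text) pairs.
    Returns one key tweet per window with at least `min_activity` tweets."""
    width = duration / n_windows
    windows = [[] for _ in range(n_windows)]
    for ts, text in tweets:
        idx = min(int(ts / width), n_windows - 1)  # clamp the last timestamp
        windows[idx].append(text)
    # Placeholder scorer: pick the longest tweet in each active window.
    return [max(w, key=len) for w in windows if len(w) >= min_activity]
```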
15. Algorithms
Our Approach: SUMMHMM
BACKGROUND ON HMMS:
N states labeled S1, …, SN
A set of M observation symbols v1, …, vM
bi(k): probability of emitting symbol vk while in state Si
aij: probability of transitioning from state Si to state Sj
πi: probability of starting in state Si
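The three parameter sets (π, A for the aij, B for the bi(k)) fully determine the probability of an observation sequence, computable with the standard forward algorithm. The numbers below are toy values for a 2-state, 2-symbol HMM, not learned parameters:

```python
# Toy HMM: N = 2 states, M = 2 symbols (all numbers illustrative).
pi = [0.8, 0.2]                      # pi_i: initial state distribution
A = [[0.9, 0.1], [0.2, 0.8]]         # a_ij: transition probabilities
B = [[0.7, 0.3], [0.1, 0.9]]         # b_i(k): symbol emission probabilities

def forward_likelihood(obs):
    """P(obs | pi, A, B) via the forward algorithm."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(2)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[j][o]
                 for j in range(2)]
    return sum(alpha)
```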
16. Algorithms
Each
state: one class of sub-events
The symbols: the words used in tweets
The variation in symbol probabilities
across different states: the different
“language models” used by the Twitter
users
The transitions between states models the
chain of sub-events over time
17. Algorithms
Our
Modifications
OUTPUTS PER TIME – STEP: a multiset of
symbols
DETECTING BURSTS IN TWEET VOLUME:
COMBINING INFORMATION FROM
MULTIPLE EVENTS
18. Algorithms
three
sets of symbol probabilities:
(1)θ( s ) , which is specific to each state but
is the same for all events,
(2) θ( sg ) , which is specific to a particular
state for a particular game
(3) θ( bg ) , which is a background
distribution of symbols over all states
and games.
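One plausible way to combine the three distributions is a fixed-weight mixture per (state, game) pair; the paper's exact combination rule may differ, and the weights here are assumptions:

```python
def emission_prob(word, state, game, theta_s, theta_sg, theta_bg,
                  weights=(0.5, 0.3, 0.2)):
    """Word emission probability as a weighted mixture of the
    state-specific, state+game-specific, and background distributions.
    The mixture form and weights are illustrative assumptions."""
    ws, wsg, wbg = weights
    return (ws * theta_s[state].get(word, 0.0)
            + wsg * theta_sg[(state, game)].get(word, 0.0)
            + wbg * theta_bg.get(word, 0.0))
```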
19. Algorithms
Algorithm Summary
Input: multiple events of the same type.
Learns the model parameters that best fit the data (EM algorithm).
Finds the optimal segmentation (standard Viterbi algorithm).
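The segmentation step can be sketched with a plain Viterbi decoder over the learned parameters. This is the textbook algorithm with toy inputs, not the paper's trained model:

```python
def viterbi(obs, pi, A, B):
    """Most likely state sequence for observation sequence `obs`,
    given initial probs pi, transitions A, and emissions B."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: prev[i] * A[i][j])
            ptr.append(i_best)
            delta.append(prev[i_best] * A[i_best][j] * B[j][o])
        back.append(ptr)
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(back):   # backtrack through the stored pointers
        state = ptr[state]
        path.append(state)
    return path[::-1]
```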
23. Experiments
MANUAL
GROUND TRUTH CONSTRUCTION .
Each output tweet was matched with the
happenings in the game and labeled as
Comment-Play , Comment-Game , or
Comment-General .
31. ABSTRACT
Traditional
summarization techniques only
consider text information.
We study how user influence
models, which project user interaction
information onto a Twitter context
tree, can help Twitter context
summarization within a supervised
learning framework.
32. INTRODUCTION
A
Twitter context tree is defined as a tree
structure of tweets which are connected
with reply relationship, and the root of a
context tree is its original tweet.
two types of user influence models, called
pair-wise user influence model and global
user influence model.
Granger Causality influence model
PageRank algorithm
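A context tree can be built from (tweet_id, reply_to_id) pairs; the representation below (root id plus a children map) is an illustrative data structure, not the paper's:

```python
from collections import defaultdict

def build_context_tree(tweets):
    """tweets: list of (tweet_id, reply_to_id or None).
    Returns (root_id, children) where children maps id -> list of reply ids."""
    children = defaultdict(list)
    root = None
    for tid, parent in tweets:
        if parent is None:
            root = tid          # the original tweet has no parent
        else:
            children[parent].append(tid)
    return root, dict(children)
```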
34. TWITTER CONTEXT TREE ANALYSIS
Whether
the tree structure can help
the summarization task
35. USER INFLUENCE MODELS
Granger
Causality Influence Model
A time series data x is to Granger cause
another time series data y ,If and only if
regressing for y in terms of both past
values of y and x is statistically significantly
more accurate than regressing for y in
terms of past values of y only. Let
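The definition above can be sketched as a comparison of residual sums of squares between the two regressions. A proper test uses an F-statistic; the fixed improvement ratio below is a crude illustrative stand-in:

```python
import numpy as np

def granger_causes(x, y, lag=2, ratio=0.9):
    """Crude check: does adding lagged x materially improve the
    least-squares regression of y on its own lags?
    (`lag` and `ratio` are illustrative; a real test uses an F-test.)"""
    x, y = np.asarray(x), np.asarray(y)
    n = len(y)
    Y = y[lag:]
    lags_y = np.column_stack([y[lag - k:n - k] for k in range(1, lag + 1)])
    lags_xy = np.column_stack([lags_y] +
                              [x[lag - k:n - k] for k in range(1, lag + 1)])
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return float(r @ r)
    return rss(lags_xy) < ratio * rss(lags_y)
```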
36. USER INFLUENCE MODELS
Lasso-Granger method
Lag ( X,T )to denote the lagged version of
data X ;
FullyConnectedFeatureGraph ( X ) denotes
the fully connected graph defined over the
features;
Lasso ( y, Xlag )denotes the set of temporal
variables receiving a non-zero co-efficient by
the Lasso algorithm.
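The Lasso-Granger idea (series whose lagged values receive non-zero Lasso coefficients are taken as influencers) can be sketched with a basic ISTA solver. The solver, the regularization strength, and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def lasso_ista(X, y, lam, steps=1000):
    """Plain ISTA solver for 0.5*||Xb - y||^2 + lam*||b||_1 (illustrative)."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        z = beta - X.T @ (X @ beta - y) / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return beta

def lasso_granger(series, target, lag=2, lam=10.0):
    """series: dict name -> 1-D array. Returns names of series whose
    lagged values get a non-zero Lasso coefficient for `target`."""
    target = np.asarray(target)
    n = len(target)
    cols, names = [], []
    for name, s in series.items():
        s = np.asarray(s)
        for k in range(1, lag + 1):
            cols.append(s[lag - k:n - k])
            names.append(name)
    beta = lasso_ista(np.column_stack(cols), target[lag:], lam)
    return sorted({nm for nm, b in zip(names, beta) if abs(b) > 1e-6})
```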
38. USER INFLUENCE MODELS
Pagerank
Influence Model
For each user u , it has a directed edge to
each user v if u has a reply or a retweet to
v ’s tweet and we can have a global user
graph G .
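Running PageRank over G then gives each user a global influence score. A minimal power-iteration sketch (damping factor and iteration count are the usual defaults, assumed here):

```python
from collections import defaultdict

def pagerank(edges, d=0.85, iters=50):
    """edges: list of (u, v) meaning u replied/retweeted v.
    Returns a dict mapping each user to its PageRank score."""
    nodes = {n for e in edges for n in e}
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - d) / len(nodes) for n in nodes}
        for u in nodes:
            targets = out.get(u)
            if targets:
                share = d * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:                      # dangling node: spread mass evenly
                for v in nodes:
                    new[v] += d * rank[u] / len(nodes)
        rank = new
    return rank
```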
41. SUMMARIZATION METHOD
Temporal
1.
2.
Signals
fit the age of tweets in a context tree
into an exponential distribution.
for each tweet, we compute its
temporal signal as the likelihood of
sampling its age from the fitted
exponential distribution.
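The two steps above reduce to a maximum-likelihood fit (for an exponential, the rate is one over the mean age) followed by a density evaluation. A minimal sketch, assuming strictly positive ages:

```python
import math

def temporal_signals(ages):
    """MLE-fit an exponential to tweet ages, then return each tweet's
    signal as the fitted density at its age (newer tweets score higher)."""
    lam = len(ages) / sum(ages)                 # MLE rate = 1 / mean age
    return [lam * math.exp(-lam * a) for a in ages]
```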
43. EDITORIAL DATA SET
10
Twitter context trees from March 7th
to March 20th,2011
4 are initiated by Lady Gaga
6 are initiated by Justin Bieber
1. read the root tweet
2. Scans through all candidate tweets
3. Selects 5 to 10 tweets
54. CONCLUSION
User
influence information is very helpful to
generate a high quality summary for each
Twitter context tree.
All signals are converted into features, and
we cast Twitter context summarization into a
supervised learning problem.