2. ABSTRACT
We explore the problem of generating storylines
from microblogs for user input queries.
Given a query about an ongoing event, we propose
to sketch the real-time storyline of the event with a
two-level solution:
1. Propose a language model with dynamic
pseudo relevance feedback to obtain relevant
tweets
2. Generate storylines via graph optimization
4. INTRODUCTION
Differences between GESM and prior studies:
1. Prior studies work on well-edited facts; GESM works on short, noisy text
2. GESM provides personalized service
3. A two-level framework is necessary: at the low
level, finding all relevant tweets along the
timeline of the event with a retrieval model; and
at the high level, summarizing the relevant tweets
and their latent structure to produce a storyline.
5. INTRODUCTION
Challenges
1. The dynamic and sparse nature of microblogs:
how to match the underlying event expressed
by a vague event query to potentially relevant
tweets that may not contain any query terms
2. Numerous duplicate tweets, plus direct and
indirect retweets
6. INTRODUCTION
Contributions
1. Generating event storylines from microblogs
2. A dynamic pseudo relevance feedback (DPRF)
language model
3. Storyline generation is formulated as a graph-based
optimization problem and solved by approximation
algorithms for minimum-weight dominating set and
directed Steiner tree
7. THE FRAMEWORK OVERVIEW
The generated storyline should be a graph structure
Each node is labeled with a summary
Each edge represents a causal relationship between two
phases
Offline layer
Online layers
8. THE RETRIEVAL MODEL
Preliminaries
The original query is usually short and vague
Query expansion
In pseudo relevance feedback, suppose the few top-ranked
documents d+ retrieved by the initial query Q build a
relevance model θF; we can then set the new query to be
a linear combination of the original query Q and the
relevance model θF
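The linear combination above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the interpolation weight `lam`, the whitespace tokenizer, and the function names `unigram_model` and `expand_query` are all assumptions made for the example.

```python
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram distribution over whitespace tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def expand_query(query, feedback_docs, lam=0.5):
    """Interpolate the query model with the feedback model theta_F.

    `lam` is an assumed mixing weight; the slides only state that the
    new query is a linear combination of Q and theta_F.
    """
    theta_q = unigram_model([query])
    theta_f = unigram_model(feedback_docs)
    vocab = set(theta_q) | set(theta_f)
    return {
        w: (1 - lam) * theta_q.get(w, 0.0) + lam * theta_f.get(w, 0.0)
        for w in vocab
    }


expanded = expand_query(
    "egypt protest",
    ["egypt protest tahrir", "tahrir square protest"],
)
```

Terms such as "tahrir" that never appear in the original query receive non-zero weight in the expanded query, which is the point of the feedback step.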
9. THE RETRIEVAL MODEL
Dynamic Pseudo Relevance Feedback
K burst periods
Assume that the prior probability of a relevant
document d+ depends on the distance from its timestamp t_d+
to the centroids of the burst periods, denoted Φ = {φ1, ..., φK}
Three probability functions model the effective
range of a burst period, its decay coefficient and
its skewness:
1. Mixture Gaussian Distribution
2. Local Power Distribution
3. Skewed Linear Distribution
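To make the idea of a time-dependent prior concrete, here is a rough sketch of the first option only. The exact formulas are not given in these notes, so the equal mixture weights, the shared bandwidth `sigma`, and the function name `gaussian_mixture_prior` are assumptions for illustration.

```python
import math


def gaussian_mixture_prior(t_doc, centroids, sigma=1.0):
    """Equal-weight Gaussian mixture centred on the burst centroids.

    Assumes a non-empty list of centroids phi_1..phi_K and a single
    shared standard deviation `sigma` (both illustrative choices).
    """
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    density = sum(
        norm * math.exp(-((t_doc - phi) ** 2) / (2 * sigma ** 2))
        for phi in centroids
    )
    return density / len(centroids)


near = gaussian_mixture_prior(10, [10, 50])
far = gaussian_mixture_prior(30, [10, 50])
```

A document posted at a burst centroid gets a much larger prior than one posted midway between two bursts.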
10. THE RETRIEVAL MODEL
Mixture Gaussian Distribution
Local Power Distribution
Skewed Linear Distribution
11. THE RETRIEVAL MODEL
Burst Period Detection
1. appear more frequently than usual
2. be continuously frequent around the time point.
Detect burst periods of the event by
1. for each query term, finding the time intervals
of arbitrary length in which the query term
appears consistently frequent;
2. picking the time points within these intervals
with the largest sum of frequencies over all query terms.
12. THE RETRIEVAL MODEL
“bursty score”
Find the time interval Tw,j = <st, et, LS, RS> with the
maximal cumulative burst score B(w, Tw,j)
Compute the score H(q, t) of each query term q at each
time point t
Rank the time points by ∑_{q∈Q} H(q, t) and choose
the K highest-ranked time points as φk.
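The final ranking step can be sketched as below. The nested dictionary layout `scores[q][t]` for H(q, t) and the function name `top_k_time_points` are assumptions; how H itself is computed is not shown here.

```python
import heapq


def top_k_time_points(scores, query_terms, k):
    """Sum H(q, t) over the query terms and return the k best time points.

    `scores[q][t]` is assumed to hold the per-term score H(q, t).
    """
    totals = {}
    for q in query_terms:
        for t, h in scores.get(q, {}).items():
            totals[t] = totals.get(t, 0.0) + h
    return heapq.nlargest(k, totals, key=totals.get)


scores = {
    "quake": {1: 0.2, 2: 0.9, 3: 0.1},
    "japan": {1: 0.1, 2: 0.5, 3: 0.7},
}
burst_points = top_k_time_points(scores, ["quake", "japan"], k=2)
```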
13. STORYLINE GENERATION
1. Representative tweets
2. Depict the evolving structure of the event
3. An optimistic connection
A multi-view tweet graph is constructed
A minimum dominating set on the tweet graph
A minimum Steiner tree
14. STORYLINE GENERATION
Three non-negative real parameters α, τ1, τ2, with τ1 < τ2.
Define E: text similarity > α
Define A: τ1 ≤ tj − ti ≤ τ2
w(vi) = 1 − score(Q, vi).
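A small sketch of how such a multi-view graph might be assembled from the three definitions above. The similarity measure is not specified in these notes, so Jaccard overlap is used as a stand-in; the `relevance` list stands in for score(Q, vi), and the function names are hypothetical.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in for text similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_tweet_graph(tweets, relevance, alpha, tau1, tau2):
    """Return undirected edges E, directed arcs A and vertex weights w.

    tweets: list of (text, timestamp); relevance[i] plays the role of
    score(Q, v_i) and is assumed to lie in [0, 1].
    """
    n = len(tweets)
    E, A = set(), set()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Undirected similarity edge, stored once with i < j.
            if i < j and jaccard(tweets[i][0], tweets[j][0]) > alpha:
                E.add((i, j))
            # Directed time arc from i to j when j follows i within [tau1, tau2].
            if tau1 <= tweets[j][1] - tweets[i][1] <= tau2:
                A.add((i, j))
    w = [1 - r for r in relevance]
    return E, A, w


tweets = [
    ("fire downtown now", 0),
    ("fire downtown spreading", 5),
    ("rain tomorrow", 6),
]
E, A, w = build_tweet_graph(tweets, [0.75, 0.5, 0.25], alpha=0.4, tau1=1, tau2=5)
```

Because w(vi) = 1 − score(Q, vi), more relevant tweets are cheaper to select in the later optimization steps.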
15. STORYLINE GENERATION
A subset S of the vertex set of an undirected
graph is a
dominating set if, for each vertex u, either u is in
S or u is adjacent to a vertex in S.
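For intuition, the definition can be paired with a standard greedy heuristic for the weighted variant. This is a generic textbook-style sketch, not necessarily the approximation algorithm used in the paper; the adjacency-dictionary layout and function names are assumptions.

```python
def is_dominating(adj, S):
    """Check the definition: every vertex is in S or adjacent to a vertex in S."""
    return all(u in S or adj[u] & S for u in adj)


def greedy_dominating_set(adj, weight):
    """Greedy heuristic for a low-weight dominating set.

    adj: vertex -> set of neighbours (undirected graph).
    weight: vertex -> non-negative cost, e.g. 1 - score(Q, v).
    Repeatedly picks the vertex covering the most still-undominated
    vertices per unit of weight.
    """
    undominated = set(adj)
    chosen = set()

    def ratio(v):
        covered = len(({v} | adj[v]) & undominated)
        # Guard against zero weights (a perfectly relevant tweet).
        return covered / max(weight[v], 1e-9) if covered else 0.0

    while undominated:
        best = max(adj, key=ratio)
        chosen.add(best)
        undominated -= {best} | adj[best]
    return chosen


star = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
S = greedy_dominating_set(star, {v: 1.0 for v in star})
```

On the star graph the hub alone dominates every vertex, so the heuristic returns just that one vertex.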
17. STORYLINE GENERATION
A Steiner tree of a graph G with respect to a
vertex subset S is the edge-induced subtree of G
that contains all the vertices of S and has the
minimum total cost, where the cost is
the total weight of the vertices.
27. CONCLUSION
The proposed dynamic pseudo relevance
feedback model
Minimum-weight Steiner tree on a dominating set
Extensive experiments
28. OMG, I Have to Tweet That!
A Study of Factors that Influence Tweet
Rates
29. Abstract
Key limitation:
it depends on people self-reporting their own
behaviors and observations.
A large-scale quantitative analysis of some of
the factors that influence self-reporting bias.
The daily variations in tweet rates about weather
events
30. Introduction
Treating social media as a signal to measure the
relative real-world occurrence of events
Critical challenge:
the bias introduced by the self-reported nature of
social media
What is it about an event that makes it more or
less “tweetable”?
A first large-scale, quantitative analysis of some
of the factors that influence self-reporting bias by
comparing a year of tweets about weather
events in cities across the United States and
Canada to ground-truth knowledge about actual
weather occurrences.
31. Introduction
Three potential factors:
1. How extreme is the weather?
2. How expected is the weather given the time of year?
3. How much did the weather change?
32. Data Preparation
Between Jun 1, 2010 and Jun 30, 2011
56 different metropolitan areas
historical weather data provided by the National
Oceanic and Atmospheric Administration of the
United States.
33. Identifying Weather-related Tweets
Discovering the rate of weather-related tweets
that occurred per day across metropolitan areas
1. Filter the full archive of tweets for tweets that
contain at least 1 weather-related word from a
list of 179 weather-related words and phrases
2. Build a classifier for weather-related tweets
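The first filtering step might look roughly like this. The four terms below are a hypothetical stand-in for the 179-entry list, which is not reproduced in these notes; the function names are likewise illustrative.

```python
import re

# Hypothetical sample; the real list has 179 words and phrases.
WEATHER_TERMS = ["rain", "snow", "heat wave", "thunderstorm"]


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all terms."""
    # Longest terms first so multi-word phrases are tried before their parts.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def filter_weather_tweets(tweets, pattern):
    """Keep tweets containing at least one weather-related term."""
    return [t for t in tweets if pattern.search(t)]


pattern = build_pattern(WEATHER_TERMS)
candidates = filter_weather_tweets(
    ["Heavy RAIN all day", "training for a marathon", "heat wave incoming", "lunch time"],
    pattern,
)
```

Word boundaries keep "training" from matching "rain". A keyword filter like this still admits false positives, which is why a classifier is applied as the second step.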
34. A simple classifier that estimates the probability
of a tweet being weather-related as
35. Identifying the Location of Tweets
Geo-coded
The textual user-provided location field in a user’s
Twitter profile
Normalize the arbitrary, textual user-provided
location information into concrete geo-coded coordinates
1. Build a mapping from user-provided location fields to
latitude-longitude coordinates.
2. Merge location fields with similar geo-mappings
together to create clusters for roughly metropolitan-sized areas
37. Historical Weather Data
calculate daily summaries
For each daily summary of weather data at a
location:
Expectation: how normal the observed weather
is at a location
Extremeness: how extreme the weather is on a
particular day
Change: how different the observed weather data
is from previous days’ weather
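As one possible reading of the change measure, the sketch below compares each day's value with the mean of the preceding days. The precise definition is not given in these notes, so the window length, the use of a simple mean, and the function name `delta_change` are assumptions.

```python
def delta_change(values, window=1):
    """Difference between each day's value and the mean of the previous `window` days.

    Days without a full window of history get None. This is an assumed
    formulation for illustration only.
    """
    out = []
    for i, v in enumerate(values):
        if i < window:
            out.append(None)
        else:
            prev = values[i - window:i]
            out.append(v - sum(prev) / window)
    return out


daily_high = [10, 12, 9, 15]
one_day = delta_change(daily_high, window=1)
two_day = delta_change(daily_high, window=2)
```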
41. Analysis and Results
Correlating Expectation and Tweet Rates
expectation measure adds little information about
likely tweet rates beyond what is already
contained in basic weather data
Correlating Extremeness and Tweet Rates
extremeness can independently explain more of
the variation in weather-related tweet rates than
basic weather alone
Correlating Delta Change and Tweet Rates
There is little difference in the amount of
information gained from building these delta-change models
Combining Extremeness, Expectation, and
Delta Change Models
44. Discussion
Additional Factors Likely to Affect Tweet
Rates
Sentiment
Privacy concerns, embarrassment and safety
Population segments
Mobile devices
Time-of-day, day-of-week, holiday, and other
effects of time
45. Conclusions
The correlation between daily tweet
rates and the expectation, extremeness, and
change in observed weather.
Global models
Location-specific models
Extremeness > change > expectation