Generating event storylines from microblogs

Published in: Technology

  1. Generating Event Storylines from Microblogs (CIKM’12)
  2. ABSTRACT: We explore the problem of generating storylines from microblogs for user input queries. Given a query about an ongoing event, we propose to sketch the real-time storyline of the event with a two-level solution: (1) a language model with dynamic pseudo relevance feedback to obtain relevant tweets; (2) storyline generation via graph optimization.
  3. INTRODUCTION: Generating Event Storyline from Microblogs (GESM)
  4. INTRODUCTION: Differences between GESM and prior studies: (1) Prior studies work on well-edited facts; GESM works on short, noisy text. (2) GESM provides a personalized service. (3) A two-level framework is necessary: at the low level, a retrieval model finds all relevant tweets along the timeline of the event; at the high level, the relevant tweets and their latent structure are summarized to produce a storyline.
  5. INTRODUCTION: Challenges: (1) The dynamic and sparse nature of microblogs: how to match the underlying event expressed by a vague event query to potentially relevant tweets that may not contain any query terms. (2) Numerous duplicate tweets and direct and indirect retweets.
  6. INTRODUCTION: Contributions: (1) Generating event storylines from microblogs. (2) A dynamic pseudo relevance feedback (DPRF) language model. (3) Storyline generation formulated as a graph-based optimization problem and solved by approximation algorithms for minimum-weight dominating set and directed Steiner tree.
  7. THE FRAMEWORK OVERVIEW: The generated storyline should be a graph structure. Each node is labeled by a summary. Each edge represents a causal relationship between two phases. Offline layer. Online layers.
  8. THE RETRIEVAL MODEL: Preliminaries. The original query is usually short and vague. Query expansion: in a pseudo relevance feedback manner, suppose the few top-ranked documents d+ retrieved by the initial query Q build a relevance model θF; the new query can then be set to a linear combination of the original query Q and the relevance model θF.
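
The linear combination described on slide 8 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the maximum-likelihood unigram models, the function names `unigram_model` and `expand_query`, and the interpolation weight `lam` are all assumptions made for the example.

```python
from collections import Counter

def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def expand_query(query_tokens, feedback_docs, lam=0.5):
    """Interpolate the query model with a feedback model.

    feedback_docs: token lists of the top-ranked documents d+ (assumed given).
    lam: weight of the feedback model theta_F (hypothetical parameter name).
    """
    theta_q = unigram_model(query_tokens)
    theta_f = unigram_model([w for doc in feedback_docs for w in doc])
    vocab = set(theta_q) | set(theta_f)
    return {
        w: (1 - lam) * theta_q.get(w, 0.0) + lam * theta_f.get(w, 0.0)
        for w in vocab
    }

expanded = expand_query(
    ["earthquake"],
    [["earthquake", "japan"], ["tsunami", "japan"]],
)
```

Because both inputs are probability distributions, the expanded query model still sums to 1, and terms such as "japan" that never appear in the original query receive non-zero weight.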
  9. THE RETRIEVAL MODEL: Dynamic Pseudo Relevance Feedback. K burst periods. Assume that the prior probability of a relevant document d+ depends on the distance of td+ to the centroids of the burst periods, denoted Φ = {φ1, ···, φK}. Three probability functions model the effective range of a burst period, its decay coefficient, and its skewness: (1) Mixture Gaussian Distribution; (2) Local Power Distribution; (3) Skewed Linear Distribution.
  10. THE RETRIEVAL MODEL: Mixture Gaussian Distribution. Local Power Distribution. Skewed Linear Distribution.
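
The slides name the three prior distributions but the transcript does not preserve their formulas. As a rough illustration of the first one only, the sketch below assumes an equally weighted mixture of Gaussians centred on the burst centroids φ1..φK with a shared standard deviation `sigma`; the equal weights, the shared `sigma`, and the function name are assumptions, not the paper's definition.

```python
import math

def gaussian_mixture_prior(t, centroids, sigma=1.0):
    """Prior weight of a document posted at time t.

    Assumes an equally weighted Gaussian mixture centred on the burst
    centroids; documents close to any centroid get a higher prior.
    """
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    density = sum(
        norm * math.exp(-((t - phi) ** 2) / (2 * sigma ** 2))
        for phi in centroids
    )
    return density / len(centroids)

near = gaussian_mixture_prior(5.2, [5.0, 12.0])
far = gaussian_mixture_prior(8.5, [5.0, 12.0])
```

Under these assumptions a tweet posted close to a burst centroid (`near`) receives a larger prior than one posted between bursts (`far`).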
  11. THE RETRIEVAL MODEL: Burst Period Detection. Criteria for a burst: (1) appear more frequently than usual; (2) be continuously frequent around the time point. Burst periods of the event are detected by (1) finding, for each query term, the time intervals of arbitrary length in which the query term appears constantly frequent; (2) picking the time points within these intervals with the largest sum of frequencies over all query terms.
  12. THE RETRIEVAL MODEL: “Bursty score”. Find the time interval Tw,j = <st, et, LS, RS> with the maximal cumulative burst score B(w, Tw,j). Compute the score of each query term q at each time point. Rank the time points by ∑q∈Q H(q, t) and choose the K largest time points φk.
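
The final ranking step on slide 12 can be sketched as below. The per-term scores H(q, t) are taken as given input, since the transcript does not preserve how they are computed; the nested-dictionary layout and the name `top_k_time_points` are assumptions for the example.

```python
def top_k_time_points(scores, k):
    """Pick the K time points with the largest summed score.

    scores: dict mapping query term q -> dict mapping time point t -> H(q, t).
    Ties are broken by the earlier time point (an arbitrary choice here).
    """
    totals = {}
    for per_time in scores.values():
        for t, h in per_time.items():
            totals[t] = totals.get(t, 0.0) + h
    ranked = sorted(totals, key=lambda t: (-totals[t], t))
    return ranked[:k]

centroids = top_k_time_points(
    {"quake": {1: 0.1, 2: 0.9, 3: 0.4}, "japan": {2: 0.5, 3: 0.7}},
    k=2,
)
```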
  13. STORYLINE GENERATION: (1) Representative tweets. (2) Depict the evolving structure of the event. (3) An optimistic connection. A multi-view tweet graph is constructed. A minimum dominating set on the tweet graph. A minimum Steiner tree.
  14. STORYLINE GENERATION: Three non-negative real parameters α, τ1, τ2, with τ1 < τ2. Define E: text similarity > α. Define A: τ1 ≤ tj − ti ≤ τ2. w(vi) = 1 − score(Q, vi).
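
The graph definitions on slide 14 can be sketched as follows. The example assumes that E is a set of undirected similarity edges and A a set of directed time arcs, and it uses Jaccard similarity over tokens as a stand-in for the unspecified text similarity; these choices and the function names are assumptions.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (stand-in for the text similarity)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def build_tweet_graph(tweets, relevance, alpha, tau1, tau2):
    """Build vertex weights, similarity edges E, and time arcs A.

    tweets: list of (timestamp, tokens); relevance: score(Q, v_i) per tweet.
    """
    n = len(tweets)
    weights = [1.0 - relevance[i] for i in range(n)]  # w(v_i) = 1 - score(Q, v_i)
    edges, arcs = set(), set()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            (ti, toks_i), (tj, toks_j) = tweets[i], tweets[j]
            if i < j and jaccard(toks_i, toks_j) > alpha:
                edges.add((i, j))  # undirected edge, stored once as (i, j) with i < j
            if tau1 <= tj - ti <= tau2:
                arcs.add((i, j))   # directed arc from the earlier to the later tweet
    return weights, edges, arcs

weights, edges, arcs = build_tweet_graph(
    [(0, ["quake", "hits", "japan"]),
     (1, ["quake", "hits", "tokyo"]),
     (10, ["rescue", "teams", "arrive"])],
    relevance=[0.9, 0.8, 0.5],
    alpha=0.3, tau1=1, tau2=5,
)
```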
  15. STORYLINE GENERATION: A subset S of the vertex set of an undirected graph is a dominating set if each vertex u is either in S or adjacent to a vertex in S.
  16. STORYLINE GENERATION: Greedy algorithm
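
The slide names a greedy algorithm without giving its steps. A common greedy heuristic for the minimum-weight dominating set, shown here as an assumed stand-in rather than the paper's exact procedure, repeatedly picks the vertex with the lowest weight per newly dominated vertex.

```python
def greedy_dominating_set(adj, weights):
    """Greedy heuristic for a minimum-weight dominating set.

    adj: dict vertex -> set of neighbours (undirected graph).
    weights: dict vertex -> non-negative weight.
    """
    undominated = set(adj)
    chosen = []
    while undominated:
        def cost_per_cover(v):
            covered = len(({v} | adj[v]) & undominated)
            return weights[v] / covered if covered else float("inf")
        best = min(adj, key=cost_per_cover)
        chosen.append(best)
        undominated -= {best} | adj[best]
    return chosen

def is_dominating(adj, subset):
    """Check the definition: every vertex is in the set or adjacent to it."""
    s = set(subset)
    return all(u in s or adj[u] & s for u in adj)

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
picked = greedy_dominating_set(star, {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0})
```

On the star graph the centre dominates every vertex at the lowest cost per covered vertex, so it is the only vertex selected.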
  17. STORYLINE GENERATION: A Steiner tree of a graph G with respect to a vertex subset S is the edge-induced subtree of G that contains all the vertices of S and has the minimum total cost, where the cost is the total weight of the vertices.
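
To make the definition concrete, the sketch below connects a set of terminal vertices with a simple shortest-path heuristic on an undirected, vertex-weighted graph. It is not the directed Steiner tree approximation the slides refer to, and it does not guarantee a minimum-cost tree; the function names and the graph encoding are assumptions.

```python
import heapq

def cheapest_path_to_tree(adj, weights, tree, targets):
    """Dijkstra from all tree vertices; a path costs the weights of the
    vertices it adds. Returns the path from the nearest target back to the tree.
    Vertices must be mutually comparable (e.g. all ints) for heap tie-breaking."""
    dist = {v: 0.0 for v in tree}
    prev = {}
    heap = [(0.0, v) for v in tree]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in targets:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path
        for v in adj[u]:
            nd = d + weights[v]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None

def steiner_tree_heuristic(adj, weights, terminals):
    """Grow a tree from the first terminal, attaching the nearest remaining one."""
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:])
    while remaining:
        path = cheapest_path_to_tree(adj, weights, tree_nodes, remaining)
        if path is None:
            raise ValueError("terminals are not connected")
        for a, b in zip(path, path[1:]):
            tree_edges.add(frozenset((a, b)))
        tree_nodes.update(path)
        remaining -= set(path)
    return tree_nodes, tree_edges

adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
nodes, edges = steiner_tree_heuristic(adj, {0: 0.1, 1: 5.0, 2: 0.1, 3: 1.0}, [0, 2])
```

In this small example the two terminals 0 and 2 are joined through vertex 3 rather than the heavier vertex 1.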
  18. STORYLINE GENERATION
  19. STORYLINE GENERATION
  20. EXPERIMENTS: Data Set
  21. EXPERIMENTS: Tweet Retrieval. 49 queries. Evaluation metrics: precision at top 30 tweets (P@30), mean average precision (MAP), precision at top 100 tweets (P@100), R-precision (R-PREC).
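
The four retrieval metrics listed on slide 21 have standard definitions, sketched below for a ranked list of tweet identifiers and a set of relevant identifiers. The function names and the toy data are illustrative only.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant (P@k)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0

def mean_average_precision(runs):
    """MAP over (ranked, relevant) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
p2 = precision_at_k(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
```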
  22. EXPERIMENTS: Comparative Study
  23. EXPERIMENTS: Parameter Tuning
  24. EXPERIMENTS: Summarization Capability
  25. EXPERIMENTS: Parameter Tuning
  26. EXPERIMENTS: A User Study
  27. CONCLUSION: The proposed dynamic pseudo relevance feedback model. Minimum-weight Steiner tree on a dominating set. Thorough experiments.
  28. OMG, I Have to Tweet That! A Study of Factors that Influence Tweet Rates
  29. Abstract: Key limitation: it depends on people self-reporting their own behaviors and observations. A large-scale quantitative analysis of some of the factors that influence self-reporting bias. The daily variations in tweet rates about weather events.
  30. Introduction: Treating social media as a signal to measure the relative real-world occurrence of events. Critical challenge: the bias introduced by the self-reported nature of social media. What is it about an event that makes it more or less “tweetable”? A first large-scale, quantitative analysis of some of the factors that influence self-reporting bias, comparing a year of tweets about weather events in cities across the United States and Canada to ground-truth knowledge about actual weather occurrences.
  31. Introduction: Three potential factors: (1) How extreme is the weather? (2) How expected is the weather given the time of year? (3) How much did the weather change?
  32. Data Preparation: Between Jun 1, 2010 and Jun 30, 2011. 56 different metropolitan areas. Historical weather data provided by the National Oceanic and Atmospheric Administration of the United States.
  33. Identifying Weather-related Tweets: Discovering the rate of weather-related tweets that occurred per day across metropolitan areas: (1) filter the full archive of tweets for tweets that contain at least one weather-related word from a list of 179 weather-related words and phrases; (2) build a classifier for weather-related tweets.
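
The first filtering step on slide 33 can be sketched as a whole-word lexicon match. The three-term lexicon below is a hypothetical stand-in for the 179-entry list, which the slides do not reproduce, and the function name is an assumption.

```python
import re

# Hypothetical stand-in for the 179 weather-related words and phrases.
WEATHER_TERMS = {"snow", "rain", "heat wave"}

def mentions_weather(tweet, lexicon=WEATHER_TERMS):
    """True if the tweet contains at least one lexicon entry as a whole word or phrase."""
    text = tweet.lower()
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text) for term in lexicon
    )

candidates = [
    t for t in ["Heavy snow in Boston today", "Reading the news"]
    if mentions_weather(t)
]
```

Matching on word boundaries keeps a term such as "snow" from firing on unrelated words that merely contain it; the tweets that pass this filter would then go to the classifier in step (2).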
  34. A simple classifier that estimates the probability of a tweet being weather-related.
  35. Identifying the Location of Tweets: Geo-coded. The textual user-provided location field in a user’s Twitter profile. Normalize the textual, arbitrary user-provided location information into concrete geo-coded coordinates: (1) a mapping from user-provided location fields to latitude-longitude coordinates; (2) merge location fields with similar geo-mappings together to create clusters for roughly metropolitan-sized areas.
  36. Identifying the Location of Tweets
  37. Historical Weather Data: Calculate daily summaries. For each daily summary of weather data at a location: Expectation: how normal the observed weather is at a location. Extremeness: how extreme the weather is on a particular day. Change: how different the observed weather data is from previous days’ weather.
  38. Analysis and Results: Tweet Rates and Weather Reports
  39. Analysis and Results: Linear Regression. The relationship between a set of weather-derived features and the daily rate of weather-related tweets.
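
As a simplified illustration of the regression on slide 39, the sketch below fits an ordinary least-squares line for a single feature and reports R², the share of variation in tweet rate that the feature explains. The study regresses on a set of features; the single-feature form, the function names, and the toy numbers are assumptions for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Toy data: one weather-derived feature per day vs. daily weather-tweet rate.
feature = [0.0, 1.0, 2.0, 3.0]
tweet_rate = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(feature, tweet_rate)
fit_quality = r_squared(feature, tweet_rate, slope, intercept)
```

Comparing R² across models built from different feature sets is one way to read statements such as "extremeness can independently explain more of the variation" on the later slides.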
  40. Analysis and Results: Correlating Basic Weather Data and Tweet Rates
  41. Analysis and Results: Correlating Expectation and Tweet Rates: the expectation measure adds little information about likely tweet rates beyond what is already contained in basic weather data. Correlating Extremeness and Tweet Rates: extremeness can independently explain more of the variation in weather-related tweet rates than basic weather alone. Correlating Delta Change and Tweet Rates: there is little difference in the amount of information gained from building these delta-change models. Combining Extremeness, Expectation, and Delta Change Models.
  42. Analysis and Results: Per-Location Models
  43. Discussion: Additional Factors Likely to Affect Tweet Rates: sentiment; privacy concerns, embarrassment, and safety; population segments; mobile devices; time of day, day of week, holidays, and other effects of time.
  44. Conclusions: The correlation between daily tweet rates and the expectation, extremeness, and change in observed weather. Global models. Location-specific models. Extremeness > change > expectation.
