Generating Event Storylines from
Microblogs
CIKM’12
ABSTRACT
 We explore the problem of generating storylines
from microblogs for user input queries.
 Given a query about an ongoing event, we propose
to sketch the real-time storyline of the event with a
two-level solution:
1. Propose a language model with dynamic
pseudo relevance feedback to obtain relevant
tweets
2. Generate storylines via graph optimization
INTRODUCTION
 Generating Event Storyline from Microblogs (GESM)
INTRODUCTION
 Differences between GESM and prior studies:
1. Well-edited facts vs. short, noisy text
2. GESM provides a personalized service
3. A two-level framework is necessary: at the low
level, a retrieval model finds all relevant tweets along the
timeline of the event; at the high level, the relevant tweets
and their latent structure are summarized to produce a storyline.
INTRODUCTION
 Challenges
1. The dynamic and sparse nature of microblogs:
how to match the underlying event expressed
by a vague event query to potentially relevant
tweets that may not contain any query terms
2. Numerous duplicate tweets, plus direct and
indirect retweets
INTRODUCTION
 Contributions
1. Generating event storylines from microblogs
2. A dynamic pseudo relevance feedback (DPRF)
language model
3. Storyline generation is formulated as a graph-based
optimization problem and solved by approximation
algorithms for minimum-weight dominating set and directed
Steiner tree
THE FRAMEWORK OVERVIEW
 The generated storyline should be a graph structure
 Each node is labeled by a summary
 Each edge represents a causal relationship between two phases
 Offline layer
 Online layers
THE RETRIEVAL MODEL
 Preliminaries
 the original query is usually short and vague
 Query expansion
 In pseudo relevance feedback, suppose the few top-ranked
documents d+ retrieved by the initial query Q build a
relevance model θF; we can then set the new query to be
a linear combination of the original query Q and
the relevance model θF
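The interpolation described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the pooled maximum-likelihood estimate of θF, and the mixing weight `lam` are all assumptions made for the example.

```python
from collections import Counter


def query_model(query_terms):
    """Maximum-likelihood unigram model of the original query Q."""
    counts = Counter(query_terms)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def feedback_model(top_docs):
    """Unigram model theta_F pooled from the top-ranked documents (assumed estimator)."""
    counts = Counter(w for doc in top_docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def expand_query(query_terms, top_docs, lam=0.5):
    """Linear combination: (1 - lam) * P(w|Q) + lam * P(w|theta_F)."""
    q = query_model(query_terms)
    f = feedback_model(top_docs)
    vocab = set(q) | set(f)
    return {w: (1 - lam) * q.get(w, 0.0) + lam * f.get(w, 0.0) for w in vocab}


# Hypothetical query and two hypothetical top-ranked tweets (already tokenized).
expanded = expand_query(
    ["earthquake", "japan"],
    [["earthquake", "tsunami"], ["japan", "tsunami", "warning"]],
    lam=0.5,
)
```

Terms such as "tsunami" that never appear in the query receive non-zero weight in the expanded query, which is how feedback can reach relevant tweets that contain no query terms.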
THE RETRIEVAL MODEL
 Dynamic Pseudo Relevance Feedback
 K burst periods
 Assume that the prior probability of a relevant
document d+ depends on the distance of its timestamp t_d+
to the centroids of the burst periods, denoted Φ = {φ1, ..., φK}
 Three probability functions model the effective
range of a burst period, the decay coefficient, and the
skewness:
1. Mixture Gaussian Distribution
2. Local Power Distribution
3. Skewed Linear Distribution
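As a rough illustration of the first option, a time-dependent prior can be written as a mixture of Gaussians centered on the burst centroids. The slide's exact formula is not preserved here, so the equal mixture weights and the single shared `sigma` below are assumptions, and the function name is hypothetical.

```python
import math


def gaussian_mixture_prior(t, centroids, sigma=1.0, weights=None):
    """Prior weight of a document posted at time t, given burst centroids phi_1..phi_K.

    Assumes equal mixture weights and one shared standard deviation unless given.
    """
    k = len(centroids)
    if weights is None:
        weights = [1.0 / k] * k
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(
        w * norm * math.exp(-((t - phi) ** 2) / (2.0 * sigma ** 2))
        for w, phi in zip(weights, centroids)
    )


# Documents close to a burst centroid get a larger prior than distant ones.
near = gaussian_mixture_prior(10.0, [10.0, 30.0], sigma=2.0)
far = gaussian_mixture_prior(20.0, [10.0, 30.0], sigma=2.0)
```

Here `sigma` plays the role of the effective range of a burst period; the power and skewed-linear variants named on the slide would swap in a different decay shape.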
THE RETRIEVAL MODEL
 Mixture Gaussian Distribution

 Local Power Distribution

 Skewed Linear Distribution
THE RETRIEVAL MODEL
 Burst Period Detection
 In a burst period, query terms should:
1. appear more frequently than usual
2. be continuously frequent around the time point.
 Detect burst periods of the event by:
1. for each query term, finding the time intervals
of arbitrary length in which the query term
appears constantly frequent;
2. picking the time points within these intervals
with the largest sum of frequencies over all query terms.
THE RETRIEVAL MODEL
 “Bursty score”
 Find the time interval T_{w,j} = <st, et, LS, RS> with the
maximal cumulative burst score B(w, T_{w,j})
 Compute the score of any query term q at each
time point
 Rank each time point by ∑_{q∈Q} H(q, t) and choose
the K highest-scoring time points φ_k.
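The final ranking step can be sketched as below. The per-term scores H(q, t) are taken as given input, since the slide does not preserve their formula; the dictionary layout, the function name, and the tie-breaking by earlier time point are assumptions.

```python
def top_k_time_points(scores, k):
    """Pick the K time points with the largest sum of H(q, t) over all query terms.

    scores: dict mapping query term -> dict mapping time point -> H(q, t).
    Ties are broken in favor of the earlier time point (an assumption).
    """
    totals = {}
    for per_time in scores.values():
        for t, h in per_time.items():
            totals[t] = totals.get(t, 0.0) + h
    ranked = sorted(totals, key=lambda t: (-totals[t], t))
    return ranked[:k]


# Hypothetical scores for two query terms over three time points.
scores = {
    "earthquake": {1: 0.1, 2: 0.9, 3: 0.4},
    "japan": {1: 0.2, 2: 0.5, 3: 0.7},
}
centroids = top_k_time_points(scores, k=2)
```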
STORYLINE GENERATION
1. Representative tweets
2. Depict the evolving structure of the event
3. An optimistic connection
 A multi-view tweet graph is constructed
 A minimum dominating set on the tweet graph
 A minimum Steiner tree
STORYLINE GENERATION

 Three non-negative real parameters α, τ1, τ2, with τ1 < τ2
 Define edges E: text similarity > α
 Define arcs A: τ1 ≤ t_j − t_i ≤ τ2
 Vertex weight: w(v_i) = 1 − score(Q, v_i)
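A minimal sketch of building such a graph from the three definitions above. The slide only says "text similarity", so Jaccard similarity over token sets is an assumption, as are the `Tweet` record, the function names, and the premise that retrieval scores lie in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    tokens: frozenset
    time: float
    score: float  # retrieval score(Q, v); assumed to lie in [0, 1]


def jaccard(a, b):
    """Jaccard similarity of two token sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(tweets, alpha, tau1, tau2):
    """Return vertex weights, similarity edges E, and time-ordered arcs A."""
    n = len(tweets)
    weights = [1.0 - t.score for t in tweets]
    edges, arcs = set(), set()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Undirected similarity edge, stored once as (i, j) with i < j.
            if i < j and jaccard(tweets[i].tokens, tweets[j].tokens) > alpha:
                edges.add((i, j))
            # Directed arc i -> j when tweet j follows tweet i within [tau1, tau2].
            if tau1 <= tweets[j].time - tweets[i].time <= tau2:
                arcs.add((i, j))
    return weights, edges, arcs


# Hypothetical tweets: (tokens, posting time, retrieval score).
tweets = [
    Tweet(frozenset({"quake", "hits", "japan"}), 0.0, 0.9),
    Tweet(frozenset({"quake", "japan", "now"}), 1.0, 0.8),
    Tweet(frozenset({"tsunami", "warning"}), 5.0, 0.6),
]
weights, edges, arcs = build_graph(tweets, alpha=0.3, tau1=0.5, tau2=4.5)
```

The quadratic pairwise loop is only for readability; a real tweet collection would need indexing or blocking to avoid comparing every pair.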
STORYLINE GENERATION
 A subset S of the vertex set of an undirected graph is a
dominating set if each vertex u is either in
S or adjacent to a vertex in S.
STORYLINE GENERATION
 greedy algorithm
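The slide names only a greedy algorithm, so the sketch below uses the standard cost-effectiveness greedy for weighted dominating set as an assumption: repeatedly pick the vertex with the smallest weight per newly dominated vertex. The function name and the index-based tie-breaking are illustrative.

```python
def greedy_dominating_set(weights, edges):
    """Greedy approximation of a minimum-weight dominating set.

    weights: list of vertex weights; edges: iterable of undirected (u, v) pairs.
    """
    n = len(weights)
    # Closed neighborhood of each vertex: the vertex itself plus its neighbors.
    closed = [{v} for v in range(n)]
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)
    undominated = set(range(n))
    chosen = []
    while undominated:
        best = min(
            (v for v in range(n) if closed[v] & undominated),
            key=lambda v: (weights[v] / len(closed[v] & undominated), v),
        )
        chosen.append(best)
        undominated -= closed[best]
    return chosen


# Hypothetical path graph 0-1-2-3-4 with weights w(v) = 1 - score(Q, v).
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
representatives = greedy_dominating_set([0.5, 0.2, 0.6, 0.3, 0.9], path_edges)
```

Because low weight means high retrieval score, the greedy choice favors tweets that are both relevant and similar to many others, which matches the goal of selecting representative tweets.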
STORYLINE GENERATION
 A Steiner tree of a graph G with respect to a
vertex subset S is the minimum-cost edge-induced subtree of G
that contains all the vertices of S, where the cost is
the total weight of its vertices.
EXPERIMENTS
 Data Set
EXPERIMENTS
 Tweet Retrieval
 49 queries
 Evaluation metrics:
 precision at top 30 tweets (P@30)
 mean average precision (MAP)
 precision at top 100 tweets (P@100)
 R-precision (R-PREC)
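For reference, the four metrics follow their standard information-retrieval definitions, sketched below. The function names and the toy ranking are illustrative and not taken from the paper's evaluation code.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant (P@k)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


def mean_average_precision(runs):
    """MAP over (ranked list, relevant set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example: five retrieved tweets, three relevant ones (one never retrieved).
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
p_at_2 = precision_at_k(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
r_prec = r_precision(ranked, relevant)
```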
EXPERIMENTS
 Comparative Study
EXPERIMENTS
 Parameter Tuning
EXPERIMENTS
 Summarization Capability
EXPERIMENTS
 Parameter Tuning
EXPERIMENTS
 A User Study
CONCLUSION
 The proposed dynamic pseudo relevance feedback model
 Minimum-weight Steiner tree on a dominating set
 Extensive experiments
OMG, I Have to Tweet That!
A Study of Factors that Influence Tweet
Rates
Abstract
 Key limitation:
 social media analysis depends on people self-reporting their own
behaviors and observations.
 A large-scale quantitative analysis of some of
the factors that influence self-reporting bias.
 The daily variations in tweet rates about weather
events
Introduction
 Treating social media as a signal to measure the
relative real-world occurrence of events
 Critical challenge:
 the bias introduced by the self-reported nature of
social media
 What is it about an event that makes it more or
less “tweetable”?
 A first large-scale, quantitative analysis of some
of the factors that influence self-reporting bias by
comparing a year of tweets about weather
events in cities across the United States and
Canada to ground-truth knowledge about actual
weather occurrences.
Introduction
 Three potential factors:
1. How extreme is the weather?
2. How expected is the weather given the time of year?
3. How much did the weather change?
Data Preparation
 Between Jun 1, 2010 and Jun 30, 2011
 56 different metropolitan areas
 Historical weather data provided by the National
Oceanic and Atmospheric Administration of the
United States.
Identifying Weather-related Tweets
 Discovering the rate of weather-related tweets
that occurred per day across metropolitan areas:
1. Filter the full archive of tweets for tweets that
contain at least one weather-related word from a
list of 179 weather-related words and phrases
2. Build a classifier for weather-related tweets
 A simple classifier that estimates the probability
of a tweet being weather-related
Identifying the Location of Tweets
 Geo-coded
 The textual, user-provided location field in a user’s
Twitter profile
 Normalize the textual, arbitrary user-provided
location information into concrete
geo-coded coordinates:
1. A mapping from user-provided location fields to
latitude-longitude coordinates.
2. Merge location fields with similar geo-mappings
together to create clusters for roughly metropolitan-sized areas.
Historical Weather Data
 Calculate daily summaries
 For each daily summary of weather data at a
location:
 Expectation: how normal the observed weather
is at a location
 Extremeness: how extreme the weather is on a
particular day
 Change: how different the observed weather data
is from previous days’ weather
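The slide gives these three measures only in words, so the sketch below is one possible reading and not the paper's definitions. Every formula in it is an assumption: expectation as a z-score against historical values from the same time of year, extremeness as the distance from the median rank in the location's record, and change as the difference from the mean of the preceding days.

```python
from statistics import mean, pstdev


def expectation(value, same_season_history):
    """Assumed: z-score against past values from the same time of year (0 = typical)."""
    sd = pstdev(same_season_history)
    return (value - mean(same_season_history)) / sd if sd else 0.0


def extremeness(value, all_history):
    """Assumed: distance from the median rank in the record (0 = median, up to 1 = most extreme)."""
    at_or_below = sum(1 for x in all_history if x <= value) / len(all_history)
    return abs(at_or_below - 0.5) * 2


def change(value, previous_days):
    """Assumed: difference between today's value and the mean of the previous days."""
    return value - mean(previous_days)


# Hypothetical daily maximum temperatures in degrees Celsius.
exp_score = expectation(30, [20, 22, 24])
ext_score = extremeness(30, [0, 10, 20, 30, 40])
delta = change(30, [20, 22, 24])
```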
Analysis and Results
 Tweet Rates and Weather Reports
Analysis and Results
 Linear Regression
 The relationship between a set of weather-derived
features and the daily rate of weather-related
tweets
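A minimal sketch of the regression idea, reduced to a single feature for brevity: fit a least-squares line and report R², the share of tweet-rate variation the feature explains. The study uses a set of features; the one-feature restriction, the function names, and the numbers below are illustrative assumptions, not data from the paper.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y, slope, intercept):
    """Fraction of the variance in y explained by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up values: one weather-derived feature and a daily tweet rate.
feature = [0.0, 0.2, 0.4, 0.6, 0.8]
tweet_rate = [1.0, 1.5, 2.0, 2.5, 3.0]
slope, intercept = fit_line(feature, tweet_rate)
r2 = r_squared(feature, tweet_rate, slope, intercept)
```

Comparing R² across models built on different feature sets is what allows statements such as one feature explaining more of the variation than another.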
Analysis and Results
 Correlating Basic Weather Data and Tweet Rates
Analysis and Results
 Correlating Expectation and Tweet Rates
 the expectation measure adds little information about
likely tweet rates beyond what is already
contained in basic weather data
 Correlating Extremeness and Tweet Rates
 extremeness can independently explain more of
the variation in weather-related tweet rates than
basic weather alone
 Correlating Delta Change and Tweet Rates
 there is little difference in the amount of
information gained from building these delta-change models
 Combining Extremeness, Expectation, and
Delta Change Models
Analysis and Results
 Per-Location Models
Discussion
 Additional Factors Likely to Affect Tweet Rates
 Sentiment
 Privacy concerns, embarrassment, and safety
 Population segments
 Mobile devices
 Time of day, day of week, holidays, and other
effects of time
Conclusions
 The correlation between daily tweet
rates and the expectation, extremeness, and
change in observed weather.
 Global models
 Location-specific models
 Extremeness > change > expectation

More Related Content

Viewers also liked

Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities inmoresmile
 
Questions about questions
Questions about questionsQuestions about questions
Questions about questionsmoresmile
 
Topical keyphrase extraction from twitter
Topical keyphrase extraction from twitterTopical keyphrase extraction from twitter
Topical keyphrase extraction from twittermoresmile
 
Презентация проекта MegaStrahovka.ru
Презентация проекта MegaStrahovka.ru Презентация проекта MegaStrahovka.ru
Презентация проекта MegaStrahovka.ru Denis Aristov
 
Презентация CarScan24.ru
Презентация CarScan24.ruПрезентация CarScan24.ru
Презентация CarScan24.ruDenis Aristov
 
When relevance is not enough
When relevance is not enoughWhen relevance is not enough
When relevance is not enoughmoresmile
 
Modul pengukuran. aliran fluida.
Modul   pengukuran. aliran fluida.Modul   pengukuran. aliran fluida.
Modul pengukuran. aliran fluida.bacukids
 
Exploring social influence via posterior effect of word of-mouth
Exploring social influence via posterior effect of word of-mouthExploring social influence via posterior effect of word of-mouth
Exploring social influence via posterior effect of word of-mouthmoresmile
 
Accounting principles
Accounting principlesAccounting principles
Accounting principlescantaboop
 
Doppler effect experiment and applications
Doppler effect experiment and applicationsDoppler effect experiment and applications
Doppler effect experiment and applicationsmarina fayez
 

Viewers also liked (13)

Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities in
 
Doppler
DopplerDoppler
Doppler
 
HSI_Intro_Short
HSI_Intro_ShortHSI_Intro_Short
HSI_Intro_Short
 
Questions about questions
Questions about questionsQuestions about questions
Questions about questions
 
Topical keyphrase extraction from twitter
Topical keyphrase extraction from twitterTopical keyphrase extraction from twitter
Topical keyphrase extraction from twitter
 
Презентация проекта MegaStrahovka.ru
Презентация проекта MegaStrahovka.ru Презентация проекта MegaStrahovka.ru
Презентация проекта MegaStrahovka.ru
 
Презентация CarScan24.ru
Презентация CarScan24.ruПрезентация CarScan24.ru
Презентация CarScan24.ru
 
Presentation2
Presentation2Presentation2
Presentation2
 
When relevance is not enough
When relevance is not enoughWhen relevance is not enough
When relevance is not enough
 
Modul pengukuran. aliran fluida.
Modul   pengukuran. aliran fluida.Modul   pengukuran. aliran fluida.
Modul pengukuran. aliran fluida.
 
Exploring social influence via posterior effect of word of-mouth
Exploring social influence via posterior effect of word of-mouthExploring social influence via posterior effect of word of-mouth
Exploring social influence via posterior effect of word of-mouth
 
Accounting principles
Accounting principlesAccounting principles
Accounting principles
 
Doppler effect experiment and applications
Doppler effect experiment and applicationsDoppler effect experiment and applications
Doppler effect experiment and applications
 

Similar to Generating event storylines from microblogs

(8) Lesson 9.2
(8) Lesson 9.2(8) Lesson 9.2
(8) Lesson 9.2wzuri
 
IEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao LinIEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao LinMinchao Lin
 
Question 1 The Sydney Harbour Bridge is a well known ic.docx
Question 1  The Sydney Harbour Bridge is a well known ic.docxQuestion 1  The Sydney Harbour Bridge is a well known ic.docx
Question 1 The Sydney Harbour Bridge is a well known ic.docxamrit47
 
Design of State Estimator for a Class of Generalized Chaotic Systems
Design of State Estimator for a Class of Generalized Chaotic SystemsDesign of State Estimator for a Class of Generalized Chaotic Systems
Design of State Estimator for a Class of Generalized Chaotic Systemsijtsrd
 
Big Data Framework for Predictive Risk Assessment of Weather Impacts on Elect...
Big Data Framework for Predictive Risk Assessment of Weather Impacts on Elect...Big Data Framework for Predictive Risk Assessment of Weather Impacts on Elect...
Big Data Framework for Predictive Risk Assessment of Weather Impacts on Elect...Power System Operation
 
Jgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging componentJgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging componentNiccolò Tubini
 
Cyclone storm prediction using knn
Cyclone storm prediction using knnCyclone storm prediction using knn
Cyclone storm prediction using knnpriya veeramani
 
V.8.0-Emerging Frontiers and Future Directions for Predictive Analytics
V.8.0-Emerging Frontiers and Future Directions for Predictive AnalyticsV.8.0-Emerging Frontiers and Future Directions for Predictive Analytics
V.8.0-Emerging Frontiers and Future Directions for Predictive AnalyticsElinor Velasquez
 
Wavelet Multi-resolution Analysis of High Frequency FX Rates
Wavelet Multi-resolution Analysis of High Frequency FX RatesWavelet Multi-resolution Analysis of High Frequency FX Rates
Wavelet Multi-resolution Analysis of High Frequency FX RatesaiQUANT
 
Propagation of Error Bounds due to Active Subspace Reduction
Propagation of Error Bounds due to Active Subspace ReductionPropagation of Error Bounds due to Active Subspace Reduction
Propagation of Error Bounds due to Active Subspace ReductionMohammad
 
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...Subhajit Sahu
 
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...Subhajit Sahu
 
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...tksakaki
 
SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...
SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...
SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...Alfonso Crisci
 
Temporal Relations Mining Approach to Improve Dengue Outbreak and Intrusion T...
Temporal Relations Mining Approach to Improve Dengue Outbreak and Intrusion T...Temporal Relations Mining Approach to Improve Dengue Outbreak and Intrusion T...
Temporal Relations Mining Approach to Improve Dengue Outbreak and Intrusion T...Nurfadhlina Mohd Sharef
 

Similar to Generating event storylines from microblogs (20)

Answers
AnswersAnswers
Answers
 
Climate Extremes Workshop - The Dependence Between Extreme Precipitation and...
Climate Extremes Workshop -  The Dependence Between Extreme Precipitation and...Climate Extremes Workshop -  The Dependence Between Extreme Precipitation and...
Climate Extremes Workshop - The Dependence Between Extreme Precipitation and...
 
(8) Lesson 9.2
(8) Lesson 9.2(8) Lesson 9.2
(8) Lesson 9.2
 
IEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao LinIEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao Lin
 
Ax4301259274
Ax4301259274Ax4301259274
Ax4301259274
 
Question 1 The Sydney Harbour Bridge is a well known ic.docx
Question 1  The Sydney Harbour Bridge is a well known ic.docxQuestion 1  The Sydney Harbour Bridge is a well known ic.docx
Question 1 The Sydney Harbour Bridge is a well known ic.docx
 
Design of State Estimator for a Class of Generalized Chaotic Systems
Design of State Estimator for a Class of Generalized Chaotic SystemsDesign of State Estimator for a Class of Generalized Chaotic Systems
Design of State Estimator for a Class of Generalized Chaotic Systems
 
Big Data Framework for Predictive Risk Assessment of Weather Impacts on Elect...
Big Data Framework for Predictive Risk Assessment of Weather Impacts on Elect...Big Data Framework for Predictive Risk Assessment of Weather Impacts on Elect...
Big Data Framework for Predictive Risk Assessment of Weather Impacts on Elect...
 
Pakdd
PakddPakdd
Pakdd
 
Jgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging componentJgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging component
 
Undergraduate Modeling Workshop - Southeastern US Rainfall Working Group Fina...
Undergraduate Modeling Workshop - Southeastern US Rainfall Working Group Fina...Undergraduate Modeling Workshop - Southeastern US Rainfall Working Group Fina...
Undergraduate Modeling Workshop - Southeastern US Rainfall Working Group Fina...
 
Cyclone storm prediction using knn
Cyclone storm prediction using knnCyclone storm prediction using knn
Cyclone storm prediction using knn
 
V.8.0-Emerging Frontiers and Future Directions for Predictive Analytics
V.8.0-Emerging Frontiers and Future Directions for Predictive AnalyticsV.8.0-Emerging Frontiers and Future Directions for Predictive Analytics
V.8.0-Emerging Frontiers and Future Directions for Predictive Analytics
 
Wavelet Multi-resolution Analysis of High Frequency FX Rates
Wavelet Multi-resolution Analysis of High Frequency FX RatesWavelet Multi-resolution Analysis of High Frequency FX Rates
Wavelet Multi-resolution Analysis of High Frequency FX Rates
 
Propagation of Error Bounds due to Active Subspace Reduction
Propagation of Error Bounds due to Active Subspace ReductionPropagation of Error Bounds due to Active Subspace Reduction
Propagation of Error Bounds due to Active Subspace Reduction
 
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...
 
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti...
 
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
 
SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...
SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...
SEVERE WEATHER EVENTS AND SOCIAL MEDIA STREAMS: BIGDATA APPROACH FOR IMPACT M...
 
Temporal Relations Mining Approach to Improve Dengue Outbreak and Intrusion T...
Temporal Relations Mining Approach to Improve Dengue Outbreak and Intrusion T...Temporal Relations Mining Approach to Improve Dengue Outbreak and Intrusion T...
Temporal Relations Mining Approach to Improve Dengue Outbreak and Intrusion T...
 

Recently uploaded

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Generating event storylines from microblogs

  • 2. ABSTRACT  we explore the problem of generating storylines from microblogs for user input queries.  Given a query of an ongoing event, we propose to sketch the real-time storyline of the event by a two-level solution. 1. propose a language model with dynamic pseudo relevance feedback to obtain relevant tweets 2. Generate storylines via graph optimization
  • 3. INTRODUCTION  Generating Event Storyline from Microblogs (GESM)
  • 4. INTRODUCTION  differences between GESM and prior studies: Well edited facts ---- short noisy text 2. GESM provides personalized service 3. A two-level framework is necessary: at the low level, finding all relevant tweets through the time-line of the event by a retrieve model; and at the high level, summarizing relevant tweets and the latent structure to produce a storyline. 1.
  • 5. INTRODUCTION  Challenges 1、the dynamic and sparse nature of microblogs ——How to match the underlying event expressed by the vague event query to potential relevant tweets which possibly not contain any query terms 2、Numerous duplicate tweets and direct and undirect re-tweets
  • 6. INTRODUCTION  contributions generating event storylines from microblogs 2. A dynamic pseudo relevance feedback (DPRF) language model 3. a graph-based optimization problem and is solved by approximation algorithms of minimum-weight dominating set and directed Steiner tree 1.
  • 7. THE FRAMEWORK OVERVIEW  generated storyline should be a graph structure  Node is labeled by a summary  Edge represents causal relationship between two phases  Offline layer  Online layers
  • 8. THE RETRIEVAL MODEL: Preliminaries. The original query is usually short and vague. Query expansion: in pseudo relevance feedback, suppose the few top-ranked documents d+ retrieved by the initial query Q build a relevance model θF; the new query can then be set to a linear combination of the original query Q and the relevance model θF.
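The linear combination of Q and θF described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the unigram maximum-likelihood estimate, the whitespace tokenizer, the function names, and the interpolation weight `lam` are all assumptions made for the example.

```python
from collections import Counter

def unigram_lm(texts):
    """Maximum-likelihood unigram model over whitespace tokens (assumed estimator)."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def expand_query(query, feedback_docs, lam=0.5):
    """Return (1 - lam) * P(w | Q) + lam * P(w | theta_F) for every word seen."""
    q_lm = unigram_lm([query])
    f_lm = unigram_lm(feedback_docs)          # relevance model built from top-ranked docs
    vocab = set(q_lm) | set(f_lm)
    return {w: (1 - lam) * q_lm.get(w, 0.0) + lam * f_lm.get(w, 0.0) for w in vocab}

# Hypothetical query and feedback tweets, for illustration only.
expanded = expand_query(
    "egypt protest",
    ["egypt protest tahrir square", "tahrir square crowd"],
    lam=0.4,
)
```

Terms such as "tahrir" that never occur in the original query receive non-zero weight in the expanded query, which is what lets the retrieval step reach tweets containing no query terms.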
  • 9. THE RETRIEVAL MODEL: Dynamic Pseudo Relevance Feedback. K burst periods. Assume that the prior probability of a relevant document d+ depends on the distance of its timestamp td+ to the centroids of the burst periods, denoted Φ = { φ1, ..., φK }. Three probability functions model the effective range of a burst period, the decay coefficient, and the skewness: 1. Mixture Gaussian Distribution 2. Local Power Distribution 3. Skewed Linear Distribution
  • 10. THE RETRIEVAL MODEL: Mixture Gaussian Distribution. Local Power Distribution. Skewed Linear Distribution.
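The formulas for the three distributions are not preserved in this transcript. As a rough sketch of the first one only, a temporal prior can be written as a standard mixture of Gaussians centred on the burst centroids φ1..φK. The equal mixture weights and the single shared `sigma` are assumptions for the example and may differ from the parameterization in the paper.

```python
import math

def gaussian_mixture_prior(t, centroids, sigma=1.0):
    """Prior weight of a document timestamped t, given burst centroids phi_1..phi_K."""
    if not centroids:
        raise ValueError("need at least one burst centroid")
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    densities = [
        norm * math.exp(-((t - phi) ** 2) / (2.0 * sigma ** 2))
        for phi in centroids
    ]
    # Equal mixture weights are an assumption made for this sketch.
    return sum(densities) / len(centroids)
```

Under this prior, a feedback document posted close to a burst centroid contributes more to the relevance model than one posted far from every burst.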
  • 11. THE RETRIEVAL MODEL: Burst Period Detection. At a burst, query terms should: 1. appear more frequently than usual; 2. be continuously frequent around the time point. Detect the burst periods of the event by: 1. for each query term, finding the time intervals of arbitrary length in which the query term appears constantly frequent; 2. picking the time points within these intervals with the largest sum of frequencies over all query terms.
  • 12. THE RETRIEVAL MODEL: “Bursty score”. Find the time interval Tw,j = <st, et, LS, RS> with the maximal cumulative burst score B(w, Tw,j). Compute the score H(q, t) of every query term q at each time point t. Rank the time points by ∑q∈Q H(q, t) and choose the K highest-ranked time points φk.
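The two-step detection above can be sketched as follows. The slides do not give the definition of the burst score, so this example assumes a per-point score of "frequency minus the term's mean frequency", finds the maximal-sum interval with Kadane's algorithm, and takes H(q, t) to be the raw frequency inside that interval and zero outside it. All function names are hypothetical.

```python
def max_sum_interval(scores):
    """Kadane's algorithm: (start, end, total) of the contiguous run with the largest sum."""
    best = (0, 0, scores[0])
    cur_start, cur_sum = 0, 0.0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            cur_start, cur_sum = i, s      # restart the run at i
        else:
            cur_sum += s                   # extend the current run
        if cur_sum > best[2]:
            best = (cur_start, i, cur_sum)
    return best

def top_k_burst_points(freqs_by_term, k):
    """freqs_by_term maps each query term to its per-time-point frequency list."""
    n = len(next(iter(freqs_by_term.values())))
    totals = [0.0] * n
    for freqs in freqs_by_term.values():
        mean = sum(freqs) / n
        scores = [f - mean for f in freqs]         # assumed per-point burst score
        start, end, _ = max_sum_interval(scores)
        for t in range(start, end + 1):            # H(q, t) is zero outside the interval
            totals[t] += freqs[t]
    ranked = sorted(range(n), key=lambda t: totals[t], reverse=True)
    return sorted(ranked[:k])

# Hypothetical hourly counts for two query terms.
freqs = {"quake": [1, 1, 8, 9, 1, 1], "japan": [2, 2, 7, 3, 2, 2]}
centroids = top_k_burst_points(freqs, k=2)
```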
  • 13. STORYLINE GENERATION: 1. Representative tweets. 2. Depict the evolving structure of the event. 3. An optimistic connection. A multi-view tweet graph is constructed. A minimum dominating set on the tweet graph. A minimum Steiner tree.
  • 14. STORYLINE GENERATION: Three non-negative real parameters α, τ1, τ2, with τ1 < τ2. Define E: text similarity > α. Define A: τ1 ≤ tj − ti ≤ τ2. Vertex weight: w(vi) = 1 − score(Q, vi).
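A minimal sketch of building the tweet graph from these definitions follows. The slides do not say which text similarity is used, so token-set Jaccard similarity stands in for it here. Treating E as undirected similarity edges and A as directed temporal arcs is also an interpretation of the slide, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity, standing in for the unspecified text similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def build_tweet_graph(tweets, relevance, alpha, tau1, tau2):
    """tweets: list of (text, timestamp); relevance: list of score(Q, v_i) values."""
    n = len(tweets)
    weights = [1.0 - r for r in relevance]            # w(v_i) = 1 - score(Q, v_i)
    edges, arcs = set(), set()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            (text_i, t_i), (text_j, t_j) = tweets[i], tweets[j]
            if i < j and jaccard(text_i, text_j) > alpha:
                edges.add((i, j))                     # similarity edge, stored once
            if tau1 <= t_j - t_i <= tau2:
                arcs.add((i, j))                      # temporal arc from v_i to v_j
    return weights, edges, arcs

# Hypothetical tweets with timestamps in hours.
tweets = [
    ("quake hits tokyo", 0),
    ("quake hits tokyo again", 1),
    ("tsunami warning issued", 5),
]
weights, edges, arcs = build_tweet_graph(tweets, [0.9, 0.8, 0.5], alpha=0.5, tau1=1, tau2=4)
```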
  • 15. STORYLINE GENERATION: A subset S of the vertex set of an undirected graph is a dominating set if each vertex u is either in S or adjacent to a vertex in S.
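To make the definition concrete, here is a sketch of a common greedy heuristic for the minimum-weight dominating set: repeatedly pick the vertex with the lowest weight per newly dominated vertex. The slides only say that an approximation algorithm is used, so this particular greedy rule is an assumption and not necessarily the one in the paper.

```python
def greedy_dominating_set(weights, edges):
    """Greedy heuristic for a low-weight dominating set on an undirected graph."""
    n = len(weights)
    closed = {v: {v} for v in range(n)}               # closed neighbourhoods N[v]
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)
    undominated, chosen = set(range(n)), []
    while undominated:
        def cost(v):
            gain = len(closed[v] & undominated)       # vertices v would newly dominate
            return weights[v] / gain if gain else float("inf")
        best = min(range(n), key=cost)
        chosen.append(best)
        undominated -= closed[best]
    return chosen

def is_dominating(chosen, n, edges):
    """Check the definition: every vertex is chosen or adjacent to a chosen vertex."""
    covered = set(chosen)
    for u, v in edges:
        if u in chosen:
            covered.add(v)
        if v in chosen:
            covered.add(u)
    return covered == set(range(n))
```

With the vertex weights w(vi) = 1 − score(Q, vi), low-weight vertices are the tweets most relevant to the query, so the selected set favours relevant tweets that also cover many similar ones.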
  • 17. STORYLINE GENERATION: A Steiner tree of a graph G with respect to a vertex subset S is the edge-induced subtree of G that contains all the vertices of S and has the minimum total cost, where the cost is the total weight of the vertices.
  • 21. EXPERIMENTS: Tweet Retrieval. 49 queries. Evaluation metrics: precision at top 30 tweets (P@30), mean average precision (MAP), precision at top 100 tweets (P@100), R-precision (R-PREC).
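For reference, the four metrics can be computed as below using their standard information-retrieval definitions. The function names and the toy ranking are made up for the example; `ranked` is a list of document IDs in rank order and `relevant` is the set of judged-relevant IDs.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant (P@30 uses k=30)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
```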
  • 27. CONCLUSION: The proposed dynamic pseudo relevance feedback model. Minimum-weight Steiner tree on a dominating set. Extensive experiments.
  • 28. OMG, I Have to Tweet That! A Study of Factors that Influence Tweet Rates
  • 29. Abstract: Key limitation of using social media as a signal: it depends on people self-reporting their own behaviors and observations. A large-scale quantitative analysis of some of the factors that influence self-reporting bias. The daily variations in tweet rates about weather events.
  • 30. Introduction: Treating social media as a signal to measure the relative real-world occurrence of events. Critical challenge: the bias introduced by the self-reported nature of social media. What is it about an event that makes it more or less “tweetable”? A first large-scale, quantitative analysis of some of the factors that influence self-reporting bias, comparing a year of tweets about weather events in cities across the United States and Canada to ground-truth knowledge about actual weather occurrences.
  • 31. Introduction: Three potential factors: 1. How extreme is the weather? 2. How expected is the weather given the time of year? 3. How much did the weather change?
  • 32. Data Preparation: Jun 1, 2010 to Jun 30, 2011. 56 different metropolitan areas. Historical weather data provided by the National Oceanic and Atmospheric Administration of the United States.
  • 33. Identifying Weather-related Tweets: Discovering the rate of weather-related tweets that occurred per day across metropolitan areas: 1. Filter the full archive of tweets for tweets that contain at least one weather-related word from a list of 179 weather-related words and phrases. 2. Build a classifier for weather-related tweets.
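The filtering step (step 1) can be sketched as a word-boundary keyword match followed by a per-day tally. The three terms below are a hypothetical stand-in for the 179-term list, which is not reproduced in the slides. Defining the daily rate as the share of that day's tweets passing the filter is also an assumption.

```python
import re
from collections import Counter

WEATHER_TERMS = ["rain", "snow", "heat wave"]   # hypothetical stand-in for the 179-term list

# Word boundaries keep "rain" from matching inside "training".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in WEATHER_TERMS) + r")\b",
    re.IGNORECASE,
)

def is_candidate(text):
    """True if the tweet contains at least one weather-related word or phrase."""
    return _PATTERN.search(text) is not None

def daily_candidate_rate(tweets):
    """tweets: iterable of (day, text). Returns {day: share of tweets passing the filter}."""
    totals, hits = Counter(), Counter()
    for day, text in tweets:
        totals[day] += 1
        if is_candidate(text):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in totals}

sample = [("d1", "Heavy rain today"), ("d1", "Training day"), ("d2", "lunch")]
rates = daily_candidate_rate(sample)
```

The candidates that pass this filter would then go to the classifier in step 2, which the example does not attempt to reproduce.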
  • 34. A simple classifier that estimates the probability of a tweet being weather-related as
  • 35. Identifying the Location of Tweets: Geo-coded. The textual user-provided location field in a user’s Twitter profile. Normalize the textual, arbitrary user-provided location information into concrete geo-coded coordinates: 1. Build a mapping from user-provided location fields to latitude-longitude coordinates. 2. Merge location fields with similar geo-mappings together to create clusters for roughly metropolitan-sized areas.
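One simple way to merge location fields with similar geo-mappings (step 2) is greedy clustering by great-circle distance. The slides do not describe the actual merging procedure, so the greedy seeding, the 50 km radius, and the function names here are assumptions chosen only to illustrate the idea.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def cluster_location_fields(field_coords, radius_km=50.0):
    """field_coords: {location field text: (lat, lon)}. Greedy clustering around seeds."""
    clusters = []                                   # list of (seed coordinate, [fields])
    for field, coord in field_coords.items():
        for seed, members in clusters:
            if haversine_km(seed, coord) <= radius_km:
                members.append(field)               # close enough to an existing seed
                break
        else:
            clusters.append((coord, [field]))       # start a new metropolitan cluster
    return [members for _, members in clusters]

# Hypothetical mapping from profile location strings to coordinates.
fields = {
    "NYC": (40.71, -74.01),
    "New York, NY": (40.73, -73.99),
    "Seattle": (47.61, -122.33),
}
clusters = cluster_location_fields(fields)
```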
  • 37. Historical Weather Data: Calculate daily summaries. For each daily summary of weather data at a location: Expectation: how normal the observed weather is at a location. Extremeness: how extreme the weather is on a particular day. Change: how different the observed weather data is from previous days’ weather.
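The slides describe the three measures only in words. The sketch below shows one plausible way to turn each description into a number for a single weather variable such as daily temperature. The specific formulas (absolute z-score, tail position in the historical record, difference from the recent mean) and the function names are assumptions, not the definitions used in the study.

```python
import statistics

def expectation_deviation(observed, same_day_history):
    """Absolute z-score against past readings for this time of year (0 = fully expected)."""
    mean = statistics.mean(same_day_history)
    stdev = statistics.pstdev(same_day_history)
    return abs(observed - mean) / stdev if stdev else 0.0

def extremeness(observed, all_history):
    """How far into the tails of the location's record the reading falls, in [0, 1]."""
    below = sum(1 for x in all_history if x < observed) / len(all_history)
    return abs(below - 0.5) * 2

def change(observed, previous_days):
    """Difference between today's reading and the mean of the preceding days."""
    return observed - statistics.mean(previous_days)
```

Each function returns one feature per location and day, which is the shape of input a regression on daily tweet rates needs.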
  • 38. Analysis and Results: Tweet Rates and Weather Reports
  • 39. Analysis and Results: Linear Regression. The relationship between a set of weather-derived features and the daily rate of weather-related tweets.
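As a simplified illustration of this kind of analysis, the sketch below fits ordinary least squares with a single feature and reports R², the share of variation in tweet rate that the feature explains. The study uses a set of features, so the single-feature case and the toy numbers are simplifications for the example.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Toy data: a weather-derived feature against a daily tweet rate.
feature = [0, 1, 2, 3]
tweet_rate = [1, 3, 5, 7]
slope, intercept = fit_line(feature, tweet_rate)
fit_quality = r_squared(feature, tweet_rate, slope, intercept)
```

Comparing R² across models built from different feature sets is one way to ask which factor explains more of the variation in tweet rates.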
  • 40. Analysis and Results: Correlating Basic Weather Data and Tweet Rates
  • 41. Analysis and Results: Correlating Expectation and Tweet Rates: the expectation measure adds little information about likely tweet rates beyond what is already contained in basic weather data. Correlating Extremeness and Tweet Rates: extremeness can independently explain more of the variation in weather-related tweet rates than basic weather alone. Correlating Delta Change and Tweet Rates: there is little difference in the amount of information gained from building these delta-change models. Combining Extremeness, Expectation, and Delta Change Models.
  • 42. Analysis and Results: Per-Location Models
  • 44. Discussion: Additional Factors Likely to Affect Tweet Rates: sentiment; privacy concerns, embarrassment, and safety; population segments; mobile devices; time of day, day of week, holidays, and other effects of time.
  • 45. Conclusions: The correlation between daily tweet rates and the expectation, extremeness, and change in observed weather. Global models. Location-specific models. Extremeness > change > expectation.