Event Summarization using Tweets
Deepayan Chakrabarti and Kunal Punera
Yahoo! Research
Abstract
• For some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets.
• Our solution learns the underlying hidden state representation of the event via Hidden Markov Models (HMMs).
Introduction
• One-shot events
• Events that have “structure” or are long-running:
  (a) the most recent tweets could be repeating the same information about the event
  (b) most users would be interested in a summary of the occurrences in the game so far
Introduction
• Our goal: to extract a few tweets that best describe the chain of interesting occurrences in an event
• A two-step process:
  1. Segment the event time-line
  2. Pick key tweets to describe each segment
Introduction
• Challenges:
  • Events are typically “bursty”
  • Separate sub-events may not be temporally far apart
  • Previous instances of similar events are available
  • Tweets are noisy
• Strong empirical results
Characteristics of Sports Coverage in Tweets
• Some issues with this data:
  1. Sub-events are marked by an increased frequency of tweets.
  2. Boundaries of sub-events also result in a change in the vocabulary of tweets.
Algorithms
• Baseline: SUMMALLTEXT
  • Associate with each tweet a vector of the TF-logIDF of its constituent words
  • Compare tweets using cosine distance
  • Select those tweets which are closest to all other tweets from the event
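
The SUMMALLTEXT baseline can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: it assumes that "TF-logIDF" means term frequency times log(N / document frequency), that tokens are whitespace-separated, and that "closest to all other tweets" means the smallest total cosine distance. All function names and sample tweets are hypothetical.

```python
import math
from collections import Counter

def tf_logidf_vectors(tweets):
    """Map each tweet to a sparse {word: tf * log(N / df)} vector (assumed weighting)."""
    docs = [Counter(t.lower().split()) for t in tweets]
    df = Counter(w for d in docs for w in d)  # number of tweets containing each word
    n = len(docs)
    return [{w: tf * math.log(n / df[w]) for w, tf in d.items()} for d in docs]

def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors; 1.0 if either vector is all zeros."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / norm if norm else 1.0

def summ_all_text(tweets, k=1):
    """Return the k tweets with the smallest total distance to all tweets."""
    vecs = tf_logidf_vectors(tweets)
    totals = [sum(cosine_distance(u, v) for v in vecs) for u in vecs]
    ranked = sorted(range(len(tweets)), key=totals.__getitem__)
    return [tweets[i] for i in ranked[:k]]

# Hypothetical sample data
sample_tweets = [
    "touchdown by the jets",
    "jets touchdown what a play",
    "huge touchdown for the jets",
    "halftime show was boring",
]
summary = summ_all_text(sample_tweets, k=1)
```

Because every tweet is compared with every other tweet, the cost grows quadratically with the number of tweets.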
Algorithms
• Several defects:
  1. O(|Z|^2) computations
  2. Heavily biased towards the most popular sub-event
Algorithms
• Baseline: SUMMTIMEINT
  1. Split up the duration into equal-sized time intervals
  2. Select the key tweets from each interval
• Two extra parameters:
  1. A segmentation TS of the duration of the event into equal-time windows
  2. The minimum activity threshold l
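
A rough sketch of the SUMMTIMEINT baseline, under stated assumptions: the threshold l is interpreted here as a minimum fraction of the event's tweets that a window must contain, and the key tweet of a window is chosen by a simple word-overlap (Jaccard) centrality instead of the TF-logIDF selection. The names `summ_time_int` and `most_central`, and the sample data, are hypothetical.

```python
def most_central(texts):
    """Pick the text with the highest total Jaccard word overlap (stand-in selector)."""
    bags = [set(t.lower().split()) for t in texts]
    def score(i):
        return sum(len(bags[i] & b) / (len(bags[i] | b) or 1) for b in bags)
    return texts[max(range(len(texts)), key=score)]

def summ_time_int(tweets, num_windows, min_activity, pick_key=most_central):
    """tweets: list of (timestamp, text). Returns one key tweet per active window."""
    start = min(t for t, _ in tweets)
    end = max(t for t, _ in tweets)
    width = (end - start) / num_windows or 1.0  # avoid a zero width for instantaneous events
    windows = [[] for _ in range(num_windows)]
    for t, text in tweets:
        idx = min(int((t - start) / width), num_windows - 1)
        windows[idx].append(text)
    summary = []
    for texts in windows:
        # Skip windows whose share of tweets is below the activity threshold.
        if texts and len(texts) >= min_activity * len(tweets):
            summary.append(pick_key(texts))
    return summary

# Hypothetical sample data: (minute, text)
timeline = [
    (0, "kickoff time"), (1, "kickoff here we go"), (2, "kickoff"),
    (50, "quiet"),
    (98, "touchdown jets"), (99, "jets touchdown wow"), (100, "touchdown"),
]
interval_summary = summ_time_int(timeline, num_windows=3, min_activity=0.2)
```

In this example the middle window holds a single tweet, falls below the threshold, and contributes nothing to the summary.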
Algorithms
• Defects:
  • Burstiness of tweet volume
  • Multiple sub-events in the same burst
  • “Cold start”
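Algorithms
• Our approach: SUMMHMM
• Background on HMMs:
  • N states labeled S_1, …, S_N
  • A set of observation symbols v_1, …, v_M
  • b_i(k): the probability of observing symbol v_k in state S_i
  • a_ij: the probability of transitioning from state S_i to state S_j
  • π_i: the probability that the HMM starts in state S_i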
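
To make the notation concrete, the sketch below computes the joint log-probability of a state sequence and an observation sequence for a standard discrete HMM with one symbol per time step. The parameter values are invented for illustration, and states and symbols are represented by zero-based indices.

```python
import math

def joint_log_prob(states, observations, pi, a, b):
    """log P(states, observations) = log pi + sum of log transitions and log emissions."""
    logp = math.log(pi[states[0]]) + math.log(b[states[0]][observations[0]])
    for t in range(1, len(states)):
        logp += math.log(a[states[t - 1]][states[t]])   # a_ij
        logp += math.log(b[states[t]][observations[t]])  # b_j(k)
    return logp

# Hypothetical two-state, two-symbol model
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[0.5, 0.5], [0.1, 0.9]]
example_logp = joint_log_prob([0, 1], [0, 1], pi, a, b)  # log(0.6 * 0.5 * 0.3 * 0.9)
```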
Algorithms
• Each state: one class of sub-events
• The symbols: the words used in tweets
• The variation in symbol probabilities across different states: the different “language models” used by Twitter users
• The transitions between states model the chain of sub-events over time
Algorithms
• Our modifications
  • Outputs per time-step: a multiset of symbols
  • Detecting bursts in tweet volume
  • Combining information from multiple events
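
One way to score a multiset of symbols per time-step is sketched below. It assumes the words in a time-step are drawn independently from the state's symbol distribution (a multinomial), and it drops the multinomial coefficient because that term is the same for every state. The distributions and names are hypothetical.

```python
import math
from collections import Counter

def multiset_log_prob(words, theta):
    """Log-probability (up to a state-independent constant) of a bag of words under theta."""
    counts = Counter(words)
    return sum(c * math.log(theta[w]) for w, c in counts.items())

# Hypothetical language models for two states
theta_play = {"touchdown": 0.6, "jets": 0.3, "ad": 0.1}
theta_lull = {"touchdown": 0.1, "jets": 0.3, "ad": 0.6}

bag = ["touchdown", "touchdown", "jets"]  # all words tweeted in one time-step
play_score = multiset_log_prob(bag, theta_play)
lull_score = multiset_log_prob(bag, theta_lull)
```

Here the bag is far more likely under the "play" state than under the "lull" state, which is the kind of vocabulary change that marks a sub-event boundary.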
Algorithms
• Three sets of symbol probabilities:
  (1) θ(s), which is specific to each state but is the same for all events
  (2) θ(sg), which is specific to a particular state for a particular game
  (3) θ(bg), which is a background distribution of symbols over all states and games
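
The slides do not say how the three distributions are combined. One plausible reading, shown here purely as an assumption, is a fixed-weight mixture; the weights, distributions, and function name below are invented for illustration.

```python
def mixed_symbol_prob(word, theta_s, theta_sg, theta_bg, weights=(0.5, 0.3, 0.2)):
    """Probability of a word as a weighted mixture of the three distributions (assumed form)."""
    w_s, w_sg, w_bg = weights
    return (w_s * theta_s.get(word, 0.0)
            + w_sg * theta_sg.get(word, 0.0)
            + w_bg * theta_bg.get(word, 0.0))

# Hypothetical distributions
theta_s = {"touchdown": 0.8, "the": 0.2}                     # state-specific, shared across games
theta_sg = {"sanchez": 0.9, "touchdown": 0.1}                # state- and game-specific (e.g. player names)
theta_bg = {"the": 0.7, "touchdown": 0.1, "sanchez": 0.2}    # background

p_sanchez = mixed_symbol_prob("sanchez", theta_s, theta_sg, theta_bg)
p_total = sum(mixed_symbol_prob(w, theta_s, theta_sg, theta_bg)
              for w in ("touchdown", "the", "sanchez"))
```

The game-specific component lets words such as player names carry weight in one game without distorting the state model shared by all games.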
Algorithms
• Algorithm summary
  • Input: multiple events of the same type
  • Learn the model parameters that best fit the data (EM algorithm)
  • Find the optimal segmentation (standard Viterbi algorithm)
Algorithms
• Standard Viterbi algorithm
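
For reference, a minimal sketch of the textbook Viterbi algorithm for a discrete HMM with one symbol per time step, computed in log space. It does not include the modifications described in the slides (multiset outputs, burst detection, multiple events), it assumes all probabilities are strictly positive, and the model parameters are invented.

```python
import math

def viterbi(observations, pi, a, b):
    """Most likely state sequence for a discrete HMM (all probabilities must be > 0)."""
    n = len(pi)
    score = [math.log(pi[s]) + math.log(b[s][observations[0]]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev, step, score = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + math.log(a[r][s]))
            step.append(best)
            score.append(prev[best] + math.log(a[best][s]) + math.log(b[s][obs]))
        back.append(step)
    # Trace back from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return path[::-1]

# Hypothetical model: state 0 = lull, state 1 = play; symbol 0 = quiet words, 1 = play words
v_pi = [0.8, 0.2]
v_a = [[0.8, 0.2], [0.3, 0.7]]
v_b = [[0.9, 0.1], [0.2, 0.8]]
segmentation = viterbi([0, 0, 1, 1, 0], v_pi, v_a, v_b)
```

Consecutive time steps assigned to the same state form one segment of the event time-line.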
Experiments
• Experimental setup
  • Professional American football
  • Sep 12th, 2010 to Jan 24th, 2011
  • Over 440K tweets over 150 games, for an average of around 1760 tweets per game
Experiments
• Manual ground truth construction
  • Each output tweet was matched with the happenings in the game and labeled as Comment-Play, Comment-Game, or Comment-General
Experiments
• Play-by-play performance
  • Recall
  • Precision
• Summary construction
  • Evaluation at operating point
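
The slides name the metrics without defining them. The sketch below assumes that precision is the fraction of output tweets labeled Comment-Play and that recall is the fraction of ground-truth plays matched by at least one output tweet; both definitions, the function name, and the sample values are assumptions.

```python
def precision_recall(output_labels, matched_plays, num_true_plays):
    """output_labels: one label per output tweet.
    matched_plays: set of ground-truth play ids covered by some output tweet."""
    on_play = sum(1 for label in output_labels if label == "Comment-Play")
    precision = on_play / len(output_labels) if output_labels else 0.0
    recall = len(matched_plays) / num_true_plays if num_true_plays else 0.0
    return precision, recall

# Hypothetical labels for a four-tweet summary of a game with five plays
labels = ["Comment-Play", "Comment-Play", "Comment-Game", "Comment-General"]
pr = precision_recall(labels, matched_plays={1, 4}, num_true_plays=5)
```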
Conclusion
• We proposed an approach based on learning an underlying hidden state representation of an event.
Towards Twitter Context Summarization with User Influence Models
ABSTRACT
• Traditional summarization techniques only consider text information.
• We study how user influence models, which project user interaction information onto a Twitter context tree, can help Twitter context summarization within a supervised learning framework.
INTRODUCTION
• A Twitter context tree is defined as a tree structure of tweets that are connected by reply relationships; the root of a context tree is its original tweet.
• Two types of user influence models: the pair-wise user influence model and the global user influence model.
  • Granger Causality influence model
  • PageRank algorithm
TWITTER CONTEXT TREE ANALYSIS
• The temporal growth of the Twitter context tree
• Whether the tree structure can help the summarization task
USER INFLUENCE MODELS
• Granger Causality Influence Model
  • A time series x is said to Granger-cause another time series y if and only if regressing for y in terms of past values of both y and x is statistically significantly more accurate than regressing for y in terms of past values of y only.
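
The definition above can be illustrated with a single-lag test. This is a sketch under assumptions, not the method used in the slides: it fits both regressions by ordinary least squares and compares them with an F-statistic, uses a lag of one step, and runs on synthetic series. Function names are hypothetical.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit, via the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(k):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [v - f * p for v, p in zip(m[i], m[col])]
    beta = [m[i][k] / m[i][i] for i in range(k)]
    return sum((t - sum(c * v for c, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f(x, y):
    """F-statistic for 'x Granger-causes y' using one lag of each series."""
    targets = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]        # past of y only
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]     # past of y and x
    rss_r, rss_f = ols_rss(restricted, targets), ols_rss(full, targets)
    dof = len(targets) - 3  # observations minus parameters in the full model
    return (rss_r - rss_f) / (rss_f / dof)

# Synthetic data: y follows x with a one-step delay plus small noise
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(80)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.uniform(-1, 1) for t in range(1, 80)]
f_xy = granger_f(x, y)  # large: the past of x helps predict y
f_yx = granger_f(y, x)  # small: the past of y does not help predict x
```

A large statistic would then be compared against an F distribution to decide significance; that step is omitted here.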
USER INFLUENCE MODELS
• Lasso-Granger method
  • Lag(X, T) denotes the lagged version of data X;
  • FullyConnectedFeatureGraph(X) denotes the fully connected graph defined over the features;
  • Lasso(y, Xlag) denotes the set of temporal variables receiving a non-zero coefficient from the Lasso algorithm.
USER INFLUENCE MODELS
• PageRank Influence Model
  • Each user u has a directed edge to each user v if u has replied to or retweeted v’s tweet; these edges form a global user graph G.
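
A minimal sketch of PageRank by power iteration on such a user graph. The damping factor of 0.85, the fixed iteration count, the uniform handling of users with no outgoing edges, and the sample edges are all assumptions; the slides do not give these details.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """edges: list of (u, v) meaning user u replied to or retweeted user v."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - damping) / len(nodes) + damping * dangling / len(nodes)
               for n in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank

# Hypothetical interactions: three fans reply to a celebrity, who replies to one fan
user_edges = [("fan1", "star"), ("fan2", "star"), ("fan3", "star"), ("star", "fan1")]
influence = pagerank(user_edges)
```

Users who receive many replies and retweets, especially from other influential users, end up with the highest scores.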
SUMMARIZATION METHOD
• Text-based signals
  • TFIDF
SUMMARIZATION METHOD
• Popularity signals
  • Number of replies, number of retweets, and number of followers for a given tweet’s author
SUMMARIZATION METHOD
• Temporal signals
  1. Fit the ages of tweets in a context tree to an exponential distribution.
  2. For each tweet, compute its temporal signal as the likelihood of sampling its age from the fitted exponential distribution.
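
The two steps above might look like this in code. The sketch assumes a maximum-likelihood fit (rate = 1 / mean age) and that ages are positive numbers in some fixed unit; how age is measured is not stated in the slides, and the function name is hypothetical.

```python
import math

def temporal_signals(ages):
    """Fit an exponential distribution to the ages, then return each age's density."""
    rate = len(ages) / sum(ages)  # maximum-likelihood estimate: 1 / mean
    return [rate * math.exp(-rate * age) for age in ages]

# Hypothetical ages (e.g. hours since the root tweet)
signals = temporal_signals([1.0, 2.0, 3.0])
```

With this fit, tweets posted soon after the root receive a higher temporal signal than late replies.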
Supervised Learning Framework
• Gradient Boosted Decision Tree (GBDT) algorithm
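
To show the idea of gradient boosting on tweet features, here is a deliberately small sketch that boosts depth-one trees (decision stumps) with squared loss. It is a simplification of GBDT, not the configuration used in the slides; the two features, the labels, the learning rate, and all function names are hypothetical.

```python
def fit_stump(X, residuals):
    """Best single-feature threshold split of the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]

def stump_value(stump, row):
    j, thr, lm, rm = stump
    return lm if row[j] <= thr else rm

def fit_gbdt(X, y, rounds=10, lr=0.5):
    """Each round fits a stump to the residuals (the negative gradient of squared loss)."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_value(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(stump_value(s, row) for s in stumps)

# Hypothetical features per tweet: [text similarity to root, number of replies]
features = [[0.9, 3], [0.8, 0], [0.2, 5], [0.1, 1]]
in_summary = [1, 1, 0, 0]  # editor labels
model = fit_gbdt(features, in_summary)
high = predict(model, [0.85, 2])
low = predict(model, [0.15, 2])
```

Candidate tweets would then be ranked by predicted score and the top ones taken as the summary.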
EDITORIAL DATA SET
• 10 Twitter context trees from March 7th to March 20th, 2011
  • 4 are initiated by Lady Gaga
  • 6 are initiated by Justin Bieber
  1. Read the root tweet
  2. Scan through all candidate tweets
  3. Select 5 to 10 tweets
EXPERIMENTS
• Evaluation metrics
Methods for Comparison
• Centroid
• SimToRoot
• Linear
• Mead
• LexRank
• SVD
• ContentOnly
• ContentAttribute
• AllNoGranger
• All
Experimental Results
• Overall comparison
CONCLUSION
• User influence information is very helpful for generating a high-quality summary of each Twitter context tree.
• All signals are converted into features, and we cast Twitter context summarization as a supervised learning problem.
