Mining Twitter for real-time trend and information discovery
Yahoo! Research Barcelona
Arkaitz Zubiaga
NLP & IR Group @ UNED

December 19th, 2011
Index

1. Motivation
2. Our Work (I): Classification of Trending Topics
3. Our Work (II): Real-Time Summarization of Events
4. Outlook

Arkaitz Zubiaga (UNED)

Real-time mining of Twitter

December 19th, 2011

Motivation

Twitter

Twitter is a microblogging service with over 200 million users.
Users share short messages of up to 140 characters (tweets).

Twitter: following users

Unlike on Facebook, following is not reciprocal.

Twitter: retweeting

Retweet: users can help spread tweets posted by others.

Twitter

Retweeting enables fast spread of messages.

Increase of activity on Twitter

As of October 2011, Twitter received 250 million tweets per day.

Variety of Twitter accounts

Usefulness of Twitter

Twitter provides...
1. ...large amounts of data in real time,
2. from a wide variety of sources,
3. with the ability to spread rapidly.

Twitter’s popularity

Twitter has gained widespread popularity as a tool for...

Using Twitter for... following events

(1) Live-tweeting about and following events.

Using Twitter for... helping others

(2) Helping others, as in natural disasters.

Using Twitter for... finding out about news

and (3) Finding out about breaking news.

Twitter in the media

Many researchers are analyzing tweets.

Trends on Twitter

The news about the Japan earthquake broke on Twitter.

Video: Japan earthquake on Twitter

Research on Twitter

Most research on Twitter focuses on analyzing streams after the fact.
Very little research deals with the real-time analysis of streams.

Our goal: How can we mine Twitter streams to acquire real-time
knowledge about events and trends?

Our Work (I): Classification of Trending Topics

Trending Topics on Twitter

Trending topics reflect the conversations that are being discussed on
Twitter more than usual.

What produces trending topics?

What kinds of events trigger those trending topics?

Typology of Trending Topics

News: Japan earthquake.
Current events: a soccer game.
Memes: funny and viral ideas.
Commemoratives: World AIDS Day.

Goal

Find out the type of a trending topic as soon as it emerges.

Dataset

1,036 unique trending topics, each with up to 1,500 associated
tweets collected as soon as the topic trended.
Manual classification of trending topics:
616 current events.
251 memes.
142 news.
27 commemoratives.

Experiment Settings

Support Vector Machines (one-against-all)
500 trends for the training set.
10 runs.
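
A minimal sketch of how the repeated runs could be set up, assuming each run draws a fresh random sample of 500 trends for training and evaluates on the remaining ones. The slides do not say how the splits were drawn or how the results of the 10 runs were aggregated; `random_splits` and its `seed` parameter are hypothetical names, and the SVM itself is left out.

```python
import random


def random_splits(items, train_size=500, runs=10, seed=0):
    """Yield (train, test) pairs from repeated random shuffles of `items`.

    Assumption: everything not sampled for training is used for testing.
    """
    rng = random.Random(seed)
    for _ in range(runs):
        shuffled = list(items)
        rng.shuffle(shuffled)
        yield shuffled[:train_size], shuffled[train_size:]


# Topic identifiers stand in for the 1,036 labelled trending topics.
topic_ids = list(range(1036))
splits = list(random_splits(topic_ids))
```

Under this assumption, each run leaves 536 of the 1,036 topics for testing.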

Representation of Trending Topics

Two different representation approaches:
Twitter features: 15 straightforward language-independent
features that rely on the social spread of trends.
Bag-of-words: text of the tweets, weighted by term frequency (TF).
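
The bag-of-words representation can be sketched as follows. This is only an illustration: the tokenization and the use of relative (rather than raw) frequencies are assumptions, and `tf_vector` is a hypothetical helper. The 15 Twitter features are not listed on the slides, so they are not sketched here.

```python
import re
from collections import Counter


def tf_vector(tweets):
    """Map each term to its relative frequency across one topic's tweets."""
    tokens = []
    for tweet in tweets:
        # Lowercase and keep runs of word characters; a real system would
        # likely treat hashtags, mentions and URLs more carefully.
        tokens.extend(re.findall(r"\w+", tweet.lower()))
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


topic_tweets = ["Goal goal", "what a goal"]
vector = tf_vector(topic_tweets)
```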

Results

Main findings

Trending topics can be categorized accurately (78.4%) using social
features:
Outperforming use of textual content.
Without making use of external data.
In real-time as the trending topic emerges.

Arkaitz Zubiaga, Damiano Spina, Víctor Fresno, and Raquel Martínez.
2011. Classifying trending topics: a typology of conversation triggers on
Twitter. CIKM 2011.

Our Work (II): Real-Time Summarization of Events

Events on Twitter

When users live-tweet about events:
They produce vast amounts of tweets about events.
Users want to follow what others say.
Users cannot follow the overwhelming amounts of tweets.

Stream summarization

Can we summarize streams of tweets in such a way that:
Users receive a reduced stream that they can follow?
Users do not miss any key sub-event that occurred during the event?

Study of soccer games

Copa America 2011 (July 1-26, 2011):
26 soccer games.
11k-70k tweets per game.
Tweets are written in 30 languages.

Gold Standard

Live reports gathered from Yahoo! Sports.
Yahoo! journalists provide annotations for:
Goals.
Penalties.
Red Cards.
Disallowed Goals.
Game Starts, Ends, Stops & Resumptions.

Histogram of a Soccer Game

[Figure: tweet rate (y-axis, ticks from 500 to 2500) over time elapsed (x-axis, ticks from 1310854000 to 1310864000).]

Summarization of soccer games
2-step summarization:
1. Sub-event detection.
2. Tweet selection.

[Diagram: tweets stream (real-time) -> Sub-event Detection -> Tweet Selection -> summary]

1st Step: Sub-event Detection

Increase [Zhao et al., 2011]: a sub-event is detected when the tweeting
rate increases suddenly (1.7 times the previous rate).
Outliers: learns from the audience. A tweeting rate that is high
compared to the rates seen so far is considered a sub-event (90th
percentile).
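
The two detection rules can be sketched as below, assuming the stream has already been reduced to one tweet count per fixed-length time window. Several details are assumptions that the slides do not specify: the nearest-rank percentile, comparing only against earlier windows, and the hypothetical `warmup` parameter that delays outlier detection until a few rates have been seen.

```python
import math


def detect_increase(rates, ratio=1.7):
    """Indices where the rate is at least `ratio` times the previous rate."""
    return [
        i
        for i in range(1, len(rates))
        if rates[i - 1] > 0 and rates[i] >= ratio * rates[i - 1]
    ]


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def detect_outliers(rates, q=90, warmup=5):
    """Indices whose rate exceeds the q-th percentile of all earlier rates."""
    hits = []
    for i in range(warmup, len(rates)):
        if rates[i] > percentile(rates[:i], q):
            hits.append(i)
    return hits


# Made-up tweet counts per window, with two obvious spikes.
rates = [100, 110, 105, 120, 115, 400, 130, 125, 500, 140]
increase_hits = detect_increase(rates)
outlier_hits = detect_outliers(rates)
```

On this toy series both rules flag the two spikes (windows 5 and 8). The rules differ on real data because the outlier threshold adapts to what the audience has produced so far, while the increase rule only looks at the previous window.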

1st Step: Results

            P      R      F1     #
Increase    0.29   0.81   0.41   45.4
Outliers    0.51   0.84   0.63   25.6

The increase-based approach reports more sub-events, but with many false
positives (it favors recall).
The outlier-based approach (based instead on outstanding tweeting rates)
improves both P and R.
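
P, R and F1 above are the usual precision, recall and F1 measures. The sketch below shows how they follow from counts of true positives, false positives and false negatives. The counts in the example are made up for illustration and are not taken from the experiments; how detected sub-events were matched against the gold standard is not described on the slides.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical game: 8 real sub-events found, 20 spurious detections, 2 missed.
p, r, f1 = precision_recall_f1(tp=8, fp=20, fn=2)
```

A detector that fires often, as in this hypothetical example, keeps recall high while the many false positives pull precision and F1 down.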

2nd Step: Tweet Selection

Each term appearing in the tweets of a given timeframe is given a weight
according to:
Frequency (TF).
Language Models (KLD).
These weights make it possible to choose a representative tweet: the
tweet with the highest sum of the weights of its terms.
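
A minimal sketch of this selection step. The slides only name the two weightings, so the details here are assumptions: the KLD weight of a term is taken as its contribution p * log(p / q) to the Kullback-Leibler divergence between the timeframe model p and a background model q built from earlier tweets, both with add-one smoothing, and each distinct term of a tweet is counted once. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def tf_weights(window):
    """Weight each term by its raw frequency in the current timeframe."""
    return Counter(t for tweet in window for t in tokenize(tweet))


def kld_weights(window, background, smoothing=1.0):
    """Weight each term by p * log(p / q), with smoothed unigram models."""
    win = Counter(t for tweet in window for t in tokenize(tweet))
    bg = Counter(t for tweet in background for t in tokenize(tweet))
    vocab_size = len(set(win) | set(bg))
    win_total = sum(win.values()) + smoothing * vocab_size
    bg_total = sum(bg.values()) + smoothing * vocab_size
    weights = {}
    for term in win:
        p = (win[term] + smoothing) / win_total
        q = (bg[term] + smoothing) / bg_total
        weights[term] = p * math.log(p / q)
    return weights


def select_tweet(window, weights):
    """Return the tweet whose distinct terms have the highest total weight."""
    return max(
        window,
        key=lambda tweet: sum(weights.get(t, 0.0) for t in set(tokenize(tweet))),
    )


# Made-up tweets: `window` is the timeframe of a detected sub-event,
# `background` stands for tweets seen earlier in the game.
window = [
    "GOAL!!! Suarez scores for Uruguay",
    "goal by Suarez",
    "watching the game with friends",
]
background = ["watching the game", "the game is slow", "come on Uruguay"]
best_tf = select_tweet(window, tf_weights(window))
best_kld = select_tweet(window, kld_weights(window, background))
```

In this toy example both weightings pick the first tweet. The KLD variant additionally pushes down terms such as "the" and "game" that are just as common outside the sub-event.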

2nd Step: Results

Sub-event type (count)          Method   es     en     pt
Goals (54)                      TF       0.98   0.98   0.98
                                KLD      1.00   1.00   1.00
Penalties (2)                   TF       1.00   0.50   1.00
                                KLD      1.00   0.50   1.00
Red cards (12)                  TF       0.75   0.75   0.83
                                KLD      0.92   0.92   1.00
Disallowed goals (10)           TF       0.40   0.50   0.40
                                KLD      0.40   0.50   0.30
Game starts (26)                TF       0.73   0.74   0.79
                                KLD      0.84   0.79   0.83
Game ends (26)                  TF       1.00   1.00   1.00
                                KLD      1.00   1.00   1.00
Game stops & resumptions (63)   TF       0.62   0.60   0.57
                                KLD      0.68   0.60   0.59
Overall                         TF       0.79   0.74   0.78
                                KLD      0.84   0.77   0.82

Main findings
State-of-the-art text analysis methods generate accurate
summaries:
With precision and recall values above 80% (100% for key
sub-events).
In real time, as the game is being played.
In 3 different languages (es, en, pt).
Without the need for external data.

Damiano Spina, Arkaitz Zubiaga, Enrique Amigó, Julio Gonzalo. Towards
Real-Time Summarization of Events from Twitter Streams. To appear.

Outlook

Work 1:
Dig further into each type of trending topic to look for
subtypes of trends.

Work 2:
Evaluate the performance of the summarizer on other kinds of
scheduled events (award ceremonies, keynote talks, ...).
Evaluate the novelty of the information garnered from tweets.

Any Questions?

