2. Problem Definition
● Tweets are short
○ max. 140 characters
● Redundant information
○ many tweets repeat the same information
3. Can we contextualize topics in tweets by determining
a relevance score for the event-related information
contained in the tweets?
4. Related Work
Concentric Model
● Concentric Model for news videos
○ Core
■ Key entities
■ Summarizes main fact
■ Frequently mentioned entities
○ Crust
■ Describe particular details
■ Not necessarily frequent
■ Based on relations to Core
Introducing the Concentric Model
● Relevancy Dimension
○ Rings in concentric model
■ each ring is a different level of
relevancy
○ Relevancy depends on interpretation
● Finding Predicates to Entity Relations
○ Finding relations between entities
● Tracking Stories over Time
○ How does a news topic evolve over time?
José Luis Redondo, Giuseppe Rizzo, and Raphaël Troncy.
“Capturing News Stories Once, Retelling a Thousand Ways”.
José Luis Redondo, Giuseppe Rizzo, and Raphaël Troncy. “The
Concentric Nature of News Semantic Snapshots”.
5. Dataset
Dataset
● 817 tweets about the whaling event
● collected in 2014 and 2015
● Dataset contains (record layout sketched below):
○ Tweet text
○ Relevant Mentions
○ Scores:
■ Tweet Event Relevance Score
■ Relevant Mentions Score
■ Sentiment Score
■ Novelty Score
● Scores defined through crowdsourcing
Oana Inel, Tommaso Caselli, and Lora Aroyo. “Crowdsourcing
Salient Information from Tweets and News”.
High Tweet Event Relevance Score (1.00)
Japan Sets Off for First Whaling Since UN Court
Ruling - See more at: http://t.co/5BiHSWqjYu (
#japancc live at http://t.co/MVOUQb5AwD)
Low Tweet Event Relevance Score (0.24)
#health Why Norway Needs to Let Whaling Die -
Despite best industry efforts, the whaling industry
in Norway is fai... http://t.co/kC2c8odoS9
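To make the record structure concrete, here is a minimal sketch of how one annotated tweet could be represented in Python. The field names are assumptions for illustration, not the actual column names used by Inel et al., and the example values are illustrative only.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class AnnotatedTweet:
    # Field names are assumed for illustration; they do not mirror the
    # actual column names of the Inel et al. dataset.
    text: str                    # tweet text (max. 140 characters)
    event_relevance: float       # Tweet Event Relevance Score (0.0 - 1.0)
    sentiment: float             # Sentiment Score
    novelty: float               # Novelty Score
    relevant_mentions: Dict[str, float] = field(default_factory=dict)  # mention -> Relevant Mention Score

# Illustrative values only, loosely modelled on the high-score example above
example = AnnotatedTweet(
    text="Japan Sets Off for First Whaling Since UN Court Ruling ...",
    event_relevance=1.00,
    sentiment=0.0,
    novelty=0.0,
    relevant_mentions={"japan": 0.9, "whaling": 0.95},
)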
6. Approach
1. Take the dataset from Inel et al.
2. Use the scores provided
3. Analyse the data
4. Determine a Core and a Crust
5. Combine Core and Crust into the Concentric Model
6. Evaluate the results
12. Baseline model
Named Entity Expansion and Ranking
1. Generate the list of entities → dataset
Core Generation
2. Identify entities with a higher level of representativeness → frequency of Relevant Mentions
3. Order the entities (high → low)
4. Add top-ranked entities to the Core until one is found that is not semantically connected
Crust Generation
5. Add entities with a semantic relationship to Core elements
Replication of the approach of Redondo et al.; a code sketch follows below.
Fig 2. Scaled down representation of the Baseline model
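As a rough illustration of how this baseline could be coded, the sketch below ranks entities by mention frequency and grows the Core until an unconnected entity is reached. The semantic-connection test is not specified on this slide, so it is passed in as a pluggable predicate; this is an illustration, not the actual implementation of Redondo et al.

from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple

def baseline_core_crust(
    relevant_mentions: Iterable[str],
    is_connected: Callable[[str, Set[str]], bool],
) -> Tuple[List[str], List[str]]:
    # 1.-3. List the entities and rank them by mention frequency (high -> low)
    ranked = [entity for entity, _ in Counter(relevant_mentions).most_common()]

    # 4. Grow the Core until the next entity is not semantically connected
    core: List[str] = []
    remaining: List[str] = []
    for position, entity in enumerate(ranked):
        if core and not is_connected(entity, set(core)):
            remaining = ranked[position:]
            break
        core.append(entity)

    # 5. Crust: remaining entities that still relate to a Core element
    crust = [entity for entity in remaining if is_connected(entity, set(core))]
    return core, crust

Here `is_connected` stands in for whatever semantic relation Redondo et al. use (for instance a knowledge-graph lookup); the slides do not say which one.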
13. First approach
● Calculate the frequency of the Relevant Mentions
● Calculate the average Relevant Mention Score
● Determine Core and Crust based on thresholds (sketched below)
● Core
○ average Relevant Mention Score >= 0.70
○ number of mentions > 10
● Crust
○ average Relevant Mention Score >= 0.50
○ number of mentions > 10
Fig 3. Representation of the First approach
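The thresholds of the First approach translate almost directly into code. A minimal sketch, assuming the input is one (mention, Relevant Mention Score) pair per occurrence of a mention in the dataset:

from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

def first_approach(mention_scores: Iterable[Tuple[str, float]]) -> Tuple[List[str], List[str]]:
    # Collect every score per mention
    scores: Dict[str, List[float]] = defaultdict(list)
    for mention, score in mention_scores:
        scores[mention].append(score)

    core: List[str] = []
    crust: List[str] = []
    for mention, values in scores.items():
        frequency = len(values)          # number of mentions
        average = mean(values)           # average Relevant Mention Score
        if frequency > 10 and average >= 0.70:
            core.append(mention)         # Core thresholds
        elif frequency > 10 and average >= 0.50:
            crust.append(mention)        # Crust thresholds
    return core, crust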
14. Limitations of the First approach
● The same Relevant Mention can appear in different forms
○ some contain symbols (#, :)
○ not all are lowercase
Ways to improve
● Use stemming/lemmatization
○ stemming works better
● Get rid of all symbols
○ Tweets and Relevant Mentions should only contain the letters a-z (see the normalization sketch below)
● Make better use of the scores from the dataset
Fig 3. Representation of the First approach
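A minimal sketch of the proposed clean-up: lowercase everything, keep only the letters a-z, and stem. The Porter stemmer from NLTK is an assumption; the slides only state that stemming worked better than lemmatization.

import re
from nltk.stem import PorterStemmer  # stemming library is an assumption, not prescribed by the slides

_stemmer = PorterStemmer()

def normalise(text: str) -> str:
    # Lowercase, replace everything outside a-z with a space, then stem each word
    text = re.sub(r"[^a-z ]", " ", text.lower())
    return " ".join(_stemmer.stem(word) for word in text.split())

# e.g. normalise("#Whaling: Japan") == normalise("whaling Japan") == "whale japan",
# so differently written Relevant Mentions collapse to the same form.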
15. Final Approach (sketched in code below)
1. Only use a-z & implement stemming
2. Filter on Tweet Relevance Score ≥ 0.5
3. Filter on Relevant Mention Score ≥ 0.5
Core
4. Find all single-word Relevant Mentions
5. Count the occurrences in Tweets + order
6. Count the occurrences in other Relevant Mentions
7. Start from the top and add to the Core until the occurrences in other Relevant Mentions = 0
Crust
8. Find Relevant Mentions that contain Core entities
9. Count the Core words in the Relevant Mentions
10. Filter out Relevant Mentions with 1 or 2 words
11. Filter out Relevant Mentions that only contain 'whale' or 'japan'
Fig 5. Scaled down representation of the Final approach
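A hedged end-to-end sketch of steps 1-11, assuming the tweets and Relevant Mentions have already been normalized with the clean-up above and are passed in as (text, score) pairs. The score threshold is kept as a parameter because this slide filters at ≥ 0.5 while the conclusion reports 0.60 as the best value.

from collections import Counter
from typing import Iterable, List, Tuple

def final_approach(
    tweets: Iterable[Tuple[str, float]],     # (normalised tweet text, Tweet Event Relevance Score)
    mentions: Iterable[Tuple[str, float]],   # (normalised Relevant Mention, Relevant Mention Score)
    score_threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    # 2.-3. Keep only sufficiently relevant tweets and Relevant Mentions
    tweet_texts = [text for text, score in tweets if score >= score_threshold]
    kept_mentions = [m for m, score in mentions if score >= score_threshold]

    # 4.-5. Single-word Relevant Mentions, ranked by how often they occur in the tweets
    single_word = {m for m in kept_mentions if len(m.split()) == 1}
    tweet_counts = Counter(w for text in tweet_texts for w in text.split() if w in single_word)

    # 6.-7. Walk the ranking and add words to the Core until a word no longer
    #       occurs inside any other Relevant Mention
    core: List[str] = []
    for word, _ in tweet_counts.most_common():
        occurs_elsewhere = any(word in m.split() for m in kept_mentions if m != word)
        if not occurs_elsewhere:
            break
        core.append(word)

    # 8.-11. Crust: Relevant Mentions with 3+ words that contain a Core word and
    #        consist of more than just 'whale' / 'japan' (step 9's count is folded
    #        into the containment test here)
    crust = [
        m for m in set(kept_mentions)
        if len(m.split()) >= 3
        and any(word in m.split() for word in core)
        and set(m.split()) - {"whale", "japan"}
    ]
    return core, crust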
18. Conclusion
● Only following the approach of Redondo et al. does not work
● The relevance scores need to be taken into account
● The First approach does not work
○ too many symbols and not all mentions are lowercase
○ stemming is needed
● The Final approach with a Relevance Score threshold of 0.60 works best
Research Question: Can we contextualize topics in tweets by
determining a relevance score for the event-related information
contained in the tweets?
19. Ideas for further research
● Does the model also work on other data?
● Are Tweets with links (to articles) more relevant?
● Does using the Novelty Score for each day in the dataset give a better Concentric Model?
● Does the model also work on news topics that are only mentioned during one
day (e.g. sports)?