A Concentric-based
Approach to Represent News
Topics in Tweets
Presenter: Enya Nieland
Supervisor: Oana Inel
Problem Definition
● Tweets are short
○ max. 140 characters
● Redundant information
● A lot of the same information
Can we contextualize topics in tweets by determining
a relevance score for the event-related information
contained in the tweets?
Related Work
Concentric Model
● Concentric Model for news videos
○ Core
■ Key entities
■ Summarizes main fact
■ Frequently mentioned entities
○ Crust
■ Describe particular details
■ Not necessarily frequent
■ Based on relations to Core
Introducing the Concentric Model
● Relevancy Dimension
○ Rings in concentric model
■ each ring is a different level of
relevancy
○ Relevancy depends on interpretation
● Finding Predicates to Entity Relations
○ Finding relations between entities
● Tracking Stories over Time
○ How does a news topic evolve over time
José Luis Redondo, Giuseppe Rizzo, and Raphaël Troncy.
“Capturing News Stories Once, Retelling a Thousand Ways”.
José Luis Redondo, Giuseppe Rizzo, and Raphaël Troncy. “The
Concentric Nature of News Semantic Snapshots”.
Dataset
● 817 Tweets about the event whaling
● 2014 and 2015
● Dataset contains:
○ Tweet text
○ Relevant Mentions
○ Scores:
■ Tweet Event Relevance Score
■ Relevant Mentions Score
■ Sentiment Score
■ Novelty Score
● Scores defined through crowdsourcing
Oana Inel, Tommaso Caselli, and Lora Aroyo. “Crowdsourcing
Salient Information from Tweets and News”.
High Tweet Event Relevance Score (1.00)
Japan Sets Off for First Whaling Since UN Court
Ruling - See more at: http://t.co/5BiHSWqjYu (
#japancc live at http://t.co/MVOUQb5AwD)
Low Tweet Event Relevance Score (0.24)
#health Why Norway Needs to Let Whaling Die -
Despite best industry efforts, the whaling industry
in Norway is fai... http://t.co/kC2c8odoS9
Approach
1. Take dataset from Inel et al.
2. Use the scores provided
3. Data analysis
4. Determine a Core and Crust
5. Combine Core and Crust to
make Concentric Model
6. Evaluation of the results
Baseline model
Named Entity Expansion and Ranking
1. Generating list of entities → dataset
Core Generation
2. Identify entities with higher level of
representativeness → frequency of Relevant
Mentions
3. Order entities (high → low)
4. Add top ranked entities to Core until one is
found that is not semantically connected
Crust Generation
5. Add entities with semantic relationship to
core elements
Replication of the approach of Redondo et al.
Fig 2. Scaled down representation of the Baseline model
First approach
● Calculate frequency of the relevant
mentions
● Calculate average Relevant Mention Score
● Determine Core and Crust, based on
thresholds
● Core
○ average Relevant Mention Score >= 0.70
○ number of mentions > 10
● Crust
○ average Relevant Mention Score >= 0.50
○ number of mentions > 10
Fig 3. Representation of the First approach
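The threshold rule of the First approach can be sketched roughly as follows. The thresholds come from the slide; the input layout (pairs of mention text and score), the function name `assign_rings`, and the rule that Core takes precedence when both thresholds are met are assumptions made for illustration, and the toy data is invented.

```python
from collections import defaultdict

# Thresholds as stated on the slide.
CORE_MIN_SCORE = 0.70
CRUST_MIN_SCORE = 0.50
MIN_MENTIONS = 10  # "number of mentions > 10"


def assign_rings(mentions):
    """Split mentions into Core and Crust.

    `mentions` is an iterable of (mention_text, relevant_mention_score)
    pairs; this layout is an assumption, not the dataset's real schema.
    """
    scores = defaultdict(list)
    for text, score in mentions:
        scores[text].append(score)

    core, crust = [], []
    for text, values in scores.items():
        frequency = len(values)
        average = sum(values) / frequency
        if frequency <= MIN_MENTIONS:
            continue  # too rare for either ring
        if average >= CORE_MIN_SCORE:
            core.append(text)  # assumption: Core wins over Crust
        elif average >= CRUST_MIN_SCORE:
            crust.append(text)
    return core, crust


# Invented toy data, not taken from the whaling dataset.
toy = (
    [("whaling", 0.9)] * 12
    + [("norway", 0.6)] * 11
    + [("health", 0.2)] * 15
    + [("court", 0.95)] * 3
)
core, crust = assign_rings(toy)
print(core, crust)
```

Here "court" is dropped despite its high score because it appears only three times, and "health" is dropped because its average score is below both thresholds.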
Limitations of the First approach
● Same Relevant Mentions, but some contain
symbols (#, :)
● Not all Relevant Mentions are lowercase
Ways to improve
● Use stemming/lemmatization
○ Stemming works better
● Get rid of all symbols
○ Tweets and Relevant Mentions should only
contain letters a-z
● Make better use of scores from the dataset
Fig 3. Representation of the First approach
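A minimal sketch of the clean-up proposed above: lowercase the text, keep only the letters a-z, and reduce words to a stem. The slides do not say which stemmer was used, so `crude_stem` below is a deliberately tiny suffix stripper standing in for a real stemmer (a Porter stemmer, for example); the suffix list and both function names are hypothetical.

```python
import re

# Hypothetical, deliberately small suffix list; a real stemmer covers far more.
SUFFIXES = ("ing", "ers", "er", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, replace everything except a-z and whitespace, then stem."""
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    return " ".join(crude_stem(word) for word in letters_only.split())


# Variants of the same Relevant Mention now collapse to one form.
print(normalize("#Whaling:"), normalize("whaling"), normalize("Whales"))
```

Note that this simple filter also breaks URLs into letter fragments, so links would need to be removed beforehand if they should not contribute words.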
Final Approach
1. Only use a-z & implement stemming
2. Filter on Tweet Relevance Score ≥ 0.5
3. Filter on Relevant Mention Score ≥ 0.5
Core
4. Find all single-word Relevant Mentions
5. Count their occurrences in Tweets and order them (high → low)
6. Count their occurrences in other Relevant Mentions
7. Starting from the top, add entities to the Core until one is reached whose
number of occurrences in other Relevant Mentions = 0
Crust
8. Find Relevant Mentions that contain Core entities
9. Count the Core words in the Relevant Mentions
10. Filter out Relevant Mentions with 1 and 2 words
11. Filter out Relevant Mentions that only contain ‘whale’ or ‘japan’
Fig 5. Scaled down representation of the Final approach
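Steps 4 to 11 could be sketched as below, assuming tweets and Relevant Mentions have already been normalized and filtered on their scores (steps 1 to 3). This is one possible reading of the slide, not the presenter's implementation: the function names, the alphabetical tie-breaking, and the interpretation of step 11 (drop a mention when the only Core words it contains are the generic terms) are all assumptions, and the toy data is invented.

```python
from collections import Counter


def build_core(tweets, mentions):
    """Steps 4-7: rank single-word mentions and cut at the first isolated one."""
    singles = {m for m in mentions if len(m.split()) == 1}
    multi = [m.split() for m in set(mentions) if len(m.split()) > 1]

    # Step 5: number of tweets that contain each single-word mention.
    tweet_counts = Counter()
    for tweet in tweets:
        for word in set(tweet.split()) & singles:
            tweet_counts[word] += 1
    ranked = sorted(singles, key=lambda w: (-tweet_counts[w], w))

    core = []
    for word in ranked:
        # Step 6: occurrences inside other (multi-word) mentions.
        in_other = sum(word in tokens for tokens in multi)
        if in_other == 0:
            break  # Step 7: stop at the first word with no such occurrences.
        core.append(word)
    return core


def build_crust(mentions, core, generic=frozenset()):
    """Steps 8-11: keep longer mentions tied to the Core, with their Core-word count."""
    core_set = set(core)
    crust = {}
    for mention in set(mentions):
        tokens = mention.split()
        hits = [t for t in tokens if t in core_set]
        if not hits or len(tokens) <= 2:
            continue  # Steps 8 and 10.
        if set(hits) <= set(generic):
            continue  # Step 11, under the assumed interpretation.
        crust[mention] = len(hits)  # Step 9.
    return crust


# Invented, already-stemmed toy data.
tweets = ["japan whal hunt resum", "japan whal court rule", "whal meat sale", "norway whal"]
mentions = ["whal", "japan", "court", "norway", "japan whal hunt", "un court rule whal", "whal meat"]

core = build_core(tweets, mentions)
crust = build_crust(mentions, core, generic={"whal", "japan"})
print(core, crust)
```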
Evaluation

                         Precision   Recall   F1-Score
Baseline Model           0.56        0.17     0.26
Final Approach, by Relevance Score threshold
  0.3                    0.64        0.71     0.67
  0.4                    0.83        0.51     0.64
  0.5                    0.85        0.43     0.57
  0.6                    0.97        0.55     0.70
  0.7                    0.72        0.33     0.46
  0.8                    0.50        0.64     0.56

Baseline Model
                         True Relevance
Examined Relevance       Positive    Negative   Total
  Positive               94          75         169
  Negative               463         285        648
  Total                  557         360        817

Final Approach - 0.60
                         True Relevance
Examined Relevance       Positive    Negative   Total
  Positive               35          1          36
  Negative               29          55         84
  Total                  64          56         120
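As a check on the tables, precision, recall and F1 can be recomputed from the confusion matrices using the standard definitions. The sketch assumes that "Examined Relevance" is the model's prediction and "True Relevance" is the ground truth; the function name is illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts read from the two confusion matrices.
baseline = precision_recall_f1(tp=94, fp=75, fn=463)
final_060 = precision_recall_f1(tp=35, fp=1, fn=29)

print([round(x, 2) for x in baseline])   # [0.56, 0.17, 0.26]
print([round(x, 2) for x in final_060])  # [0.97, 0.55, 0.7]
```

Both results match the reported rows for the Baseline Model and for the Final Approach at threshold 0.6.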
Conclusion
● Following only the approach of Redondo et al. performs poorly (F1-score 0.26)
● Relevance scores need to be taken into account
● First approach does not work
○ too many symbols and not all lowercase
○ stemming needed
● Final approach with Relevance Score threshold of 0.60 works best (F1-score 0.70)
Research Question: Can we contextualize topics in tweets by
determining a relevance score for the event-related information
contained in the tweets?
Ideas for further research
● Does the model also work on other data?
● Are Tweets with links (to articles) more relevant?
● Does implementing novelty score for every day in dataset give a better
Concentric model?
● Does the model also work on news topics that are only mentioned during one
day (e.g. sports)?

