SlideShare a Scribd company logo
Named Entity Recognition in Twitter:
A Dataset and Analysis on Short-Term Temporal Shifts
Computer Science & Informatics, Cardiff University, Cardiff NLP
Snap Inc.,
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
https://huggingface.co/datasets/tner/tweetner7
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Named-Entity Recognition (NER)
2
Jacob Collier is an English artist.
BERT + Linear Projection
Jacob is English
an artist.
PJacob PCollier Pan PEnglish Partist
Pis
Collier
Tokenization
Location
Person
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
NER on Social Media
● Noisy Text
○ Less formal.
○ Different vocabulary.
○ Emoji, urls, etc.
● Temporal Shift
○ The meaning of words is constantly changing or evolving over time.
○ New entities in new period.
3
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
TweetNER7
TweetNER7 is a NER dataset on Twitter with 7 entity types.
● Person, Location, Corporation, Creative work, Group, Product.
● Tweets are collected from Sep 2019 to Aug 2021.
○ 2020-set: Sep 2019 ~ Aug 2020 (5,768 tweets)
○ 2021-set: Sep 2020 ~ Aug 2021 (5,612 tweets)
● Temporal shift setup
○ Training/Validation: 2020-set
○ Test: 2021-set
4
https://huggingface.co/datasets/tner/tweetner7
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Other Twitter NER Datasets
TweetNER7 is unique for
● Uniform distribution over months
○ 2,000 tweets in each month
● (Relatively) short term temporal shift
○ Over 2 years
● Large annotation (data size & entities)
○ More than 10k annotated tweets with 7 entities
5
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Dataset Collection
TweetTopic tweet collection:
● Query 50 tweets every two hours
from September 2019 to October
2021
● Pre-filtering & Near de-duplication
● Trend-filter
6
Tweet collection pipeline.
Trend-filter
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Preprocessing Tweets
● Special token for URLs.
● Special representation for
usernames.
7
Raw tweet.
Processed tweet.
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Dataset Annotation
● Mechanical Turk:
○ 3 annotators per tweet
● Agreement
○ 1 / 3: Disregard
○ 2 / 3: Manual check
8
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Experiment: Temporal Shift Setup
Setup: Train & validate on 2020-set & test
on 2021-set
Metric: Micro F1 / Macro F1
Result:
● RoBERTa is the best in 2021.
● BERTweet is the best in 2020.
● Performance: 2021 > 2020.
9
Model
Micro F1
(2021/2020)
Macro F1
(2021/2020)
BERTweet Base 64.1 / 66.4 59.4 / 62.4
BERTweet Large 64.0 / 65.9 59.5 / 62.6
RoBERTa Base 64.2 / 64.2 59.1 / 60.2
RoBERTa Large 64.8 / 65.7 60.0 / 61.9
TimeLM2019 64.3 / 65.4 59.3 / 61.1
TimeLM2020 62.9 / 64.4 58.3 / 60.3
TimeLM2021 64.2 / 65.4 59.5 / 61.1
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Results Breakdown
Easy entities:
Person, product, location
● Large number of entities
● Low diversity
Challenging entities:
Creative work, event
● Small number of entities
● High diversity
10
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Experiment: Continuous Fine-tuning
Setup: Use both of 2020 & 2021 training
set by concatenation or continuous
fine-tuning.
Result:
● Continuous fine-tuning achieves
the best results.
● TimeLM19 is better than TimeLM20
and TimeLM21.
11
Model Dataset
Micro F1
(2021/2020)
Macro F1
(2021/2020)
RoBERTa
LARGE
2020-set 64.8 / 65.7 60.0 / 61.9
Continuous 66.0 / 66.3 60.9 / 62.4
TimeLM
2019
2020-set 64.3 / 65.4 59.3 / 61.1
Continuous 65.9 / 64.8 61.1 / 60.6
TimeLM
2020
2020-set 62.9 / 64.4 58.3 / 60.3
Continuous 65.5 / 65.3 60.6 / 61.3
TimeLM
2021
2020-set 64.2 / 65.4 59.5 / 61.1
Continuous 65.1 / 64.9 60.0 / 60.7
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Experiment: Self-labeling
Setup: Collect additional 93,594 and
878,80 tweets from the period of
2020-set and 2021-set. Generate pseudo
annotation by fine-tuned RoBERTa Large
model.
Result: No significant improvements.
12
Dataset
Micro F1
(2021)
Macro F1
(2021)
2020-set 64.8 60
2020-self-labeling 64.6 59.3
2021-self-labeling 64.2 59.3
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Are the pseudo labels not useful at all?
Setup: For incorrect prediction,
retrieve the pseudo-labels for the
entity within the range of 7 days,
and take the most frequent
prediction as the contextualized
prediction.
13
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Contextualized prediction usually
matches the original one, meaning is
wrong 😩
But, the second most frequent
prediction is correct in average 🤔
There is a signal at least, but not easy
way to utilize it.
14
Result of Contextualized Prediction
Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts
Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados
Summary
● TweetNER7, a NER dataset on Twitter with temporal-shift and 7 entity types
● Report LM fine-tuning results
○ Temporal split vs Random split
○ Continuous fine-tuning
● Self-labeling
○ Contextual prediction
15
Thank you!!

More Related Content

Similar to 2022-11, AACL, Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts

Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment Analysis
ijtsrd
 
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph TechnologyThe Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
Neo4j
 
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph TechnologyThe Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
Greta Workman
 
Blenderbot
BlenderbotBlenderbot
Blenderbot
taeseon ryu
 
Social Media - Shock and Awe - NC3C Conference 3-30-2011
Social Media - Shock and Awe - NC3C Conference 3-30-2011Social Media - Shock and Awe - NC3C Conference 3-30-2011
Social Media - Shock and Awe - NC3C Conference 3-30-2011
Lee Yount
 
bdSquared
bdSquaredbdSquared
CML's Presentation at FengChia University
CML's Presentation at FengChia UniversityCML's Presentation at FengChia University
CML's Presentation at FengChia University
Tunghai University
 
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Alexandre Sieira
 
Neurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons LearnedNeurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons Learned
Stanford University
 
Bitcoin Price Prediction and Forecasting
Bitcoin Price Prediction and ForecastingBitcoin Price Prediction and Forecasting
Bitcoin Price Prediction and Forecasting
IRJET Journal
 
Webinar: The Future of Data-driven Storytelling
Webinar: The Future of Data-driven StorytellingWebinar: The Future of Data-driven Storytelling
Webinar: The Future of Data-driven Storytelling
LEWIS Global Communications
 
The New Simple: Predictive Analytics for the Mainstream
The New Simple: Predictive Analytics for the Mainstream The New Simple: Predictive Analytics for the Mainstream
The New Simple: Predictive Analytics for the Mainstream
Inside Analysis
 
Blockchain v Cryptocurrency: Talk for BridgeSF
Blockchain v Cryptocurrency: Talk for BridgeSF Blockchain v Cryptocurrency: Talk for BridgeSF
Blockchain v Cryptocurrency: Talk for BridgeSF
Kaliya "Identity Woman" Young
 
Privacy and Security in Online Social Media : Trust and Credebillity on OSM
Privacy and Security in Online Social Media : Trust and Credebillity on OSMPrivacy and Security in Online Social Media : Trust and Credebillity on OSM
Privacy and Security in Online Social Media : Trust and Credebillity on OSM
IIIT Hyderabad
 
Frakture 4 fold webinar presentation 2015-04_23
Frakture 4 fold webinar presentation 2015-04_23Frakture 4 fold webinar presentation 2015-04_23
Frakture 4 fold webinar presentation 2015-04_23
Chris Lundberg
 
Predicting Business Confidence using News and Social Media
Predicting Business Confidence using News and Social MediaPredicting Business Confidence using News and Social Media
Predicting Business Confidence using News and Social Media
John Cai
 
How Data Visualization Enhances the News
How Data Visualization Enhances the NewsHow Data Visualization Enhances the News
How Data Visualization Enhances the News
Inside Analysis
 
Narrative Mind Week 5 H4D Stanford 2016
Narrative Mind Week 5 H4D Stanford 2016Narrative Mind Week 5 H4D Stanford 2016
Narrative Mind Week 5 H4D Stanford 2016
Stanford University
 
No estimates - 10 new principles for testing
No estimates  - 10 new principles for testingNo estimates  - 10 new principles for testing
No estimates - 10 new principles for testing
Vasco Duarte
 
Semantic Entity extraction from Sports Tweets
Semantic Entity extraction from Sports TweetsSemantic Entity extraction from Sports Tweets
Semantic Entity extraction from Sports Tweets
mitsmit
 

Similar to 2022-11, AACL, Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts (20)

Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment Analysis
 
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph TechnologyThe Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
 
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph TechnologyThe Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
 
Blenderbot
BlenderbotBlenderbot
Blenderbot
 
Social Media - Shock and Awe - NC3C Conference 3-30-2011
Social Media - Shock and Awe - NC3C Conference 3-30-2011Social Media - Shock and Awe - NC3C Conference 3-30-2011
Social Media - Shock and Awe - NC3C Conference 3-30-2011
 
bdSquared
bdSquaredbdSquared
bdSquared
 
CML's Presentation at FengChia University
CML's Presentation at FengChia UniversityCML's Presentation at FengChia University
CML's Presentation at FengChia University
 
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
Threat Intelligence Baseada em Dados: Métricas de Disseminação e Compartilham...
 
Neurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons LearnedNeurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons Learned
 
Bitcoin Price Prediction and Forecasting
Bitcoin Price Prediction and ForecastingBitcoin Price Prediction and Forecasting
Bitcoin Price Prediction and Forecasting
 
Webinar: The Future of Data-driven Storytelling
Webinar: The Future of Data-driven StorytellingWebinar: The Future of Data-driven Storytelling
Webinar: The Future of Data-driven Storytelling
 
The New Simple: Predictive Analytics for the Mainstream
The New Simple: Predictive Analytics for the Mainstream The New Simple: Predictive Analytics for the Mainstream
The New Simple: Predictive Analytics for the Mainstream
 
Blockchain v Cryptocurrency: Talk for BridgeSF
Blockchain v Cryptocurrency: Talk for BridgeSF Blockchain v Cryptocurrency: Talk for BridgeSF
Blockchain v Cryptocurrency: Talk for BridgeSF
 
Privacy and Security in Online Social Media : Trust and Credebillity on OSM
Privacy and Security in Online Social Media : Trust and Credebillity on OSMPrivacy and Security in Online Social Media : Trust and Credebillity on OSM
Privacy and Security in Online Social Media : Trust and Credebillity on OSM
 
Frakture 4 fold webinar presentation 2015-04_23
Frakture 4 fold webinar presentation 2015-04_23Frakture 4 fold webinar presentation 2015-04_23
Frakture 4 fold webinar presentation 2015-04_23
 
Predicting Business Confidence using News and Social Media
Predicting Business Confidence using News and Social MediaPredicting Business Confidence using News and Social Media
Predicting Business Confidence using News and Social Media
 
How Data Visualization Enhances the News
How Data Visualization Enhances the NewsHow Data Visualization Enhances the News
How Data Visualization Enhances the News
 
Narrative Mind Week 5 H4D Stanford 2016
Narrative Mind Week 5 H4D Stanford 2016Narrative Mind Week 5 H4D Stanford 2016
Narrative Mind Week 5 H4D Stanford 2016
 
No estimates - 10 new principles for testing
No estimates  - 10 new principles for testingNo estimates  - 10 new principles for testing
No estimates - 10 new principles for testing
 
Semantic Entity extraction from Sports Tweets
Semantic Entity extraction from Sports TweetsSemantic Entity extraction from Sports Tweets
Semantic Entity extraction from Sports Tweets
 

More from asahiushio1

2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...
2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...
2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...
asahiushio1
 
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...
asahiushio1
 
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
asahiushio1
 
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...
asahiushio1
 
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
asahiushio1
 
2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language ...
2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language ...2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language ...
2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language ...
asahiushio1
 
2021-04, EACL, T-NER: An All-Round Python Library for Transformer-based Named...
2021-04, EACL, T-NER: An All-Round Python Library for Transformer-based Named...2021-04, EACL, T-NER: An All-Round Python Library for Transformer-based Named...
2021-04, EACL, T-NER: An All-Round Python Library for Transformer-based Named...
asahiushio1
 

More from asahiushio1 (7)

2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...
2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...
2022-10, UCL NLP meetup, Toward a Better Understanding of Relational Knowledg...
 
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...
 
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
 
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...
 
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
2020-12, Cardiff NLP Reading Group, Commonsense Knowledge Probing
 
2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language ...
2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language ...2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language ...
2021-05, ACL, BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language ...
 
2021-04, EACL, T-NER: An All-Round Python Library for Transformer-based Named...
2021-04, EACL, T-NER: An All-Round Python Library for Transformer-based Named...2021-04, EACL, T-NER: An All-Round Python Library for Transformer-based Named...
2021-04, EACL, T-NER: An All-Round Python Library for Transformer-based Named...
 

Recently uploaded

Nereis Type Study for BSc 1st semester.ppt
Nereis Type Study for BSc 1st semester.pptNereis Type Study for BSc 1st semester.ppt
Nereis Type Study for BSc 1st semester.ppt
underratedsunrise
 
23PH301 - Optics - Unit 2 - Interference
23PH301 - Optics - Unit 2 - Interference23PH301 - Optics - Unit 2 - Interference
23PH301 - Optics - Unit 2 - Interference
RDhivya6
 
acanthocytes_causes_etiology_clinical sognificance-future.pptx
acanthocytes_causes_etiology_clinical sognificance-future.pptxacanthocytes_causes_etiology_clinical sognificance-future.pptx
acanthocytes_causes_etiology_clinical sognificance-future.pptx
muralinath2
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Sérgio Sacani
 
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxBIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
goluk9330
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
PirithiRaju
 
Module_1.In autotrophic nutrition ORGANISM
Module_1.In autotrophic nutrition ORGANISMModule_1.In autotrophic nutrition ORGANISM
Module_1.In autotrophic nutrition ORGANISM
rajeshwexl
 
Evaluation and Identification of J'BaFofi the Giant Spider of Congo and Moke...
Evaluation and Identification of J'BaFofi the Giant  Spider of Congo and Moke...Evaluation and Identification of J'BaFofi the Giant  Spider of Congo and Moke...
Evaluation and Identification of J'BaFofi the Giant Spider of Congo and Moke...
MrSproy
 
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptxTOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
shubhijain836
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
ABHISHEK SONI NIMT INSTITUTE OF MEDICAL AND PARAMEDCIAL SCIENCES , GOVT PG COLLEGE NOIDA
 
Firoozeh Kashani-Sabet - An Esteemed Professor
Firoozeh Kashani-Sabet - An Esteemed ProfessorFiroozeh Kashani-Sabet - An Esteemed Professor
Firoozeh Kashani-Sabet - An Esteemed Professor
Firoozeh Kashani-Sabet
 
Mites,Slug,Snail_Infesting agricultural crops.pdf
Mites,Slug,Snail_Infesting agricultural crops.pdfMites,Slug,Snail_Infesting agricultural crops.pdf
Mites,Slug,Snail_Infesting agricultural crops.pdf
PirithiRaju
 
Synopsis presentation VDR gene polymorphism and anemia (2).pptx
Synopsis presentation VDR gene polymorphism and anemia (2).pptxSynopsis presentation VDR gene polymorphism and anemia (2).pptx
Synopsis presentation VDR gene polymorphism and anemia (2).pptx
FarhanaHussain18
 
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
Sérgio Sacani
 
Explainable Deepfake Image/Video Detection
Explainable Deepfake Image/Video DetectionExplainable Deepfake Image/Video Detection
Explainable Deepfake Image/Video Detection
VasileiosMezaris
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
Shekar Boddu
 
Quality assurance B.pharm 6th semester BP606T UNIT 5
Quality assurance B.pharm 6th semester BP606T UNIT 5Quality assurance B.pharm 6th semester BP606T UNIT 5
Quality assurance B.pharm 6th semester BP606T UNIT 5
vimalveerammal
 
Physiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptxPhysiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptx
fatima132662
 
Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
Sérgio Sacani
 
Post translation modification by Suyash Garg
Post translation modification by Suyash GargPost translation modification by Suyash Garg
Post translation modification by Suyash Garg
suyashempire
 

Recently uploaded (20)

Nereis Type Study for BSc 1st semester.ppt
Nereis Type Study for BSc 1st semester.pptNereis Type Study for BSc 1st semester.ppt
Nereis Type Study for BSc 1st semester.ppt
 
23PH301 - Optics - Unit 2 - Interference
23PH301 - Optics - Unit 2 - Interference23PH301 - Optics - Unit 2 - Interference
23PH301 - Optics - Unit 2 - Interference
 
acanthocytes_causes_etiology_clinical sognificance-future.pptx
acanthocytes_causes_etiology_clinical sognificance-future.pptxacanthocytes_causes_etiology_clinical sognificance-future.pptx
acanthocytes_causes_etiology_clinical sognificance-future.pptx
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptxBIRDS  DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
BIRDS DIVERSITY OF SOOTEA BISWANATH ASSAM.ppt.pptx
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
 
Module_1.In autotrophic nutrition ORGANISM
Module_1.In autotrophic nutrition ORGANISMModule_1.In autotrophic nutrition ORGANISM
Module_1.In autotrophic nutrition ORGANISM
 
Evaluation and Identification of J'BaFofi the Giant Spider of Congo and Moke...
Evaluation and Identification of J'BaFofi the Giant  Spider of Congo and Moke...Evaluation and Identification of J'BaFofi the Giant  Spider of Congo and Moke...
Evaluation and Identification of J'BaFofi the Giant Spider of Congo and Moke...
 
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptxTOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptx
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
 
Firoozeh Kashani-Sabet - An Esteemed Professor
Firoozeh Kashani-Sabet - An Esteemed ProfessorFiroozeh Kashani-Sabet - An Esteemed Professor
Firoozeh Kashani-Sabet - An Esteemed Professor
 
Mites,Slug,Snail_Infesting agricultural crops.pdf
Mites,Slug,Snail_Infesting agricultural crops.pdfMites,Slug,Snail_Infesting agricultural crops.pdf
Mites,Slug,Snail_Infesting agricultural crops.pdf
 
Synopsis presentation VDR gene polymorphism and anemia (2).pptx
Synopsis presentation VDR gene polymorphism and anemia (2).pptxSynopsis presentation VDR gene polymorphism and anemia (2).pptx
Synopsis presentation VDR gene polymorphism and anemia (2).pptx
 
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
Compositions of iron-meteorite parent bodies constrainthe structure of the pr...
 
Explainable Deepfake Image/Video Detection
Explainable Deepfake Image/Video DetectionExplainable Deepfake Image/Video Detection
Explainable Deepfake Image/Video Detection
 
gastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptxgastroretentive drug delivery system-PPT.pptx
gastroretentive drug delivery system-PPT.pptx
 
Quality assurance B.pharm 6th semester BP606T UNIT 5
Quality assurance B.pharm 6th semester BP606T UNIT 5Quality assurance B.pharm 6th semester BP606T UNIT 5
Quality assurance B.pharm 6th semester BP606T UNIT 5
 
Physiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptxPhysiology of Nervous System presentation.pptx
Physiology of Nervous System presentation.pptx
 
Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
 
Post translation modification by Suyash Garg
Post translation modification by Suyash GargPost translation modification by Suyash Garg
Post translation modification by Suyash Garg
 

2022-11, AACL, Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts

  • 1. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Computer Science & Informatics, Cardiff University, Cardiff NLP Snap Inc., Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados https://huggingface.co/datasets/tner/tweetner7
  • 2. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Named-Entity Recognition (NER) 2 Jacob Collier is an English artist. BERT + Linear Projection Jacob is English an artist. PJacob PCollier Pan PEnglish Partist Pis Collier Tokenization Location Person
  • 3. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados NER on Social Media ● Noisy Text ○ Less formal. ○ Different vocabulary. ○ Emoji, urls, etc. ● Temporal Shift ○ The meaning of words is constantly changing or evolving over time. ○ New entities in new period. 3
  • 4. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados TweetNER7 TweetNER7 is a NER dataset on Twitter with 7 entity types. ● Person, Location, Corporation, Creative work, Group, Product. ● Tweets are collected from Sep 2019 to Aug 2021. ○ 2020-set: Sep 2019 ~ Aug 2020 (5,768 tweets) ○ 2021-set: Sep 2020 ~ Aug 2021 (5,612 tweets) ● Temporal shift setup ○ Training/Validation: 2020-set ○ Test: 2021-set 4 https://huggingface.co/datasets/tner/tweetner7
  • 5. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Other Twitter NER Datasets TweetNER7 is unique for ● Uniform distribution over months ○ 2,000 tweets in each month ● (Relatively) short term temporal shift ○ Over 2 years ● Large annotation (data size & entities) ○ More than 10k annotated tweets with 7 entities 5
  • 6. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Dataset Collection TweetTopic tweet collection: ● Query 50 tweets every two hours from September 2019 to October 2021 ● Pre-filtering & Near de-duplication ● Trend-filter 6 Tweet collection pipeline. Trend-filter
  • 7. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Preprocessing Tweets ● Special token for URLs. ● Special representation for usernames. 7 Raw tweet. Processed tweet.
  • 8. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Dataset Annotation ● Mechanical Turk: ○ 3 annotators per tweet ● Agreement ○ 1 / 3: Disregard ○ 2 / 3: Manual check 8
  • 9. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Experiment: Temporal Shift Setup Setup: Train & validate on 2020-set & test on 2021-set Metric: Micro F1 / Macro F1 Result: ● RoBERTa is the best in 2021. ● BERTweet is the best in 2020. ● Performance: 2021 > 2020. 9 Model Micro F1 (2021/2020) Macro F1 (2021/2020) BERTweet Base 64.1 / 66.4 59.4 / 62.4 BERTweet Large 64.0 / 65.9 59.5 / 62.6 RoBERTa Base 64.2 / 64.2 59.1 / 60.2 RoBERTa Large 64.8 / 65.7 60.0 / 61.9 TimeLM2019 64.3 / 65.4 59.3 / 61.1 TimeLM2020 62.9 / 64.4 58.3 / 60.3 TimeLM2021 64.2 / 65.4 59.5 / 61.1
  • 10. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Results Breakdown Easy entities: Person, product, location ● Large number of entities ● Low diversity Challenging entities: Creative work, event ● Small number of entities ● High diversity 10
  • 11. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Experiment: Continuous Fine-tuning Setup: Use both of 2020 & 2021 training set by concatenation or continuous fine-tuning. Result: ● Continuous fine-tuning achieves the best results. ● TimeLM19 is better than TimeLM20 and TimeLM21. 11 Model Dataset Micro F1 (2021/2020) Macro F1 (2021/2020) RoBERTa LARGE 2020-set 64.8 / 65.7 60.0 / 61.9 Continuous 66.0 / 66.3 60.9 / 62.4 TimeLM 2019 2020-set 64.3 / 65.4 59.3 / 61.1 Continuous 65.9 / 64.8 61.1 / 60.6 TimeLM 2020 2020-set 62.9 / 64.4 58.3 / 60.3 Continuous 65.5 / 65.3 60.6 / 61.3 TimeLM 2021 2020-set 64.2 / 65.4 59.5 / 61.1 Continuous 65.1 / 64.9 60.0 / 60.7
  • 12. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Experiment: Self-labeling Setup: Collect additional 93,594 and 878,80 tweets from the period of 2020-set and 2021-set. Generate pseudo annotation by fine-tuned RoBERTa Large model. Result: No significant improvements. 12 Dataset Micro F1 (2021) Macro F1 (2021) 2020-set 64.8 60 2020-self-labeling 64.6 59.3 2021-self-labeling 64.2 59.3
  • 13. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Are the pseudo labels not useful at all? Setup: For incorrect prediction, retrieve the pseudo-labels for the entity within the range of 7 days, and take the most frequent prediction as the contextualized prediction. 13
  • 14. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Contextualized prediction usually matches the original one, meaning is wrong 😩 But, the second most frequent prediction is correct in average 🤔 There is a signal at least, but not easy way to utilize it. 14 Result of Contextualized Prediction
  • 15. Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts Asahi Ushio, Leonardo Neves, Vitor Silva, Francesco Barbieri, Jose Camacho-Collados Summary ● TweetNER7, a NER dataset on Twitter with temporal-shift and 7 entity types ● Report LM fine-tuning results ○ Temporal split vs Random split ○ Continuous fine-tuning ● Self-labeling ○ Contextual prediction 15