REVISITING AND RE-EVALUATING RUMOUR STANCE CLASSIFICATION
Queen Mary University London, 11th November 2020
Carolina Scarton
c.scarton@sheffield.ac.uk
carolscarton
A LITTLE BIT ABOUT MYSELF...
➢ UG and MSc from the University of São Paulo, Brazil (2013)
➢ PhD from the University of Sheffield (2017)
➢ Research interests:
• Machine Translation
• Text Simplification
• NLP for social media
• Multi-word expressions processing
• NLP evaluation
• Personalised NLP
• NLP for healthcare
• …
INTRODUCTION
ONLINE RUMOURS
“circulating story of questionable veracity,
which is apparently credible but hard to verify,
and produces sufficient skepticism and/or
anxiety so as to motivate finding out the actual
truth” (Zubiaga et al., 2015)
RUMOUR STANCE CLASSIFICATION
➢ What is being said about a rumour?
RUMOUR STANCE CLASSIFICATION
➢ The stance of replies can help in predicting veracity (Mendoza et al., 2010;
Kumar and Carley, 2019) → especially denies (Zubiaga et al., 2016)
➢ However,
• four-class classification problem
• support, deny, query, comment
• Highly imbalanced problem
• Support and denies
• most important classes
• Different from traditional stance classification task
RUMOUR STANCE CLASSIFICATION
➢ RumourEval 2017 and 2019 → most used datasets (PHEME project)
• Task A: rumour stance classification
• Current models and official evaluation metrics:
• not robust for four-class imbalanced problems
• not robust for problems where classes have different importance
RUMOUREVAL 2017 → ACCURACY SCORE
WINNER - ACC: 0.784
➢ Adjusted weights
➢ Two-step classification
➢ Over-sampling
DEALING WITH IMBALANCED DATA
FOR STANCE CLASSIFICATION
Yue Li and Carolina Scarton (to appear): Revisiting Rumour Stance Classification: Dealing with Imbalanced Data. RDSM 2020.
GOING BACK TO BASICS...
➢ RumourEval 2017 data
➢ Feature-based classifier:
• GloVe word embeddings (averaged for the tweet embedding)
• Features from Twitter metadata (Aker et al., 2017):
• number of replies
• has URL
• verified account
• number of followers, etc.
• Textual features (Aker et al., 2017):
• sentiment analysis
• emoticon analysis
• has slang or curse word
• surprise/doubt scores, etc.
macro-F1: 0.486
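The feature construction listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the tweet embedding is the plain mean of the word vectors of the known tokens, and that the metadata and textual features are simply concatenated to it. The toy vectors, the metadata values, and the function names `average_embedding` and `build_features` are hypothetical.

```python
from typing import Dict, List, Sequence


def average_embedding(tokens: Sequence[str],
                      vectors: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Average the vectors of the known tokens; all zeros if none is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) iterates over embedding dimensions, one column at a time.
    return [sum(column) / len(known) for column in zip(*known)]


def build_features(tokens: Sequence[str],
                   vectors: Dict[str, List[float]],
                   dim: int,
                   extra: Sequence[float]) -> List[float]:
    """Concatenate the averaged embedding with hand-crafted features."""
    return average_embedding(tokens, vectors, dim) + list(extra)


# Toy 2-dimensional vectors (hypothetical; real GloVe vectors are much larger).
toy_vectors = {"fake": [1.0, 0.0], "news": [0.0, 1.0]}
# Hypothetical extra features: [has_url, verified_account, number_of_replies]
features = build_features(["fake", "news", "!!"], toy_vectors, 2, [1.0, 0.0, 3.0])
print(features)  # [0.5, 0.5, 1.0, 0.0, 3.0]
```

The unknown token "!!" is skipped, so the average is taken over two vectors only. The resulting vector is what a feature-based classifier would take as input.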
… LOOKING INTO SOTA
➢ RumourEval 2017 data
➢ BERT model → fine-tuning BERT for stance classification task
macro-F1: 0.516 (feature-based classifier: macro-F1: 0.486)
DEALING WITH IMBALANCED DATA (TRADITIONAL METHODS)
➢ Data-based approaches:
• Random over- and under-sampling: ROS and RUS
• Synthetic over-sampling:
• SMOTE: new samples built from the k-nearest neighbours of each observation in the minority class
• ADASYN: more synthetic samples for observations that are harder to learn
• Hybrid sampling: SMOTEENN → over-sampling followed by data cleaning
➢ Learning-based approach: threshold moving (TM) →
changing probabilities of predicted classes
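Two of the methods above, RUS and TM, can be sketched in a few lines. This is an illustration under stated assumptions, not the setup used in the talk: `random_undersample` shrinks every class to the size of the smallest one, and `threshold_moving` implements one common variant of TM, dividing each predicted probability by the class prior observed in the training data. The slides only say that TM changes the probabilities of predicted classes, so the exact rescaling is an assumption, as are the toy labels and numbers.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def random_undersample(X: Sequence, y: Sequence[str],
                       seed: int = 0) -> Tuple[List, List[str]]:
    """RUS: keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in zip(X, y):
        by_class[label].append(item)
    n_min = min(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        for item in rng.sample(items, n_min):
            X_out.append(item)
            y_out.append(label)
    return X_out, y_out


def threshold_moving(probs: Dict[str, float], priors: Dict[str, float]) -> str:
    """TM (assumed variant): rescale probabilities by class priors, then argmax."""
    return max(probs, key=lambda label: probs[label] / priors[label])


# Toy imbalanced data: ten items, six of them "comment".
y = ["comment"] * 6 + ["support"] * 2 + ["deny", "query"]
X = list(range(len(y)))
X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))  # every class now has one item

# Hypothetical classifier output and training priors for one reply.
probs = {"support": 0.20, "deny": 0.15, "query": 0.05, "comment": 0.60}
priors = {"support": 0.20, "deny": 0.10, "query": 0.10, "comment": 0.60}
print(max(probs, key=probs.get))        # plain argmax: comment
print(threshold_moving(probs, priors))  # after TM: deny
```

RUS discards data, which is cheap but wasteful on small training sets. TM leaves the training data untouched and only changes the decision rule, so it depends on the classifier producing well-estimated probabilities.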
METHODOLOGY - MODEL SELECTION
➢ Training data: RumourEval 2017 training set
➢ Evaluation: RumourEval 2017 test set
➢ Training process: 4-fold cross-validation for hyperparameter tuning, including the parameter in synthetic over-sampling
➢ Each experiment is run 10 times to assess the model stability
➢ Evaluation metrics: Macro-F1, geometric mean of Recall (GMR)
➢ Feature-based classifiers: logistic regression (LR), random forest (RF), multi-layer perceptron (MLP)
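The two evaluation metrics named above can be computed directly from gold and predicted labels. The sketch below uses the standard definitions: macro-F1 is the unweighted mean of the per-class F1 scores, and GMR is the geometric mean of the per-class recalls. The toy labels and the convention of scoring 0.0 when a denominator is zero are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

LABELS = ["support", "deny", "query", "comment"]


def counts(y_true: Sequence[str], y_pred: Sequence[str],
           label: str) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives for one class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def recall(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    tp, _, fn = counts(y_true, y_pred, label)
    return tp / (tp + fn) if tp + fn else 0.0


def f1(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    tp, fp, fn = counts(y_true, y_pred, label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: List[str] = LABELS) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1(y_true, y_pred, c) for c in labels) / len(labels)


def gmr(y_true: Sequence[str], y_pred: Sequence[str],
        labels: List[str] = LABELS) -> float:
    """Geometric mean of the per-class recalls."""
    recalls = [recall(y_true, y_pred, c) for c in labels]
    return math.prod(recalls) ** (1 / len(recalls))


# A majority-class "classifier" on toy data: 70% accuracy, but GMR is zero.
y_true = ["comment"] * 7 + ["support", "deny", "query"]
y_pred = ["comment"] * 10
print(round(macro_f1(y_true, y_pred), 3))  # 0.206
print(gmr(y_true, y_pred))                 # 0.0
```

Because GMR is a product, a single class with zero recall drives the whole score to zero, which is why it exposes models that ignore the minority classes.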
RESULTS
● RUS → improves the performance of feature-based classifiers
● TM is similar to RUS
● Best for two neural network models, BERT and MLP → good estimation of posterior probabilities
● It is very important to assess and select models considering multiple metrics!
RESULTS - RUMOUREVAL2017 AND RUMOUREVAL2019
RESULTS - RUMOUREVAL2017
RESULTS - RUMOUREVAL2019
EXPLORING MORE DEEP LEARNING
RESULTS - DEEP LEARNING
CONCLUSIONS
➢ Feature-based approaches can still be competitive
➢ Traditional methods for dealing with imbalanced data improve both
feature-based and BERT-based approaches
➢ BERT-based approaches → SOTA
• Still room for improvement → support and denies
➢ Clever ways of using thread information may help
➢ Evaluation needs to be more detailed
RE-EVALUATING STANCE
CLASSIFICATION TASK
Carolina Scarton, Diego Furtado Silva and Kalina Bontcheva (to appear): Measuring What Counts: The case of Rumour Stance
Classification. AACL 2020.
RUMOUREVAL 2017 → ACCURACY SCORE
WINNER - ACC: 0.784
5th - ACC: 0.709
7th - ACC: 0.641
RUMOUREVAL 2019 → MACRO-F1
WINNER - macro-F1: 0.619
3rd - macro-F1: 0.578
7th - macro-F1: 0.370
RUMOUR STANCE CLASSIFICATION EVALUATION
➢ New metrics are needed to reliably evaluate models
• Deal with data imbalance
• Give higher value to the most important classes: support and deny
➢ GMR → heavily penalises models that achieve a low score for a given class
➢ wAUC → weighted version of AUC
• ROC → relationship between recall (R) and false positive rate (FPR)
➢ wFβ → weighted version of macro-Fβ
• β = 1 → precision and recall have the same importance
• β > 1 → recall has more importance
➢ Weights → empirically defined
• w_support = 0.40
• w_deny = 0.40
• w_query = 0.15
• w_comment = 0.05
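A weighted Fβ with the class weights listed above can be sketched as follows. The slide describes the metric only as a weighted version of macro-Fβ, so this sketch assumes it is the weighted sum of the per-class Fβ scores. The per-class Fβ formula is the standard one, and the toy labels are hypothetical.

```python
from typing import Dict, Sequence

# Class weights as given on the slide.
WEIGHTS = {"support": 0.40, "deny": 0.40, "query": 0.15, "comment": 0.05}


def f_beta(y_true: Sequence[str], y_pred: Sequence[str],
           label: str, beta: float) -> float:
    """Standard per-class F-beta; 0.0 when the class is never seen or predicted."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def weighted_f_beta(y_true: Sequence[str], y_pred: Sequence[str],
                    beta: float = 2.0,
                    weights: Dict[str, float] = WEIGHTS) -> float:
    """Assumed definition: sum over classes of weight times per-class F-beta."""
    return sum(w * f_beta(y_true, y_pred, label, beta)
               for label, w in weights.items())


# Majority-class predictions: high accuracy, very low weighted F2.
y_true = ["comment"] * 7 + ["support", "deny", "query"]
y_pred = ["comment"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
wf2 = weighted_f_beta(y_true, y_pred, beta=2.0)
print(accuracy)       # 0.7
print(round(wf2, 3))  # 0.046
```

With β = 2 recall counts more than precision, and with these weights a model that only predicts "comment" can earn at most 0.05, however high its accuracy.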
RUMOUREVAL 2017
RUMOUREVAL 2017 → WF2
WINNER - wF2: 0.296
2nd - wF2: 0.294
7th - wF2: 0.230
RUMOUREVAL 2017 → ACCURACY SCORE
1st - wF2: 0.509 2nd - wF2: 0.506 3rd - wF2: 0.499
RUMOUREVAL 2019
RUMOUREVAL 2019 → WF2
WINNER - wF2: 0.602
4th - wF2: 0.325
2nd - wF2: 0.514
3rd - wF2: 0.505
WEIGHTS DISCUSSION
➢ Weights need to:
• Deal with data imbalance
• Give higher value to the most important classes: support and deny
Weights based only on data distribution:
Mama Edha: w_support = 0.157, w_deny = 0.396, w_query = 0.399, w_comment = 0.048
UPV: w_support = 0.200, w_deny = 0.350, w_query = 0.350, w_comment = 0.100
CONCLUSION
➢ Evaluation needs to take into account the task purposes:
• Rumour Stance Classification → improve veracity classification / rumour analysis
• Most informative classes: support and deny
• Highly imbalanced four-class classification problem
➢ Recall-based metrics → higher priority to minority classes
➢ Weighted metrics → higher priority to the most important classes
Ideal evaluation: takes into account multiple metrics!
THANK YOU FOR YOUR ATTENTION!
www.weverify.eu
@WeVerify
Thanks to Yue Li for a lot of the slides (and work done!)
Collaboration with Kalina Bontcheva and Diego Silva

Call^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call Me
 
Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...
Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...
Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...
 
Unlock Your Social Media Potential with IndianLikes - IndianLikes.com
Unlock Your Social Media Potential with IndianLikes - IndianLikes.comUnlock Your Social Media Potential with IndianLikes - IndianLikes.com
Unlock Your Social Media Potential with IndianLikes - IndianLikes.com
 
Call Girls In Noida Mall Of Noida O9654467111 Escorts Serviec
Call Girls In Noida Mall Of Noida O9654467111 Escorts ServiecCall Girls In Noida Mall Of Noida O9654467111 Escorts Serviec
Call Girls In Noida Mall Of Noida O9654467111 Escorts Serviec
 
Impact Of Educational Resources on Students' Academic Performance in Economic...
Impact Of Educational Resources on Students' Academic Performance in Economic...Impact Of Educational Resources on Students' Academic Performance in Economic...
Impact Of Educational Resources on Students' Academic Performance in Economic...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Masudpur
Delhi  99530 vip 56974  Genuine Escort Service Call Girls in MasudpurDelhi  99530 vip 56974  Genuine Escort Service Call Girls in Masudpur
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Masudpur
 
Dubai Call Girls O528786472 Diabolic Call Girls In Dubai
Dubai Call Girls O528786472 Diabolic Call Girls In DubaiDubai Call Girls O528786472 Diabolic Call Girls In Dubai
Dubai Call Girls O528786472 Diabolic Call Girls In Dubai
 
Call Girls In Patel Nagar Delhi 9654467111 Escorts Service
Call Girls In Patel Nagar Delhi 9654467111 Escorts ServiceCall Girls In Patel Nagar Delhi 9654467111 Escorts Service
Call Girls In Patel Nagar Delhi 9654467111 Escorts Service
 
Mastering Wealth with YouTube Content Marketing.pdf
Mastering Wealth with YouTube Content Marketing.pdfMastering Wealth with YouTube Content Marketing.pdf
Mastering Wealth with YouTube Content Marketing.pdf
 

Stance classification - Presentation QMUL by Carolina Scarton, USFD

  • 1. REVISITING AND RE-EVALUATING RUMOUR STANCE CLASSIFICATION Queen Mary University London, 11th November 2020 Carolina Scarton c.scarton@sheffield.ac.uk carolscarton
  • 4. A LITTLE BIT ABOUT MYSELF... ➢ UG and MSc from the University of São Paulo, Brazil (2013) ➢ PhD from the University of Sheffield (2017) ➢ Research interests: • Machine Translation • Text Simplification • NLP for social media • Multi-word expressions processing • NLP evaluation • Personalised NLP • NLP for healthcare • …
  • 6. ONLINE RUMOURS “circulating story of questionable veracity, which is apparently credible but hard to verify, and produces sufficient skepticism and/or anxiety so as to motivate finding out the actual truth” (Zubiaga et al., 2015)
  • 7. RUMOUR STANCE CLASSIFICATION ➢ What is being said about a rumour?
  • 12. RUMOUR STANCE CLASSIFICATION ➢ Stance of replies can help in predicting veracity (Mendoza et al., 2010; Kumar and Carley, 2019) → especially denies (Zubiaga et al., 2016) ➢ However, • four-class classification problem • support, deny, query, comment • Highly imbalanced problem • Support and denies • most important classes • Different from traditional stance classification task
  • 14. RUMOUR STANCE CLASSIFICATION ➢ RumourEval 2017 and 2019 → most used datasets (PHEME project) • Task A: rumour stance classification • Current models and official evaluation metrics: • not robust for four-class imbalanced problems • not robust for problems where classes have different importance
  • 15. RUMOUREVAL 2017 → ACCURACY SCORE WINNER - ACC: 0.784
  • 16. RUMOUREVAL 2017 → ACCURACY SCORE Adjusted weights
  • 17. RUMOUREVAL 2017 → ACCURACY SCORE Two-step classification
  • 18. RUMOUREVAL 2017 → ACCURACY SCORE Over-sampling
  • 19. DEALING WITH IMBALANCED DATA FOR STANCE CLASSIFICATION Yue Li and Carolina Scarton (to appear): Revisiting Rumour Stance Classification: Dealing with Imbalanced Data. RDSM 2020.
  • 23. GOING BACK TO BASICS... ➢ RumourEval 2017 data ➢ Feature-based classifier: • GloVe word embeddings (average for Twitter embedding) • Features from Twitter metadata (Aker et al., 2017): • number of replies • has URL • verified account • number of followers, etc. • Textual features (Aker et al., 2017): • sentiment analysis • emoticon analysis • has slang or curse word • surprise/doubt scores, etc. macro-F1: 0.486
  • 25. … LOOKING INTO SOTA ➢ RumourEval 2017 data ➢ BERT model → fine-tuning BERT for stance classification task macro-F1: 0.516 (feature-based classifier: macro-F1: 0.486)
  • 29. DEALING WITH IMBALANCED DATA (TRADITIONAL METHODS) ➢ Data-based approaches: • Random over- and under-sampling: ROS and RUS • Synthetic over-sampling: • SMOTE: k-nearest neighbours of each observation in the minority class • ADASYN: level of hardness of learning the data observation • Hybrid sampling: SMOTEENN → data cleaning ➢ Learning-based approach: threshold moving (TM) → changing probabilities of predicted classes
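The simplest of the methods listed above can be sketched in a few lines. This is a minimal illustration, not the configuration used in the talk: `random_resample` and `threshold_moving` are hypothetical helper names, the toy label counts are invented, and dividing each predicted probability by its class prior is only one common way of "changing probabilities of predicted classes".

```python
import random
from collections import Counter, defaultdict


def random_resample(samples, labels, mode="over", seed=0):
    """Balance classes by random over-sampling (ROS) or under-sampling (RUS)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    sizes = [len(items) for items in by_class.values()]
    # ROS grows every class to the largest size, RUS shrinks to the smallest.
    target = max(sizes) if mode == "over" else min(sizes)
    out_x, out_y = [], []
    for y, items in by_class.items():
        if mode == "over":
            # Keep all originals, then add random copies drawn with replacement.
            extra = [rng.choice(items) for _ in range(target - len(items))]
            chosen = items + extra
        else:
            # Keep a random subset drawn without replacement.
            chosen = rng.sample(items, target)
        out_x.extend(chosen)
        out_y.extend([y] * len(chosen))
    return out_x, out_y


def threshold_moving(probs, priors):
    """Assumed TM variant: rescale each class probability by 1/prior, take the arg max."""
    scaled = {c: p / priors[c] for c, p in probs.items()}
    return max(scaled, key=scaled.get)


# Toy imbalanced data (invented counts): comment dominates, deny and query are rare.
toy_x = list(range(10))
toy_y = ["comment"] * 6 + ["support"] * 2 + ["deny", "query"]
_, over_y = random_resample(toy_x, toy_y, mode="over")
_, under_y = random_resample(toy_x, toy_y, mode="under")
print(Counter(over_y), Counter(under_y))
```

SMOTE, ADASYN and SMOTEENN generate or clean synthetic points in feature space and are better taken from an existing library than re-implemented.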
  • 34. METHODOLOGY - MODEL SELECTION ➢ Training data: RumourEval 2017 training set ➢ Evaluation: RumourEval 2017 test set ➢ Training Process: 4-fold cross validation for hyperparameter tuning, including the parameter in synthetic over-sampling ➢ Each experiment is run 10 times to assess the model stability ➢ Evaluation metrics: Macro-F1, geometric mean of Recall (GMR) ➢ Feature-based classifiers: LR, RF, MLP
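The two evaluation metrics named above can be computed directly from gold and predicted labels. The sketch below is illustrative only: the function names and the toy label distribution are assumptions, and GMR is taken to be the geometric mean of the per-class recalls, as the slide wording suggests.

```python
import math


def per_class_precision_recall(gold, pred, classes):
    """Return {class: (precision, recall)}, using 0.0 when a ratio is undefined."""
    scores = {}
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (precision, recall)
    return scores


def macro_f1(gold, pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    f1s = []
    for precision, recall in per_class_precision_recall(gold, pred, classes).values():
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def gmr(gold, pred, classes):
    """Geometric mean of per-class recall; zero if any class is never recovered."""
    recalls = [r for _, r in per_class_precision_recall(gold, pred, classes).values()]
    return math.prod(recalls) ** (1.0 / len(recalls))


CLASSES = ["support", "deny", "query", "comment"]
# Toy data (invented): a majority-class baseline that always predicts "comment".
gold = ["comment"] * 7 + ["support", "deny", "query"]
pred = ["comment"] * 10
print(macro_f1(gold, pred, CLASSES), gmr(gold, pred, CLASSES))
```

On this toy input the baseline is 70% accurate, yet its GMR is zero because three classes have zero recall, which is why the geometric mean is a useful complement to macro-F1.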
  • 36. RESULTS ● RUS → improves the performance of feature-based classifiers
  • 37. RESULTS ● TM is similar to RUS ● Best for two neural network models, BERT and MLP → good estimation of posterior probabilities
  • 38. RESULTS ● It is very important to assess and select model considering multiple metrics!
  • 39. RESULTS - RUMOUREVAL2017 AND RUMOUREVAL2019
  • 45. RESULTS - DEEP LEARNING
  • 50. CONCLUSIONS ➢ Feature-based approaches can still be competitive ➢ Traditional methods for dealing with imbalanced data improve both feature-based and BERT-based approaches ➢ BERT-based approaches → SOTA • Still room for improvements → support and denies ➢ Clever ways of using thread information may help ➢ Evaluation needs to be more detailed
  • 51. RE-EVALUATING STANCE CLASSIFICATION TASK Carolina Scarton, Diego Furtado Silva and Kalina Bontcheva (to appear): Measuring What Counts: The case of Rumour Stance Classification. AACL 2020.
  • 52. RUMOUREVAL 2017 → ACCURACY SCORE WINNER - ACC: 0.784
  • 53. RUMOUREVAL 2017 → ACCURACY SCORE 5th - ACC: 0.709 7th - ACC: 0.641
  • 54. RUMOUREVAL 2019 → MACRO-F1 WINNER - macro-F1: 0.619
  • 55. RUMOUREVAL 2019 → MACRO-F1 3rd - macro-F1: 0.578
  • 56. RUMOUREVAL 2019 → MACRO-F1 7th - macro-F1: 0.370
  • 61. RUMOUR STANCE CLASSIFICATION EVALUATION ➢ New metrics are needed to reliably evaluate models • Deal with data imbalance • Give higher value to the most important classes: support and deny • heavily penalises models that achieve a low score for a given class • weighted version of AUC ROC → relationship between R and FPR • weighted version of macro-Fβ • β = 1 → precision and recall have same importance • β > 1 → recall has more importance • Weights → empirically defined: wsupport = 0.40, wdeny = 0.40, wquery = 0.15, wcomment = 0.05
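A weighted macro-Fβ of the kind described above can be sketched as follows. The class weights are the ones given on the slide; everything else is an assumption made for illustration: the function names are hypothetical, the toy data is invented, and the score is taken to be the weighted sum of per-class Fβ values, which may differ in detail from the definition in the paper.

```python
WEIGHTS = {"support": 0.40, "deny": 0.40, "query": 0.15, "comment": 0.05}


def f_beta(precision, recall, beta):
    """Standard F-beta: beta > 1 gives recall more importance than precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def weighted_f_beta(gold, pred, weights, beta=2.0):
    """Assumed form of wF-beta: sum over classes of weight * per-class F-beta."""
    total = 0.0
    for c, w in weights.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total += w * f_beta(precision, recall, beta)
    return total


# Toy data (invented): always predicting "comment" is 70% accurate here,
# but scores only about 0.046 on wF2 because comment carries weight 0.05.
gold = ["comment"] * 7 + ["support", "deny", "query"]
pred = ["comment"] * 10
print(weighted_f_beta(gold, pred, WEIGHTS, beta=2.0))
```

Because the weights sum to 1, a perfect prediction scores 1.0, so the metric stays on the same scale as accuracy and macro-F1 while rewarding systems that get support and deny right.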
  • 63. RUMOUREVAL 2017 → WF2 WINNER - wF2: 0.296 2nd - wF2: 0.294
  • 64. RUMOUREVAL 2017 → WF2 7th - wF2: 0.230
  • 65. RUMOUREVAL 2017 → ACCURACY SCORE 1st - wF2: 0.509 2nd - wF2: 0.506 3rd - wF2: 0.499
  • 67. RUMOUREVAL 2019 → WF2 WINNER - wF2: 0.602
  • 68. RUMOUREVAL 2019 → WF2 4th - wF2: 0.325
  • 69. RUMOUREVAL 2019 → WF2 2nd - wF2: 0.514 3rd - wF2: 0.505
  • 71. WEIGHTS DISCUSSION ➢ Weights need to: • Deal with data imbalance • Give higher value to the most important classes: support and deny Weights based only on data distribution: Mama Edha: - wsupport = 0.157 - wdeny = 0.396 - wquery = 0.399 - wcomment = 0.048 UPV: - wsupport = 0.200 - wdeny = 0.350 - wquery = 0.350 - wcomment = 0.100
  • 75. CONCLUSION ➢ Evaluation needs to take into account the task purposes: • Rumour Stance Classification → improve veracity classification / rumour analysis • Most informative classes: support and deny • Highly imbalanced four-class classification problem ➢ Recall based metrics → higher priority to minority classes ➢ Weighted metrics → higher priority to most important classes Ideal evaluation: takes into account multiple metrics!
  • 76. THANK YOU FOR YOUR ATTENTION! www.weverify.eu @WeVerify Thanks to Yue Li for a lot of the slides (and work done!) Collaboration with Kalina Bontcheva and Diego Silva