REVISITING AND RE-EVALUATING RUMOUR STANCE CLASSIFICATION
University of Cambridge, 22nd January 2021
Carolina Scarton
c.scarton@sheffield.ac.uk
carolscarton
INTRODUCTION
ONLINE RUMOURS
“circulating story of questionable veracity,
which is apparently credible but hard to verify,
and produces sufficient skepticism and/or
anxiety so as to motivate finding out the actual
truth” (Zubiaga et al., 2015)
RUMOUR STANCE CLASSIFICATION
➢ What is being said about a rumour?
RUMOUR STANCE CLASSIFICATION
➢ Stance of replies can help in predicting veracity (Mendoza et al., 2010;
Kumar and Carley, 2019) → especially denies (Zubiaga et al., 2016)
➢ However,
• Four-class classification problem: support, deny, query, comment
• Highly imbalanced problem
• Support and deny → most important classes
• Different from traditional stance classification task
RUMOUR STANCE CLASSIFICATION
➢ RumourEval 2017 and 2019 → most used datasets (PHEME project)
• Task A: rumour stance classification
• Current models and official evaluation metrics:
• not robust for four-class imbalanced problems
• not robust for problems where classes have different importance
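Why accuracy can mislead on an imbalanced four-class problem can be sketched in a few lines. This is a minimal illustration, assuming a made-up label distribution (not the RumourEval counts) and a hypothetical model that always predicts the majority class.

```python
LABELS = ["support", "deny", "query", "comment"]


def accuracy(gold, pred):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Made-up distribution for illustration only (assumption, not real data).
gold = ["comment"] * 70 + ["support"] * 15 + ["query"] * 10 + ["deny"] * 5
# Hypothetical model that ignores the input and always says "comment".
pred = ["comment"] * len(gold)

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
print(f"accuracy={acc:.3f} macro-F1={mf1:.3f}")
```

In this toy setting the accuracy looks respectable even though the model never predicts support or deny, while macro-F1 drops sharply because three of the four per-class scores are zero.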
RUMOUREVAL 2017 → ACCURACY SCORE
WINNER - ACC: 0.784
• Adjusted weights
• Two-step classification
• Over-sampling
DEALING WITH IMBALANCED DATA
FOR STANCE CLASSIFICATION
Yue Li and Carolina Scarton (2020): Revisiting Rumour Stance Classification: Dealing with Imbalanced Data. RDSM 2020.
GOING BACK TO BASICS...
➢ RumourEval 2017 data
➢ Feature-based classifier:
• GloVe word embeddings (average for Twitter embedding)
• Features from Twitter metadata (Aker et al., 2017):
• number of replies
• has URL
• verified account
• number of followers, etc.
• Textual features (Aker et al., 2017):
• sentiment analysis
• emoticon analysis
• has slang or curse word
• surprise/doubt scores, etc.
macro-F1: 0.486
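One possible reading of the feature set above is sketched here: word vectors are averaged into a single tweet vector and concatenated with a few metadata features. The toy 3-dimensional vectors, the dictionary keys (`num_replies`, `verified`, `num_followers`) and the URL check are all assumptions made for illustration, not the features of Aker et al. (2017) as implemented.

```python
def tweet_embedding(tokens, word_vectors, dim):
    """Average the vectors of the tokens that are in the vocabulary."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # no known token: fall back to a zero vector
    return [sum(values) / len(known) for values in zip(*known)]


def build_features(tweet, word_vectors, dim):
    """Concatenate the averaged embedding with simple metadata features."""
    tokens = tweet["text"].lower().split()
    embedding = tweet_embedding(tokens, word_vectors, dim)
    metadata = [
        float(tweet["num_replies"]),             # number of replies
        1.0 if "http" in tweet["text"] else 0.0,  # has URL (naive check)
        1.0 if tweet["verified"] else 0.0,        # verified account
        float(tweet["num_followers"]),            # number of followers
    ]
    return embedding + metadata


# Toy vectors standing in for pre-trained GloVe embeddings (assumption).
toy_vectors = {
    "is": [3.0, 0.0, 0.0],
    "this": [0.0, 3.0, 0.0],
    "true?": [0.0, 0.0, 6.0],
}
# Hypothetical tweet record.
tweet = {
    "text": "Is this true? http://example.com",
    "num_replies": 2,
    "verified": False,
    "num_followers": 150,
}
features = build_features(tweet, toy_vectors, dim=3)
print(features)
```

The resulting vector could then be passed to any standard classifier; textual features such as sentiment or surprise/doubt scores would be appended in the same way.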
… LOOKING INTO SOTA
➢ RumourEval 2017 data
➢ BERT model → fine-tuning BERT for stance classification task
macro-F1: 0.516 (BERT) vs. macro-F1: 0.486 (feature-based classifier)
DEALING WITH IMBALANCED DATA (TRADITIONAL METHODS)
➢ Data-based approaches:
• Random over-sampling (ROS) and random under-sampling (RUS)
• Synthetic over-sampling:
• SMOTE: uses the k-nearest neighbours of each observation in the minority class
• ADASYN: uses the level of hardness of learning each observation
• Hybrid sampling: SMOTEENN → over-sampling followed by data cleaning
➢ Learning-based approach: threshold moving (TM) → changing probabilities of predicted classes
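The two random sampling strategies listed above can be sketched with the standard library alone. This is an illustrative version, assuming every class is resampled to the size of the largest (ROS) or smallest (RUS) class; the function name `random_resample` and the toy data are hypothetical. Library implementations of ROS, RUS, SMOTE, ADASYN and SMOTEENN exist in imbalanced-learn and are not reproduced here.

```python
import random
from collections import Counter, defaultdict


def random_resample(samples, labels, mode, seed=0):
    """Balance classes by random over-sampling ("over") or under-sampling ("under")."""
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    sizes = [len(items) for items in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        if mode == "over":
            # Keep every original item, then add random duplicates.
            extra = [rng.choice(items) for _ in range(target - len(items))]
            chosen = items + extra
        else:
            # Keep a random subset without replacement.
            chosen = rng.sample(items, target)
        out_samples.extend(chosen)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy data: sample ids 0..11 with a skewed label distribution (assumption).
labels = ["comment"] * 6 + ["support"] * 3 + ["query"] * 2 + ["deny"] * 1
samples = list(range(len(labels)))

ros_samples, ros_labels = random_resample(samples, labels, mode="over")
rus_samples, rus_labels = random_resample(samples, labels, mode="under")
print(Counter(ros_labels), Counter(rus_labels))
```

Resampling should only ever be applied to the training split; the test set keeps its original distribution so that the reported scores remain comparable.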
METHODOLOGY - MODEL SELECTION
➢ Training data: RumourEval 2017 training set
➢ Evaluation: RumourEval 2017 test set
➢ Training Process: 4-fold cross validation for hyperparameter
tuning, including the parameter in synthetic over-sampling
➢ Each experiment is run 10 times to assess the model stability
➢ Evaluation metrics: Macro-F1, geometric mean of Recall (GMR)
➢ Feature-based classifiers: LR, RF, MLP
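The geometric mean of recall (GMR) named among the evaluation metrics can be sketched as the C-th root of the product of the per-class recalls, where C is the number of classes. The toy gold and predicted labels below are made up for illustration.

```python
import math

LABELS = ["support", "deny", "query", "comment"]


def per_class_recall(gold, pred, labels=LABELS):
    """Recall of each class: correctly predicted items / gold items of that class."""
    recalls = {}
    for label in labels:
        support = sum(g == label for g in gold)
        hits = sum(g == label and p == label for g, p in zip(gold, pred))
        recalls[label] = hits / support if support else 0.0
    return recalls


def geometric_mean_recall(gold, pred, labels=LABELS):
    """Geometric mean of the per-class recalls."""
    recalls = per_class_recall(gold, pred, labels)
    return math.prod(recalls.values()) ** (1 / len(labels))


# Toy data (assumption): four items per class.
gold = ["support"] * 4 + ["deny"] * 4 + ["query"] * 4 + ["comment"] * 4

# Model A finds half of the supports and a quarter of the denies.
pred_a = (["support", "support", "comment", "comment"]
          + ["deny", "comment", "comment", "comment"]
          + ["query"] * 4 + ["comment"] * 4)

# Model B never predicts deny.
pred_b = (["support"] * 4 + ["comment"] * 4
          + ["query"] * 4 + ["comment"] * 4)

gmr_a = geometric_mean_recall(gold, pred_a)
gmr_b = geometric_mean_recall(gold, pred_b)
print(f"GMR A={gmr_a:.3f} GMR B={gmr_b:.3f}")
```

Because the recalls are multiplied, a single class with zero recall drives the whole score to zero, however well the other classes are handled.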
RESULTS
● RUS → improves the performance of feature-based classifiers
● TM performs similarly to RUS
● TM works best for the two neural network models, BERT and MLP → good estimation of posterior probabilities
● It is very important to assess and select models considering multiple metrics!
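Threshold moving can be illustrated with one common formulation: divide each predicted class probability by the class prior estimated on the training data, renormalise, and take the argmax. This formulation, the function name `threshold_moving`, and the priors and probabilities below are assumptions for illustration; the slides do not spell out the exact variant used.

```python
def threshold_moving(probabilities, priors):
    """Divide each class probability by its prior, renormalise, and pick the argmax."""
    adjusted = {label: probabilities[label] / priors[label] for label in probabilities}
    total = sum(adjusted.values())
    rescaled = {label: value / total for label, value in adjusted.items()}
    predicted = max(rescaled, key=rescaled.get)
    return predicted, rescaled


# Hypothetical class priors estimated from a training set.
priors = {"support": 0.15, "deny": 0.05, "query": 0.10, "comment": 0.70}
# Hypothetical posterior probabilities output by a classifier for one reply.
probabilities = {"support": 0.20, "deny": 0.15, "query": 0.10, "comment": 0.55}

plain_prediction = max(probabilities, key=probabilities.get)
moved_prediction, rescaled = threshold_moving(probabilities, priors)
print(plain_prediction, "->", moved_prediction)
```

The adjustment only helps when the classifier's probabilities are well estimated, which is consistent with the observation that TM works best for BERT and MLP.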
RESULTS - RUMOUREVAL2017 AND RUMOUREVAL2019
RESULTS - RUMOUREVAL2017
RESULTS - RUMOUREVAL2019
EXPLORING MORE DEEP LEARNING
RESULTS - DEEP LEARNING
CONCLUSIONS
➢ Feature-based approaches can still be competitive
➢ Traditional methods for dealing with imbalanced data improve both
feature-based and BERT-based approaches
➢ BERT-based approaches → SOTA
• Still room for improvements → support and denies
➢ Clever ways of using thread information may help
➢ Evaluation needs to be more detailed
RE-EVALUATING STANCE
CLASSIFICATION TASK
Carolina Scarton, Diego Furtado Silva and Kalina Bontcheva (2020): Measuring What Counts: The case of Rumour Stance
Classification. AACL 2020.
RUMOUREVAL 2017 → ACCURACY SCORE
WINNER - ACC: 0.784
5th - ACC: 0.709
7th - ACC: 0.641
RUMOUREVAL 2019 → MACRO-F1
WINNER - macro-F1: 0.619
3rd - macro-F1: 0.578
7th - macro-F1: 0.370
RUMOUR STANCE CLASSIFICATION EVALUATION
➢ New metrics are needed to reliably evaluate models
• Deal with data imbalance
• Give higher value to the most important classes: support and deny
➢ GMR → heavily penalises models that achieve a low score for a given class
➢ wAUC → weighted version of AUC
• ROC → relationship between recall (R) and false positive rate (FPR)
➢ wFβ → weighted version of macro-Fβ
• β = 1 → precision and recall have same importance
• β > 1 → recall has more importance
➢ Weights → empirically defined
• w_support = 0.40
• w_deny = 0.40
• w_query = 0.15
• w_comment = 0.05
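A weighted Fβ score of this kind can be sketched as the weighted sum of the per-class Fβ scores, with Fβ = (1 + β²)·P·R / (β²·P + R) and the class weights listed above. The weighted-sum form, the function names and the toy labels are assumptions for illustration.

```python
# Class weights as listed on the slide.
WEIGHTS = {"support": 0.40, "deny": 0.40, "query": 0.15, "comment": 0.05}


def f_beta(gold, pred, label, beta):
    """F-beta of one class; beta > 1 gives recall more importance than precision."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def weighted_f_beta(gold, pred, weights=WEIGHTS, beta=2.0):
    """Weighted sum of per-class F-beta scores (beta=2 corresponds to wF2)."""
    return sum(w * f_beta(gold, pred, label, beta) for label, w in weights.items())


# Made-up distribution for illustration only (assumption, not real data).
gold = ["comment"] * 70 + ["support"] * 15 + ["query"] * 10 + ["deny"] * 5
majority = ["comment"] * len(gold)  # hypothetical majority-class model

wf2_majority = weighted_f_beta(gold, majority)
wf2_perfect = weighted_f_beta(gold, gold)
print(f"wF2 majority={wf2_majority:.3f} perfect={wf2_perfect:.3f}")
```

With these weights a model that only ever predicts comment scores close to zero, since the comment class contributes at most 0.05 to the total.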
RUMOUREVAL 2017
RUMOUREVAL 2017 → WF2
WINNER - wF2: 0.296
2nd - wF2: 0.294
7th - wF2: 0.230
RUMOUREVAL 2017 → ACCURACY SCORE
1st - wF2: 0.509 2nd - wF2: 0.506 3rd - wF2: 0.499
RUMOUREVAL 2019
RUMOUREVAL 2019 → WF2
WINNER - wF2: 0.602
4th - wF2: 0.325
2nd - wF2: 0.514
3rd - wF2: 0.505
WEIGHTS DISCUSSION
➢ Weights need to:
• Deal with data imbalance
• Give higher value to the most important classes: support and deny
Weights based only on data distribution:
Mama Edha: w_support = 0.157, w_deny = 0.396, w_query = 0.399, w_comment = 0.048
UPV: w_support = 0.200, w_deny = 0.350, w_query = 0.350, w_comment = 0.100
CONCLUSION
➢ Evaluation needs to take into account the task purposes:
• Rumour Stance Classification → improve veracity classification / rumour analysis
• Most informative classes: support and deny
• Highly imbalanced four-class classification problem
➢ Recall-based metrics → higher priority to minority classes
➢ Weighted metrics → higher priority to most important classes
Ideal evaluation: takes into account multiple metrics!
THANK YOU FOR YOUR ATTENTION!
www.weverify.eu
@WeVerify
Try yourself: https://cloud.gate.ac.uk/shopfront#tagged=WeVerify
Thanks to Yue Li for a lot of the slides (and work done!)
Collaboration with Kalina Bontcheva and Diego Silva

PCI PIN Basics Webinar from the Controlcase Team
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 

Stance classification. Uni Cambridge 22 Jan 2021

  • 1. REVISITING AND RE-EVALUATING RUMOUR STANCE CLASSIFICATION University of Cambridge, 22nd January 2021 Carolina Scarton c.scarton@sheffield.ac.uk carolscarton
  • 3. ONLINE RUMOURS “circulating story of questionable veracity, which is apparently credible but hard to verify, and produces sufficient skepticism and/or anxiety so as to motivate finding out the actual truth” (Zubiaga et al., 2015)
  • 4. RUMOUR STANCE CLASSIFICATION ➢ What is being said about a rumour?
  • 9. RUMOUR STANCE CLASSIFICATION ➢ Stance of replies can help in predicting veracity (Mendoza et al., 2010; Kumar and Carley, 2019) → especially denies (Zubiaga et al., 2016) ➢ However, • four-class classification problem • support, deny, query, comment • Highly imbalanced problem • Support and deny → most important classes • Different from traditional stance classification task
  • 11. RUMOUR STANCE CLASSIFICATION ➢ RumourEval 2017 and 2019 → most used datasets (PHEME project) • Task A: rumour stance classification • Current models and official evaluation metrics: • not robust for four-class imbalanced problems • not robust for problems where classes have different importance
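To make the robustness concern concrete, the sketch below scores a trivial system that labels every reply as comment. The class counts are hypothetical, chosen only to mimic a heavily skewed four-class distribution; they are not the RumourEval figures.

```python
CLASSES = ["support", "deny", "query", "comment"]

# Hypothetical gold labels (NOT the RumourEval distribution): 1000 replies, mostly comments.
gold = ["support"] * 100 + ["deny"] * 70 + ["query"] * 100 + ["comment"] * 730

# A degenerate "classifier" that always predicts the majority class.
pred = ["comment"] * len(gold)

# Accuracy: fraction of replies whose predicted label matches the gold label.
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Per-class recall: fraction of each gold class that the system recovers.
recall = {
    c: sum(g == c and p == c for g, p in zip(gold, pred)) / gold.count(c)
    for c in CLASSES
}

print(f"accuracy = {accuracy:.3f}")
for c in CLASSES:
    print(f"recall[{c}] = {recall[c]:.3f}")
```

Under these assumed counts the system looks strong on accuracy while recovering no support, deny or query replies at all, which is the behaviour an accuracy-only ranking cannot detect.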
  • 12. RUMOUREVAL 2017 → ACCURACY SCORE WINNER - ACC: 0.784
  • 13. RUMOUREVAL 2017 → ACCURACY SCORE Adjusted weights
  • 14. RUMOUREVAL 2017 → ACCURACY SCORE Two-step classification
  • 15. RUMOUREVAL 2017 → ACCURACY SCORE Over-sampling
  • 16. DEALING WITH IMBALANCED DATA FOR STANCE CLASSIFICATION Yue Li and Carolina Scarton (2020): Revisiting Rumour Stance Classification: Dealing with Imbalanced Data. RDSM 2020.
  • 20. GOING BACK TO BASICS... ➢ RumourEval 2017 data ➢ Feature-based classifier: • GloVe word embeddings (average for Twitter embedding) • Features from Twitter metadata (Aker et al., 2017): • number of replies • has URL • verified account • number of followers, etc. • Textual features (Aker et al., 2017): • sentiment analysis • emoticon analysis • has slang or curse word • surprise/doubt scores, etc. macro-F1: 0.486
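A minimal sketch of how such a feature vector could be assembled, reading "average for Twitter embedding" as averaging the word vectors of a tweet. The toy 3-dimensional vectors, the whitespace tokeniser and the `build_features` helper are all assumptions for illustration; the slides do not describe the actual pipeline at this level of detail.

```python
# Toy 3-dimensional vectors standing in for pre-trained GloVe embeddings
# (hypothetical values; real GloVe vectors have far more dimensions).
TOY_EMBEDDINGS = {
    "this": [0.1, 0.3, -0.2],
    "is": [0.0, 0.1, 0.4],
    "fake": [-0.5, 0.2, 0.1],
}
DIM = 3


def tweet_embedding(tokens, embeddings=TOY_EMBEDDINGS, dim=DIM):
    """Average the vectors of in-vocabulary tokens; all zeros if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_features(text, n_replies, has_url, verified, n_followers):
    """Concatenate the averaged embedding with a few metadata features."""
    tokens = text.lower().split()  # naive tokeniser, assumed for the sketch
    metadata = [float(n_replies), float(has_url), float(verified), float(n_followers)]
    return tweet_embedding(tokens) + metadata


features = build_features(
    "This is FAKE news", n_replies=4, has_url=False, verified=True, n_followers=1200
)
print(features)
```

The resulting fixed-length vector is the kind of input a feature-based classifier can consume directly; textual features such as sentiment or slang indicators would be appended in the same way.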
  • 22. … LOOKING INTO SOTA ➢ RumourEval 2017 data ➢ BERT model → fine-tuning BERT for stance classification task: macro-F1: 0.516 (feature-based classifier: macro-F1: 0.486)
  • 26. DEALING WITH IMBALANCED DATA (TRADITIONAL METHODS) ➢ Data-based approaches: • Random over- and under-sampling: ROS and RUS • Synthetic over-sampling: • SMOTE: k-nearest neighbours of each observation in the minority class • ADASYN: level of hardness of learning the data observation • Hybrid sampling: SMOTEENN → data cleaning ➢ Learning-based approach: threshold moving (TM) → changing probabilities of predicted classes
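The data-based approaches can be sketched in plain Python as follows. This is an illustrative simplification, not the authors' implementation: the helper names, the toy 2-dimensional points and the choice to balance every class to the smallest (RUS) or largest (ROS) class size are assumptions. The SMOTE-style helper only shows the core interpolation step between a minority example and one of its nearest minority neighbours.

```python
import random
from collections import defaultdict


def group_by_label(X, y):
    """Collect the feature vectors of each class."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return groups


def random_undersample(X, y, seed=0):
    """RUS: randomly keep as many examples per class as the smallest class has."""
    rng = random.Random(seed)
    groups = group_by_label(X, y)
    target = min(len(xs) for xs in groups.values())
    X_out, y_out = [], []
    for label, xs in groups.items():
        for x in rng.sample(xs, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """ROS: duplicate random examples until each class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(X, y)
    target = max(len(xs) for xs in groups.values())
    X_out, y_out = [], []
    for label, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_point(x, minority, k, rng):
    """SMOTE-style step: interpolate between x and one of its k nearest minority neighbours."""
    others = [m for m in minority if m is not x]
    others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # position along the segment from x to the neighbour
    return [a + gap * (b - a) for a, b in zip(x, neighbour)]


# Hypothetical toy data: two "deny" points and four "comment" points.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1]]
y = ["deny", "deny", "comment", "comment", "comment", "comment"]

X_rus, y_rus = random_undersample(X, y)
X_ros, y_ros = random_oversample(X, y)
deny = [x for x, label in zip(X, y) if label == "deny"]
synthetic = smote_point(deny[0], deny, k=1, rng=random.Random(0))
print(len(y_rus), len(y_ros), synthetic)
```

Resampling of this kind is normally applied to the training portion only, so that the test distribution stays untouched.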
  • 31. METHODOLOGY - MODEL SELECTION ➢ Training data: RumourEval 2017 training set ➢ Evaluation: RumourEval 2017 test set ➢ Training Process: 4-fold cross validation for hyperparameter tuning, including the parameter in synthetic over-sampling ➢ Each experiment is run 10 times to assess the model stability ➢ Evaluation metrics: Macro-F1, geometric mean of Recall (GMR) ➢ Feature-based classifiers: logistic regression (LR), random forest (RF), multi-layer perceptron (MLP)
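The two evaluation metrics named above can be computed from gold and predicted labels as in the sketch below. The label lists are hypothetical, and undefined precision or recall values are set to 0.0, which is a convention assumed here rather than one stated in the slides.

```python
import math

CLASSES = ["support", "deny", "query", "comment"]


def per_class_scores(gold, pred, classes=CLASSES):
    """Return {class: (precision, recall, f1)}, using 0.0 where a score is undefined."""
    scores = {}
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_f1(gold, pred, classes=CLASSES):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(gold, pred, classes)
    return sum(f1 for _, _, f1 in scores.values()) / len(classes)


def gmr(gold, pred, classes=CLASSES):
    """Geometric mean of per-class recall; zero as soon as one class is never recovered."""
    scores = per_class_scores(gold, pred, classes)
    return math.prod(r for _, r, _ in scores.values()) ** (1 / len(classes))


# Hypothetical labels for ten replies.
gold = ["support", "support", "deny", "deny", "query", "query",
        "comment", "comment", "comment", "comment"]
pred = ["support", "comment", "deny", "comment", "query", "query",
        "comment", "comment", "comment", "support"]

print(f"macro-F1 = {macro_f1(gold, pred):.3f}, GMR = {gmr(gold, pred):.3f}")
```

Because GMR multiplies the recalls, a system that ignores any single class scores zero, whereas macro-F1 only loses that class's share of the average.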
  • 33. RESULTS ● RUS → improves the performance of feature-based classifiers
  • 34. RESULTS ● TM is similar to RUS ● Best for two neural network models, BERT and MLP → good estimation of posterior probabilities
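One common formulation of threshold moving divides each predicted posterior by the class prior observed in training and then takes the arg-max; this is sketched below as an assumption, since the slides do not specify the exact variant used. The priors and the classifier output are hypothetical numbers.

```python
def threshold_moving(probs, priors):
    """Rescale posteriors by class priors, renormalise, and return (label, adjusted)."""
    adjusted = {c: probs[c] / priors[c] for c in probs}
    total = sum(adjusted.values())
    adjusted = {c: v / total for c, v in adjusted.items()}
    return max(adjusted, key=adjusted.get), adjusted


# Hypothetical training-set class priors (assumed, not taken from RumourEval).
priors = {"support": 0.18, "deny": 0.08, "query": 0.08, "comment": 0.66}

# Hypothetical posterior probabilities from a classifier for one reply.
probs = {"support": 0.15, "deny": 0.20, "query": 0.05, "comment": 0.60}

plain = max(probs, key=probs.get)            # decision without threshold moving
moved, adjusted = threshold_moving(probs, priors)
print(plain, moved)
```

The adjustment only helps if the posteriors are well estimated, which is consistent with the observation above that TM works best for the two neural models.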
  • 35. RESULTS ● It is very important to assess and select models considering multiple metrics!
  • 36. RESULTS - RUMOUREVAL2017 AND RUMOUREVAL2019
  • 42. RESULTS - DEEP LEARNING
  • 47. CONCLUSIONS ➢ Feature-based approaches can still be competitive ➢ Traditional methods for dealing with imbalanced data improve both feature-based and BERT-based approaches ➢ BERT-based approaches → SOTA • Still room for improvements → support and denies ➢ Clever ways of using thread information may help ➢ Evaluation needs to be more detailed
  • 48. RE-EVALUATING STANCE CLASSIFICATION TASK Carolina Scarton, Diego Furtado Silva and Kalina Bontcheva (2020): Measuring What Counts: The case of Rumour Stance Classification. AACL 2020.
  • 49. RUMOUREVAL 2017 → ACCURACY SCORE WINNER - ACC: 0.784
  • 50. RUMOUREVAL 2017 → ACCURACY SCORE 5th - ACC: 0.709 7th - ACC: 0.641
  • 51. RUMOUREVAL 2019 → MACRO-F1 WINNER - macro-F1: 0.619
  • 52. RUMOUREVAL 2019 → MACRO-F1 3rd - macro-F1: 0.578
  • 53. RUMOUREVAL 2019 → MACRO-F1 7th - macro-F1: 0.370
  • 58. RUMOUR STANCE CLASSIFICATION EVALUATION ➢ New metrics are needed to reliably evaluate models • Deal with data imbalance • Give higher value to the most important classes: support and deny • heavily penalises models that achieve a low score for a given class • weighted version of AUC ROC → relationship between recall (R) and false positive rate (FPR) • weighted version of macro-Fβ: β = 1 → precision and recall have the same importance; β > 1 → recall has more importance ➢ Weights → empirically defined: w_support = 0.40, w_deny = 0.40, w_query = 0.15, w_comment = 0.05
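A sketch of the weighted Fβ score using the weights listed above. It assumes the weighted score is the weighted sum of per-class Fβ values, which the slide implies but does not spell out; the per-class precision and recall values for the example system are hypothetical.

```python
# Class weights as given on the slide.
WEIGHTS = {"support": 0.40, "deny": 0.40, "query": 0.15, "comment": 0.05}


def f_beta(precision, recall, beta):
    """Standard F-beta; beta > 1 gives recall more importance than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def weighted_f_beta(per_class_pr, beta=2.0, weights=WEIGHTS):
    """Weighted sum of per-class F-beta (assumed form); input maps class -> (precision, recall)."""
    return sum(weights[c] * f_beta(p, r, beta) for c, (p, r) in per_class_pr.items())


# Hypothetical per-class (precision, recall) for one system.
system = {
    "support": (0.50, 0.40),
    "deny": (0.60, 0.20),
    "query": (0.70, 0.70),
    "comment": (0.85, 0.95),
}

wf1 = weighted_f_beta(system, beta=1.0)
wf2 = weighted_f_beta(system, beta=2.0)
print(f"wF1 = {wf1:.3f}, wF2 = {wf2:.3f}")
```

In this example the system's low recall on support and deny pulls wF2 below wF1, while its strong comment performance contributes little because that class carries a weight of only 0.05.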
  • 60. RUMOUREVAL 2017 → WF2 WINNER - wF2: 0.296 2nd - wF2: 0.294
  • 61. RUMOUREVAL 2017 → WF2 7th - wF2: 0.230
  • 62. RUMOUREVAL 2017 → ACCURACY SCORE 1st - wF2: 0.509 2nd - wF2: 0.506 3rd - wF2: 0.499
  • 64. RUMOUREVAL 2019 → WF2 WINNER - wF2: 0.602
  • 65. RUMOUREVAL 2019 → WF2 4th - wF2: 0.325
  • 66. RUMOUREVAL 2019 → WF2 2nd - wF2: 0.514 3rd - wF2: 0.505
  • 67. WEIGHTS DISCUSSION ➢ Weights need to: • Deal with data imbalance • Give higher value to the most important classes: support and deny ➢ Weights based only on data distribution: • Mama Edha: w_support = 0.157, w_deny = 0.396, w_query = 0.399, w_comment = 0.048 • UPV: w_support = 0.200, w_deny = 0.350, w_query = 0.350, w_comment = 0.100
  • 72. CONCLUSION ➢ Evaluation needs to take into account the task purposes: • Rumour Stance Classification → improve veracity classification / rumour analysis • Most informative classes: support and deny • Highly imbalanced four-class classification problem ➢ Recall-based metrics → higher priority to minority classes ➢ Weighted metrics → higher priority to most important classes ➢ Ideal evaluation: takes into account multiple metrics!
  • 73. THANK YOU FOR YOUR ATTENTION! www.weverify.eu @WeVerify Try yourself: https://cloud.gate.ac.uk/shopfront#tagged=WeVerify Thanks to Yue Li for a lot of the slides (and work done!) Collaboration with Kalina Bontcheva and Diego Silva