Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2018 @SEPLN 2018
1. Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @IberEval 2018
Mariona Taulé, M. Antònia Martí, Universitat de Barcelona
Francisco Rangel, Autoritas Consulting & Universitat Politècnica de València
Paolo Rosso, Universitat Politècnica de València
2. MultiStanceCat
• Introduction
• MultiStanceCat: Task Description
• TW-CaSe corpus
• Evaluation Framework
• Overview of the submitted approaches
• Conclusions
http://www.autoritas.net/MultiStanceCat-IberEval2018/
3. MultiStanceCat: Introduction
→ SemEval-2016 Task 6: Detecting Stance in Tweets → English
(Mohammad et al., 2016)
→ IberEval-2017 StanceCat task 7 → Catalan and Spanish
(Taulé et al., 2017)
IberEval-2018: Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum task (MultiStanceCat)
Goal: to detect the authors' stances with respect to the 1 October Referendum (2017) in tweets written in Catalan and Spanish from a multimodal perspective:
– Multimodality: images from the author's timeline
– Contextual information: the tweets before and after
– Text of the tweet + link
4. MultiStanceCat: Introduction
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum task (MultiStanceCat)
Task related to Sentiment Analysis, in which systems detect the positive, negative or neutral polarity of a text, BUT
in stance detection, systems detect whether a text message is in favor of or against a topic of discussion, usually a controversial one, which may or may not be explicitly mentioned in the message
1Oct Referendum: heated debate
→ Legitimate referendum (favor)
→ Illegal referendum (against)
5. MultiStanceCat: Task Description
MultiStanceCat Task
Decide whether each message is neutral, in favor of, or against the target 'Catalan first of October Referendum' from a multimodal perspective
Languages: Catalan and Spanish
6. MultiStanceCat: Corpus
• TW-1OReferendum corpus → 11,398 tweets
– Keywords: #1oct, 1O, #oct2017, 1oct16
– Period: 20/09/2017 to 30/09/2017
– Tool: Cosmos (by Autoritas)
TW-1OReferendum   Total     Training (80%)   Test (20%)
Catalan           5,853     4,684            1,169
Spanish           5,545     4,437            1,108
Total             11,398    9,121            2,277
TW-1OReferendum   Tweets
Catalan           87,449
Spanish           132,699
Total             220,148
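The 80%/20% training/test partition can be sketched as a per-label split. This is an assumption for illustration only: the slides do not say how the split was drawn, and `split_by_label` is a hypothetical helper, not the organisers' code.

```python
import random
from collections import defaultdict


def split_by_label(items, labels, train_frac=0.8, seed=0):
    """Hypothetical per-label split: shuffle each label's items and send
    round(train_frac * n) of them to training, the rest to test."""
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(train_frac * len(group))
        train += [(x, label) for x in group[:cut]]
        test += [(x, label) for x in group[cut:]]
    return train, test


# Toy data: 10 FAVOR and 5 NEUTRAL tweet IDs.
ids = list(range(15))
labels = ["FAVOR"] * 10 + ["NEUTRAL"] * 5
train, test = split_by_label(ids, labels)
print(len(train), len(test))  # 12 3
```

Splitting per label keeps each stance at roughly the same proportion in both partitions, which matters when one label is rare.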
8. MultiStanceCat: Corpus
Tweet: Res ni ningú, ens aturarà #Votarem #DretaDecidir #1Oct
#CatalunyaLliure #defensemlademocracia http://t.co/PgVLYH8AgN
Stance: FAVOR
'Nothing and nobody will stop us #Votarem #DretaDecidir #1Oct
#CatalunyaLliure #defensemlademocracia http://t.co/PgVLYH8AgN'
Tweet: Más q votos creo q estais usando personas jugando con sus
sentimientos SABIAIS q el #1Oct ES ILEGAL https://t.co/1SJcwn7LHd
Stance: AGAINST
'More than votes, I think you are using people, playing with their feelings. DID YOU KNOW that
#1Oct IS ILLEGAL https://t.co/1SJcwn7LHd'
Tweet: Voteu! #1Oct ¿Crees que la respuesta del Estado al desafio
independentista catalán está siendo adecuada? https://t.co/LlZrkd20gh via
@20m
Stance: NEUTRAL
'Vote! #1Oct Do you think that the State's response to the Catalan pro-independence challenge is
appropriate? https://t.co/LlZrkd20gh via @20m'
9. MultiStanceCat: Corpus
• Annotation procedure
– 1st stage: Automatic annotation
Based on a list of preselected authors (0.32% of the total annotated tweets)
– 2nd stage: Manual annotation
1) 2 annotators tagged the stance of 500 Catalan tweets and 500 Spanish tweets in parallel
2) 1st Interannotator Agreement Test (IAT)
3) The annotators tagged 1,300 tweets in each language
4) 2nd IAT
• Individual annotation of the whole corpus
Annotators: 2 trained annotators + 3 senior researchers
Meetings: once a week → problematic cases resolved by consensus
10. MultiStanceCat: Corpus
• Criteria:
– Written text: emoticons, @mentions and #hashtags ✓
– Links (webpages, photographs, videos…) ✓
– Images on the author's timeline ✓
+ Pragmatic information (knowledge about the topic)
11. MultiStanceCat: Corpus
• Interannotator Agreement Test: Results
1st IAT (N = 500)                       Text     Text+Link
TW-1OReferendum-CA   %Agreement         81.8%    86.2%
                     Kappa              0.63     0.76
TW-1OReferendum-ES   %Agreement         67.3%    81.2%
                     Kappa              0.54     0.68
2nd IAT (N = 1,300)                     Text     Text+Link
TW-1OReferendum-CA   %Agreement         86.9%    89.4%
                     Kappa              0.73     0.82
TW-1OReferendum-ES   %Agreement         68.1%    83.3%
                     Kappa              0.57     0.65
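The two agreement measures in the table can be illustrated as follows, assuming the kappa reported is Cohen's kappa for two annotators (the slides do not name the variant): κ = (p_o - p_e) / (1 - p_e), where p_o is the raw agreement and p_e the agreement expected by chance from each annotator's label distribution. The toy annotations are invented.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which the two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e is the chance
    agreement implied by each annotator's own label distribution."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


ann_a = ["FAVOR", "FAVOR", "AGAINST", "NEUTRAL", "FAVOR"]
ann_b = ["FAVOR", "AGAINST", "AGAINST", "NEUTRAL", "FAVOR"]
print(percent_agreement(ann_a, ann_b))      # 0.8
print(round(cohen_kappa(ann_a, ann_b), 4))  # 0.6875
```

Because p_e grows when one label dominates, kappa sits well below raw agreement on skewed data, which is why the two rows of the table can diverge.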
12. MultiStanceCat: Corpus
• Disagreements: unclear assignment of the NEUTRAL tag
Tweet: Coscubielibers! El nostre idol esta La Sexta! Parlara del Daniel?#1octL6
Stance: NEUTRAL
'Coscubielibers! Our idol is on La Sexta (TV channel). Will he talk about
Daniel? #1octL6'
Annotator A: NEUTRAL, Annotator B: AGAINST
13. MultiStanceCat: Corpus
• Disagreements: irony
Tweet: Els RADIKALS abduits i antidemocratics que provoquen el TUMULTO
certament fan bastanta por... #referendumCAt #1O…https://t.co/nlEa8rkXTT
Stance: FAVOR
'These brainwashed, anti-democratic RADIKALS who are causing the TUMULT
are certainly quite scary... #referendumCAt #1O…https://t.co/nlEa8rkXTT'
Annotator A: FAVOR, Annotator B: AGAINST
14. MultiStanceCat: Corpus
• Format and distribution: XML files
Training set: 80% of TW-1OReferendum
– the ID of the tweet
– the text of the tweet to be evaluated
– the contextual information: the tweet before and the tweet after the
tweet under evaluation
– the names of the images (up to 10) obtained from the
author's timeline
Test set: 20% of TW-1OReferendum
– XML files without the gold-standard labels
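A participant would first load these fields into memory. The sketch below uses Python's standard `xml.etree.ElementTree`; the slides list the fields but not the markup, so the element and attribute names (`tweet`, `id`, `text`, `before`, `after`, `image`) are hypothetical and must be adapted to the real files.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: element and attribute names are illustrative only
# and are NOT taken from the official MultiStanceCat distribution.
SAMPLE = """<tweets>
  <tweet id="123">
    <text>Res ni ningú, ens aturarà #1Oct</text>
    <before>previous tweet text</before>
    <after>next tweet text</after>
    <images><image>123_1.jpg</image><image>123_2.jpg</image></images>
  </tweet>
</tweets>"""


def load_tweets(xml_string, max_images=10):
    """Return one dict per tweet with text, context and image names."""
    root = ET.fromstring(xml_string)
    records = []
    for node in root.iter("tweet"):
        # Keep at most `max_images` names, matching the "up to 10" limit.
        images = [img.text for img in node.iter("image")][:max_images]
        records.append({
            "id": node.get("id"),
            "text": node.findtext("text", default=""),
            "before": node.findtext("before", default=""),
            "after": node.findtext("after", default=""),
            "images": images,
        })
    return records


print(load_tweets(SAMPLE)[0]["images"])  # ['123_1.jpg', '123_2.jpg']
```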
15. MultiStanceCat: Corpus
• Distribution of stance labels
Stance      TW-1OReferendum-CA             TW-1OReferendum-ES
            Training   Test     Total      Training   Test     Total
FAVOR       4,085      1,021    5,106      1,680      419      2,099
AGAINST     120        29       149        1,785      446      2,231
NEUTRAL     479        119      598        972        243      1,215
Total       4,684      1,169    5,853      4,437      1,108    5,545
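The imbalance is easier to see as percentages. This short script only recomputes each label's share of the corpus from the counts in the table above; the dictionary layout is an illustrative choice.

```python
# Counts copied from the table above: (training, test) per label.
COUNTS = {
    "CA": {"FAVOR": (4085, 1021), "AGAINST": (120, 29), "NEUTRAL": (479, 119)},
    "ES": {"FAVOR": (1680, 419), "AGAINST": (1785, 446), "NEUTRAL": (972, 243)},
}


def label_shares(per_label):
    """Percentage of the whole corpus (training + test) held by each label."""
    total = sum(tr + te for tr, te in per_label.values())
    return {label: round(100 * (tr + te) / total, 1)
            for label, (tr, te) in per_label.items()}


for lang, per_label in COUNTS.items():
    print(lang, label_shares(per_label))
# CA {'FAVOR': 87.2, 'AGAINST': 2.5, 'NEUTRAL': 10.2}
# ES {'FAVOR': 37.9, 'AGAINST': 40.2, 'NEUTRAL': 21.9}
```

In Catalan, FAVOR covers about 87% of the tweets and AGAINST only about 2.5%, whereas the Spanish subset is far more balanced.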
18. StanceCat: Approaches
TEAM         MODE    APPROACH
Casacufans   T & C   HashingVectorizer from scikit-learn + SVM
             I       CNN (the authors did not send a working note)
CriCa        T & C   Bag-of-words, stemming and TF-IDF + linear SVM
ELiRF        T       Lowercasing, removal of accents and dieresis, normalized
                     Twitter elements:
                     ● RUN 1: word embeddings + CNN
                     ● RUN 2: character word n-grams + linear SVM
uc3m         T & C   Bag-of-words, TF-IDF + linear SVM
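ELiRF's preprocessing and the character n-gram features of its RUN 2 can be approximated with the standard library alone. This is a sketch under assumptions: the placeholder tokens `<url>` and `<user>`, the regular expressions and the n-gram range 2 to 4 are illustrative choices, not ELiRF's settings, and the linear SVM step is omitted.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def strip_accents(text):
    """Remove accents and dieresis via NFD decomposition, dropping combining
    marks (note: this also turns 'ç' into 'c' and 'ñ' into 'n')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize_tweet(text):
    """Lowercase, strip accents and replace URLs/mentions with placeholders
    (the placeholder tokens are hypothetical)."""
    text = strip_accents(text.lower())
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    return " ".join(text.split())  # collapse repeated whitespace


def char_ngrams(text, n_min=2, n_max=4):
    """All character n-grams of length n_min..n_max, in order of length."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


clean = normalize_tweet("Més q VOTS @20m http://t.co/PgVLYH8AgN #1Oct")
print(clean)  # mes q vots <user> <url> #1oct
print(char_ngrams("abc", 2, 3))  # ['ab', 'bc', 'abc']
```

Hashtags are kept as ordinary tokens here because they often carry the stance (#Votarem, #DretaDecidir).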
21. StanceCat: Error Analysis
● In Catalan, there are more errors from AGAINST to FAVOR (AGAINST tweets predicted as FAVOR); in Spanish, there are more errors from FAVOR to AGAINST
● In Catalan, errors from FAVOR to AGAINST are minimal (0.08%)
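Directional errors such as "AGAINST to FAVOR" can be read off a confusion matrix of (gold, predicted) pairs. The slides do not state the denominator behind figures like 0.08%, so `error_share` below assumes, hypothetically, the share of all test tweets; the toy labels are invented.

```python
from collections import Counter


def confusion(gold, pred):
    """Count (gold, predicted) pairs; off-diagonal pairs are errors."""
    return Counter(zip(gold, pred))


def error_share(gold, pred, src, dst):
    """Hypothetical measure: percentage of all test tweets whose gold label
    is `src` but which were predicted as `dst`."""
    return 100 * confusion(gold, pred)[(src, dst)] / len(gold)


gold = ["FAVOR", "FAVOR", "AGAINST", "AGAINST", "NEUTRAL"]
pred = ["FAVOR", "FAVOR", "FAVOR", "AGAINST", "FAVOR"]
print(error_share(gold, pred, "AGAINST", "FAVOR"))  # 20.0
print(error_share(gold, pred, "FAVOR", "AGAINST"))  # 0.0
```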
26. StanceCat: Social Network Analysis
STANCE      SEED     NETWORK      %
IN FAVOR    4,510    808,549      51.44%
AGAINST     1,478    393,405      25.03%
BOTH        27       214,411      13.64%
NEUTRAL     1,041    155,522      9.89%
TOTAL       7,056    1,571,887    100%
• Almost disconnected communities (only 13.64% of the network falls under BOTH)
– Pro-independence supporters form a more closed community (51.44% vs. 25.03%)
• Few neutral users (9.89%)
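The % column matches each stance's share of the total network size, as this recomputation from the table shows (the script only reuses the SEED and NETWORK counts above).

```python
# SEED and NETWORK counts copied from the table above.
SEEDS = {"IN FAVOR": 4_510, "AGAINST": 1_478, "BOTH": 27, "NEUTRAL": 1_041}
NETWORK = {"IN FAVOR": 808_549, "AGAINST": 393_405, "BOTH": 214_411, "NEUTRAL": 155_522}

total = sum(NETWORK.values())
# Each stance's share of the whole network, rounded as in the table.
shares = {k: round(100 * v / total, 2) for k, v in NETWORK.items()}

print(sum(SEEDS.values()), total)  # 7056 1571887
print(shares)
# {'IN FAVOR': 51.44, 'AGAINST': 25.03, 'BOTH': 13.64, 'NEUTRAL': 9.89}
```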
27. StanceCat: Conclusions
• Multimodal Stance Identification task:
– Text only, text + context, text + context + images
– Catalan and Spanish
• Low participation (only one participant used images)
• Challenging task (imbalanced data):
– In Catalan, most systems performed below the baseline
– In Spanish, the best-performing system improved on the baseline by 9%
• The use of context:
– More than 30% in Catalan
– More than 20% in Spanish
• Echo chamber effect:
– There is a lack of interest in communicating with the other
community