Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2018 @SEPLN 2018

Multimodal Stance Detection in Tweets on
Catalan #1Oct Referendum
@Ibereval 2018
Mariona Taulé, M. Antònia Martí, Universitat de Barcelona
Francisco Rangel, Autoritas Consulting &
Universitat Politècnica de València
Paolo Rosso, Universitat Politècnica de València

MultiStanceCat
• Introduction
• MultiStanceCat: Task Description
• TW-CaSe corpus
• Evaluation Framework
• Overview of the submitted approaches
• Conclusions
http://www.autoritas.net/MultiStanceCat-IberEval2018/

→ SemEval-2016 Task 6: Detecting stance in tweets → English
(Mohammad, S.M., et al., 2016)
→ IberEval-2017 StanceCat task 7 → Catalan and Spanish
(Taulé et al., 2017)
IberEval-2018: MultiModal Stance Detection in tweets on
Catalan #1Oct Referendum task (MultiStanceCat)
To detect the authors' stances with respect to the 1 October
Referendum (2017) in tweets written in Catalan and Spanish from a
multimodal perspective
Multimodality: images from author’s timeline
Contextual information: tweet before and after
Text of the tweet + link
MultiStanceCat: Introduction

MultiModal Stance Detection in tweets on Catalan
#1Oct Referendum task (MultiStanceCat)
Task related to Sentiment Analysis, where systems detect the positive, negative
or neutral polarity of the text, BUT
in stance detection systems detect whether a text message is
favorable or unfavorable to a topic of discussion, usually controversial,
which may or may not be explicitly mentioned in the text message
1Oct Referendum: heated debate
→ Legitimate referendum (favor)
→ Illegal referendum (against)
MultiStanceCat: Introduction

MultiStanceCat: Task Description
MultiStanceCat Task
Deciding whether each message is neutral, in favor or
against the target: ‘Catalan first of October
Referendum’ from a multimodal perspective
Languages: Catalan and Spanish

MultiStanceCat: Corpus
• TW-1O Referendum corpus → 11,398 tweets
#1oct 1O
#oct2017 1oct16
[20/09/2017-30/09/2017]
TW-1OReferendum   Total     Training (80%)   Test (20%)
Catalan           5,853     4,684            1,169
Spanish           5,545     4,437            1,108
Total             11,398    9,121            2,277
Cosmos tool (by Autoritas)
TW-1OReferendum
Catalan 87,449
Spanish 132,699
Total 220,148

•Annotation Scheme:
MultiStance Tags
–AGAINST: Negative stance
–FAVOR: Positive stance
–NEUTRAL: Neutral stance (informative/reporting tweets, or tweets whose
stance cannot be inferred)

Tweet: Res ni ningú, ens aturarà #Votarem #DretaDecidir #1Oct
#CatalunyaLliure #defensemlademocracia http://t.co/PgVLYH8AgN
Stance: FAVOR
'Nothing and nobody will stop us #Votarem #DretaDecidir #1Oct
#CatalunyaLliure #defensemlademocracia http://t.co/PgVLYH8AgN'
Tweet: Más q votos creo q estais usando personas jugando con sus
sentimientos SABIAIS q el #1Oct ES ILEGAL https://t.co/1SJcwn7LHd
Stance: AGAINST
'More than votes, I think you are using people, playing with their feelings. YOU KNEW that the
#1Oct IS ILLEGAL https://t.co/1SJcwn7LHd'
Tweet: Voteu! #1Oct ¿Crees que la respuesta del Estado al desafio
independentista catalán está siendo adecuada? https://t.co/LlZrkd20gh via
@20
Stance: NEUTRAL
'Vote! #1Oct Do you think that the State's response to the Catalan pro-independence challenge is
appropriate? https://t.co/LlZrkd20gh via @20m'

• Annotation procedure
– 1st stage: Automatic annotation
List of preselected authors (0.32% of the total annotated tweets)
– 2nd stage: Manual annotation
1) 2 annotators tagged the stance in 500 Catalan tweets and in
500 Spanish tweets in parallel
2) 1st Interannotator Agreement Test (IAT)
3) Annotators tagged 1,300 tweets in each language
4) 2nd IAT
• Annotation of the whole corpus individually
Annotators: 2 trained annotators + 3 senior researchers
Meetings: once a week → problematic cases solved by common consensus

• Criteria:
– Written text: emoticons, @mentions and #hashtags ✓
– Links (webpages, photographs, videos…) ✓
– Images on the author's timeline ✓
+Pragmatic information (knowledge about this topic)

• Interannotator Agreement Test: Results
1st IAT: Stance (N=500)        Text     Text+Link
TW-1OReferendum-CA
  % Agreement                  81.8%    86.2%
  Kappa                        0.63     0.76
TW-1OReferendum-ES
  Kappa                        0.54     0.68

2nd IAT: Stance (N=1,300)      Text     Text+Link
TW-1OReferendum-CA
  Kappa                        0.73     0.82
TW-1OReferendum-ES
  Kappa                        0.57     0.65
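
The slides report percent agreement and kappa without stating how they were computed. A minimal sketch, assuming the reported statistic is Cohen's kappa between two annotators (the slides only say "Kappa"). The toy label lists are invented for illustration and are not corpus data:

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which the two annotators chose the same tag."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each annotator's tag frequencies.
    p_e = sum(count_a[t] * count_b[t] for t in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical tag throughout
    return (p_o - p_e) / (1 - p_e)


# Toy annotations (invented, NOT taken from the TW-1OReferendum corpus).
annotator_a = ["FAVOR", "FAVOR", "AGAINST", "NEUTRAL", "FAVOR", "AGAINST"]
annotator_b = ["FAVOR", "FAVOR", "AGAINST", "AGAINST", "FAVOR", "NEUTRAL"]

print(f"agreement = {percent_agreement(annotator_a, annotator_b):.3f}")
print(f"kappa     = {cohen_kappa(annotator_a, annotator_b):.3f}")
```

Kappa discounts the agreement that would be expected by chance, so it can sit well below raw agreement when one tag dominates the data.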

• Disagreements: Assignment of NEUTRAL tag unclear
Tweet: Coscubielibers! El nostre idol esta La Sexta! Parlara del Daniel?#1octL6
Stance: NEUTRAL
'Coscubielibers! Our idol is on La Sexta (TV Channel). Will he talk about
Daniel? #1octL6’
A= NEUTRAL B=AGAINST

• Disagreements: Irony
Tweet: Els RADIKALS abduits i antidemocratics que provoquen el TUMULTO
certament fan bastanta por... #referendumCAt #1O…https://t.co/nlEa8rkXTT
Stance: FAVOR
'These brainwashed, anti-democratic RADIKALS who caused this TUMULT
certainly generate fear... #referendumCAt #1O…https://t.co/nlEa8rkXTT'
A= FAVOR B=AGAINST

• Format and distribution: XML files
Training set: 80% of TW-1OReferendum
– The ID of the tweet
– The text of the tweet to be evaluated
– The contextual information: the tweet before and after the
tweet under evaluation
– The names of the images (up to 10 images) obtained from the
author's timeline
Test set: 20% of TW-1OReferendum
– XML files without truth values
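
The slides list the fields of each XML record but not the schema itself. The sketch below shows how such a file might be read with Python's standard library. Every element and attribute name used here (`tweets`, `tweet`, `id`, `text`, `previous`, `next`, `image`) is hypothetical and would need to be replaced with the names used in the released files:

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; the real schema is not shown on the slides.
SAMPLE = """<tweets>
  <tweet id="1">
    <text>Voteu! #1Oct</text>
    <previous>tweet before</previous>
    <next>tweet after</next>
    <image>img_001.jpg</image>
    <image>img_002.jpg</image>
  </tweet>
</tweets>"""

MAX_IMAGES = 10  # the slides mention up to 10 images per tweet


def read_tweets(xml_string):
    """Parse the (assumed) XML layout into a list of plain dictionaries."""
    root = ET.fromstring(xml_string)
    records = []
    for node in root.iter("tweet"):
        records.append({
            "id": node.get("id"),
            "text": node.findtext("text", default=""),
            "previous": node.findtext("previous", default=""),
            "next": node.findtext("next", default=""),
            "images": [img.text for img in node.findall("image")][:MAX_IMAGES],
        })
    return records


records = read_tweets(SAMPLE)
print(records[0]["text"], records[0]["images"])
```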

• Distribution of stance labels
Stance     TW-1OReferendum-CA           TW-1OReferendum-ES
           Training   Test    Total     Training   Test    Total
FAVOR      4,085      1,021   5,106     1,680      419     2,099
AGAINST    120        29      149       1,785      446     2,231
NEUTRAL    479        119     598       972        243     1,215
Total      4,684      1,169   5,853     4,437      1,108   5,545
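
To make the imbalance in these counts concrete, the short sketch below computes each label's share per language and the training fraction per label. The counts are copied from the table; the helper names are illustrative only:

```python
# (training, test) counts per label, copied from the distribution table.
COUNTS = {
    "CA": {"FAVOR": (4085, 1021), "AGAINST": (120, 29), "NEUTRAL": (479, 119)},
    "ES": {"FAVOR": (1680, 419), "AGAINST": (1785, 446), "NEUTRAL": (972, 243)},
}


def class_shares(lang):
    """Share of each label in the whole (training + test) set of one language."""
    totals = {tag: train + test for tag, (train, test) in COUNTS[lang].items()}
    n = sum(totals.values())
    return {tag: total / n for tag, total in totals.items()}


def training_fraction(lang, tag):
    """Fraction of a label's tweets that ended up in the training set."""
    train, test = COUNTS[lang][tag]
    return train / (train + test)


for lang in COUNTS:
    for tag, share in class_shares(lang).items():
        print(f"{lang} {tag:8s} share={share:6.1%} "
              f"train_fraction={training_fraction(lang, tag):.3f}")
```

In Catalan, FAVOR accounts for roughly 87% of the tweets and AGAINST for under 3%, while Spanish is far more balanced. The training fraction stays close to 0.80 for every label, which is consistent with a split that preserves the label proportions.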

StanceCat: Evaluation Metrics & baseline
• Macro-average of F-score:
– Favor, Against, Neutral
– Semeval 2016 Task-6 &
StanceCat@IberEval 2017
• Majority-class baseline
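
A minimal sketch of the two ingredients named on this slide: macro-averaged F-score over the three labels and a majority-class baseline. It uses the Catalan label counts from the distribution table. Treating the F-score of a never-predicted class as 0 is a convention assumed here, and the official baseline figures are not shown on the slides, so the printed value follows only from the label counts:

```python
from collections import Counter

LABELS = ("FAVOR", "AGAINST", "NEUTRAL")


def f1_per_class(gold, pred, labels=LABELS):
    """F1 for each label, computed as 2TP / (2TP + FP + FN)."""
    scores = {}
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0  # assumed convention
    return scores


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(gold, pred, labels)
    return sum(scores[label] for label in labels) / len(labels)


def majority_class(train_labels):
    """Most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def expand(counts):
    """Turn {'FAVOR': 3, ...} into a flat list of labels."""
    return [label for label, n in counts.items() for _ in range(n)]


# Catalan counts copied from the label distribution table.
CA_TRAIN = {"FAVOR": 4085, "AGAINST": 120, "NEUTRAL": 479}
CA_TEST = {"FAVOR": 1021, "AGAINST": 29, "NEUTRAL": 119}

gold = expand(CA_TEST)
baseline_pred = [majority_class(expand(CA_TRAIN))] * len(gold)
ca_baseline = macro_f1(gold, baseline_pred)
print(f"Catalan majority-class macro-F1: {ca_baseline:.4f}")
```

Under these assumptions the baseline lands at about 0.31 for Catalan: FAVOR alone scores high, but the two classes that are never predicted contribute zero to the average.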

StanceCat: Participation
TEAM         CATALAN                SPANISH
Casacufans   T; T + C; T + C + I    T; T + C; T + C + I
CriCa        T; T + C               T; C
ELiRF        -                      T
uc3m         T; T + C               T; T + C
(T = text, C = context, I = images)
StanceCat: Approaches
TEAM         MODE    APPROACH
Casacufans   T & C   HashingVectorizer from scikit-learn + SVM
             I       CNN (the authors did not send a working note)
CriCa        T & C   Bag-of-Words, stemming and TF-IDF + Linear SVM
ELiRF        T       Lowercasing, removal of accents and diereses, normalization of
                     Twitter elements:
                     ● RUN 1: Word Embeddings + CNN
                     ● RUN 2: Character word n-grams + Linear SVM
uc3m         T & C   Bag-of-Words, TF-IDF + Linear SVM
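
The preprocessing listed for ELiRF (lowercasing, removing accents and diereses, normalizing Twitter elements) can be sketched with the standard library alone. This is not the team's code: the placeholder tokens `<url>` and `<user>`, the choice to leave hashtags untouched, and the decision to keep ñ and ç are all assumptions made for illustration:

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")

# Combining marks to drop: grave accent, acute accent, diaeresis.
# The tilde of "ñ" and the cedilla of "ç" are kept (an assumption).
REMOVED_MARKS = {"\u0300", "\u0301", "\u0308"}


def strip_accents(text):
    """Remove accents and diereses while keeping other diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in REMOVED_MARKS)
    return unicodedata.normalize("NFC", kept)


def normalize_tweet(text):
    """Lowercase, strip accents, and replace URLs and mentions by placeholders."""
    text = strip_accents(text.lower())
    text = URL_RE.sub("<url>", text)       # hypothetical placeholder token
    text = MENTION_RE.sub("<user>", text)  # hypothetical placeholder token
    return re.sub(r"\s+", " ", text).strip()


print(normalize_tweet("Res ni ningú, ens aturarà #Votarem http://t.co/PgVLYH8AgN"))
```

The normalized strings could then feed any of the feature extractors named in the table, such as a bag-of-words or n-gram vectorizer.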

StanceCat: Features Analysis
           TEXT     + CONTEXT   + IMAGES
CATALAN    22.47    29.33       29.13
SPANISH    21.94    26.98       27.09
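
The deck's conclusions quote context gains of "more than 30%" in Catalan and "more than 20%" in Spanish. Those figures are consistent with reading them as relative gains over the text-only column, which is an interpretation rather than something the slides state. A small worked check, with illustrative names:

```python
# Scores copied from the features analysis table.
SCORES = {
    "CATALAN": {"text": 22.47, "context": 29.33, "images": 29.13},
    "SPANISH": {"text": 21.94, "context": 26.98, "images": 27.09},
}


def relative_gain(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


for lang, s in SCORES.items():
    context_gain = relative_gain(s["text"], s["context"])
    image_gain = relative_gain(s["context"], s["images"])
    print(f"{lang}: context {context_gain:+.2f}%, images {image_gain:+.2f}%")
```

Read this way, adding context accounts for nearly all of the improvement, while adding images changes the score by less than one percent in either direction.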

StanceCat: Error Analysis
● In Catalan, more errors from Against to Favor. In Spanish, more errors
from Favor to Against
● In Catalan, errors from Favor to Against are minimal (0.08%)

StanceCat: Error Analysis (CA)

StanceCat: Error Analysis (ES)

StanceCat: Social Network Analysis
STANCE SEED NETWORK %
IN FAVOR 4,510 808,549 51.44%
AGAINST 1,478 393,405 25.03%
BOTH 27 214,411 13.64%
NEUTRAL 1,041 155,522 9.89%
TOTAL 7,056 1,571,887 100%
• Almost disconnected communities (13.64%)
– Pro-independence users form a more closed community (51.44% vs. 25.03%)
• Few neutral people (9.89%)
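
The percentages in the table can be reproduced directly from the NETWORK column. The sketch below does only that arithmetic; the variable and function names are illustrative:

```python
# NETWORK sizes copied from the social network analysis table.
NETWORK = {
    "IN FAVOR": 808_549,
    "AGAINST": 393_405,
    "BOTH": 214_411,
    "NEUTRAL": 155_522,
}


def shares(counts):
    """Percentage of the whole network that falls in each group."""
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.items()}


network_shares = shares(NETWORK)
for group, pct in network_shares.items():
    print(f"{group:9s} {pct:6.2f}%")
```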

StanceCat: Conclusions
• Multimodal Stance Identification task:
– Text only, text + context, text + context + images
– Catalan and Spanish
• Low participation (only one participant used images)
• Challenging task (imbalanced data):
– In Catalan, most systems performed below the baseline
– In Spanish, the best performing system improved on the
baseline by 9%
• The use of context improved results:
– By more than 30% in Catalan
– By more than 20% in Spanish
• Echo chamber effect:
– There is a lack of interest in communicating with the other
community

StanceCat: Credits
Programa I+D: TIN2015-71147

Thank you!
francisco.rangel@autoritas.es
prosso@dsic.upv.es
amarti@ub.edu
mtaule@ub.edu
