Stance and Gender Detection in
Tweets on Catalan Independence
@Ibereval 2017
Viviana Patti, Cristina Bosco, Università degli Studi di Torino
Mariona Taulé, M. Antònia Martí, Universitat de Barcelona
Francisco Rangel, Autoritas & Universitat Politècnica de València
Paolo Rosso, Universitat Politècnica de València
StanceCat
• Introduction and Motivations:
Stance vs. Polarity detection
• StanceCat: Task Description
• TW-CaSe corpus
• Evaluation Metrics
• Overview of the submitted approaches
http://stel.ub.edu/Stance-IberEval2017/index.html
StanceCat: Introduction & Motivation
The rise of social media encourages users to voice and
share their views, generating a large amount of
social data
These data offer a great opportunity to investigate
communicative behaviors and conversational contexts,
and to extract knowledge about real-life domains
The shared task is situated within the wider context of
research on communication in online political debates
on Twitter
Goal: developing resources and tools for under-resourced
languages
StanceCat: Introduction & Motivation
Online debates are a large source of informal, opinion-
sharing dialogue on current socio-political issues
Several works rely on fine-grained sentiment analysis
techniques to analyze such debates.
Among these works, some are dedicated to the classification of
users’ stance, i.e. the detection of the positions for or against a
particular target entity that users assume within debates
Dual-sided debates: two possible polarizing sides
can be taken by participants
StanceCat: Introduction & Motivation
Stance detection, formalized as the task of identifying the
speaker’s opinion towards a particular target, has recently
attracted the attention of researchers in sentiment analysis
Applied to data from microblogging platforms such as Twitter,
e.g. for monitoring sentiment in a specific political debate
Stance detection not only provides information for improving
the performance of a sentiment analysis system, but can also help
to better understand the way in which people communicate
ideas to highlight their point of view towards a target entity.
StanceCat: Introduction & Motivation
Being able to detect stance in user-generated content can
provide useful insights to discover novel information about
social network structures (Lai et al. CLEF 2017)
Detecting stance in social media could become a helpful tool for
journalism, companies, and governments
Politics is an especially good application domain: focusing on
stance is interesting when the target entity is a controversial
issue, e.g., political reforms, or a polarizing person, e.g.,
candidates in political elections, and we observe the
interaction between polarized communities.
StanceCat: Introduction & Motivation
SemEval 2016, Track II (Sentiment Analysis), Task 6:
Detecting Stance in Tweets
http://alt.qcri.org/semeval2016/task6/
“Given a tweet text and a target entity (person, organization,
movement, policy, etc.), automatic natural language systems must
determine whether the tweeter is in favor of the target, against the
given target, or whether neither inference is likely”
Stance detection: automatically determining from text whether
the author is in favor / against / neutral-none w.r.t. a target
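The task's input/output contract (tweet in, one of three labels out) can be illustrated with a deliberately naive sketch. The keyword lexicons below are invented for the example and are not part of any real system:

```python
# Toy illustration of the stance detection contract: given a tweet,
# return FAVOR, AGAINST or NONE w.r.t. an implicit target.
# The cue lexicons are hypothetical, invented purely for this example.

FAVOR_CUES = {"support", "vote", "yes"}
AGAINST_CUES = {"against", "no", "never"}

def toy_stance(tweet: str) -> str:
    tokens = set(tweet.lower().split())
    if tokens & FAVOR_CUES:
        return "FAVOR"
    if tokens & AGAINST_CUES:
        return "AGAINST"
    return "NONE"   # neither inference is likely

print(toy_stance("vote yes for the target"))   # → FAVOR
```

Real systems must of course go well beyond surface cues, since (as the next slides show) the target is often not mentioned at all.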
StanceCat: Stance vs Sentiment
Stance detection is of course related to sentiment analysis,
BUT there are significant differences.
In a classical sentiment analysis task, systems have to
determine whether a piece of text is positive, negative or
neutral.
In stance detection, systems have to determine the
favorability towards a given target entity of interest,
where the target may not be explicitly mentioned in the
text.
StanceCat: Stance vs Sentiment
Example [source: training set of SemEval-2016 Task 6]
Support #independent #BernieSanders because he’s not a liar.
#POTUS #libcrib #democrats #tlot #republicans #WakeUpAmerica
#SemST.
• Target: Hillary Clinton [context: presidential primaries
for the Democratic and Republican parties in the US]
• The tweeter expresses a positive opinion towards an adversary of
the target (Sanders)
• We can infer that the tweeter expresses a negative stance towards
the target, i.e. she/he is likely unfavorable towards Hillary Clinton
• Important: tweet does not contain any explicit clue to find the target
• In many cases the stance must be inferred
• For a deeper exploration of the relation between sentiment and
stance in the Semeval dataset see:
Saif M. Mohammad, Parinaz Sobhani, Svetlana Kiritchenko:
Stance and Sentiment in Tweets. ACM Trans. Internet Techn. 17(3):
26:1-26:23 (2017)
• An interactive visualization of the dataset is available at:
http://www.saifmohammad.com/WebPages/StanceDataset.htm
a useful tool to explore the stance-target combinations present in
the annotated dataset and the relations between stance and
sentiment.
StanceCat: Stance vs Sentiment
Our focus and stance target: Catalan Independence
Corpus from Twitter, filtered by the hashtags #independencia and #27S
Time span: end of September 2015 – December 2015
27S: September 27, 2015
Regional elections in Catalonia, a de facto
referendum on independence
#independencia #27S:
two of the hashtags that were adopted within the dialogical
and social context growing around the topic, and largely exploited
in the debate
StanceCat: Introduction & Motivation
Multilingual perspective
different socio-political debates
French: #mariagepourtous
Debate on same-sex marriage
in France (Bosco et al. @LREC2016)
Italian: #labuonascuola
Debate on the reform of the education sector
in Italy (Stranisci et al. @LREC2016)
StanceCat: Introduction & Motivation
Multilingual perspective
different socio-political debates
English: #brexit
Debate on the British exit from the EU
(Lai et al. @CLEF2017)
StanceCat: Introduction & Motivation
• http://stel.ub.edu/Stance-IberEval2017/index.html
StanceCat: Task Description
StanceCat Task
SubTask 1- Stance Detection
Deciding whether each message
is neutral, in favor or against the
target: ‘Catalan Independence’
SubTask 2- Gender
Detection
Identification of the gender of
the author of the message
Languages: Catalan and Spanish
StanceCat: Task Description
Stance detection: SemEval-2016, Task-6; author profiling: PAN@CLEF.
Novelty: The two tasks have never been performed together for Spanish
and Catalan as part of one single task.
Results will be of interest not only for sentiment analysis but also for
author profiling and for socio-political studies
StanceCat: Corpus
• TW-CaSe corpus: 10,800 tweets, collected with the Cosmos tool
(by Autoritas), filtering by #Independencia and #27S

TW-CaSe   Female   Male    Total   Training (80%)   Test (20%)
Catalan   2,700    2,700   5,400   4,319            1,081
Spanish   2,700    2,700   5,400   4,319            1,081
StanceCat: Corpus
• Annotation Scheme:
Stance Tags
–AGAINST: Negative stance
–FAVOR: Positive stance
–NONE: neutral stance, or stance cannot be inferred
Gender Tags
–FEMALE
–MALE
StanceCat: Corpus
• Example:
Language: Catalan
Stance: FAVOR
Gender: FEMALE
Tweet: 15 diplomàtics internacional observen les plesbiscitàries, será que
interessen a tothom menys a Espanya #27
‘ 15 international diplomats observe the plebiscite, perhaps it is of
interest to everybody except to Spain #27’
StanceCat: Corpus
• Criteria:
– Tweet text: emoticons, @mentions and #hashtags √
– Links (webpages, photographs, videos…): excluded in TW-CaSe 0.1,
included in TW-CaSe 1.0
StanceCat: Corpus
• Annotation procedure:
1. Three trained annotators tagged the stance in 500 Catalan tweets and in
500 Spanish tweets in parallel
2. Inter-annotator Agreement Test (IAT)
3. Annotation of the whole corpus individually
Annotators: 3 trained annotators + 2 senior researchers
Meetings: once a week → problematic cases solved by common
consensus
StanceCat: Corpus
• Inter-annotator Agreement Test: pairwise agreement results

Annotator pair      TW-CaSe-CA   TW-CaSe-ES
A-B                 75.78%       76.40%
A-C                 79.54%       77.80%
B-C                 82.46%       81.00%
Average agreement   79.26%       78.40%
Fleiss’ Kappa       0.60         0.60
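For reference, Fleiss' Kappa values like those reported above can be computed from an items-by-categories count matrix; a minimal sketch from the standard formula, with invented toy counts:

```python
# Minimal Fleiss' kappa from the textbook formula.
# counts[i][j] = number of annotators who assigned category j to item i;
# every item must be rated by the same number of annotators.

def fleiss_kappa(counts):
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    total = n_items * n_raters

    # p_j: overall proportion of assignments to each category
    p = [sum(row[j] for row in counts) / total for j in range(n_cats)]

    # P_i: observed agreement for each item
    P = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
         for row in counts]

    P_bar = sum(P) / n_items          # mean observed agreement
    P_e = sum(pj * pj for pj in p)    # expected chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 3 annotators, 4 tweets, 3 stance labels (FAVOR, AGAINST, NONE)
counts = [
    [3, 0, 0],  # all three chose FAVOR
    [0, 3, 0],  # all three chose AGAINST
    [2, 0, 1],  # partial agreement
    [1, 1, 1],  # full disagreement
]
print(round(fleiss_kappa(counts), 3))   # → 0.318
```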
StanceCat: Corpus
• Disagreements: Communicative intentions are unclear
Language: Spanish
Stance: NONE (A = AGAINST; B and C = NONE)
Gender: MALE
Tweet: #27 voy a denunciar a todo aquel q me siga insultando usando ls
red. Yo no soy imbécil, ni mi bandera es n trapo
‘#27 I’m going to denounce anyone who continues to insult me using the web. I’m
not stupid, neither my flag is a rag’
StanceCat: Corpus
• Disagreements: Communicative intentions are unclear
Language: Catalan
Stance: NONE (A = AGAINST; B = FAVOR; C = NONE)
Gender: MALE
Tweet: La @cupnacional t la clau de Matrix
‘The @cupnacional has the key of Matrix’
StanceCat: Corpus
• Distribution of labels for stance, gender and language

                  Female                    Male
          favor   against   none    favor   against   none    Total   Dataset
Catalan   1,456   57        646     1,192   74        894     4,319   training
          365     14        162     298     18        224     1,081   test
Spanish   145     693       1,322   190     753       1,216   4,319   training
          36      173       331     48      188       305     1,081   test
StanceCat: Evaluation Metrics
• Stance: macro-average of the F-scores of the FAVOR and
AGAINST classes (as in SemEval 2016)
• Gender: accuracy (as in PAN@CLEF)
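A sketch of the stance metric, assuming the SemEval-2016 convention: NONE is a valid prediction, but only the F-scores of FAVOR and AGAINST are averaged. This is illustrative, not the official scorer:

```python
# Macro-average of the F-scores of FAVOR and AGAINST only;
# NONE predictions still count as errors against the other two classes.

def f1_for(label, gold, pred):
    tp = sum(g == p == label for g, p in zip(gold, pred))
    fp = sum(p == label and g != label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def stance_macro_f(gold, pred):
    return (f1_for("FAVOR", gold, pred) + f1_for("AGAINST", gold, pred)) / 2

gold = ["FAVOR", "FAVOR", "AGAINST", "NONE", "NONE"]
pred = ["FAVOR", "NONE", "AGAINST", "NONE", "AGAINST"]
print(round(stance_macro_f(gold, pred), 3))   # → 0.667
```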
StanceCat: Baselines
• Majority class: a baseline that always returns the
majority class.
• LDR (Low Dimensionality Representation):
– The key concept is the probability of occurrence
(weight) of each word in the training set in
each of the possible classes.
– The distribution of weights for a document
should be more similar to the distribution of
weights of its corresponding class.
StanceCat: Participation
10 PARTICIPANTS, 31 RUNS

        STANCE       GENDER
        CA    ES     CA    ES
        9     10     4     5
StanceCat: Approaches
CLASSIFICATION APPROACHES:
SVM, Decision Trees, Random Forest, Logistic Regression, Multinomial NB
Neural networks: Multilayer Perceptron (MLP), LSTM, Bi-LSTM, CNN,
FastText, Kim

PARTICIPANTS:
ltl_uni_due, iTACOS, ARA1337, ELiRF-UPV, LTRC_IIITH, atoppe, LuSer,
deepCybErNet

FEATURES:
Word n-grams, character n-grams, POS, hashtags, stylistic features
(number of hashtags, number of words…), (stance & gender) specific
tokens, word embeddings, n-gram embeddings, one-hot vectors
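Several of the feature/classifier combinations listed above can be approximated with a generic scikit-learn pipeline. This sketch (word plus character n-grams feeding a linear SVM, on invented toy data) is illustrative and does not reproduce any team's actual system:

```python
# Generic stance-classification sketch: word 1-2-grams and character
# 2-4-grams (two feature types listed above) combined into a linear SVM.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

model = Pipeline([
    ("features", FeatureUnion([
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ])),
    ("clf", LinearSVC()),
])

# Invented toy training tweets, for illustration only
train_tweets = ["visca la independencia", "independencia ara mateix",
                "no a la independencia", "contra la ruptura"]
train_stance = ["FAVOR", "FAVOR", "AGAINST", "AGAINST"]
model.fit(train_tweets, train_stance)
pred = model.predict(["visca la independencia ara"])
```

In the real task the pipeline would be trained per language on the TW-CaSe training split and scored with the macro-F metric over FAVOR and AGAINST.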
StanceCat: Stance Results
StanceCat: Gender Results
StanceCat: Stance vs. Gender (Catalan)
StanceCat: Stance vs. Gender (Spanish)
StanceCat: Error Analysis
• More errors for tweets written by males.
• In Catalan, more errors from Against to Favor;
in Spanish, more errors from Favor to Against.
• In Spanish, errors from Against to Favor
are minimal (2%).
StanceCat: Error Analysis
[confusion-matrix figures by gender: females vs. males]
StanceCat: Conclusions
• Stance and gender identification shared
task
• High participation
– 10 teams, 5 countries, 31 runs
• Challenging task
– F-measures below 50%
• Dataset released to the community
– Spanish and Catalan
StanceCat: Credits
Programa I+D: TIN2015-71147
Thank you!
patti@di.unito.it
bosco@di.unito.it
francisco.rangel@autoritas.es prosso@dsic.upv.es
amarti@ub.edu
mtaule@ub.edu

Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN 2017
