SlideShare a Scribd company logo
1 of 23
Download to read offline
9th Author Profiling task at PAN
Profiling Hate Speech
Spreaders on Twitter
PAN-AP-2021 CLEF 2021
Online, 21-24 September
Francisco Rangel - Symanto Research
Grétel Liz de la Peña Sarracén - Universitat Politècnica de València
BERTa Chulvi - Universitat Politècnica de València
Elisabetta Fersini - Università Degli Studi di Milano-Bicocca
Paolo Rosso - Universitat Politècnica de València
Introduction
Author profiling aims at identifying
personal traits such as age, gender,
personality traits, native language,
language variety… from writings?
This is crucial for:
- Marketing.
- Security.
- Forensics.
2
Author
Profiling
PAN’21
Last years goal
Profiling Harmful Information Spreaders:
2019 - Profiling Bots
2020 - Profiling Fake News Spreaders
2021 - Profiling Hate Speech Spreaders
3
Author
Profiling
PAN’21
Task goal
Given a Twitter feed, determine whether
its author is keen to spread hate speech
or not.
4
Author
Profiling
Two languages:
English Spanish
PAN’21
Corpus
5
Author
Profiling
PAN’21
(EN) English (ES) Spanish
Keen to spread
hate speech
Not keen to spread
hate speech
Total
Keen to spread
hate speech
Not keen to spread
hate speech
Total
Training 100 100 200 100 100 200
Test 50 50 100 50 50 100
Total 150 150 300 150 150 300
Methodology
1. Selection of users considered potential haters:
1.1. Keyword-based search (hateful words mainly towards women or immigrants)
1.2. User-based search (users appearing in reports and/or press) + following their networks
2. Timeline collection
3.1. Manual review of the tweets conveying hate speech
3.2. Users with more than ten hateful tweets are labelled as keen to spread them. Otherwise,
they are not.
For each user, we provided 200 tweets
Evaluation measures
6
Author
Profiling
PAN’21
The accuracy is calculated per language and averaged:
Baselines
7
Author
Profiling
PAN’21
RANDOM A baseline that randomly generates the predictions among the different classes
CHAR N-GRAMS With values for n from 2 to 6, with Logistic Regression
WORD N-GRAMS With values for n from 1 to 3, with Support Vector Machines
Symanto (LDSE) This method represents documents on the basis of the probability distribution of
occurrence of their words in the different classes. The key concept of LDSE is a
weight, representing the probability of a term to belong to one of the different
categories: hate speech spreader / non-spreader. The distribution of weights for
a given document should be closer to the weights of its corresponding category.
LDSE takes advantage of the whole vocabulary
USE Universal Sentence Encoder to feed a BiLSTM
XLMR XLM-Roberta transformer to feed a BiLSTM
MBERT Multilingual BERT transformer to feed a BiLSTM
TFIDF TFIDF vectors representing each user's text to feed a BiLSTM
66+1 participants
32 working notes
16+1 countries
8
Author
Profiling
PAN’21
Participation
https://mapchart.net/world.html
Approaches
9
Author
Profiling
PAN’21
Approaches - Preprocessing
10
Author
Profiling
Twitter elements (RT, VIA,
FAV)
Del Campo et al.; Huertas-García et al.; Schlicht & de Paula; Jain et al.; Giglou et al.; Vogel
& Meghana
Emojis and other
non-alphanumeric chars
Alcañiz & Andrés; Jain et al.; Anwar; Giglou et al.; Vogel & Meghana; Bagdon; Espinosa &
Sidorov; Cabrera et al.; Das & Patra
Lemmatisation Vogel & Meghana; Alcañiz & Andrés; Das & Patra; Jain et al.
Tokenisation Das & Patra; Jain et al.
Punctuation signs Cabrera et al.; Puertas & Martínez-Santos; Dukic & Krzic; Espinosa & Sidorov; Giglou et al.;
Alcañiz & Andrés; Del Campo et al.; Das & Patra; Jain et al.
Numbers Del Campo et al.; Huertas-García et al.; Giglou et al.; Vogel & Meghana
Lowercase Alcañiz & Andrés; Del Campo et al.; Das & Patra; Anwar; Vogel & Meghana; Bagdon; Dukic
& Krzic
Stopwords Alcañiz & Andrés; Das & Patra; Jain et al.; Vogel & Meghana
Character flooding Bagdon; Vogel & Meghana; Del Campo et al.; Alcañiz & Andrés
Short texts Vogel & Meghana
PAN’21
Approaches - Preprocessing
11
Author
Profiling
Contractions expansion Alcañiz & Andrés
Colloquial tokens (slang) Alcañiz & Andrés; Huertas-García et al.
t-SNE Finogeev et al.; Ceron & Casula
Kolmogorov-Smirnov test Ceron & Casula
TFIDF Ikae
Shift graphs Ikae
PAN’21
Approaches - Features
12
Author
Profiling
Stylistic features:
- Number of occurrences
- Verbs, adjs, pronouns
- Number of hashtags, mentions, URLs...
- Capital vs. lower letters
- Punctuation marks
- ...
Katona et al.; Zaragozá & Pinto
N-gram models Andrade & Gonçalves; Das & Patra; Siino et al.; Ikae; Balouchzahi;
Espinosa & Sidorov; Alcañiz & Andrés; Katona et al.
Emotional and personality features Katona et al.; Cervero; Puertas & Martínez-Santos; Huertas-García
et al.
Specialised lexicons (HS) Lai et al.; Tosev & Gievska
Embeddings Puertas & Martínez-Santos; Zaragozá & Pinto; Alcañiz & Andrés;
Del Campo et al.
...Lexical+statistical+syntactical+phonetical Puertas & Martínez-Santos
...Semantic-emotion-based Cabrera et al.
PAN’21
Approaches - Features
13
Author
Profiling
Transformers
...BERT Huertas-García et al.; Finogeev & Kaprielova; Akomeah et al.; Alzahrani & Jololian; Anwar;
Ceron & Casula; Uzan & Hacohen-Kerner; Dukic & Krzic
...SBERT Schlicht & de Paula; Vogel & Meghana
...ALBERT Akomeah et al.
...RoBERTa Uzan & Hacohen-Kerner; Anwar; Bagdon
...BERTTweet Anwar
...BETO Anwar
PAN’21
LDSE at char level Babaei et al.
Fine-tuned transformer to modify the Impostor Method Labadie et al.
Topics aggregation combined with ELMo to represent the user Irani et al.
CNN to extract features from external data Höllig et al.
Phonetic embeddings (among others) Puertas & Martínez-Santos
Approaches - Methods
14
Author
Profiling
SVM Alcañiz & Andrés; Del Campo et al.; Andrade & Gonçalves; Finogeev & Kaprielova;
Höllig et al.; Das & Patra; Giglou et al.; Ceron & Casula; Vogel & Meghana; Badgon;
Espinosa & Sidorov; Cabrera et al.
Logistic regression Finogeev & Kaprielova; Badgon; Dukic & Krzic
Random Forest Irani et al.; Puertas & Martínez-Santos; Andrade & Gonçalves
Ensembles Huertas-García et al.; Cervero; Ikae; Balouchzahi et al.; Tosev & Gievska
Adaboost, Ridge, Naive Bayes,
KNN, XGBoost, AutoML, ...
Katona et al.; Anwar; Jain et al.; Höllig et al.; Puertas & Martínez-Santos
Custom architecture Martin et al.; Baris & Magnossao
RNN Pallares & Herrero
CNN Siino et al.
LSTM Uzan & HacoHen-Kerner
bi-LSTM Vogel & Meghana
PAN’21
Global ranking
15
Author
Profiling
PAN’21
Confusion matrices
16
Author
Profiling
PAN’21
ENGLISH
SPANISH
Best results at PAN'21
17
Author
Profiling
PAN’21
Dukic and Sovic
- BERT
- Logistic Regression
Siino et al.
- 100-dim word-embedding
- CNN
Conclusions
● Several approaches to tackle the task:
○ Traditional machine learning methods (SVM, LR) combined with BERT obtained the highest
results.
● Results in English:
○ Over 64% on average.
○ Best (75%): Dukic and Sovic - BERT + Logistic Regression
● Results in Spanish:
○ Over 78% on average.
○ Best (85%): Siino et al. - 100-dimension word embedding + CNN
● Error analysis:
○ English:
■ False positives (non-hate speech spreaders as spreaders): 39.23%
■ False negatives (hate speech spreaders as non-spreaders): 34.40%
○ Spanish:
■ False positives (non-hate speech spreaders as spreaders): 28.13%
■ False negatives (hate speech spreaders as non-spreaders): 16.85%
Looking at the results, we can conclude:
● It is feasible to automatically identify Hate Speech Spreaders with high precision
○ ...even when only textual features are used.
● We have to bear in mind false positives since, in English they sum up to forty percent of the total
predictions and in Spanish they are almost double than false negatives, and misclassification might
lead to ethical or legal implications. 18
Author
Profiling
PAN’21
19
Author
Profiling
PAN’21
Industry at PAN (Author Profiling)
20
Author
Profiling
Organisation
Sponsors
PAN’21
This year, the winners of the task are:
● Marco Siino, Elisa Di Nuovo, Ilenia Tinnirello and
Marco La Cascia, Università degli Studi di
Palermo and Università degli Studi di Torino,
Italy
2022 -> Profiling
Irony and
Stereotypes
spreaders
21
Author
Profiling
PAN’21
22
Author
Profiling
On behalf of the author profiling task organisers:
Thank you very much for participating
and hope to see you next year!!
PAN’21
23
Author
Profiling
PAN’21

More Related Content

Similar to Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Spreaders on Twitter

Author Profiling in Social Media: Identifying Information about Gender, Age, ...
Author Profiling in Social Media: Identifying Information about Gender, Age, ...Author Profiling in Social Media: Identifying Information about Gender, Age, ...
Author Profiling in Social Media: Identifying Information about Gender, Age, ...
Francisco Manuel Rangel Pardo
 
Apac interpretive
Apac interpretiveApac interpretive
Apac interpretive
hhs
 
Presentation Research artificial intelligence
Presentation Research artificial intelligencePresentation Research artificial intelligence
Presentation Research artificial intelligence
KioGabrielRamirez
 
TUNING LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXTS CLASSIFICATION
TUNING LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXTS CLASSIFICATION TUNING LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXTS CLASSIFICATION
TUNING LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXTS CLASSIFICATION
IJCI JOURNAL
 

Similar to Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Spreaders on Twitter (20)

Use of language and author profiling.key
Use of language and author profiling.keyUse of language and author profiling.key
Use of language and author profiling.key
 
Babak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesBabak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entities
 
Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...
 
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
Part-of-Speech Tagging of Northern Sotho: Disambiguating Polysemous Function ...
 
Author Profiling. PAN@CLEF-2013 Task
Author Profiling. PAN@CLEF-2013 TaskAuthor Profiling. PAN@CLEF-2013 Task
Author Profiling. PAN@CLEF-2013 Task
 
Should we be afraid of Transformers?
Should we be afraid of Transformers?Should we be afraid of Transformers?
Should we be afraid of Transformers?
 
Alex Parry - Investigating the relationship between programming and natural l...
Alex Parry - Investigating the relationship between programming and natural l...Alex Parry - Investigating the relationship between programming and natural l...
Alex Parry - Investigating the relationship between programming and natural l...
 
Author Profiling in Social Media: Identifying Information about Gender, Age, ...
Author Profiling in Social Media: Identifying Information about Gender, Age, ...Author Profiling in Social Media: Identifying Information about Gender, Age, ...
Author Profiling in Social Media: Identifying Information about Gender, Age, ...
 
Logics and Ontologies for Portuguese Understanding
Logics and Ontologies for Portuguese UnderstandingLogics and Ontologies for Portuguese Understanding
Logics and Ontologies for Portuguese Understanding
 
Apac interpretive
Apac interpretiveApac interpretive
Apac interpretive
 
NLP and Knowledge Graphs
NLP and Knowledge GraphsNLP and Knowledge Graphs
NLP and Knowledge Graphs
 
A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...
 
Understanding natural language processing
Understanding natural language processingUnderstanding natural language processing
Understanding natural language processing
 
Tone and language lesson 1
Tone and language   lesson 1Tone and language   lesson 1
Tone and language lesson 1
 
Enhancing Intercultural Communicative Competence through cross-cultural inter...
Enhancing Intercultural Communicative Competence through cross-cultural inter...Enhancing Intercultural Communicative Competence through cross-cultural inter...
Enhancing Intercultural Communicative Competence through cross-cultural inter...
 
Presentation Research artificial intelligence
Presentation Research artificial intelligencePresentation Research artificial intelligence
Presentation Research artificial intelligence
 
AL4Trust - Artificial Intelligence for Building Trust
AL4Trust - Artificial Intelligence for Building TrustAL4Trust - Artificial Intelligence for Building Trust
AL4Trust - Artificial Intelligence for Building Trust
 
TUNING LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXTS CLASSIFICATION
TUNING LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXTS CLASSIFICATION TUNING LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXTS CLASSIFICATION
TUNING LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXTS CLASSIFICATION
 
Cognitive plausibility in learning algorithms
Cognitive plausibility in learning algorithmsCognitive plausibility in learning algorithms
Cognitive plausibility in learning algorithms
 
Native Language Identification - Brief review to the state of the art
Native Language Identification - Brief review to the state of the artNative Language Identification - Brief review to the state of the art
Native Language Identification - Brief review to the state of the art
 

More from Francisco Manuel Rangel Pardo

Social Business Intelligence - Inteligencia Social de Negocio
Social Business Intelligence - Inteligencia Social de NegocioSocial Business Intelligence - Inteligencia Social de Negocio
Social Business Intelligence - Inteligencia Social de Negocio
Francisco Manuel Rangel Pardo
 

More from Francisco Manuel Rangel Pardo (18)

AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019
 
Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.
 
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
 
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
 
Redes sociales y preadolescentes
Redes sociales y preadolescentesRedes sociales y preadolescentes
Redes sociales y preadolescentes
 
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
 
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre prof...
 
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
 
Smart Listening - MUIinf
Smart Listening - MUIinfSmart Listening - MUIinf
Smart Listening - MUIinf
 
IA + Big Data = problema + oportunidad
IA + Big Data = problema + oportunidadIA + Big Data = problema + oportunidad
IA + Big Data = problema + oportunidad
 
Author Profiling task at PAN Lab at CLEF 2015
Author Profiling task at PAN Lab at CLEF 2015Author Profiling task at PAN Lab at CLEF 2015
Author Profiling task at PAN Lab at CLEF 2015
 
EmoGraph for Age and Gender Identification
EmoGraph for Age and Gender IdentificationEmoGraph for Age and Gender Identification
EmoGraph for Age and Gender Identification
 
My Phd Student T-Shirt
My Phd Student T-ShirtMy Phd Student T-Shirt
My Phd Student T-Shirt
 
Kico's Stairway to Phd
Kico's Stairway to PhdKico's Stairway to Phd
Kico's Stairway to Phd
 
Overview of the 2nd. Author Profiling task at PAN-CLEF 2014
Overview of the 2nd. Author Profiling task at PAN-CLEF 2014Overview of the 2nd. Author Profiling task at PAN-CLEF 2014
Overview of the 2nd. Author Profiling task at PAN-CLEF 2014
 
Social Business Intelligence - Inteligencia Social de Negocio
Social Business Intelligence - Inteligencia Social de NegocioSocial Business Intelligence - Inteligencia Social de Negocio
Social Business Intelligence - Inteligencia Social de Negocio
 
Dualidad onda-partícula del científico de datos en la empresa
Dualidad onda-partícula del científico de datos en la empresaDualidad onda-partícula del científico de datos en la empresa
Dualidad onda-partícula del científico de datos en la empresa
 
On the Identification of Emotions and Authors’ Gender in Facebook Comments on...
On the Identification of Emotions and Authors’ Gender in Facebook Comments on...On the Identification of Emotions and Authors’ Gender in Facebook Comments on...
On the Identification of Emotions and Authors’ Gender in Facebook Comments on...
 

Recently uploaded

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
HyderabadDolls
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
HyderabadDolls
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 

Recently uploaded (20)

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 

Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Spreaders on Twitter

  • 1. 9th Author Profiling task at PAN Profiling Hate Speech Spreaders on Twitter PAN-AP-2021 CLEF 2021 Online, 21-24 September Francisco Rangel - Symanto Research Grétel Liz de la Peña Sarracén - Universitat Politècnica de València BERTa Chulvi - Universitat Politècnica de València Elisabetta Fersini - Università Degli Studi di Milano-Bicocca Paolo Rosso - Universitat Politècnica de València
  • 2. Introduction Author profiling aims at identifying personal traits such as age, gender, personality traits, native language, language variety… from writings? This is crucial for: - Marketing. - Security. - Forensics. 2 Author Profiling PAN’21
  • 3. Last years goal Profiling Harmful Information Spreaders: 2019 - Profiling Bots 2020 - Profiling Fake News Spreaders 2021 - Profiling Hate Speech Spreaders 3 Author Profiling PAN’21
  • 4. Task goal Given a Twitter feed, determine whether its author is keen to spread hate speech or not. 4 Author Profiling Two languages: English Spanish PAN’21
  • 5. Corpus 5 Author Profiling PAN’21 (EN) English (ES) Spanish Keen to spread hate speech Not keen to spread hate speech Total Keen to spread hate speech Not keen to spread hate speech Total Training 100 100 200 100 100 200 Test 50 50 100 50 50 100 Total 150 150 300 150 150 300 Methodology 1. Selection of users considered potential haters: 1.1. Keyword-based search (hateful words mainly towards women or immigrants) 1.2. User-based search (users appearing in reports and/or press) + following their networks 2. Timeline collection 3.1. Manual review of the tweets conveying hate speech 3.2. Users with more than ten hateful tweets are labelled as keen to spread them. Otherwise, they are not. For each user, we provided 200 tweets
  • 6. Evaluation measures 6 Author Profiling PAN’21 The accuracy is calculated per language and averaged:
  • 7. Baselines 7 Author Profiling PAN’21 RANDOM A baseline that randomly generates the predictions among the different classes CHAR N-GRAMS With values for n from 2 to 6, with Logistic Regression WORD N-GRAMS With values for n from 1 to 3, with Support Vector Machines Symanto (LDSE) This method represents documents on the basis of the probability distribution of occurrence of their words in the different classes. The key concept of LDSE is a weight, representing the probability of a term to belong to one of the different categories: hate speech spreader / non-spreader. The distribution of weights for a given document should be closer to the weights of its corresponding category. LDSE takes advantage of the whole vocabulary USE Universal Sentence Encoder to feed a BiLSTM XLMR XLM-Roberta transformer to feed a BiLSTM MBERT Multilingual BERT transformer to feed a BiLSTM TFIDF TFIDF vectors representing each user's text to feed a BiLSTM
  • 8. 66+1 participants 32 working notes 16+1 countries 8 Author Profiling PAN’21 Participation https://mapchart.net/world.html
  • 10. Approaches - Preprocessing 10 Author Profiling Twitter elements (RT, VIA, FAV) Del Campo et al.; Huertas-García et al.; Schlicht & de Paula; Jain et al.; Giglou et al.; Vogel & Meghana Emojis and other non-alphanumeric chars Alcañiz & Andrés; Jain et al.; Anwar; Giglou et al.; Vogel & Meghana; Bagdon; Espinosa & Sidorov; Cabrera et al.; Das & Patra Lemmatisation Vogel & Meghana; Alcañiz & Andrés; Das & Patra; Jain et al. Tokenisation Das & Patra; Jain et al. Punctuation signs Cabrera et al.; Puertas & Martínez-Santos; Dukic & Krzic; Espinosa & Sidorov; Giglou et al.; Alcañiz & Andrés; Del Campo et al.; Das & Patra; Jain et al. Numbers Del Campo et al.; Huertas-García et al.; Giglou et al.; Vogel & Meghana Lowercase Alcañiz & Andrés; Del Campo et al.; Das & Patra; Anwar; Vogel & Meghana; Bagdon; Dukic & Krzic Stopwords Alcañiz & Andrés; Das & Patra; Jain et al.; Vogel & Meghana Character flooding Bagdon; Vogel & Meghana; Del Campo et al.; Alcañiz & Andrés Short texts Vogel & Meghana PAN’21
  • 11. Approaches - Preprocessing 11 Author Profiling Contractions expansion Alcañiz & Andrés Colloquial tokens (slang) Alcañiz & Andrés; Huertas-García et al. t-SNE Finogeev et al.; Ceron & Casula Kolmogorov-Smirnov test Ceron & Casula TFIDF Ikae Shift graphs Ikae PAN’21
  • 12. Approaches - Features 12 Author Profiling Stylistic features: - Number of occurrences - Verbs, adjs, pronouns - Number of hashtags, mentions, URLs... - Capital vs. lower letters - Punctuation marks - ... Katona et al.; Zaragozá & Pinto N-gram models Andrade & Gonçalves; Das & Patra; Siino et al.; Ikae; Balouchzahi; Espinosa & Sidorov; Alcañiz & Andrés; Katona et al. Emotional and personality features Katona et al.; Cervero; Puertas & Martínez-Santos; Huertas-García et al. Specialised lexicons (HS) Lai et al.; Tosev & Gievska Embeddings Puertas & Martínez-Santos; Zaragozá & Pinto; Alcañiz & Andrés; Del Campo et al. ...Lexical+statistical+syntactical+phonetical Puertas & Martínez-Santos ...Semantic-emotion-based Cabrera et al. PAN’21
  • 13. Approaches - Features 13 Author Profiling Transformers ...BERT Huertas-García et al.; Finogeev & Kaprielova; Akomeah et al.; Alzahrani & Jololian; Anwar; Ceron & Casula; Uzan & Hacohen-Kerner; Dukic & Krzic ...SBERT Schlicht & de Paula; Vogel & Meghana ...ALBERT Akomeah et al. ...RoBERTa Uzan & Hacohen-Kerner; Anwar; Bagdon ...BERTTweet Anwar ...BETO Anwar PAN’21 LDSE at char level Babaei et al. Fine-tuned transformer to modify the Impostor Method Labadie et al. Topics aggregation combined with ELMo to represent the user Irani et al. CNN to extract features from external data Höllig et al. Phonetic embeddings (among others) Puertas & Martínez-Santos
  • 14. Approaches - Methods 14 Author Profiling SVM Alcañiz & Andrés; Del Campo et al.; Andrade & Gonçalves; Finogeev & Kaprielova; Höllig et al.; Das & Patra; Giglou et al.; Ceron & Casula; Vogel & Meghana; Badgon; Espinosa & Sidorov; Cabrera et al. Logistic regression Finogeev & Kaprielova; Badgon; Dukic & Krzic Random Forest Irani et al.; Puertas & Martínez-Santos; Andrade & Gonçalves Ensembles Huertas-García et al.; Cervero; Ikae; Balouchzahi et al.; Tosev & Gievska Adaboost, Ridge, Naive Bayes, KNN, XGBoost, AutoML, ... Katona et al.; Anwar; Jain et al.; Höllig et al.; Puertas & Martínez-Santos Custom architecture Martin et al.; Baris & Magnossao RNN Pallares & Herrero CNN Siino et al. LSTM Uzan & HacoHen-Kerner bi-LSTM Vogel & Meghana PAN’21
  • 17. Best results at PAN'21 17 Author Profiling PAN’21 Dukic and Sovic - BERT - Logistic Regression Siino et al. - 100-dim word-embedding - CNN
  • 18. Conclusions ● Several approaches to tackle the task: ○ Traditional machine learning methods (SVM, LR) combined with BERT obtained the highest results. ● Results in English: ○ Over 64% on average. ○ Best (75%): Dukic and Sovic - BERT + Logistic Regression ● Results in Spanish: ○ Over 78% on average. ○ Best (85%): Siino et al. - 100-dimension word embedding + CNN ● Error analysis: ○ English: ■ False positives (non-hate speech spreaders as spreaders): 39.23% ■ False negatives (hate speech spreaders as non-spreaders): 34.40% ○ Spanish: ■ False positives (non-hate speech spreaders as spreaders): 28.13% ■ False negatives (hate speech spreaders as non-spreaders): 16.85% Looking at the results, we can conclude: ● It is feasible to automatically identify Hate Speech Spreaders with high precision ○ ...even when only textual features are used. ● We have to bear in mind false positives since, in English they sum up to forty percent of the total predictions and in Spanish they are almost double than false negatives, and misclassification might lead to ethical or legal implications. 18 Author Profiling PAN’21
  • 20. Industry at PAN (Author Profiling) 20 Author Profiling Organisation Sponsors PAN’21 This year, the winners of the task are: ● Marco Siino, Elisa Di Nuovo, Ilenia Tinnirello and Marco La Cascia, Università degli Studi di Palermo and Università degli Studi di Torino, Italy
  • 21. 2022 -> Profiling Irony and Stereotypes spreaders 21 Author Profiling PAN’21
  • 22. 22 Author Profiling On behalf of the author profiling task organisers: Thank you very much for participating and hope to see you next year!! PAN’21