Overview of PAN’16
New challenges for Authorship Analysis:
Cross-genre profiling, Clustering, Diarization,
and Obfuscation
PAN-AP-2016 CLEF 2016
Évora, 5-8 September
Paolo Rosso: Universitat Politècnica de Valencia
Francisco Rangel: Autoritas Consulting
Martin Potthast: Bauhaus-Universität Weimar
Efstathios Stamatatos: University of the Aegean
Michael Tschuggnall: University of Innsbruck
Benno Stein: Bauhaus-Universität Weimar
Introduction
Uncovering Plagiarism, Authorship, and Social
Software Misuse (PAN) is a forum for digital text
forensics, where researchers and practitioners study
technologies that analyze texts with regard to
originality, authorship, and trustworthiness.
PAN focuses on the evaluation of selected tasks
from digital text forensics in order to develop
large-scale, standardized benchmarks, and to
assess state-of-the-art techniques.
PAN’16
Evolution
[Figure: evolution chart; the only value recoverable from the slide text is 158]
PAN’16 focus
We focused on authorship tasks from the fields of (i) author
identification, (ii) author profiling, and (iii) author obfuscation evaluation (35
teams in total):
i. Author clustering / diarization: Author clustering is the task where given a
document collection the participant is asked to group documents written by the
same author so that each cluster corresponds to a different author. Author
diarization extends the previous tasks on intrinsic plagiarism detection.
ii. Age / gender identification: Since 2013, the main focus has been on age and gender
identification. This year's goal is cross-genre evaluation.
iii. Author masking / obfuscation evaluation: Author masking and author
obfuscation evaluation aim respectively at perturbing an author’s style in a
given text to render it dissimilar to other texts from the same author, and at
adjusting a given text’s style so as to render it similar to that of a given author.
Author identification (clustering)
Two scenarios:
- Complete author clustering: A detailed analysis in which:
  - the number of different authors (k) in the collection has to be
    identified;
  - each document has to be assigned to exactly one of the k authors.
- Authorship-link ranking: Viewed as a retrieval task whose objective is to
establish authorship links between documents and to provide a list of
document pairs ranked by a confidence score (the score indicates
how likely it is that both documents in a pair are by the same author).
Corpora:
- Languages: English, Dutch and Greek.
- Genres: Articles and reviews.
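The two scenarios on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any PAN participant or organiser: character trigram counts, cosine similarity, the threshold passed to `cluster`, and all function names are hypothetical choices made for brevity.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_links(docs):
    """Authorship-link ranking: (id_1, id_2, score) triples, best first."""
    profiles = {doc_id: char_ngrams(text) for doc_id, text in docs.items()}
    links = [(a, b, cosine(profiles[a], profiles[b]))
             for a, b in combinations(sorted(profiles), 2)]
    return sorted(links, key=lambda link: link[2], reverse=True)


def cluster(docs, threshold):
    """Complete clustering: connected components of links >= threshold.

    The number of clusters returned plays the role of k; the threshold
    is an assumed tuning parameter, not something given by the task.
    """
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x):
        # Follow parent pointers to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in rank_links(docs):
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for doc_id in docs:
        groups.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(group) for group in groups.values())
```

Each document ends up in exactly one group, and the ranked list from `rank_links` can be inspected on its own when only the authorship links are of interest.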
Author identification (diarization)
Three subtasks:
- Traditional intrinsic plagiarism detection: Assuming one main author wrote 70%
of a document, find the remaining text portions written by others.
- Diarization with a given number of authors: Given a document written
by a known number of authors, group the individual text fragments by
author.
- Unrestricted diarization: The number of collaborating authors is not
given, so the correct number of clusters (i.e., writers) also has to be found.
Corpora:
- Webis-TRC-12 dataset, with 150 topics from TREC Web Tracks from
2009-2011
- Each subtask has variations of the dataset: number and proportions of
authors in a document, the decision, uniformly distributed...
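As an illustration of the second subtask (diarization with a given number of authors), the sketch below merges text fragments bottom-up until k groups remain. It assumes the document has already been split into fragments, and it uses character bigram profiles with average-linkage cosine similarity; these choices and the function names are assumptions for the example, not the approach of any participant.

```python
from collections import Counter
from math import sqrt


def profile(fragment, n=2):
    """Character n-gram counts of a lower-cased fragment."""
    text = fragment.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def diarize(fragments, k):
    """Return one cluster label per fragment, using k clusters."""
    profiles = [profile(f) for f in fragments]
    clusters = [[i] for i in range(len(fragments))]

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Average-linkage similarity between clusters x and y.
                sims = [cosine(profiles[i], profiles[j])
                        for i in clusters[x] for j in clusters[y]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        # Merge the most similar pair of clusters (y > x, so x stays valid).
        clusters[x].extend(clusters.pop(y))

    labels = [0] * len(fragments)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

For the unrestricted subtask, k would have to be estimated as well, for example by stopping the merging once the best similarity falls below a threshold; that threshold would be a further assumption.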
Author profiling (age and gender identification)
Subtasks:
- Age and gender identification.
- Joint identification of age and gender for the same author.
- The aim is cross-genre evaluation.
Corpora:
- Languages: English, Spanish, Dutch.
- Genres: Twitter for training; reviews, social media and blogs for evaluation.
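To make the difference between separate and joint identification concrete, the sketch below scores predictions per author: age alone, gender alone, and both together. The function name, the dictionary layout, and the age-group labels in the usage note are illustrative assumptions, not the official evaluation code.

```python
def profiling_accuracy(gold, predicted):
    """Accuracy for age, gender, and both jointly.

    gold and predicted map an author id to an (age_group, gender) pair.
    An author counts towards the joint score only if both are correct.
    """
    if not gold:
        raise ValueError("gold must contain at least one author")
    n = len(gold)
    age = sum(predicted[a][0] == gold[a][0] for a in gold)
    gender = sum(predicted[a][1] == gold[a][1] for a in gold)
    joint = sum(tuple(predicted[a]) == tuple(gold[a]) for a in gold)
    return {"age": age / n, "gender": gender / n, "joint": joint / n}
```

In a cross-genre setting, a model trained on one genre (here Twitter) would produce `predicted` for authors from another genre, for instance `{"a1": ("25-34", "female")}`, and the same function would be applied to each evaluation genre separately.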
Author obfuscation
Subtasks:
- Authorship verification: Given two documents, decide whether they were
written by the same author.
- Author masking: Given two documents by the same author, paraphrase
the designated one so that the author can no longer be verified.
Corpora:
- Joint training and joint test datasets from the author verification tasks of
PAN 2013 to 2015.
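The interplay between the two subtasks can be sketched as follows: a verifier decides "same author" or not, and a masking approach is judged by how often it flips a correct "same author" decision. The word-overlap verifier, its 0.5 threshold, and the function names are hypothetical stand-ins; real verifiers are far more elaborate, and this measure says nothing about whether the paraphrase preserves meaning or reads naturally.

```python
def toy_verifier(known, questioned, threshold=0.5):
    """Hypothetical verifier: 'same author' if word overlap is high enough."""
    a = set(known.lower().split())
    b = set(questioned.lower().split())
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold


def flip_rate(verifier, problems, masked):
    """Fraction of 'same author' decisions that masking turns around.

    problems: list of (known_text, designated_text), same author in each.
    masked:   paraphrased versions of the designated texts, same order.
    Only problems the verifier gets right before masking are counted.
    """
    verified = 0
    flipped = 0
    for (known, designated), paraphrase in zip(problems, masked):
        if verifier(known, designated):
            verified += 1
            if not verifier(known, paraphrase):
                flipped += 1
    return flipped / verified if verified else 0.0
```

A higher flip rate means the masked texts are harder to attribute, under the assumption that the verifier used is representative of the ones to be defeated.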
Conclusions
- The author obfuscation shared task shed light on the
robustness of state-of-the-art author identification and author profiling
techniques against author obfuscation technology.
- New corpora have been developed in multiple languages: English,
Spanish, Dutch.
- PAN/FIRE:
- A shared task on plagiarism detection on texts written in Farsi.
- A shared task on author profiling on personality recognition in
source code.
See you on Tuesday and Wednesday
Rui Sousa-Silva
Universidade do Porto
Tuesday 6th Sept. 13:30 - 15:30
Wednesday 7th Sept. 13:30 - 15:30
16:15 - 18:15
Sponsors
On behalf of the PAN lab organisers:
Thank you very much for participating
and hope to see you next year!!

More Related Content

Similar to Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre profiling, Clustering, Diarization, and Obfuscation

LCC CTS 2 Option.docx
LCC CTS 2 Option.docxLCC CTS 2 Option.docx
LCC CTS 2 Option.docxwrite4
 
Between  information  retrieval  services  and bibliometrics  research. New  ...
Between  information  retrieval  services  and bibliometrics  research. New  ...Between  information  retrieval  services  and bibliometrics  research. New  ...
Between  information  retrieval  services  and bibliometrics  research. New  ...Andrea Scharnhorst
 
Ethical and Unethical Methods of Plagiarism Prevention in Academic Writing
Ethical and Unethical Methods of Plagiarism Prevention in Academic WritingEthical and Unethical Methods of Plagiarism Prevention in Academic Writing
Ethical and Unethical Methods of Plagiarism Prevention in Academic WritingNader Ale Ebrahim
 
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)Waqas Tariq
 
MacroMicroZoom.pdf
MacroMicroZoom.pdfMacroMicroZoom.pdf
MacroMicroZoom.pdfMartin Wynne
 
Da Biblissima a Biblissima+ : per un osservatorio delle culture scritte
Da Biblissima a Biblissima+ : per un osservatorio delle culture scritteDa Biblissima a Biblissima+ : per un osservatorio delle culture scritte
Da Biblissima a Biblissima+ : per un osservatorio delle culture scritteEquipex Biblissima
 
Corpus linguistics
Corpus linguisticsCorpus linguistics
Corpus linguisticsIrum Malik
 
Application Of Linguistic Cues In The Analysis Of Language Of Hate Groups
Application Of Linguistic Cues In The Analysis Of Language Of Hate GroupsApplication Of Linguistic Cues In The Analysis Of Language Of Hate Groups
Application Of Linguistic Cues In The Analysis Of Language Of Hate GroupsLeonard Goudy
 
Hum2220 fa2015 research project packet
Hum2220 fa2015 research project packetHum2220 fa2015 research project packet
Hum2220 fa2015 research project packetProfWillAdams
 
Analysing literature through the lens of information theory and network science
Analysing literature through the lens of information theory and network scienceAnalysing literature through the lens of information theory and network science
Analysing literature through the lens of information theory and network scienceMarkus Luczak-Rösch
 
Judicial Review Chapters 4 and 5 in the text discuss.docx
Judicial Review Chapters 4 and 5 in the text discuss.docxJudicial Review Chapters 4 and 5 in the text discuss.docx
Judicial Review Chapters 4 and 5 in the text discuss.docxwrite22
 
Annotating Musical Theatre Plots On Narrative Structure And Emotional Content
Annotating Musical Theatre Plots On Narrative Structure And Emotional ContentAnnotating Musical Theatre Plots On Narrative Structure And Emotional Content
Annotating Musical Theatre Plots On Narrative Structure And Emotional ContentHeather Strinden
 
Forensic linguistics with Apache Spark
Forensic linguistics with Apache SparkForensic linguistics with Apache Spark
Forensic linguistics with Apache SparkSheamus McGovern
 
The difference queer fanfiction makes: Lessons for the publishing industry - ...
The difference queer fanfiction makes: Lessons for the publishing industry - ...The difference queer fanfiction makes: Lessons for the publishing industry - ...
The difference queer fanfiction makes: Lessons for the publishing industry - ...BookNet Canada
 
Distributing Text Mining tasks with librAIry
Distributing Text Mining tasks with librAIryDistributing Text Mining tasks with librAIry
Distributing Text Mining tasks with librAIryCarlos Badenes-Olmedo
 
Guzik ARTE 692
Guzik ARTE 692Guzik ARTE 692
Guzik ARTE 692Kyle Guzik
 
Lexicography and Lexicology from a Pan-European Perspective: COST ENeL Workin...
Lexicography and Lexicology from a Pan-European Perspective: COST ENeL Workin...Lexicography and Lexicology from a Pan-European Perspective: COST ENeL Workin...
Lexicography and Lexicology from a Pan-European Perspective: COST ENeL Workin...eveline wandl-vogt
 

Similar to Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre profiling, Clustering, Diarization, and Obfuscation (20)

LCC CTS 2 Option.docx
LCC CTS 2 Option.docxLCC CTS 2 Option.docx
LCC CTS 2 Option.docx
 
Between  information  retrieval  services  and bibliometrics  research. New  ...
Between  information  retrieval  services  and bibliometrics  research. New  ...Between  information  retrieval  services  and bibliometrics  research. New  ...
Between  information  retrieval  services  and bibliometrics  research. New  ...
 
Ethical and Unethical Methods of Plagiarism Prevention in Academic Writing
Ethical and Unethical Methods of Plagiarism Prevention in Academic WritingEthical and Unethical Methods of Plagiarism Prevention in Academic Writing
Ethical and Unethical Methods of Plagiarism Prevention in Academic Writing
 
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
 
Introduction to Nvivo
Introduction to NvivoIntroduction to Nvivo
Introduction to Nvivo
 
MacroMicroZoom.pdf
MacroMicroZoom.pdfMacroMicroZoom.pdf
MacroMicroZoom.pdf
 
Da Biblissima a Biblissima+ : per un osservatorio delle culture scritte
Da Biblissima a Biblissima+ : per un osservatorio delle culture scritteDa Biblissima a Biblissima+ : per un osservatorio delle culture scritte
Da Biblissima a Biblissima+ : per un osservatorio delle culture scritte
 
Corpus linguistics
Corpus linguisticsCorpus linguistics
Corpus linguistics
 
Application Of Linguistic Cues In The Analysis Of Language Of Hate Groups
Application Of Linguistic Cues In The Analysis Of Language Of Hate GroupsApplication Of Linguistic Cues In The Analysis Of Language Of Hate Groups
Application Of Linguistic Cues In The Analysis Of Language Of Hate Groups
 
Hum2220 fa2015 research project packet
Hum2220 fa2015 research project packetHum2220 fa2015 research project packet
Hum2220 fa2015 research project packet
 
Analysing literature through the lens of information theory and network science
Analysing literature through the lens of information theory and network scienceAnalysing literature through the lens of information theory and network science
Analysing literature through the lens of information theory and network science
 
Judicial Review Chapters 4 and 5 in the text discuss.docx
Judicial Review Chapters 4 and 5 in the text discuss.docxJudicial Review Chapters 4 and 5 in the text discuss.docx
Judicial Review Chapters 4 and 5 in the text discuss.docx
 
Annotating Musical Theatre Plots On Narrative Structure And Emotional Content
Annotating Musical Theatre Plots On Narrative Structure And Emotional ContentAnnotating Musical Theatre Plots On Narrative Structure And Emotional Content
Annotating Musical Theatre Plots On Narrative Structure And Emotional Content
 
Analyzing Nontextual Content Features to Detect Academic Plagiarism
Analyzing Nontextual Content Features to Detect Academic PlagiarismAnalyzing Nontextual Content Features to Detect Academic Plagiarism
Analyzing Nontextual Content Features to Detect Academic Plagiarism
 
Forensic linguistics with Apache Spark
Forensic linguistics with Apache SparkForensic linguistics with Apache Spark
Forensic linguistics with Apache Spark
 
The difference queer fanfiction makes: Lessons for the publishing industry - ...
The difference queer fanfiction makes: Lessons for the publishing industry - ...The difference queer fanfiction makes: Lessons for the publishing industry - ...
The difference queer fanfiction makes: Lessons for the publishing industry - ...
 
Distributing Text Mining tasks with librAIry
Distributing Text Mining tasks with librAIryDistributing Text Mining tasks with librAIry
Distributing Text Mining tasks with librAIry
 
Guzik ARTE 692
Guzik ARTE 692Guzik ARTE 692
Guzik ARTE 692
 
Lexicography and Lexicology from a Pan-European Perspective: COST ENeL Workin...
Lexicography and Lexicology from a Pan-European Perspective: COST ENeL Workin...Lexicography and Lexicology from a Pan-European Perspective: COST ENeL Workin...
Lexicography and Lexicology from a Pan-European Perspective: COST ENeL Workin...
 
Plagiarism.pptx ics
Plagiarism.pptx icsPlagiarism.pptx ics
Plagiarism.pptx ics
 

More from Francisco Manuel Rangel Pardo

Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)Francisco Manuel Rangel Pardo
 
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...Francisco Manuel Rangel Pardo
 
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...Francisco Manuel Rangel Pardo
 
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling ...
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling  ...Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling  ...
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling ...Francisco Manuel Rangel Pardo
 
AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019Francisco Manuel Rangel Pardo
 
Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.Francisco Manuel Rangel Pardo
 
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...Francisco Manuel Rangel Pardo
 
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...Francisco Manuel Rangel Pardo
 
RusProfiling Gender Identification in Russian Texts PAN@FIRE
RusProfiling Gender Identification in Russian Texts PAN@FIRERusProfiling Gender Identification in Russian Texts PAN@FIRE
RusProfiling Gender Identification in Russian Texts PAN@FIREFrancisco Manuel Rangel Pardo
 
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...Francisco Manuel Rangel Pardo
 
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...Gender and Language Variety Identification in Twitter. Overview of the 5th. A...
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...Francisco Manuel Rangel Pardo
 
Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016Francisco Manuel Rangel Pardo
 
AL4Trust - Artificial Intelligence for Building Trust
AL4Trust - Artificial Intelligence for Building TrustAL4Trust - Artificial Intelligence for Building Trust
AL4Trust - Artificial Intelligence for Building TrustFrancisco Manuel Rangel Pardo
 
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...Francisco Manuel Rangel Pardo
 
A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...Francisco Manuel Rangel Pardo
 
Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Francisco Manuel Rangel Pardo
 

More from Francisco Manuel Rangel Pardo (20)

Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
Profiling Irony and Stereotype Spreaders on Twitter (IROSTEREO)
 
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...
Overview of the 9th Author Profiling task at PAN: Profiling Hate Speech Sprea...
 
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...
Overview of the 8th Author Profiling task at PAN: Profiling Fake News Spreade...
 
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling ...
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling  ...Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling  ...
Overview of the 7th Author Profiling task at PAN: Bots and Gender Profiling ...
 
AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019AL4Trust - Artificial Intelligence for Building Trust 2019
AL4Trust - Artificial Intelligence for Building Trust 2019
 
Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.Author Profiling en Social Media. En la Academia... y en la Industria.
Author Profiling en Social Media. En la Academia... y en la Industria.
 
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
Multimodal Stance Detection in Tweets on Catalan #1Oct Referendum @Ibereval 2...
 
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...
Overview of the 6th Author Profiling task at PAN: Multimodal Gender Identific...
 
RusProfiling Gender Identification in Russian Texts PAN@FIRE
RusProfiling Gender Identification in Russian Texts PAN@FIRERusProfiling Gender Identification in Russian Texts PAN@FIRE
RusProfiling Gender Identification in Russian Texts PAN@FIRE
 
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
Stance and Gender Detection in Tweets on Catalan Independence. Ibereval@SEPLN...
 
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...Gender and Language Variety Identification in Twitter. Overview of the 5th. A...
Gender and Language Variety Identification in Twitter. Overview of the 5th. A...
 
Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016Overview of the 4th. Author Profiling task at PAN-CLEF 2016
Overview of the 4th. Author Profiling task at PAN-CLEF 2016
 
Redes sociales y preadolescentes
Redes sociales y preadolescentesRedes sociales y preadolescentes
Redes sociales y preadolescentes
 
AL4Trust - Artificial Intelligence for Building Trust
AL4Trust - Artificial Intelligence for Building TrustAL4Trust - Artificial Intelligence for Building Trust
AL4Trust - Artificial Intelligence for Building Trust
 
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
El Futuro de las Comunicaciones Personales a Través de los Dispositivos Móvil...
 
Smart Listening - MUIinf
Smart Listening - MUIinfSmart Listening - MUIinf
Smart Listening - MUIinf
 
IA + Big Data = problema + oportunidad
IA + Big Data = problema + oportunidadIA + Big Data = problema + oportunidad
IA + Big Data = problema + oportunidad
 
A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...
 
Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...
 
Author Profiling task at PAN Lab at CLEF 2015
Author Profiling task at PAN Lab at CLEF 2015Author Profiling task at PAN Lab at CLEF 2015
Author Profiling task at PAN Lab at CLEF 2015
 

Recently uploaded

1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 

Recently uploaded (20)

1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...

Overview of PAN'16 - New challenges for Authorship Analysis: Cross-genre profiling, Clustering, Diarization, and Obfuscation

  • 1. Overview of PAN’16. New challenges for Authorship Analysis: Cross-genre profiling, Clustering, Diarization, and Obfuscation. PAN-AP-2016, CLEF 2016, Évora, 5-8 September. Paolo Rosso: Universitat Politècnica de Valencia; Francisco Rangel: Autoritas Consulting; Martin Potthast: Bauhaus-Universität Weimar; Efstathios Stamatatos: University of the Aegean; Michael Tschuggnall: University of Innsbruck; Benno Stein: Bauhaus-Universität Weimar.
  • 2. Introduction: Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN) is a forum for digital text forensics, where researchers and practitioners study technologies that analyze texts with regard to originality, authorship, and trustworthiness. PAN focuses on the evaluation of selected tasks from digital text forensics in order to develop large-scale, standardized benchmarks and to assess state-of-the-art techniques.
  • 4. PAN’16 focus: We have focused on authorship tasks from the fields of (i) author identification, (ii) author profiling, and (iii) author obfuscation evaluation (35 teams in total): i. Author clustering / diarization: Author clustering is the task in which, given a document collection, the participant is asked to group documents written by the same author so that each cluster corresponds to a different author. Author diarization extends the previous tasks on intrinsic plagiarism detection. ii. Age / gender identification: Since 2013, the main focus has been on age and gender identification. This year's goal is cross-genre evaluation. iii. Author masking / obfuscation evaluation: Author masking and author obfuscation evaluation aim, respectively, at perturbing an author’s style in a given text to render it dissimilar to other texts by the same author, and at adjusting a given text’s style so as to render it similar to that of a given author.
  • 5. Author identification (clustering): Two scenarios: - Complete author clustering: requires a detailed analysis in which the number of different authors (k) in the collection is identified and each document is assigned to exactly one of the k authors. - Authorship-link ranking: viewed as a retrieval task whose objective is to establish authorship links between documents and to provide a list of document pairs ranked by a confidence score (the score shows how likely the two documents in a pair are to be by the same author). Corpora: - Languages: English, Dutch and Greek. - Genres: Articles and reviews.
  • 6. Author identification (diarization): Three subtasks: - Traditional intrinsic plagiarism detection: Assuming a major author (70% of a document), find the remaining text portions written by other authors. - Diarization with a given number of authors: Given a document written by a known number of authors, group the individual text fragments by author. - Unrestricted diarization: The number of collaborating authors is not given, so the correct number of clusters, i.e., writers, also has to be found. Corpora: - Webis-TRC-12 dataset, with 150 topics from the TREC Web Tracks from 2009-2011. - Each subtask has variations of the dataset: number and proportions of authors in a document, the decision, uniformly distributed...
  • 7. Author profiling (age and gender identification): Subtasks: - Age and gender identification. - Joint identification of age and gender for the same author. - The aim is cross-genre evaluation. Corpora: - Languages: English, Spanish, Dutch. - Genres: Twitter for training; reviews, social media and blogs for evaluation.
  • 8. Author obfuscation: Subtasks: - Authorship verification: Given two documents, decide whether they have been written by the same author. - Author masking: Given two documents by the same author, paraphrase the designated one so that the author cannot be verified anymore. Corpora: - Joint training and joint test datasets from the author verification tasks of PAN 2013 to 2015.
  • 9. Conclusions: - The author obfuscation shared task shed light on the robustness of state-of-the-art author identification and author profiling techniques against author obfuscation technology. - New corpora have been developed in multiple languages: English, Spanish, Dutch. - PAN/FIRE: - A shared task on plagiarism detection in texts written in Farsi. - A shared task on author profiling for personality recognition in source code.
  • 10. See you on Tuesday and Wednesday: Rui Sousa-Silva, Universidade do Porto. Tuesday 6th Sept.: 13:30 - 15:30. Wednesday 7th Sept.: 13:30 - 15:30 and 16:15 - 18:15.
  • 11. On behalf of the PAN lab organisers: Thank you very much for participating. We hope to see you next year!
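
The authorship-link ranking scenario from slide 5 (a list of document pairs ranked by a confidence score) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the method of any PAN'16 participant: the character 3-gram profiles, the cosine similarity used as the confidence score, and the hypothetical function names `char_ngrams`, `cosine` and `rank_authorship_links`.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_authorship_links(docs, n=3):
    """Return (doc_id_1, doc_id_2, score) triples, highest score first.

    The score is an illustrative stand-in for the confidence that both
    documents were written by the same author.
    """
    profiles = {doc_id: char_ngrams(text, n) for doc_id, text in docs.items()}
    links = [
        (a, b, cosine(profiles[a], profiles[b]))
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(links, key=lambda link: link[2], reverse=True)


if __name__ == "__main__":
    # Toy collection; real PAN corpora contain articles and reviews.
    example_docs = {
        "d1": "the cat sat on the mat",
        "d2": "the cat sat on the hat",
        "d3": "zzz qqq xxx",
    }
    for doc_a, doc_b, score in rank_authorship_links(example_docs):
        print(doc_a, doc_b, round(score, 3))
```

Complete author clustering could build on the same pairwise scores, for example by thresholding them and grouping linked documents, although choosing the number of authors k is the hard part of that scenario and is not addressed here.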