PR-SOCO: Personality Recognition in SOurce COde
PAN@FIRE 2016, Kolkata, 8-10 December
Francisco Rangel (Autoritas Consulting)
Paolo Rosso (PRHLT, Universitat Politècnica de Valencia, Spain)
Fabio A. González & Felipe Restrepo-Calle (MindLab, Universidad Nacional de Colombia)
Manuel Montes (INAOE, Mexico)
Introduction
Author profiling aims to identify personal traits such as age, gender, native language or personality from an author's writing.
This is crucial for:
- Marketing
- Security
- Forensics
PAN@FIRE’16PR-SOCO
Task goal
To predict personality traits from source code.
This is crucial for:
- Human resources management for IT departments.
Corpus
Source codes: 2,492
Authors: 70 (training: 49, test: 21)
● Java programs by computer science students at
Universidad Nacional de Colombia
● Allowed:
○ Multiple uploads of the same code
○ Errors (compiler output, debug information, source code in other languages such as Python…)
Evaluation measures
Two complementary measures per trait:
● Root Mean Squared Error (RMSE) to measure the goodness of the approaches.
● Pearson Product-Moment Correlation to check whether the performance is due to random chance.
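As an illustration of the two measures, here is a minimal sketch in plain Python. The function names and the five example scores are assumptions made for this example; they are not task data and not the organisers' evaluation script. Returning 0.0 when the correlation is undefined (constant input) is a convention chosen for this sketch.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted trait scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson(y_true, y_pred):
    """Pearson product-moment correlation; 0.0 if either input is constant."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        return 0.0  # undefined correlation; 0.0 is an assumption of this sketch
    return cov / math.sqrt(var_t * var_p)

# Hypothetical scores for five authors on one trait (not task data).
observed = [40.0, 55.0, 50.0, 62.0, 45.0]
predicted = [42.0, 53.0, 51.0, 60.0, 47.0]

print("RMSE:", rmse(observed, predicted))
print("Pearson:", pearson(observed, predicted))
```

Both measures would be computed once per trait, giving five RMSE values and five Pearson values per run.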
48 runs
11 participants
9 accepted papers
7 countries
[Map of participating countries; labelled: Republic of Korea]
Approaches - Features
Bag of words, word n-grams or char n-grams: Besumich, Gimenez
Word vectors (skip-thought encoding): Lee
Byte streams: Doval
ToneAnalyzed: Montejo
Code structure (ANTLR syntax): Bilan, Castellanos
Specific features related to coding style: Bilan, Delair, Gimenez, HHU, Kumar, Uaemex
- Length of the program, length of the classes...
- Average length of variable names, class names…
- Number of methods per class, ...
- Frequency and length of comments
- Indentation, code layout, …
Halstead metrics (software engineering metrics): Castellanos
+ 2 baselines: char 3-grams and the observed mean.
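To make the coding-style features more concrete, the following is a rough sketch of how a few of them could be computed from a Java source string with regular expressions. The function name, the feature names and the sample program are assumptions made for this example; no participant's actual feature extractor is reproduced here, and a real system would more likely rely on a parser such as ANTLR.

```python
import re

def style_features(java_source):
    """Compute a few coding-style features from one Java source string."""
    lines = java_source.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    # Naive comment matcher: it also fires on "//" inside string literals.
    comments = re.findall(r"//[^\n]*|/\*.*?\*/", java_source, flags=re.DOTALL)
    class_names = re.findall(r"\bclass\s+([A-Za-z_]\w*)", java_source)
    # Leading whitespace per non-empty line (a tab counts as one character).
    indents = [len(ln) - len(ln.lstrip(" \t")) for ln in non_empty]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "program_length_chars": len(java_source),
        "num_lines": len(lines),
        "num_comments": len(comments),
        "avg_comment_length": mean([len(c) for c in comments]),
        "avg_class_name_length": mean([len(n) for n in class_names]),
        "avg_indent_width": mean(indents),
    }

# Hypothetical sample program, not taken from the corpus.
sample = '''public class Hello {
    // say hi
    public static void main(String[] args) {
        System.out.println("hi");
    }
}
'''
features = style_features(sample)
print(features)
```

Since the corpus allows compiler output and code in other languages, a regex-based extractor like this one degrades gracefully on non-Java input, whereas a strict parser would need error handling.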
Approaches - Methods
Logistic regression: Lee, Gimenez
Lasso regression: Besumich
Support vector regression: Castellanos, Delair, Uaemex
Extra trees regression: Castellanos
Gaussian processes: Delair
M5, M5 rules: Delair
Random trees: Delair
Neural networks: Doval, Uaemex
Linear regression: HHU, Kumar
Nearest neighbour: HHU, Uaemex
Symbolic regression: Uaemex
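As one example of the methods listed, a nearest-neighbour regressor predicts a trait score by averaging the scores of the training authors whose feature vectors are closest. The sketch below is a generic k-nearest-neighbour regression in plain Python; the two-dimensional feature vectors, the scores and the choice of Euclidean distance are assumptions for illustration and do not describe how HHU or Uaemex configured their systems.

```python
import math

def knn_predict(train_X, train_y, x, k=3):
    """Predict a score for x as the mean score of its k nearest training rows."""
    # Pair each training row's Euclidean distance to x with its score.
    scored = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))
    nearest = scored[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical feature vectors (e.g. two style features per author) and scores.
train_X = [[1.0, 0.0], [2.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
train_y = [40.0, 42.0, 60.0, 62.0]

low = knn_predict(train_X, train_y, [1.5, 0.0], k=2)
high = knn_predict(train_X, train_y, [10.5, 0.0], k=2)
print(low, high)
```

With only 49 training authors, a small k keeps predictions local, while a k close to the training size pushes every prediction towards the overall mean.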
RMSE distribution
Too many outliers with poor performance...
RMSE distribution (without outliers)
Plot annotations: the best results (state of the art); the lowest sparsity
Pearson distribution
● Results are very similar to those for RMSE
● The average value is poor (lower than 0.3)
[Figures: results per trait]
Neuroticism
Extroversion
Openness
Agreeableness
Conscientiousness
Conclusions
● The task aimed at identifying the Big Five personality traits from Java source code.
● 11 participants submitted 48 runs.
● Two complementary measures were used:
○ RMSE: overall score of the performance.
○ Pearson Product-Moment Correlation: whether the performance is due to random chance.
● Regarding the results:
○ Quite similar in terms of Pearson for all traits.
○ Larger differences in terms of RMSE: the best results were for openness (6.95).
● Several different features:
○ Generic (word and character n-grams) vs. specific (obtained by parsing the code, analysing its structure, style or comments).
○ Generic features obtained competitive results in terms of RMSE...
○ … but with lower Pearson values.
○ They seemed to be less robust.
● Baselines obtained low RMSE with low Pearson, which highlights the need to use both complementary measures.
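The point about baselines can be seen in a small worked example. The sketch below compares an observed-mean baseline with a hypothetical model that ranks authors correctly but is biased by a constant offset. All scores are made up for illustration, and treating the undefined correlation of a constant prediction as 0.0 is a convention of this sketch.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error (lower is better)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def pearson(observed, predicted):
    """Pearson correlation; 0.0 by convention when an input is constant."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp) if so > 0 and sp > 0 else 0.0

# Hypothetical trait scores, not taken from the PR-SOCO corpus.
train_scores = [48.0, 52.0, 50.0]
test_scores = [45.0, 50.0, 55.0]

# Observed-mean baseline: predict the training mean for every test author.
mean_value = sum(train_scores) / len(train_scores)
baseline = [mean_value] * len(test_scores)

# A hypothetical model that ranks authors correctly but is biased by +10.
biased = [s + 10.0 for s in test_scores]

baseline_rmse, baseline_r = rmse(test_scores, baseline), pearson(test_scores, baseline)
biased_rmse, biased_r = rmse(test_scores, biased), pearson(test_scores, biased)
print("baseline:", baseline_rmse, baseline_r)
print("biased model:", biased_rmse, biased_r)
```

Here the baseline has the lower RMSE yet carries no information about which authors score higher, while the biased model has the higher RMSE but a correlation close to 1. Neither measure alone separates the two cases.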
On behalf of the PR-SOCO task organisers:
Thank you very much for participating
and hope to see you next year!!