Author profiling aims at identifying personal traits such as age, gender, native language or personality traits from writings. The goal of the PR-SOCO task at PAN@FIRE is to predict personality traits from source code.
PR-SOCO Personality Recognition in SOurce COde (PAN@FIRE 2016)
1. PR-SOCO: Personality Recognition in SOurce COde
PAN@FIRE 2016, Kolkata, 8-10 December
Francisco Rangel, Autoritas Consulting
Paolo Rosso, PRHLT - Universitat Politècnica de Valencia - Spain
Fabio A. González & Felipe Restrepo-Calle, MindLab - Universidad Nacional de Colombia
Manuel Montes, INAOE - Mexico
2. Introduction
Author profiling aims at identifying personal traits such as age, gender, native language or personality traits from writings.
This is crucial for:
- Marketing
- Security
- Forensics
PAN@FIRE’16 PR-SOCO
3. Task goal
To predict personality traits from source code.
This is crucial for:
- Human resources management for IT departments.
4. Corpus
● Source codes: 2,492
● Authors: 70 (training: 49, test: 21)
● Java programs by computer science students at Universidad Nacional de Colombia
● Allowed:
○ Multiple uploads of the same code
○ Errors (compiler output, debug information, source codes in other languages such as Python…)
5. Evaluation measures
Two complementary measures per trait:
● Root Mean Squared Error (RMSE) to measure the overall performance of the approaches.
● Pearson Product-Moment Correlation to check whether the performance is due to random chance.
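The two measures can be sketched in a few lines of plain Python. This is a minimal illustration, not the organisers' evaluation script: the function names, the sample scores, and the convention of returning 0.0 when one series is constant are all assumptions made here.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between gold and predicted trait scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson(y_true, y_pred):
    """Pearson product-moment correlation.

    Returns 0.0 when either series is constant (an assumed convention,
    since the coefficient is undefined in that case).
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        return 0.0
    return cov / math.sqrt(var_t * var_p)

# Hypothetical openness scores for four authors (illustrative values only).
gold = [40.0, 50.0, 60.0, 70.0]
pred = [45.0, 50.0, 55.0, 75.0]
# A constant predictor that always outputs the mean of the gold scores.
constant = [sum(gold) / len(gold)] * len(gold)

print(rmse(gold, pred), pearson(gold, pred))
print(rmse(gold, constant), pearson(gold, constant))
```

A constant predictor still gets a finite RMSE but carries no correlation with the gold scores, which is one reason to read the two measures together.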
7. Approaches - Features
● Bag of words, word n-grams or char n-grams: Besumich, Gimenez, Besumich
● Word vectors (skip-thought encoding): Lee
● Byte streams: Doval
● ToneAnalyzed: Montejo
● Code structure (ANTLR syntax): Bilan, Castellanos
● Specific features related to coding style: Bilan, Delair, Gimenez, HHU, Kumar, Uaemex
○ Length of the program, length of the classes...
○ Average length of variable names, class names…
○ Number of methods per class, ...
○ Frequency and length of comments
○ Indentation, code layout, …
● Halstead metrics (software engineering metrics): Castellanos
+ 2 baselines: char 3-grams and the observed mean.
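To make the coding-style features concrete, here is a rough sketch of how a few of them could be computed from raw Java text. It is not any participant's actual feature extractor: the function name, the regular expressions, and the sample program are hypothetical, and the regex-based parsing is deliberately naive (it ignores string literals and only recognises a handful of type keywords).

```python
import re

def style_features(java_source):
    """Compute a few coding-style features from raw Java source text."""
    lines = java_source.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    # Line comments (//) and block comments (/* ... */); naive, ignores string literals.
    comments = re.findall(r"//[^\n]*|/\*.*?\*/", java_source, flags=re.DOTALL)
    # Naive guess at declared names: an identifier right after a common type keyword.
    names = re.findall(
        r"\b(?:int|long|double|float|boolean|char|String)\s+([A-Za-z_]\w*)",
        java_source,
    )
    # Leading whitespace width of each non-empty line.
    indents = [len(ln) - len(ln.lstrip(" \t")) for ln in non_empty]
    return {
        "num_lines": len(lines),
        "num_chars": len(java_source),
        "num_comments": len(comments),
        "avg_comment_length": sum(map(len, comments)) / len(comments) if comments else 0.0,
        "avg_name_length": sum(map(len, names)) / len(names) if names else 0.0,
        "avg_indent": sum(indents) / len(indents) if indents else 0.0,
    }

# Hypothetical student submission used only to exercise the function.
sample = """public class Main {
    // read input
    public static void main(String[] args) {
        int total = 0;
        String name = "x";
    }
}
"""
features = style_features(sample)
print(features)
```

A parser-based approach (such as the ANTLR syntax features listed above) would be more reliable than regular expressions for anything beyond simple counts.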
8. Approaches - Methods
● Logistic regression: Lee, Gimenez
● Lasso regression: Besumich
● Support vector regression: Castellanos, Delair, Uaemex
● Extra trees regression: Castellanos
● Gaussian processes: Delair
● M5, M5 rules: Delair
● Random trees: Delair
● Neural networks: Doval, Uaemex
● Linear regression: HHU, Kumar
● Nearest neighbour: HHU, Uaemex
● Symbolic regression: Uaemex
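As an illustration of one of the listed methods, the following is a minimal nearest-neighbour regressor over per-author feature vectors. It is a sketch under stated assumptions, not a reconstruction of any team's system: the function name, the Euclidean distance, the value of k, and the feature values and scores are all hypothetical.

```python
import math

def knn_predict(train_vectors, train_scores, query, k=3):
    """Predict a trait score as the mean score of the k closest training authors."""
    distances = sorted(
        (math.dist(vec, query), score)
        for vec, score in zip(train_vectors, train_scores)
    )
    nearest = distances[:k]
    return sum(score for _, score in nearest) / len(nearest)

# Hypothetical two-dimensional style vectors (e.g. average name length, average indent)
# paired with hypothetical trait scores for four training authors.
train_vectors = [[4.5, 4.0], [9.0, 2.0], [5.0, 4.0], [12.0, 0.0]]
train_scores = [48.0, 55.0, 50.0, 60.0]

prediction = knn_predict(train_vectors, train_scores, [5.0, 3.5], k=2)
print(prediction)
```

In practice the features would usually be scaled first, since unscaled features with large ranges dominate the distance.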
17. Conclusions
● The task aimed at identifying the big five personality traits from Java source codes.
● 11 participants submitted 48 runs.
● Two complementary measures were used:
○ RMSE: overall score of the performance.
○ Pearson Product-Moment Correlation: whether the performance is due to random chance.
● Regarding the results:
○ Quite similar in terms of Pearson for all traits.
○ Larger differences in terms of RMSE: the best results were for openness (6.95).
● Several different features:
○ Generic (word and character n-grams) vs. specific (obtained by parsing the code, analysing its structure, style or comments).
○ Generic features obtained competitive results in terms of RMSE...
○ … but with lower Pearson values.
○ They seemed to be less robust.
● Baselines obtained low RMSE with low Pearson -> this highlights the need to use both complementary measures.
18. On behalf of the PR-SOCO task organisers:
Thank you very much for participating and hope to see you next year!!