Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads


In the dynamic landscape of open source software (OSS) development, understanding and addressing incivility within issue discussions is crucial for fostering healthy and productive collaborations. This paper presents a curated dataset of 404 locked GitHub issue discussion threads and 5961 individual comments, collected from 213 OSS projects. We annotated the comments with various categories of incivility using Tone Bearing Discussion Features (TBDFs), and, for each issue thread, we annotated the triggers, targets, and consequences of incivility. We observed that Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs exhibited in our dataset. The most common triggers, targets, and consequences of incivility include Failed use of tool/code or error messages, People, and Discontinued further discussion, respectively. This dataset can serve as a valuable resource for analyzing incivility in OSS and improving automated tools to detect and mitigate such behavior.


21st International Conference on Mining Software Repositories
Ramtin Ehsani, Mia Mohammad Imran, Robert Zita, Kostadin Damevski, Preetha Chatterjee
Drexel University, Virginia Commonwealth University, Elmhurst University
Preprint: https://arxiv.org/abs/2402.04183
imranm3@vcu.edu

Motivation and Research Objective
● Fostering healthy collaborations in OSS is challenging
● Understanding and addressing incivility within OSS discussions
● Lack of a comprehensive approach to address uncivil interactions
● Lack of large annotated SE datasets
Research Objective: Curate a dataset of locked GitHub issues to enable analysis of incivility in OSS development
Annotated dataset of locked GitHub issue threads with heated discussions

Dataset Annotation
● 404 locked issue threads from 213 GitHub projects, with 5,961 individual comments
● Threads were locked as "too heated" or showed clear characteristics of heated discussions
● A total of 19 annotators
● To further improve the annotation quality, we used GPT-4
● We manually checked the instances of disagreement between GPT-4 and the annotators
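The disagreement check in the last bullet can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the `LabelPair` type, its field names, and the sample labels are hypothetical, and the slides do not say how labels were compared. The sketch assumes one human label and one GPT-4 label per comment and flags every comment where the two differ, so that it can be reviewed by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LabelPair:
    """One comment with a human label and a model label (hypothetical schema)."""
    comment_id: str
    human_label: str
    model_label: str


def find_disagreements(pairs: List[LabelPair]) -> List[LabelPair]:
    """Return the pairs whose labels differ, ignoring case and surrounding spaces."""
    return [
        p for p in pairs
        if p.human_label.strip().lower() != p.model_label.strip().lower()
    ]


# Invented sample labels, for illustration only.
pairs = [
    LabelPair("c1", "Impatience", "Impatience"),
    LabelPair("c2", "Mocking", "Irony"),
    LabelPair("c3", "None", "Bitter frustration"),
]

to_review = find_disagreements(pairs)
print([p.comment_id for p in to_review])
```

Only the flagged comments would go to a human reviewer, which keeps the manual effort proportional to the number of disagreements rather than to the size of the dataset.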

Annotated Features
● Tone Bearing Discussion Features (TBDFs), uncivil features*
○ Bitter frustration, Impatience, Mocking, Irony, Vulgarity, etc.
● Triggers*
○ Failed use of code, Technical disagreements, Communication breakdown, etc.
● Targets*
○ People, Code/Tool, Company/organization, Undirected
● Consequences*
○ Discontinued further discussion, Escalating further, etc.
* C. Miller, S. Cohen, D. Klug, B. Vasilescu, and C. Kästner, "'Did You Miss My Comment or What?' Understanding Toxicity in Open Source Discussions," 2022
* I. Ferreira, J. Cheng, and B. Adams, "The 'Shut the f**k up' Phenomenon: Characterizing Incivility in Open Source Code Review Discussions," 2021
* J. Sarker, A. K. Turzo, M. Dong, and A. Bosu, "Automated Identification of Toxic Code Reviews Using ToxiCR," 2023
* Our open coding process
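One way to picture the two annotation levels (a TBDF per comment; a trigger, target, and consequence per thread) is the small data model below. It is a sketch under stated assumptions: the class and field names are hypothetical and do not reflect the released dataset's actual file format, and the sample comments and URL are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnnotatedComment:
    """A single comment; tbdf is None when no uncivil feature was annotated."""
    text: str
    tbdf: Optional[str] = None


@dataclass
class AnnotatedThread:
    """A locked issue thread with thread-level annotations (hypothetical fields)."""
    issue_url: str
    trigger: str
    target: str
    consequence: str
    comments: List[AnnotatedComment] = field(default_factory=list)

    def uncivil_comments(self) -> List[AnnotatedComment]:
        """Comments that carry any TBDF label."""
        return [c for c in self.comments if c.tbdf is not None]


# Invented placeholder thread, for illustration only.
thread = AnnotatedThread(
    issue_url="https://example.invalid/issues/1",
    trigger="Failed use of tool/code or error messages",
    target="People",
    consequence="Discontinued further discussion",
    comments=[
        AnnotatedComment("It crashes on startup."),
        AnnotatedComment("Is anyone ever going to fix this?", tbdf="Impatience"),
    ],
)

print(len(thread.uncivil_comments()))
```

Keeping the TBDF on the comment and the trigger, target, and consequence on the thread mirrors the granularity described in the abstract.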

Dataset Description
● 1,365 comments annotated with an uncivil feature
● Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs
● Failed use of tool/code or error messages is the most common Trigger
● People are the most common Target
● Discontinued further discussion is the most common Consequence
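For a sense of scale, the reported counts can be turned into simple ratios. The figures below are derived here from the totals on the slides (404 threads, 213 projects, 5,961 comments, 1,365 comments with an uncivil feature); they are plain averages and are not statistics reported by the authors.

```python
# Totals taken from the slides.
THREADS = 404
PROJECTS = 213
COMMENTS = 5961
UNCIVIL_COMMENTS = 1365

# Derived ratios (simple averages, not reported in the source).
uncivil_share_pct = round(100 * UNCIVIL_COMMENTS / COMMENTS, 1)
comments_per_thread = round(COMMENTS / THREADS, 1)
threads_per_project = round(THREADS / PROJECTS, 1)

print(f"Share of comments with an uncivil feature: {uncivil_share_pct}%")  # 22.9%
print(f"Average comments per thread: {comments_per_thread}")              # 14.8
print(f"Average threads per project: {threads_per_project}")              # 1.9
```

So roughly one comment in four or five carries an uncivil feature, even though every thread in the dataset was selected as heated.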

Summary
● A curated dataset of 404 locked issue threads from 213 GitHub projects
● Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs
● Failed use of tool/code or error messages is the most common trigger
● People are the most common target
● Discontinued further discussion is the most common consequence

Research Directions
● Automated moderation bot development
● Impact of incivility on project health
● Effectiveness of moderation strategies
● Early warning systems development
● Underrepresented communities' experiences
● Predicting heated thread locking
● Identifying productive intervention points

Preprint: https://arxiv.org/abs/2307.15631
ramtin.ehsani@drexel.edu
Preprint: https://arxiv.org/abs/2402.04183
imranm3@vcu.edu
