In the dynamic landscape of open source software (OSS) development, understanding and addressing incivility within issue discussions is crucial for fostering healthy and productive collaborations. This paper presents a curated dataset of 404 locked GitHub issue discussion threads and 5,961 individual comments, collected from 213 OSS projects. We annotated the comments with various categories of incivility using Tone Bearing Discussion Features (TBDFs), and, for each issue thread, we annotated the triggers, targets, and consequences of incivility. We observed that Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs in our dataset. The most common trigger, target, and consequence of incivility are Failed use of tool/code or error messages, People, and Discontinued further discussion, respectively. This dataset can serve as a valuable resource for analyzing incivility in OSS and for improving automated tools that detect and mitigate such behavior.
Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads
1. 21st International Conference on Mining Software Repositories
Ramtin Ehsani, Mia Mohammad Imran, Robert Zita, Kostadin Damevski, Preetha Chatterjee
Drexel University
Preprint: https://arxiv.org/abs/2402.04183
Virginia Commonwealth University
Elmhurst University
imranm3@vcu.edu
2. Motivation and Research Objective
● Fostering healthy collaborations in OSS is challenging
● Understanding and addressing incivility within OSS discussions is crucial
● Lack of a comprehensive approach to addressing uncivil interactions
● Lack of large annotated SE datasets
Research Objective: Curate a dataset of locked GitHub issues to enable the analysis of incivility in OSS development
Annotated dataset of locked GitHub issue threads with heated discussions
3. Dataset Annotation
● 404 locked issue threads from 213 GitHub projects, with 5,961 individual comments
● Threads were either locked as "too heated" or demonstrated clear characteristics indicative of heated discussions
● A total of 19 annotators
● To further improve the annotation quality, we used GPT-4
● Manually checked the instances of disagreement between GPT-4 and annotators
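The disagreement check above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each comment carries exactly one human label and one GPT-4 label, stored in dictionaries keyed by a hypothetical comment ID, and it only lists the comments whose labels differ so they can be reviewed by hand. How the GPT-4 labels are obtained is not shown.

```python
from typing import Dict, List, Tuple


def find_disagreements(
    human: Dict[str, str], gpt4: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (comment_id, human_label, gpt4_label) for every comment
    labeled by both sources where the two labels differ.

    Assumption: one label per comment per source (real comments may
    carry several TBDFs)."""
    disagreements = []
    # Only compare comments that both the annotators and GPT-4 labeled.
    for comment_id in sorted(human.keys() & gpt4.keys()):
        if human[comment_id] != gpt4[comment_id]:
            disagreements.append((comment_id, human[comment_id], gpt4[comment_id]))
    return disagreements


# Hypothetical labels for illustration only; not taken from the dataset.
human_labels = {"c1": "Impatience", "c2": "Mocking", "c3": "None"}
gpt4_labels = {"c1": "Impatience", "c2": "Irony", "c3": "None"}

for item in find_disagreements(human_labels, gpt4_labels):
    print(item)  # ('c2', 'Mocking', 'Irony')
```

Each returned tuple is a candidate for manual adjudication; comments on which both sources agree are not surfaced.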
4. Annotated Features
● Tone Bearing Discussion Features (TBDFs), i.e., uncivil features*
○ Bitter frustration, Impatience, Mocking, Irony, Vulgarity, etc.
● Triggers*
○ Failed use of code, Technical disagreements, Communication breakdown, etc.
● Targets*
○ People, Code/Tool, Company/organization, Undirected
● Consequences*
○ Discontinued further discussion, Escalating further, etc.
* C. Miller, S. Cohen, D. Klug, B. Vasilescu, and C. Kästner, "“Did You Miss My Comment or What?” Understanding Toxicity in Open Source Discussions," 2022.
* I. Ferreira, J. Cheng, and B. Adams, "The “Shut the f**k up” Phenomenon: Characterizing Incivility in Open Source Code Review Discussions," 2021.
* J. Sarker, A. K. Turzo, M. Dong, and A. Bosu, "Automated Identification of Toxic Code Reviews Using ToxiCR," 2023.
* Our open coding process
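One way to hold these annotations in memory is sketched below. The class and field names are hypothetical and do not describe the released dataset's file format. The `Target` values are the four targets listed above, while TBDFs, triggers, and consequences stay free-form strings because their lists end in "etc.".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Target(Enum):
    # The four target categories listed in the poster.
    PEOPLE = "People"
    CODE_TOOL = "Code/Tool"
    COMPANY_ORGANIZATION = "Company/organization"
    UNDIRECTED = "Undirected"


@dataclass
class CommentAnnotation:
    comment_id: str
    # Hypothetical field: uncivil TBDF labels such as "Bitter frustration".
    # An empty list means no uncivil feature was annotated.
    tbdfs: List[str] = field(default_factory=list)

    @property
    def is_uncivil(self) -> bool:
        return bool(self.tbdfs)


@dataclass
class ThreadAnnotation:
    thread_id: str
    # Thread-level labels; free-form strings because the category lists
    # are not exhaustive.
    triggers: List[str] = field(default_factory=list)
    targets: List[Target] = field(default_factory=list)
    consequences: List[str] = field(default_factory=list)
    comments: List[CommentAnnotation] = field(default_factory=list)

    def uncivil_comment_count(self) -> int:
        # Number of comments in this thread carrying at least one TBDF.
        return sum(1 for c in self.comments if c.is_uncivil)


# Hypothetical example thread, not taken from the dataset.
thread = ThreadAnnotation(
    thread_id="t1",
    triggers=["Failed use of tool/code or error messages"],
    targets=[Target.PEOPLE],
    consequences=["Discontinued further discussion"],
    comments=[
        CommentAnnotation("c1", ["Bitter frustration", "Impatience"]),
        CommentAnnotation("c2"),
    ],
)
print(thread.uncivil_comment_count())  # 1
```

Keeping triggers, targets, and consequences on the thread and TBDFs on individual comments mirrors the two annotation levels described in the abstract.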
5. Dataset Description
● 1,365 comments annotated with an uncivil feature
● Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs
● Failed use of tool/code or error messages is the most common trigger
● People are the most common target
● Discontinued further discussion is the most common consequence
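Findings such as "the most prevalent TBDFs" are frequency rankings over annotation labels. A minimal sketch, assuming each annotated item is given as a list of its labels (a hypothetical input format) and counting a label at most once per item. The toy input is made up and does not reflect the dataset's real distribution.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_common_labels(
    labels_per_item: Iterable[List[str]], k: int = 3
) -> List[Tuple[str, int]]:
    """Count every label across items and return the k most frequent."""
    counts: Counter = Counter()
    for labels in labels_per_item:
        # set() ensures a label repeated within one item counts once.
        counts.update(set(labels))
    return counts.most_common(k)


# Toy input for illustration; NOT the dataset's real label distribution.
toy_comments = [
    ["Bitter frustration"],
    ["Impatience", "Bitter frustration"],
    ["Mocking"],
    ["Bitter frustration", "Mocking"],
    ["Irony"],
]
print(most_common_labels(toy_comments, k=2))
# [('Bitter frustration', 3), ('Mocking', 2)]
```

Counting each label once per item corresponds to a "number of comments exhibiting X" reading; whether the original analysis counted this way is an assumption.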
6. Summary
● A curated dataset of 404 locked issue threads from 213 GitHub projects
● Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs
● Failed use of tool/code or error messages is the most common trigger
● People are the most common target
● Discontinued further discussion is the most common consequence
Preprint: https://arxiv.org/abs/2307.15631
ramtin.ehsani@drexel.edu
Research Directions
● Automated moderation bot development
● Impact of incivility on project health
● Effectiveness of moderation strategies
● Early warning systems development
● Underrepresented communities' experiences
● Predicting heated thread locking
● Identifying productive intervention points