21st International Conference on Mining Software Repositories
Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads
Ramtin Ehsani, Mia Mohammad Imran, Robert Zita, Kostadin Damevski, Preetha Chatterjee
Drexel University, Virginia Commonwealth University, Elmhurst University
Preprint: https://arxiv.org/abs/2402.04183
imranm3@vcu.edu
Motivation and Research Objective
● Fostering healthy collaborations in OSS is challenging
● Understanding and addressing incivility within OSS discussions
● No comprehensive approach exists for addressing uncivil interactions
● Large annotated SE datasets are lacking
Research Objective: Curate a dataset of locked GitHub issues to enable analysis of incivility in OSS development
Annotated dataset of locked GitHub issue threads with heated discussions
Dataset Annotation
● 404 locked issue threads from 213 GitHub projects, with 5,961 individual comments
● Threads were locked as "too heated" or showed clear characteristics of heated discussions
● A total of 19 annotators
● To further improve the annotation quality, we used GPT-4
● We manually checked the instances of disagreement between GPT-4 and annotators
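The disagreement check described above can be illustrated with a minimal sketch. The `Label` record, its field names, and the sample labels are hypothetical and are not taken from the dataset; the sketch only shows the general idea of routing comments where the human and model labels differ to manual review.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Label:
    comment_id: str
    human: str  # label chosen by a human annotator (hypothetical field)
    model: str  # label suggested by the language model (hypothetical field)


def disagreements(labels: List[Label]) -> List[Label]:
    """Return the comments whose human and model labels differ."""
    return [label for label in labels if label.human != label.model]


# Made-up sample labels, for illustration only.
sample = [
    Label("c1", "Impatience", "Impatience"),
    Label("c2", "Mocking", "Irony"),
    Label("c3", "None", "Bitter frustration"),
]

to_review = disagreements(sample)
print([label.comment_id for label in to_review])
```

Only the comments returned by `disagreements` would need a second manual look; comments where both labels agree pass through unchanged.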
Annotated Features
● Tone Bearing Discussion Features (TBDFs), uncivil features*
○ Bitter frustration, Impatience, Mocking, Irony, Vulgarity, etc.
● Triggers*
○ Failed use of code, Technical disagreements, Communication breakdown, etc.
● Targets*
○ People, Code/Tool, Company/organization, Undirected
● Consequences*
○ Discontinued further discussion, Escalating further, etc.
* C. Miller, S. Cohen, D. Klug, B. Vasilescu, and C. Kästner, "'Did You Miss My Comment or What?' Understanding Toxicity in Open Source Discussions," 2022
* I. Ferreira, J. Cheng, and B. Adams, "The 'Shut the f**k up' Phenomenon: Characterizing Incivility in Open Source Code Review Discussions," 2021
* J. Sarker, A. K. Turzo, M. Dong, and A. Bosu, "Automated Identification of Toxic Code Reviews Using ToxiCR," 2023
* Our open coding process
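One way to picture the four annotation dimensions is as a small record type. This is a sketch under stated assumptions: the class and field names are hypothetical, and placing the TBDF on each comment while trigger, target, and consequence sit on the thread is an assumption of this sketch, not a description of the released file format. Only the target categories are enumerated, because the slide lists them without "etc."

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Target categories as listed on the slide.
TARGETS = {"People", "Code/Tool", "Company/organization", "Undirected"}


@dataclass
class CommentAnnotation:
    comment_id: str
    tbdf: Optional[str] = None  # e.g. "Impatience"; None for a civil comment

    @property
    def is_uncivil(self) -> bool:
        return self.tbdf is not None


@dataclass
class ThreadAnnotation:
    thread_id: str
    trigger: Optional[str] = None
    target: Optional[str] = None
    consequence: Optional[str] = None
    comments: List[CommentAnnotation] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject targets outside the closed list above.
        if self.target is not None and self.target not in TARGETS:
            raise ValueError(f"unknown target: {self.target!r}")

    def uncivil_comments(self) -> List[CommentAnnotation]:
        return [c for c in self.comments if c.is_uncivil]


# Made-up example thread, for illustration only.
thread = ThreadAnnotation(
    thread_id="example-1",
    trigger="Technical disagreements",
    target="People",
    consequence="Discontinued further discussion",
    comments=[CommentAnnotation("c1"), CommentAnnotation("c2", tbdf="Mocking")],
)
print([c.comment_id for c in thread.uncivil_comments()])
```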
Dataset Description
● 1,365 comments annotated with an uncivil feature
● Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs
● Failed use of tool/code or error messages are the most common Trigger
● People are the most common Target
● Discontinued further discussion is the most common Consequence
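The headline counts support two quick derived figures, sketched below. The four constants come from the slides; the `rank_labels` helper and the toy label list are hypothetical and only illustrate how a "most prevalent" ranking could be computed from per-comment labels.

```python
from collections import Counter
from typing import List, Tuple

# Counts reported on the slides.
THREADS, PROJECTS, COMMENTS, UNCIVIL = 404, 213, 5961, 1365

uncivil_share = UNCIVIL / COMMENTS        # fraction of comments with an uncivil feature
comments_per_thread = COMMENTS / THREADS  # mean thread length

print(f"{uncivil_share:.1%}")        # 22.9%
print(f"{comments_per_thread:.1f}")  # 14.8


def rank_labels(labels: List[str]) -> List[Tuple[str, int]]:
    """Rank labels from most to least frequent."""
    return Counter(labels).most_common()


# Made-up labels, for illustration only (not drawn from the dataset).
toy = ["Impatience", "Mocking", "Impatience", "Irony", "Impatience", "Mocking"]
print(rank_labels(toy))
```

Roughly 23% of the annotated comments carry an uncivil feature, and a locked thread averages about 15 comments.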
Summary
● A curated dataset of 404 locked issue threads from 213 GitHub projects
● Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs
● Failed use of tool/code or error messages are the most common trigger
● People are the most common target
● Discontinued further discussion is the most common consequence
Research Directions
● Automated moderation bot development
● Impact of incivility on project health
● Effectiveness of moderation strategies
● Early warning systems development
● Underrepresented communities' experiences
● Predicting heated thread locking
● Identifying productive intervention points
Preprint: https://arxiv.org/abs/2307.15631
ramtin.ehsani@drexel.edu
Preprint: https://arxiv.org/abs/2402.04183
imranm3@vcu.edu
