On the Analysis of Non-Coding Roles in Open Source Development
Javier L. Cánovas Izquierdo, Jordi Cabot
Paper accepted at
EMPIRICAL SOFTWARE ENGINEERING 27, 18 (2022)
Published: November 2nd, 2021
An Empirical Study of NPM Package Projects
OSS Sustainability
Open Source projects suffer from grave sustainability issues: many people use the software, but very few contribute to it.
How can we optimize collaboration?
How can we improve the onboarding process?
Can we “capture” new contributors?
OSS is not only code… it’s community.
How can we enforce the development process?
How can we sustain the community?
…
unsplash/bekir-donmez
Role characterization in GitHub
CODING: DEVELOPER, REVIEWER, MERGER
NON-CODING: REPORTER, COMMENTER, REACTOR
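The six roles and their coding/non-coding split can be made concrete with a small classifier. This is a minimal sketch under stated assumptions: each role is detected from one hypothetical GitHub action type (the action names in ACTION_TO_ROLE are illustrative, not the study's exact operationalization), and a single action is assumed to be enough for a contributor to hold a role.

```python
# Hypothetical mapping from GitHub action types to the six roles.
# The action names are illustrative assumptions, not the study's definitions.
ACTION_TO_ROLE = {
    "commit": "DEVELOPER",
    "review": "REVIEWER",
    "merge": "MERGER",
    "open_issue": "REPORTER",
    "comment": "COMMENTER",
    "reaction": "REACTOR",
}

CODING_ROLES = {"DEVELOPER", "REVIEWER", "MERGER"}
NON_CODING_ROLES = {"REPORTER", "COMMENTER", "REACTOR"}


def roles_of(action_counts: dict[str, int]) -> set[str]:
    """Return every role for which the contributor performed at least one action."""
    return {
        ACTION_TO_ROLE[action]
        for action, count in action_counts.items()
        if count > 0 and action in ACTION_TO_ROLE  # unknown actions are ignored
    }


def role_kind(roles: set[str]) -> str:
    """Classify a set of roles as coding, non-coding, mixed, or inactive."""
    has_coding = bool(roles & CODING_ROLES)
    has_non_coding = bool(roles & NON_CODING_ROLES)
    if has_coding and has_non_coding:
        return "mixed"
    if has_coding:
        return "coding"
    if has_non_coding:
        return "non-coding"
    return "inactive"


# Made-up contributor who only comments and reacts.
print(role_kind(roles_of({"comment": 3, "reaction": 5})))  # non-coding
```

Keeping the coding and non-coding sets separate from the action mapping makes it easy to test the "lack of cross-role configurations" observation: a contributor is cross-role exactly when role_kind returns "mixed".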
Methodology
RESEARCH QUESTIONS
RQ1: What is the role-based activity distribution in OSS?
RQ2: How specialized is the community around each role?
unsplash/rawpixel
DATASET CONSTRUCTION
Steps: Retrieval & cloning → Repository analysis → Graph generation
Inputs, tools and outputs: NPM ecosystem (top 100 repos), SourceCred, analysis tool, collaboration graphs
Dataset: 28,468 users / 38,502 commits / 13,941 issues / 12,312 pull requests / 89,484 comments
APPROACH
General: full set of projects
vs.
Specific: groups of projects by project type (organization vs. individual) and by community size (tiers 1 to 3)
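The graph generation step turns each repository's activity into a collaboration graph. The sketch below is a minimal, hypothetical stand-in for that idea: it assumes a plain edge list linking each user to the artifact (commit, issue, pull request, comment) they acted on, with one edge per action. It does not reproduce SourceCred's actual graph model or API, and CollaborationGraph and its methods are invented for illustration.

```python
from collections import Counter, defaultdict


class CollaborationGraph:
    """Hypothetical user-to-artifact graph; NOT SourceCred's data model."""

    def __init__(self) -> None:
        # user -> list of (action, artifact_id) edges
        self._edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_action(self, user: str, action: str, artifact_id: str) -> None:
        """Record that `user` performed `action` on the artifact `artifact_id`."""
        self._edges[user].append((action, artifact_id))

    def users(self) -> list[str]:
        """Return all users with at least one recorded action, sorted by name."""
        return sorted(self._edges)

    def action_counts(self, user: str) -> Counter:
        """Count how many times `user` performed each action type."""
        # .get avoids inserting unknown users into the defaultdict
        return Counter(action for action, _ in self._edges.get(user, []))

    def artifacts_touched(self, user: str) -> set[str]:
        """Return the distinct artifacts `user` interacted with."""
        return {artifact for _, artifact in self._edges.get(user, [])}


# Made-up activity for a toy repository.
graph = CollaborationGraph()
graph.add_action("alice", "commit", "c1")
graph.add_action("alice", "comment", "i1")
graph.add_action("bob", "reaction", "i1")
print(graph.action_counts("alice"))
```

Per-user action counts are what a role classification needs as input; a shared artifact such as "i1" is where two contributors' activity meets, which is the sense in which the graph captures collaboration.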
unsplash-SvenMieke
Results
RQ1. Role-based Activity Distribution
Activity Distribution Analysis / Prototypical Contributor Profile
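A role-based activity distribution boils down to each role's share of all recorded actions in a project or group of projects. This is a minimal sketch assuming per-contributor action counts are already keyed by role; the figures from the study are not reproduced, and the example data is made up.

```python
from collections import Counter
from collections.abc import Iterable


def aggregate_by_role(per_contributor: Iterable[dict[str, int]]) -> Counter:
    """Sum the per-role action counts of all contributors in a project or group."""
    totals: Counter = Counter()
    for counts in per_contributor:
        totals.update(counts)  # adds counts instead of replacing them
    return totals


def activity_distribution(actions_by_role: dict[str, int]) -> dict[str, float]:
    """Return each role's share of all actions, as a percentage rounded to 0.1."""
    total = sum(actions_by_role.values())
    if total == 0:
        return {role: 0.0 for role in actions_by_role}
    return {role: round(100.0 * n / total, 1) for role, n in actions_by_role.items()}


# Made-up example: two contributors in one hypothetical project.
totals = aggregate_by_role([
    {"COMMENTER": 6, "DEVELOPER": 2},
    {"COMMENTER": 2, "REACTOR": 10},
])
print(activity_distribution(totals))
# {'COMMENTER': 40.0, 'DEVELOPER': 10.0, 'REACTOR': 50.0}
```

Running the same two functions over the full set of projects and over each group (project type, community size) gives comparable distributions, which is how a statement such as "commenters' actions outnumber developers'" can be checked per group.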
Results Summary
RQ1: What is the role-based activity distribution in OSS?
High presence of commenters’ actions (higher than developers’)
Reviewers’ and reactors’ actions grow as the community does
All roles matter, highlighting the complexity of OSS
High collaboration rate
Increasing structure on the development side
Broader participation of non-coding contributors
RQ2. Role Diversity
Role Distribution / Most Common Configuration / Role Migration Paths
Most common configurations: full set of projects
POS  SIZE    GROUP
1    11,759  REACTOR
2    4,988   REPORTER
3    3,565   COMMENTER
4    1,900   COMMENTER + REACTOR
5    1,093   REPORTER + REACTOR
6    800     DEVELOPER
7    519     REPORTER + COMMENTER + REACTOR
…
Most common configurations by project type: ORGANIZATION
POS  SIZE    GROUP
1    8,408   REACTOR
2    2,497   REPORTER
3    2,148   COMMENTER
4    1,259   COMMENTER + REACTOR
5    507     REPORTER + REACTOR
6    366     DEVELOPER
7    328     REPORTER + COMMENTER + REACTOR
…
Most common configurations by project type: INDIVIDUAL
POS  SIZE    GROUP
1    3,351   REACTOR
2    2,491   REPORTER
3    1,417   COMMENTER
4    641     COMMENTER + REACTOR
5    522     REPORTER + REACTOR
6    434     DEVELOPER
7    252     DEVELOPER + MERGER
…
Most common configurations by community size: TIER 1
POS  SIZE    GROUP
1    163     REPORTER
2    73      DEVELOPER
3    67      COMMENTER
4    62      REACTOR
5    51      DEVELOPER + MERGER
6    39      REPORTER + REACTOR
7    21      COMMENTER + REACTOR
…
Most common configurations by community size: TIER 2
POS  SIZE    GROUP
1    1,049   REPORTER
2    783     REACTOR
3    631     COMMENTER
4    276     DEVELOPER
5    221     COMMENTER + REACTOR
6    183     REPORTER + REACTOR
7    123     DEVELOPER + MERGER
…
Most common configurations by community size: TIER 3
POS  SIZE    GROUP
1    10,914  REACTOR
2    3,776   REPORTER
3    2,867   COMMENTER
4    1,658   COMMENTER + REACTOR
5    871     REPORTER + REACTOR
6    460     REPORTER + COMMENTER + REACTOR
7    451     DEVELOPER
…
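The configuration tables rank combinations of roles (e.g., COMMENTER + REACTOR) by how often they occur. A minimal sketch of that ranking, assuming each contributor's set of roles is already known; the canonical label order simply follows the order in which the roles are introduced, and the example data is made up.

```python
from collections import Counter
from collections.abc import Iterable

# Canonical display order, following the order in which the roles are introduced.
ROLE_ORDER = ["DEVELOPER", "REVIEWER", "MERGER", "REPORTER", "COMMENTER", "REACTOR"]


def configuration_label(roles: Iterable[str]) -> str:
    """Join a contributor's roles into one label, e.g. 'COMMENTER + REACTOR'."""
    present = set(roles)
    return " + ".join(role for role in ROLE_ORDER if role in present)


def most_common_configurations(
    roles_per_contributor: Iterable[Iterable[str]], top: int = 7
) -> list[tuple[int, int, str]]:
    """Rank role configurations by how many contributors show them.

    Returns (POS, SIZE, GROUP) rows; ties keep first-seen order.
    """
    counts = Counter(configuration_label(roles) for roles in roles_per_contributor)
    counts.pop("", None)  # contributors without any role are not a configuration
    return [
        (pos, size, group)
        for pos, (group, size) in enumerate(counts.most_common(top), start=1)
    ]


# Made-up contributors.
sample = [{"REACTOR"}, {"REACTOR"}, {"REPORTER"}, {"COMMENTER", "REACTOR"}]
for row in most_common_configurations(sample):
    print(*row)
```

A canonical label matters because sets have no stable order; without it, {"REACTOR", "COMMENTER"} and {"COMMENTER", "REACTOR"} could be counted as two different groups.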
Results Summary
RQ2: How specialized is the community around each role?
Projects are diverse, with a high presence of reactors, commenters and reporters → presence of non-coding roles
Reactors, commenters and reporters often appear in a one-role configuration → entry point for people joining the project
One-role configurations persist or move to other non-coding roles → potentially low onboarding rate
Lack of cross-role configurations combining coding and non-coding roles → specialization
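Observing that one-role configurations persist or move to other non-coding roles requires tracking role migration paths. A minimal sketch of counting such transitions, assuming (hypothetically) that each contributor's history is available as a chronological sequence of configuration labels, for example one label per time window; the study's actual procedure for deriving migration paths is not reproduced here.

```python
from collections import Counter
from collections.abc import Sequence


def migration_paths(histories: dict[str, Sequence[str]]) -> Counter:
    """Count transitions between consecutive role configurations.

    `histories` maps each contributor to a chronological sequence of
    configuration labels (a hypothetical input format). Repeats are counted
    as (A, A) transitions so that persistence stays visible.
    """
    transitions: Counter = Counter()
    for sequence in histories.values():
        for before, after in zip(sequence, sequence[1:]):
            transitions[(before, after)] += 1
    return transitions


def persistence_rate(transitions: Counter, configuration: str) -> float:
    """Share of transitions leaving `configuration` that stay in it."""
    outgoing = sum(n for (src, _), n in transitions.items() if src == configuration)
    if outgoing == 0:
        return 0.0
    return transitions[(configuration, configuration)] / outgoing


# Made-up histories for three contributors.
paths = migration_paths({
    "u1": ["REACTOR", "REACTOR", "COMMENTER + REACTOR"],
    "u2": ["REPORTER", "REPORTER"],
    "u3": ["REACTOR"],
})
print(persistence_rate(paths, "REACTOR"))  # 0.5
```

A high persistence rate for REACTOR or REPORTER, combined with few transitions into configurations that include DEVELOPER, is the kind of signal that the summary reads as a potentially low onboarding rate.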
unsplash-kelly-sikkema
Discussion
Photos from Unsplash by Jamie Street, Alvaro Reves, Iyan Kurnia, Chuttersnap, M.B.M. (top to bottom)
IMPROVE ONBOARDING
Situation: Efforts to attract and onboard new contributors clearly target developers.
Why not focus on non-coding contributors and then, perhaps, incentivize them to participate in coding tasks?
GOVERNANCE OF NON-CODING CONTRIBUTORS
Situation: Governance rules (e.g., contributing.md) focus mainly on coding contributors.
How can non-coding contributions be made more visible on code hosting platforms?
PROMOTION OF MIGRATION PATHS
Situation: There is little information about the roles in a project and how (and where) contributors in each role are welcome.
Would it be possible to identify “careers” within the project?
METHODS TO VISUALIZE CONTRIBUTIONS
Situation: It is hard to know the roles played by contributors in OSS projects.
Could graphical representations (e.g., our radar graphs) help profile contributors beyond coding tasks?
TEMPORAL ANALYSIS
Situation: Most empirical analyses focus on a single project snapshot.
How could we leverage the temporal dimension of OSS project activities?
Thanks!
Javier L. Cánovas Izquierdo
jcanovasi@uoc.edu
@jlcanovas
Jordi Cabot
jordi.cabot@icrea.cat
@jordiCabot
Except where otherwise noted, content on this presentation is licensed under a Creative Commons Attribution 4.0 International license.