The document discusses a study examining whether core teams of GitHub projects follow the Pareto principle, which states that 80% of consequences come from 20% of causes. The study collected and analyzed data from over 8.5 million GitHub repositories to identify core team members and their activities. It found that more than half of projects did not follow the Pareto principle and most projects had 15 or fewer core developers. There were no major differences found between the activities of core and non-core developers.
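As a rough illustration of the check at the heart of the study, the sketch below (with made-up commit counts for a hypothetical project) tests whether the top 20% of committers author at least 80% of the commits:

```python
# Pareto-principle check: do the top 20% of committers account for
# at least 80% of a project's commits? (Illustrative sketch; the
# commit counts below are made up.)

def follows_pareto(commits_per_dev, dev_share=0.2, work_share=0.8):
    counts = sorted(commits_per_dev.values(), reverse=True)
    n_core = max(1, round(len(counts) * dev_share))
    core_commits = sum(counts[:n_core])
    return core_commits / sum(counts) >= work_share

repo = {"alice": 420, "bob": 310, "carol": 45, "dave": 30, "erin": 12}
print(follows_pareto(repo))  # False: the top 20% author ~51%, not 80%
```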
This document discusses Agile, DevOps, and their implementation at USPTO. It provides background on Agile being a lightweight framework based on the Agile Manifesto. DevOps aims to improve collaboration between development and operations teams through practices like automation. USPTO adopted DevOps to enable continuous rapid development through continuous rapid deployment, overcoming barriers of legacy production processes. The document outlines USPTO's DevOps journey, including adopting practices like a deployment pipeline and production monitoring. It also discusses top challenges to DevOps adoption like fear of failure and bureaucracy, and how to start small and show value to gain support.
Getting Started With Selenium at Shutterstock (Sauce Labs)
The document discusses getting started with Selenium automation at Shutterstock. It outlines Shutterstock's goals of building an enterprise-strength automation platform quickly to catch bugs earlier. Shutterstock chose Selenium due to its open source nature and partnered with an automation company to meet its goals rapidly during a period of growth. After several months, Shutterstock has developed over 600 automated test cases that run in under 2.5 hours and aims to continue expanding its automation program.
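For readers unfamiliar with Selenium, a minimal WebDriver test in Python might look like the following (Selenium 4 API; the page and assertion are illustrative, not Shutterstock's actual test code):

```python
# Minimal Selenium WebDriver check (Selenium 4 API).
# The page URL and assertion below are illustrative only.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # requires a local Chrome installation
try:
    driver.get("https://example.com")
    heading = driver.find_element(By.TAG_NAME, "h1")
    assert "Example Domain" in heading.text, "unexpected page heading"
finally:
    driver.quit()
```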
The document summarizes the top takeaways from the AGILE2017 conference. It discusses trends seen at the conference around topics like leadership, expanding agile practices beyond engineering, whole team involvement in UX, containerized microservices enabling NoOps, the value of ATDD/BDD, and re-teaming of teams. It also covers sessions on estimating time/cost using statistical techniques and the ongoing debate around #NoEstimates.
Community and Code: Lessons from NESCent Hackathons (Arlin Stoltzfus)
Hackathons are an explosive trend, but why? What makes them work? What do they accomplish? How do I organize a hackathon for maximum effectiveness? In spite of the popularity of hackathons, there has been very little systematic research into what makes them valued and successful. This slide deck provides an overview of conclusions drawn from studying a series of well-documented hackathons sponsored by the National Evolutionary Synthesis Center from 2006 to 2015. For more online resources, see https://nescent.github.io/community-and-code/.
This document discusses choosing an agile methodology for software development projects. It provides an overview of various agile methodologies like Scrum, Kanban, Scrumban and SAFe. It emphasizes that there is no single best methodology and that factors like team size, requirements uncertainty, backlog size and maintenance needs should be considered. The document recommends establishing a project evaluation committee to help organizations select the most appropriate methodology based on these factors to improve project success rates.
The document discusses using supercomputers like the FX10 to mine large software datasets in mining software repositories (MSR) research. It describes a case study in which code clone detection was performed on the Apache CXF project using two desktop computers and the FX10 supercomputer. The FX10 was significantly faster, finishing the task in 42 seconds compared to over 2 hours for the desktops. Challenges of scaling MSR analysis to very large datasets, such as the entire UCI repository, are also discussed.
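To see why this workload suits a supercomputer, note that clone detection shards cleanly by file. The toy sketch below hashes k-line windows of normalized code and reports repeats; real clone detectors are far more sophisticated, but the per-file indexing step parallelizes the same way:

```python
# Toy clone detector: hash k-line windows of normalized code and
# report windows that occur more than once. Real tools (token-based
# detectors, etc.) are far more sophisticated.
from collections import defaultdict

def find_clones(files, k=5):
    index = defaultdict(list)  # window hash -> [(path, start_line)]
    for path, text in files.items():
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        for i in range(len(lines) - k + 1):
            key = hash("\n".join(lines[i:i + k]))
            index[key].append((path, i + 1))
    return [locs for locs in index.values() if len(locs) > 1]

# Each file is indexed independently, so the work can be spread
# across the many cores of a machine like the FX10.
```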
This document discusses research on improving bug prediction models by accounting for development effort. It finds that process metrics remain better predictors than product metrics in effort-aware models. While previous research found package-level predictions more effective than file-level ones, this study finds that file-level predictions are more effective once effort is accounted for. Three approaches to producing package-level predictions were compared: directly using package metrics, lifting file metrics to the package level, and lifting file predictions to the package level; lifting file predictions to the package level yielded the best performance.
A Study of the Quality-Impacting Practices of Modern Code Review at Sony Mobile (SAIL_QU)
Sony Mobile uses code review tools like Gerrit to facilitate code reviews of commits. The study found that components with a higher ratio of third-party code and those where developers frequently self-approved or self-verified their own code without peer review were more defect-prone. Additionally, components with high rates of code patches after initial approval tended to be less defect-prone. Qualitative interviews with developers validated these findings and indicated that external code takes more time and effort to understand, third-party bias may impact self-reviews, and in-person communication improves code quality over tools alone. Sony Mobile is now discouraging self-verification, encouraging passive reviewers to participate more, and focusing QA testing on external code coverage.
Defect Prediction: Accomplishments and Future Challenges (Yasutaka Kamei)
The document discusses the accomplishments and future challenges of defect prediction in software engineering. It provides an overview of defect prediction, including leveraging data from repositories to measure source code metrics and build prediction models. Major accomplishments include increased data availability and openness, the ability to extract various metric types, and improved modeling performance. However, challenges remain such as keeping up with fast development paces and making models more accessible. The document argues that future areas of focus include defect prediction for mobile apps and integrating just-in-time models into continuous integration processes.
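A minimal sketch of the modeling pipeline described above, assuming a hypothetical CSV of mined per-module process metrics (churn, number of authors, prior fixes) with a buggy label:

```python
# Sketch: train and evaluate a defect prediction model on
# hypothetical per-module process metrics mined from a repository.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

data = pd.read_csv("module_metrics.csv")  # hypothetical mined dataset
features = ["churn", "num_authors", "prior_fixes"]  # process metrics
X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["buggy"], test_size=0.3, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC = {auc:.2f}")  # a common evaluation metric in defect prediction
```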
This study analyzed 10 large open source projects to understand build system maintenance effort. It found that build systems accounted for around 9% of total files on average. Build code evolved at a similar rate to source code, with some projects experiencing higher build churn. Changes to build and source code were often logically coupled, with some work items affecting both. Responsibility for build maintenance was usually distributed across developers rather than concentrated in a small team. The findings suggest build systems require significant effort to maintain and that tool support could help address this.
An Automated Approach for Recommending When to Stop Performance Tests (SAIL_QU)
Performance issues are often the cause of failures in today's large-scale software systems. These issues make performance testing essential during software maintenance. However, performance testing faces many challenges. One challenge is determining how long a performance test must run. Although performance tests often run for hours or days to uncover performance issues (e.g., memory leaks), much of the data that is generated during a performance test is repetitive. Performance analysts can stop their performance tests (to reduce the time to market and the costs of performance testing) if they know that continuing the test will not provide any new information about the system's performance. To assist performance analysts in deciding when to stop a performance test, we propose an automated approach that measures how much of the data generated during a performance test is repetitive. Our approach then provides a recommendation to stop the test when the data becomes highly repetitive and the repetitiveness has stabilized (i.e., little new information about the system's performance is being generated).
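One way to make "repetitiveness" concrete is to compare each new window of a performance counter against the history collected so far and recommend stopping once the distance stays small. The sketch below illustrates the idea only; it is not the paper's exact algorithm, and the window size and threshold are arbitrary:

```python
# Sketch: recommend stopping a performance test once new windows of a
# performance counter (e.g., response time) stop adding information.
# Illustrative only; not the paper's exact algorithm.
from scipy.stats import wasserstein_distance

def should_stop(samples, window=100, threshold=0.05, stable_for=3):
    if len(samples) < 2 * window:
        return False  # not enough data to judge repetitiveness yet
    distances = []
    for end in range(2 * window, len(samples) + 1, window):
        seen, new = samples[:end - window], samples[end - window:end]
        distances.append(wasserstein_distance(seen, new))
    # Stop when the last few windows are all close to the history.
    recent = distances[-stable_for:]
    return len(recent) == stable_for and all(d < threshold for d in recent)
```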
A Holistic Approach to Evolving Software Systems (Michele Lanza)
The document discusses a holistic approach to evolving software systems over time. It addresses how systems will scale to much larger sizes in 10-15 years, how they will be structured and evolved, and how to ensure their dependability, security, safety and reliability as they grow increasingly complex. The author advocates considering all aspects of a system's development and usage when planning how it will adapt to future changes and challenges.
An Empirical Study of Goto in C Code from GitHub Repositories (SAIL_QU)
Developers still use goto statements in practice despite arguments against them. This study analyzed over 11,000 GitHub projects and found goto statements in around 11% of C files. Goto statements were primarily used for error handling and cleanup. The study also analyzed commit histories of 6 projects and found that developers rarely remove or modify goto statements, even when fixing post-release bugs. This suggests that while goto statements have drawbacks, developers still find them useful for certain tasks like error handling.
JVM JIT compilation overview by Vladimir Ivanov (ZeroTurnaround)
The document provides an overview of JVM JIT-compilers, including:
- JIT-compilers in the HotSpot JVM dynamically compile bytecode to native machine code during program execution for improved performance compared to interpretation alone.
- JIT-compilers use profiling information gathered during execution to perform aggressive optimizations like inlining and devirtualization.
- The monitoring and debugging of JIT-compilers in the HotSpot JVM can be done using options like -XX:+PrintCompilation, -XX:+PrintInlining, and -XX:+PrintAssembly.
This document summarizes a presentation about balancing speed and quality in DevOps. It includes an agenda for guest speakers from Forrester Research and Quali discussing building a DevOps operating model, challenges of DevOps in enterprises, and Quali's cloud sandbox approach. The presentation covers topics like DevOps challenges of adopting new technologies while maintaining quality, the need for speed in development while minimizing risk, and how cloud sandboxes can help provide configurable test environments to move fast but reduce risks.
It's all about feedback - code review as a great tool in the agile toolbox (Stefan Lay)
This document discusses how code review can be a valuable tool for agile teams. It provides arguments for how code review complements pair programming by allowing for asynchronous feedback from multiple reviewers. It also describes best practices for code review, such as keeping changes small and focused. The document advocates for using Git and Gerrit to facilitate code review at scale across large projects and multiple teams. Standardization of infrastructure and processes like contributor guides are highlighted as important for collaboration.
Using Github Insight as metric for the Developer collaboration and work metri... (Najib Radzuan)
Companies usually use a Jira board to gauge sprint or team performance, but GitHub Insights can also show whether a developer is doing a great job or the opposite. During the COVID-19 pandemic, with most developers working from home, GitHub Insights made it easier to see all the activity and collaboration between developers.
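The contribution data behind GitHub Insights is also available programmatically. A small sketch using the GitHub REST API (the repository is illustrative; pass an auth token for real use):

```python
# Sketch: list per-contributor commit totals via the GitHub REST API.
# The repository is illustrative; add an auth token for real use.
import requests

url = "https://api.github.com/repos/octocat/hello-world/stats/contributors"
resp = requests.get(url, headers={"Accept": "application/vnd.github+json"})
resp.raise_for_status()

if resp.status_code == 202:  # GitHub is still computing the statistics
    print("stats not ready yet; retry shortly")
else:
    for contributor in resp.json():
        print(contributor["author"]["login"], contributor["total"], "commits")
```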
The document introduces using GitHub for team collaboration on projects. It outlines topics like organizing team members and permissions, code merging workflows using the fork and pull request model, managing issues, and conducting code reviews. The goal is for teams of 3-4 people to simultaneously contribute to a shared code repository while using GitHub features for project management and version control.
[DSC Croatia 22] How we create and leverage data services in GitLab - Radovan... (DataScienceConferenc1)
A walkthrough of the process of creating, improving, and scaling a data product using a modern DevOps stack. It exposes the details of a use case showing how we embrace the open-source philosophy to achieve faster time to market, and discusses how we use the advantage of internal products to be more agile in the daily job of creating great data products.
This document is a slide deck presentation about enabling agility through DevOps. It discusses why DevOps is needed, defines DevOps, and outlines how practices like continuous integration, continuous delivery, and infrastructure as code can help enable faster delivery, higher quality, and more stable environments. It also provides recommendations for adopting DevOps and getting started with a DevOps transformation.
Code review is one of the crucial software activities where developers and stakeholders collaborate with each other in order to assess software changes. Since code review processes act as a final gate for new software changes to be integrated into the software product, intense collaboration is necessary in order to prevent defects and produce high-quality software products. Recently, code review analytics has been implemented in projects (for example, StackAnalytics of the OpenStack project) to monitor the collaboration activities between developers and stakeholders in the code review processes. Yet, due to the large volume of software data, code review analytics can only report a static summary (e.g., counting), while neither insights nor instant suggestions are provided. Hence, to better gain valuable insights from software data and help software projects make better decisions, we conduct an empirical investigation using statistical approaches. In particular, we use the large-scale data of 196,712 reviews spread across the Android, Qt, and OpenStack open source projects to train a prediction model in order to uncover the relationship between the characteristics of software changes and the likelihood of having poor code review collaborations. We extract 20 patch characteristics which are grouped along five dimensions, i.e., software change properties, review participation history, past involvement of a code author, past involvement of reviewers, and review environment. To validate our findings, we use the bootstrap technique, which repeats the experiment 1,000 times. Due to the large volume of studied data and the intensive computation of characteristic extraction and finding validation, the use of High-Performance Computing (HPC) resources is mandatory to expedite the analysis and generate insights in a timely manner. Through our case study, we find that the amount of review participation in the past and the description length of software changes are significant indicators that new software changes will suffer from poor code review collaborations [2017]. Moreover, we find that the purpose of introducing new features can increase the likelihood that new software changes will receive late collaboration from reviewers. Our findings highlight the need for software change submission policies that monitor these characteristics in order to help software projects improve the quality of their code review processes. Moreover, based on our findings, future work should develop real-time code review analytics implemented on HPC resources in order to instantly provide insights and suggestions to software projects.
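A condensed sketch of the bootstrap validation described above, using out-of-sample rows as the test set on each iteration (the column names are hypothetical, and 100 repetitions stand in for the study's 1,000):

```python
# Sketch: bootstrap validation of a model linking patch characteristics
# to poor code review collaboration. Column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.utils import resample

data = pd.read_csv("review_patches.csv")  # hypothetical mined dataset
features = ["description_length", "past_participation", "is_new_feature"]
aucs = []
for seed in range(100):  # the study uses 1,000 repetitions
    boot = resample(data, replace=True, random_state=seed)
    test = data.loc[data.index.difference(boot.index)]  # out-of-sample rows
    model = LogisticRegression(max_iter=1000).fit(
        boot[features], boot["poor_collaboration"])
    aucs.append(roc_auc_score(
        test["poor_collaboration"], model.predict_proba(test[features])[:, 1]))
print(f"median AUC = {np.median(aucs):.2f}")
```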
DevOps is a combination of cultural philosophies, practices, and tools that increases an organization's ability to deliver applications and services at high velocity. The DevOps lifecycle includes seven phases: continuous development, continuous integration, continuous testing, continuous delivery, continuous deployment, continuous monitoring, and continuous feedback. Continuous integration involves committing code changes frequently and building and testing the code continuously to identify problems early.
An Ultimate Guide To Hire Python Developer (RishiVardhaniM)
Finding a Python developer is not as easy as it sounds. There are many factors that come into play when hiring a developer. This guide will help you find the best Python developer for your project.
https://www.hackerearth.com/recruit/resources/e-books/hire-python-developer/
Governance for AEM/CMS Projects
- Document a best-practice project framework
- Demonstrate a successful implementation
- List key lessons learned and gotchas
- Help answer questions to avoid pitfalls and reduce the learning curve
- Bring together a community of professionals
- Develop a better understanding of running projects efficiently
- Enable a collaborative development process
Open Source Contribution Policies That Don't Suck (Tobie Langel)
Open source contribution policies are long, boring, overlooked documents that generally suck. They're designed to protect the company at all costs, but in the process they end up hurting engineering productivity and morale. Sometimes they even unknowingly put corporate IP at risk.
But that's not inevitable.
It's possible to write open source contribution policies that make engineers' lives easier, boost morale and productivity, reduce attrition, and attract new talent. And it's possible to do so while reducing the company's IP risk, not increasing it.
In this talk, we'll look at the general structure of contribution policies, examples in the wild, and tactics to make them suck less.
We'll also look at how to turn these policies into self-service software, preventing the tedious email back and forth between engineering and legal in most cases and making open source contribution a breeze.
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017 (Caserta)
Over the past eight or nine years, applying DevOps practices to various areas of technology within business has grown in popularity and produced demonstrable results. These principles are particularly fruitful when applied to a data analytics environment. Bob Eilbacher explains how to implement a strong DevOps practice for data analysis, starting with the necessary cultural changes that must be made at the executive level and ending with an overview of potential DevOps toolchains. Bob also outlines why DevOps and disruption management go hand in hand.
Topics include:
- The benefits of a DevOps approach, with an emphasis on improving quality and efficiency of data analytics
- Why the push for a DevOps practice needs to come from the C-suite and how it can be integrated into all levels of business
- An overview of the best tools for developers, data analysts, and everyone in between, based on the business’s existing data ecosystem
- The challenges that come with transforming into an analytics-driven company and how to overcome them
- Practical use cases from Caserta clients
This presentation was originally given by Bob at the 2017 Strata Data Conference in New York City.
Jay Lyman (451 Research), Brent Beer (GitHub), and Steven Anderson (Sendachi) talk about these topics:
- Cloud, DevOps, agile development capability, and adoption of containers are all important in both perception and reality.
- Enterprise adoption of cloud computing, DevOps, agile development, and containers is growing, including in production use.
- Modernizing applications to SaaS and migrating them to the cloud are just as important as net-new, so-called ‘cloud-native’ applications.
- Advantages and benefits of these technologies and methodologies center on flexibility and speed, cost reduction, improvements in resiliency and reliability, and fitness for new/emerging applications.
- Barriers center on lack of internal skills, immaturity, lack of familiarity, satisfaction with current technology, cost, and security.
'Open source contribution policies that don’t suck!' (Shane Coughlan)
This document discusses open source contribution policies and provides guidance on creating a policy that is effective for both legal and engineering teams. It notes that having no policy does not mean having no rules, and that a policy can be too restrictive. An ideal policy is permissive, explicit, informative, frictionless, and minimizes risk while being consistently followed. The document outlines different considerations for using open source, contributing outside of work, contributing at work, patching, and releasing open source. It recommends treating the policy like an app to streamline the process and use data to promote open source activity.
Automatic Identification of Informative Code in Stack Overflow Posts (Preetha Chatterjee)
Despite Stack Overflow’s popularity as a resource for solving coding problems, identifying relevant information from an individual post remains a challenge. The overload of information in a post can make it difficult for developers to identify specific and targeted code fixes. In this paper, we aim to help users identify informative code segments, once they have narrowed down their search to a post relevant to their task. Specifically, we explore natural language-based approaches to extract problematic and suggested code pairs from a post. The goal of the study is to investigate the potential of designing a browser extension to draw the readers’ attention to relevant code segments, and thus improve the experience of software engineers seeking help on Stack Overflow.
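The raw material for such approaches is straightforward to obtain. A small sketch that pulls code segments out of a post's HTML body (deciding which segments form problematic/suggested pairs is the paper's contribution and is not shown here):

```python
# Sketch: extract candidate code segments from a Stack Overflow post's
# HTML body. Pairing them as problematic/suggested code would happen
# downstream and is not shown here.
from bs4 import BeautifulSoup

def extract_code_segments(post_html: str) -> list[str]:
    soup = BeautifulSoup(post_html, "html.parser")
    # Stack Overflow renders code in <code> (multi-line inside <pre>).
    return [block.get_text() for block in soup.find_all("code")]

html = "<p>Try this:</p><pre><code>for x in xs:\n    print(x)</code></pre>"
print(extract_code_segments(html))  # ['for x in xs:\n    print(x)']
```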
PMI Thailand: DevOps / Roles of Project Manager (20-May-2020) (Gonzague PATINIER)
DevOps seems to be the latest ‘buzzword’ and trend in the IT industry. This is driven by business needs for ever-faster deployment of new functionality and frustrations with the time and effort it takes to get new systems into operations. It is no longer a question of ‘should we adopt DevOps’, but ‘when and how’?
DevOps represents a significant cultural and behavioral change and many organizations fail to address this in their adoption. Gartner defines DevOps as a change in IT culture, focusing on rapid IT service delivery through the adoption of agile, lean practices in the context of a system-oriented approach. These culture changes include organization changes, impacting structure, roles and responsibilities.
What is the role of the project manager in organizations that have transitioned to DevOps, and where does it fit? Join us as we discuss DevOps and answer your questions, followed by an informative discussion.
The document discusses how GitLab.com builds its data services and products. It describes how GitLab.com uses its own DevOps platform to build an Enterprise Data Platform that analyzes data from GitLab.com. The data team faces challenges around scaling, visibility, and speed. To address these, the team takes actions like open sourcing tools, adopting DevOps practices, and establishing roles, processes, and technologies to build a trusted data model and framework. The key takeaways emphasize continuous iteration, discipline, automation, and living the company values.
Are you a:
- University student or fresh graduate wishing to pursue a career in DevOps and want to prepare for it?
- Software Engineer (developer, tester, etc.) who is curious about DevOps?
- Software Engineer (developer, tester, etc.) wishing to switch from his/her current role to a DevOps related role?
This session is just for you!
Check out the video on YouTube at https://www.youtube.com/watch?v=yYWEOdORH40
A practical success story of building a DevOps culture from scratch in a product company with a classical development team: growing T-shaped skills, the knowledge-sharing practices used, and the tools for building an efficient delivery ecosystem.
https://xpdays.com.ua/programs/devops-applied-survival-guide/
Similar to Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects
Studying the Integration Practices and the Evolution of Ad Libraries in the G... (SAIL_QU)
In-app advertisements have become a major source of revenue for app developers in the mobile app economy. Ad libraries play an integral part in this ecosystem, as app developers integrate these libraries into their apps to display ads. However, little is known about how app developers integrate these libraries with their apps and how these libraries have evolved over time.
In this thesis, we study the ad library integration practices and the evolution of such libraries. To understand the integration practices of ad libraries, we manually study apps and derive a set of rules to automatically identify four strategies for integrating multiple ad libraries. We observe that integrating multiple ad libraries commonly occurs in apps with a large number of downloads and in categories with a high percentage of apps that display ads. We also observe that app developers prefer to manage their own integrations instead of using the off-the-shelf features that ad libraries provide for integrating multiple ad libraries.
To study the evolution of ad libraries, we conduct a longitudinal study of the 8 most popular ad libraries. In particular, we look at their evolution in terms of size, the main drivers for releasing a new ad library version, and their architecture. We observe that ad libraries are continuously evolving with a median release interval of 34 days. Some ad libraries have grown exponentially in size (e.g., Facebook Audience Network ad library), while other libraries have worked to reduce their size. To study the main drivers for releasing an ad library version, we manually study the release notes of the eight studied ad libraries. We observe that ad library developers continuously update their ad libraries to support a wider range of Android versions (i.e., to ensure that more devices can use the libraries without errors). Finally, we derive a reference architecture for ad libraries and study how the studied ad libraries diverged from this architecture during our study period.
Our findings can assist ad library developers to understand the challenges for developing ad libraries and the desired features of these libraries.
Improving the testing efficiency of selenium-based load tests (SAIL_QU)
Slides for a paper published at AST 2019:
Shahnaz M. Shariff, Heng Li, Cor-Paul Bezemer, Ahmed E. Hassan, Thanh H. D. Nguyen, and Parminder Flora. 2019. Improving the testing efficiency of selenium-based load tests. In Proceedings of the 14th International Workshop on Automation of Software Test (AST '19). IEEE Press, Piscataway, NJ, USA, 14-20. DOI: https://doi.org/10.1109/AST.2019.00008
Studying User-Developer Interactions Through the Distribution and Reviewing M... (SAIL_QU)
This document discusses studying user-developer interactions through the distribution and reviewing mechanisms of the Google Play Store. It analyzes emergency updates made by developers to fix issues, the dialogue between users and developers through reviews and responses, and how the reviewing mechanism can help identify good and bad updates. The study found that responding to reviews is six times more likely to increase an app's rating, with 84% of rating increases going to four or five stars. Three common patterns of developer responses were identified: responding to negative or long reviews, only negative reviews, and reviews shortly after an update.
Studying online distribution platforms for games through the mining of data f... (SAIL_QU)
Our studies of Steam platform data provided insights into online game distribution:
1) Urgent game updates were used to fix crashes, balance issues, and functionality; frequent updaters released more 0-day patches.
2) The Early Access model attracted indie developers and increased game participation; reviews were more positive during Early Access.
3) Game reviews were typically short and in English; sales increased review volume more than new updates; negative reviews came after longer play.
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi... (SAIL_QU)
This study analyzed factors that impact the speed of questions receiving accepted answers on four popular Stack Exchange websites: Stack Overflow, Mathematics, Ask Ubuntu, and Super User. The researchers examined question, answerer, asker, and answer factors from over 150,000 questions. They built classification models and found that key factors for fast answers included the past speed of answerers, length of the question, and past speed of answers for the question's tags. The models achieved AUCs of 0.85-0.95. Fast answers relied heavily on answerers, especially frequent answerers. The study suggests improving incentives for non-frequent and more difficult questions to attract diverse answerers.
Investigating the Challenges in Selenium Usage and Improving the Testing Effi... (SAIL_QU)
Selenium is a popular tool for browser-based automation testing. The author analyzes the challenges of using Selenium by mining Selenium questions on Stack Overflow. Programming language-related questions, especially for Java and Python, are the most common and are growing the fastest. Less than half of the questions receive accepted answers, and questions about browsers and components take the longest. In the second part, the author develops an approach to improve the efficiency of Selenium-based load testing by sharing browsers among user instances. This increases the number of error-free users by 20-22% while reducing memory usage.
Mining Development Knowledge to Understand and Support Software Logging Pract... (SAIL_QU)
This document summarizes Heng Li's PhD thesis on mining development knowledge to understand and support software logging practices. It discusses how logging code is used to record runtime information but can be difficult for developers to maintain. The thesis aims to understand current logging practices and develop tools by mining change history, source code, issue reports, and other development knowledge. It presents research that analyzes logging-related issues to identify developers' logging concerns, uses code topics and structure to predict where logging statements should be added, leverages code changes to suggest when logging code needs updating, and applies machine learning models to recommend appropriate log levels.
Which Log Level Should Developers Choose For a New Logging Statement? (SAIL_QU)
The document discusses choosing an appropriate log level when adding a new logging statement. It finds that an ordinal regression model can effectively model log levels, achieving an AUC of 0.76-0.81 in within-project evaluation and 0.71-0.8 in cross-project evaluation. The most influential factors for determining log levels vary between projects and include metrics related to the logging statement, containing code block, and file as well as code change and historical change metrics.
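Because log levels are ordered (e.g., trace < debug < info < warn < error), an ordinal model is a natural fit. A sketch using statsmodels' OrderedModel on hypothetical metric columns (not the study's exact feature set):

```python
# Sketch: ordinal regression over ordered log levels using statsmodels
# (>= 0.13). The CSV file and metric columns are hypothetical.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

data = pd.read_csv("logging_statements.csv")  # hypothetical mined dataset
levels = ["trace", "debug", "info", "warn", "error"]
y = data["log_level"].astype(
    pd.CategoricalDtype(categories=levels, ordered=True))

model = OrderedModel(y, data[["block_depth", "file_churn"]], distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())  # coefficients show each metric's influence
```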
Towards Just-in-Time Suggestions for Log Changes (SAIL_QU)
The document presents a study on providing just-in-time suggestions for log changes when developers make code changes. The researchers analyzed over 32,000 log changes from 4 systems. They found 20 reasons for log changes that fall into 4 categories: block changes, log improvements, dependence-driven changes, and logging issues. A random forest classifier using 25 software metrics related to code changes, history, and complexity achieved 0.84-0.91 AUC in predicting whether a log change is needed. Change metrics and product metrics were the most influential factors. The study aims to help developers make better logging decisions for failure diagnosis.
The Impact of Task Granularity on Co-evolution Analyses (SAIL_QU)
The document discusses how task granularity at different levels (e.g., commits, pull requests, work items) can impact analyses of co-evolution in software projects. It finds that commit-level analysis can overlook relationships between tasks that span multiple commits. Work-item-level analysis is recommended to provide a more complete view of co-evolution: a median of 29% of work items consist of multiple commits, and analyzing at the commit level would miss 24% of co-changed files and fail to group 83% of related commits.
How are Discussions Associated with Bug Reworking? An Empirical Study on Open... (SAIL_QU)
1) Initial bug fix discussions with more comments and more developers participating are more likely to experience later bug reworking through re-opening or re-patching of the bug.
2) Manual analysis found that defective initial fixes and failure to reach consensus in discussions contributed to later reworking.
3) For re-opened bugs, initial discussions focused on addressing a particular problem through a burst of comments, while re-patched bugs lacked thorough code review and testing during the initial fix period.
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q... (SAIL_QU)
This study examined the relationship between mobile device attributes and user-perceived quality of Android apps. The researchers analyzed 150,373 star ratings from Google Play across 30 devices and 280 apps. They found that the perceived quality of apps varies across devices, and having better characteristics of an attribute does not necessarily correlate with higher quality. Device OS version, resolution, and CPU showed significant relationships with ratings, as did some app attributes like lines of code and number of inputs. However, some device attributes had stronger relationships than app attributes.
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C... (SAIL_QU)
This document presents the results of a large-scale study on the impact of feature selection techniques on defect classification models. The study used expanded scopes including multiple datasets from NASA and PROMISE with different feature types, more classification techniques from different paradigms, and additional feature selection techniques. The results show that correlation-based feature subset selection techniques like FS1 and FS2 consistently appear in the top ranks across most of the datasets, projects within the datasets, and classification techniques. The document concludes that future defect classification studies should consider applying correlation-based feature selection techniques.
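Correlation-based feature subset selection rewards features that correlate with the defect label while penalizing redundancy with already-selected features. The greedy sketch below captures that intuition in simplified form; the studied CFS technique uses a related but more elaborate merit formula:

```python
# Sketch: greedy correlation-based feature selection. It favors features
# that correlate with the defect label but not with already-chosen
# features -- a simplified take on the CFS merit heuristic.
import pandas as pd

def select_features(X: pd.DataFrame, y: pd.Series, k: int = 5) -> list[str]:
    relevance = X.corrwith(y).abs()  # feature-to-label correlation
    chosen: list[str] = []
    while len(chosen) < k and len(chosen) < X.shape[1]:
        best, best_score = None, float("-inf")
        for col in X.columns.difference(chosen):
            # Penalize features that duplicate what we already selected.
            redundancy = (
                X[chosen].corrwith(X[col]).abs().mean() if chosen else 0.0)
            score = relevance[col] - redundancy
            if score > best_score:
                best, best_score = col, score
        chosen.append(best)
    return chosen
```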
Studying the Dialogue Between Users and Developers of Free Apps in the Google... (SAIL_QU)
The study analyzes user-developer interactions through reviews and responses on the Google Play Store. It finds that responding to reviews has a significant positive impact, with 84% of rating increases due to the developer addressing the issue or providing guidance. Three common response patterns were identified: only negative reviews, negative or longer reviews, and reviews shortly after an update. Developers most often thank the user, ask for details, provide guidance, or ask for an endorsement. Guidance responses can address common issues through FAQs. The analysis considered over 2,000 apps, 355,000 review changes, 128,000 responses, and 4 million reviews.
What Do Programmers Know about Software Energy Consumption? (SAIL_QU)
This document summarizes the results of a survey of 122 programmers about their knowledge of software energy consumption. The survey found that programmers have limited awareness of energy consumption and how to reduce it. They were unaware of the main causes of high energy usage. Programmers lacked knowledge about how to properly rank the energy consumption of different hardware components and were unfamiliar with strategies to improve efficiency, such as minimizing I/O and avoiding polling. The study concludes that programmers would benefit from more education on software energy usage and its causes.
Revisiting the Experimental Design Choices for Approaches for the Automated R... (SAIL_QU)
Prior research on automated duplicate issue report retrieval focused on improving performance metrics like recall rate. The author revisits experimental design choices from four perspectives: needed effort, data changes, data filtration, and evaluation process.
The thesis contributions are: 1) Showing the importance of considering needed effort in performance measurement. 2) Proposing a "realistic evaluation" approach and analyzing prior findings with it. 3) Developing a genetic algorithm to filter old issue reports and improve performance. 4) Highlighting the impact of "just-in-time" features on evaluation. The findings help better understand benefits and limitations of prior work in this area.
Measuring Program Comprehension: A Large-Scale Field Study with Professionals (SAIL_QU)
The document summarizes a large-scale field study that tracked the program comprehension activities of 78 professional developers over 3,148 hours. The study found that:
1) Program comprehension accounted for approximately 58% of developers' time on average, with navigation and editing making up the remaining portions.
2) Developers frequently used web browsers and document editors to aid comprehension beyond just IDEs.
3) Interviews and observations revealed that insufficient documentation, unclear code, and complex inheritance hierarchies contributed to long comprehension sessions.
Takashi Kobayashi and Hironori Washizaki, "SWEBOK Guide and Future of SE Education," First International Symposium on the Future of Software Engineering (FUSE), June 3-6, 2024, Okinawa, Japan
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to help students learn programming -- could variable roles also help deep neural models perform coding tasks? We conduct an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects
1. Is the Pareto Principle Applicable to the Core Teams of GitHub Projects?
Kazuhiro Yamashita, Yasutaka Kamei, Shane McIntosh, Naoyasu Ubayashi, Ahmed E. Hassan
2. Core developers play a critical role in software development
Core developers are responsible for guiding and coordinating the development of an OSS project [Nakakoji].
The most productive developers, who have made roughly 80% of the total contributions [Mockus].
3. In fact, some argue that core developers in OSS projects follow the Pareto Principle
[Figure: 20% of the effort produces 80% of the results.]
4. Pareto Principle in Software Development
[Figure: within a project, 20% of developers produce 80% of artifacts.]
5. Prior studies have arrived at mixed conclusions about core teams and the Pareto Principle
[Figure: prior studies split across Pareto, Non-Pareto, and Other findings: Goeminne (IWSQM), Robles (RAMSS), Mockus (TOSEM), Geldenhuys (ECSEAA), Koch (ISJ), and Dinh-Trong (TSE).]
The results depend on a small number of case study systems.
6. Prior studies have arrived at mixed conclusions about core teams and the Pareto Principle
[Figure: the same studies grouped by reported core team size: fewer than 10 or 15 developers (e.g., Mockus, TOSEM) versus other sizes (e.g., Dinh-Trong, TSE); also Goeminne (IWSQM), Robles (RAMSS), Geldenhuys (ECSEAA), and Koch (ISJ).]
7. Overview of our study of core teams on GitHub
Applicability of the Pareto Principle
Number of Core Developers
8. Overview of our study of core teams on GitHub
Core and Non-Core Developers' Activities
Applicability of the Pareto Principle
Number of Core Developers
9. Collecting and analyzing GitHub data to study core team activity
[Pipeline figure: Projects -> Filter -> Heuristics identify Core/Non-Core developers -> Calc Prop (Core Team Size) and Classify Commits (Activity).]
10. Collecting and analyzing GitHub data to study core team activity
[Same pipeline figure, highlighting the project-filtering step.]
11. Preprocessing GitHub data to handle forks and duplicates, and to remove immature projects
8,510,504 repositories -> 2,496 repositories
12. Collecting and analyzing GitHub data to study core team activity
[Same pipeline figure, highlighting the core-developer heuristics step.]
13. Using heuristics to identify core team members
Three heuristics identify core developers: Commit-based, LOC-based, and Access-based.
14. Our commit-based core contributor heuristic
[Figure: an example project with four developers, A, B, C, and D, and the number of commits each made.]
16. Step 2: Compute the proportion of commits made by each contributor
[Figure: after sorting by commit count, the commit ratios are A: 60%, C: 20%, B: 10%, D: 10%.]
17. Step 3: Core contributors are the developers below the 0.8 cumulative contribution cutoff
[Figure: the cumulative ratio reaches 0.6 at A, 0.8 at C, and 1.0 at D; A and C fall within the cutoff.]
Pct. CoreDev = 2/4 * 100 = 50%; Num CoreDev = 2.
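To make Steps 1-3 concrete, here is a minimal Python sketch of the commit-based heuristic; the function and variable names are ours, not from the study's tooling, and the LOC-based variant would simply substitute changed LOC for commit counts.

def identify_core(commit_counts, cutoff=0.8):
    # Commit-based core-team heuristic (Steps 1-3 above):
    # sort contributors by commit count, accumulate their commit
    # proportions, and stop once the cumulative ratio reaches the cutoff.
    total = sum(commit_counts.values())
    core, cumulative = [], 0.0
    for dev, n in sorted(commit_counts.items(), key=lambda kv: -kv[1]):
        core.append(dev)
        cumulative += n / total        # Step 2: proportion of commits
        if cumulative >= cutoff:       # Step 3: 0.8 cumulative cutoff
            break
    return core

counts = {"A": 6, "B": 1, "C": 2, "D": 1}  # the worked example
core = identify_core(counts)               # ['A', 'C']
pct_core = len(core) / len(counts) * 100   # 50.0
print(core, pct_core)

On the worked example (A: 6 commits, C: 2, B: 1, D: 1), this returns A and C as core developers, i.e., 50% of the team, matching the slide.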
18. Collecting and analyzing GitHub data to study core team activity
[Same pipeline figure: core and non-core developers have now been identified.]
19. Overview of our study of core teams on GitHub
Core and Non-Core Developers' Activities
Applicability of the Pareto Principle
Number of Core Developers
20. Overview of our study of core teams on GitHub
[Same overview slide, highlighting Part 1: the applicability of the Pareto Principle and the number of core developers.]
21. Collecting and analyzing GitHub data to study core team activity
[Same pipeline figure, highlighting the core-team-size analysis (Calc Prop).]
22. Our approach to studying core team size
A project complies with the Pareto Principle if its percentage of core developers falls within the 10%-30% thresholds.
We stratify projects along the confounding factors (LOC, total authors, and project age), each split into Small, Medium, and Large strata.
The example project (50% core developers) does not follow the Pareto Principle.
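As a small illustration of this compliance check, using the 10%-30% thresholds defined above (the helper name is ours):

def follows_pareto(pct_core_devs, lo=10.0, hi=30.0):
    # A project "follows" the Pareto Principle when its core-team
    # percentage lies within the 10%-30% band used in the study.
    return lo <= pct_core_devs <= hi

print(follows_pareto(50.0))  # the example project -> False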
24. Often, there are fewer than 15 core developers in a project
[Figure: number of core developers in projects; 88% (Commit-Based), 98% (LOC-Based), and 96% (Access-Based) of projects have 15 or fewer core developers.]
25. Overview of our study of core teams on GitHub
Core and Non-Core Developers' Activities
Applicability of the Pareto Principle: more than half of the projects do not follow the Pareto Principle.
Number of Core Developers: most projects have 15 or fewer core developers.
26. Overview of our study of core teams on GitHub
[Same overview slide, now highlighting Part 2: core and non-core developers' activities.]
Findings so far: more than half of the projects do not follow the Pareto Principle; most projects have 15 or fewer core developers.
27. Collecting and analyzing GitHub data to study core team activity
[Same pipeline figure, highlighting the activity analysis (Classify Commits).]
28. Our approach to studying activity
We classify commits using keywords in their commit messages.
Development Activity Type            | Keywords
Forward Engineering                  | implement, add, request
Reengineering (Maintenance)          | optimiz, adjust
Corrective Engineering (Maintenance) | bug, fix, issue, error
Management (Maintenance)             | license, formatting, TODO
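To make the classification concrete, here is a minimal Python sketch of keyword-based commit classification in the spirit of Hattori and Lanza's method; the keyword lists are abbreviated to the examples in the table above, the Unknown/Empty fallbacks follow the speaker notes, and the function and dictionary names are ours.

CATEGORIES = {
    "Forward Engineering":    ["implement", "add", "request"],
    "Reengineering":          ["optimiz", "adjust"],
    "Corrective Engineering": ["bug", "fix", "issue", "error"],
    "Management":             ["license", "formatting", "todo"],
}

def classify_commit(message):
    # Return "Empty" for missing comments, the first category whose
    # keyword occurs in the message, or "Unknown" if nothing matches.
    if not message or not message.strip():
        return "Empty"
    msg = message.lower()
    for category, keywords in CATEGORIES.items():
        if any(kw in msg for kw in keywords):
            return category
    return "Unknown"

print(classify_commit("Fix NULL pointer bug in parser"))  # Corrective Engineering
print(classify_commit("Implement OAuth login"))           # Forward Engineering

Note that naive substring matching is deliberate here; partial keywords such as "optimiz" catch both "optimize" and "optimizing".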
29. No big differences in the proportions of development activities
[Figure: activity-type proportions for core vs. non-core developers under the Commit-Based, LOC-Based, and Access-Based heuristics.]
30. Overview of our study of core teams on GitHub
Core and Non-Core Developers' Activities: there are no big differences between core and non-core activities.
Applicability of the Pareto Principle: more than half of the projects do not follow the Pareto Principle.
Number of Core Developers: most projects have 15 or fewer core developers.
31. Overview of our study of core teams on GitHub
[Same summary slide, repeated.]
32. Extremely large core teams may be interesting
Number of projects by core team size:
Heuristic    | <=15  | 16-20 | 21-50 | 51-100 | 101+
Commit-Based | 2,197 |    98 |   137 |     17 |   47
LOC-Based    | 2,454 |    15 |    13 |      4 |   10
Access-Based | 1,164 |    24 |    24 |      0 |    0
33. Many projects face a bus-factor risk
Projects with fewer than 5 core developers: 43% (Commit-Based; Core=1: 8%), 81% (LOC-Based; Core=1: 24%), 54% (Access-Based; Core=1: 21%).
In fact, most projects have fewer than 5 core developers.
44. Fork
One of the features of GitHub.
[Figure: a fork (clone) copies an Original Repository into a Fork Repository; changes flow back to the original via a Pull Request.]
45. Data Extraction
(1) Filter projects using GHTorrent: remove forked repositories and repositories with fewer than 10 developers.
46. Data Extraction
(1) Filter projects using GHTorrent: remove forked repositories, repositories with fewer than 10 developers, and repositories developed outside of GitHub.
47. Data Extraction
(1) Filter projects using GHTorrent: remove forked repositories, repositories with fewer than 10 developers, and repositories developed outside of GitHub.
8,510,504 repositories -> 4,618 repositories
56. Data Extraction
(5) Filter projects by metrics: remove repositories with fewer than 10 developers and repositories with fewer than 1,000 LOC.
4,618 repositories -> 2,496 repositories
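As a rough illustration of this extraction pipeline, here is a hedged pandas sketch; the file name and column names (is_fork, num_devs, developed_on_github, loc) are hypothetical placeholders, not the actual GHTorrent schema.

import pandas as pd

# Hedged sketch of the extraction pipeline; all names are hypothetical.
repos = pd.read_csv("ghtorrent_projects.csv")

# (1) Filter with GHTorrent metadata: drop forks, small teams, and
#     repositories developed outside of GitHub
#     (8,510,504 -> 4,618 repositories in the study).
repos = repos[~repos["is_fork"]]
repos = repos[repos["num_devs"] >= 10]
repos = repos[repos["developed_on_github"]]

# (5) Filter by metrics: re-check team size and drop tiny codebases
#     (4,618 -> 2,496 repositories in the study).
repos = repos[repos["num_devs"] >= 10]
repos = repos[repos["loc"] >= 1000]
print(len(repos))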
Editor's Notes
I’m Kazuhiro Yamashita, a PhD student at Kyushu University, Japan.
Today, I would like to talk about my research.
The slide title is “Is the Pareto principle applicable to core teams of github projects?”
This is a collaborative work between Kyushu University and Queen's University.
In this study, we focus on core developers and the Pareto principle.
Core developers are developers who play important roles in software development projects.
For example, Nakakoji et al. state that core developers are responsible for guiding and coordinating the development of an OSS project.
On the other hand, Mockus et al. define core developers as the most productive developers who have made roughly 80% of the total contributions.
The definitions differ slightly, but both say that core developers are important.
From these facts, core developers are key to the success of OSS projects.
Hence, there are papers which focus on core developers.
This is the agenda of the talk.
First we look at the definitions of core developers and the pareto principle.
Next, we show the previous results. Then, we show our research questions derived from previous results.
After our research questions, we describe our case study. Finally, we conclude this study.
Therefore, there are papers that focus on core developers.
And some papers claim that the proportion of core developers in a successful project follows the Pareto Principle.
Some of the papers argue that the proportions of core developers in OSS projects follow the Pareto principle.
The Pareto Principle is also known as the 80-20 rule; it states that roughly 80% of the results come from 20% of the causes, as in this figure.
The principle originates in economics, but it has been applied to many other fields, including software engineering.
Such papers claim that 20% of developers produce 80% of artifacts in software development context.
As we described, there are papers which claim that the proportion of core developers in a successful project follows the Pareto Principle.
On the other hand, there are papers which claim that the proportion of core developers does not follow the Pareto Principle.
In other words, prior studies have arrived at mixed conclusions about core teams and the Pareto principle.
We assume that such mixed conclusions are obtained because the results depend on a small number of case study systems.
In fact, the prior studies used at most 9 OSS projects.
In addition to the Pareto Principle, prior studies have also arrived at mixed conclusions about the number of core developers.
Mockus et al. claim that the number of core developers is less than 10 or 15, but some papers report otherwise.
For instance, Dinh-Trong et al. showed that 27 to 42 developers contribute more than 80% of the contributions in the FreeBSD project.
On the other hand, there is a paper which claims that the proportion of core developers does not follow the Pareto Principle.
In addition to the Pareto Principle, some papers report exact numbers of core developers.
But the reported numbers differ from paper to paper.
When we consider why such discrepancies arise, we find that all of these results depend on a small number of case study systems.
From the previous work, we derive research question 1 and the motivation.
In RQ1, we would like to generalize the previous results; in other words, does the proportion of core developers follow the Pareto Principle?
Additionally, we also would like to know the general number of core developers.
Therefore, we formulate the research question.
In addition to the size of core teams, Mockus et al. claim that a group larger than the core team by an order of magnitude will repair defects.
From that statement, we assume that non-core developers work more on bug fixing than on implementing new functions.
Therefore, we formulate research question 2 according to the assumption.
The motivation of RQ2 is that we would like to know the proportions of activities of core and non-core developers.
By measuring the proportions of activities, we would like to confirm our assumption.
The second research question is that …
From these points, we derived the first part of our study.
In this part, we focus on core team size and study the applicability of the Pareto Principle to core developers using GitHub projects.
Prior studies argue not only about proportions but also about absolute numbers of core developers.
Therefore, we also study the numbers of core developers in this part.
In the second part of our study, we focus on the activities of core and non-core developers.
This part is also derived from a prior study.
In that study, Mockus states that a group larger than the core team by an order of magnitude will repair defects.
From that statement, we assume that non-core developers work more on fixing bugs than on implementing new functionality.
Hence, we study the activities of core and non-core developers in the second part.
This is an overview of our study.
Now we show the steps for collecting and analyzing github data to study core team activity.
As the common part of both studies, we perform two steps to collect data and identify core developers.
After the two steps, we perform both studies.
In the study of core team size, we calculate the proportion and number of core developers of each project, then determine whether the proportion follows the Pareto Principle.
In the study of activity, we extract the commits of both types of developers, then classify the commits and compare their activities.
We explain each step of our study.
First, we show how to filter projects.
In this study, we used GitHub projects as our dataset, which initially includes 8.5 million repositories.
However, the dataset also includes fork repositories, duplicates, and immature projects.
To remove such repositories, we preprocess the dataset.
After the preprocessing, 2,496 repositories remain.
We conduct our case study on the 2,496 repositories.
Next, we show heuristics that we use to identify core developers.
In this study, we used three heuristics to identify core developers.
In the commit-based heuristic, we identify core developers using the number of commits of each developer.
In the LOC-based heuristic, we identify core developers using the amount of LOC changed by each developer.
In the access-based heuristic, we identify core developers using access rights.
With the access-based heuristic, we can identify core developers by whether a developer has access rights to the repository.
However, for the commit- and LOC-based heuristics, we need a way to separate core from non-core developers.
We show the steps to identify core developers in the commit-based heuristic using this example project.
In this project, there are 4 developers and they made some commits.
As first step, we sort developers by their number of commits in descending order.
After sorting, we calculate the proportions of commits of each developer.
For example, developer A made 6 commits out of 10 commits. Hence, the proportion of developer A is 60%
Finally, we calculate cumulative proportion and identify developers who are below the 0.8 cumulative cutoff as core developers.
In this example, developers A and C are core developers, and B and D are non-core developers.
The percentage of core developers, in this case, is 50% and the number of core developers is 2.
The LOC-based heuristic follows the same steps as the commit-based heuristic, but uses LOC instead of the number of commits.
We identified core and non-core developers in each project.
Now we show the answers to our questions.
These are our two questions.
First we show the results about core team size.
The questions that we address are: Is the Pareto Principle applicable? And what is the typical number of core developers?
This part is highlighted in the figure.
This slide shows our concrete approach to studying core team size.
To check the applicability of the Pareto Principle, we need to define thresholds.
In this study, we use the range between 10% and 30% as the thresholds.
Therefore, the example project that we used to explain our heuristic does not follow the Pareto Principle, because it has 50% core developers.
In addition to checking applicability, we stratify projects along confounding factors to find trends.
We assume that three factors, LOC, total authors, and project age, may affect the size of the core team.
For example, a project with few total authors tends to have a higher proportion of core developers.
Since the results for all heuristics and confounding factors show similar trends, we show only the results for the commit-based heuristic stratified by LOC.
Make this clear on the slide.
These figures show the results of the commit-based heuristic, stratified by LOC.
The x-axis shows the percentage of core developers and the y-axis shows the number of projects.
From left to right, the figures show the distributions for small, medium, and large LOC projects, respectively.
In each figure, the dotted lines are the thresholds of the Pareto Principle.
From the figures, we find that the proportions of core developers are widespread.
In fact, more than half of the projects fall outside the range of the Pareto Principle.
Therefore, we conclude that the proportions of core developers do not follow the Pareto Principle.
When we check the number of core developers, roughly 90% or more of the projects have 15 or fewer core developers.
From the study of core team size, we obtained these results.
Next, we address the second question.
In this study, we focus on the activities of core and non-core developers.
This part of the study is highlighted in the figure.
To compare the activities, we need to classify the commits.
We first explain the method that we used for this study, then show the results.
To identify developer activities, we use the method proposed by Hattori and Lanza.
The method classifies commits into four categories using the commit comments.
This table shows the four categories and example keywords.
The Forward Engineering category covers implementing new functionality; a representative keyword is "implement".
The Reengineering category covers modifying existing code; a keyword is "optimize".
The Corrective Engineering category covers bug-fixing activities; a keyword is "bug".
The Management category covers activities that control the project; a keyword is "TODO".
If no keyword appears in the commit comment, the commit is classified into the Unknown category.
Also, if there is no comment, the commit is classified into the Empty category.
This figure shows the proportions of the categories for each type of developer.
For example, blue bars show the proportions of the Forward Engineering category and yellow bars show Corrective Engineering.
Under our assumption, the proportion of corrective engineering activity among non-core developers should be large.
However, the figure shows no big differences in the proportions of corrective engineering.
Furthermore, the other three activities have similar proportions.
This gives us the conclusion of this part of the study.
Finally, we obtained these results from our study.
Now we discuss some points that we can obtain from our results.
First, we think extremely large core teams may be interesting.
We think it is natural that the proportions of core developers are widespread.
But there are projects where core developers make up more than 50% of the team, and even projects with more than 50 core developers.
It may be interesting to find out how such a large number of core developers is coordinated and how it impacts project quality.
Replace the figure: use absolute numbers of developers instead of percentages.
Next, we think many projects face a bus-factor risk.
We showed that many projects have 15 or fewer core developers.
In fact, many projects have fewer than 5 core developers.
For example, under the LOC-based heuristic, 81% of projects have fewer than 5 core developers and 24% of projects have only one core developer.
From this, we conclude that many projects face a bus-factor risk.
Now we conclude the talk.
First, we showed prior studies and our two questions which are derived from prior studies.
Then, we showed our case study design to address the two questions.
From the case study, we found that core team proportions are widespread and there are no big differences in proportions of development activity between core and non-core developers.
That’s all. Thank you.