Exploring the Use of Labels to Categorize Issues in Open-Source Software Projects


Slides from our paper "Exploring the Use of Labels to Categorize Issues in Open-Source Software Projects", presented at the SANER 2015 conference.

You can find the paper here:
https://www.researchgate.net/publication/272794664_Exploring_the_Use_of_Labels_to_Categorize_Issues_in_Open-Source_Software_Projects


  1. Exploring the Use of Labels to Categorize Issues in Open-Source Software Projects. Jordi Cabot, Javier Luis Cánovas Izquierdo, Valerio Cosentino, Belén Rolandi. SANER conference, March 2015
  2. Open-Source Systems: …computer software with its source code made available with a license in which the copyright holder provides the rights to study, change and distribute the software to anyone and for any purpose. …Open-Source Software (OSS) is developed in a collaborative public manner.
  3. Contributing to OSS: doc, code, support, request, question, bug, pull request
  6. Label Issues in GitHub: Title, Author, Description, Assignee, Status, Labels
  7. Label Issues in GitHub. Default labels: bug, duplicate, enhancement, help wanted, invalid, question, wontfix
  8. Label Issues in GitHub
  12. GitHub Analysis: GHTorrent, GiLA
  14. GitHub Analysis: GHTorrent, GiLA
      RQ1. Label Usage: How many labels are used in GitHub? How many labels are used per project? What are the most popular ones?
      RQ2. Label Influence: For those projects using labels, does their usage influence the evolution of the project?
      Early Research Achievement: Can we detect groups of labels commonly used together? Are there label families?
  15. RQ1. Label Usage (photo: flickr/tiffanyTerry)
  16. Label Usage in GitHub. Using labels: 122,012 (3%). Not using labels: 3,635,026 (97%). Lesson: Labels are scarcely used in GitHub
  17. Main Labels. Lesson: Default labels are the winners, but documentation and feature are also broadly used
  18. Labels used together. Lesson: bug and enhancement are the labels most used together
  19. Projects using labels (chart: # projects by # labels used in the project). Total: 122,012
      # labels used: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, >10
      # projects: 55561, 31026, 13390, 6910, 4206, 3011, 1934, 1378, 955, 723, 2918
      Chart annotations: 1.47%, 0.82%, 0.94%
  20. Labels/Issue (chart: # projects and avg. labels/issue by # labels used in the project, 1 to >10). Avg: 1.14
      # projects: 55561, 31026, 13390, 6910, 4206, 3011, 1934, 1378, 955, 723, 2918
      Avg. labels/issue: 1, 1.02, 1.04, 1.06, 1.09, 1.08, 1.13, 1.18, 1.2, 1.25, 1.52
  21. % Labeled Issues (chart: # projects and % labeled issues by # labels used in the project, 1 to >10). Avg: 58.29%
      # projects: 55561, 31026, 13390, 6910, 4206, 3011, 1934, 1378, 955, 723, 2918
      % labeled issues: 59.87, 61, 58.89, 58.84, 59.72, 56.16, 57.83, 58.99, 58.83, 55.06, 55.88
  22. Users involved in labeled issues (chart: # projects, % labeled issues and % users involved in labeled issues by # labels used in the project, 1 to >10). Avg: 78.72%
      # projects: 55561, 31026, 13390, 6910, 4206, 3011, 1934, 1378, 955, 723, 2918
      % labeled issues: 59.87, 61, 58.89, 58.84, 59.72, 56.16, 57.83, 58.99, 58.83, 55.06, 55.88
      % users involved in labeled issues: 80.98, 72.06, 77.73, 75.81, 75.22, 72.05, 72.87, 75.52, 72.06, 69.25, 70.43
  23. RQ2. Label Influence (photo: flickr/JorisLouwes)
  24. Label Influence. Projects using labels: Time to Solve, Issue Age, % Solved, People Involved
  25. Label Influence (chart: med. time to solve and % solved by # labels used in the project, 0 to >10)
      # labels used: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, >10
      Med. time to solve: 26.93, 46.18, 74.92, 101.3, 111.8, 145.7, 116.4, 127.2, 116.4, 70.4, 306.4, 148.1
      % solved: 22.53, 43.51, 48.76, 53.21, 55.27, 56.3, 58.82, 57.95, 59.28, 63.23, 47.59, 60.19
      On average, the percentage of solved labeled issues tends to increase with the number of labels used in the project, which may confirm that the effort of categorizing issues is beneficial for the project's advancement. It might come at the cost of taking more time to solve those labeled issues. ρ = 0.80 and ρ = 0.73
  26. Going further: Detecting families (photo: flickr/RichBrooks)
  27. Detecting families. Families? bug build contribution documentation duplicate 0 - backlog 1 - ready 2 - working 3 - done docs enhancement invalid urgent priority-high high-priority priority-low question priority-medium usability component-logic component-notyi component-ui priority-low component-mode-perl component-ui-gtk frontend-gtk frontend-pango milestone imported 0.0.1 0.0.1 0.0.3 1.0.0.rc1 0.2.0 0.5.0 1.0.0 update type-cleanup p1 p2 p3 taken fixed discuss milestone-release0.4 milestone-release0.7 performance medium-priority usability wontfix new type-ask low-priority
  28. Detecting families (same label cloud as slide 27, with a table of families)
      Family: # Labels, % Projects
      Priority: 1,027 (2.33%), 4.33%
      Version: 2,703 (6.14%), 1.68%
      Workflow: 1,972 (4.48%), 5.67%
      Architecture: 1,104 (2.51%), 2.00%
  29. Detecting families (label cloud and family table as on slide 28). Labels set apart on this slide: urgent priority-high high-priority priority-low priority-medium priority-low low-priority medium-priority
  30. Detecting families (label cloud and family table as on slide 28). Labels set apart on this slide: milestone 1.0.0 milestone-release0.4 0.2.0 0.5.0 milestone-release0.7 new 0.0.1 0.0.1 0.0.3 1.0.0.rc1
  31. Detecting families (label cloud and family table as on slide 28). Labels set apart on this slide: 2 - working 3 - done 0 - backlog 1 - ready p2 p3 taken p1
  32. Detecting families (label cloud and family table as on slide 28). Labels set apart on this slide: component-notyi milestone-release0.7 component-ui-gtk frontend-pango frontend-gtk component-ui component-logic component-mode-perl
  33. Conclusion (slide annotations: Early result, Future)
      • Label mechanism is scarcely used
      • When used, it may have a positive impact on the project
      • Confirmed the existence of families when using labels
      • Further research is needed to better classify their use
      • How families influence the project success
      • Why projects choose a specific label family
      • How labels evolve during the life-cycle of the project
      • Extend the analysis to other web-based code hosting services
  34. Except where otherwise noted, content on this presentation is licensed under a Creative Commons Attribution 3.0 License. Thanks! Come to see our awesome demonstration! Belén Rolandi (maria.rolandi@inria.fr), Jordi Cabot (jordi.cabot@inria.fr), Javier L. Cánovas Izquierdo (javier.canovas@inria.fr), Valerio Cosentino (valerio.cosentino@inria.fr)
  35. Label Usage (issues) (four charts: # projects and a percentage series by # labels used in the project, 1 to >10)
      Projects with 0 to 9 issues. Total: 68,216. Avg: 70.10%
      # projects: 45150, 17268, 3915, 1071, 421, 223, 84, 49, 19, 4, 12
      %: 69.55, 75.94, 79.65, 82.18, 84.31, 78.1, 84.65, 84.64, 83.57, 85, 82.64
      Projects with 10 to 99 issues. Total: 46,836. Avg: 45.68%
      # projects: 9996, 13203, 8771, 5121, 3115, 1995, 1177, 823, 518, 337, 780
      %: 18.39, 43.48, 52.64, 58.36, 62.72, 63.68, 66.88, 67.81, 72.16, 69.74, 73.85
      Projects with 100 to 999 issues. Total: 7,432. Avg: 34.81%
      # projects: 407, 545, 694, 703, 656, 773, 651, 481, 394, 363, 1765
      %: 6.03, 11.81, 31.15, 28.68, 30.52, 31.52, 39.11, 43.11, 41.96, 42.89, 52.25
      Projects with more than 999 issues. Total: 528. Avg: 29.54%
      # projects: 8, 10, 10, 15, 14, 20, 22, 25, 24, 19, 361
      %: 28.67, 14.41, 5.78, 14.45, 17.28, 12.8, 24.43, 22.83, 28.31, 21.6, 33.95
  36. Label Influence (chart: avg. and med. time to solve by # labels used in the project, 1 to >10)
      Avg. time to solve: 795.4, 808.6, 937.3, 998.5, 1111, 1060, 1152, 1139, 982.9, 1425, 1148
      Med. time to solve: 46.18, 74.92, 101.3, 111.8, 145.7, 116.4, 127.2, 116.4, 70.4, 306.4, 148.1
  37. Label Influence (chart: avg. and med. issue age by # labels used in the project, 1 to >10)
      Avg. issue age: 4516, 5867, 7540, 8646, 9196, 9729, 9330, 10610, 10100, 8321, 8302
      Med. issue age: 2577, 3747, 6346, 7427, 8081, 8335, 8268, 9154, 8612, 7276, 5918
  38. Label Influence (chart: % solved by # labels used in the project, 1 to >10)
      % solved: 43.51, 48.76, 53.21, 55.27, 56.3, 58.82, 57.95, 59.28, 63.23, 47.59, 60.19
  39. Label Influence (chart: avg. people involved by # labels used in the project, 1 to >10)
      Avg. people involved: 3.4, 5.44, 9.73, 15.61, 20.93, 25.91, 35.18, 46.79, 51.24, 71.67, 129.18
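
Slide 25 reports two ρ values without naming the method behind them. The following is a minimal sketch, assuming they are Spearman rank correlations between the number of labels used in a project (0 to >10, with >10 encoded here as 11) and each series on that slide. The function names are illustrative, not taken from the slides or from GiLA, and the data are the values transcribed from slide 25.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Assumption: the ">10" bucket is encoded as 11 so the buckets keep their order.
labels_used = list(range(12))
median_time_to_solve = [26.93, 46.18, 74.92, 101.3, 111.8, 145.7,
                        116.4, 127.2, 116.4, 70.4, 306.4, 148.1]
pct_solved = [22.53, 43.51, 48.76, 53.21, 55.27, 56.3,
              58.82, 57.95, 59.28, 63.23, 47.59, 60.19]

rho_solved = spearman(labels_used, pct_solved)
rho_time = spearman(labels_used, median_time_to_solve)
print(f"rho(# labels, % solved) = {rho_solved:.2f}")
print(f"rho(# labels, med. time to solve) = {rho_time:.2f}")
```

Under these assumptions the % solved series yields about 0.73, which agrees with one of the two values shown on the slide. The slide may have been computed on different underlying data or with a different coefficient, so the sketch is a plausibility check rather than a reproduction of the authors' analysis.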
