Evidence for the Paretoprinciple in OSS Activity         Mathieu Goeminne & Tom Mens Service de Génie Logiciel, Institut d...
Research topic       • Study of open source software evolution.       • Taking into account the community (social         ...
Goals       • Long-term goal          •    Understand how software community and software               product/project co...
Questions       •   Is there a core group (of developers and/or           users) being significantly more active than the  ...
Methodology       • Exploiting available data from source           code repositories, mailing lists and           bug tra...
Econometrics       • Gini, Hoover, Theil (normalised) : express           inequality in a distribution.       • Values bet...
The 3 notions of               activities we used       • # commits done       • # mails sent       • # bug status changed...
Activity distributions                                  (Gini index)  $"!#,"                                              ...
Pareto principle       • Most of the activity is carried out by a small           group of persons.       • Typically : 20...
Pareto principle (cont.)   1 0.9                                                                                      Bras...
Core groups       • Display Venn diagrams of most active (top           20) persons, according to each definition of       ...
Core groups (cont.)                                                                      Brasero                          ...
Conclusions       • Activity distributions seem to become           more and more unequally distributed.       • The Paret...
Future work       • Determine if the core group of a project           evolves over time.       • Use sliding windows to i...
Future work (cont.)       •   Study correlation between community structure           (social network) and source code qua...
Thank youUniversité de Mons   Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany   Mathieu Goeminne & Tom Mens          ...
Upcoming SlideShare
Loading in …5
×

Evidence for the Pareto principle in open source software activity

689 views
581 views

Published on

Published in: Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
689
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Evidence for the Pareto principle in open source software activity

  1. 1. Evidence for the Paretoprinciple in OSS Activity Mathieu Goeminne & Tom Mens Service de Génie Logiciel, Institut d’Informatique Faculté des Sciences, Université de Mons 1
  2. 2. Research topic • Study of open source software evolution. • Taking into account the community (social network) of persons surrounding the software project (developers, users). • Looking for recurrent behaviour in this community.Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 2
  3. 3. Goals • Long-term goal • Understand how software community and software product/project co-evolve • Provide guidelines and tools to support this • Short-term goal • Study of how development activity is distributed over the different stakeholders. • Find evidence for the Pareto principle in evolving OSS.Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 3
  4. 4. Questions • Is there a core group (of developers and/or users) being significantly more active than the others? • How does the activity distribution evolve over time? • Is there an overlap between the different activities? • How does the activity distribution vary across different projects?Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 4
  5. 5. Methodology • Exploiting available data from source code repositories, mailing lists and bug trackers. • Use of economy metrics measuring distribution (in)equality. • Empirical study of 3 OSS : Brasero, Evince and Wine.Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 5
  6. 6. Econometrics • Gini, Hoover, Theil (normalised) : express inequality in a distribution. • Values between 0 and 1 • 0 reflects a perfect equality • 1 reflects a perfect inequality • Have similar behaviors.Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 6
  7. 7. The 3 notions of activities we used • # commits done • # mails sent • # bug status changed All of them are related to typical developer activitiesUniversité de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 7
  8. 8. Activity distributions (Gini index) $"!#," Brasero!#+"!#*"!#)"!#("!#" .4556/7" Evince!#&" 58697"!#%" :;"0".<8=>?7"!#$" !" -./0!)" 1230!*" -./0!*" 1230!+" -./0!+" 1230!," -./0!," 1230$!" -./0$!" Wine $" $"!#," !#,"!#+" !#+"!#*" !#*"!#)" !#)" 2.44536"!#(" !#(" 47586"!#" 1233456" !#" 9:;"<=>.<3"2?7@;=6"!#&" 37486" !#&"!#%" 9:;"/<.2/5"1=7>;<6" !#%"!#$" !#$" !" !" -./0,," -./0!!" -./0!$" -./0!%" -./0!&" -./0!" -./0!(" -./0!)" -./0!*" -./0!+" -./0!," -./0$!" -./0,+" -./0,," 1230!!" 1230!$" 1230!%" 1230!&" 1230!" 1230!(" 1230!)" 1230!*" 1230!+" 1230!," 1230$!" Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 8
  9. 9. Pareto principle • Most of the activity is carried out by a small group of persons. • Typically : 20% do 80% of the job. • Doesn’t necessarily imply that the activity distribution follows a Pareto law.Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 9
  10. 10. Pareto principle (cont.) 1 0.9 Brasero commits 0.8 0.7 mails 0.6 br changes 0.5 0.4 Evince 0.3 0.2 Wine 0.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10.90.80.70.60.50.40.30.20.1 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 10
  11. 11. Core groups • Display Venn diagrams of most active (top 20) persons, according to each definition of activity. • For each person, show the percentage of activity attributable to this person. • Use heuristics to take into account and merge multiple identities representing the same real person.Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 11
  12. 12. Core groups (cont.) Brasero Evince WineUniversité de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 12
  13. 13. Conclusions • Activity distributions seem to become more and more unequally distributed. • The Pareto principle is clearly present in studied projects. • For Brasero and Evince, the activity is led by a limited number of persons involved in 2 or 3 of the defined activities. • For Wine, it seems not to be the case.Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 13
  14. 14. Future work • Determine if the core group of a project evolves over time. • Use sliding windows to ignore inactive persons and discover new active persons. • Study “bus factor” and the persons involved. • Automatic generation of Venn diagrams including all persons involved in software evolution.Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 14
  15. 15. Future work (cont.) • Study correlation between community structure (social network) and source code quality (as computed using software metrics). • Automatic statistical analysis to determine the distributions fitting the data. • Extend and refine types of activity. For instance: • different types of commit activity (doc, source code, test, etc.); of mail activity (information, asking, answering, etc.); of bug repository activity (bug creation, modification and commenting)Université de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 15
  16. 16. Thank youUniversité de Mons Tuesday 2011-03-01, SQM workshop, Oldenburg, Germany Mathieu Goeminne & Tom Mens 16

×