An empirical study on the  Specialisation Effect inOpen Source Communities        Mathieu Goeminne & Tom Mens             ...
Our case study: Gnome      •   A large ecosystem          •    ~1,300 projects having a common               characteristi...
Methodology      • Goal-Question-Metric approach:       • Define research goals       • Define questions to reach these goal...
Research Goals:                 understand      1. how development effort varies across         projects;      2. how deve...
Research Questions      • Are the projects containing more activity         types more active? [Goal 1]      • How special...
Workload and          Involvement Metrics    •    workload APTW(p,d,t)          •    # files in project p touched by develo...
The Gini Index    Economic measure of inequality      • Aggregates a collection of values (e.g.         incomes in a count...
Experimental setup      •   Extract data from Gnome’s Git project repositories      •   Populate databases using this data...
Activity types   • 13 activity types, including:    • coding, translation, documentation, building,         development do...
How is the activity distributed       across activity types?               0.4               0.3               0.2        ...
Developer workload distribution         Imbalance by activity type         RAWS(p)                                        ...
Project workload distribution           Imbalance by activity type         RPWS(p)                                        ...
How specialized are developers                     toward different activity types?        1.0                 ●          ...
How specialized are projects                      toward different activity types?        1.0                ●            ...
What is the developer capacity ofparallel work in different projects?                                                     ...
What is the project capacity of                             accomodating developers?                                      ...
Future work      •   Take into account other factors          •    time: How do our metrics and their relations evolve    ...
Thank youBENEVOL 2011   Empirical Study on Specialisation in FLOSS communities   08/12/11                                 ...
Upcoming SlideShare
Loading in …5
×

An empirical study on the Specialisation Effect in Open Source Communities

516 views
466 views

Published on

Published in: Education, Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
516
On SlideShare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
2
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

An empirical study on the Specialisation Effect in Open Source Communities

  1. 1. An empirical study on the Specialisation Effect inOpen Source Communities Mathieu Goeminne & Tom Mens University of Mons Bogdan Vasilescu & Alexander Serebrenik Eindhoven University of Technology 1
  2. 2. Our case study: Gnome • A large ecosystem • ~1,300 projects having a common characteristic : integration in the same GNU/ Linux Desktop environment • A large developer community • > 5,000 involved authors in the Git repositories • A set of activity types • coding, documentation, translation, etc.BENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 2
  3. 3. Methodology • Goal-Question-Metric approach: • Define research goals • Define questions to reach these goals • Define metrics the answer these questions • Use metrics to statistically verify hypotheses about the questionsBENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 3
  4. 4. Research Goals: understand 1. how development effort varies across projects; 2. how development effort varies across developers; 3. how the type of development activity relates to the development effort.BENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 4
  5. 5. Research Questions • Are the projects containing more activity types more active? [Goal 1] • How specialised are the projects towards different activity types? [Goal 1] • What influences the project specialisation? To which extent are developers specialised in different activity types? [Goal 2] • etc.BENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 5
  6. 6. Workload and Involvement Metrics • workload APTW(p,d,t) • # files in project p touched by developer d having touched a file of activity type t • involvement APTI(p,d,t) • 1 if APTW(p,d,t) > 0; 0 otherwise • Higher level metrics : aggregation over all projects, developers and/or activity types. Ex: • NPD(d) = Number of projects in which developer d is involved • PW(p) = Global workload of project p • RPTW(p,t) = Workload in p w.r.t activity type t, relative to the global project workload • PWS(p) = Specialisation (imbalance or unequal distribution) of workload across project activity types (using Gini index).BENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 6
  7. 7. The Gini Index Economic measure of inequality • Aggregates a collection of values (e.g. incomes in a country) • A value between 0 and 1 : • 0 if everybody has the same income • 1 if a person has all the income and the others don’t have anything.BENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 7
  8. 8. Experimental setup • Extract data from Gnome’s Git project repositories • Populate databases using this data • Detect multiple logins belonging the same ‘real’ developer thanks to an identity matching algorithm. • Define activity types carried out by project developers • Compute metrics • Verify statistical hypothesesBENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 8
  9. 9. Activity types • 13 activity types, including: • coding, translation, documentation, building, development documentation. • File classification: based on the file path, name and extension and a set of rules. A rule is a type and a regular expression. • Ex: (doc, ‘.*/DOC(-?)BOOK(S?)/.*’)BENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 9
  10. 10. How is the activity distributed across activity types? 0.4 0.3 0.2 0.1 0.0 develdoc build doc unknown test code translate mmedia library image config ui meta databaseBENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 10
  11. 11. Developer workload distribution Imbalance by activity type RAWS(p) ● ● ● ●●●●● ● ●● ●●●● ●●●●● ●●●● ●●●● ● ● 0.0 0.2 0.4 0.6 0.8 1.0 Most developers concentrate their workload in few activitiesBENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 11
  12. 12. Project workload distribution Imbalance by activity type RPWS(p) ● ●●● ●● 0.0 0.2 0.4 0.6 0.8 1.0 Most projects concentrate their workload in few activitiesBENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 12
  13. 13. How specialized are developers toward different activity types? 1.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.6 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 ● build code config db devdoc doc transl. img lib meta mm test ui (2675) (3754) (1287) (241) (2937) (1721) (2008) (1129) (19) (1031) (482) (896) (1195) Coding, translating, and writing developer documentation are (highly) specialized activitiesBENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 13
  14. 14. How specialized are projects toward different activity types? 1.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.6 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 ● build code config db devdoc doc transl. img lib meta mm test ui (1185) (1262) (626) (132) (1253) (1220) (691) (701) (15) (1081) (314) (590) (579) Most of the work in projects is done in coding, writingdeveloper documentation, translating and creating a build system. BENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 14
  15. 15. What is the developer capacity ofparallel work in different projects? ● 140 ● 120 ● ● ● 100 ● ● ● Max. concurrent projects ● ● ● ● 80 ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 60 ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● 40 ● ●● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●●● ●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ●● ● ● ● ● ●● ●● ● ● ● ● ● ●● ●● ● ●● ●● ●● ●● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ●● ●● ● 20 ● ●●● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ●●●● ●●● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ●●● ● ● ● ● ● ● ●● ● ●●●● ●●●● ● ● ● ● ● ●●●●● ●● ● ● ●●●● ●●● ●●● ● ● ● ●● ● ●● ● ●● ● ● ●●●●● ● ● ● ● ●●● ● ● ● ●● ● ● ●●●●●●●● ● ● ● ●● ● ● ●●●●● ● ● ● ● ●● ● ● ● ● ● ● ●●●●●● ●●● ● ● ●●●●● ●●● ● ●●●●● ●●●●●●●● ● ● ●● ● ●● ● ●●●●● ●●● ●● ● ●●●●● ● ● ●●●●● ● ●●● ● ●●●● ● ● ● ●●●●●● ●●●● ●● ● ●● ● ● ● ●●●●●●● ●● ●●●●●● ● ●● ●●●●●● ●● ●● ● ●●● ●●●●●●● ● ● ● ● ●●●● ●●● ●●●● ● ●●●● ● ●● ●● ●● ●●●● ●● ●●● ● ● ●●●● ●●●● ●●● ●● ● ●● ●●● ● ● ● ●●● ●●● ● ● ●●● ● ●●● ● ●●● ● ●● ● ● ● ●●●●●● ● ●●●● ● ●●● ●● ●● ●● ●● ●●● ●● ●● ● ●● ● ● 0 0 100 200 300 400 500 Number of projectsBENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 15
  16. 16. What is the project capacity of accomodating developers? ● 250 ● ● ● ● 200 ● ● ● ● ● ● ● ● Max. concurrent authors ● ● ● ● ● 150 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ●● ● 100 ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ●● ●● ● ● ● ●● ●● ●● ● ● ●●●● ● ● ● ● ● ● ●●● ● ●●● ● ● ● ●● ● ● ●● ● ● ● ●● ●● ● ● ●● ● ● ● ●● ● ●● ●●●● ●●● ● ● 50 ● ● ●● ●● ● ● ● ●● ●●● ● ●● ● ●●●● ● ● ●●●● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ●●● ●● ●● ●● ● ● ●●●● ●●●● ●● ● ● ●● ●●●● ● ● ●●●●●●●● ●●●● ● ● ●● ●●● ● ●●●● ●●● ●●● ● ● ●●● ● ●● ●●● ● ● ●● ● ●●●● ●●●●●● ● ●●● ● ●● ●● ●●● ● ●●● ● ● ●●●●● ● ● ●●● ● ●●● ● ●● ● ●● ●●● ● ●● ●● ● ●●●●●●●● ●●●●● ● ●●●●● ●●●●● ●● ●●●● ● ● ●●●●● ●●●●● ●● ● ●●●●● ●●● ● ● ●●●●● ●●● ● ● ● ●●●●● ● ●●●●● ● ●●●●● ● ●●●●● ●● ● ●●● ●●●●● ●●●● ●●● ●● ●● ● ●●● ●●● ●● ●●●● ● ●●●● ● ●● ● ●●● ●● ●● ●● ●● ● ●● ●● ●●● ●● ●●● ●● ●● ●● ●● ●● ● ●●● ●●● ●● ●● ● ●●● ●● ● ●● ●● ●● ●● ●● ● ●● ●● ● ● ● ●● 0 0 200 400 600 Number of authorsBENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 16
  17. 17. Future work • Take into account other factors • time: How do our metrics and their relations evolve over time? • project categories (e.g. archived, deprecated, ...) • project age and maturity • main programming language • ... • Develop a dashboard tool to detect, prevent and predict health problems in evolving software ecosystemsBENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 17
  18. 18. Thank youBENEVOL 2011 Empirical Study on Specialisation in FLOSS communities 08/12/11 18

×