An empirical study on the Specialisation Effect in Open Source Communities
 

    Presentation Transcript

    • An empirical study on the Specialisation Effect in Open Source Communities. Mathieu Goeminne & Tom Mens (University of Mons), Bogdan Vasilescu & Alexander Serebrenik (Eindhoven University of Technology)
    • Our case study: Gnome • A large ecosystem • ~1,300 projects with a common characteristic: integration in the same GNU/Linux desktop environment • A large developer community • > 5,000 authors involved in the Git repositories • A set of activity types • coding, documentation, translation, etc. BENEVOL 2011, Empirical Study on Specialisation in FLOSS communities, 08/12/11
    • Methodology • Goal-Question-Metric approach: • Define research goals • Define questions to reach these goals • Define metrics to answer these questions • Use the metrics to statistically verify hypotheses about the questions
    • Research Goals: understand 1. how development effort varies across projects; 2. how development effort varies across developers; 3. how the type of development activity relates to the development effort.
    • Research Questions • Are the projects containing more activity types more active? [Goal 1] • How specialised are the projects towards different activity types? [Goal 1] • What influences the project specialisation? • To what extent are developers specialised in different activity types? [Goal 2] • etc.
    • Workload and Involvement Metrics • workload APTW(p,d,t) • number of files of activity type t in project p touched by developer d • involvement APTI(p,d,t) • 1 if APTW(p,d,t) > 0; 0 otherwise • Higher-level metrics: aggregation over all projects, developers and/or activity types. Examples: • NPD(d) = number of projects in which developer d is involved • PW(p) = global workload of project p • RPTW(p,t) = workload in p w.r.t. activity type t, relative to the global project workload • PWS(p) = specialisation (imbalance or unequal distribution) of workload across the activity types of project p (using the Gini index).
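
    The metric definitions above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the input format (one `(project, developer, activity_type)` record per touched file), the sample data, and the function names are assumptions, and the slides do not say whether repeated touches of the same file count once or several times (here every record counts).

```python
from collections import defaultdict

# Hypothetical sample data: one record per touched file.
touches = [
    ("gedit", "alice", "code"),
    ("gedit", "alice", "code"),
    ("gedit", "alice", "doc"),
    ("gedit", "bob", "translate"),
    ("evince", "alice", "code"),
]

def aptw(records):
    """APTW(p,d,t): workload per (project, developer, activity type)."""
    counts = defaultdict(int)
    for project, developer, activity in records:
        counts[(project, developer, activity)] += 1
    return dict(counts)

def apti(w, p, d, t):
    """APTI(p,d,t): 1 if APTW(p,d,t) > 0, 0 otherwise."""
    return 1 if w.get((p, d, t), 0) > 0 else 0

def npd(w, d):
    """NPD(d): number of projects in which developer d is involved."""
    return len({p for (p, dev, _), n in w.items() if dev == d and n > 0})

def pw(w, p):
    """PW(p): global workload of project p."""
    return sum(n for (proj, _, _), n in w.items() if proj == p)

def rptw(w, p, t):
    """RPTW(p,t): workload of type t in p, relative to PW(p)."""
    total = pw(w, p)
    if total == 0:
        return 0.0
    of_type = sum(n for (proj, _, typ), n in w.items() if proj == p and typ == t)
    return of_type / total

w = aptw(touches)
print(npd(w, "alice"), pw(w, "gedit"), rptw(w, "gedit", "code"))
```

    Keeping APTW as a single dictionary keyed by the triple makes every higher-level metric a simple filter-and-sum over the same table.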
    • The Gini Index • Economic measure of inequality • Aggregates a collection of values (e.g. incomes in a country) • A value between 0 and 1: • 0 if everybody has the same income • 1 if one person has all the income and the others have nothing.
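
    As a rough illustration of the index described above, here is a minimal Python sketch using the common sorted-values formula. The function name and the handling of empty or all-zero input are assumptions, and the slides do not state which variant of the index was used. With this variant, a finite collection of n values in which one holds everything yields (n - 1) / n, which approaches 1 as n grows; some variants rescale by n / (n - 1) to reach exactly 1.

```python
def gini(values):
    """Gini index of a collection of non-negative values."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # Assumption: treat empty or all-zero input as perfectly equal.
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

print(gini([1, 1, 1, 1]))    # everybody has the same share
print(gini([0, 0, 0, 10]))   # one holder has everything
```

    Applied to the per-activity-type workloads of a project, the same function gives a specialisation score in the sense of PWS(p).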
    • Experimental setup • Extract data from Gnome’s Git project repositories • Populate databases using this data • Detect multiple logins belonging to the same ‘real’ developer using an identity matching algorithm • Define the activity types carried out by project developers • Compute metrics • Verify statistical hypotheses
    • Activity types • 13 activity types, including: • coding, translation, documentation, building, development documentation. • File classification: based on the file path, name and extension, and a set of rules. A rule is a pair of an activity type and a regular expression. • Ex: (doc, ‘.*/DOC(-?)BOOK(S?)/.*’)
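
    A rule-based classifier of this kind might look like the following Python sketch. Only the `doc` rule comes from the slide; the other two rules, the case-insensitive matching (suggested by the upper-case pattern), the first-match-wins order, and the `unknown` fallback are assumptions made for illustration.

```python
import re

# (activity type, regular expression) pairs, checked in order.
RULES = [
    ("doc", r".*/DOC(-?)BOOK(S?)/.*"),   # rule quoted on the slide
    ("translate", r".*\.PO"),            # hypothetical rule
    ("code", r".*\.(C|H|PY)"),           # hypothetical rule
]

# Assumption: patterns are matched against the whole path, ignoring case.
COMPILED = [(activity, re.compile(pattern, re.IGNORECASE))
            for activity, pattern in RULES]

def classify(path):
    """Return the activity type of the first rule matching the file path."""
    for activity, pattern in COMPILED:
        if pattern.fullmatch(path):
            return activity
    return "unknown"

for path in ["help/docbook/intro.xml", "po/fr.po", "src/main.c", "README"]:
    print(path, "->", classify(path))
```

    Because the rules are ordered, a more specific pattern placed earlier takes precedence over a generic one placed later.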
    • How is the activity distributed across activity types? [Bar chart; vertical axis from 0.0 to 0.4; activity types shown: develdoc, build, doc, unknown, test, code, translate, mmedia, library, image, config, ui, meta, database.]
    • Developer workload distribution: imbalance by activity type, RAWS(p). [Plot over the range 0.0 to 1.0.] Most developers concentrate their workload in a few activities.
    • Project workload distribution: imbalance by activity type, RPWS(p). [Plot over the range 0.0 to 1.0.] Most projects concentrate their workload in a few activities.
    • How specialized are developers toward different activity types? [Plot; vertical axis from 0.0 to 1.0; one column per activity type, with the figure given in parentheses on the slide: build (2675), code (3754), config (1287), db (241), devdoc (2937), doc (1721), transl. (2008), img (1129), lib (19), meta (1031), mm (482), test (896), ui (1195).] Coding, translating, and writing developer documentation are (highly) specialized activities.
    • How specialized are projects toward different activity types? [Plot; vertical axis from 0.0 to 1.0; one column per activity type, with the figure given in parentheses on the slide: build (1185), code (1262), config (626), db (132), devdoc (1253), doc (1220), transl. (691), img (701), lib (15), meta (1081), mm (314), test (590), ui (579).] Most of the work in projects is done in coding, writing developer documentation, translating and creating a build system.
    • What is the developers' capacity for parallel work in different projects? [Scatter plot; horizontal axis: number of projects (0 to 500); vertical axis: max. concurrent projects (0 to 140).]
    • What is the projects' capacity for accommodating developers? [Scatter plot; horizontal axis: number of authors (0 to 600); vertical axis: max. concurrent authors (0 to 250).]
    • Future work • Take into account other factors • time: how do our metrics and their relations evolve over time? • project categories (e.g. archived, deprecated, ...) • project age and maturity • main programming language • ... • Develop a dashboard tool to detect, prevent and predict health problems in evolving software ecosystems
    • Thank you