On the variation and specialisation of workload: The Gnome case

Published in: Technology

  1. On the variation and specialisation of workload: The Gnome case
     B. Vasilescu, A. Serebrenik, M. Goeminne, T. Mens
     Tuesday 4 December 2012
  2. Gnome as an ecosystem
     • Ecosystem: set of interconnected projects
     • ~1400 projects
     • ~3000 contributors
     • 15 years of activity
     (Slide footer: Variation and specialisation of workload, Benevol 2012, Tuesday 4 December 2012)
  3. How does workload vary across contributors?
     • Who are they?
     • What do they do?
     • How do they do it?
     A partial answer, obtained by analysing the git repositories.
  4. Who are the contributors?
  5. Identity matching
     • Contributors have one account per project repository…
     • … and sometimes more than one.
     • There are no explicit links between the accounts, so the links have to be inferred.
     • The inference is based on the names and e-mail addresses found in the git repositories.
  6. Identity matching (cont.)
     • (Semi-)automatic classification techniques.
     • Must take into account variations, abbreviations, permutations, misspellings, nicknames, etc.
     • No process is perfect: even a manually post-checked result can contain false positives and false negatives.
     • Since Gnome as a whole has no strict identification policy, some matches cannot be detected without extra contextual information. Fictitious example:
       • Robbie Williams <robbiew@gnome.org>
       • Euphegenia Doubtfire <euphegenia@gmail.com>
  7. What do the contributors do?
  8. 13 activity types
     • Identified by the path, name and extension of the touched files.
     • Coding: *.c, *.java, etc.
     • Translation: *.po, etc.
     • Testing: */test/*, etc.
     • ...
  9. How do the contributors contribute?
  10. Metrics
     • APTW(p,c,t): number of files touched by the contributor c performing an activity of type t in a project p.
     • Derived metrics, by aggregation: max, sum, etc.
  11. Workload
     • 50% of contributors made < 14 changes.
     • 1 contributor made 185,874 changes.
     [Figure: number of authors (y-axis, 0 to 600) against log(AW) (x-axis, 0 to 12).]
     (Slide footer: Université de Mons, Rapport de formation doctorale [doctoral training report] 2011, Mathieu Goeminne)
  12. The more things you do, the more things you can do!
     • Correlations:
       • Between the number of activity types and the workload.
       • Between the number of projects and the workload.
  13. Favorite activities of contributors having ≥ 14 changes
     • Most frequent contributors specialise in coding and development documentation.
     • The other activities are not subject to specialisation.
  14. Favorite activities of contributors having < 14 changes
     • Most occasional contributors specialise in translation and coding.
     • The other activities are not subject to specialisation.
  15. How strongly do contributors focus?
     • Basic measure: RATW(c,t)
       • % of the total workload of c dedicated to t.
     • Use of the Gini coefficient as an inequality index:
       • Value in [0, 1)
       • 0 if the workload is equally distributed.
       • Close to 1 if the workload is concentrated in few activity types.
  16. Contributors' focus (cont.)
     • Occasional contributors typically participate in a single activity type.
     • Frequent contributors typically participate in few activity types.
  17. To summarise
  18. What did we learn?
     • Most contributors are occasional and are involved in only one activity type; few are very active; frequent contributors are involved in few activity types.
     • The more things you do, the more things you can do.
     • Occasional contributors are translators and are involved in many projects. Frequent contributors are coders and are involved in few projects.
     • And more in our paper.
  19. How did we do it?
     • Contributor matching: semi-automatic and automatic methods.
     • Activity identification based on file path/name/extension rules.
     • Advanced statistical analysis (among other things, for the partial ordering of activity types).
     • Specialisation: aggregation with inequality indices.
  20. In the future
     • Add a temporal aspect: how does the contributors' behaviour change over time?
     • Consider subsets of Gnome: sub-ecosystems composed of projects that have more in common with each other than Gnome projects do on average (archived, by theme, etc.).
     • Combine both by studying migration trends.
     • …
  21. Thank you
     On the variation and specialisation of workload – A case study of the Gnome ecosystem community
     B. Vasilescu, A. Serebrenik, M. Goeminne, T. Mens
     Empirical Software Engineering (awaiting acceptance)
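
The identity matching described on slides 5 and 6 might be approximated along the following lines. This is only a minimal sketch, not the authors' method: the function names (`normalise`, `name_key`, `likely_same`), the 0.85 similarity threshold and the choice of `difflib` are all assumptions made for illustration. It handles permutations and small misspellings, and, as slide 6 warns, it cannot link the two fictitious identities without extra context.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case, strip accents, and replace punctuation with spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9 ]", " ", ascii_only.lower()).strip()


def name_key(name):
    """Order-insensitive key: 'Williams, Robbie' and 'Robbie Williams' agree."""
    return " ".join(sorted(normalise(name).split()))


def likely_same(identity_a, identity_b, threshold=0.85):
    """Guess whether two (name, e-mail) pairs belong to one contributor.

    The threshold is an arbitrary, hypothetical value; a real study would
    tune it and post-check the result manually.
    """
    name_a, mail_a = identity_a
    name_b, mail_b = identity_b
    # Identical e-mail addresses are treated as a certain match.
    if mail_a.strip().lower() == mail_b.strip().lower():
        return True
    key_a, key_b = name_key(name_a), name_key(name_b)
    if not key_a or not key_b:
        return False
    # Fuzzy comparison tolerates small misspellings in the name.
    return SequenceMatcher(None, key_a, key_b).ratio() >= threshold
```

For example, `likely_same(("Robbie Williams", "robbiew@gnome.org"), ("Euphegenia Doubtfire", "euphegenia@gmail.com"))` returns `False`, which is exactly the kind of false negative the slide describes.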
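
The path-based activity identification of slide 8 can be sketched as a list of glob rules. Only the three activity types whose patterns appear on the slide are included; the other ten are not reconstructed here. The rule order (testing before coding, so that a C file under a test directory counts as testing), the `"unknown"` fallback and the names `RULES` and `classify` are assumptions for illustration.

```python
from fnmatch import fnmatchcase

# (activity type, glob patterns), checked in order; first match wins.
# Only the patterns shown on slide 8 are listed; the order is an assumption.
RULES = [
    ("testing", ["*/test/*"]),
    ("translation", ["*.po"]),
    ("coding", ["*.c", "*.java"]),
]


def classify(path):
    """Return the activity type of a touched file, or 'unknown'."""
    for activity, patterns in RULES:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return activity
    return "unknown"
```

With these rules, `classify("po/fr.po")` gives `"translation"` and `classify("src/test/check.c")` gives `"testing"` rather than `"coding"`.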
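
The APTW metric of slide 10 and its aggregations could be computed roughly as follows. This sketch assumes the input is a list of hypothetical `(project, contributor, activity)` tuples, one per file touch, and that every touch counts once (the slide does not say whether repeated touches of the same file are counted separately). The helper names are illustrative.

```python
from collections import Counter


def aptw(touches):
    """Count file touches per (project, contributor, activity type)."""
    counts = Counter()
    for project, contributor, activity in touches:
        counts[(project, contributor, activity)] += 1
    return counts


def workload(counts, contributor):
    """Aggregate by sum: total touches of a contributor across projects and types."""
    return sum(n for (_, c, _), n in counts.items() if c == contributor)


def activity_types(counts, contributor):
    """Set of activity types in which the contributor made at least one touch."""
    return {t for (_, c, t), n in counts.items() if c == contributor and n > 0}


def projects(counts, contributor):
    """Set of projects in which the contributor made at least one touch."""
    return {p for (p, c, _), n in counts.items() if c == contributor and n > 0}
```

The number of activity types and the number of projects per contributor are the quantities that slide 12 relates to the workload.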
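
The focus measure of slide 15 can be illustrated with a plain Gini coefficient over a contributor's per-type workload. This is a sketch under assumptions: `ratw` returns fractions rather than percentages, the vector passed to `gini` is assumed to contain one entry per activity type (zeros included), and the mean-absolute-difference formula is one standard definition, not necessarily the exact variant used in the paper.

```python
def ratw(workload_by_type):
    """Share of a contributor's total workload spent on each activity type."""
    total = sum(workload_by_type.values())
    if total == 0:
        return {t: 0.0 for t in workload_by_type}
    return {t: w / total for t, w in workload_by_type.items()}


def gini(values):
    """Gini coefficient: sum of |x_i - x_j| over all pairs, divided by 2 * n * sum(x)."""
    values = list(values)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    diff_sum = sum(abs(x - y) for x in values for y in values)
    return diff_sum / (2 * n * total)
```

With this definition an evenly spread workload gives 0, and a workload concentrated in a single one of n activity types gives (n - 1) / n, which stays strictly below 1 as the slide's interval [0, 1) requires.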
