ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014 Projects Track)
Presentation of research goals and ongoing research in the joint ARC project "ECOS: Ecological Studies of Open Source Software Ecosystems", presented by Tom Mens (UMONS) during the projects track of the CSMR-WCRE 2014 Software Evolution Week. Collaborators: Philippe Grosjean and Maelick Claes.


Presentation Transcript

  • ECOS: Ecological Studies of Open Source Software Ecosystems
    • Tom Mens, Maelick Claes (Software Engineering Lab)
    • Philippe Grosjean (Numerical Ecology Lab)
    informatique.umons.ac.be/genlog/projects/ecos
  • About ECOS
    informatique.umons.ac.be/genlog/projects/ecos
    • “Action de Recherche Concertée” of University of Mons
      – Interdisciplinary project
        • Combines research in biology (ecology) and computing science (empirical software engineering)
      – COMPLEXYS Research Institute
      – Oct 2012 to Sep 2017
      – 500K EUR funding
    • Related EU project:
    5 February 2014, CSMR-WCRE Software Evolution Week, Antwerp, Belgium
  • High-level project goal
    • Improve understanding of, and support for, open source software ecosystems
      – Draw inspiration from biological evolution, ecology and natural ecosystems
    • Determine main factors of success and failure of OSS projects within their ecosystem
      – Provide better techniques and mechanisms to predict and improve survivability of OSS projects and resilience of their ecosystems
      – Provide guidelines and evolution dashboards to support software communities
  • Software ecosystem
    Definition (business-oriented view)
    • “a set of actors functioning as a unit and interacting with a shared market for software and services, together with the relationships among them.” (Jansen et al. 2009)
    Examples
    • Eclipse
    • Android and iOS app store
  • Software ecosystem
    Definition (development-centric view)
    • “a collection of software products that have some given degree of symbiotic relationships.”
      Messerschmitt & Szyperski: Software ecosystem: Understanding an indispensable technology and industry. MIT Press, 2003.
    • “a collection of software projects that are developed and evolve together in the same environment.”
      M. Lungu: Towards reverse engineering software ecosystems. Int’l Conf. Software Maintenance, 2008, pp. 428–431.
    Examples
    • Gnome, KDE
    • Debian, Ubuntu
    • R’s CRAN
    • Apache
  • Main Research Questions
    • Which control mechanisms driving natural ecosystems can be used to explain dynamics of software ecosystems?
    • Which mechanisms and measures can we borrow from ecology to explain and predict how software projects evolve?
  • Terminology: biological ecosystem
    Definitions
    • Ecology: the scientific study of the interactions that determine the distribution and abundance of organisms
    • Ecosystem: the physical and biological components of an environment considered in relation to each other as a unit
      – combines all living organisms (plants, animals, micro-organisms) and physical components (light, water, soil, rocks, minerals)
    Example: coral reefs
    • High biodiversity: polyps, sea anemones, fish, mollusks, sponges, algae
  • Comparison
  • Ecological theories of evolution of species
    • Jean-Baptiste Lamarck (1744–1829)
      • animal organs and behaviour can change according to the way they are used
      • those characteristics can be transmitted from one generation to the next to reach a greater level of perfection
      • Example: giraffes’ necks have become longer while trying to reach the upper leaves of a tree
    • Charles Darwin (1809–1882)
      • all species of life have descended over time from common ancestors
      • this branching pattern resulted from natural selection
      • evolution history is represented by a phylogenetic tree
      • Example: 13 types of Galapagos finches, same habits and characteristics, but different beaks
  • Ecological theories of evolution of species
    Hologenome theory
    • The unit of natural selection is the holobiont: the organism together with its associated microbial communities, that live together in symbiosis.
    • The holobiont can adapt to changing environmental conditions far more rapidly than by genetic mutation and selection alone.
    • Darwinism emphasises competition (survival of the fittest), hologenome theory also includes cooperation (through symbiosis)
    In software evolution: Hologenome theory may be closer to what one observes in open source projects where cooperation plays a more important role.
  • Ecological theories of evolution of species
    Reticulate evolution
    • Evolution history is represented as a graph structure. Two or more evolutionary lineages can be recombined at some level
      • hybrid speciation (2 lineages recombine to create a new one)
      • horizontal gene transfer (genes are transferred across species)
    In software evolution: Distributed VCS like Git promote reticulate evolution through fork and merge (but few projects actually merge)
    See Robles et al. A Comprehensive Study of Software Forks: Dates, Reasons and Outcomes. OSS Conference 2012, Best Paper Award.
  • Evolution History
    Software
  • Trophic web (food chain) in natural ecosystems
  • Trophic web in software ecosystems
    • Producer-consumer relation
    TOP-DOWN: change requests & bug reports
    BOTTOM-UP: changes in core projects and architecture
    Onion model: Users, Peripheral developers, Core developers
  • Core Architecture, or Why developers are polyps
    Coral reef ecosystem
    • Scleractinian coral polyps are responsible for creating the coral reef structure
    • This coral reef is required for the other species of the ecosystem to thrive.
    Software ecosystem
    • Core developers are responsible for creating the core software architecture
    • Based on this core architecture, other developers and third parties can create other projects, services, and so on.
  • Software Ecosystem Dynamics
    Predator-prey relationship (Lotka-Volterra 1925/1926)
    • Predators (hunting animals) feed upon their prey (attacked animals)
    • Can be described by a dynamic model with mutually dependent parametric differential equations
    Analogies in software maintenance
    • Debuggers are predators, software defects are prey
      Calzolari et al. “Maintenance and testing effort modeled by linear and nonlinear dynamic systems,” Information and Software Technology, 2001
    • Developers are predators, the information they seek is prey
      Lawrance et al. Scents in programs: Does information foraging theory apply to program maintenance? VL/HCC 2007
    • Dual (socio-technical) view:
      • Developers are predators, the projects they work on are prey
      • Projects are predators that feed upon the cognitive resources of their developers
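The predator-prey model named on this slide is usually written as two coupled equations: dx/dt = αx − βxy for the prey and dy/dt = δxy − γy for the predators. The following is a minimal sketch, not taken from the presentation, that integrates these equations with a fourth-order Runge-Kutta step. The function names, the parameter values in `PARAMS`, and the reading of x as open defects and y as debugging effort are illustrative assumptions only.

```python
import math


def derivatives(prey, predators, alpha, beta, delta, gamma):
    """Right-hand side of the classic Lotka-Volterra equations."""
    d_prey = alpha * prey - beta * prey * predators
    d_predators = delta * prey * predators - gamma * predators
    return d_prey, d_predators


def rk4_step(prey, predators, dt, params):
    """Advance both populations by one fourth-order Runge-Kutta step."""
    k1 = derivatives(prey, predators, *params)
    k2 = derivatives(prey + 0.5 * dt * k1[0], predators + 0.5 * dt * k1[1], *params)
    k3 = derivatives(prey + 0.5 * dt * k2[0], predators + 0.5 * dt * k2[1], *params)
    k4 = derivatives(prey + dt * k3[0], predators + dt * k3[1], *params)
    new_prey = prey + dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    new_predators = predators + dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return new_prey, new_predators


def simulate(prey0, predators0, params, dt=0.01, steps=2000):
    """Return the list of (prey, predators) states, including the initial one."""
    trajectory = [(prey0, predators0)]
    prey, predators = prey0, predators0
    for _ in range(steps):
        prey, predators = rk4_step(prey, predators, dt, params)
        trajectory.append((prey, predators))
    return trajectory


def invariant(prey, predators, params):
    """Quantity conserved along exact Lotka-Volterra orbits (accuracy check)."""
    alpha, beta, delta, gamma = params
    return (delta * prey - gamma * math.log(prey)
            + beta * predators - alpha * math.log(predators))


# Made-up parameters (alpha, beta, delta, gamma), chosen only for illustration.
PARAMS = (1.0, 0.1, 0.075, 1.5)

if __name__ == "__main__":
    states = simulate(10.0, 5.0, PARAMS)
    for index in range(0, len(states), 400):
        prey, predators = states[index]
        print(f"t={index * 0.01:5.2f}  prey={prey:8.3f}  predators={predators:8.3f}")
```

The model has a non-trivial equilibrium at x = γ/δ and y = α/β; any other positive starting point gives the closed oscillations that make the analogy attractive for defects versus maintenance effort.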
  • Desirable ecosystem characteristics
    Biodiversity measures the degree of variation of species within a given ecosystem
    • Maximum diversity if all species have the same number of individuals
    • Low diversity if a particular species dominates the others
    • Many different metrics: Shannon entropy, Simpson index, evenness, …
    • Posnett et al. used a similar notion to measure developer activity focus and module activity focus

    [The slide shows the first page of the following paper.]
    Dual Ecological Measures of Focus in Software Development
    Daryl Posnett†, Raissa D’Souza∗, Premkumar Devanbu† and Vladimir Filkov†
    †∗ University of California Davis, USA
    † {dpposnett,ptdevanbu,vfilkov}@ucdavis.edu, ∗ raissa@cse.ucdavis.edu
    ICSE 2013, San Francisco, CA, USA, p. 452. 978-1-4673-3074-9/13/$31.00 © 2013 IEEE

    Abstract: Work practices vary among software developers. Some are highly focused on a few artifacts; others make wide-ranging contributions. Similarly, some artifacts are mostly authored, or “owned”, by one or few developers; others have very wide ownership. Focus and ownership are related but different phenomena, both with strong effect on software quality. Prior studies have mostly targeted ownership; the measures of ownership used have generally been based on either simple counts, information-theoretic views of ownership, or social-network views of contribution patterns. We argue for a more general conceptual view that unifies developer focus and artifact ownership. We analogize the developer-artifact contribution network to a predator-prey food web, and draw upon ideas from ecology to produce a novel, and conceptually unified view of measuring focus and ownership. These measures relate to both cross-entropy and Kullback-Leibler divergence, and simultaneously provide two normalized measures of focus from both the developer and artifact perspectives. We argue that these measures are theoretically well-founded, and yield novel predictive, conceptual, and actionable value in software projects. We find that more focused developers introduce fewer defects than defocused developers. In contrast, files that receive narrowly focused activity are more likely to contain defects than other files.

    I. INTRODUCTION
    Developers are the lifeblood of open source software, OSS, and their contributions are vital for OSS to thrive. Rather than being assigned tasks by management, OSS developers are generally free to choose the style, focus, and breadth of their contributions. Some might be quite focused, working on one specific subsystem; others may contribute to many different subsystems. A device driver expert, for example, may contribute very specialized knowledge to an open source project, focusing on only a few files or packages. His contributions to a small subset of modules¹ may be his only contribution during his tenure with the project. In contrast, a project leader may work on a variety of different tasks touching many modules within a project.
    ¹ We use modules to mean either packages or files, depending on the context.

    While OSS developers are free to choose their contribution styles, such choices are not inconsequential, especially to the central issue of software quality. A dominant theme emerging from previous work in this area is module ownership [1], [2], [3]. Low ownership of a module, i.e., too many contributors, can adversely impact code quality. There is, however, an entirely different perspective, developer’s attention focus, which is relatively unexplored. Human attention and cognition are finite resources [4]. When different tasks are simultaneously engaged, they can compete for mental resources and task performance can suffer [5]. A developer engaged in many different tasks carries a greater cognitive burden than a more focused developer.

    Interestingly, the developer and module perspectives are conceptually symmetric, dualistic views of focus. From a module’s perspective, strong ownership indicates a strong focused contribution. We refer to this as module activity focus, or MAF, a measure of how focused the activities are on a module. Symmetrically, we refer to the developer’s attention focus, or DAF, a measure of how focused the activities are of a particular developer.

    A surprising, but natural analogy for MAF and DAF, are predator-prey food webs from ecology. In a sense, modules are predators that “feed upon” the cognitive resources of developers. As the number of developers contributing to a module increases, the diversity of cognitive resources upon which the module “feeds” also increases; likewise, a developer is a “prey” whose limited cognitive resources are spread over the modules that “prey” upon her. Ecosystem diversity is of great interest to ecologists. Williams and Martinez call the roles complexity and diversity play “[o]ne of the most important and least settled questions in ecology.” [6] This diversity has two symmetric perspectives, both from a prey’s perspective, and a predator’s perspective. Ecologists have developed sophisticated symmetric measures of predator-prey relationships, drawing upon ideas such as entropy and Kullback-Leibler divergence, that simultaneously capture both perspectives. We adapt these measures for software engineering projects into the metrics MAF and DAF. In this work, we employ the methodology presented by El Emam to validate our measures [7]. In particular, we show that the DAF and MAF measures succeed in distinguishing important cases that extant measures don’t capture.

    We make the following contributions:
    • We adapt terminology and motivation from ecology, based on bipartite graphs;
    • We incorporate and generalize previous results on developer and artifact diversity;
    • We provide easy to compute measures of focus, MAF and DAF, normalized to facilitate comparison within and across projects;
    • We show these measures more precisely capture outcomes relevant to software researchers and practitioners.
    This novel analysis simultaneously considers focus both from the artifact perspective and the author perspective. Researchers can use our MAF and DAF metrics to more [...]
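The diversity metrics listed on this slide can be computed from a plain list of abundance counts. The sketch below is an illustration under stated assumptions, not the MAF/DAF measures of Posnett et al.: it uses the natural-log Shannon entropy, the Simpson index in its dominance form (sum of squared proportions), and Pielou's evenness as one common definition of "evenness". Function names and the sample counts are hypothetical; in a software setting the counts could be, for example, one developer's commits per module.

```python
import math


def proportions(counts):
    """Turn raw abundance counts into proportions, ignoring empty species."""
    positive = [n for n in counts if n > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("at least one count must be positive")
    return [n / total for n in positive]


def shannon_entropy(counts):
    """Shannon entropy H = -sum(p_i * ln p_i); higher means more diverse."""
    return -sum(p * math.log(p) for p in proportions(counts))


def simpson_dominance(counts):
    """Simpson index as sum(p_i^2): the chance that two random individuals
    belong to the same species; lower means more diverse."""
    return sum(p * p for p in proportions(counts))


def pielou_evenness(counts):
    """Pielou's evenness H / ln(S), in [0, 1]; defined as 0 for one species."""
    props = proportions(counts)
    if len(props) < 2:
        return 0.0
    return shannon_entropy(counts) / math.log(len(props))


if __name__ == "__main__":
    balanced = [10, 10, 10, 10]   # all species equally abundant
    dominated = [97, 1, 1, 1]     # one species dominates the others
    for name, sample in (("balanced", balanced), ("dominated", dominated)):
        print(name,
              round(shannon_entropy(sample), 3),
              round(simpson_dominance(sample), 3),
              round(pielou_evenness(sample), 3))
```

The two samples mirror the slide's bullets: equal abundances give the maximum entropy ln(S) and evenness 1, while a dominated sample pushes entropy and evenness down and the dominance index up.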
  • Desirable ecosystem characteristics
    • Stability
      • the capacity to maintain an equilibrium over longer periods of time
    • Resistance
      • the ability to withstand environmental changes without too much disturbance of its biological communities
    • Resilience
      • the ability to return to an equilibrium after a disturbance
    Goal: Use these and related measures to study maintainability and survivability of software projects within their ecosystem
  • Ongoing Research: 2 case studies
    • CRAN (Comprehensive R Archive Network)
      – Characteristics
        • 15 years
        • > 5000 packages
        • > 2500 contributors
        • different OS flavours (Linux, Windows, MacOS, Solaris)
        • superlinear package growth
      – Goal
        • Study package dependencies and maintainability (number of errors and time to fix) and their effect on package survivability
        • See our CSMR-WCRE 2014 ERA paper “On the maintainability of CRAN packages”
  • Ongoing Research: 2 case studies
    • GNOME
      – Characteristics
        • 16 years
        • > 1400 projects
        • > 5800 contributors
        • > 1.3M commits
        • > 12M file touches
      – Goals
        1. Combine different ecosystem measures into a predictive model of project survivability
        2. Study migration patterns of contributors and their effect on project survivability
  • Ongoing Research: GNOME case study 1
    Combine different ecosystem measures into a predictive model of project survivability
      – Replicate and generalise the empirical study by Uzma Raja

    [The slide shows the first page of the following paper.]
    Defining and Evaluating a Measure of Open Source Project Survivability
    Uzma Raja, Member, IEEE Computer Society, and Marietta J. Tretter
    IEEE Transactions on Software Engineering, vol. 38, no. 1, January/February 2012, p. 163
    U. Raja is with the Department of Information Systems, Statistics and Management Science, The University of Alabama, Box #870226, 300 Campus Drive, Tuscaloosa, AL 35487. E-mail: uraja@cba.ua.edu.
    M.J. Tretter is with the Department of Information and Operations Management, Texas A&M University, Mail Stop #310D, Wehner [...]

    Abstract: In this paper, we define and validate a new multidimensional measure of Open Source Software (OSS) project survivability, called Project Viability. Project viability has three dimensions: vigor, resilience, and organization. We define each of these dimensions and formulate an index called the Viability Index (VI) to combine all three dimensions. Archival data of projects hosted at SourceForge.net are used for the empirical validation of the measure. An Analysis Sample (n = 136) is used to assign weights to each dimension of project viability and to determine a suitable cut-off point for VI. Cross-validation of the measure is performed on a holdout Validation Sample (n = 96). We demonstrate that project viability is a robust and valid measure of OSS project survivability that can be used to predict the failure or survival of an OSS project accurately. It is a tangible measure that can be used by organizations to compare various OSS projects and to make informed decisions regarding investment in the OSS domain.

    Index Terms: Evaluation framework, external validity, open source software, project evaluation, software measurement, software survivability.

    1 INTRODUCTION
    Open Source Software (OSS) projects are developed and distributed for free, with full access to the project source code. Recently there has been a significant increase in the use of these projects. Some OSS projects have earned themselves a high reputation and corporate sponsorships. Large corporations (e.g., IBM, Sun Microsystems) are becoming involved with the OSS movement in various capacities. Projections indicate that the corporate interest in OSS projects will grow stronger in the future [1] and these projects will see integration in enterprise architecture [2]. This increased use of OSS projects creates the need for better project evaluation measures.

    Traditionally, software projects are evaluated by conformance to budget, schedule, and user requirements [3], [4], [5], [6], [7], [8]. These measures, however, are difficult to map to OSS projects, which are developed through a network of volunteer participants, with no defined budget, schedule, or customer. Although there is a surge in the investment in OSS projects [1], research indicates that a large number of OSS projects fail [9], [10]. Some have questioned the operational reliability and quality of OSS projects [11]. Since there are no contractual or legal bindings for providing OSS updates or maintenance services, businesses investing human or financial capital on adoption of OSS projects need the ability to evaluate whether the project will continue to exist or not [12]. Development teams need to measure project survivability to control and improve performance. Individual and corporate users need a measure of project survivability to compare the available OSS projects before making decisions regarding project adoption.

    In this paper, we define and validate a new multidimensional measure of OSS project survivability, called Project Viability. OSS projects provide access to their development archives, thereby providing a unique opportunity to conduct empirical research [13] and develop reliable measures [14], [15]. In the following sections, we define, formulate, and validate project viability. Section 2 provides a brief overview of the existing empirical research in OSS and the background of project survivability. Section 3 defines the dimensions of project viability and formulates an index to measure it. Section 4 discusses the empirical evaluation framework and validates the new measure using OSS project data. Discussion of the results is presented in Section 5 and conclusions are given in Section 6.

    2 BACKGROUND
    A large number of OSS projects are available for use. However, the failure rate of these projects is high [9]. The evaluation of OSS projects is different than Commercial Software Systems (CSS) [16]. The adopters of OSS projects need a mechanism to compare the chances of failure or survival of the available projects. This would allow better decisions regarding corporate resource investment. A range of measures has been used in prior research to evaluate OSS projects. Godfrey and Tu [17] examined the evolution of the Linux kernel and its growth pattern in one [...]
  • Ongoing Research: GNOME case study 2
    Study migration patterns of contributors and their effect on project survivability
  • Ongoing Research: GNOME case study 2
    Timeline (6-month intervals) of joiners to Gnome projects
    [Figure: three panels (Evolution, GTK+, Gimp); x-axis: Time, 1997 to 2013; y-axis: Joiners]
    - Black = local joiners from other Gnome projects
    - Red = global joiners from outside of Gnome
    - Blue = stayers

    [The slide also shows the following book page excerpt.]
    Global joiners are incoming coders in the considered project that were not active in any of the GNOME projects during the preceding period. A similar definition holds for the local and global leavers. Formally, the metrics are defined as follows. Let p be a GNOME project, t a 6-month activity period (and t−1 the previous period), c a coder, Gnome the set of GNOME’s code projects, and isDev(c,t,p) a predicate which is true if and only if c made a code commit in p during t:

    \[
    \begin{aligned}
    \mathit{localLeavers}(p,t) &= \{\, c \mid \mathit{isDev}(c,t-1,p) \land \lnot\mathit{isDev}(c,t,p) \land \exists p_2\,(p_2 \in \mathit{Gnome} \land \mathit{isDev}(c,t,p_2)) \,\} \\
    \mathit{globalLeavers}(p,t) &= \{\, c \mid \mathit{isDev}(c,t-1,p) \land \forall p_2\,(p_2 \in \mathit{Gnome} \Rightarrow \lnot\mathit{isDev}(c,t,p_2)) \,\} \\
    \mathit{localJoiners}(p,t) &= \{\, c \mid \mathit{isDev}(c,t,p) \land \lnot\mathit{isDev}(c,t-1,p) \land \exists p_2\,(p_2 \in \mathit{Gnome} \land \mathit{isDev}(c,t-1,p_2)) \,\} \\
    \mathit{globalJoiners}(p,t) &= \{\, c \mid \mathit{isDev}(c,t,p) \land \forall p_2\,(p_2 \in \mathit{Gnome} \Rightarrow \lnot\mathit{isDev}(c,t-1,p_2)) \,\}
    \end{aligned}
    \]

    Fig. 1.11 Historical evolution (timeline) of the number of local (black solid) and global (red dashed) joiners (y-axis) for three GNOME projects.

    We did not find any general trend; the patterns of intake and loss of coders are highly project-specific. Figure 1.11 illustrates the evolution of the number of local and global joiners for some of the more important GNOME projects (the figures for leavers are very similar). For some projects (e.g., evolution) we do not observe a big difference between the number of local and global joiners, respectively. These projects seem to attract new developers both from within and outside of GNOME. Other projects, like gimp, attract most of their incoming developers from outside GNOME. A third category of projects attracts most of its incoming developers from other GNOME projects. This is the case for gtk+, glib and libgnome, which can be considered as belonging to the core of GNOME. This observation seems to [...]
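The four set definitions in the excerpt translate almost directly into set comprehensions. The sketch below is only an illustration of those definitions, not the authors' tooling: the representation of commit activity as a set of (coder, period, project) tuples, the integer period numbering, the function names and the toy data are all assumptions made here.

```python
from typing import Set, Tuple

# Hypothetical representation: one tuple per (coder, period, project)
# in which the coder made at least one code commit.
Activity = Set[Tuple[str, int, str]]


def is_dev(activity: Activity, coder: str, period: int, project: str) -> bool:
    """isDev(c, t, p): coder committed code to project during period."""
    return (coder, period, project) in activity


def active_somewhere(activity, coder, period, projects) -> bool:
    """True if the coder committed to any ecosystem project in the period."""
    return any(is_dev(activity, coder, period, p2) for p2 in projects)


def coders(activity: Activity) -> Set[str]:
    return {coder for coder, _, _ in activity}


def local_leavers(activity, projects, p, t):
    """Left p after period t-1 but still active elsewhere in the ecosystem."""
    return {c for c in coders(activity)
            if is_dev(activity, c, t - 1, p) and not is_dev(activity, c, t, p)
            and active_somewhere(activity, c, t, projects)}


def global_leavers(activity, projects, p, t):
    """Active in p during t-1 and active nowhere in the ecosystem during t."""
    return {c for c in coders(activity)
            if is_dev(activity, c, t - 1, p)
            and not active_somewhere(activity, c, t, projects)}


def local_joiners(activity, projects, p, t):
    """New in p during t, but active in some ecosystem project during t-1."""
    return {c for c in coders(activity)
            if is_dev(activity, c, t, p) and not is_dev(activity, c, t - 1, p)
            and active_somewhere(activity, c, t - 1, projects)}


def global_joiners(activity, projects, p, t):
    """Active in p during t and active nowhere in the ecosystem during t-1."""
    return {c for c in coders(activity)
            if is_dev(activity, c, t, p)
            and not active_somewhere(activity, c, t - 1, projects)}


# Toy data (invented): four coders over two consecutive 6-month periods.
PROJECTS = {"gtk+", "gimp", "evolution"}
ACTIVITY = {
    ("alice", 1, "gtk+"), ("alice", 2, "gimp"),   # moves from gtk+ to gimp
    ("bob", 1, "gtk+"),                           # leaves the ecosystem
    ("carol", 2, "gimp"),                         # arrives from outside
    ("dave", 1, "gtk+"), ("dave", 2, "gtk+"),     # stays in gtk+
}

if __name__ == "__main__":
    print(local_leavers(ACTIVITY, PROJECTS, "gtk+", 2))
    print(global_leavers(ACTIVITY, PROJECTS, "gtk+", 2))
    print(local_joiners(ACTIVITY, PROJECTS, "gimp", 2))
    print(global_joiners(ACTIVITY, PROJECTS, "gimp", 2))
```

Counting the size of each set per project and per period would give the kind of joiner and leaver timelines plotted on the slide.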
  • Migration in software ecosystems: Gnome case study
    Timeline (6-month intervals) of leavers from Gnome projects
    [Figure: three panels (Evolution, GTK+, Gimp); x-axis: Time, 1997 to 2013; y-axis: Leavers]
    - Black = local joiners from other Gnome projects
    - Red = global joiners from outside of Gnome
    - Blue = stayers

    [The slide shows page 28 of the book chapter by Tom Mens, Maëlick Claes, Philippe Grosjean and Alexander Serebrenik, with the same definitions of local and global joiners and leavers and the same Fig. 1.11 as on the previous slide. The page begins:]
    [...] project that were not active in this project during the preceding 6-month period, but that were involved in some activity in other GNOME projects instead. Global joiners are incoming coders in the considered project that were not active in any of the GNOME projects during the preceding period. A similar definition holds for the local and global leavers.
  • Some references
    • Bogdan Vasilescu, Alexander Serebrenik, Mathieu Goeminne, Tom Mens: On the variation and specialisation of workload – A case study of the Gnome ecosystem community. To appear in 2013 in Springer’s Empirical Software Engineering journal. DOI: 10.1007/s10664-013-9244-1
      Abstract: Most empirical studies of open source software repositories focus on the analysis of isolated projects, or restrict themselves to the study of the relationships between technical artifacts. In contrast, we have carried out a case study that focuses on the actual contributors to software ecosystems, being collections of software projects that are maintained by the same community. To this aim, we defined a new series of workload and involvement metrics, as well as a novel approach, T̃-graphs, for reporting the results of comparing multiple distributions. We used these techniques to statistically study how workload and involvement of ecosystem contributors varies across projects and across activity types, and we explored to which extent projects and contributors specialise in particular activity types. Using Gnome as a case study we observed that, next to coding, the activities of localization, development documentation and building are prevalent throughout the ecosystem. We also observed notable differences between frequent and occasional contributors in terms of the activity types they are involved in and the number of projects they contribute to. Occasional contributors and contributors that are involved in many different projects tend to be more involved in the localization activity, while frequent contributors tend to be more involved in the coding activity in a limited number of projects.
      Keywords: open source · software ecosystem · metrics · developer community · case study
      B. Vasilescu and A. Serebrenik: MDSE, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, The Netherlands
    • Mathieu Goeminne: Understanding the Evolution of Socio-technical Aspects in Open Source Ecosystems: An Empirical Analysis of GNOME. A dissertation submitted in fulfillment of the requirements of the degree of Docteur en Sciences. UMONS, Faculté des Sciences, Département d’Informatique, June 2013.
      Advisor: Dr. Tom Mens (Université de Mons, Belgium)
      Jury: Dr. Xavier Blanc (Université de Bordeaux 1, France), Dr. Véronique Bruyère (Université de Mons, Belgium), Dr. Jesus M. Gonzalez-Barahona (Universidad Rey Juan Carlos, Spain), Dr. Tom Mens (Université de Mons, Belgium), Dr. Alexander Serebrenik (Technische Universiteit Eindhoven, The Netherlands), Dr. Jef Wijsen (Université de Mons, Belgium)
    • Mathieu Goeminne, Maëlick Claes and Tom Mens (Software Engineering Lab, COMPLEXYS research institute, UMONS, Belgium): A historical dataset for GNOME contributors. @ MSR 2013
      Abstract: We present a dataset of the open source software ecosystem GNOME from a social point of view. We have collected historical data about the contributors to all GNOME projects stored on git.gnome.org, taking into account the problem of identity matching, and associating different activity types to the contributors. This type of information is very useful to complement the traditional, source-code related information one can obtain by mining and analyzing the actual source code. The dataset can be obtained at https://bitbucket.org/mgoeminne/sgl-flossmetric-dbmerge.
      I. Introduction: In this paper, we present the process we have used to create a dataset containing the historical information related to contributors to the GNOME ecosystem. Our database and the tools and scripts used to create it can be found on a dedicated Bitbucket repository. In contrast to many other datasets, we do not focus on source code, since a significant amount of files committed to GNOME’s project repositories do not even contain code (e.g., image files, web pages, documentation, localization and many more). Such type of information is often ignored in MSR research while it is very relevant to understand which types of activities contributors are [...]
  • References
    Mens, Tom; Serebrenik, Alexander; Cleve, Anthony (Eds.), 2014, XXIII, 404 p., Springer, ISBN 978-3-642-45398-4
    Chapter 10: Studying Evolving Software Ecosystems based on Ecological Models
    Tom Mens, Maëlick Claes, Philippe Grosjean and Alexander Serebrenik
    Research on software evolution is very active, but evolutionary principles, models and theories that properly explain why and how software systems evolve over time are still lacking. Similarly, more empirical research is needed to understand how different software projects co-exist and co-evolve, and how contributors collaborate within their encompassing software ecosystem. In this chapter, we explore the differences and analogies between natural ecosystems and biological evolution on the one hand, and software ecosystems and software evolution on the other hand. The aim is to learn from research in ecology to advance the understanding of evolving software ecosystems. Ultimately, we wish to use such knowledge to derive diagnostic tools aiming to analyse and optimise the fitness of software projects in their environment, and to help software project communities in managing their projects better.
  • Interested in joining?
    • Open PhD position available
    • 6 to 12 month postdoc visits welcomed