Co-Evolving Changes in a Data-Intensive Software System
Mathieu Goeminne, Alexandre Decan, T...
European Open Symposium on Empirical Software Engineering, Lille, France, July 2014

Empirical research results for the evolution of a data-intensive software system and the GNOME ecosystem


Presentation by Mathieu Goeminne of joint empirical research (with Tom Mens, Alexandre Decan, Alexander Serebrenik and Bogdan Vasilescu) on evolving software systems, given at EOSESE 2014, the European Open Symposium on Empirical Software Engineering, in Lille, France, on 30 June 2014. Part 1 of the presentation focuses on data-intensive software systems; part 2 focuses on the Gnome ecosystem.



  1. Co-Evolving Changes in a Data-Intensive Software System
     Mathieu Goeminne, Alexandre Decan, Tom Mens
     Service de Génie Logiciel, Université de Mons
     http://informatique.umons.ac.be/genlog/projects/disse
     EOSESE 2014, European Open Symposium on Empirical Software Engineering, July 2014
  2. Context
     • FNRS research project (Projet de Recherche) "Data-Intensive Software System Evolution"
       – Interuniversity collaboration with Anthony Cleve and Loup Meurice (University of Namur)
     • Overall goal
       – Expand empirical MSR research to include database-related activities
       – Analyse and support co-evolution between program code and database (schema) in data-intensive software systems
     • Approach
       – Develop a generic framework
       – Implement dedicated analysis and visualisation tools
       – Carry out empirical case studies
  3. Research Questions
     • Code-centric focus
       – RQ1: How does the relation between source code files and database-related files evolve?
       – RQ2: What is the effect of introducing a particular persistence technology (Hibernate, JPA)?
     • Social focus
       – RQ3: How do developers divide their work, and how does this evolve over time?
  4. Case Study
     OSCAR, a Java Electronic Medical Record system
     characteristic                  value
     duration                        3,939 days (> 129 months)
     dates                           from Nov 2002 to Aug 2013
     number of commits               18,727
     number of distinct files        20,718 (of which 54% code files)
     number of file touches          93,721
     number of distinct developers   100
     official repository             https://github.com/scoophealth/oscar.git
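The characteristics in the case-study table can be recomputed from a version-control history once it is flattened into file touches. The sketch below is a minimal illustration, not the authors' tooling: the `FileTouch` record, the `summarise` function and the choice of `.java`/`.jsp` as code-file extensions are assumptions, and a file touch is taken to be one (commit, file) pair.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical input format: one record per file touched by a commit.
@dataclass(frozen=True)
class FileTouch:
    commit_id: str
    author: str   # assumed to be already identity-merged
    day: date
    path: str

# Assumption: which extensions count as "code files" is project-specific.
CODE_EXTENSIONS = (".java", ".jsp")

def summarise(touches):
    """Compute case-study characteristics from a list of FileTouch records."""
    days = [t.day for t in touches]
    files = {t.path for t in touches}
    code_files = {p for p in files if p.endswith(CODE_EXTENSIONS)}
    return {
        "duration_days": (max(days) - min(days)).days,
        "commits": len({t.commit_id for t in touches}),
        "distinct_files": len(files),
        "code_file_share": len(code_files) / len(files),
        # A file touch is one (commit, file) pair.
        "file_touches": len({(t.commit_id, t.path) for t in touches}),
        "distinct_developers": len({t.author for t in touches}),
    }
```

Measuring the duration between the first and last touch is one plausible reading of the "duration" row; the original study may have defined it differently.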
  5. Evolution of OSCAR: code dimension
     • 3-tier web application written in Java and JSP
     [Figure: monthly aggregated proportion of JSP and Java files, July 2003 to January 2013]
  6. Evolution of OSCAR: social dimension
     • Monthly number of distinct active developers for OSCAR (after identity merging)
     [Figure: monthly number of distinct active developers, July 2003 to January 2013]
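The monthly series on slides 5 and 6 are aggregations of file touches per calendar month. A minimal sketch, assuming touches are available as hypothetical `(author, day, path)` tuples with author identities already merged, and assuming a file counts for a month when it is touched during that month:

```python
from collections import defaultdict
from datetime import date

def month_key(day):
    """Map a date to its calendar month, e.g. 2003-07."""
    return f"{day.year:04d}-{day.month:02d}"

def monthly_aggregates(touches):
    """touches: iterable of (author, day, path) tuples (hypothetical format)."""
    developers = defaultdict(set)  # month -> distinct active authors
    files = defaultdict(lambda: {"java": set(), "jsp": set()})  # month -> active files per kind
    for author, day, path in touches:
        month = month_key(day)
        developers[month].add(author)
        extension = path.rsplit(".", 1)[-1].lower()
        if extension in ("java", "jsp"):
            files[month][extension].add(path)

    result = {}
    for month in sorted(developers):
        java, jsp = len(files[month]["java"]), len(files[month]["jsp"])
        total = java + jsp
        result[month] = {
            "active_developers": len(developers[month]),
            "java_share": java / total if total else None,
            "jsp_share": jsp / total if total else None,
        }
    return result

if __name__ == "__main__":
    sample = [
        ("alice", date(2003, 7, 3), "src/A.java"),
        ("bob", date(2003, 7, 20), "web/b.jsp"),
        ("alice", date(2003, 8, 1), "src/C.java"),
    ]
    for month, stats in monthly_aggregates(sample).items():
        print(month, stats)
```

Counting distinct files rather than touches keeps a heavily edited file from dominating a month; counting touches instead would be an equally valid design choice.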
  7. Evolution of OSCAR: code dimension
     • Number of source code files w.r.t. database-related files
     [Figure: monthly number of "pure" and "sql" code files, July 2003 to January 2013; sql = code files containing embedded SQL statements]
  8. Evolution of OSCAR: social dimension
     • How does the activity of developers evolve over time?
     [Figure: monthly aggregated number of file touches per developer]
  9. Introducing a Persistence Provider in OSCAR
     • Hibernate
       – introduced in OSCAR in July 2006
       – Java object-relational mapping (ORM) library
         • XML files map Java classes to database tables and Java data types to SQL data types
         • facilitates data query and retrieval
         • generates SQL calls and relieves the developer from manual result set handling and object conversion
     • Java Persistence API (JPA)
       – introduced in OSCAR in July 2008
       – industry-standard ORM persistence API
       – uses Java annotations instead of XML files
  10. Introducing a Persistence Provider: code dimension
      SQL = code file containing an embedded SQL query
      HIB = Java file targeted by a Hibernate XML file
      JPA = Java file containing a JPA annotation
      pure = code files not containing any of these
      [Figure: monthly aggregated number of active code files, July 2003 to July 2013; (1) proportion of JPA, HIB, SQL and pure files; (2) absolute number of JPA, HIB and SQL files]
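The four file categories can be approximated with simple text heuristics. The sketch below illustrates them under stated assumptions and is not the detector used in the study: embedded SQL is recognised by a regular expression over string literals, Hibernate targets are read from the `<class name=...>` elements of `*.hbm.xml` mapping files, and JPA usage is detected through `javax.persistence` imports or the `@Entity` annotation. All function names and patterns are hypothetical. A file may receive several labels, and `pure` is used only when none applies.

```python
import re
from pathlib import PurePosixPath

# Heuristic patterns (assumptions, not the authors' exact detectors).
SQL_RE = re.compile(
    r'"[^"]*\b(?:SELECT\s[^"]*\bFROM|INSERT\s+INTO|UPDATE\s+\w+\s+SET|DELETE\s+FROM)\b',
    re.IGNORECASE,
)
JPA_RE = re.compile(r"\bjavax\.persistence\.|@Entity\b")
HBM_PACKAGE_RE = re.compile(r'<hibernate-mapping\b[^>]*\bpackage="([\w.]+)"')
HBM_CLASS_RE = re.compile(r'<class\b[^>]*?\bname="([\w.$]+)"')
JAVA_PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)

def hibernate_mapped_classes(hbm_xml_texts):
    """Collect fully qualified names of classes targeted by Hibernate mapping files."""
    mapped = set()
    for xml in hbm_xml_texts:
        package = HBM_PACKAGE_RE.search(xml)
        for name in HBM_CLASS_RE.findall(xml):
            if "." not in name and package:
                name = f"{package.group(1)}.{name}"
            mapped.add(name)
    return mapped

def java_fqcn(path, source):
    """Derive a class name from the package declaration and the file name."""
    package = JAVA_PACKAGE_RE.search(source)
    stem = PurePosixPath(path).stem
    return f"{package.group(1)}.{stem}" if package else stem

def classify(path, source, mapped_classes):
    """Return the set of labels (SQL, HIB, JPA) of a code file, or {'pure'}."""
    labels = set()
    if SQL_RE.search(source):
        labels.add("SQL")
    if path.endswith(".java"):
        if java_fqcn(path, source) in mapped_classes:
            labels.add("HIB")
        if JPA_RE.search(source):
            labels.add("JPA")
    return labels or {"pure"}
```

Regular expressions over raw text will miss SQL that is assembled across several string literals; a parser-based analysis would be more precise at the cost of much more setup.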
  11. Introducing a Persistence Provider: social dimension
      • Who is involved in introducing changes in database-related code?
      [Figure: monthly activity per developer; bubble size = log(monthly aggregated number of touched files)]
  12. Evolution of OSCAR: social dimension
      • How do developers divide their work?
      [Figure: Venn diagram of the number of OSCAR developers (100 in total) involved in file touches per activity type: Java (87), JSP (86), SQL (89), HIB, JPA]
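Counts like those in the Venn diagram follow from the set of activity types each developer has touched. A minimal sketch, assuming hypothetical `(developer, labels)` pairs, one per file touch, where `labels` contains activity types such as Java, JSP, SQL, HIB or JPA:

```python
from collections import Counter

def developers_per_type(touches):
    """touches: iterable of (developer, labels) pairs, one per file touch."""
    types_of = {}
    for developer, labels in touches:
        types_of.setdefault(developer, set()).update(labels)
    # Developers involved in each activity type (the set sizes of the diagram).
    per_type = Counter(t for types in types_of.values() for t in types)
    # Developers per exact combination of types (the regions of the diagram).
    regions = Counter(frozenset(types) for types in types_of.values())
    return per_type, regions
```

A developer appears in exactly one region but can contribute to several set sizes, which is why the set sizes add up to more than the total number of developers.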
  13. Evolution of OSCAR: social dimension
      • How do developers divide their work?
      [Figure: Venn diagram of the number of developers that introduce database-related code in some file for the first time, out of 100 OSCAR developers: Java (87), JSP (86), SQL (53), HIB, JPA]
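Slide 13 counts only developers who introduce a category of code into a file for the first time. One way to compute this, assuming a hypothetical chronological list of `(timestamp, developer, path, labels)` revisions where `labels` describes the file after the commit, is to remember for each file every label it has ever carried:

```python
from collections import defaultdict

def first_introducers(revisions):
    """Count, per label, the developers who add it to a file that never carried it before.

    revisions: iterable of (timestamp, developer, path, labels) tuples, where labels
    is the set of categories (e.g. SQL, HIB, JPA) of the file after the commit.
    """
    ever_seen = defaultdict(set)    # path -> every label the file has carried so far
    introducers = defaultdict(set)  # label -> developers who introduced it somewhere
    for _, developer, path, labels in sorted(revisions, key=lambda r: r[0]):
        for label in set(labels) - ever_seen[path]:
            introducers[label].add(developer)
        ever_seen[path] |= set(labels)
    return {label: len(devs) for label, devs in introducers.items()}
```

Tracking every label ever seen (rather than only the previous version) means that re-adding SQL after it was removed from a file is not counted as a new introduction.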
  14. Preliminary Conclusions
      RQ1: Code-related and database-related files evolve together (no “phased” co-evolution).
      RQ2: Hibernate was introduced first, then JPA, which takes over from Hibernate; embedded SQL nevertheless remains very important.
      RQ3: No clear separation of activities between developers.
        The majority of developers change both database-related and database-unrelated code.
        No periods dedicated to a specific activity were observed.
  15. Future Work
      • Analyse file changes at a finer granularity
      • Study other data-intensive software systems
      • Study the evolution of data-intensive software system (DISS) quality
        – Unit tests involving database-related classes
        – Revisited modularity, coupling, cohesion
        – Database inconsistencies
      • Study the evolution of social aspects
        – Are there distinct sub-communities?
        – How is the effort distributed in each community? Are there different teamwork patterns in these communities?
  16. References
      • M. Goeminne, A. Decan, T. Mens, Co-evolving Code-Related and Database-Related Changes in a Data-Intensive Software System, CSMR-WCRE 2014 ERA track
      • L. Meurice, A. Cleve, DAHLIA: A Visual Analyzer of Database Schema Evolution, CSMR-WCRE 2014 Tool Demo
      • A. Cleve, T. Mens, J.-L. Hainaut, Data-Intensive System Evolution, IEEE Computer 43(8): 110-112 (2010)
      • A. Cleve, M. Gobert, L. Meurice, J. Maes, J. Weber, Understanding database schema evolution: A case study, Science of Computer Programming (2013)
  17. A Historical Dataset for the Gnome Ecosystem
      Mathieu Goeminne, Tom Mens
      Service de Génie Logiciel, Université de Mons
      http://informatique.umons.ac.be/genlog/projects/disse
      EOSESE 2014, European Open Symposium on Empirical Software Engineering, July 2014
  18. Gnome ecosystem: historical dataset
      • Goal
        – Study the evolution of social aspects of the Gnome ecosystem (1,418 projects, 11,094 contributors, 1,315,997 commits)
      • Methodology
        1. Clone the source code repository of each Gnome project
        2. Store its history in a MySQL database
        3. Add extra information to facilitate empirical studies
  19. Gnome ecosystem: about the dataset
      FLOSSMetrics MySQL dataset: https://bitbucket.org/mgoeminne/sgl-flossmetric-dbmerge
  20. Gnome ecosystem: extra information
      • Identity merging
        – CVSAnalY2 hack
        – Semi-automatic identity merging based on name and e-mail
        – 5,923 contributors after merging (out of 11,094)
      • Activity types
        – Tool for associating an activity type (coding, translation, documentation, etc.) with each file
        – Regular expressions on file extension, file name and path
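Both extra-information steps can be sketched in a few lines. The code below is a hypothetical illustration, not the CVSAnalY2 hack or the authors' tool: `merge_identities` uses a union-find structure to group identities that share an e-mail address or a normalised full name, leaving out the manual validation that a semi-automatic approach adds; `classify_activity` applies an ordered list of regular expressions to the file path. The normalisation rules and the `ACTIVITY_RULES` patterns are assumptions chosen for illustration.

```python
import re
import unicodedata

def normalise(text):
    """Lower-case, drop accents and collapse non-alphanumeric runs (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", stripped.lower()).strip()

def merge_identities(identities):
    """Group (name, e-mail) identities that share an e-mail or a full name."""
    parent = list(range(len(identities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_owner = {}  # matching key -> index of the first identity using it
    for i, (name, email) in enumerate(identities):
        keys = [("email", email.strip().lower())]
        full_name = normalise(name)
        if " " in full_name:  # assumption: single tokens such as "root" are too ambiguous
            keys.append(("name", full_name))
        for key in keys:
            if key in first_owner:
                parent[find(i)] = find(first_owner[key])
            else:
                first_owner[key] = i

    groups = {}
    for i in range(len(identities)):
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=min)

# Hypothetical rules, checked in order; the first matching pattern wins.
ACTIVITY_RULES = [
    ("translation", re.compile(r"(^|/)po/|\.po$")),
    ("documentation", re.compile(r"(^|/)(docs?|help)/|(^|/)README|\.(txt|docbook)$")),
    ("build", re.compile(r"(^|/)(Makefile|configure)|\.(am|m4)$")),
    ("coding", re.compile(r"\.(c|h|cpp|py|js|vala)$")),
]

def classify_activity(path):
    """Return the activity type of the first rule whose pattern matches the path."""
    for activity, pattern in ACTIVITY_RULES:
        if pattern.search(path):
            return activity
    return "unknown"
```

Union-find makes merging transitive: if A shares an e-mail with B and B shares a name with C, all three end up in one group, which is exactly why the merged groups need manual review.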
  21. Gnome: on the variation and specialisation of workload
      Empir Software Eng, DOI 10.1007/s10664-013-9244-1
      On the variation and specialisation of workload—A case study of the GNOME ecosystem community
      Bogdan Vasilescu · Alexander Serebrenik · Mathieu Goeminne · Tom Mens
      © Springer Science+Business Media New York 2013
      Abstract: Most empirical studies of open source software repositories focus on the analysis of isolated projects, or restrict themselves to the study of the relationships between technical artifacts. In contrast, we have carried out a case study that focuses on the actual contributors to software ecosystems, being collections of software projects that are maintained by the same community. To this aim, we defined a new series of workload and involvement metrics, as well as a novel approach, T-graphs, for reporting the results of comparing multiple distributions. We used these techniques to statistically study how workload and involvement of ecosystem contributors varies across projects and across activity types, and we explored to which extent projects and contributors specialise in particular activity types. Using Gnome as a case study we observed that, next to coding, the activities of localization, development documentation and building are prevalent throughout the ecosystem. We also observed notable differences between frequent and occasional contributors in terms of the activity types they are involved in and the number of projects they contribute to. Occasional contributors and contributors that are involved in many different projects tend to be more involved in the localization activity, while frequent …
      Communicated by: Margaret-Anne Storey
      B. Vasilescu · A. Serebrenik: MDSE, Eindhoven University of Technology, PO Box 513, 5600 MB, Eindhoven, The Netherlands (e-mail: b.n.vasilescu@tue.nl, a.serebrenik@tue.nl)
      M. Goeminne · T. Mens: COMPLEXYS Research Institute, Université de Mons, Place du Parc 20, 7000 Mons, Belgium (e-mail: tom.mens@umons.ac.be, mathieu.goeminne@umons.ac.be)
      [Figure: two panels, RATW < 14 and RATW >= 14]
  22. Gnome: on the variation and specialisation of workload
  23. Gnome ecosystem: references
      • M. Goeminne, M. Claes, T. Mens, A historical dataset for the Gnome ecosystem, MSR 2013, pp. 225–228
        https://bitbucket.org/mgoeminne/sgl-flossmetric-dbmerge
      • B. Vasilescu, A. Serebrenik, M. Goeminne, T. Mens, On the variation and specialisation of workload—a case study of the Gnome ecosystem community, Empirical Software Engineering, pp. 955–1008, 2014
        http://dx.doi.org/10.1007/s10664-013-9244-1
  24. References
      • T. Mens, A. Serebrenik, A. Cleve (Eds.), Evolving Software Systems, Springer, 2014, XXIII, 404 p., ISBN 978-3-642-45398-4
