
The dynamics of software evolution - EVOLUMONS 2011

Slides of my talk at EVOLUMONS 2011
http://informatique.umons.ac.be/genlog/EvolMons/EvolMons2011.html

  1. The dynamics of software evolution. EVOLUMONS 2011, Research Seminar on Software Evolution, Université de Mons, Belgium, January 26th 2011. Israel Herraiz, Universidad Alfonso X el Sabio <isra@herraiz.org> <herraiz@uax.es> http://www.uax.es http://herraiz.org
  2. (c) 2011 Israel Herraiz. This work is licensed under the Creative Commons Attribution-Share Alike 3.0 license. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Get the full bibliographic references listed in these slides at http://herraiz.org/stuff/evolumons_references_20110126.txt
  3. Outline ● The laws of software evolution ● The nature of software evolution (for libre software) ● How to accurately forecast software evolution. And why it works. ● What's next? ● And what did I learn during all these years of work?
  4. The laws of software evolution
  5. My background ● Educated as a chemical and mechanical engineer ● Wasted my time in the chemical industry. But I did (and do) love doing software! – http://caflur.sf.net http://gpinch.sf.net ● Involved in the open source community since around 2001, started a PhD in 2004 in the Libresoft research group – http://libresoft.es
  6. How it all started ● Godfrey and Tu [GT00] [GT01] studied the Linux kernel ● They say that the laws of software evolution were not valid for Linux – Laws of software evolution. What is that? ● Completely puzzled me ● My supervisors and I wrote a paper on the topic [RAGBH05] ● At the time, I thought it was just one more paper ● It turned out to be our most cited paper
  7. The topic background: Software evolution ● How and why does software evolve? ● Meir M. Lehman: Laws of software evolution ● “Program evolution. Processes of software change”, published in 1985
  8. The laws in the seventies ● Laws of Program Evolution Dynamics (1974) [Leh74] [Leh85b]
  9. The evolution of the laws of software evolution [Leh96] [LRW+97] [MFRP06] [Leh78] [Leh80] [Leh85c] [LB85] [Leh74] [Leh85b]
  10. The laws in the present day (I – IV)
  11. The laws in the present day (V – VIII)
  12. Empirical studies of software evolution. See “Empirical Studies of Open Source Evolution” by Juan Fernandez-Ramil, Angela Lozano, Michel Wermelinger, Andrea Capiluppi, in Tom Mens, Serge Demeyer (eds.) Software Evolution
  13. Why the controversy about the laws of software evolution? ● Fernandez-Ramil et al. found empirical validation for laws I, VI, VII (partially) and VIII (partially) ● The most interesting part (for me) – Statistical analysis of software projects and their evolution, using time series analysis among other techniques (suggested in 1974!) [Leh74] [Leh85b] – “For maximum cost-effectiveness, management consideration and judgement should include the entire history of the project with the current state having the strongest, but not exclusive, influence” [Leh78] [Leh85c]
  14. The nature of (libre) software evolution
  15. The nature of (libre) software evolution ● The goal is to develop a theoretical model for software evolution ● Long-pursued goal ● Lehman and Belady in 1971 [BL71] [LB85] ● Woodside: progressive and anti-regressive work [Woo80] (included in [LB85]) ● Turski's models [Tur96] [Tur02] – Growth is inversely proportional to complexity – Complexity is proportional to the square of size
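
The two Turski assumptions on this slide combine into a simple growth rule. The following is a minimal sketch, not code from [Tur96] or [Tur02]: the constant `effort`, the starting size and the number of releases are illustrative assumptions. If growth per release is inversely proportional to complexity, and complexity is proportional to the square of size, each release adds `effort / size**2`. In continuous time this integrates to size^3 = s0^3 + 3 * effort * t, so size grows roughly like a cube root of time.

```python
def turski_growth(s0, effort, releases):
    """Iterate S[i+1] = S[i] + effort / S[i]**2 for a number of releases."""
    sizes = [float(s0)]
    for _ in range(releases):
        current = sizes[-1]
        # Growth is inversely proportional to complexity, taken here as size squared.
        sizes.append(current + effort / (current * current))
    return sizes


def turski_closed_form(s0, effort, t):
    """Continuous-time solution of dS/dt = effort / S**2."""
    return (s0 ** 3 + 3.0 * effort * t) ** (1.0 / 3.0)


# Hypothetical numbers, chosen only to make the slowdown visible.
sizes = turski_growth(s0=100.0, effort=5.0e5, releases=50)
increments = [b - a for a, b in zip(sizes, sizes[1:])]
```

The per-release increments shrink as the system grows, which is the decelerating growth the model is meant to capture.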
  16. More recent models ● Self-Organized criticality [Wu06] [WHH07] ● Power laws for the size of the system ● Long range correlations in the time series of changes ● Maintenance Guidance Model [CFR07] ● Those functions that have suffered more changes in the past are more likely to be changed in the future ● Assumptions: – Distribution of accumulated changes is asymmetrical – Developers prioritize changes using past number of changes and complexity
  17. Determinism and evolution ● Self-Organized Criticality ● This means that current events are influenced by very old events ● Against Lehman's suggestions [Leh78] [Leh85c] ● In my opinion, counterintuitive
  18. Long range correlated processes
  19. Long range correlated processes
  20. Long range correlated processes Unreachable
  21. Short range correlated
  22. Short range correlated
  23. Short range correlated
  24. Short range correlated
  25. Question or ?
  26. Autocorrelation coefficients [Diagram: the series ... 1 2 3 4 5 is paired with a copy of itself shifted by one position to give r(1), by two positions to give r(2), and so on]
  27. Autocorrelation coefficients [Plot: r(k), from 0 to 1, against lag k = 1 to 15]
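
The autocorrelation coefficient r(k) used on these slides can be computed directly from a series. This is a minimal sketch using the common sample estimator (deviations from the mean, normalised by the lag-0 sum); the function name is illustrative, and the talk does not say which estimator was used.

```python
def autocorrelation(series, max_lag):
    """Return [r(0), r(1), ..., r(max_lag)] for a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    c0 = sum(d * d for d in deviations)
    if c0 == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    coefficients = []
    for k in range(max_lag + 1):
        # Pair every point with the point k positions later, as in the shifted-copy diagram.
        ck = sum(deviations[t] * deviations[t + k] for t in range(n - k))
        coefficients.append(ck / c0)
    return coefficients


r = autocorrelation([1, 2, 3, 4, 5], max_lag=2)
```

Plotting `r` against the lag k gives the kind of profile shown on the slide.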
  28. Autocorrelation coefficients [Plot: r(k), from 0 to 1, against lag k = 1 to 15, two curves] ● Long range correlated: r(k) ~ k^(2d−1), 0 < d < 0.5 ● Short range correlated (ARIMA process): r(k) decays exponentially with k
  29. Autocorrelation coefficients [Plot: r(k) against lag k, logarithmic scale, two curves] ● Long range correlated: r(k) ~ k^(2d−1), 0 < d < 0.5 ● Short range correlated (ARIMA process): r(k) decays exponentially with k
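
To see how different the two decay regimes are, they can be evaluated at a few lags. This is a sketch under stated assumptions: the long range curve is the power law from the slide with an arbitrary d = 0.3, and the short range curve is represented by the simplest ARIMA case, an AR(1) process whose theoretical autocorrelation is phi^k, with an arbitrary phi = 0.8. Neither value comes from the talk.

```python
def long_range_acf(k, d):
    """Power-law decay r(k) ~ k**(2d - 1), for 0 < d < 0.5 and lag k >= 1."""
    if not 0.0 < d < 0.5:
        raise ValueError("d must lie strictly between 0 and 0.5")
    return k ** (2.0 * d - 1.0)


def short_range_acf(k, phi):
    """Exponential decay r(k) = phi**k of an AR(1) process with |phi| < 1."""
    if not abs(phi) < 1.0:
        raise ValueError("phi must satisfy |phi| < 1")
    return phi ** k


lags = [1, 10, 100]
long_values = [long_range_acf(k, d=0.3) for k in lags]
short_values = [short_range_acf(k, phi=0.8) for k in lags]
```

At lag 100 the power law still carries a visible correlation, while the exponential one has effectively vanished. That is the sense in which a long memory process is influenced by very old events.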
  30. Empirical study ● 3,821 software projects – More than 3 developers – More than 1 year of active history – 9,234,104 commits / 2,357,438 modification requests – Projects registered between Nov. 1999 and Dec. 2004 – Datasets publicly available ● See “Determinism and evolution” – 5th International Working Conference on Mining Software Repositories (MSR 2008) ● FLOSSMole + CVSAnalY-SF
  31. Methodology ● Linear correlation to calculate linearity ● Distribution of the Pearson coefficients ● Smoothing applied to the series before calculating ACF
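
One possible reading of this methodology, offered as a sketch and not as the study's actual procedure: score each project by the absolute Pearson coefficient between the lag k and log r(k). The use of the logarithmic scale, the restriction to positive coefficients and the name `linearity_score` are all assumptions. Under them, an exponentially decaying (short memory) profile is a straight line and scores close to 1, while a power-law (long memory) profile is curved and scores lower.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linearity_score(acf_values):
    """|Pearson r| between lag k and log r(k), over lags k >= 1 with r(k) > 0."""
    points = [(k, math.log(r)) for k, r in enumerate(acf_values) if k >= 1 and r > 0]
    lags = [k for k, _ in points]
    logs = [value for _, value in points]
    return abs(pearson(lags, logs))


# Synthetic autocorrelation profiles, indexed by lag 0..15 (illustrative only).
exponential_profile = [0.8 ** k for k in range(16)]
power_law_profile = [1.0] + [k ** -0.4 for k in range(1, 16)]
```

Computing the score for every project and looking at its distribution would then separate the two regimes.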
  32. Results
  33. Results
  34. Results [Plot labels: long memory processes; short memory processes]
  35. Looking at the numbers [Annotations on the table: long memory process; short memory process]
        Quantile (%)   Commits   MRs
        0              0.3235    0.2886
        20             0.7394    0.7248
        40             0.8178    0.8036
        60             0.8906    0.8705
        80             0.9783    0.9464
        100            0.9998    0.9998
  36. Implications for evolution ● Short memory -> Yesterday's weather, http://doi.ieeecomputersociety.org/10.1109/ICSM.2004.1357788 ● When deciding, the current situation should have more influence ● As Lehman said in 1978
  37. How to forecast software evolution
  38. Background ● Forecasting traditionally done using very simple statistical models ● Regression ● Lehman suggested in 1974 that Time Series Analysis was the best approach to study software evolution ● Let's compare time series analysis against regression models
  39. Case studies [Figure: training set and test set]
  40. Case studies [Timeline, 1993 to 2007: the PostgreSQL, FreeBSD and NetBSD histories, each split into a training set and a test set]
  41. Time Series Analysis [Flowchart: original time series data -> ACF / PACF -> clear pattern? If no, apply kernel smoothing and check again. If yes, choose p, d, q based on ACF / PACF -> ARIMA model fitting -> predictions]
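
The kernel smoothing step can be sketched as follows. The talk does not name the kernel or the bandwidth, so the Gaussian kernel, the Nadaraya-Watson form and the bandwidth of 2.0 used here are assumptions; the series is synthetic.

```python
import math


def kernel_smooth(series, bandwidth):
    """Nadaraya-Watson smoother with a Gaussian kernel over the time index."""
    n = len(series)
    smoothed = []
    for t in range(n):
        # Weight every observation by its distance in time from position t.
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * x for w, x in zip(weights, series)) / total)
    return smoothed


def roughness(values):
    """Sum of absolute second differences, a crude measure of jaggedness."""
    return sum(abs(values[i + 1] - 2 * values[i] + values[i - 1])
               for i in range(1, len(values) - 1))


# Synthetic series: a linear trend plus an alternating disturbance.
noisy = [10.0 + t + (3.0 if t % 2 else -3.0) for t in range(30)]
smooth = kernel_smooth(noisy, bandwidth=2.0)
```

After smoothing, the alternating disturbance is largely gone, so the ACF / PACF of the result reflects the underlying trend more than the noise.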
  42. Parameters of the model
  43. Autocorrelation coefficients. No smoothing
  44. Autocorrelation coefficients. After smoothing
  45. Parameters of all the models ● Time series ARIMA model ● d = 1, q = 0, p = 6, 7 or 9 ● Regression model ● r > 0.99
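
For the regression side of the comparison, a baseline can be as simple as an ordinary least squares line of size against time, extrapolated into the test period. This is a sketch assuming a straight-line model (the slide only reports r > 0.99, not the regression form); the function names and data are illustrative.

```python
def fit_line(times, values):
    """Ordinary least squares fit of values = intercept + slope * times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def predict_line(model, t):
    """Extrapolate the fitted line to time t."""
    slope, intercept = model
    return intercept + slope * t


# Synthetic training data lying exactly on a line.
train_times = list(range(10))
train_sizes = [2.0 * t + 1.0 for t in train_times]
model = fit_line(train_times, train_sizes)
forecast = predict_line(model, 12)
```

A high correlation coefficient on the training set says the line fits the past well; it does not by itself guarantee accurate extrapolation.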
  46. What does the model look like? $\nabla^d x_t \left(1 - \sum_{j=1}^{q} \phi_j B^j\right) = \epsilon_t \left(1 - \sum_{i=1}^{p} \theta_i B^i\right)$, where $B^i x_t = x_{t-i}$, $\nabla x_t = x_t - x_{t-1} = (1 - B) x_t$ and $\nabla^d x_t = (1 - B)^d x_t$
  47. What does the model look like? $\nabla^d x_t \left(1 - \sum_{j=1}^{q} \phi_j B^j\right) = \epsilon_t \left(1 - \sum_{i=1}^{p} \theta_i B^i\right)$ [Annotations: $x_t$ are the predicted / actual values, $\epsilon_t$ the estimation errors, $\phi_j$ and $\theta_i$ the coefficients, each sum is a linear component, and $p$, $d$, $q$ are the parameters of the model]
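
A forecast with d = 1 and no error terms can be sketched without any library. The code below follows the common textbook convention in which p counts lagged values of the differenced series (an assumption; the index placement on the slide may differ), estimates the coefficients with the Yule-Walker equations solved by the Levinson-Durbin recursion (also an assumption, since the talk does not say how the models were fitted), and then integrates the predicted differences back into levels. All names are illustrative.

```python
def fit_ar(values, p):
    """Yule-Walker estimate of AR(p) coefficients via Levinson-Durbin."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    acov = [sum(dev[t] * dev[t + k] for t in range(n - k)) / n for k in range(p + 1)]
    phi = []
    err = acov[0]
    for k in range(1, p + 1):
        if err <= 0.0:
            # No variance left to explain: pad with zero coefficients.
            phi.append(0.0)
            continue
        acc = acov[k] - sum(phi[j] * acov[k - 1 - j] for j in range(k - 1))
        reflection = acc / err
        phi = [phi[j] - reflection * phi[k - 2 - j] for j in range(k - 1)] + [reflection]
        err *= 1.0 - reflection * reflection
    return mean, phi


def forecast_arima_p10(series, p, steps):
    """Forecast `steps` future levels with an AR(p) model on first differences."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    if p >= len(diffs):
        raise ValueError("series too short for the requested order p")
    mean, phi = fit_ar(diffs, p)
    history = [d - mean for d in diffs]
    level = series[-1]
    predictions = []
    for _ in range(steps):
        next_dev = sum(phi[j] * history[-1 - j] for j in range(p))
        history.append(next_dev)
        level += next_dev + mean   # undo the differencing (d = 1)
        predictions.append(level)
    return predictions


# Synthetic example: a series growing by exactly 3 per step.
linear_forecast = forecast_arima_p10([5, 8, 11, 14, 17], p=2, steps=3)
```

For a series with constant increments the model reduces to extending the trend. With real data the fitted coefficients let the most recent changes shape the forecast, which is the short memory behaviour discussed earlier in the talk.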
  48. Results: time series (ARIMA) vs. regression, mean squared relative error
        Project      ARIMA   Regression
        FreeBSD       3.93        16.89
        NetBSD        1.80        15.94
        PostgreSQL    1.48         6.86
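
The error measure in this table can be sketched as the mean of the squared relative deviations between predicted and actual values. This is the usual definition and an assumption here: the slide does not spell out the formula or the units of the reported figures, and the numbers below are made up.

```python
def mean_squared_relative_error(actual, predicted):
    """Mean of ((predicted - actual) / actual)**2 over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [((p - a) / a) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared) / len(squared)


# Hypothetical test-set values: both predictions are off by 10 percent.
msre = mean_squared_relative_error(actual=[100.0, 200.0], predicted=[110.0, 180.0])
```

Because each error is divided by the actual value, projects of very different sizes can be compared on the same scale.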
  49. Conclusions ● Time Series more accurate than Regression Analysis for macroscopic predictions ● Basic model. More components can be added. ● Seasonality ● Multi-variable, combining different factors
  50. More results ● OK, so you predicted last year... which is past... ● What about predicting the real future? ● MSR Challenge 2007 winners. Goal: predicting the number of changes in Eclipse in the next three months. http://dx.doi.org/10.1109/MSR.2007.10
  51. Why does this work? ● Isn't it too accurate? ● Why do you think this works?
  52. What's next?
  53. Further work ● Write a paper about the controversy around the validation of the laws of software evolution ● In progress ● Write a paper about the short memory nature of evolution ● Using Time Series Analysis to show it ● And ARIMA as a forecasting tool ● Extracting principles and guidelines for software project management
  54. And what did I learn during all these years?
  55. Things I appreciate my advisors did ● Freedom of movement ● Pressure to get my own funding ● Unconditional support ● Demanding and challenging environment ● Opportunity to coordinate projects ● And to participate in many meetings alone
  56. Things that I did not know and I do now ● Know-how about conferences and journals ● English skills ● Writing skills (papers and proposals) ● Presentation skills ● Self-motivation – Brick walls are there for the rest of the people – Experience is what you get when you don't get what you want – Never give up – http://www.youtube.com/watch?v=ji5_MqicxSo
  57. Take away ● Laws of Software Evolution: controversy ● Statistical approach: replicable study ● Short memory dynamics ● Brick walls are a good thing ● ARIMA: accurate forecast ● Keep working. Don't give up
