The dynamics of software evolution - EVOLUMONS 2011

Slides of my talk at EVOLUMONS 2011
http://informatique.umons.ac.be/genlog/EvolMons/EvolMons2011.html

  1. The dynamics of software evolution. EVOLUMONS 2011, Research Seminar on Software Evolution, Université de Mons, Belgium, January 26th 2011. Israel Herraiz, Universidad Alfonso X el Sabio <isra@herraiz.org> <herraiz@uax.es> http://www.uax.es http://herraiz.org
  2. (c) 2011 Israel Herraiz. This work is licensed under the Creative Commons Attribution-Share Alike 3.0 license. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Get the full bibliographic references listed in these slides at http://herraiz.org/stuff/evolumons_references_20110126.txt
  3. Outline
     ● The laws of software evolution
     ● The nature of software evolution (for libre software)
     ● How to accurately forecast software evolution, and why it works
     ● What's next?
     ● And what did I learn during all these years of work?
  4. The laws of software evolution
  5. My background
     ● Educated as a chemical and mechanical engineer
     ● Wasted my time in the chemical industry. But I did (and do) love doing software!
       – http://caflur.sf.net http://gpinch.sf.net
     ● Involved in the open source community since around 2001; started a PhD in 2004 in the Libresoft research group
       – http://libresoft.es
  6. How it all started
     ● Godfrey and Tu [GT00] [GT01] studied the Linux kernel
     ● They said that the laws of software evolution were not valid for Linux
       – Laws of software evolution? What is that?
     ● It completely puzzled me
     ● My supervisors and I wrote a paper on the topic [RAGBH05]
     ● At the time, I thought it was just one more paper
     ● It turned out to be our most cited paper
  7. The topic background: software evolution
     ● How and why does software evolve?
     ● Meir M. Lehman: laws of software evolution
     ● “Program evolution. Processes of software change” published in 1985
  8. The laws in the seventies
     ● Laws of Program Evolution Dynamics (1974) [Leh74] [Leh85b]
  9. The evolution of the laws of software evolution [Leh96] [LRW+97] [MFRP06] [Leh78] [Leh80] [Leh85c] [LB85] [Leh74] [Leh85b]
  10. The laws in the present day (I – IV)
  11. The laws in the present day (V – VIII)
  12. Empirical studies of software evolution
     See “Empirical Studies of Open Source Evolution” by Juan Fernandez-Ramil, Angela Lozano, Michel Wermelinger, Andrea Capiluppi, in Tom Mens, Serge Demeyer (eds.), Software Evolution
  13. Why the controversy about the laws of software evolution?
     ● Fernandez-Ramil et al. found empirical validation for laws I, VI, VII (partially) and VIII (partially)
     ● The most interesting part (for me)
       – Statistical analysis of software projects and their evolution, using time series analysis among other techniques (suggested in 1974!) [Leh74] [Leh85b]
       – “For maximum cost-effectiveness, management consideration and judgement should include the entire history of the project with the current state having the strongest, but not exclusive, influence” [Leh78] [Leh85c]
  14. The nature of (libre) software evolution
  15. The nature of (libre) software evolution
     ● The goal is to develop a theoretical model for software evolution
     ● Long pursued goal
     ● Lehman and Belady in 1971 [BL71] [LB85]
     ● Woodside: progressive and anti-regressive work [Woo80] (included in [LB85])
     ● Turski models [Tur96] [Tur02]
       – Growth is inversely proportional to complexity
       – Complexity is proportional to the square of size
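The two Turski bullets above can be combined into a single growth equation. The following is only a sketch under stated assumptions: size $S$ is treated as a continuous function of time $t$, $C$ denotes complexity, and $E$ and $S_0$ (a proportionality constant and the initial size) are names introduced here for illustration, not taken from the slides.

```latex
\frac{dS}{dt} \propto \frac{1}{C}, \qquad C \propto S^{2}
\quad\Longrightarrow\quad
\frac{dS}{dt} = \frac{E}{S^{2}}
\quad\Longrightarrow\quad
S(t)^{3} = S_0^{3} + 3Et
```

Under these assumptions size grows roughly as the cube root of time, so growth slows down as the system gets larger.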
  16. More recent models
     ● Self-Organized Criticality [Wu06] [WHH07]
       – Power laws for the size of the system
       – Long range correlations in the time series of changes
     ● Maintenance Guidance Model [CFR07]
       – Those functions that have suffered more changes in the past are more likely to be changed in the future
       – Assumptions: the distribution of accumulated changes is asymmetrical; developers prioritize changes using the past number of changes and complexity
  17. Determinism and evolution
     ● Self-Organized Criticality
     ● This means that current events are influenced by very old events
     ● Against Lehman's suggestions [Leh78] [Leh85c]
     ● In my opinion, counterintuitive
  18. Long range correlated processes
  19. Long range correlated processes
  20. Long range correlated processes (label: Unreachable)
  21. Short range correlated
  22. Short range correlated
  23. Short range correlated
  24. Short range correlated
  25. Question: long range correlated or short range correlated?
  26. Autocorrelation coefficients [figure: the series is paired with a copy of itself shifted by 1, 2, 3, ... positions to obtain r(1), r(2), ...]
  27. Autocorrelation coefficients [plot: r(k) against lag k, for k = 1 to 15]
  28. Autocorrelation coefficients [plot: r(k) against lag k, for k = 1 to 15]
     ● Long range correlated: $r(k) \sim k^{2d-1}$, $0 < d < 0.5$
     ● Short range correlated (ARIMA process): $r(k) \sim C^{1-k}$
  29. Autocorrelation coefficients [plot: r(k) against lag k, logarithmic scale]
     ● Long range correlated: $r(k) \sim k^{2d-1}$, $0 < d < 0.5$
     ● Short range correlated (ARIMA process): $r(k) \sim A_i^{1-k}$
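To make the coefficients r(k) on the preceding slides concrete, here is a minimal Python sketch of the usual sample autocorrelation estimator. The slides do not state which estimator was used, so the formula is an assumption, and the function names `autocorrelation` and `acf` are hypothetical.

```python
def autocorrelation(series, k):
    """Sample autocorrelation coefficient r(k) of a series at lag k.

    Assumes the standard estimator: the lag-k autocovariance divided by
    the lag-0 autocovariance, both computed around the overall mean.
    """
    n = len(series)
    if not 0 <= k < n:
        raise ValueError("lag must satisfy 0 <= k < len(series)")
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    total = sum(d * d for d in deviations)
    if total == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    # Pair every point with the point k positions later.
    shifted = sum(deviations[t] * deviations[t + k] for t in range(n - k))
    return shifted / total


def acf(series, max_lag):
    """Return [r(0), r(1), ..., r(max_lag)]."""
    return [autocorrelation(series, k) for k in range(max_lag + 1)]
```

Plotting `acf(series, 15)` against the lag gives the kind of chart sketched on slides 27 to 29: a slow, power-law-like decay suggests long range correlation, a fast decay suggests short range correlation.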
  30. Empirical study
     ● 3,821 software projects
       – More than 3 developers
       – More than 1 year of active history
       – 9,234,104 commits / 2,357,438 modification requests
       – Projects registered between Nov. 1999 and Dec. 2004
       – Datasets publicly available (FLOSSMole + CVSAnalY-SF)
     ● See “Determinism and evolution”, 5th International Working Conference on Mining Software Repositories (MSR 2008)
  31. Methodology
     ● Linear correlation to measure linearity
     ● Distribution of the Pearson coefficients
     ● Smoothing applied to the series before calculating the ACF
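The linearity measurement in the methodology can be illustrated with a plain Pearson coefficient. This is a sketch under assumptions: the slide does not say which two variables are correlated, so the example correlates the lag k with the coefficient r(k) and reports the absolute value. The names `pearson` and `linearity_of_acf` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


def linearity_of_acf(coefficients):
    """Assumed linearity measure: |Pearson| between lag k and r(k).

    `coefficients` holds r(1), r(2), ... in order of increasing lag.
    """
    lags = list(range(1, len(coefficients) + 1))
    return abs(pearson(lags, coefficients))
```

Computing one such value per project and looking at the quantiles of the resulting distribution would give a table of the same shape as the one on slide 35.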
  32. Results
  33. Results
  34. Results [plot regions labelled: long memory processes, short memory processes]
  35. Looking at the numbers
     Quantile   Commits   MRs
     0          0.3235    0.2886
     20         0.7394    0.7248
     40         0.8178    0.8036
     60         0.8906    0.8705
     80         0.9783    0.9464
     100        0.9998    0.9998
     [annotations on the table: long memory process, short memory process]
  36. Implications for evolution
     ● Short memory -> Yesterday's weather http://doi.ieeecomputersociety.org/10.1109/ICSM.2004.1357788
     ● When deciding, the current situation should have more influence
     ● As Lehman said in 1978
  37. How to forecast software evolution
  38. Background
     ● Forecasting traditionally done using very simple statistical models
       – Regression
     ● Lehman suggested in 1974 that Time Series Analysis was the best approach to study software evolution
     ● Let's compare time series analysis against regression models
  39. Case studies [figure: training set, test set]
  40. Case studies [timeline: training set and test set for PostgreSQL, FreeBSD and NetBSD; time axis from 1993 to 2007]
  41. Time Series Analysis [flowchart: original time series data -> ACF / PACF -> clear pattern? If no: kernel smoothing, then ACF / PACF again. If yes: ARIMA (p, d, q) model based on fitting ACF / PACF -> predictions]
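The kernel smoothing step in the flowchart can be sketched as follows. The slides do not name the kernel or the bandwidth, so this example assumes a Gaussian kernel in a Nadaraya-Watson style weighted average; `kernel_smooth` and its `bandwidth` parameter are hypothetical names.

```python
import math


def kernel_smooth(series, bandwidth=2.0):
    """Smooth a series with a Gaussian kernel (assumed choice of kernel).

    Each output point is a weighted average of all input points, with
    weights that fall off with the distance in time steps.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(series)
    smoothed = []
    for t in range(n):
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        total = sum(weights)  # always > 0: the weight at s == t is 1
        smoothed.append(sum(w * x for w, x in zip(weights, series)) / total)
    return smoothed
```

A larger bandwidth removes more noise but also flattens genuine structure, so in practice it would be tuned until the ACF / PACF plots show a clear pattern.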
  42. Parameters of the model
  43. Autocorrelation coefficients. No smoothing
  44. Autocorrelation coefficients. After smoothing
  45. Parameters of all the models
     ● Time series ARIMA model
       – d = 1, q = 0, p = 6, 7 or 9
     ● Regression model
       – r > 0.99
  46. What does the model look like?
     $\left(1 - \sum_{j=1}^{p} \phi_j B^j\right) \nabla^d x_t = \left(1 - \sum_{i=1}^{q} \theta_i B^i\right) \varepsilon_t$
     $B^i x_t = x_{t-i}$
     $\nabla x_t = x_t - x_{t-1} = (1 - B)\, x_t$
     $\nabla^d x_t = (1 - B)^d\, x_t$
  47. What does the model look like?
     $\left(1 - \sum_{j=1}^{p} \phi_j B^j\right) \nabla^d x_t = \left(1 - \sum_{i=1}^{q} \theta_i B^i\right) \varepsilon_t$
     ● $x_t$: predicted / actual values
     ● $\varepsilon_t$: estimation errors
     ● $\phi_j$, $\theta_i$: coefficients
     ● The two sums: linear components
     ● $p$, $d$, $q$: parameters of the model
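For the parameter values reported on slide 45 (d = 1, q = 0) the model reduces to an autoregression on the first differences of the series. The sketch below computes a one-step forecast for that special case. It assumes the coefficients have already been estimated elsewhere, and the names `forecast_next` and `phi` are hypothetical.

```python
def forecast_next(series, phi):
    """One-step forecast of an ARIMA(p, 1, 0) model.

    `phi` holds the autoregressive coefficients phi_1 .. phi_p
    (assumed to be already fitted). With d = 1 the model is
        diff_t = sum_j phi_j * diff_{t-j} + error_t
    where diff_t = x_t - x_{t-1}; the forecast sets the error to zero.
    """
    p = len(phi)
    if len(series) < p + 1:
        raise ValueError("need at least p + 1 observations")
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    # phi[0] multiplies the most recent difference, phi[1] the one before, ...
    predicted_diff = sum(phi[j] * diffs[-1 - j] for j in range(p))
    return series[-1] + predicted_diff
```

For example, with `phi = [0.5]` and the series `[1, 2, 4]`, the last difference is 2, the predicted difference is 1, and the forecast is 5.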
  48. Results: time series (ARIMA) vs. regression (mean squared relative error)
     Project      ARIMA   Regression
     FreeBSD      3.93    16.89
     NetBSD       1.80    15.94
     PostgreSQL   1.48    6.86
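The slide names the error measure but not its formula. A minimal sketch, assuming the common definition (the mean of the squared relative deviations between predicted and actual values), could look like this; `mean_squared_relative_error` is a hypothetical name, and the slide's figures may be scaled differently, for instance as percentages.

```python
def mean_squared_relative_error(predicted, actual):
    """Mean of ((predicted - actual) / actual) ** 2 over all points.

    Assumed definition; actual values must be non-zero.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined for an actual value of 0")
    squared = [((p - a) / a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared) / len(squared)
```

Because the error is relative, projects of very different sizes (such as FreeBSD and PostgreSQL) can be compared in the same table.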
  49. Conclusions
     ● Time series analysis is more accurate than regression analysis for macroscopic predictions
     ● Basic model. More components can be added:
       – Seasonality
       – Multi-variable, combining different factors
  50. More results
     ● OK, so you predicted last year... which is the past...
     ● What about predicting the real future?
       – MSR Challenge 2007 winners
       – Goal: predicting the number of changes in Eclipse in the next three months
       – http://dx.doi.org/10.1109/MSR.2007.10
  51. Why does this work?
     ● Isn't it too accurate?
     ● Why do you think this works?
  52. What's next?
  53. Further work
     ● Write a paper about the controversy around the validation of the laws of software evolution
       – In progress
     ● Write a paper about the short memory nature of evolution
       – Using Time Series Analysis to show it
       – And ARIMA as a forecasting tool
     ● Extracting principles and guidelines for software project management
  54. And what did I learn during all these years?
  55. Things I appreciate my advisors did
     ● Freedom of movement
     ● Pressure to get my own funding
     ● Unconditional support
     ● Demanding and challenging environment
     ● Opportunity to coordinate projects
     ● And to participate in many meetings alone
  56. Things that I did not know and I do now
     ● Know-how about conferences and journals
     ● English skills
     ● Writing skills (papers and proposals)
     ● Presentation skills
     ● Self-motivation
       – Brick walls are there for the rest of the people
       – Experience is what you get when you don't get what you want
       – Never give up
       – http://www.youtube.com/watch?v=ji5_MqicxSo
  57. Take away
     ● Laws of Software Evolution
     ● Controversy
     ● Short memory dynamics
     ● ARIMA: accurate forecast
     ● Statistical approach
     ● Replicable study
     ● Brick walls are a good thing
     ● Keep working. Don't give up
