1. The dynamics of software evolution
EVOLUMONS 2011
Research Seminar on Software Evolution
Université de Mons, Belgium
January 26th 2011
Israel Herraiz
Universidad Alfonso X el Sabio
<isra@herraiz.org>
<herraiz@uax.es>
http://www.uax.es http://herraiz.org
2. (c) 2011 Israel Herraiz
This work is licensed under the
Creative Commons Attribution-Share Alike 3.0
To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/
or send a letter to
Creative Commons,
171 Second Street, Suite 300,
San Francisco, California,
94105, USA.
Get the full bibliographic references listed in these slides at
http://herraiz.org/stuff/evolumons_references_20110126.txt
3. Outline
● The laws of software evolution
● The nature of software evolution (for libre
software)
● How to accurately forecast software evolution.
And why it works.
● What's next?
● And what did I learn during all these years of
work?
4. The laws of software evolution
5. My background
● Educated as a chemical and mechanical
engineer
● Wasted my time in the chemical industry. But I
did (and do) love doing software!
– http://caflur.sf.net http://gpinch.sf.net
● Involved in the open source community since
around 2001, started a PhD in 2004 in the
Libresoft research group
– http://libresoft.es
6. How it all started
● Godfrey and Tu [GT00] [GT01] studied the evolution
of the Linux kernel
● They said that the laws of software evolution were
not valid for Linux
– Laws of software evolution. What is that?
● My supervisors and I wrote a paper on the
topic [RAGBH05]
● At the time, I thought it was just one more paper
● It turned out to be our most cited paper
● Completely puzzled me
7. The topic background:
Software evolution
● How and why does
software evolve?
● Meir M. Lehman: laws of software
evolution
● “Program Evolution: Processes of
Software Change”,
published in 1985
8. The laws in the seventies
● Laws of Program Evolution Dynamics (1974)
[Leh74] [Leh85b]
9. The evolution of the laws of
software evolution
[Figure citing: [Leh74] [Leh85b]; [Leh78] [Leh80] [Leh85c] [LB85]; [Leh96] [LRW+97]; [MFRP06]]
10. The laws in the present day
(I – IV)
11. The laws in the present day
(V – VIII)
12. Empirical studies of software
evolution
See “Empirical Studies of Open Source Evolution” by
Juan Fernandez-Ramil, Angela Lozano, Michel Wermelinger, Andrea Capiluppi
in Tom Mens, Serge Demeyer (eds.) Software Evolution
13. Why the controversy about the laws
of software evolution?
● Fernandez-Ramil et al. found empirical validation
in the literature for laws I, VI, VII (partially)
and VIII (partially)
● The most interesting part (for me)
– Statistical analysis of software projects and their
evolution, using time series analysis among other
techniques (suggested in 1974!) [Leh74] [Leh85b]
– “For maximum cost-effectiveness, management
consideration and judgement should include the entire
history of the project with the current state having the
strongest, but not exclusive, influence”
[Leh78] [Leh85c]
14. The nature of (libre) software
evolution
15. The nature of (libre) software
evolution
● The goal is to develop a theoretical model for
software evolution
● A long-pursued goal
● Lehman and Belady in 1971 [BL71] [LB85]
● Woodside progressive and anti-regressive work
[Woo80] (included in [LB85])
● Turski models [Tur96] [Tur02]
– Growth is inversely proportional to complexity
– Complexity is proportional to the square of size
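Taken together, the two Turski bullets imply that growth is inversely proportional to the square of size. A minimal worked form, assuming a continuous size S(t), an initial size S_0 and a proportionality constant E (none of these symbols appear in the slides):

```latex
\frac{dS}{dt} = \frac{E}{S^{2}}
\quad\Longrightarrow\quad
S^{2}\,dS = E\,dt
\quad\Longrightarrow\quad
S(t) = \sqrt[3]{S_0^{3} + 3Et}
```

Under these assumptions size grows with the cube root of time, so growth slows down as the system gets larger.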
16. More recent models
● Self-Organized criticality [Wu06] [WHH07]
● Power laws for the size of the system
● Long range correlations in the time series of
changes
● Maintenance Guidance Model [CFR07]
● Those functions that have suffered more changes in
the past are more likely to be changed in the future
● Assumptions:
– Distribution of accumulated changes is asymmetrical
– Developers prioritize changes using the past number of
changes and complexity
17. Determinism and evolution
● Self-organized criticality
● This means that current events are influenced by
very old events
● Against Lehman's suggestions [Leh78] [Leh85c]
● In my opinion, counterintuitive
28. r(k) Autocorrelation coefficients
[Figure: autocorrelation coefficients r(k), from 0 to 1, against the lag k, from 1 to 15]
● Long range correlated: $r_k \sim k^{2d-1}$, $0 < d < 0.5$
● Short range correlated (ARIMA process): $r_k \sim C^{1-k}$
29. r(k) Autocorrelation coefficients
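[Figure: autocorrelation coefficients r(k), from 0 to 1, against the lag k, on a logarithmic scale]
● Long range correlated: $r_k \sim k^{2d-1}$, $0 < d < 0.5$
● Short range correlated (ARIMA process): $r_k \sim Ai^{1-k}$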
30. Empirical study
● 3,821 software projects
– More than 3 developers
– More than 1 year of active history
– 9,234,104 commits / 2,357,438 modification requests
– Projects registered between Nov. 1999 and Dec. 2004
– Datasets publicly available
● See “Determinism and evolution”
– 5th International Working Conference on
Mining Software Repositories (MSR 2008)
● Data sources: FLOSSMole + CVSAnalY-SF
31. Methodology
● Linear correlation to measure linearity
● Distribution of the Pearson coefficients
● Smoothing applied to the series before
calculating ACF
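The steps above can be sketched as follows. This is a minimal illustration, not the code used in the study. It assumes that linearity is measured between the lag k and the logarithm of r(k), and that the absolute value of the Pearson coefficient is reported. Both are assumptions, as are the function names `acf` and `log_linearity`. The input series is assumed to be smoothed already.

```python
import numpy as np

def acf(series, max_lag):
    """Sample autocorrelation coefficients r(0), ..., r(max_lag)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def log_linearity(r):
    """Absolute Pearson coefficient between lag k and log r(k), k = 1, 2, ...

    Assumption: only the leading run of positive coefficients is used,
    because the logarithm is undefined for r(k) <= 0.
    """
    r = np.asarray(r, dtype=float)
    positive = r > 0
    # Index of the first non-positive coefficient, or len(r) if there is none.
    n = len(r) if positive.all() else int(np.argmin(positive))
    if n < 3:
        return float("nan")  # too few points to judge linearity
    lags = np.arange(1, n + 1)
    return abs(float(np.corrcoef(lags, np.log(r[:n]))[0, 1]))
```

For one project, a call such as `log_linearity(acf(monthly_commits, 15)[1:])` would yield a single coefficient, where `monthly_commits` is a hypothetical series. An exponentially decaying ACF gives a value close to 1, while a power-law decay gives a lower value.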
34. Results
[Figure: results, with regions labeled “Long memory processes” and “Short memory processes”]
35. Looking at the numbers
Quantile   Commits   MRs
0          0.3235    0.2886
20         0.7394    0.7248
40         0.8178    0.8036
60         0.8906    0.8705
80         0.9783    0.9464
100        0.9998    0.9998
Long memory process
Short memory process
36. Implications for evolution
● Short memory -> Yesterday's weather
http://doi.ieeecomputersociety.org/10.1109/ICSM.2004.1357788
● When deciding, current situation should have
more influence
● As Lehman said in 1978
37. How to forecast software evolution
38. Background
● Forecasting traditionally done using very simple
statistical models
● Regression
● Lehman suggested in 1974 that Time Series
Analysis was the best approach to study
software evolution
● Let's compare time series analysis against
regression models
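The regression side of this comparison can be as simple as a straight-line trend extrapolated forward. A minimal sketch, assuming an ordinary least-squares line over the time index. The slides do not say which regression form was used, and `linear_trend_forecast` is a hypothetical name.

```python
import numpy as np

def linear_trend_forecast(series, steps):
    """Fit y = slope * t + intercept by least squares and extrapolate."""
    y = np.asarray(series, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    # Time indices of the points to predict, right after the training data.
    future_t = np.arange(len(y), len(y) + steps)
    return slope * future_t + intercept
```

Such a model uses only the overall trend of the training set, whereas a time series model also uses the most recent values.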
39. Case studies
[Figure: timeline, 1993 to 2007, showing the training set and test set periods for PostgreSQL, FreeBSD and NetBSD]
40. Case studies
[Figure: training set and test set]
41. Time Series Analysis
● Original time series data → ACF / PACF → clear pattern?
– No: kernel smoothing, then ACF / PACF again
– Yes: ARIMA model, with p, d, q based on fitting ACF / PACF
● ARIMA model → predictions
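The kernel smoothing step on this slide could look like the sketch below. The slides do not name the kernel or the bandwidth, so a Gaussian kernel over the time index with a hypothetical default bandwidth of 3 samples is assumed, and `kernel_smooth` is a hypothetical name.

```python
import numpy as np

def kernel_smooth(series, bandwidth=3.0):
    """Nadaraya-Watson smoother with a Gaussian kernel over the time index."""
    y = np.asarray(series, dtype=float)
    t = np.arange(len(y))
    # weights[i, j] is the influence of observation j on the estimate at i.
    weights = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    # Each smoothed value is a weighted average of all observations.
    return weights @ y / weights.sum(axis=1)
```

A larger bandwidth averages over more neighbouring points, which removes more noise but also flattens genuine short-term structure in the ACF and PACF.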
45. Parameters of all the models
● Time series ARIMA model
● d = 1, q = 0, p = 6, 7 or 9
● Regression model
● r > 0.99
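With d = 1 and q = 0, and reading the parameters in the standard ARIMA(p, d, q) order, the time series model is an autoregression of order p on the first differences of the series. A minimal sketch under that reading, fitted by ordinary least squares with NumPy rather than with the tooling used for the talk. Both function names are hypothetical and no constant term is included.

```python
import numpy as np

def fit_ar_on_differences(series, p):
    """Least-squares AR(p) coefficients for the first-differenced series."""
    w = np.diff(np.asarray(series, dtype=float))
    # The row for time t holds the p previous differences, most recent first.
    X = np.column_stack([w[p - i - 1 : len(w) - i - 1] for i in range(p)])
    y = w[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def forecast(series, coeffs, steps):
    """Iterate the fitted model forward and undo the differencing."""
    x = np.asarray(series, dtype=float)
    w = list(np.diff(x))
    p = len(coeffs)
    level = x[-1]
    predictions = []
    for _ in range(steps):
        recent = w[-1 : -p - 1 : -1]  # last p differences, most recent first
        next_diff = float(np.dot(coeffs, recent))
        w.append(next_diff)
        level += next_diff  # integrate once, because d = 1
        predictions.append(level)
    return np.array(predictions)
```

A full ARIMA implementation would normally estimate the coefficients by maximum likelihood; least squares is used here only to keep the example short.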
46. What does the model look like?
$\nabla^d x_t \left(1 - \sum_{j=1}^{q} \theta_j B^j\right) = \varepsilon_t \left(1 - \sum_{i=1}^{p} \phi_i B^i\right)$
$B^i x_t = x_{t-i}$
$\nabla x_t = x_t - x_{t-1} = (1 - B)\,x_t$
$\nabla^d x_t = (1 - B)^d\,x_t$
47. What does the model look like?
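$\nabla^d x_t \left(1 - \sum_{j=1}^{q} \theta_j B^j\right) = \varepsilon_t \left(1 - \sum_{i=1}^{p} \phi_i B^i\right)$
● $x_t$: predicted / actual values
● $\varepsilon_t$: estimation errors
● $\theta_j$, $\phi_i$: coefficients
● The two sums: linear components
● $p$, $d$, $q$: parameters of the model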
48. Results
Time series (ARIMA) vs. regression: Mean Squared Relative Error
Project      ARIMA   Regression
FreeBSD      3.93    16.89
NetBSD       1.80    15.94
PostgreSQL   1.48    6.86
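The slides do not define the error measure or its units. A minimal sketch, assuming the usual definition of the mean squared relative error, that is, the mean of the squared relative deviations between predicted and actual values. The function name is hypothetical.

```python
import numpy as np

def mean_squared_relative_error(actual, predicted):
    """Mean of ((predicted - actual) / actual) ** 2 over the test set."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.mean(((p - a) / a) ** 2))
```

Because each deviation is divided by the actual value, projects of very different sizes can be compared on the same scale.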
49. Conclusions
● Time Series more accurate than Regression
Analysis for macroscopic predictions
● Basic model. More components can be added.
● Seasonality
● Multi-variable, combining different factors
50. More results
● OK, so you predicted last year... which is in the past
● What about predicting the real future?
MSR Challenge 2007 winners
Goal:
predicting the number of changes
in Eclipse in the next three months
http://dx.doi.org/10.1109/MSR.2007.10
51. Why does this work?
● Isn't it too accurate?
● Why do you think this works?
53. Further work
● Write a paper about the controversy around the
validation of the laws of software evolution
● In progress
● Write a paper about the short memory nature of
evolution
● Using Time Series Analysis to show it
● And ARIMA as a forecasting tool
● Extracting principles and guidelines for software
project management
54. And what did I learn during all these
years?
55. Things I appreciate my advisors did
● Freedom of movement
● Pressure to get my own funding
● Unconditional support
● Demanding and challenging environment
● Opportunity to coordinate projects
● And to participate in many meetings alone
56. Things that I did not know then and
do know now
● Know-how about conferences and journals
● English skills
● Writing skills (papers and proposals)
● Presentation skills
● Self-motivation
– Brick walls are there for the other people
– Experience is what you get when you don't get what
you want
– Never give up
– http://www.youtube.com/watch?v=ji5_MqicxSo
57. Take away
● Laws of Software Evolution: controversy
● Statistical approach: replicable study
● Short memory dynamics
● Brick walls are a good thing
● ARIMA: accurate forecast
● Keep working. Don't give up