The dynamics of software evolution
                      EVOLUMONS 2011
             Research Seminar on Software Evolution

                           Université de Mons, Belgium
                               January 26th 2011

                                   Israel Herraiz
                           Universidad Alfonso X el Sabio
                                <isra@herraiz.org>
                                 <herraiz@uax.es>


http://www.uax.es http://herraiz.org
(c) 2011 Israel Herraiz
This work is licensed under the Creative Commons Attribution-Share Alike 3.0 license.
To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/
or send a letter to Creative Commons, 171 Second Street, Suite 300,
San Francisco, California, 94105, USA.

Get the full bibliographic references listed in these slides at
http://herraiz.org/stuff/evolumons_references_20110126.txt


Outline
     ●   The laws of software evolution
     ●   The nature of software evolution (for libre
         software)
     ●   How to accurately forecast software evolution.
         And why it works.
     ●   What's next?
     ●   And what did I learn during all these years of
         work?

The laws of software evolution




My background
     ●   Educated as a chemical and mechanical
         engineer
     ●   Wasted my time in the chemical industry. But I
         did (and do) love doing software!
               –   http://caflur.sf.net http://gpinch.sf.net
     ●   Involved in the open source community since
         around 2001, started a PhD in 2004 in the
         Libresoft research group
               –   http://libresoft.es

How it all started
     ●   Godfrey and Tu [GT00] [GT01] studied the evolution
         of the Linux kernel
     ●   They said that the laws of software evolution were not
         valid for Linux
               –   Laws of software evolution. What is that?
     ●   My supervisors and I wrote a paper on the topic [RAGBH05]
     ●   At the time, I thought it was just one more paper
     ●   It turned out to be our most cited paper
          ●   Completely puzzled me

The topic background: software evolution
     ●   How and why does
         software evolve?
     ●   Meir M. Lehman
         Laws of software
         evolution
     ●   “Program evolution.
         Processes of
         software change”
         published in 1985

The laws in the seventies
     ●   Laws of Program Evolution Dynamics (1974)




     [Leh74] [Leh85b]

The evolution of the laws of software evolution
     [Figure: timeline of the successive formulations of the laws, from
     [Leh74] [Leh85b], through [Leh78] [Leh85c] and [Leh80] [LB85],
     to [Leh96] [LRW+97] [MFRP06]]

The laws in the present day (I – IV)

The laws in the present day (V – VIII)

Empirical studies of software evolution

     See “Empirical Studies of Open Source Evolution” by
     Juan Fernandez-Ramil, Angela Lozano, Michel Wermelinger and Andrea Capiluppi,
     in Tom Mens, Serge Demeyer (eds.), Software Evolution

Why the controversy about the laws of software evolution?
     ●   Fernandez-Ramil et al. found empirical validation in the
         literature for laws I, VI, VII (partially) and VIII (partially)
     ●   The most interesting part (for me)
               –   Statistical analysis of software projects and their
                   evolution, using time series analysis among other
                   techniques (suggested as early as 1974!) [Leh74] [Leh85b]
               –   “For maximum cost-effectiveness, management
                   consideration and judgement should include the entire
                   history of the project with the current state having the
                   strongest, but not exclusive, influence”
                   [Leh78] [Leh85c]

The nature of (libre) software evolution
     ●   The goal is to develop a theoretical model for
         software evolution
     ●   Long-pursued goal
          ●   Lehman and Belady in 1971 [BL71] [LB85]
          ●   Woodside: progressive and anti-regressive work
              [Woo80] (included in [LB85])
          ●   Turski's models [Tur96] [Tur02]
               –   Growth is inversely proportional to complexity
               –   Complexity is proportional to the square of size
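
The two proportionalities above can be combined into a simple recurrence. What follows is a minimal sketch, assuming that growth per release equals a constant effort term divided by complexity and that complexity equals the square of the current size. The function name, the `effort` constant and the starting size are illustrative and are not taken from [Tur96] [Tur02].

```python
def turski_growth(initial_size, effort, releases):
    """Iterate S[i+1] = S[i] + effort / S[i]**2.

    Growth per release is inversely proportional to complexity,
    and complexity is taken as the square of the current size.
    """
    sizes = [float(initial_size)]
    for _ in range(releases):
        current = sizes[-1]
        complexity = current ** 2                    # complexity ~ size^2
        sizes.append(current + effort / complexity)  # growth ~ 1 / complexity
    return sizes


sizes = turski_growth(initial_size=100.0, effort=1.0e6, releases=10)
increments = [later - earlier for earlier, later in zip(sizes, sizes[1:])]
print([round(size, 1) for size in sizes])
```

Under these assumptions the increments shrink release after release: the system keeps growing, but ever more slowly (roughly with the cube root of the number of releases).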

More recent models
     ●   Self-organized criticality [Wu06] [WHH07]
          ●   Power laws for the size of the system
          ●   Long range correlations in the time series of
              changes
     ●   Maintenance Guidance Model [CFR07]
          ●   Those functions that have suffered more changes in
              the past are more likely to be changed in the future
          ●   Assumptions:
               –   Distribution of accumulated changes is asymmetrical
               –   Developers prioritize changes using past number of
                   changes and complexity

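The idea that functions changed more often in the past are more likely to change again can be illustrated with a toy simulation. This is only a sketch of that single idea, not the model in [CFR07]: it ignores complexity, assumes every function starts with one change, and all names and numbers are hypothetical.

```python
import random
import statistics


def simulate_changes(n_functions, n_changes, seed=0):
    """Toy process: each new change picks a function with probability
    proportional to the number of changes it has already received."""
    rng = random.Random(seed)        # fixed seed so runs are repeatable
    counts = [1] * n_functions       # assumption: one initial change per function
    functions = range(n_functions)
    for _ in range(n_changes):
        chosen = rng.choices(functions, weights=counts, k=1)[0]
        counts[chosen] += 1
    return counts


counts = simulate_changes(n_functions=50, n_changes=1000)
print("median:", statistics.median(counts), "max:", max(counts))
```

A few functions end up collecting far more changes than the typical one, which is the kind of asymmetrical distribution of accumulated changes listed among the assumptions.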
Determinism and evolution
     ●   Self-organized criticality
          ●   This means that current events are influenced by
              very old events
          ●   Against Lehman's suggestions [Leh78] [Leh85c]
     ●   In my opinion, counterintuitive

Long range correlated processes
     [Figure sequence; one frame is labelled “Unreachable”]

Short range correlated
     [Figure sequence]

What is software evolution like?
     [Figure: long range correlated or short range correlated?]

Autocorrelation coefficients
     [Figure: the series (1, 2, 3, 4, 5, ...) is paired with a copy of itself
     shifted by one position to give r(1), by two positions to give r(2),
     and so on]

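The shifting scheme in the diagram corresponds to the usual sample autocorrelation estimator. A minimal sketch, assuming the standard definition (covariance at lag k divided by the lag-0 variance); the slides do not state which estimator was used, and the function name is illustrative.

```python
def autocorrelation(series, max_lag):
    """Return [r(1), ..., r(max_lag)] using the usual sample estimator."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations)
    coefficients = []
    for k in range(1, max_lag + 1):
        # Pair each value with the one k positions later in the series.
        covariance = sum(deviations[t] * deviations[t + k] for t in range(n - k))
        coefficients.append(covariance / variance)
    return coefficients


series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
r = autocorrelation(series, max_lag=3)
print([round(value, 3) for value in r])
```

Each r(k) lies between -1 and 1; how quickly the coefficients fall towards zero as k grows is what separates short range from long range correlated series.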
Autocorrelation coefficients
     [Figure: empty axes, r(k) from 0 to 1 against the lag k from 1 to 15]

Autocorrelation coefficients
     [Figure: r(k) against the lag k, for k = 1 to 15, with two decay profiles]
     ●   Long range correlated: \( r(k) \sim k^{2d-1} \), \( 0 < d < 0.5 \)
     ●   Short range correlated (ARIMA process): \( r(k) \sim C^{1-k} \)

Autocorrelation coefficients
     [Figure: the two decay profiles drawn on a logarithmic scale]
     ●   Long range correlated: \( r(k) \sim k^{2d-1} \), \( 0 < d < 0.5 \)
     ●   Short range correlated (ARIMA process): \( r(k) \sim A_i^{1-k} \)

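Why a logarithmic scale helps can be seen by taking logarithms of the two decay laws. This is a sketch that reads the short range expression as an exponential decay with a base larger than 1; the constant c is introduced here only to absorb the proportionality factor.

```latex
\begin{align*}
\text{long range:}\quad  & r(k) \sim k^{2d-1}
  \;\Rightarrow\; \log r(k) \approx (2d-1)\,\log k + c, \\
\text{short range:}\quad & r(k) \sim A_i^{\,1-k}
  \;\Rightarrow\; \log r(k) \approx (1-k)\,\log A_i .
\end{align*}
```

With 0 < d < 0.5 the first expression is a straight line of slope 2d - 1 (between -1 and 0) against log k. The second is linear in k itself, so it bends downwards when plotted against log k.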
Empirical study
     ●   3,821 software projects
               –   More than 3 developers
               –   More than 1 year of active history
               –   9,234,104 commits / 2,357,438 modification requests
               –   Projects registered between Nov. 1999 and Dec. 2004
               –   Datasets publicly available
     ●   See “Determinism and evolution”
               –   5th International Working Conference on
                   Mining Software Repositories (MSR 2008)
     ●   Data sources: FLOSSMole + CVSAnalY-SF

Methodology
     ●   Linear correlation to measure linearity
     ●   Distribution of the Pearson coefficients
     ●   Smoothing applied to the series before
         calculating ACF
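
One possible reading of the first bullet is sketched below: the Pearson coefficient measures how close the autocorrelation profile is to a straight line. The sketch assumes that linearity is judged on a log-log plot, which the slides do not state, and it uses synthetic profiles rather than the study's data. It also requires positive coefficients, since it takes their logarithm. All function names are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def loglog_linearity(coefficients):
    """Pearson coefficient of log r(k) against log k, for k = 1, 2, ..."""
    lags = range(1, len(coefficients) + 1)
    return pearson([math.log(k) for k in lags],
                   [math.log(r) for r in coefficients])


# Synthetic autocorrelation profiles (hypothetical values, not the study's data).
power_law = [k ** (2 * 0.3 - 1) for k in range(1, 16)]   # r(k) = k^(2d-1), d = 0.3
exponential = [2.0 ** (1 - k) for k in range(1, 16)]     # r(k) = C^(1-k), C = 2

print(round(loglog_linearity(power_law), 4))
print(round(loglog_linearity(exponential), 4))
```

The power-law profile is perfectly linear on this scale, while the exponential one is visibly less so; collecting such a coefficient for every project gives a distribution of Pearson coefficients.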




Results
     [Figure sequence; the last frame marks two regions, “Long memory
     processes” and “Short memory processes”]

Looking at the numbers
     Pearson coefficients by quantile

     Quantile (%)   Commits   MRs
                0    0.3235   0.2886
               20    0.7394   0.7248
               40    0.8178   0.8036
               60    0.8906   0.8705
               80    0.9783   0.9464
              100    0.9998   0.9998

     Legend on the slide: long memory process, short memory process

Implications for evolution
     ●   Short memory -> Yesterday's weather
         http://doi.ieeecomputersociety.org/10.1109/ICSM.2004.1357788
     ●   When deciding, current situation should have
         more influence
          ●   As Lehman said in 1978




How to forecast software evolution




Background
     ●   Forecasting traditionally done using very simple
         statistical models
          ●   Regression
     ●   Lehman suggested in 1974 that Time Series
         Analysis was the best approach to study
         software evolution
     ●   Let's compare time series analysis against
         regression models


Case studies
     [Figure: time line from 1993 to 2007 showing the training set and the
     test set for PostgreSQL, FreeBSD and NetBSD]
     [Figure: training set and test set]

Time Series Analysis
     ●   Original time series data
     ●   ACF and PACF: clear pattern?
          ●   No: kernel smoothing
     ●   p, d, q based on ACF / PACF
     ●   ARIMA model fitting
     ●   Predictions

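The kernel smoothing step can be sketched as a weighted moving average. The slides do not say which kernel or bandwidth was used, so the Gaussian kernel, the bandwidth value and the sample numbers below are assumptions made for illustration.

```python
import math


def kernel_smooth(series, bandwidth):
    """Nadaraya-Watson smoothing with a Gaussian kernel over the time index."""
    smoothed = []
    for t in range(len(series)):
        # Nearby points get weights close to 1, distant points close to 0.
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2)
                   for s in range(len(series))]
        total = sum(weights)
        smoothed.append(sum(w * x for w, x in zip(weights, series)) / total)
    return smoothed


noisy = [10, 14, 9, 15, 11, 16, 12, 18, 13, 19]   # made-up daily change counts
smooth = kernel_smooth(noisy, bandwidth=2.0)
print([round(value, 1) for value in smooth])
```

A larger bandwidth removes more of the point-to-point noise, at the cost of flattening genuine short-term structure, so the choice affects how clear the ACF / PACF pattern becomes.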
Parameters of the model

Autocorrelation coefficients. No smoothing

Autocorrelation coefficients. After smoothing

Parameters of all the models
     ●   Time series ARIMA model
          ●   d = 1, q = 0, p = 6, 7 or 9
     ●   Regression model
          ●   r > 0.99

What does the model look like?

     \[ \nabla^d x_t \left( 1 - \sum_{j=1}^{q} \theta_j B^j \right) = \varepsilon_t \left( 1 - \sum_{i=1}^{p} \phi_i B^i \right) \]

     \[ B^i x_t = x_{t-i} \]

     \[ \nabla x_t = x_t - x_{t-1} = (1 - B)\, x_t \]

     \[ \nabla^d x_t = (1 - B)^d x_t \]

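What does the model look like?

     \[ \nabla^d x_t \left( 1 - \sum_{j=1}^{q} \theta_j B^j \right) = \varepsilon_t \left( 1 - \sum_{i=1}^{p} \phi_i B^i \right) \]

     ●   \( \nabla^d x_t \): predicted / actual values
     ●   \( \theta_j \), \( \phi_i \): coefficients
     ●   \( \varepsilon_t \): estimation errors
     ●   The two sums over powers of \( B \): linear components
     ●   \( p \), \( d \), \( q \): parameters of the model
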
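To make the fitting and forecasting steps concrete, here is a deliberately small sketch with d = 1, q = 0 and p = 1. The case studies use p = 6, 7 or 9; p = 1 is chosen here only to keep the least-squares fit to one line. The sketch assumes the textbook reading of ARIMA(p, d, q), in which the p autoregressive coefficients act on the differenced series. All names and data are hypothetical, and a real analysis would rely on a statistics package rather than this hand-written fit.

```python
def fit_arima_110(series):
    """Fit w[t] - mu = phi * (w[t-1] - mu) + e[t], where w are first differences."""
    diffs = [b - a for a, b in zip(series, series[1:])]   # differencing, d = 1
    mu = sum(diffs) / len(diffs)
    centred = [w - mu for w in diffs]
    numerator = sum(a * b for a, b in zip(centred, centred[1:]))
    denominator = sum(a * a for a in centred[:-1])
    phi = numerator / denominator                          # least squares, p = 1
    return mu, phi


def forecast(series, mu, phi, steps):
    """Iterate the fitted recurrence and integrate the differences back."""
    level = series[-1]
    last_diff = series[-1] - series[-2]
    predictions = []
    for _ in range(steps):
        last_diff = mu + phi * (last_diff - mu)   # next predicted difference
        level += last_diff                        # undo the differencing
        predictions.append(level)
    return predictions


history = [100, 104, 109, 113, 118, 122, 127, 131, 136, 140]   # made-up sizes
mu, phi = fit_arima_110(history)
predictions = forecast(history, mu, phi, steps=3)
print(round(mu, 3), round(phi, 3), [round(p, 1) for p in predictions])
```

Forecasts are built one step at a time: each predicted difference feeds the next one, and the running sum turns differences back into the original scale.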
Results
     Time series (ARIMA) vs. regression: Mean Squared Relative Error

                    ARIMA   Regression
     FreeBSD         3.93        16.89
     NetBSD          1.80        15.94
     PostgreSQL      1.48         6.86

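The slides do not define the error metric or its units. One common definition of the mean squared relative error is sketched below, as an assumption, with made-up numbers.

```python
def mean_squared_relative_error(actual, predicted):
    """Average of ((predicted - actual) / actual)^2 over all points."""
    errors = [((p - a) / a) ** 2 for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


actual = [100.0, 110.0, 120.0]      # made-up observed values
predicted = [90.0, 110.0, 132.0]    # made-up forecasts
msre = mean_squared_relative_error(actual, predicted)
print(round(msre, 5))
```

Because each error is divided by the actual value, the metric is scale-free, which makes projects of different sizes comparable.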
Conclusions
     ●   Time series analysis is more accurate than regression
         analysis for macroscopic predictions
     ●   Basic model. More components can be added:
          ●   Seasonality
          ●   Multi-variable, combining different factors

More results
     ●   OK, so you predicted last year... which is the past...
     ●   What about predicting the real future?
          ●   MSR Challenge 2007 winners
          ●   Goal: predicting the number of changes
              in Eclipse in the next three months
          ●   http://dx.doi.org/10.1109/MSR.2007.10

Why does this work?
     ●   Isn't it too accurate?
     ●   Why do you think this works?

What's next?




Further work
     ●   Write a paper about the controversy around the
         validation of the laws of software evolution
          ●   In progress
     ●   Write a paper about the short memory nature of
         evolution
          ●   Using Time Series Analysis to show it
          ●   And ARIMA as a forecasting tool
          ●   Extracting principles and guidelines for software
              project management

And what did I learn during all these years?

Things I appreciate my advisors did
     ●   Freedom of movement
     ●   Pressure to get my own funding
     ●   Unconditional support
     ●   Demanding and challenging environment
     ●   Opportunity to coordinate projects
     ●   And to participate in many meetings alone



Things that I did not know and I do now
     ●   Know-how about conferences and journals
     ●   English skills
     ●   Writing skills (papers and proposals)
     ●   Presentation skills
     ●   Self-motivation
               –   Brick walls are there for the rest of the people
               –   Experience is what you get when you don't get what
                   you want
               –   Never give up
               –   http://www.youtube.com/watch?v=ji5_MqicxSo

Take away
     ●   Laws of software evolution
     ●   Controversy
     ●   Short memory dynamics
     ●   ARIMA: accurate forecast
     ●   Statistical approach
     ●   Replicable study
     ●   Brick walls are a good thing
     ●   Keep working. Don't give up

More Related Content

Similar to The dynamics of software evolution - EVOLUMONS 2011

Open Source, Sourceforge Projects, & Apache Foundation
Open Source, Sourceforge Projects, & Apache FoundationOpen Source, Sourceforge Projects, & Apache Foundation
Open Source, Sourceforge Projects, & Apache FoundationMohammad Kotb
 
Building the Future Together: AtoM3, Governance, and the Sustainability of Op...
Building the Future Together: AtoM3, Governance, and the Sustainability of Op...Building the Future Together: AtoM3, Governance, and the Sustainability of Op...
Building the Future Together: AtoM3, Governance, and the Sustainability of Op...Artefactual Systems - AtoM
 
Introduction to License Compliance and My research (D. German)
Introduction to License Compliance and My research (D. German)Introduction to License Compliance and My research (D. German)
Introduction to License Compliance and My research (D. German)dmgerman
 
Prospero: A Web-based Document Delivery System
Prospero: A Web-based Document Delivery SystemProspero: A Web-based Document Delivery System
Prospero: A Web-based Document Delivery SystemEric Schnell
 
GoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'EliaGoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'EliaFriprogsenteret
 
XOOPS 2.5.x Installation Guide
XOOPS 2.5.x Installation GuideXOOPS 2.5.x Installation Guide
XOOPS 2.5.x Installation Guidexoopsproject
 
National Archives of Norway - AtoM and Archivematica intro workshop
National Archives of Norway - AtoM and Archivematica intro workshopNational Archives of Norway - AtoM and Archivematica intro workshop
National Archives of Norway - AtoM and Archivematica intro workshopArtefactual Systems - AtoM
 
Overview of oss(open source software library) and its pros and cons
Overview of oss(open source software library) and its pros and consOverview of oss(open source software library) and its pros and cons
Overview of oss(open source software library) and its pros and consYuga Priya Satheesh
 
Community catalysts value of open source
Community catalysts   value of open sourceCommunity catalysts   value of open source
Community catalysts value of open sourceDave Neary
 
Open source software
Open source softwareOpen source software
Open source softwarejaimeacurry
 
Opensource Development
Opensource DevelopmentOpensource Development
Opensource Developmentpetr_havel
 
[Workshop] Building an Integration Agile Digital Enterprise with Open Source ...
[Workshop] Building an Integration Agile Digital Enterprise with Open Source ...[Workshop] Building an Integration Agile Digital Enterprise with Open Source ...
[Workshop] Building an Integration Agile Digital Enterprise with Open Source ...WSO2
 
Day 2-presentation
Day 2-presentationDay 2-presentation
Day 2-presentationDeb Forsten
 
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...cresco
 
open source technology
open source technologyopen source technology
open source technologyLila Ram Yadav
 
Fundamentals of Free and Open Source Software
Fundamentals of Free and Open Source SoftwareFundamentals of Free and Open Source Software
Fundamentals of Free and Open Source SoftwareRoss Gardler
 

Similar to The dynamics of software evolution - EVOLUMONS 2011 (20)

Open Source, Sourceforge Projects, & Apache Foundation
Open Source, Sourceforge Projects, & Apache FoundationOpen Source, Sourceforge Projects, & Apache Foundation
Open Source, Sourceforge Projects, & Apache Foundation
 
Building the Future Together: AtoM3, Governance, and the Sustainability of Op...
Building the Future Together: AtoM3, Governance, and the Sustainability of Op...Building the Future Together: AtoM3, Governance, and the Sustainability of Op...
Building the Future Together: AtoM3, Governance, and the Sustainability of Op...
 
W3 c semantic web activity
W3 c semantic web activityW3 c semantic web activity
W3 c semantic web activity
 
Introduction to License Compliance and My research (D. German)
Introduction to License Compliance and My research (D. German)Introduction to License Compliance and My research (D. German)
Introduction to License Compliance and My research (D. German)
 
Prospero: A Web-based Document Delivery System
Prospero: A Web-based Document Delivery SystemProspero: A Web-based Document Delivery System
Prospero: A Web-based Document Delivery System
 
GoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'EliaGoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'Elia
 
XOOPS 2.5.x Installation Guide
XOOPS 2.5.x Installation GuideXOOPS 2.5.x Installation Guide
XOOPS 2.5.x Installation Guide
 
Workshop slides - Introduction to AtoM and Archivematica
Workshop slides - Introduction to AtoM and ArchivematicaWorkshop slides - Introduction to AtoM and Archivematica
Workshop slides - Introduction to AtoM and Archivematica
 
National Archives of Norway - AtoM and Archivematica intro workshop
National Archives of Norway - AtoM and Archivematica intro workshopNational Archives of Norway - AtoM and Archivematica intro workshop
National Archives of Norway - AtoM and Archivematica intro workshop
 
Overview of oss(open source software library) and its pros and cons
Overview of oss(open source software library) and its pros and consOverview of oss(open source software library) and its pros and cons
Overview of oss(open source software library) and its pros and cons
 
Community catalysts value of open source
Community catalysts   value of open sourceCommunity catalysts   value of open source
Community catalysts value of open source
 
Open source software
Open source softwareOpen source software
Open source software
 
Artefactual and Open Source Development
Artefactual and Open Source DevelopmentArtefactual and Open Source Development
Artefactual and Open Source Development
 
Opensource Development
Opensource DevelopmentOpensource Development
Opensource Development
 
[Workshop] Building an Integration Agile Digital Enterprise with Open Source ...
[Workshop] Building an Integration Agile Digital Enterprise with Open Source ...[Workshop] Building an Integration Agile Digital Enterprise with Open Source ...
[Workshop] Building an Integration Agile Digital Enterprise with Open Source ...
 
Day 2-presentation
Day 2-presentationDay 2-presentation
Day 2-presentation
 
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
 
Nonsoftwareoss
NonsoftwareossNonsoftwareoss
Nonsoftwareoss
 
open source technology
open source technologyopen source technology
open source technology
 
Fundamentals of Free and Open Source Software
Fundamentals of Free and Open Source SoftwareFundamentals of Free and Open Source Software
Fundamentals of Free and Open Source Software
 

More from Israel Herraiz

intensive metrics software evolution
intensive metrics software evolutionintensive metrics software evolution
intensive metrics software evolutionIsrael Herraiz
 
Public Key Cryptography
Public Key CryptographyPublic Key Cryptography
Public Key CryptographyIsrael Herraiz
 
Statistical Distribution of Metrics
Statistical Distribution of MetricsStatistical Distribution of Metrics
Statistical Distribution of MetricsIsrael Herraiz
 
¿MATLAB? Yo uso Octave UPM
¿MATLAB? Yo uso Octave UPM¿MATLAB? Yo uso Octave UPM
¿MATLAB? Yo uso Octave UPMIsrael Herraiz
 
The Ultimate Debian Database
The Ultimate Debian DatabaseThe Ultimate Debian Database
The Ultimate Debian DatabaseIsrael Herraiz
 
Evaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasetsEvaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasetsIsrael Herraiz
 
Software size distribution - Why we always underestimate software cost
Software size distribution - Why we always underestimate software costSoftware size distribution - Why we always underestimate software cost
Software size distribution - Why we always underestimate software costIsrael Herraiz
 
Public key cryptography
Public key cryptographyPublic key cryptography
Public key cryptographyIsrael Herraiz
 
Mining Software Repositories
Mining Software RepositoriesMining Software Repositories
Mining Software RepositoriesIsrael Herraiz
 

More from Israel Herraiz (9)

intensive metrics software evolution
intensive metrics software evolutionintensive metrics software evolution
intensive metrics software evolution
 
Public Key Cryptography
Public Key CryptographyPublic Key Cryptography
Public Key Cryptography
 
Statistical Distribution of Metrics
Statistical Distribution of MetricsStatistical Distribution of Metrics
Statistical Distribution of Metrics
 
¿MATLAB? Yo uso Octave UPM
¿MATLAB? Yo uso Octave UPM¿MATLAB? Yo uso Octave UPM
¿MATLAB? Yo uso Octave UPM
 
The Ultimate Debian Database
The Ultimate Debian DatabaseThe Ultimate Debian Database
The Ultimate Debian Database
 
Evaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasetsEvaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasets
 
Software size distribution - Why we always underestimate software cost
Software size distribution - Why we always underestimate software costSoftware size distribution - Why we always underestimate software cost
Software size distribution - Why we always underestimate software cost
 
Public key cryptography
Public key cryptographyPublic key cryptography
Public key cryptography
 
Mining Software Repositories
Mining Software RepositoriesMining Software Repositories
Mining Software Repositories
 

 

 

The dynamics of software evolution - EVOLUMONS 2011

  • 1. The dynamics of software evolution EVOLUMONS 2011 Research Seminar on Software Evolution Université de Mons, Belgium January 26th 2011 Israel Herraiz Universidad Alfonso X el Sabio <isra@herraiz.org> <herraiz@uax.es> 1 http://www.uax.es http://herraiz.org
  • 2. (c) 2011 Israel Herraiz This work is licensed under the Creative Commons Attribution-Share Alike 3.0 To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Get the full bibliographic references listed in these slides at http://herraiz.org/stuff/evolumons_references_20110126.txt
  • 3. Outline ● The laws of software evolution ● The nature of software evolution (for libre software) ● How to accurately forecast software evolution. And why it works. ● What's next? ● And what did I learn during all these years of work?
  • 4. The laws of software evolution
  • 5. My background ● Educated as a chemical and mechanical engineer ● Wasted my time in the chemical industry. But I did (and do) love doing software! – http://caflur.sf.net http://gpinch.sf.net ● Involved in the open source community since around 2001, started a PhD in 2004 in the Libresoft research group – http://libresoft.es
  • 6. How it all started ● Godfrey and Tu [GT00] [GT01] studied the evolution of the Linux kernel ● They said that the laws of software evolution were not valid for Linux – Laws of software evolution. What is that? ● My supervisors and I wrote a paper on the topic [RAGBH05] ● At the time, I thought it was just one more paper ● It turned out to be our most cited paper ● Completely puzzled me
  • 7. The topic background: Software evolution ● How and why does software evolve? ● Meir M. Lehman Laws of software evolution ● “Program evolution. Processes of software change” published in 1985
  • 8. The laws in the seventies ● Laws of Program Evolution Dynamics (1974) [Leh74] [Leh85b]
  • 9. The evolution of the laws of software evolution [Leh96] [LRW+97] [MFRP06] [Leh78] [Leh80] [Leh85c] [LB85] [Leh74] [Leh85b]
  • 10. The laws in the present day (I – IV)
  • 11. The laws in the present day (V – VIII)
  • 12. Empirical studies of software evolution See “Empirical Studies of Open Source Evolution” by Juan Fernandez-Ramil, Angela Lozano, Michel Wermelinger, Andrea Capiluppi, in Tom Mens, Serge Demeyer (eds.) Software Evolution
  • 13. Why the controversy about the laws of software evolution? ● Fernandez-Ramil et al. found empirical validation in the literature for laws I, VI, VII (partially) and VIII (partially) ● The most interesting part (for me) – Statistical analysis of software projects and their evolution, using time series analysis among other techniques (suggested in 1974!) [Leh74] [Leh85b] – “For maximum cost-effectiveness, management consideration and judgement should include the entire history of the project with the current state having the strongest, but not exclusive, influence” [Leh78] [Leh85c]
  • 14. The nature of (libre) software evolution
  • 15. The nature of (libre) software evolution ● The goal is to develop a theoretical model for software evolution ● Long pursued goal ● Lehman and Belady in 1971 [BL71] [LB85] ● Woodside progressive and anti-regressive work [Woo80] (included in [LB85]) ● Turski models [Tur96] [Tur02] – Growth is inversely proportional to complexity – Complexity is proportional to the square of size
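The two Turski assumptions on slide 15 combine into an inverse square growth rule: if growth is proportional to 1/complexity and complexity is proportional to size squared, each release adds roughly effort / size² to the system. The sketch below iterates that rule and compares it with the cube-root curve obtained from the continuous version dS/dt = effort / S². The constant `effort`, the starting size and the number of releases are hypothetical values chosen for illustration, not figures from the talk or from [Tur96] [Tur02].

```python
def turski_growth(s0, effort, releases):
    """Iterate S[i+1] = S[i] + effort / S[i]**2 and return every size.

    `effort` is an assumed constant of proportionality (hypothetical).
    """
    sizes = [float(s0)]
    for _ in range(releases):
        current = sizes[-1]
        sizes.append(current + effort / (current * current))
    return sizes


def cube_root_approx(s0, effort, i):
    """Continuous approximation: dS/dt = effort / S**2 gives
    S(t)**3 = s0**3 + 3 * effort * t."""
    return (s0 ** 3 + 3.0 * effort * i) ** (1.0 / 3.0)


# Hypothetical system: 100 size units at the first release.
sizes = turski_growth(s0=100.0, effort=5.0e5, releases=50)
increments = [b - a for a, b in zip(sizes, sizes[1:])]
approx = cube_root_approx(100.0, 5.0e5, 50)

print("size after 50 releases:", round(sizes[-1], 1))
print("cube-root approximation:", round(approx, 1))
```

Under these assumptions the system keeps growing, but every release adds less than the previous one, so size follows approximately the cube root of the release number.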
  • 16. More recent models ● Self-Organized criticality [Wu06] [WHH07] ● Power laws for the size of the system ● Long range correlations in the time series of changes ● Maintenance Guidance Model [CFR07] ● Those functions that have suffered more changes in the past are more likely to be changed in the future ● Assumptions: – Distribution of accumulated changes is asymmetrical – Developers prioritize changes using past number of changes and complexity
  • 17. Determinism and evolution ● Self Organized Criticality ● This means that current events are influenced by very old events ● Against Lehman's suggestions [Leh78] [Leh85c] ● In my opinion, counterintuitive
  • 18. Long range correlated processes
  • 19. Long range correlated processes
  • 20. Long range correlated processes Unreachable
  • 25. How is software evolution? [two figures, joined by “or”]
  • 26. Autocorrelation coefficients [figure: the series 1 2 3 4 5 ... is paired with a copy of itself shifted by one position to obtain r(1), by two positions to obtain r(2), and so on]
  • 27. Autocorrelation coefficients [figure: r(k), from 0 to 1, against the lag k = 1 ... 15]
  • 28. Autocorrelation coefficients [figure: r(k), from 0 to 1, against the lag k = 1 ... 15] Long range correlated: $r(k) \sim k^{2d-1}$, $0 < d < 0.5$ Short range correlated (ARIMA process): $r(k) \sim C^{1-k}$
  • 29. Autocorrelation coefficients [figure: r(k) against k, logarithmic scale] Long range correlated: $r(k) \sim k^{2d-1}$, $0 < d < 0.5$ Short range correlated (ARIMA process): $r(k) \sim A_i^{1-k}$
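The autocorrelation coefficients on slides 26 to 29 compare a series with a copy of itself shifted by k positions. A minimal sketch of the usual sample estimator follows, assuming the standard definition (lagged covariance around the overall mean, divided by the variance term); the slides do not spell out the exact estimator used in the study, and the example series are made up.

```python
def autocorrelation(series, max_lag):
    """Return [r(0), r(1), ..., r(max_lag)] for a list of numbers.

    r(k) = sum_t (x[t] - mean) * (x[t+k] - mean) / sum_t (x[t] - mean)**2
    """
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Made-up series: a steady trend and a strictly alternating signal.
trend = autocorrelation([1, 2, 3, 4, 5, 6, 7, 8], max_lag=3)
alternating = autocorrelation([1, -1, 1, -1, 1, -1, 1, -1], max_lag=1)

print([round(r, 3) for r in trend])
print([round(r, 3) for r in alternating])
```

Plotting r(k) against k, as on slides 27 to 29, then shows how fast the influence of past values fades: a slow power-law decay for long range correlated processes, a fast decay for short range correlated ones.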
  • 30. Empirical study ● 3,821 software projects – More than 3 developers – More than 1 year of active history – 9,234,104 commits / 2,357,438 modification requests – Projects registered between Nov. 1999 and Dec. 2004 – Datasets publicly available ● See “Determinism and evolution”, 5th International Working Conference on Mining Software Repositories (MSR 2008) ● FLOSSMole + CVSAnalY-SF
  • 31. Methodology ● Linear correlation to calculate linearity ● Distribution of the Pearson coefficients ● Smoothing applied to the series before calculating ACF
  • 34. Results [figure with two regions: long memory processes, short memory processes]
  • 35. Looking at the numbers
        Quantile   Commits   MRs
        0          0.3235    0.2886
        20         0.7394    0.7248
        40         0.8178    0.8036
        60         0.8906    0.8705
        80         0.9783    0.9464
        100        0.9998    0.9998
    [labels on the slide: Long memory process, Short memory process]
  • 36. Implications for evolution ● Short memory -> Yesterday's weather http://doi.ieeecomputersociety.org/10.1109/ICSM.2004.1357788 ● When deciding, current situation should have more influence ● As Lehman said in 1978
  • 37. How to forecast software evolution
  • 38. Background ● Forecasting traditionally done using very simple statistical models ● Regression ● Lehman suggested in 1974 that Time Series Analysis was the best approach to study software evolution ● Let's compare time series analysis against regression models
  • 39. Case studies [figure: timelines from 1993 to 2007 for PostgreSQL, FreeBSD and NetBSD, each split into a training set and a test set]
  • 40. Case studies [figure: training set and test set]
  • 41. Time Series Analysis [flowchart: Original time series data -> ACF / PACF -> Clear pattern? No: kernel smoothing. Yes: ARIMA model fitting, with p, d, q based on ACF / PACF -> Predictions]
  • 42. Parameters of the model
  • 43. Autocorrelation coefficients. No smoothing
  • 44. Autocorrelation coefficients. After smoothing
  • 45. Parameters of all the models ● Time series ARIMA model ● d = 1, q = 0, p = 6, 7 or 9 ● Regression model ● r > 0.99
  • 46. What does the model look like? $\nabla^d x_t \left(1 - \sum_{j=1}^{q} \phi_j B^j\right) = \epsilon_t \left(1 - \sum_{i=1}^{p} \theta_i B^i\right)$, where $B^i x_t = x_{t-i}$, $\nabla x_t = x_t - x_{t-1} = (1 - B) x_t$ and $\nabla^d x_t = (1 - B)^d x_t$
  • 47. What does the model look like? In the same equation, $x_t$ are the predicted / actual values, $\epsilon_t$ are the estimation errors, $\phi_j$ and $\theta_i$ are the coefficients, each bracketed sum is a linear component, and $p$, $d$, $q$ are the parameters of the model
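To make the roles of differencing and of the linear component concrete, here is a deliberately reduced sketch: an ARIMA model with d = 1 and a single autoregressive term, written (1, 1, 0) in the usual (p, d, q) convention, fitted by least squares without an intercept. This is a simplification for illustration only. The models in the talk use 6, 7 or 9 terms and kernel smoothing, and were presumably fitted with a statistics package; the function names and the sample history below are hypothetical.

```python
def difference(series):
    """Apply the operator (1 - B) once: y[t] = x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(y):
    """Least-squares coefficient phi for y[t] ~ phi * y[t-1] (no intercept)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    if den == 0:
        raise ValueError("cannot fit: all lagged differences are zero")
    return num / den


def forecast_arima_110(series, steps):
    """Fit on the differenced series, then integrate the forecasts back."""
    y = difference(series)
    phi = fit_ar1(y)
    level, last_diff = series[-1], y[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # predicted next difference
        level = level + last_diff     # undo the differencing
        forecasts.append(level)
    return phi, forecasts


# Hypothetical history of accumulated changes per period.
history = [0.0, 8.0, 12.0, 14.0, 15.0]
phi, predictions = forecast_arima_110(history, steps=2)
print(phi, predictions)
```

The forecast depends only on the most recent difference, which is the short memory behaviour discussed earlier: the current state dominates the prediction.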
  • 48. Results Time series (ARIMA) vs. regression, Mean Squared Relative Error
                     ARIMA   Regression
        FreeBSD      3.93    16.89
        NetBSD       1.80    15.94
        PostgreSQL   1.48    6.86
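The comparison above is expressed as a Mean Squared Relative Error. The slide does not give the formula or the units, so the sketch below assumes the common definition, the mean over the test set of ((predicted - actual) / actual)²; the reported figures may use a different scaling, such as percentages. The sample values are made up.

```python
def mean_squared_relative_error(actual, predicted):
    """Mean of ((predicted - actual) / actual)**2 over paired observations.

    Assumed definition; every actual value must be non-zero.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined for a zero actual value")
    squared = [((p - a) / a) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared) / len(squared)


# Made-up test set: both predictions are off by 10 % of the actual value.
msre = mean_squared_relative_error([100.0, 200.0], [110.0, 180.0])
print(msre)
```

Because each error is divided by the actual value, projects of very different size, such as the three case studies, can be compared on the same scale.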
  • 49. Conclusions ● Time Series more accurate than Regression Analysis for macroscopic predictions ● Basic model. More components can be added. ● Seasonality ● Multi-variable, combining different factors
  • 50. More results ● Ok, so you predicted last year... which is past... ● What about predicting the real future? MSR Challenge 2007 winners Goal: predicting the number of changes in Eclipse in the next three months http://dx.doi.org/10.1109/MSR.2007.10
  • 51. Why does this work? ● Isn't it too accurate? ● Why do you think this works?
  • 52. What's next?
  • 53. Further work ● Write a paper about the controversy around the validation of the laws of software evolution ● In progress ● Write a paper about the short memory nature of evolution ● Using Time Series Analysis to show it ● And ARIMA as a forecasting tool ● Extracting principles and guidelines for software project management
  • 54. And what did I learn during all these years?
  • 55. Things I appreciate my advisors did ● Freedom of movement ● Pressure to get my own funding ● Unconditional support ● Demanding and challenging environment ● Opportunity to coordinate projects ● And to participate in many meetings alone
  • 56. Things that I did not know and I do now ● Know-how about conferences and journals ● English skills ● Writing skills (papers and proposals) ● Presentation skills ● Self-motivation – Brick walls are there for the rest of people – Experience is what you get when you don't get what you want – Never give up – http://www.youtube.com/watch?v=ji5_MqicxSo
  • 57. Take away ● Laws of Software Evolution: controversy ● Statistical approach: replicable study ● Short memory dynamics ● ARIMA: accurate forecast ● Brick walls are a good thing ● Keep working. Don't give up