How History Justifies System Architecture (or Not)

Thomas Zimmermann, Researcher at Microsoft Research
1/12

International Workshop on Principles of Software Evolution · Helsinki, Finland, 1 September 2003




How History Justifies System Architecture (or not)
Thomas Zimmermann
(with Stephan Diehl and Andreas Zeller)
Lehrstuhl Softwaretechnik
Universität des Saarlandes, Saarbrücken, Germany
2/12
The Problem
Your task: extend the debug component in GCC!
You identify the variable xcoff_debug_hooks.
What else do you need to change?

General issue: only change coupled entities!
You can detect existing coupling by

 • Program Analysis—e.g. def-use associations.
 • Learning from History—entities changed together.
3/12
Evolutionary Coupling
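[Figure: evolutionary coupling graph for GCC. Bracketed numbers count the changes to each node: the files gcc/gcc/dbxout.c [134] and gcc/gcc/sdbout.c [74]; the variables dbx_debug_hooks [12], sdb_debug_hooks [12], and xcoff_debug_hooks [10]; the function sdb_global_decl() [4]; and the functions dbx_functions_end() and dbx_symbol_name(). Edge labels count joint changes: 34 between the two files, 12 between dbx_debug_hooks and sdb_debug_hooks, 10 between xcoff_debug_hooks and each of those two, and smaller values (4 and 2) for the couplings involving the functions.]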
Support: How much evidence (= simultaneous changes)?
Confidence: How relevant is coupling for participants?
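
One standard way to make the two measures precise, borrowed from association-rule mining (our notation, not the slides'): let $\mathcal{T}$ be the set of restored transactions, each transaction $T$ viewed as the set of entities it changed. For a coupling $A \Rightarrow B$:

$$\operatorname{supp}(A \Rightarrow B) = \bigl|\{\, T \in \mathcal{T} : A \in T \wedge B \in T \,\}\bigr|$$

$$\operatorname{conf}(A \Rightarrow B) = \frac{\operatorname{supp}(A \Rightarrow B)}{\bigl|\{\, T \in \mathcal{T} : A \in T \,\}\bigr|}$$

Support is symmetric in $A$ and $B$; confidence is not, because its denominator counts only the transactions that change $A$.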
4/12
What We Do
Our ROSE prototype analyzes the evolution recorded in CVS archives.


[Figure: CVS archive → ROSE (Reengineering Of Software Evolution), Step 1: Restore Transactions from CVS, Step 2: Identify Modified Entities → Couplings, Graphs, Metrics]


ROSE determines entities at different granularities:

coarse-grained entities: directories, modules, files
fine-grained entities: methods, variables, sections
5/12
Step 1: Restoring Transactions
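Two consecutive atomic changes $\delta_i$ and $\delta_{i+1}$ are part of one
transaction $\Delta = (\delta_1, \ldots, \delta_n)$ if:

$$\text{author}(\delta_i) = \text{author}(\delta_{i+1}) \;\wedge\; \text{log message}(\delta_i) = \text{log message}(\delta_{i+1}) \;\wedge\; \lvert \text{time}(\delta_{i+1}) - \text{time}(\delta_i) \rvert < 200\ \text{seconds}$$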

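We use a sliding window instead of a fixed one: the 200-second limit applies to the gap between consecutive changes, not to the duration of the whole transaction.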


GNU C Compiler (GCC):
The average transaction length is 6.2 seconds.
The maximum transaction length is 1 hour 32 minutes.
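
A minimal Java sketch of the transaction rule above, not ROSE's actual code. It assumes that changes are first grouped by author and log message and then ordered by time (our choice, so that consecutive changes of one commit end up next to each other). The class `TransactionRestorer`, its nested `Change` type, and all field names are hypothetical stand-ins for data read from a CVS log.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of Step 1: restore transactions with a sliding 200-second window. */
final class TransactionRestorer {

    /** One atomic change (one file revision); fields are assumed CVS log data. */
    static final class Change {
        final String file;
        final String author;
        final String logMessage;
        final long timeSeconds; // commit time in seconds

        Change(String file, String author, String logMessage, long timeSeconds) {
            this.file = file;
            this.author = author;
            this.logMessage = logMessage;
            this.timeSeconds = timeSeconds;
        }
    }

    /** Maximum gap between two consecutive changes of the same transaction. */
    static final long WINDOW_SECONDS = 200;

    static List<List<Change>> restore(List<Change> changes) {
        // Assumption: order by author, then log message, then time.
        List<Change> sorted = new ArrayList<>(changes);
        sorted.sort(Comparator.comparing((Change c) -> c.author)
                .thenComparing((Change c) -> c.logMessage)
                .thenComparingLong((Change c) -> c.timeSeconds));

        List<List<Change>> transactions = new ArrayList<>();
        List<Change> current = new ArrayList<>();
        for (Change c : sorted) {
            if (!current.isEmpty()) {
                Change prev = current.get(current.size() - 1);
                // Sliding window: compare with the previous change only, not the first one.
                boolean sameTransaction = prev.author.equals(c.author)
                        && prev.logMessage.equals(c.logMessage)
                        && Math.abs(c.timeSeconds - prev.timeSeconds) < WINDOW_SECONDS;
                if (!sameTransaction) {
                    transactions.add(current);
                    current = new ArrayList<>();
                }
            }
            current.add(c);
        }
        if (!current.isEmpty()) {
            transactions.add(current);
        }
        return transactions;
    }
}
```

For example, three changes by the same author with the same log message at 0, 150, and 300 seconds form one transaction even though it spans 300 seconds, because each gap is below 200 seconds. This sliding-window effect is what lets a single transaction grow as long as the 1 hour 32 minutes reported for GCC.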
6/12
Step 2: Light-Weight Analysis
      File: Animals.java

         1 class Cat {
         3   public String[] COLORS = {
        17     ...
        23   }
        25   public Cat() {
               ...
        30   }
             ...
        56 }
        58 class Dog {
        60   public String[] COLORS = {
               ...
        80   }
             ...
        99 }

      Step A: Map to Entities
         Class Cat     lines 1-56
         Cat.COLORS    lines 3-23
         Cat.Cat()     lines 25-30
         Class Dog     lines 58-99
         Dog.COLORS    lines 60-80

      Step B: Filter Entities


We analyze C/C++, Java, Python, TeX, and Texinfo files.
We obtain the modified methods, variables, and subsections.
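
The two steps can be sketched as follows, under stated assumptions: the line ranges from Step A are supplied as input (in practice they would come from a lightweight parser), and the modified line numbers are assumed to come from a diff between two revisions. `EntityFilter` and its members are hypothetical names. The sketch reports every entity whose range contains a modified line, including enclosing ones such as the class; whether ROSE keeps the enclosing entities is not stated on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of Step 2: map line ranges to entities, then filter. */
final class EntityFilter {

    /** Step A output: an entity and the inclusive line range it occupies in a file. */
    static final class Entity {
        final String name;
        final int firstLine;
        final int lastLine;

        Entity(String name, int firstLine, int lastLine) {
            this.name = name;
            this.firstLine = firstLine;
            this.lastLine = lastLine;
        }

        boolean contains(int line) {
            return firstLine <= line && line <= lastLine;
        }
    }

    /** Step B: keep the entities whose line range contains at least one modified line. */
    static List<String> modifiedEntities(List<Entity> entities, Set<Integer> modifiedLines) {
        List<String> result = new ArrayList<>();
        for (Entity e : entities) {
            for (int line : modifiedLines) {
                if (e.contains(line)) {
                    result.add(e.name);
                    break; // one modified line is enough for this entity
                }
            }
        }
        return result;
    }
}
```

With the ranges from Animals.java and a modification in line 17, `modifiedEntities` returns `Cat` and `Cat.COLORS`, while `Cat.Cat()` and both `Dog` entities are filtered out.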
7/12
Example: GCC

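[Figure: coupling graph for GCC across i386.c and i386.h. In i386.c, the cost tables i386_cost, i486_cost, pentium_cost, pentiumpro_cost, and k6_cost (each changed 11 to 14 times) are coupled with an edge labeled 11; an edge labeled 9 leads to the structure processor_cost [11] in i386.h.]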
8/12
Visualizing Coupling
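[Figure: coupling matrix over the entities A, B, C, and D. Legend: High Confidence, Low Confidence, No Coupling (No Support).]

Example: A changed in 10 transactions, C in 4, and both together in 3.
   A ⇒ C: Confidence 3/10 = 30%
   C ⇒ A: Confidence 3/4 = 75%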
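
The two confidence values can be reproduced with a small sketch, assuming each transaction is represented as the set of entity names it changed. `CouplingMeasures`, `support`, and `confidence` are hypothetical names, not ROSE's API.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: support and confidence of a coupling a => b. */
final class CouplingMeasures {

    /** Number of transactions that change both a and b. */
    static int support(List<Set<String>> transactions, String a, String b) {
        int n = 0;
        for (Set<String> t : transactions) {
            if (t.contains(a) && t.contains(b)) {
                n++;
            }
        }
        return n;
    }

    /** Fraction of the transactions changing a that also change b (0 if a never changed). */
    static double confidence(List<Set<String>> transactions, String a, String b) {
        int changesOfA = 0;
        for (Set<String> t : transactions) {
            if (t.contains(a)) {
                changesOfA++;
            }
        }
        return changesOfA == 0 ? 0.0 : (double) support(transactions, a, b) / changesOfA;
    }
}
```

With ten transactions containing A (three of which also contain C) and one further transaction containing only C, `confidence(t, "A", "C")` is 0.3 and `confidence(t, "C", "A")` is 0.75, matching the 30% and 75% above.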
9/12
Comparing Architecture with Evolution


[Figure: structure of the DDD directory ddd/ with its parts DDD Source, Libraries, Pics, Icons, Patches, and Tests, annotated "Bad architecture" and "Better architecture".]
10/12
Measuring Evolutionary Coupling
Evolutionary Coupling Index (ECI), computed at different levels: entity/file or file/directory.

$$\mathrm{ECI} = \frac{\#\,\text{external couplings}}{\#\,\text{internal couplings}}$$

The lower the ECI, the better the modularity.

        Project    File/Directory ECI    Entity/File ECI    Entity/File ECI (filtered)
        GCC              5.757                3.615               1.504
        DDD              0.250                4.462               1.922
        Apache           2.827               11.815               0.675
        OpenSSL          8.665              101.053               7.859

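Comparing only one level may be misleading: DDD has by far the lowest file/directory ECI (0.250) but a higher entity/file ECI than GCC (4.462 vs. 3.615).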
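
A sketch of the ECI computation, assuming that a coupling counts as internal when both coupled items lie in the same container (the same file at entity/file level, the same directory at file/directory level) and as external otherwise. How couplings are selected and what the filtered variant removes are not specified here, so the sketch takes the list of couplings as given. `EvolutionaryCouplingIndex`, `Pair`, and `eci` are hypothetical names.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: ECI = #external couplings / #internal couplings. */
final class EvolutionaryCouplingIndex {

    /** A coupling between two items (e.g. two functions, or two files). */
    static final class Pair {
        final String first;
        final String second;

        Pair(String first, String second) {
            this.first = first;
            this.second = second;
        }
    }

    /**
     * A coupling is internal when containerOf maps both items to the same container
     * (assumption: file at entity/file level, directory at file/directory level).
     */
    static double eci(List<Pair> couplings, Map<String, String> containerOf) {
        int internal = 0;
        int external = 0;
        for (Pair p : couplings) {
            String c1 = containerOf.get(p.first);
            String c2 = containerOf.get(p.second);
            if (c1 == null || c2 == null) {
                throw new IllegalArgumentException("no container for " + p.first + " or " + p.second);
            }
            if (c1.equals(c2)) {
                internal++;
            } else {
                external++;
            }
        }
        if (internal == 0) {
            // The ratio is undefined without internal couplings.
            return external == 0 ? Double.NaN : Double.POSITIVE_INFINITY;
        }
        return (double) external / internal;
    }
}
```

For instance, entities x and y in a.c and z in b.c with the couplings (x, y), (x, z), and (y, z) give one internal and two external couplings, so ECI = 2/1 = 2.0.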
11/12
Guiding the Programmer
12/12
Conclusion
Fine-grained evolutionary coupling...

 • detects coupling between non-program entities.
   e.g. coupling between a function and a database schema
 • guides developers while making changes.
   Programmers who changed this function also changed...
 • gives better(?) results than coarse-grained coupling.
   Coupling between files doesn’t tell you that much
 • can be compared with given coupling (= architecture).
   Results are mixed—what is coupling, anyway?
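
The "Programmers who changed this function also changed..." guidance can be sketched as a query over restored transactions: collect the entities that changed together with the one being edited and rank them by confidence. The thresholds `minSupport` and `minConfidence` and the class `ChangeAdvisor` are hypothetical; the slides do not give ROSE's ranking or thresholds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: suggest entities that usually change together with a given one. */
final class ChangeAdvisor {

    static final class Suggestion {
        final String entity;
        final int support;       // transactions that changed both entities
        final double confidence; // support / transactions that changed the queried entity

        Suggestion(String entity, int support, double confidence) {
            this.entity = entity;
            this.support = support;
            this.confidence = confidence;
        }
    }

    static List<Suggestion> suggest(List<Set<String>> transactions, String changed,
                                    int minSupport, double minConfidence) {
        int changesOfQuery = 0;
        Map<String, Integer> together = new HashMap<>();
        for (Set<String> t : transactions) {
            if (!t.contains(changed)) {
                continue;
            }
            changesOfQuery++;
            for (String other : t) {
                if (!other.equals(changed)) {
                    together.merge(other, 1, Integer::sum);
                }
            }
        }
        List<Suggestion> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : together.entrySet()) {
            int support = e.getValue();
            double confidence = (double) support / changesOfQuery;
            if (support >= minSupport && confidence >= minConfidence) {
                result.add(new Suggestion(e.getKey(), support, confidence));
            }
        }
        // Highest confidence first, then higher support, then by name for a stable order.
        result.sort(Comparator.comparingDouble((Suggestion s) -> s.confidence).reversed()
                .thenComparing(Comparator.comparingInt((Suggestion s) -> s.support).reversed())
                .thenComparing((Suggestion s) -> s.entity));
        return result;
    }
}
```

For transactions {A, B, C}, {A, B}, {A, D}, and {B, C}, asking about A with minSupport = 1 and minConfidence = 0.3 yields B (support 2, confidence 2/3), then C and D (support 1, confidence 1/3 each).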

Those who cannot learn from history are doomed to repeat it.
                    (George Santayana)