The slides discuss how analyzing a software system's evolution history through version control data can detect coupled entities, that is, entities that often change together. Such coupling can guide programmers making changes and help validate whether a system's actual architecture matches its intended architecture. The technique presented, evolutionary coupling, analyzes version control transactions to build graphs and metrics of coupled entities at both fine-grained and coarse-grained levels.
How History Justifies System Architecture (or Not)
1. 1/12
International Workshop on Principles of Software Evolution · Helsinki, Finland, 1 September 2003
Thomas Zimmermann
(with Stephan Diehl and Andreas Zeller)
Lehrstuhl Softwaretechnik
Universität des Saarlandes, Saarbrücken, Germany
3. 2/12
The Problem
Your task: extend the debug component in GCC!
You identify the variable xcoff_debug_hooks.
What else do you need to change?
General issue: only change coupled entities!
You can detect existing coupling by
• Program Analysis—e.g. def-use associations.
• Learning from History—entities changed together.
6. 3/12
Evolutionary Coupling
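[Figure: coupling graph for the GCC debug component. File nodes: gcc/gcc/dbxout.c [134] and gcc/gcc/sdbout.c [74], joined by an edge labeled 34. Entity nodes: dbx_debug_hooks [12], sdb_debug_hooks [12], xcoff_debug_hooks [10], sdb_global_decl() [4], dbx_functions_end(), dbx_symbol_name(). The further node counts [6] and [7] and the edge labels 12, 10, 4 and 2 appear in the figure, but their exact placement did not survive extraction.]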
Support: How much evidence (= simultaneous changes)?
Confidence: How relevant is coupling for participants?
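The two measures above can be sketched in a few lines. This is only an illustration under stated assumptions: the transactions are made up, the helper names `count_support` and `confidence` are hypothetical and not part of ROSE, and confidence is taken to be support divided by the number of changes to the left-hand entity, which agrees with the ratios on the "Visualizing Coupling" slide.

```python
from collections import Counter
from itertools import combinations


def count_support(transactions):
    """Return (changes per entity, simultaneous changes per unordered pair)."""
    entity_count = Counter()
    pair_support = Counter()
    for entities in transactions:
        unique = sorted(set(entities))
        entity_count.update(unique)
        # Every unordered pair changed in the same transaction gains one unit of support.
        pair_support.update(combinations(unique, 2))
    return entity_count, pair_support


def confidence(entity_count, pair_support, a, b):
    """Confidence of a => b: the share of a's changes that also changed b."""
    pair = tuple(sorted((a, b)))
    return pair_support[pair] / entity_count[a]


# Hypothetical transactions; each is the set of entities changed together.
transactions = [
    {"dbx_debug_hooks", "sdb_debug_hooks", "xcoff_debug_hooks"},
    {"dbx_debug_hooks", "sdb_debug_hooks"},
    {"dbx_debug_hooks"},
    {"xcoff_debug_hooks", "dbx_debug_hooks"},
]

entity_count, pair_support = count_support(transactions)
print(pair_support[("dbx_debug_hooks", "sdb_debug_hooks")])                         # 2
print(confidence(entity_count, pair_support, "sdb_debug_hooks", "dbx_debug_hooks"))  # 1.0
print(confidence(entity_count, pair_support, "dbx_debug_hooks", "sdb_debug_hooks"))  # 0.5
```

Support is symmetric, while confidence depends on the direction of the rule, so the same pair can be highly relevant for one participant and much less so for the other.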
7. 4/12
What We Do
Our ROSE prototype (Reengineering Of Software Evolution) analyzes the evolution of CVS archives.
[Diagram: ROSE reads a CVS archive and produces couplings, graphs, and metrics.]
Step 1: Restore Transactions from CVS
Step 2: Identify Modified Entities
ROSE determines entities at different granularities:
• coarse-granular entities: directories, modules, files
• fine-granular entities: methods, variables, sections
8. 5/12
Step 1: Restoring Transactions
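Two atomic changes $\delta_i$ and $\delta_{i+1}$ are part of one transaction $\Delta = (\delta_1, \ldots, \delta_n)$ if:
$$
\begin{aligned}
&\text{author}(\delta_i) = \text{author}(\delta_{i+1}) \;\wedge\\
&\text{log\_message}(\delta_i) = \text{log\_message}(\delta_{i+1}) \;\wedge\\
&|\text{time}(\delta_{i+1}) - \text{time}(\delta_i)| < 200 \text{ seconds}
\end{aligned}
$$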
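We use a sliding window instead of a fixed one: the 200-second limit applies between consecutive changes, so a whole transaction may last much longer than 200 seconds.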
GNU C Compiler (GCC):
The average transaction length is 6.2 seconds.
The maximal transaction length is 1 hour 32 minutes.
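The sliding-window rule from this slide might be sketched as follows. The `Change` record, the function name `restore_transactions`, and the sample changes (authors, messages, paths) are assumptions made for illustration; only the three conditions and the 200-second threshold come from the slide.

```python
from dataclasses import dataclass

MAX_GAP_SECONDS = 200  # threshold stated on the slide


@dataclass(frozen=True)
class Change:
    author: str
    log_message: str
    time: int  # seconds since an arbitrary epoch
    path: str


def restore_transactions(changes, max_gap=MAX_GAP_SECONDS):
    """Group atomic changes into transactions using a sliding time window."""
    transactions = []
    open_by_key = {}  # (author, log_message) -> transaction currently being extended
    for change in sorted(changes, key=lambda c: c.time):
        key = (change.author, change.log_message)
        current = open_by_key.get(key)
        # The gap is measured against the previous change, not the first one,
        # so the window slides along with the transaction.
        if current is not None and change.time - current[-1].time < max_gap:
            current.append(change)
        else:
            current = [change]
            transactions.append(current)
            open_by_key[key] = current
    return transactions


# Hypothetical atomic changes.
changes = [
    Change("alice", "fix debug hooks", 0, "gcc/dbxout.c"),
    Change("alice", "fix debug hooks", 150, "gcc/sdbout.c"),
    Change("bob", "update docs", 160, "gcc/doc/example.texi"),
    Change("alice", "fix debug hooks", 300, "gcc/dbxout.h"),
    Change("alice", "fix debug hooks", 600, "gcc/dbxout.c"),
]

transactions = restore_transactions(changes)
print([len(t) for t in transactions])  # [3, 1, 1]
```

The first transaction spans 300 seconds even though each gap is below 200 seconds, which is the behaviour a fixed window would not allow.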
11. 6/12
Step 2: Light-Weight Analysis
File: Animals.java                       Step A: Map to Entities
 1 class Cat {                           Class Cat, lines 1-56
 3   public String[] COLORS = {          Cat.COLORS, lines 3-23
17     ...
23   }
25   public Cat() {                      Cat.Cat(), lines 25-30
       ...
30   }
     ...
56 }
58 class Dog {                           Class Dog, lines 58-99
60   public String[] COLORS = {          Dog.COLORS, lines 60-80
       ...
80   }
     ...
99 }
Step B: Filter Entities
We analyze C/C++, Java, Python, TeX and Texinfo files.
We get the modified methods, variables and subsections.
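A minimal sketch of the line-to-entity mapping, using the line ranges shown for Animals.java. The function names are hypothetical, and the filter rule (keep only the innermost enclosing entity for each changed line) is an assumption, since the slide does not spell out how Step B filters.

```python
# Entity line ranges as shown on the Animals.java slide (inclusive).
ENTITIES = [
    ("Class Cat", 1, 56),
    ("Cat.COLORS", 3, 23),
    ("Cat.Cat()", 25, 30),
    ("Class Dog", 58, 99),
    ("Dog.COLORS", 60, 80),
]


def modified_entities(changed_lines, entities=ENTITIES):
    """Step A: names of all entities whose line range contains a changed line."""
    return [
        name
        for name, first, last in entities
        if any(first <= line <= last for line in changed_lines)
    ]


def finest_entities(changed_lines, entities=ENTITIES):
    """Assumed Step B: keep only the smallest enclosing entity per changed line."""
    result = []
    for line in changed_lines:
        enclosing = [
            (last - first, name)
            for name, first, last in entities
            if first <= line <= last
        ]
        if enclosing:
            name = min(enclosing)[1]  # the shortest range is the innermost entity
            if name not in result:
                result.append(name)
    return result


print(modified_entities([17]))        # ['Class Cat', 'Cat.COLORS']
print(finest_entities([17, 27, 57]))  # ['Cat.COLORS', 'Cat.Cat()']
```

Line 57 lies between the two classes and therefore maps to no entity at all.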
16. 8/12
Visualizing Coupling
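[Figure: coupling matrix over the entities A, B, C and D. Cell shading distinguishes high confidence, low confidence, and no coupling (no support).]
Example: A was changed 10 times, C was changed 4 times, and both were changed together 3 times.
A ⇒ C: Confidence 3/10 = 30%
C ⇒ A: Confidence 3/4 = 75%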
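The asymmetry in this example can be reproduced with a small confidence matrix. The counts for A and C are those on the slide; the function names and the 0.5 cut-off between "low" and "high" confidence are assumptions chosen only for illustration.

```python
def confidence_matrix(entity_count, pair_support):
    """Map each ordered pair (a, b) to the confidence of the rule a => b."""
    matrix = {}
    for a in entity_count:
        for b in entity_count:
            if a != b:
                support = pair_support.get(frozenset((a, b)), 0)
                matrix[(a, b)] = support / entity_count[a]
    return matrix


def classify(conf, high=0.5):
    """Bucket a confidence value; the 0.5 threshold is a hypothetical choice."""
    if conf == 0:
        return "none"
    return "high" if conf >= high else "low"


entity_count = {"A": 10, "C": 4}           # changes per entity, from the slide
pair_support = {frozenset(("A", "C")): 3}  # simultaneous changes, from the slide

matrix = confidence_matrix(entity_count, pair_support)
for (a, b), conf in sorted(matrix.items()):
    print(f"{a} => {b}: {conf:.0%} ({classify(conf)})")
# A => C: 30% (low)
# C => A: 75% (high)
```

Because the matrix is indexed by ordered pairs, it is not symmetric: each row answers "if this entity changes, how often does the other one change too?"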
22. 12/12
Conclusion
Fine-grained evolutionary coupling...
• detects coupling between non-program entities.
  e.g. coupling between a function and a database schema
• guides developers while making changes.
  "Programmers who changed this function also changed..."
• gives better(?) results than coarse-grained coupling.
  Coupling between files doesn't tell you that much.
• can be compared with given coupling (= architecture).
  Results are mixed: what is coupling, anyway?
Those who cannot learn from history are doomed to repeat it.
(George Santayana)