Measuring Disruption from Software Evolution Activities Using Graph-Based Metrics

Prashant Paymal, Rajvardhan Patil, Sanjukta Bhowmick, Harvey Siy

                  Department of Computer Science,
                  University of Nebraska at Omaha
Introduction
• Real world software systems have large numbers
  of components (e.g. classes, functions, etc.)

• It is difficult to get a quick summary of how a
  system evolved after a major change, such as a
  perfective maintenance activity or a new software
  release
Case Study
Version   Date         Commit Messages
  V1      3/9/2001     Merge to JHotDraw 5.2 (using JFC/Swing GUI components)
  V2      10/24/2001   Before merge for version 5.3 (dnd, undo…) merge dnd
                       (before 5.3)
  V3      8/4/2002     After various merges… (before 5.4 release)
  V4      11/8/2002    Refactor to use StandardStorageFormat as a superclass
  V5      5/8/2003     Refactoring of Cursor. – java.awt.Cursor(class) has been
                       systematically replaced
  V6      1/9/2004     After renaming the CH.ifa.draw to org.jhotdraw


• Our case study consists of six versions of JHotDraw from
  March 2001 to January 2004
Network Construction
• Extracted relationships from these versions
 (inheritance, implementation, method calls and class member
 access, object declaration and instantiation)


• Network was created by connecting class
  dependencies, where each edge (u, v) is a
  dependency from class 'u' to class 'v'
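
The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' extraction tooling: the class names and relationship rows are made up, and the choices to collapse multiple relationships between the same pair of classes into one edge and to ignore self-dependencies are assumptions.

```python
# Illustrative (made-up) relationships: (source class, target class, kind).
RELATIONSHIPS = [
    ("ClassB", "ClassA", "inheritance"),
    ("ClassB", "InterfaceI", "implementation"),
    ("ClassC", "ClassB", "method call"),
    ("ClassC", "ClassB", "instantiation"),  # second dependency on the same class
    ("ClassC", "ClassA", "member access"),
]

def build_network(relationships):
    """Return {u: set of v}, with one directed edge (u, v) per dependent pair."""
    successors = {}
    for source, target, _kind in relationships:
        # Register both endpoints so classes without outgoing edges are kept.
        successors.setdefault(source, set())
        successors.setdefault(target, set())
        if source != target:  # assumption: self-dependencies are ignored
            successors[source].add(target)
    return successors

def degrees(successors):
    """Return (indegree, outdegree) dictionaries for every vertex."""
    indegree = {v: 0 for v in successors}
    for targets in successors.values():
        for v in targets:
            indegree[v] += 1
    outdegree = {u: len(targets) for u, targets in successors.items()}
    return indegree, outdegree

network = build_network(RELATIONSHIPS)
indegree, outdegree = degrees(network)
```

Here `network["ClassC"]` holds the two classes that ClassC depends on, and `indegree` counts how many classes depend on each vertex.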
Vertex Properties
• Degree Distribution
  ▫ Number of vertices at each degree; scale-free
    for most real-world networks

• Clustering Coefficient
  ▫ How densely a vertex's neighbors are connected to one another

• Betweenness Centrality
  ▫ Fraction of shortest paths between other
    vertices that pass through a vertex

• Articulation Points
  ▫ A vertex whose removal would cause the network to
    become disconnected
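
As a rough illustration of these four properties, the sketch below computes them on a small made-up network. It assumes the network is treated as undirected, reports betweenness as a raw count of shortest paths rather than a normalized ratio, uses Brandes' algorithm for betweenness, and finds articulation points by brute-force removal. None of these choices is stated in the slides.

```python
from collections import Counter, deque

# Toy undirected network (made up): a triangle a-b-c with a tail c-d-e.
ADJ = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

def degree_distribution(adj):
    """Map each degree to the number of vertices that have it."""
    return Counter(len(nbrs) for nbrs in adj.values())

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes, unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies from the farthest vertex back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

def count_components(adj, skip=None):
    """Number of connected components, optionally with one vertex removed."""
    seen = set() if skip is None else {skip}
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return components

def articulation_points(adj):
    """Vertices whose removal increases the number of components."""
    base = count_components(adj)
    return {v for v in adj if count_components(adj, skip=v) > base}

distribution = degree_distribution(ADJ)
clustering = {v: clustering_coefficient(ADJ, v) for v in ADJ}
centrality = betweenness(ADJ)
cut_vertices = articulation_points(ADJ)
```

The brute-force articulation check is quadratic and is used only because it is easy to verify. A depth-first search with low-link values would be the usual choice for larger networks.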
• Network representing Version 1:
  ▫ Lighter Nodes: High Betweenness Centrality
  ▫ Larger Nodes: High Clustering Coefficient
Objective
• Extract key combinatorial properties from these
  six networks that would enable us to detect
  evolutionary characteristics such as

 ▫ Points of significant change in the software

 ▫ How these changes affect crucial classes in the
   network
Change in Vertex Properties
• All properties increased with version number
Correlation Between Properties




 ▫ Positive correlation between degree and betweenness centrality
 ▫ Correlation between clustering coefficient and betweenness
   centrality changes across versions
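
The slides do not say which correlation measure was used. As one possible sketch, the Pearson correlation between two per-vertex properties can be computed as follows; the degree and betweenness values here are illustrative only and are not taken from JHotDraw.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

def property_correlation(prop_a, prop_b):
    """Correlate two {vertex: value} maps over the vertices they share."""
    common = sorted(set(prop_a) & set(prop_b))
    return pearson([prop_a[v] for v in common], [prop_b[v] for v in common])

# Illustrative per-vertex values for one version (made up).
degree = {"a": 2, "b": 2, "c": 3, "d": 2, "e": 1}
betweenness = {"a": 0.0, "b": 0.0, "c": 4.0, "d": 3.0, "e": 0.0}

r = property_correlation(degree, betweenness)
```

Repeating this per version gives one coefficient per property pair per version, which is the kind of series the bullets above describe.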
Disruption in Values and Rank
• We examine how the relationships between
  these properties changed from one version to the
  next
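
The slides do not define how disruption in rank was quantified. One plausible sketch, offered purely as an assumption, ranks the vertices common to two consecutive versions by a property and reports how far each vertex moved. The function names and sample values are hypothetical.

```python
def ranks(values):
    """Map each vertex to its rank (1 = largest value); ties broken by name."""
    ordered = sorted(values, key=lambda v: (-values[v], v))
    return {v: i + 1 for i, v in enumerate(ordered)}

def rank_disruption(old_values, new_values):
    """Return (mean absolute rank shift, per-vertex shifts) for shared vertices."""
    common = set(old_values) & set(new_values)
    old_rank = ranks({v: old_values[v] for v in common})
    new_rank = ranks({v: new_values[v] for v in common})
    shifts = {v: abs(old_rank[v] - new_rank[v]) for v in common}
    mean_shift = sum(shifts.values()) / len(shifts) if shifts else 0.0
    return mean_shift, shifts

# Made-up property values for two consecutive versions.
version_old = {"A": 10, "B": 5, "C": 1}
version_new = {"A": 4, "B": 9, "C": 1, "D": 7}  # D is new and is ignored

mean_shift, shifts = rank_disruption(version_old, version_new)
```

Restricting the comparison to shared vertices keeps newly added classes from shifting every rank, but it is only one of several reasonable design choices.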
Identifying Crucial Vertices
• High
  ▫ A vertex ranks high (within the top 25) in at least one of the
    categories listed below
• Extra High
  ▫ A vertex ranks high in at least two categories
• Low
  ▫ A vertex has a zero value for any one of the vertex-based
    properties and is not marked as a High vertex
• Extra Low
  ▫ A vertex has a zero value for both betweenness centrality and
    clustering coefficient

• Categories: High Betweenness Centrality, High Indegree, High Outdegree,
  High Clustering Coefficient / Articulation Point
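
These rules can be sketched as a small classifier. Several details are assumptions that the slides do not settle: zero values never count as a high rank, High and Extra High take precedence over the Low labels, articulation points are left out of the ranking, and unmatched vertices are labeled Other. The toy data uses a cutoff of 1 instead of 25 because it has only six vertices.

```python
def top_k(values, k):
    """Vertices with the k largest nonzero values (ties broken by name)."""
    nonzero = [v for v in values if values[v] > 0]
    return set(sorted(nonzero, key=lambda v: (-values[v], v))[:k])

def classify(properties, k=25):
    """Label every vertex; properties maps a property name to {vertex: value}.

    The 'betweenness' and 'clustering' keys are required for Extra Low.
    """
    vertices = set()
    for values in properties.values():
        vertices |= set(values)
    high_sets = [top_k(values, k) for values in properties.values()]
    labels = {}
    for v in vertices:
        high_count = sum(v in s for s in high_sets)
        zero_count = sum(values.get(v, 0) == 0 for values in properties.values())
        both_zero = (properties["betweenness"].get(v, 0) == 0
                     and properties["clustering"].get(v, 0) == 0)
        if high_count >= 2:
            labels[v] = "Extra High"
        elif high_count == 1:
            labels[v] = "High"
        elif both_zero:
            labels[v] = "Extra Low"
        elif zero_count >= 1:
            labels[v] = "Low"
        else:
            labels[v] = "Other"
    return labels

# Made-up property values for six vertices.
PROPERTIES = {
    "betweenness": {"A": 4, "B": 0, "C": 0, "D": 1, "E": 2, "F": 0},
    "indegree":    {"A": 3, "B": 0, "C": 1, "D": 1, "E": 2, "F": 1},
    "outdegree":   {"A": 1, "B": 2, "C": 0, "D": 1, "E": 1, "F": 1},
    "clustering":  {"A": 0.2, "B": 0.0, "C": 0.0, "D": 0.5, "E": 0.3, "F": 0.1},
}

labels = classify(PROPERTIES, k=1)
```

Counting the labels per version and dividing by the number of vertices would give a percentage breakdown of each version.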
Percentage Breakdown of All Vertices
in Each Version
Percentage Breakdown of Vertices
(Common to All Versions)
  Legend: Other, Extra Low, Low, Extra High, High
Analysis of Newly Added Vertices
Bug Frequencies




• Counted changes that have the keywords “bug fix” in the change log

• The periods with a high percentage of bug fixes are also the periods
  that follow high disruption
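
A keyword count of this kind could look like the sketch below. The log messages and period labels are invented for illustration, and matching the phrase "bug fix" case-insensitively is an assumption about how the keywords were applied.

```python
def bug_fix_percentage(messages, keywords=("bug fix",)):
    """Percentage of change-log messages that contain any keyword."""
    if not messages:
        return 0.0
    hits = sum(any(k in m.lower() for k in keywords) for m in messages)
    return 100.0 * hits / len(messages)

# Invented change-log messages grouped by period between versions.
LOGS = {
    "V2-V3": ["Bug fix for undo", "Add dnd support",
              "bug fix: null check in tool", "Update docs"],
    "V3-V4": ["Refactor storage", "Minor cleanup"],
}

percentages = {period: bug_fix_percentage(msgs) for period, msgs in LOGS.items()}
```

Comparing `percentages` against a per-period disruption score is one way to check whether high bug-fix periods follow high disruption.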
Conclusion
• Significant evolutionary changes occur between
  Versions 2 and 3 and between Versions 4 and 5

• The network has grown cumulatively. Newer vertices
  tend to get added to the peripheries of the network

• The top 25 ranking of vertices was generally stable
  across versions. Important nodes stay important. This
  indicates stability in the design.

• The bug frequency is higher after Version 3 and Version
  5. The degree of disruption may help explain why bug
  incidence increases (future work)
Acknowledgement
• Nebraska EPSCoR

• College of IS&T, University of Nebraska at
  Omaha
Thank you!
