Signature Change Analysis


  1. Properties of Signature Change Patterns
     Sung Kim, Jim Whitehead, Jennifer Bevan
     University of California, Santa Cruz
  2. Motivation
     • In languages like C, functions are the primary abstraction for behavior.
       • Signatures are the interface to these abstractions.
     • Changes to signatures over a project's lifetime provide insight into the evolution of behavioral abstractions.
       • Frequent kinds of changes indicate evolutionary stresses.
       • Counts of signature changes give insight into the stability of a project's modular decomposition.
       • Patterns in signature changes might indicate patterns in changes to functional abstractions.
  3. Nature of Contribution: Characterization
     • Main contribution: characterization of signature change properties
       • Frequency distributions
       • Observable patterns
       • Correlations with other project aspects
     • That is, we examine the revision histories of 7 projects.
       • Extract interesting facts about signature changes
       • Collate and describe them
  4. No hypotheses… yet
     • Empirical software engineering work tends to explore hypotheses.
     • We believe it is useful to characterize important properties of evolving systems, at fine grain.
       • Builds understanding of evolutionary phenomena
       • Can view this as pre-hypothesis stage information gathering
       • Can use observations to develop models of evolutionary change
     • In some cases, these observations do not exist in the literature.
       • Signature changes are one example.
       • We found no studies of fine-grain signature changes.
  5. Analyzed Projects (all in C)

     | Project | Period | LOC | Commits |
     |---|---|---|---|
     | Apache 1.3 (A1.3) | Jan 1996-Mar 2005 | 116,398 | 7,747 |
     | Apache 2.0 (A2) | Jul 1999-Aug 2003 | 104,417 | 3,877 |
     | Apache Portable Runtime (APR) | Jan 1999-Feb 2005 | 72,630 | 5,990 |
     | APR Utility (APU) | Jul 1999-Feb 2005 | 33,600 | 1,353 |
     | CVS | Dec 1994-Sep 2003 | 62,415 | 2,873 |
     | GCC | Nov 1988-Feb 1993 | 298,236 | 3,012 |
     | Subversion (SVN) | Aug 2001-Mar 2005 | 185,006 | 6,029 |

  6. Per-Project Analysis Process
     [Process diagram; step labels recovered from the slide]
     • Extract: automated transaction extraction
     • Compute: extract function signatures
     • Process: reconstruct change history of functions across name changes
     • Save: persist signature data to database
     • Analyze: query DB to observe signature properties
     • Components named in the diagram: SCM repository, filesystem, Kenyon, origin analysis, relational database, analysis software
  7. How often do signatures change?
  8. Distribution of Signature Changes
     • A small number of functions experience the majority of changes.
       • Many functions never change their signatures.
       • The slide's chart shows the Subversion project (Zipf distribution).
  9. Distribution of Signature Changes
     • Across all projects, most functions never change their signatures, and almost all have fewer than three changes.

     | Project | % never changed | % changed fewer than 3 times |
     |---|---|---|
     | Apache 1.3 | 56 | 96 |
     | Apache 2 | 57 | 90 |
     | APR | 49 | 86 |
     | APU | 67 | 96 |
     | CVS | 61 | 97 |
     | GCC | 92 | 99 |
     | Subversion | 58 | 94 |

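
The two percentages reported for each project can be derived from per-function signature change counts. The following is a minimal sketch, not the authors' analysis software. It assumes a hypothetical dictionary mapping function names to the number of signature changes observed, and it assumes that "fewer than 3 times" includes functions that never changed. The example data is invented for illustration.

```python
def signature_change_distribution(change_counts):
    """Return (% never changed, % changed fewer than 3 times).

    change_counts: hypothetical mapping of function name -> number of
    signature changes seen in the project's history.
    """
    total = len(change_counts)
    if total == 0:
        raise ValueError("no functions to summarize")
    never = sum(1 for n in change_counts.values() if n == 0)
    # Assumption: "fewer than 3" also counts functions with zero changes.
    under_three = sum(1 for n in change_counts.values() if n < 3)
    return 100.0 * never / total, 100.0 * under_three / total


# Invented example data, not taken from any studied project.
example_counts = {"open_file": 0, "read_block": 0, "parse_args": 1, "do_request": 5}
pct_never, pct_under_three = signature_change_distribution(example_counts)
```

In this invented example, half of the functions never change and one function accounts for most of the changes, which mirrors the skew the slides describe.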
  10. Function body changes and signature changes
     • How often is the body of a function changed before its signature changes?
     • A measure of the dependence between these kinds of changes
     • Significant variation per project
       • CVS and GCC are high, since they have relatively few signature changes.

     | Project | # of body changes before signature change |
     |---|---|
     | Apache 1.3 | 7.6 |
     | Apache 2 | 5.95 |
     | APR | 3.5 |
     | APU | 6.3 |
     | CVS | 14.9 |
     | GCC | 13.2 |
     | SVN | 6.2 |

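
One plausible reading of this measure is the average number of body changes that precede each signature change in a function's history. The sketch below works under that assumption; the slides do not give the exact definition. The event encoding ("B" for a body change, "S" for a signature change) and the example history are hypothetical.

```python
def body_changes_before_signature_change(history):
    """Average count of body changes ('B') preceding each signature change ('S').

    history: chronological list of 'B' and 'S' events for one function
    (a hypothetical encoding chosen for this sketch).
    Returns None when the function never changed its signature.
    """
    runs = []
    pending = 0
    for event in history:
        if event == "B":
            pending += 1
        elif event == "S":
            runs.append(pending)
            pending = 0  # start counting toward the next signature change
        else:
            raise ValueError(f"unknown event: {event!r}")
    if not runs:
        return None
    return sum(runs) / len(runs)


# Invented history: three body changes, a signature change, one more body
# change, then another signature change.
example_history = ["B", "B", "B", "S", "B", "S"]
average_body_changes = body_changes_before_signature_change(example_history)
```

Body changes after the last signature change are ignored here, which is a design choice of the sketch.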
  11. How does the number of parameters per function change over time? Does this show evidence of code decay?
  12. Function Parameter List Lengths
     • The percentage of functions with a given parameter list length (most recent revision)
     • 1, 2, and 3 parameters are most common.
  13. Number of Parameters Over Time
  14. Parameter Lists Growing or Shrinking
     • Apache 2 project, functions that had changes to their parameter list length
     • Each arrow is a function.
     • An up (down) arrow is growth (shrinkage) in parameter list length.
  15. Signature Changes as Code Decay Measure?
     • Mixed evidence for signature changes as a way to demonstrate code decay.
       • Average number of parameters per project does not always grow.
       • Generally stays fairly steady, with a slight upward trend
       • Perhaps swamped by the large number of functions that never change
     • Of functions that do change, the trend is towards increased parameters at longer parameter lengths.
       • Indicates functions are being asked to do more
       • But smaller parameter lengths have a downward trend.
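
The growth and shrinkage comparison can be tallied by starting parameter list length. This is a small sketch under stated assumptions: the input is a hypothetical list of (old length, new length) pairs, one per parameter list change, and the data is invented rather than taken from Apache 2.

```python
from collections import defaultdict


def growth_by_starting_length(length_changes):
    """Tally how often parameter lists grew or shrank, keyed by old length.

    length_changes: iterable of (old_len, new_len) pairs, a hypothetical
    input format chosen for this sketch.
    """
    tally = defaultdict(lambda: {"grew": 0, "shrank": 0})
    for old_len, new_len in length_changes:
        if new_len > old_len:
            tally[old_len]["grew"] += 1
        elif new_len < old_len:
            tally[old_len]["shrank"] += 1
        # Equal lengths are not length changes and are skipped.
    return dict(tally)


# Invented data: short lists mostly shrink, long lists mostly grow.
example_changes = [(2, 1), (2, 1), (2, 3), (5, 6), (5, 7), (5, 4)]
growth_tally = growth_by_starting_length(example_changes)
```

A tally like this makes it possible to compare trends at short and long parameter lengths separately, instead of relying on a project-wide average.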
  16. What kinds of signature changes are most common?
  17. Signature Change Taxonomy
     • Characterize signature changes based on their impact on data flow between caller and callee:
       • Data flow invariant
         • Signature change does not affect flow of data
         • Examples: name changes, modifier changes
       • Data flow increasing
         • Examples: addition of new parameters or return type
       • Data flow decreasing
         • Examples: deleting parameter
  18. Signature Change Kinds
     • Data Flow Invariant
       • Function name change
       • Parameter ordering change only
       • Parameter name change
       • Parameter modifier change
       • Concept merge/splitting change (*)
       • Array/pointer operation change
       • Return type change
       • Primitive type change
       • Complex type (structure) name change
       • (*) Manual detection only
  19. Signature Change Kinds (cont'd)
     • Data Flow Increasing
       • Parameter addition
       • Return type addition
       • Complex type inner variable addition (*)
     • Data Flow Decreasing
       • Parameter deletion
       • Return type deletion
       • Complex inner variable deletion (*)
       • (*) Manual detection only
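
To make the taxonomy concrete, the sketch below classifies a change between two versions of a C function signature into a few of the kinds listed in the slides. It is an illustration only, not the authors' detector. The `Signature` class, `classify_change`, and the example functions are hypothetical, and only a subset of kinds is covered. In particular, a parameter whose type or name changes is reported here as a deletion plus an addition, which a real tool would need to distinguish.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical, simplified model of a C function signature."""
    name: str
    return_type: str
    params: tuple  # tuple of (type, name) pairs, in declaration order


# Data flow impact of each kind handled below, as grouped in the slides.
DATA_FLOW_IMPACT = {
    "function name change": "invariant",
    "parameter ordering change": "invariant",
    "parameter addition": "increasing",
    "return type addition": "increasing",
    "parameter deletion": "decreasing",
    "return type deletion": "decreasing",
}


def classify_change(old, new):
    """Return the set of change kinds between two signature versions."""
    kinds = set()
    if old.name != new.name:
        kinds.add("function name change")
    old_params, new_params = set(old.params), set(new.params)
    if new_params - old_params:
        kinds.add("parameter addition")
    if old_params - new_params:
        kinds.add("parameter deletion")
    # Same parameters in a different order: ordering change only.
    if old_params == new_params and old.params != new.params:
        kinds.add("parameter ordering change")
    # Assumption: going from void to non-void counts as a return type addition.
    if old.return_type == "void" and new.return_type != "void":
        kinds.add("return type addition")
    if old.return_type != "void" and new.return_type == "void":
        kinds.add("return type deletion")
    return kinds


# void read_block(int fd, char *buf)  ->  int read_block(int fd, char *buf, size_t len)
old_sig = Signature("read_block", "void", (("int", "fd"), ("char *", "buf")))
new_sig = Signature("read_block", "int",
                    (("int", "fd"), ("char *", "buf"), ("size_t", "len")))
change_kinds = classify_change(old_sig, new_sig)
```

A single change event can yield more than one kind, as it does in this example, where both kinds increase data flow.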
  20. Signature Change Frequencies
     • Frequency is the number of changes of a specific kind divided by the total number of signature changes for a project.
     • A single signature change event can include multiple signature change kinds.
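
The frequency definition can be sketched as follows, assuming that "total number of signature changes" means the number of change events and that each event carries a set of kinds. Because one event can include several kinds, the frequencies can sum to more than 1. The event data is invented for illustration.

```python
from collections import Counter


def change_kind_frequencies(events):
    """Frequency of each change kind over the total number of change events.

    events: list of sets of change kind labels, one set per signature
    change event (a hypothetical representation for this sketch).
    """
    if not events:
        raise ValueError("no signature change events")
    # Count each kind at most once per event.
    counts = Counter(kind for kinds in events for kind in set(kinds))
    total = len(events)
    return {kind: n / total for kind, n in counts.items()}


# Invented events; the second one includes two kinds at once.
example_events = [
    {"parameter addition"},
    {"parameter addition", "complex type change"},
    {"parameter deletion"},
    {"complex type change"},
]
frequencies = change_kind_frequencies(example_events)
```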
  21. Do the frequencies of signature change kinds evolve over the history of a project? Are there project-specific patterns to the frequency and evolution of signature change kinds?
  22. Signature Change Frequencies Over Time - Subversion
  23. Signature Change Frequencies Over Time - Apache 1.3
  24. Signature Change Frequencies Over Time
     • Subversion and Apache 1.3 exhibit different frequencies of ordering changes and modifier changes.
     • They also show different trajectories for addition and deletion changes.
     • There appear to be distinct per-project patterns of signature change evolution.
  25. Do signature changes occur in regular sequences?
  26. Change Kind Sequences (APR)
     • A - Addition, D - Deletion, O - Ordering, C - Complex Type Change
     • For the APR project, complex type changes very commonly occur immediately after another complex type change.
  27. Change Kind Sequences (Apache 2)
     • A - Addition, D - Deletion, O - Ordering, C - Complex Type Change
     • Complex type changes are still very common, but not as prevalent as for APR.
     • Each project has distinct change sequence probabilities.
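
Change sequence probabilities of this kind can be estimated by counting which change kind immediately follows another in each function's history. The sketch below assumes each history is a string over the letters A, D, O, and C from the legend in the slides. The sequences are invented and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next kind | previous kind) from change kind sequences.

    sequences: iterable of strings over 'A', 'D', 'O', 'C', one string per
    function, in chronological order (a hypothetical encoding).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each change with the change that immediately follows it.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for prev, following in counts.items()
    }


# Invented histories for three functions.
example_sequences = ["CCCA", "ACD", "CC"]
probabilities = transition_probabilities(example_sequences)
```

In this invented data, a complex type change is most often followed by another complex type change, the same shape of pattern the slides report for APR.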
  28. Threats to Validity
     • Limited data set
       • Only 7 open source projects
       • 4 of the projects are from the same project family (Apache)
       • Commercial projects might have different qualities
       • A larger dataset would be desirable
     • Only analyzed C language projects
       • Other programming languages may differ
       • OO languages especially might be different, due to method overloading
     • Some transactions had uncompilable code
       • Best-effort signature extraction in this case might not be correct
       • But only a small number of transactions are uncompilable
     • Ignored #ifdef
       • Total number of signatures is larger as a result
  29. Summary
  30. Properties of Signature Changes
     • The most common signature change kinds are, in order: complex data type, parameter addition, parameter ordering, parameter deletion.
     • Over half of all function signatures never change. 90% of functions change their signature fewer than 3 times.
     • For every 5 to 15 function body changes, there is a signature change.
     • The average number of parameters stays relatively constant over time.
  31. Properties of Signature Changes (cont'd)
     • Functions typically have parameter lists with 1, 2, or 3 parameters.
     • There are weak correlations between signature changes and other changes, such as LOC and function body changes.
     • Each project has its own signature change patterns, and these patterns can be discovered after analyzing the first 1000 to 1500 transactions.
     • Probability of a change kind depends on previous changes.
     • Changes that modify a function signature may be slightly more likely to introduce a bug.
  32. Questions?
