Static and Adaptive Bug Fix Patterns


  1. Static and Adaptive Bug Fix Patterns
     Jim Whitehead, Sung Kim, Kai Pan
     University of California, Santa Cruz
  2. Bug and Bug Fix Patterns?
     - Are bugs and bug fixes random in their goal and structure, or do they exhibit patterns?
     - We know there are some patterns, since existing pattern-oriented static analysis tools are able to detect some bugs
     - Hypothesis: there are both project-specific and project-independent patterns that are detectable in bugs and bug fixes
  3. Static and Adaptive Bug Fix Patterns
     - Static: syntax-driven change patterns
       - Example: changing an if condition expression
       - Found by statically analyzing code to detect conformance to a pattern
       - Horizontal: the same pattern can be found in multiple projects
     - Adaptive: memory-driven change patterns
       - Example: frequent string literal changes
       - Found by detecting a previous similar bug fix in a project-specific bug fix database, or “memory”
       - Vertical: each pattern is specific to a given project
  4. Promise of Bug and Fix Patterns
     - If bugs exhibit detectable patterns, it would be possible to detect bugs automatically
     - If there are common bug-to-fix mappings, it would be possible to recommend a fix for a detected bug
     - Using bug fix patterns, it would be possible to see the frequency distribution of the patterns
       - Useful for understanding which kinds of patterns occur most frequently
     - Broadly, such patterns would contribute to an improved understanding of maintenance activity
  5. Talk Overview
     - Terminology and Detection of Bug Fix Changes
     - Static Bug Fix Patterns
     - Adaptive Bug Fix Patterns
     - Conclusions
  6. Retrieving Bug Fix Changes
     - Software projects today record their development history using Software Configuration Management (SCM) tools
     - As developers make changes, they record a reason along with each change
       - In the change log message
     - When developers fix a bug, they tend to write log messages containing some variation of the words “fixed” or “bug”
       - “Fixed null pointer bug”
     - It is possible to mine the change history of a software project to uncover these bug-fix changes
     - That is, we retrospectively recover those changes that developers have marked as containing a bug fix
       - We assume they are not lying
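The keyword heuristic above can be sketched as a simple log message filter. This is a minimal sketch, assuming a hypothetical `BugFixLogFilter` class and an illustrative keyword set (fix, fixes, fixed, fixing, bug, bugs) matched as whole words; the exact keywords and matching rules used in the study are not given on the slide.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Sketch: flag a change as a bug fix if its log message mentions fix/bug keywords.
// The keyword set and whole-word matching are assumptions for illustration.
class BugFixLogFilter {
    // Whole words only, so "debugging" or "prefix" do not match.
    private static final Pattern FIX_KEYWORDS =
            Pattern.compile("\\b(fix(e[sd])?|fixing|bugs?)\\b");

    static boolean isBugFix(String logMessage) {
        return FIX_KEYWORDS.matcher(logMessage.toLowerCase(Locale.ROOT)).find();
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
                "Fixed null pointer bug",
                "Bug #567 fixed",
                "Added feature X",
                "Modified button text");
        for (String log : logs) {
            System.out.println(isBugFix(log) + "  " + log);
        }
    }
}
```

Whole-word matching trades some recall (e.g., "bugfix" written as one word is missed) for fewer false matches on words that merely contain "bug" or "fix".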
  7. Bug-introducing and bug-fix changes
     [Figure: development history timeline. A software change introduces the bug (the “bug-introducing” change); later, Bug #567 is entered into the issue tracking system (bug finally observed and recorded); a subsequent change with the SCM log message “Bug #567 fixed” is the “bug fix”.]
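  8. Commits, Transactions & Configurations
     [Figure: individual CVS file commits, each with a log message (e.g., “Added feature X”, “Fixed null ptr bug”, “Modified button text”, “Added feature Y”), are grouped into transactions; successive transactions yield configurations.]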
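CVS records commits per file, so file commits have to be grouped into transactions before analysis. The slide does not say how this is done; the sketch below uses a common sliding-window heuristic as an assumption: commits with the same author and log message, each within a hypothetical `WINDOW_SECONDS` of the previous commit in the group, form one transaction. `TransactionGrouper` and `FileCommit` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: reconstruct transactions from per-file CVS commits.
// Assumed heuristic (not from the slides): same author, same log message, and each
// commit within WINDOW_SECONDS of the previous commit already in the transaction.
class TransactionGrouper {
    record FileCommit(String file, String author, String logMessage, long timeSeconds) {}

    static final long WINDOW_SECONDS = 200; // hypothetical window size

    static List<List<FileCommit>> group(List<FileCommit> commits) {
        List<FileCommit> sorted = new ArrayList<>(commits);
        sorted.sort(Comparator.comparingLong(FileCommit::timeSeconds));
        List<List<FileCommit>> transactions = new ArrayList<>();
        for (FileCommit c : sorted) {
            List<FileCommit> target = null;
            for (List<FileCommit> t : transactions) {
                FileCommit last = t.get(t.size() - 1);
                if (last.author().equals(c.author())
                        && last.logMessage().equals(c.logMessage())
                        && c.timeSeconds() - last.timeSeconds() <= WINDOW_SECONDS) {
                    target = t; // extends an open transaction
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>(); // starts a new transaction
                transactions.add(target);
            }
            target.add(c);
        }
        return transactions;
    }
}
```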
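  9. Hunks and Hunk Pairs
     [Figure: a change from revision n-1 (which has the bug hunks) to revision n (which has the fix hunks). Each hunk pair matches a deleted hunk with an added hunk; the hunk pair type is a modification (both hunks present), an addition (empty deleted hunk), or a deletion (empty added hunk).]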
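The three hunk pair types can be expressed as a small classification over the two sides of a pair. A minimal sketch, assuming a hunk is represented as a list of source lines; `HunkPairs` and `HunkPairType` are hypothetical names.

```java
import java.util.List;

// Sketch: classify a hunk pair by which side is empty. The deleted hunk comes from
// revision n-1 (bug side), the added hunk from revision n (fix side).
class HunkPairs {
    enum HunkPairType { MODIFICATION, ADDITION, DELETION }

    static HunkPairType classify(List<String> deletedHunk, List<String> addedHunk) {
        if (deletedHunk.isEmpty() && addedHunk.isEmpty()) {
            throw new IllegalArgumentException("a hunk pair needs at least one non-empty hunk");
        }
        if (deletedHunk.isEmpty()) {
            return HunkPairType.ADDITION;     // empty deleted hunk: code was only added
        }
        if (addedHunk.isEmpty()) {
            return HunkPairType.DELETION;     // empty added hunk: code was only removed
        }
        return HunkPairType.MODIFICATION;     // both sides contain code
    }
}
```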
  10. Kenyon Processing
      [Figure: Kenyon processing stages. Extract: automated configuration extraction from the SCM repository into the filesystem. Compute: fact extraction (metrics, static analysis). Save: persist gathered metrics and facts in the Kenyon repository (RDBMS/Hibernate). Analyze: analysis software (e.g., IVA) queries the database and adds new facts.]
  11. Static Bug Fix Patterns
  12. Static Bug Patterns
      - Performed manual analysis of bug fix hunk pairs in Java programs
        - Examined bug hunks and corresponding fix hunks
        - Looked for syntax patterns of recurring changes
        - Identified 27 static bug fix patterns in Java code
  13. Example Pattern
      - Method Call with Different Actual Parameter Values (MC-DAP)
        - The bug fix changes the expression passed into one or more parameters of a method call:
            - tree.putClientProperty("JTree.lineStyle", "Horizontal");
            + tree.putClientProperty("JTree.lineStyle", "Angled");
        - "-" marks the bug revision, "+" the fix revision
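An MC-DAP detector compares the call in the bug hunk with the call in the fix hunk. The sketch below is a simplification that works on single-line strings rather than syntax trees; `McDapDetector` and its parsing rules (one call per line, no parentheses in the call target, no escaped quotes in string literals) are assumptions for illustration, not the detector used in the study.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an MC-DAP check on a one-line bug/fix pair: same call target, same
// number of arguments, but at least one argument expression differs.
class McDapDetector {
    // Returns [callTarget, arg1, arg2, ...], or null if the line is not a call.
    static List<String> parseCall(String line) {
        String s = line.trim();
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        int open = s.indexOf('(');
        if (open <= 0 || !s.endsWith(")")) {
            return null;
        }
        List<String> parts = new ArrayList<>();
        parts.add(s.substring(0, open).trim());
        String args = s.substring(open + 1, s.length() - 1);
        StringBuilder current = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        for (char c : args.toCharArray()) {
            if (c == '"') inString = !inString;
            if (!inString && c == '(') depth++;
            if (!inString && c == ')') depth--;
            if (!inString && depth == 0 && c == ',') {
                parts.add(current.toString().trim()); // top-level comma ends an argument
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (!current.toString().isBlank()) {
            parts.add(current.toString().trim());
        }
        return parts;
    }

    static boolean isMcDap(String bugLine, String fixLine) {
        List<String> bug = parseCall(bugLine);
        List<String> fix = parseCall(fixLine);
        return bug != null && fix != null
                && bug.size() == fix.size()        // same number of arguments
                && bug.get(0).equals(fix.get(0))   // same receiver and method name
                && !bug.equals(fix);               // some argument expression changed
    }

    public static void main(String[] args) {
        System.out.println(isMcDap(
                "tree.putClientProperty(\"JTree.lineStyle\", \"Horizontal\");",
                "tree.putClientProperty(\"JTree.lineStyle\", \"Angled\");"));
    }
}
```

On the slide's example, `isMcDap` returns true: the call target and argument count match, and only the second argument changes. Requiring the same argument count keeps this check distinct from the "different number of parameters" method call pattern.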
  14. Static Bug Fix Pattern Categories
      - Nine categories of static bug fix patterns
        - If-related
        - Method call
        - Sequence
        - Loop
        - Assignment
        - Switch
        - Try
        - Method declaration
        - Class field
  15. If Patterns
      - Addition of precondition check
        - Adds an if around existing statement(s)
      - Addition of precondition check with jump
        - Adds an if before statement(s) that executes return/continue/break when its condition is met
      - Addition of postcondition check
        - Adds an if statement after an operation to check its results
      - Removal of if predicate
        - Removes the if surrounding statement(s)
      - Addition of else branch
        - Adds an else branch to an existing if statement
      - Removal of else branch
        - Removes an else branch from an existing if statement
      - Change of if condition expression
        - Modifies the conditional part of an if statement
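To make one of these syntax patterns concrete, here is a hypothetical bug/fix pair (not taken from the analyzed projects) for "addition of precondition check with jump". `PreconditionWithJumpExample` and both methods are invented for illustration.

```java
// Hypothetical bug/fix pair for "addition of precondition check with jump":
// the fix adds an if before the existing statement that returns early.
class PreconditionWithJumpExample {
    // Bug hunk: dereferences name without checking it.
    static int nameLengthBuggy(String name) {
        return name.length();
    }

    // Fix hunk: precondition check with a jump (early return) before the old statement.
    static int nameLengthFixed(String name) {
        if (name == null) {
            return 0;
        }
        return name.length();
    }

    public static void main(String[] args) {
        System.out.println(nameLengthFixed("abc")); // prints 3
        System.out.println(nameLengthFixed(null));  // prints 0
        // nameLengthBuggy(null) would throw a NullPointerException
    }
}
```

Wrapping `return name.length();` inside `if (name != null) { ... }` instead would be the plain "addition of precondition check" pattern.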
  16. Method Call Patterns
      - Method call with different number of parameters or different types of parameters
        - Same method name, but a different number or different types of parameters
        - Change of method interface, or use of an overloaded method
      - Method call with different actual parameter values
      - Change of class instance method call
        - Fix code calls a different member method of a class instance
  17. Sequence Patterns
      - Addition of operations in an operation sequence of method calls to an object
        - Many calls to the same object in sequence; the fix adds one or more
      - Removal of operations from an operation sequence of method calls to an object
        - Many calls to the same object in sequence; the fix removes one or more
      - Addition of operations in a field setting sequence
      - Removal of operations from a field setting sequence
      - Addition or removal of method calls in a short construct body
        - A short construct body is a short method (2 or 3 statements), or a short if or while body (2 or 3 statements)
  18. Loop and Assignment Patterns
      - Change of loop predicate
        - Bug fix changes the loop condition of a loop statement
      - Change of expression that modifies the loop variable
        - Bug fix changes the expression that modifies the loop variable, or adds a statement that modifies the loop variable
      - Change of assignment expression
        - Bug fix changes the expression on the right-hand side of an assignment statement
  19. Switch and Try Patterns
      - Addition/removal of switch branch
        - Bug fix adds a case to, or removes one from, a switch statement
      - Addition/removal of try statement
        - Bug fix adds a try/catch statement to enclose a section of code, or removes a try/catch statement
      - Addition/removal of a catch block
        - Bug fix adds a catch block to, or removes one from, an existing try statement
  20. Method Declaration and Class Field Patterns
      - Change of method declaration
        - Change to the declared interface of a method
      - Addition of method declaration
        - Adds a new method to an existing class
      - Removal of method declaration
        - Removes an existing method
      - Addition of a class field
      - Removal of a class field
      - Change of class field declaration
  21. Ignored Patterns
      - Some bug fixes fall into observable patterns, but are ignored
        - These changes either do not affect the semantics of the program, or cause little change in behavior
      - Ignored patterns:
        - Changes to comments
        - Addition/removal of debug information
        - Code cleanup
        - Code formatting
        - Addition/removal of output statements
        - Changes to import statements
  22. Evolutionary Pattern Analysis
      - How many bug fixes contain a pattern?
      - How frequently do these patterns occur in actual bug fixes?
      - Are pattern frequencies consistent across projects?
      - Analyzed five Java open source project histories
      - Ran the bug fix pattern detector over the bug fix changes

      Project   Revisions  Bug Fixes
      ArgoUML       4,685      1,310
      Columba       2,362        797
      Eclipse       6,394      2,807
      JEdit         1,190        557
      Scarab        2,962        535
  23. Pattern Coverage
      - What percentage of bug fixes contain at least one pattern? (About half)
  24. Frequency of pattern categories

      Category              ArgoUML  Columba  Eclipse    JEdit   Scarab
      If-related              23.2%    20.0%    34.0%    30.5%    23.0%
      Method call             30.3%    26.2%    26.5%    22.2%    33.9%
      Sequence                 9.7%    17.5%     6.5%    13.5%     9.5%
      Loop                     1.9%     0.8%     2.2%     1.6%     1.4%
      Assignment               8.6%     7.6%     6.4%     8.4%     7.4%
      Switch                   0.0%     0.3%     1.6%     0.6%     0.0%
      Try                      1.0%     1.9%     2.6%     1.0%     1.4%
      Method declaration      16.2%    17.2%    13.2%    13.4%    16.6%
      Class field              7.6%     8.4%     7.0%     8.7%     6.7%
  25. Cross project similarity
      - Pearson correlation between the pattern frequencies across projects (p-value < 0.001)
      - Projects have surprisingly similar pattern frequencies

                   ArgoUML  Columba  Eclipse    JEdit   Scarab
      ArgoUML            1     0.94     0.89     0.93     0.99
      Columba         0.94        1     0.76     0.87     0.93
      Eclipse         0.89     0.76        1     0.94     0.89
      JEdit           0.93     0.87     0.94        1     0.92
      Scarab          0.99     0.93     0.89     0.92        1
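The similarity matrix is built from Pearson correlation coefficients: the covariance of two projects' frequency vectors divided by the product of their standard deviations. A minimal sketch, assuming each project is a vector of pattern frequencies in a fixed order; the slide does not say whether category-level or individual-pattern frequencies were used, so the category percentages from slide 24 in `main` are for illustration only and need not reproduce the matrix. `Pearson` is a hypothetical name.

```java
// Sketch: Pearson correlation coefficient between two frequency vectors.
class Pearson {
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length vectors with n >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY); // the 1/n factors cancel
    }

    public static void main(String[] args) {
        // Order: If-related, Method call, Sequence, Loop, Assignment, Switch, Try,
        // Method declaration, Class field
        double[] argoUml = {23.2, 30.3, 9.7, 1.9, 8.6, 0.0, 1.0, 16.2, 7.6};
        double[] scarab = {23.0, 33.9, 9.5, 1.4, 7.4, 0.0, 1.4, 16.6, 6.7};
        System.out.printf("r = %.2f%n", correlation(argoUml, scarab));
    }
}
```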
  26. Most frequent individual patterns
      - Only two patterns occur at over 10% frequency in nearly every project (the exception is if condition changes in Columba, at 7.0%)

      Pattern                                       ArgoUML  Columba  Eclipse    JEdit   Scarab
      Method call with different actual parameters    24.0%    19.9%    18.0%    15.1%    26.1%
      Change of if condition expression               10.9%     7.0%    18.7%    13.1%    11.0%
  27. Diving into if conditionals
      - What is causing if conditionals to be such a prevalent bug fix type? (No clear answer yet)

      Change                          ArgoUML  Eclipse    JEdit
      Added condition clause            13.1%    20.8%    23.1%
      Removed condition clause          11.5%     6.9%    11.2%
      Added new variable                 8.3%    14.3%    23.7%
      Removed existing variable         12.0%     9.7%    15.1%
      Increased number of operators     22.4%    22.3%    38.0%
      Decreased number of operators     14.6%    15.1%    21.0%
  28. Static Pattern Summary
      - Can automatically detect 27 static bug fix patterns
      - About 50% of all bug fix changes match at least one pattern
      - If conditionals and method call parameter changes are the two most prevalent patterns
      - Pattern frequencies are remarkably similar across analyzed projects
  29. Adaptive Bug Fix Patterns
  30. Project-Specific Bug Fix Patterns
      - There are many bug fix patterns that are specific to an individual project and may not match any of the static patterns
      - Example from the Eclipse project:
        - , transaction 2024 (“Fix for bug 28434”)
            - if (requiredProjectRsc.exists() && requiredProjectRsc.isOpen()) {
            + if (JavaProject.hasJavaNature(requiredProjectRsc))
        - , transaction 1945 (“Fix for bug 27499”)
            - boolean isOpened=proj.isOpen();
            - if (isOpened && this.hasJavaNature(proj))
            + if (JavaProject.hasJavaNature(proj))
  31. Detecting Non-Static Patterns
      - Detecting non-static patterns
        - Saving the exact code in bug and fix hunks doesn’t work, since there is rarely an exact match
        - Need a method for abstracting changes to find patterns
      - Approach
        - Abstract the code in each bug fix change
        - Save abstracted bug and fix code in a database (the “bug fix memory”)
        - Can search existing code to see if it matches a bug fix pattern
        - Can suggest code to fix the bug
  32. Adaptive Patterns
      - Since the contents of the bug fix memory come from a specific project, the patterns it contains adapt to that project
      - The set of known patterns changes over time, as information from new bug fixes is added
      - The bug fix memory can be viewed as a kind of online algorithm for learning project-specific bug fix patterns
  33. Process for Abstracting Code
      - Four-step process
        - Raw component extraction
          - Parse source code in a hunk, and burst out individual syntactic elements
        - Normalization
          - Substitute type names for variables, string literals, constants (abstract to types)
        - Information filtering
          - Remove elements that are too common to yield project-specific patterns
        - Diff filtering
          - Remove code components that are common to bug and fix hunks, yielding only code unique to the change
  34. Raw Component Extraction
      - Step 1: Convert statements inside change hunks so that each lies on a single line
        - Eliminate whitespace
        - Concatenate multi-line statements to one line
        - Concatenate conditionals for complex statements (if, while, etc.) to one line
      - Step 2: Extract raw components
        - A component is a non-leaf node in the syntax tree of a single line
        - Bursts out complex statements into their constituent parts
          - Each portion of a complex conditional is a separate component
        - Additionally, separates out a method call and its parameters
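Step 1 can be sketched as a joiner that rebuilds one line per statement from a hunk's raw lines. Assumptions: a statement ends at `;` outside parentheses (so `for` headers stay whole) or at `{` or `}`, and this sketch collapses runs of whitespace outside string literals to a single space rather than removing them, so tokens such as `return x` stay separated. `StatementJoiner` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of step 1: join a hunk's lines so each statement (and each if/while/for
// header) lies on a single line. Escaped quotes inside literals are not handled.
class StatementJoiner {
    static List<String> toSingleLines(List<String> hunkLines) {
        String text = String.join("\n", hunkLines);
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int parenDepth = 0;
        boolean inString = false;
        for (char c : text.toCharArray()) {
            if (c == '"') inString = !inString;
            if (!inString && Character.isWhitespace(c)) {
                // collapse whitespace, including line breaks, to one space
                if (current.length() > 0 && current.charAt(current.length() - 1) != ' ') {
                    current.append(' ');
                }
                continue;
            }
            current.append(c);
            if (inString) continue;
            if (c == '(') parenDepth++;
            if (c == ')') parenDepth--;
            boolean endsStatement = (c == ';' && parenDepth == 0) || c == '{' || c == '}';
            if (endsStatement) {
                statements.add(current.toString().trim());
                current.setLength(0);
            }
        }
        if (!current.toString().isBlank()) {
            statements.add(current.toString().trim());
        }
        return statements;
    }
}
```

On the initial code of the example on slide 35, this yields `if (foo.flag >= 5 && foo.ready()) {`, `i=1;`, `foo.create("example");`, `initiate(5,bar);`, and `}`.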
  35. Component Extraction Example
      - Initial code
            if (foo.flag >= 5 &&
                foo.ready()) {
              i=1;
              foo.create("example");
              initiate(5,bar);
            }
      - Extracted components
        - foo.flag
        - foo.flag >= 5
        - foo.ready()
        - foo.flag >= 5 && foo.ready()
        - if (foo.flag >= 5 && foo.ready())
        - i=1
        - "example"
        - foo.create() "example"
        - initiate(,) 5, bar
      [Figure: syntax tree of the if condition, with nodes if, &&, >=, two "." member accesses, foo, flag, 5, foo, and ready()]
  36. Normalization
      - To further improve the ability to match code, abstract instances to types
        - Replace a variable instance with its type
          - Permits matching on type rather than instance
          - foo.flag >= 5 → Foo.flag >= 5 (type of foo is Foo)
        - For literals, insert a new component with the literal's type
          - i=1 yields int=1 and int=int
        - For method calls, replace each parameter with its type
          - Use “*” for unknown types (we only do a one-pass parse)
          - initiate(,) 5, bar → initiate(,) int,* (type of bar is unknown)
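A minimal sketch of normalization, assuming variable types are supplied in a map (a stand-in for the one-pass parse) and that literals are limited to decimal integers and simple string literals. `Normalizer`, `normalize`, and `argType` are hypothetical names. It reproduces the slide's examples: `i=1` yields `int=1` and `int=int`, and the arguments `5, bar` become `int,*` when the type of `bar` is unknown. Following the literal rule, it would also add `Foo.flag >= int` alongside `Foo.flag >= 5`, which the slide does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of normalization: variables become their types; components containing
// literals get an extra copy with the literals replaced by their types too.
class Normalizer {
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    private static final Pattern INT_LITERAL = Pattern.compile("\\b\\d+\\b");
    private static final Pattern STRING_LITERAL = Pattern.compile("\"[^\"]*\"");

    // Replace variable names (but not field names after '.') with their types.
    static String abstractVariables(String component, Map<String, String> varTypes) {
        Matcher m = IDENT.matcher(component);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            boolean isField = m.start() > 0 && component.charAt(m.start() - 1) == '.';
            String type = isField ? null : varTypes.get(m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(type != null ? type : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    static List<String> normalize(String component, Map<String, String> varTypes) {
        List<String> result = new ArrayList<>();
        String typed = abstractVariables(component, varTypes);
        result.add(typed);
        String literalsTyped = INT_LITERAL.matcher(
                STRING_LITERAL.matcher(typed).replaceAll("String")).replaceAll("int");
        if (!literalsTyped.equals(typed)) {
            result.add(literalsTyped); // extra component with literal types
        }
        return result;
    }

    // A method-call parameter becomes its type; "*" marks an unknown type.
    static String argType(String arg, Map<String, String> varTypes) {
        if (INT_LITERAL.matcher(arg).matches()) return "int";
        if (STRING_LITERAL.matcher(arg).matches()) return "String";
        return varTypes.getOrDefault(arg, "*");
    }
}
```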
  37. Information Filtering Goal
      - After normalization, the resulting components are candidates for insertion into the database
        - Problem: many commonly occurring statement types, such as
          - int=int
        - Want to eliminate these, and others that don’t contribute unique information about bug fixes
  38. Information Filtering Approach
      - Assign an “information value” to component elements
        - Value 2:
          - method call, string literal longer than 8 chars
        - Value 1:
          - predicates for if, do, while, for, as well as conditional expressions
          - return, case, switch, synchronized, throw
          - string literal, length 3-8 chars
          - variable name, field name, class name, variable type
        - Value 0:
          - everything else
      - The information value of an entire component is the sum of its elements’ information values
      - We remove components with information value < 2
        - int=1 (info value = 1), int=int (info value = 0)
        - “example” (info value = 1), String (info value = 0)
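The scoring rule can be sketched as a sum over pre-classified elements. This is a minimal sketch, assuming the caller has already classified each element into one of the hypothetical kinds below; `InformationFilter`, `Kind`, and `Element` are illustrative names. The slide does not spell out how normalized type placeholders (such as the right-hand `int` in `int=int`) are classified, so this sketch leaves that choice to the caller.

```java
import java.util.List;

// Sketch of information filtering with the slide's values:
// 2 = method call or string literal longer than 8 chars;
// 1 = predicate, listed keyword, 3-8 char string literal, name or type; 0 = else.
class InformationFilter {
    enum Kind { METHOD_CALL, STRING_LITERAL, PREDICATE, KEYWORD, NAME_OR_TYPE, OTHER }

    // For string literals, text holds the literal's contents without quotes.
    record Element(Kind kind, String text) {}

    static int value(Element e) {
        return switch (e.kind()) {
            case METHOD_CALL -> 2;
            case STRING_LITERAL -> e.text().length() > 8 ? 2 : (e.text().length() >= 3 ? 1 : 0);
            case PREDICATE, KEYWORD, NAME_OR_TYPE -> 1;
            case OTHER -> 0;
        };
    }

    static int componentValue(List<Element> elements) {
        return elements.stream().mapToInt(InformationFilter::value).sum();
    }

    // Components with information value < 2 are removed.
    static boolean keep(List<Element> elements) {
        return componentValue(elements) >= 2;
    }
}
```

For instance, `int=1` classified as a type (value 1) plus a numeric literal (value 0) scores 1 and is dropped, matching the slide; the 7-character literal `"example"` also scores 1.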
  39. Diff Filtering and Storing Memories
      - As a final filtering step, keep only those components that are unique to either the bug hunks or the fix hunks
        - Duplicate components are eliminated, since they represent neither the bug nor its fix
      - After the diff filtering step, store all components in the database (“memory”)
        - Components record their transaction, file name, bug or fix hunk, etc.
        - Also store the initial source code of bug and fix hunks
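Diff filtering amounts to two set differences. A minimal sketch, assuming components are plain strings (the stored records also carry transaction, file name, and hunk side, which are omitted here); `DiffFilter` and `Filtered` are hypothetical names.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of diff filtering: keep components unique to the bug side or the fix side.
class DiffFilter {
    record Filtered(Set<String> bugOnly, Set<String> fixOnly) {}

    static Filtered filter(List<String> bugComponents, List<String> fixComponents) {
        // LinkedHashSet keeps first-seen order and drops repeated components
        Set<String> bugOnly = new LinkedHashSet<>(bugComponents);
        Set<String> fixOnly = new LinkedHashSet<>(fixComponents);
        bugOnly.removeAll(new HashSet<>(fixComponents)); // drop components shared with the fix hunk
        fixOnly.removeAll(new HashSet<>(bugComponents)); // drop components shared with the bug hunk
        return new Filtered(bugOnly, fixOnly);
    }
}
```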
  40. Searching the Memory
      - The memory database contains extracted adaptive bug and fix patterns for a given project
      - This memory can be used to find code that matches bug code stored in the memory
      - Use scenario
        - A developer works in their favorite development environment
        - They receive feedback when code they are developing matches a stored bug pattern
        - The environment can also suggest potential fixes from stored bug fix code
  41. Evaluation
      - We evaluated the memory to determine how well it captures new bug fix changes
        - Specifically, we create a memory from transactions 1 to n-1
        - At transaction n, for bug fix changes, we examine whether the bug hunks are found in the memory
          - This is a “half hit”
        - If they are found, we also examine whether the fix hunk is found too
          - This is a “full hit”
        - Examined the same 5 project histories as for static patterns
          - ArgoUML, Columba, Eclipse, jEdit, Scarab
      - This can be viewed as a proxy for how well the approach might work for bug and fix prediction
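The half-hit and full-hit checks can be sketched against an in-memory set of components. Assumptions, not stated on the slides: a hunk counts as "found" when at least one of its filtered components is already in the memory, and fix components are matched against all stored fix components rather than only those paired with the matching bug entry. Following slide 46, a change with no added code counts as a full hit. `BugFixMemory` and `Hit` are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the bug fix memory evaluation: MISS, HALF_HIT (bug side found),
// or FULL_HIT (bug side and fix side found).
class BugFixMemory {
    enum Hit { MISS, HALF_HIT, FULL_HIT }

    private final Set<String> bugComponents = new HashSet<>();
    private final Set<String> fixComponents = new HashSet<>();

    // Add one diff-filtered bug fix change to the memory.
    void remember(List<String> bugSide, List<String> fixSide) {
        bugComponents.addAll(bugSide);
        fixComponents.addAll(fixSide);
    }

    Hit check(List<String> bugSide, List<String> fixSide) {
        boolean bugFound = bugSide.stream().anyMatch(bugComponents::contains);
        if (!bugFound) return Hit.MISS;
        // No added code means nothing to look up; counted as a full hit (per slide 46).
        boolean fixFound = fixSide.isEmpty() || fixSide.stream().anyMatch(fixComponents::contains);
        return fixFound ? Hit.FULL_HIT : Hit.HALF_HIT;
    }
}
```

Replaying a project history then means, for each transaction n in order, calling `check` before `remember` adds transaction n's own components, so the memory only ever holds transactions 1 to n-1.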
  42. True and False Positives
      [Figure: memories are built from transactions 1 .. n-1. At transaction n, a fix change whose code is found in the memory is a true positive half hit; a non-fix change whose code is found in the memory is a false positive half hit.]
  43. True Positive Hit Rates
  44. False Positive Hit Rates
  45. True Positive and False Positive Full Hit Rates
  46. Adaptive Pattern Discussion
      - Adaptive bug patterns work well
        - Captures 19.3%-40.3% of bugs (half hits)
        - But also captures many non-bug changes (20.8%-32.5%)
        - The high full hit rate for non-fix changes could be due to changes with no added hunk
          - Since there is no code to match in the database, we automatically count these as full hits (it might be better to ignore them)
      - Adaptive patterns are more project-specific than static patterns
        - Better suited for presenting possible bug fixes
  47. Patterns Overall
      - If you were to examine all project transactions
        - Not by time, grouping fix and non-fix changes together
      - A fine-grained characterization of the kinds of changes made over the evolution of a software project?
      [Figure: fix and non-fix changes, each characterized by static and adaptive patterns]
  48. Conclusion
      - It is now possible to reliably extract static and adaptive bug fix patterns from software project evolution data
      - Static patterns are useful for characterizing bug fixes at a fine-grained syntactic level
      - Adaptive patterns are useful for identifying potentially buggy code and making bug fix recommendations at fine granularity