Static and Adaptive Bug Fix Patterns
Jim Whitehead, Sung Kim, Kai Pan
University of California, Santa Cruz
Bug and Bug Fix Patterns?
  Are bugs and bug fixes random in their goal and structure, or do they exhibit patterns?
  We know there are some patterns, since existing pattern-oriented static analysis tools are able to detect some bugs
  Hypothesis: there are both project-specific and project-independent patterns that are detectable in bugs and bug fixes
Static and Adaptive Bug Fix Patterns
  Static: syntax-driven change patterns
    Example: changing an if condition expression
    Found by statically analyzing code to detect conformance to a pattern
    Horizontal: same pattern can be found in multiple projects
  Adaptive: memory-driven change patterns
    Example: frequent string literal changes
    Found by detecting a previous similar bug fix in a project-specific bug fix database, or “memory”
    Vertical: each pattern is specific to a given project
Promise of Bug and Fix Patterns
  If bugs exhibit detectable patterns, it would be possible to automatically detect bugs
  If there are common bug-to-fix mappings, it would be possible to supply a recommended fix for a detected bug
  Using bug fix patterns, it would be possible to see the frequency distribution of the patterns
    Would be useful to understand which kinds of patterns occur more frequently
  Broadly, such patterns would contribute to an improved understanding of maintenance activity
Talk Overview
  Terminology and Detection of Bug Fix Changes
  Static Bug Fix Patterns
  Adaptive Bug Fix Patterns
  Conclusions
Retrieving Bug Fix Changes
  Software projects today record their development history using Software Configuration Management tools
  As developers make changes, they record a reason along with the change
    In the change log message
  When developers fix a bug in the software, they tend to record log messages with some variation of the words “fixed” or “bug”
    “Fixed null pointer bug”
  It is possible to mine the change history of a software project to uncover these bug-fix changes
  That is, we retrospectively recover those changes that developers have marked as containing a bug fix
    We assume they are not lying
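
The keyword heuristic on this slide can be sketched in a few lines of Java. This is only an illustration: the class name BugFixLogFilter and the exact keyword list are assumptions, since the slide mentions only "some variation of the words fixed or bug".

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: flags change log messages that look like bug fixes. */
class BugFixLogFilter {
    // Assumed keyword list; whole words only, so "debug" or "prefix" do not match.
    private static final Pattern KEYWORDS =
        Pattern.compile("\\b(fix(e[sd])?|bugs?)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns true when the log message contains one of the assumed keywords. */
    static boolean looksLikeBugFix(String logMessage) {
        return logMessage != null && KEYWORDS.matcher(logMessage).find();
    }

    public static void main(String[] args) {
        // Log messages taken from the slides' own examples.
        for (String msg : List.of("Added feature X", "Fixed null ptr bug",
                                  "Modified button text", "Bug #567 fixed")) {
            System.out.println(looksLikeBugFix(msg) + "  " + msg);
        }
    }
}
```

A filter like this trusts the developer's wording, which is the same assumption the slide states.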
Bug-introducing and bug-fix changes
  [Timeline figure: development history of foo.java]
    “Bug-introducing” change: the software change that introduces the bug
    Bug #567 entered into issue tracking system (bug finally observed and recorded)
    “Bug fix” change: SCM log message “Bug #567 fixed”
Commits, Transactions & Configurations
  [Figure labels: CVS file commits, transactions, configurations, log message]
  Example log messages: “Added feature X”, “Fixed null ptr bug”, “Modified button text”, “Added feature Y”
Hunks, and Hunk Pairs
  Revision n-1 (has bug hunks), Revision n (has fix hunks)
  Hunk pair types:
    modification: deleted hunk, added hunk
    addition: empty deleted hunk, added hunk
    deletion: deleted hunk, empty added hunk
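
A minimal sketch of how a hunk pair might be classified, assuming the figure's labels mean that an addition has an empty deleted hunk and a deletion has an empty added hunk. The HunkPair class and its field names are hypothetical.

```java
/** Hypothetical model of a hunk pair between revision n-1 and revision n. */
class HunkPair {
    enum Type { MODIFICATION, ADDITION, DELETION }

    final String deletedHunk; // text removed from revision n-1 (bug hunk); empty if none
    final String addedHunk;   // text added in revision n (fix hunk); empty if none

    HunkPair(String deletedHunk, String addedHunk) {
        this.deletedHunk = deletedHunk == null ? "" : deletedHunk;
        this.addedHunk = addedHunk == null ? "" : addedHunk;
        if (this.deletedHunk.isEmpty() && this.addedHunk.isEmpty()) {
            throw new IllegalArgumentException("a hunk pair needs at least one non-empty side");
        }
    }

    /** Classifies the pair by which side is empty (assumed reading of the figure). */
    Type type() {
        if (deletedHunk.isEmpty()) return Type.ADDITION;
        if (addedHunk.isEmpty()) return Type.DELETION;
        return Type.MODIFICATION;
    }
}
```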
Kenyon Processing
  [Figure: Kenyon processing stages; inputs labeled SCM Repository and Filesystem]
    Extract: automated configuration extraction
    Compute: fact extraction (metrics, static analysis)
    Save: persist gathered metrics & facts in the Kenyon Repository (RDBMS/Hibernate)
    Analyze: query DB, add new facts, using analysis software (e.g., IVA)
Static Bug Fix Patterns
Static Bug Patterns
  Performed manual analysis of bug fix hunk pairs in Java programs
    Examined bug hunks and corresponding fix hunks
    Looked for syntax patterns of recurring changes
  Identified 27 static bug fix patterns in Java code
Example Pattern
  Method Call with Different Actual Parameter Values (MC-DAP)
  The bug fix changes the expression passed into one or more parameters of a method call
    - tree.putClientProperty("JTree.lineStyle", "Horizontal");
    + tree.putClientProperty("JTree.lineStyle", "Angled");
  - = bug revision, + = fix revision
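
To make the MC-DAP definition concrete, here is a rough sketch of a check over two call expressions. It is not the detector used in the study: McDapDetector is a hypothetical name, and the comma split assumes well-formed calls with no commas inside nested calls or string literals.

```java
import java.util.Arrays;

/** Hypothetical check for the MC-DAP pattern on a bug/fix pair of call expressions. */
class McDapDetector {
    /** Same method name, same number of arguments, but different argument text. */
    static boolean isMcDap(String bugCall, String fixCall) {
        String[] bugArgs = args(bugCall);
        String[] fixArgs = args(fixCall);
        return name(bugCall).equals(name(fixCall))
            && bugArgs.length == fixArgs.length
            && !Arrays.equals(bugArgs, fixArgs);
    }

    // Everything before the first '(' is treated as the (qualified) method name.
    private static String name(String call) {
        return call.substring(0, call.indexOf('(')).trim();
    }

    // Naive argument split; assumes no commas nested inside an argument.
    private static String[] args(String call) {
        String inner = call.substring(call.indexOf('(') + 1, call.lastIndexOf(')')).trim();
        if (inner.isEmpty()) return new String[0];
        String[] parts = inner.split(",");
        for (int i = 0; i < parts.length; i++) parts[i] = parts[i].trim();
        return parts;
    }
}
```

A real detector would work on the parsed syntax tree rather than on strings; the string version only shows the three conditions that define the pattern.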
Static Bug Fix Pattern Categories
  Nine categories of static bug fix patterns
    If-related
    Method call
    Sequence
    Loop
    Assignment
    Switch
    Try
    Method declaration
    Class field
If Patterns
  Addition of precondition check
    Adds if around existing statement(s)
  Addition of precondition check with jump
    Adds if before statement(s) with return/continue/break if condition is met
  Addition of postcondition check
    Adds if statement after operation to check results
  Removal of if predicate
    Removal of if surrounding statement(s)
  Addition of else branch
    Adds an else branch to existing if statement
  Removal of else branch
    Removes else branch from existing if statement
  Change of if condition expression
    Modifies the conditional part of an if statement
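
As an illustration of one of these patterns, "addition of precondition check with jump", the following made-up bug and fix revisions show the shape of the change. The method names and the null case are invented for the example and do not come from the studied projects.

```java
/** Invented example of the "addition of precondition check with jump" pattern. */
class PreconditionCheckExample {
    // Bug revision: dereferences name without checking it first.
    static int lengthBuggy(String name) {
        return name.length();
    }

    // Fix revision: an if with a return is added before the existing statement.
    static int lengthFixed(String name) {
        if (name == null) return 0; // added precondition check with jump
        return name.length();
    }
}
```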
Method Call Patterns
  Method call with different number of parameters or different types of parameters
    Same method name, but different number of parameters or types of parameters
    Change of method interface, or use of overloaded method
  Method call with different actual parameter values
  Change of class instance method call
    Fix code calls a different member method of a class instance
Sequence Patterns
  Addition of operations in an operation sequence of method calls to an object
    Many calls to the same object all in sequence: add one or more
  Removal of operations from an operation sequence of method calls to an object
    Many calls to the same object in sequence: remove one or more
  Addition of operations in a field setting sequence
  Removal of operations from a field setting sequence
  Addition or removal of method calls in a short construct body
    A short construct body is a short method (2 or 3 statements), or an if or while body that is short (2 or 3 statements)
Loop and Assignment Patterns
  Change of loop predicate
    Bug fix changes the loop condition of a loop statement
  Change of expression that modifies the loop variable
    Bug fix changes the expression that modifies the loop variable, or adds a statement that modifies the loop variable
  Change of assignment expression
    Bug fix changes the expression on the right-hand side of an assignment statement
Switch and Try Patterns
  Addition/removal of switch branch
    Bug fix adds/removes a case from a switch statement
  Addition/removal of try statement
    Bug fix adds a try/catch statement to enclose a section of code, or removes a try/catch statement
  Addition/removal of a catch block
    Bug fix adds a catch block to, or removes one from, an existing try statement
Method Declaration and Class Field Patterns
  Change of method declaration
    Change to the declared interface for a method
  Addition of method declaration
    Adding new method to existing class
  Removal of method declaration
    Removal of an existing method
  Addition of a class field
  Removal of a class field
  Change of class field declaration
Ignored Patterns
  Some bug fixes fall into observable patterns, but are ignored
    These changes either do not affect the semantics of the program, or cause little change in behavior
  Ignored patterns:
    Changes to comments
    Addition/removal of debug information
    Code cleanup
    Code formatting
    Addition/removal of output statements
    Changes to import statements
Evolutionary Pattern Analysis
  How many bug fixes contain a pattern?
  How frequently do these patterns occur in actual bug fixes?
  Are pattern frequencies consistent across projects?
  Analyzed five Java open source project histories
    Ran bug fix pattern detector program over bug fix changes

  Project    Revisions  Bug Fixes
  ArgoUML        4,685      1,310
  Columba        2,362        797
  Eclipse        6,394      2,807
  JEdit          1,190        557
  Scarab         2,962        535
Pattern Coverage
  What percentage of bug fixes contain at least one pattern?
    (About half)
Frequency of pattern categories

  Category            ArgoUML  Columba  Eclipse  JEdit  Scarab
  If-related            23.2%    20.0%    34.0%  30.5%   23.0%
  Method call           30.3%    26.2%    26.5%  22.2%   33.9%
  Sequence               9.7%    17.5%     6.5%  13.5%    9.5%
  Loop                   1.9%     0.8%     2.2%   1.6%    1.4%
  Assignment             8.6%     7.6%     6.4%   8.4%    7.4%
  Switch                 0.0%     0.3%     1.6%   0.6%    0.0%
  Try                    1.0%     1.9%     2.6%   1.0%    1.4%
  Method declaration    16.2%    17.2%    13.2%  13.4%   16.6%
  Class field            7.6%     8.4%     7.0%   8.7%    6.7%
Cross project similarity
  Pearson correlation between the pattern frequencies across projects (p-value < 0.001)
  Projects have surprisingly similar pattern frequencies

             ArgoUML  Columba  Eclipse  JEdit  Scarab
  ArgoUML       1       0.94     0.89    0.93    0.99
  Columba       0.94    1        0.76    0.87    0.93
  Eclipse       0.89    0.76     1       0.94    0.89
  JEdit         0.93    0.87     0.94    1       0.92
  Scarab        0.99    0.93     0.89    0.92    1
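
For readers who want to reproduce this kind of comparison, here is a small sketch of the Pearson correlation coefficient. The slide does not say whether the reported values were computed over the nine categories or over the 27 individual patterns, so the category-level inputs used in main (taken from the previous slide) are an assumption and the printed value need not equal the reported 0.99.

```java
/** Minimal Pearson correlation sketch; class and method names are illustrative. */
class PearsonCorrelation {
    /** Returns the Pearson correlation of x and y (NaN if either input is constant). */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equally long vectors with at least 2 entries");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Category frequencies (%) in the order: If-related, Method call, Sequence, Loop,
        // Assignment, Switch, Try, Method declaration, Class field.
        double[] argoUml = {23.2, 30.3, 9.7, 1.9, 8.6, 0.0, 1.0, 16.2, 7.6};
        double[] scarab  = {23.0, 33.9, 9.5, 1.4, 7.4, 0.0, 1.4, 16.6, 6.7};
        System.out.printf("ArgoUML vs Scarab (category level): %.2f%n", pearson(argoUml, scarab));
    }
}
```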
Most frequent individual patterns
  Only two patterns consistently occur at over 10% frequency

  Pattern                                       ArgoUML  Columba  Eclipse  JEdit  Scarab
  Method call with different actual parameters    24.0%    19.9%    18.0%  15.1%   26.1%
  Change of if condition expression               10.9%     7.0%    18.7%  13.1%   11.0%
Diving into if conditionals
  What is causing if conditionals to be such a prevalent bug fix type?
    (no clear answer yet)

  Change to the if condition     ArgoUML  Eclipse  JEdit
  Added condition clause           13.1%    20.8%  23.1%
  Removed condition clause         11.5%     6.9%  11.2%
  Added new variable                8.3%    14.3%  23.7%
  Removed existing variable        12.0%     9.7%  15.1%
  Increased number of operators    22.4%    22.3%  38.0%
  Decreased number of operators    14.6%    15.1%  21.0%
Static Pattern Summary
  Can automatically detect 27 static bug fix patterns
  About 50% of all bug fix changes match at least one pattern
  If conditionals and method call parameter changes are the two most prevalent patterns
  Pattern frequencies are remarkably similar across analyzed projects
Adaptive Bug Fix Patterns
Project-Specific Bug Fix Patterns
  There are many bug fix patterns that are specific to an individual project, and may not match one of the static patterns
  Example from Eclipse project:
    JavaProject.java, transaction 2024 (“Fix for bug 28434”)
      - if (requiredProjectRsc.exists() && requiredProjectRsc.isOpen()) {
      + if (JavaProject.hasJavaNature(requiredProjectRsc))
    DeltaProcessor.java, transaction 1945 (“Fix for bug 27499”)
      - boolean isOpened=proj.isOpen();
      - if (isOpened && this.hasJavaNature(proj))
      + if (JavaProject.hasJavaNature(proj))
Detecting Non-Static Patterns
  Detecting non-static patterns
    Saving exact code in bug and fix hunks doesn’t work, since there is rarely an exact match
    Need a method for abstracting changes to find patterns
  Approach
    Abstract code in each bug fix change
    Save abstracted bug and fix code in a database (the “bug fix memory”)
    Can search existing code to see if it matches a bug fix pattern
    Can suggest code to fix the bug
Adaptive Patterns
  Since the contents of the bug fix memory come from a specific project, the patterns it contains adapt to that project
  The set of known patterns changes over time, as information from new bug fixes is added
  Can view the bug fix memory as a kind of online algorithm for learning project-specific bug fix patterns
Process for Abstracting Code
  Four step process
    Raw component extraction
      Parse source code in a hunk, and burst out individual syntactic elements
    Normalization
      Substitute type names for variables, string literals, constants (abstract to types)
    Information filtering
      Remove elements that are too common to yield project-specific patterns
    Diff filtering
      Remove code components that are common in bug and fix hunks, yielding only code unique to the change
Raw Component Extraction
  Step 1: Convert statements inside change hunks so they lie on a single line
    Eliminate whitespace
    Concatenate multi-line statements to one line
    Concatenate conditionals for complex statements (if, while, etc.) to one line
  Step 2: Extract raw components
    Component is a non-leaf node in the syntax tree of a single line
    Bursts out complex statements into constituent parts
    Each portion of a complex conditional is a separate component
    Additionally, separate out a method call and its parameters
Component Extraction Example
  Initial code
    if (foo.flag >= 5 && foo.ready()) {
      i=1;
      foo.create("example");
      initiate(5,bar);
    }
  Extracted components
    foo.flag
    foo.flag >= 5
    foo.ready()
    foo.flag >= 5 && foo.ready()
    if (foo.flag >= 5 && foo.ready())
    i=1
    "example"
    foo.create() "example"
    initiate(,) 5, bar
  [Figure: syntax tree of the if condition, with nodes if, &&, >=, ., foo, flag, 5, foo, ready()]
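
The last two extracted components separate a method call from its parameters. A simplified sketch of that one step follows; the class MethodCallComponent is hypothetical, and the comma split assumes no commas nested inside an argument, which a real parser-based extractor would not need to assume.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical component: a method call skeleton plus its separated parameters. */
class MethodCallComponent {
    final String skeleton;         // e.g. "initiate(,)"
    final List<String> parameters; // e.g. ["5", "bar"]

    private MethodCallComponent(String skeleton, List<String> parameters) {
        this.skeleton = skeleton;
        this.parameters = parameters;
    }

    /** Splits a call such as "initiate(5,bar)" into "initiate(,)" and ["5", "bar"]. */
    static MethodCallComponent parse(String call) {
        int open = call.indexOf('(');
        int close = call.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("not a method call: " + call);
        }
        String name = call.substring(0, open).trim();
        String argText = call.substring(open + 1, close).trim();
        List<String> params = new ArrayList<>();
        StringBuilder commas = new StringBuilder();
        if (!argText.isEmpty()) {
            String[] parts = argText.split(","); // naive: no nested commas assumed
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) commas.append(',');   // keep one comma per parameter boundary
                params.add(parts[i].trim());
            }
        }
        return new MethodCallComponent(name + "(" + commas + ")", params);
    }
}
```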
Normalization
  To further improve the ability to match code, perform abstraction of instances to types
  Replace variable instance with its type
    Permits matching on type, rather than instance
    foo.flag >= 5 → Foo.flag >= 5 (type of foo is Foo)
  For literals, insert new component with type
    i=1 yields int=1 and int=int
  For method calls, replace each parameter with type of parameter
    Use “*” for unknown types (we only do one-pass parse)
    initiate(,) 5, bar → initiate(,) int,* (type of bar is unknown)
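
A rough sketch of the normalization rules on this slide, assuming the variable types gathered during the single parse pass are available as a map from variable name to type name. The Normalizer class and the literal detection rules (integer and string literals only) are simplifications introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical normalizer: abstracts instances and literals to type names. */
class Normalizer {
    /** Maps one parameter to its type name, or "*" when the type is unknown. */
    static String normalizeParameter(String param, Map<String, String> variableTypes) {
        String p = param.trim();
        if (p.matches("-?\\d+")) return "int";                                      // integer literal
        if (p.length() >= 2 && p.startsWith("\"") && p.endsWith("\"")) return "String"; // string literal
        return variableTypes.getOrDefault(p, "*");                                  // known variable, else unknown
    }

    /** Normalizes a whole parameter list, e.g. ["5", "bar"] becomes "int,*". */
    static String normalizeParameters(List<String> params, Map<String, String> variableTypes) {
        List<String> types = new ArrayList<>();
        for (String p : params) types.add(normalizeParameter(p, variableTypes));
        return String.join(",", types);
    }

    /** Replaces the receiver of a field access, e.g. "foo.flag" becomes "Foo.flag". */
    static String normalizeReceiver(String access, Map<String, String> variableTypes) {
        int dot = access.indexOf('.');
        if (dot < 0) return variableTypes.getOrDefault(access, "*");
        return variableTypes.getOrDefault(access.substring(0, dot), "*") + access.substring(dot);
    }
}
```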
Information Filtering Goal
  After normalization, resulting components are candidates for insertion into database
  Problem: many commonly occurring statement types
    int=int
  Want to eliminate these, and others that don’t contribute unique information about bug fixes
Information Filtering Approach
  Assign an “information value” to component elements
    Value 2: method call, string literal longer than 8 chars
    Value 1:
      predicates for: if, do, while, for, as well as conditional expressions
      return, case, switch, synchronized, throw
      string literal, length 3-8 chars
      variable name, field name, class name, variable type
    Value 0: everything else
  Information value for an entire component is the sum of its elemental information values
  We remove components with information value < 2
    int=1 (info value = 1), int=int (info value = 0)
    "example" (info value = 1), String (info value = 0)
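
The scoring rule can be sketched as follows. This assumes each component has already been split into elements tagged with a kind; the Kind enum, the Element class, and the choice to measure string literal length without the quotes are all assumptions made for the example.

```java
import java.util.List;

/** Hypothetical information filter: sums element values and drops components below 2. */
class InformationFilter {
    enum Kind { METHOD_CALL, STRING_LITERAL, PREDICATE, KEYWORD, NAME_OR_TYPE, OTHER }

    static final class Element {
        final Kind kind;
        final String text; // for STRING_LITERAL: the literal's content without quotes (assumed)

        Element(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /** Information value of a single element, following the slide's three tiers. */
    static int elementValue(Element e) {
        switch (e.kind) {
            case METHOD_CALL:
                return 2;
            case STRING_LITERAL: {
                int len = e.text.length();
                return len > 8 ? 2 : (len >= 3 ? 1 : 0);
            }
            case PREDICATE:
            case KEYWORD:
            case NAME_OR_TYPE:
                return 1;
            default:
                return 0;
        }
    }

    /** Information value of a component: the sum over its elements. */
    static int componentValue(List<Element> component) {
        int sum = 0;
        for (Element e : component) sum += elementValue(e);
        return sum;
    }

    /** Components with information value below 2 are removed. */
    static boolean keep(List<Element> component) {
        return componentValue(component) >= 2;
    }
}
```

Under this reading, the lone literal "example" scores 1 and is dropped, while a component containing a method call is always kept.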
Diff Filtering and Storing Memories
  As a final filtering step, keep only those components that are unique to either bug or fix hunks
    Duplicate components are eliminated, since they do not represent the bug or its fix
  After diff filtering step, store all components into the database (“memory”)
    Components record their transaction, file name, bug or fix hunk, etc.
    Also store initial source code of bug and fix hunks
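
Diff filtering amounts to a set difference in each direction. A minimal sketch, assuming components are represented as normalized strings (the DiffFilter class is hypothetical):

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical diff filter over normalized component strings. */
class DiffFilter {
    /** Returns the components that occur in own but not in other, in insertion order. */
    static Set<String> uniqueTo(Set<String> own, Set<String> other) {
        Set<String> result = new LinkedHashSet<>(own);
        result.removeAll(other); // drop components shared by bug and fix hunks
        return result;
    }
}
```

Calling uniqueTo(bugComponents, fixComponents) yields the bug-only components, and swapping the arguments yields the fix-only components; both results would then be stored in the memory.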
Searching the Memory
  The memory database contains extracted adaptive bug and fix patterns for a given project
  Can use this memory to find code that matches bug code in the memory
  Use scenario
    Developer working in their favorite development environment
    Receives feedback when code they are developing matches a stored bug pattern
    Can also suggest potential fixes from stored bug fix code
Evaluation
  We evaluated the memory to determine how well it captures new bug fix changes
  Specifically, we create a memory for transactions 1 to n-1
  At transaction n, for bug fix changes we examine whether the bug hunks are found in the memory
    This is a “half hit”
  If found, we also examine whether the fix hunk is found too
    This is a “full hit”
  Examined same 5 project histories as for static patterns
    ArgoUML, Columba, Eclipse, jEdit, Scarab
  This can be viewed as a proxy for how well the approach might work for bug and fix prediction
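
The half hit and full hit definitions can be sketched with an in-memory stand-in for the database. Several things here are assumptions rather than the evaluated system: the BugFixMemory class, the rule that a hunk "matches" when it shares at least one stored component, and the treatment of an empty bug hunk as no hit. Counting an empty fix hunk as a full hit follows the remark in the deck's later discussion slide.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in for the bug fix memory. */
class BugFixMemory {
    enum Hit { NONE, HALF, FULL }

    private final Set<String> bugComponents = new HashSet<>();
    private final Set<String> fixComponents = new HashSet<>();

    /** Adds the components of one bug fix change (a transaction before n) to the memory. */
    void remember(Collection<String> bugHunk, Collection<String> fixHunk) {
        bugComponents.addAll(bugHunk);
        fixComponents.addAll(fixHunk);
    }

    /** Checks a change at transaction n against the memory built from transactions 1..n-1. */
    Hit lookup(Collection<String> bugHunk, Collection<String> fixHunk) {
        if (!sharesComponent(bugHunk, bugComponents)) return Hit.NONE;
        // An empty fix hunk has nothing to match and is counted as a full hit.
        boolean fixFound = fixHunk.isEmpty() || sharesComponent(fixHunk, fixComponents);
        return fixFound ? Hit.FULL : Hit.HALF;
    }

    // Assumed matching rule: at least one component of the hunk is already stored.
    private static boolean sharesComponent(Collection<String> hunk, Set<String> stored) {
        for (String component : hunk) {
            if (stored.contains(component)) return true;
        }
        return false;
    }
}
```

An evaluation loop would call lookup for the change at transaction n and only afterwards call remember for it, so that the memory never contains the change being tested.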
True and False Positives
  Build memories based on transactions 1 .. n-1
  Fix change case at transaction n: true positive half hit, if found in the memories
  Non-fix change case at transaction n: false positive half hit, if found in the memories
True Positive Hit Rates
False Positive Hit Rates
True Positive and False Positive Full Hit Rates
Adaptive Pattern Discussion
  Adaptive bug patterns work well
    Captures 19.3%-40.3% of bugs (half hits)
    But also captures a lot of non-bug changes (20.8%-32.5%)
  High full hit rate for non-fix changes could be due to changes with no added hunk
    Since there is no code to match in the database, we automatically call this a full hit (might be better to ignore)
  Adaptive patterns are more project-specific than static patterns
    Better suited for presenting possible bug fixes
Patterns Overall
  If you were to examine all project transactions
    Not by time, grouping fix and non-fix changes together
  A fine-grain characterization of the kinds of changes made over the evolution of a software project?
  [Figure labels: Fix, Non-Fix; Static and Adaptive under each]
Conclusion
  It is now possible to reliably extract static and adaptive bug fix patterns from software project evolution data
  Static patterns are useful for characterizing bug fixes at a fine-grain syntactic level
  Adaptive patterns are useful for identifying potentially buggy code, and making bug fix recommendations at fine granularity
