Memories of Bug Fixes


  1. Memories of Bug-Fixes
     Sunghun Kim, Kai Pan, Jim Whitehead
     {hunkim, pankai, ejw}@cs.ucsc.edu
     University of California, Santa Cruz
  2. What is a bug (Zeller 2006)?
     • This pointer, being null, is a bug
       • An incorrect program state
     • This software crashes; this is a bug
       • An incorrect program execution
     • This line 11 is buggy
       • An incorrect program code
  3. Bugs?
     // null dereference
     public void nullDeref() {
         MyObject o = null;
         if (isGoodDay) {
             o = new MyObject("Hi");
         }
         System.out.println(o.toString());
     }
  4. Bugs?
     // null dereference
     public void nullDeref() {
         MyObject o = null;
         if (isGoodDay) {
             o = new MyObject("Hi");
         }
         System.out.println(o.toString());
     }
  5. Bugs?
     // stack buffer overrun for sizes greater than 14
     void stack_buffer(void* src, int size) {
         char buffer[14];
         memcpy(buffer, src, size);
     }
  6. Bugs?
     // stack buffer overrun for sizes greater than 14
     void stack_buffer(void* src, int size) {
         char buffer[14];
         memcpy(buffer, src, size);
     }
  7. Bugs?
     if (…) {
         setSelectedText("\t");
     }
  8. Project-Specific Bug Fix Patterns
     • There are many bug fix patterns that are specific to an individual project and may not match any of the static patterns
     • Example from jEdit project:
       JEditTextArea.java at transaction 114
         - setSelectedText("\t");
         + insertTab();
       JEditTextArea.java at transaction 86
         - setSelectedText("\t");
         + insertTab();
  9. Bug?
     if (requiredProjectRsc.exists() &&
         requiredProjectRsc.isOpen()) {
         …
     }
  10. Project-Specific Bug Fix Patterns
      • Example from Eclipse project:
        JavaProject.java, transaction 2024 (“Fix for bug 28434”)
          - if (requiredProjectRsc.exists() &&
          -     requiredProjectRsc.isOpen()) {
          + if (JavaProject.hasJavaNature(requiredProjectRsc)) {
        DeltaProcessor.java, transaction 1945 (“Fix for bug 27499”)
          - boolean isOpened = proj.isOpen();
          - if (isOpened && this.hasJavaNature(proj))
          + if (JavaProject.hasJavaNature(proj))
  11. Horizontal and Vertical Bug Patterns
      • Horizontal: general bugs (buffer overrun, null dereference)
      • Vertical: project-specific bugs (jEdit example, Eclipse example)
  12. Bug-Fix Memories – Basic Idea
      • Extract patterns in bug fix change history
      [Diagram: bug fix changes in revisions 1 .. n-1 feed the Memory]
  13. Bug-Fix Memories – Basic Idea
      • Extract patterns in bug fix change history
      • Search for patterns against Memory
      [Diagram: bug fix changes in revisions 1 .. n-1 feed the Memory; the code to examine is searched against the Memory]
  14. Talk Overview
      • Detection of bug fix changes
      • Mining vertical bugs
        • Abstracting code
      • Evaluation
      • Conclusions
      • Future Work
  15. Retrieving Bug Fix Changes
      • Software projects today record their development history using Software Configuration Management tools
      • As developers make changes, they record a reason along with the change
        • In the change log message
      • When developers fix a bug in the software, they tend to record log messages with some variation of the words “fixed” or “bug”
        • “Fixed null pointer bug”
      • It is possible to mine the change history of a software project to uncover these bug-fix changes
      • That is, we retrospectively recover those changes that developers have marked as containing a bug fix
        • We assume they are not lying
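
The keyword-based retrieval described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' tool: the class name `BugFixLogFilter`, the method `looksLikeBugFix`, and the exact keyword list are assumptions, since the slide only mentions "some variation" of the words “fixed” or “bug”.

```java
import java.util.regex.Pattern;

class BugFixLogFilter {
    // Assumed keyword list: the slide names only variations of "fixed" and "bug".
    private static final Pattern FIX_KEYWORDS =
            Pattern.compile("\\b(fix(e[sd])?|bugs?)\\b", Pattern.CASE_INSENSITIVE);

    // Returns true when the SCM log message contains one of the assumed keywords
    // as a whole word, so "prefix" or "debug" alone do not count.
    static boolean looksLikeBugFix(String logMessage) {
        return logMessage != null && FIX_KEYWORDS.matcher(logMessage).find();
    }
}
```

With this sketch, a message such as “Fixed null pointer bug” would be flagged, while “Added feature X” would not. Like the approach on the slide, it trusts that developers describe their changes honestly.
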
  16. Bug-introducing and Bug-fix Changes
      [Timeline: development history of foo.java]
      • Software change that introduces the bug (“bug-introducing”)
      • Bug #567 entered into issue tracking system (bug finally observed and recorded)
      • SCM log message: “Bug #567 fixed” (“bug fix”)
  17. Kenyon Processing
      [Diagram: SCM Repository, Filesystem, Kenyon, Kenyon Repository (RDBMS/Hibernate), Analysis Software (e.g., IVA)]
      • Extract: automated configuration extraction
      • Save: persist gathered metrics & facts
      • Analyze: query DB, add new facts
      • Compute: fact extraction (metrics, static analysis)
  18. Commits, Transactions & Configurations
      [Diagram: CVS file commits grouped into transactions and configurations, each with a log message]
      • Example log messages: “Added feature X”, “Fixed null ptr bug”, “Modified button text”, “Added feature Y”
  19. Hunks and Hunk Pairs
      • Revision n-1 has bug hunks; revision n has fix hunks
      • Hunk pair types:
        modification: deleted hunk, added hunk
        addition: empty deleted hunk, added hunk
        deletion: deleted hunk, empty added hunk
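
One way to read the hunk pair types on this slide is sketched below, assuming that a modification pairs a non-empty deleted hunk with a non-empty added hunk, an addition has an empty deleted hunk, and a deletion has an empty added hunk. The names `HunkPairs`, `Type`, and `classify` are hypothetical and chosen only for illustration.

```java
import java.util.List;

class HunkPairs {
    enum Type { MODIFICATION, ADDITION, DELETION }

    // deletedHunk: lines removed from revision n-1 (the bug hunk).
    // addedHunk: lines added in revision n (the fix hunk).
    static Type classify(List<String> deletedHunk, List<String> addedHunk) {
        boolean hasDeleted = !deletedHunk.isEmpty();
        boolean hasAdded = !addedHunk.isEmpty();
        if (hasDeleted && hasAdded) return Type.MODIFICATION;
        if (hasAdded) return Type.ADDITION;
        if (hasDeleted) return Type.DELETION;
        // Two empty hunks do not describe a change at all.
        throw new IllegalArgumentException("a hunk pair needs at least one non-empty hunk");
    }
}
```

Under this reading, the jEdit change that replaces `setSelectedText(...)` with `insertTab()` would be a modification.
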
  20. Detecting Vertical Bugs (Patterns)
      • Detecting bug patterns
        • Saving exact code in bug and fix hunks doesn’t work, since there is rarely an exact match
        • Need a method for abstracting changes to find patterns
      • Approach
        • Abstract code in each bug fix change
        • Save abstracted bug and fix code in a database (the “bug fix memory”)
        • Can search existing code to see if it matches a bug fix pattern
        • Can suggest code to fix the bug
  21. Process for Abstracting Code
      • Four-step process
        • Raw component extraction
          • Parse source code, and burst out individual syntactic elements
        • Normalization
          • Substitute type names for variables, string literals, constants (abstract to types)
        • Information filtering
          • Remove elements that are too common to yield project-specific patterns
        • Diff filtering
          • Remove code components that are common in bug and fix hunks, yielding only code unique to the change
  22. Raw Component Extraction
      • Step 1: Convert statements inside change hunks so they lie on a single line
        • Eliminate whitespace
        • Concatenate multi-line statements to one line
        • Concatenate conditionals for complex statements (if, while, etc.) to one line
      • Step 2: Extract raw components
        • Component is a non-leaf node in the syntax tree of a single line
        • Bursts out complex statements into constituent parts
          • Each portion of a complex conditional is a separate component
        • Additionally, separate out a method call and its parameters
  23. Raw Component Extraction Example
      • Initial code
          if (foo.flag > 5 && foo.ready()) {
              i=1;
              foo.create("example");
              initiate(6,bar);
          }
      • Extracted Raw Components
          foo.flag
          foo.flag > 5
          foo.ready()
          ready()
          foo.flag > 5 && foo.ready()
          if (foo.flag > 5 && foo.ready())
          i=1
          "example"
          foo.create(.) "example"
          create(.) "example"
          initiate(,) 6, bar
      [Figure: syntax tree of the if statement, with nodes if, &&, >, ., foo, flag, 5, ready()]
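
The slides say that each portion of a complex conditional becomes a separate component. The sketch below illustrates only that one step with plain string scanning; the real process works on a syntax tree, so this is a simplification. The class `ConditionalSplitter` and its method are hypothetical, and the scanner ignores operators inside string literals.

```java
import java.util.ArrayList;
import java.util.List;

class ConditionalSplitter {
    // Splits a predicate at top-level "&&" and "||" operators.
    // Operators nested inside parentheses are left untouched.
    static List<String> splitPredicate(String predicate) {
        List<String> parts = new ArrayList<>();
        int depth = 0;   // current parenthesis nesting level
        int start = 0;   // start index of the part being collected
        for (int i = 0; i < predicate.length() - 1; i++) {
            char c = predicate.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (depth == 0 && (c == '&' || c == '|') && predicate.charAt(i + 1) == c) {
                parts.add(predicate.substring(start, i).trim());
                start = i + 2;
                i++; // skip the second character of the operator
            }
        }
        parts.add(predicate.substring(start).trim());
        return parts;
    }
}
```

Applied to the predicate from the slide, `splitPredicate("foo.flag > 5 && foo.ready()")` yields the two components `foo.flag > 5` and `foo.ready()`.
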
  24. Normalization
      • To further improve the ability to match code, perform abstraction of instances to types
        • Replace variable instance with its type
          • Permits matching on type, rather than instance
          • foo.flag >= 5 → Foo.flag >= 5 (type of foo is Foo)
        • For literals, insert new component with type
          • i=1 yields int=1 and int=int
        • For method calls, replace each parameter with type of parameter
          • Use “*” for unknown types (we only do one-pass parse)
          • initiate(,) 6, bar → initiate(,) int,* (type of bar is unknown)
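
The parameter normalization on this slide might look roughly like the sketch below. It assumes a hypothetical symbol table (`knownTypes`) filled during the one-pass parse, and recognizes only integer and string literals; the class and method names are illustrative, not taken from the authors' implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class Normalizer {
    static final String UNKNOWN = "*"; // marker the slide uses for unknown types

    // Maps one argument to a type name: literals by their form,
    // variables through the (assumed) symbol table, everything else to "*".
    static String typeOf(String arg, Map<String, String> knownTypes) {
        String a = arg.trim();
        if (a.matches("-?\\d+")) return "int";
        if (a.length() >= 2 && a.startsWith("\"") && a.endsWith("\"")) return "String";
        return knownTypes.getOrDefault(a, UNKNOWN);
    }

    // Replaces each parameter of a method call with the type of that parameter.
    static String normalizeArgs(List<String> args, Map<String, String> knownTypes) {
        List<String> types = new ArrayList<>();
        for (String arg : args) {
            types.add(typeOf(arg, knownTypes));
        }
        return String.join(",", types);
    }
}
```

For the slide's example, the arguments `6, bar` with no known type for `bar` normalize to `int,*`.
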
  25. Information Filtering Goal
      • After normalization, resulting components are candidates for insertion into database
        • Problem: many commonly occurring statement types
          • int=int
        • Want to eliminate these, and others that don’t contribute unique information about bug fixes
  26. Information Filtering Approach
      • Assign an “information value” to component elements
        • Value 2:
          • method call, string literal longer than 8 chars
        • Value 1:
          • predicates for: if, do, while, for, as well as conditional expressions
          • return, case, switch, synchronized, throw
          • string literal, length 3-8 chars
          • variable name, field name, class name, variable type
        • Value 0:
          • Everything else
      • Information value for an entire component is the sum of its elemental information values
      • We remove components with information value < 2
        • int=1 (info value = 1), int=int (info value = 0)
        • “example” (info value = 1), String (info value = 0)
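
The scoring rule on this slide (sum the element values, drop components scoring below 2) can be sketched as follows. How a component is split into elements and how each element is classified are not specified on the slide, so the `Kind` enum and the `Element` class are assumptions made for illustration; string literal length is taken to mean the contents without quotes.

```java
import java.util.List;

class InformationFilter {
    // Simplified element kinds standing in for the categories on the slide.
    enum Kind { METHOD_CALL, STRING_LITERAL, PREDICATE, KEYWORD, NAME_OR_TYPE, OTHER }

    static final class Element {
        final Kind kind;
        final String text; // for string literals: contents without quotes (assumption)
        Element(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Information value of a single element, following the slide's table.
    static int elementValue(Element e) {
        switch (e.kind) {
            case METHOD_CALL:
                return 2;
            case STRING_LITERAL: {
                int length = e.text.length();
                if (length > 8) return 2;
                return length >= 3 ? 1 : 0;
            }
            case PREDICATE:
            case KEYWORD:
            case NAME_OR_TYPE:
                return 1;
            default:
                return 0;
        }
    }

    // Information value of a component: the sum of its element values.
    static int componentValue(List<Element> component) {
        int sum = 0;
        for (Element e : component) {
            sum += elementValue(e);
        }
        return sum;
    }

    // Components with an information value below 2 are removed.
    static boolean keep(List<Element> component) {
        return componentValue(component) >= 2;
    }
}
```

In this sketch the lone literal “example” (7 characters) scores 1 and is filtered out, while a method call with that literal as its argument scores 3 and is kept.
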
  27. Diff Filtering and Storing Memories
      • As a final filtering step, keep only those components that are unique to either bug or fix hunks
        • Duplicate components are eliminated, since they do not represent the bug or its fix
      • After diff filtering step, store all components into the database (“memory”)
        • Components record their transaction, file name, bug or fix hunk, etc.
        • Also store initial source code of bug and fix hunks
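
Diff filtering as described here amounts to a set difference in each direction. The sketch below assumes components are compared as normalized strings; the class `DiffFilter` and the method `uniqueTo` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class DiffFilter {
    // Returns the components of 'own' that do not also appear in 'other'.
    // Call it once with (bug, fix) and once with (fix, bug) to obtain the
    // components unique to the bug hunk and to the fix hunk respectively.
    static Set<String> uniqueTo(Set<String> own, Set<String> other) {
        Set<String> result = new LinkedHashSet<>(own); // keeps insertion order
        result.removeAll(other);
        return result;
    }
}
```

A component that occurs in both the bug hunk and the fix hunk disappears from both results, which matches the slide's point that shared components do not represent the bug or its fix.
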
  28. Searching the Memory
      • The memory database contains extracted adaptive bug and fix patterns for a given project
      • Can use this memory to find code that matches bug code in the memory
      • Use scenario
        • Developer working in their favorite development environment
        • Receives feedback when code they are developing matches a stored bug pattern
        • Can also suggest potential fixes from stored bug fix code
  29. IDE Integration
      [Screenshots: bug detection, fix suggestion]
  30. Evaluation
      • We evaluated the memory to determine how well it captures new bug fix changes
        • Online learning approach
        • Specifically, we create a memory for transactions 1 to n-1
        • At transaction n, for bug fix changes we examine whether the bug hunks are found in the memory
          • This is a “half hit”
        • If found, we also examine whether the fix hunk is found too
          • This is a “full hit”
        • Examined same 5 project histories
          • ArgoUML, Columba, Eclipse, jEdit, Scarab
      • This can be viewed as a proxy for how well the approach might work for bug and fix prediction
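
The half hit and full hit checks can be sketched with a small in-memory map. This is an illustrative model only: the real memory is a database of abstracted components, and the sketch assumes that a full hit requires the fix to be stored together with the matching bug pattern, which the slide does not state explicitly. `BugFixMemory`, `remember`, and `lookup` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class BugFixMemory {
    enum Hit { NONE, HALF, FULL }

    // Maps each stored bug component to the fix components recorded with it.
    private final Map<String, Set<String>> fixesByBug = new HashMap<>();

    // Called for every bug fix change in transactions 1 .. n-1.
    void remember(String bugComponent, String fixComponent) {
        fixesByBug.computeIfAbsent(bugComponent, k -> new HashSet<>()).add(fixComponent);
    }

    // Called for a bug fix change at transaction n.
    Hit lookup(String bugComponent, String fixComponent) {
        Set<String> fixes = fixesByBug.get(bugComponent);
        if (fixes == null) return Hit.NONE;                    // bug part not in memory
        return fixes.contains(fixComponent) ? Hit.FULL : Hit.HALF;
    }
}
```

In the jEdit example, once the change at transaction 86 has been remembered, the identical change at transaction 114 would count as a full hit.
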
  31. Half and Full Hit
      • Build memories based on transactions 1 .. n-1
      • Memories store bug | fix pairs
      • Fix change case at transaction n: a half hit matches the bug part; a full hit matches both the bug and the fix parts
  32. True and False Positives
      • Build memories based on transactions 1 .. n-1
      • Fix change case at transaction n: true positive half hit, if found
      • Non-fix change case at transaction n: false positive half hit, if found
  33. True Positive Hit Rates
  34. False Positive Hit Rates
  35. True Positive and False Positive Full Hit Rates
  36. True Positive and False Positive Full Hit Rates
      • Bug fix memories work well
        • Captures 19.3%-40.3% of bugs (half-hits)
        • But, also captures a lot of non-bug changes (20.8%-32.5%)
  37. PMD vs. Fix Memories
      • PMD is a bug finding tool based on a static syntax checker
      [Diagram labels: Bug]
  38. PMD vs. Fix Memories
      • PMD is a bug finding tool based on a static syntax checker
      [Diagram labels: Bug, PMD]
  39. PMD vs. Fix Memories
      • PMD is a bug finding tool based on a static syntax checker
      [Diagram labels: Bug, PMD, Fix Memories]
  40. PMD vs. Fix Memories
      • PMD is a bug finding tool based on a static syntax checker
      [Diagram labels: Bug, PMD, Fix Memories]
  41. PMD vs. Fix Memories
      • PMD is a bug finding tool based on a static syntax checker
      • Bugs found by PMD and by fix memories are largely exclusive
      [Figure: diagrams comparing PMD and Fix Memories. ArgoUML: 40.3%, 6.5%, 3%. Eclipse: 38.7%, 6.5%, 2.3%]
  42. Conclusions
      • It is now possible to reliably extract bug fix memories from software project evolution data
      • Bug fix memories work well
        • Captures 19.3%-40.3% of bugs (half-hits)
        • But, also captures a lot of non-bug changes (20.8%-32.5%)
      • Bugs found using fix memories and PMD are mostly exclusive
        • Our approach complements other bug finding tools
  43. Future Work
      • Developing other pattern extracting algorithms
        • To remove false positives
        • AST, Slicing, Control flow, etc.
      • Comparing fix memories with more bug finding tools
        • FindBugs, JLint, etc.
