Predicting Defects for Eclipse
Thomas Zimmermann • Rahul Premraj • Andreas Zeller
Saarland University
Summary
Project: Eclipse (eclipse.org)
Content: Defect counts, complexity metrics
Releases: 2.0, 2.1, and 3.0
Level: Packages and files
URL: www.st.cs.uni-sb.de/softevo/
More data: Eclipse source code
Data Source Bugs Changes
Identifying Fixes
• The change history contains ordinary changes as well as bug fixes
• Use log messages to discriminate
• Search for keywords – e.g. “bug”, “fix”
• Look out for bug IDs – e.g. “#33547”
[Figure: Bugs • Changes]
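The log-message heuristic above can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the keyword list, the `#12345` pattern, and the function names `find_bug_ids` and `is_fix` are assumptions made for this example.

```python
import re

# Assumed heuristics (not taken from the slides beyond "bug", "fix", "#33547"):
# match the whole words "bug", "bugs", "fix", "fixes", "fixed",
# and treat "#" followed by digits as a bug ID.
KEYWORDS = re.compile(r"\b(?:bugs?|fix(?:e[sd])?)\b", re.IGNORECASE)
BUG_ID = re.compile(r"#(\d+)")


def find_bug_ids(message):
    """Return all bug IDs mentioned in a log message as integers."""
    return [int(raw) for raw in BUG_ID.findall(message)]


def is_fix(message):
    """Guess whether a change is a bug fix from its log message."""
    return bool(KEYWORDS.search(message)) or bool(find_bug_ids(message))


if __name__ == "__main__":
    for msg in ["Fixed bug #33547", "Updated copyright notice"]:
        print(msg, "->", is_fix(msg), find_bug_ids(msg))
```

A heuristic like this will produce both false positives and false negatives; a real study would need to validate it against the actual repository.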
Mapping Bugs
• Each bug report has a unique bug ID
• Bug reports contain releases (and sometimes components)
• Associate bugs with changes via bug ID
[Figure: Bugs • Changes]
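As a rough sketch of the mapping step, the following Python joins bug reports to changes through the bug ID and counts distinct bugs per release and file. The data layout (a dict from bug ID to release, and a list of `(path, log message)` pairs), the `#12345` pattern, and the name `defects_per_file` are assumptions for illustration; the sample bug IDs and file names are made up.

```python
import re
from collections import Counter

BUG_ID = re.compile(r"#(\d+)")  # assumed bug-ID notation in log messages


def defects_per_file(bug_reports, changes):
    """Count distinct linked bugs per (release, file path).

    bug_reports: dict mapping bug ID -> release named in the report
    changes:     iterable of (file path, log message) pairs
    """
    linked = set()
    for path, message in changes:
        for raw_id in BUG_ID.findall(message):
            bug_id = int(raw_id)
            if bug_id in bug_reports:  # ignore IDs with no matching report
                linked.add((bug_reports[bug_id], path, bug_id))
    return Counter((release, path) for release, path, _ in linked)


# Hypothetical sample data
bug_reports = {33547: "2.0", 40001: "2.1"}
changes = [
    ("Main.java", "Fixed #33547"),
    ("Main.java", "fix for #40001 and #33547"),
    ("Util.java", "see #99999"),
]
counts = defects_per_file(bug_reports, changes)
```

Collecting the links in a set first means that a bug fixed in several changes to the same file is counted once for that file.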
Eclipse Bugs
Obtaining Data
 
• Program is open source
• Plenty of data available – automatically
• Program data – all sorts of code analyses
• Process data – changes, bugs, e-mail, etc.
• Data set provides bugs, metrics, tokens
Predictions at Work
Eclipse Bug Data
<?xml version="1.0" encoding="UTF-8"?>
<!-- comments -->
<defects project="eclipse" release="2.0" dataversion="1.0">
  <plug-in name="platform-launcher">
    <counts>
      <count id="pre" value="0" avg="0.0" compilationunits="1" max="0"/>
      <count id="post" value="0" avg="0.0" compilationunits="1" max="0"/>
    </counts>
    <package name="org.eclipse">
      <counts>
        <count id="pre" value="0" avg="0.0" compilationunits="1" max="0"/>
        <count id="post" value="0" avg="0.0" compilationunits="1" max="0"/>
      </counts>
      <package name="org.eclipse.core">
        <counts>
          <count id="pre" value="0" avg="0.0" compilationunits="1" max="0"/>
          <count id="post" value="0" avg="0.0" compilationunits="1" max="0"/>
        </counts>
        <package name="org.eclipse.core.launcher">
          <counts>
            <count id="pre" value="0" avg="0.0" compilationunits="1" max="0"/>
            <count id="post" value="0" avg="0.0" compilationunits="1" max="0"/>
          </counts>
          <compilationunit dir="/platform-launcher/library/" base="Main.java">
            <counts>
              <count id="pre" value="0"/>
              <count id="post" value="0"/>
            </counts>
          </compilationunit>
        </package>
      </package>
    </package>
  </plug-in>
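To show how the per-file counts in this format might be read, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The element and attribute names come from the excerpt above; the `SAMPLE` string is a shortened, closed-off version written for this example (the excerpt on the slide is cut off before the end of the document), and `file_defect_counts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the style of the slide excerpt; closing tags added
# here so that the snippet parses on its own.
SAMPLE = """<defects project="eclipse" release="2.0" dataversion="1.0">
  <plug-in name="platform-launcher">
    <package name="org.eclipse.core.launcher">
      <compilationunit dir="/platform-launcher/library/" base="Main.java">
        <counts>
          <count id="pre" value="0"/>
          <count id="post" value="0"/>
        </counts>
      </compilationunit>
    </package>
  </plug-in>
</defects>"""


def file_defect_counts(xml_text):
    """Return (file path, pre-release count, post-release count) per file."""
    root = ET.fromstring(xml_text)
    rows = []
    for unit in root.iter("compilationunit"):
        # Only the <counts> directly under the compilation unit is read.
        counts = {c.get("id"): int(c.get("value"))
                  for c in unit.findall("counts/count")}
        path = unit.get("dir") + unit.get("base")
        rows.append((path, counts.get("pre", 0), counts.get("post", 0)))
    return rows


rows = file_defect_counts(SAMPLE)
```

The same loop over `root.iter("package")` would give package-level counts, since packages carry their own `<counts>` element in the excerpt.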
Eclipse Bug Data
For three Eclipse releases and for all Eclipse components:
• Defect counts before and after release
• Complexity metrics
• Syntactic tokens
Download at http://www.st.cs.uni-sb.de/softevo/
Eclipse Bugs
Where do bugs come from?
Is it the Developers? Does experience matter? Bug density correlates with experience!
How about Testing? Does  code coverage  predict bug density? Yes –  the more tests,  the more bugs!
History? I found lots of bugs here.  Will there be more? Yes!
How about Metrics? Do  code metrics  predict bug density? Yes! (but only with history)
Syntactic Tokens? Which  tokens predict bug density? imports • extends • implements
Eclipse Imports
import org.eclipse.jdt.internal.compiler.lookup.*;
import org.eclipse.jdt.internal.compiler.*;
import org.eclipse.jdt.internal.compiler.ast.*;
import org.eclipse.jdt.internal.compiler.util.*;
...
import org.eclipse.pde.core.*;
import org.eclipse.jface.wizard.*;
import org.eclipse.ui.*;
• 14% of all components importing ui show a post-release defect
• 71% of all components importing compiler show a post-release defect
Joint work with Adrian Schröter • Tom Zimmermann
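Percentages of this kind can be computed as a simple conditional rate: among the components that import a given package, the share with at least one post-release defect. The sketch below is an illustration under assumptions; the record layout, the function name `defect_rate_by_import`, and the toy components are invented for the example and are not the Eclipse data.

```python
def defect_rate_by_import(components, package_prefix):
    """Share of components importing `package_prefix` that have
    at least one post-release defect, or None if none import it.

    Matching is a plain string-prefix test on the import name.
    """
    importing = [c for c in components
                 if any(i.startswith(package_prefix) for i in c["imports"])]
    if not importing:
        return None
    defective = sum(1 for c in importing if c["post_defects"] > 0)
    return defective / len(importing)


# Hypothetical toy data, not taken from the data set
components = [
    {"name": "A", "imports": ["org.eclipse.jdt.internal.compiler.ast"], "post_defects": 2},
    {"name": "B", "imports": ["org.eclipse.jdt.internal.compiler"], "post_defects": 0},
    {"name": "C", "imports": ["org.eclipse.ui"], "post_defects": 0},
    {"name": "D", "imports": ["org.eclipse.ui"], "post_defects": 0},
    {"name": "E", "imports": ["org.eclipse.ui",
                              "org.eclipse.jdt.internal.compiler.util"], "post_defects": 3},
]

compiler_rate = defect_rate_by_import(components, "org.eclipse.jdt.internal.compiler")
ui_rate = defect_rate_by_import(components, "org.eclipse.ui")
```

A rate like this describes correlation in past releases only; it says nothing by itself about why components importing a package are defect-prone.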
Eclipse Imports
Correlation with failure • Correlation with success
import org.eclipse.jdt.internal.compiler.lookup.*;
import org.eclipse.jdt.internal.compiler.*;
import org.eclipse.jdt.internal.compiler.ast.*;
import org.eclipse.jdt.internal.compiler.util.*;
...
import org.eclipse.pde.core.*;
import org.eclipse.jface.wizard.*;
import org.eclipse.ui.*;
What makes code buggy in the first place?
