Using Developer Information as a Prediction Factor


Elaine Weyuker, Thomas Ostrand, and Robert Bell

Published in: Technology

  1. Using Developer Information as a Factor for Fault Prediction. May 20, 2007. Elaine Weyuker, Tom Ostrand, Bob Bell. AT&T Labs – Research.
  2. Goal: To determine which files of a software system with multiple releases are particularly likely to contain large numbers of faults.
  3. Why is this important? Because it should let us build highly dependable software systems more economically, by better allocating testing effort and resources, including personnel. Prioritize testing.
  4. Infrastructure: Projects use an integrated change management/version control system. Any change to the software requires that a modification request (MR) be opened. MRs include information such as the reason for the change, a description of the change, a severity rating, the actual change, and the development stage during which the MR was initiated.
  5. Explanatory Variables
       • Size of file: log(KLOC).
       • Age of file: 0, 1, 2-4, >4.
       • Whether the file is new to the current release and, if not, whether it was changed during the prior release.
       • Sqrt(number of changes in the previous release).
       • Sqrt(number of changes two releases ago).
       • Sqrt(number of faults in the previous release).
       • Programming language used.
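The explanatory variables listed on this slide can be sketched as a small feature function. This is only an illustration: the `FileHistory` record, its field names, and the output keys are hypothetical, the natural logarithm is assumed for log(KLOC), and file age is assumed to be counted in releases. The slides do not say how the variables were encoded or which model consumed them.

```python
import math
from dataclasses import dataclass


@dataclass
class FileHistory:
    """Hypothetical per-file record; all field names are illustrative."""
    kloc: float                      # file size in thousands of lines, must be > 0
    age: int                         # assumed: number of prior releases containing the file
    changed_in_prior_release: bool
    changes_prior_release: int
    changes_two_releases_ago: int
    faults_prior_release: int
    language: str


def age_bucket(age: int) -> str:
    """Map a file age onto the slide's categories: 0, 1, 2-4, >4."""
    if age <= 1:
        return str(age)
    return "2-4" if age <= 4 else ">4"


def file_status(f: FileHistory) -> str:
    """New to the current release, or else changed/unchanged in the prior one."""
    if f.age == 0:
        return "new"
    return "changed" if f.changed_in_prior_release else "unchanged"


def explanatory_variables(f: FileHistory) -> dict:
    """Build one row of explanatory variables for a file."""
    return {
        "log_kloc": math.log(f.kloc),  # assumption: natural log
        "age": age_bucket(f.age),
        "status": file_status(f),
        "sqrt_changes_prior": math.sqrt(f.changes_prior_release),
        "sqrt_changes_two_ago": math.sqrt(f.changes_two_releases_ago),
        "sqrt_faults_prior": math.sqrt(f.faults_prior_release),
        "language": f.language,
    }
```

The square roots and the logarithm dampen the effect of a few very large or very heavily changed files, which is presumably why the slide lists transformed counts rather than raw ones.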
  6. Systems Studied

       System Type           Period Covered   Faults in Identified 20% of Files
       Inventory             4 years          83%
       Provisioning          2 years          83%
       Voice Resp            2.25 years       75%
       Maintenance Support   9 years          84%
  7. Maintenance Support System
       • Developed and maintained by a different company.
       • Very mature system: 9 years of field data.
       • The 20% of the files identified by our model contained 84% of the faults.
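The "20% of the files contained 84% of the faults" figure is a ranking metric, and a minimal sketch of how such a number could be computed follows. The function name and the two dictionaries are hypothetical; the slides do not describe how ties or rounding of the 20% cutoff were handled, so truncation is assumed here.

```python
def percent_faults_in_top_files(predicted, actual, fraction=0.20):
    """Percentage of actual faults found in the top `fraction` of files.

    predicted: dict mapping file name -> predicted fault count (any score works)
    actual:    dict mapping file name -> faults actually found in the release
    """
    # Rank files from most to least fault-prone according to the prediction.
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    # Assumption: the cutoff is truncated, with at least one file selected.
    n_selected = max(1, int(fraction * len(ranked)))
    selected = ranked[:n_selected]

    total_faults = sum(actual.values())
    if total_faults == 0:
        return 0.0
    found = sum(actual.get(name, 0) for name in selected)
    return 100.0 * found / total_faults
```

For example, with five files where the top-ranked file holds 8 of 10 actual faults, the function returns 80.0 for the default 20% cutoff.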
  8. Adding Developer Information to Improve Predictions for Changed Files
       • The number of developers who modified the file during the prior release.
       • The number of new developers who modified the file during the prior release.
       • The cumulative number of distinct developers who modified the file during all releases through the prior release.
       • NB: Don’t know who created the file.
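The three developer counts on this slide could be derived from change records roughly as follows. This is a sketch under assumptions: the `(release, file, developer)` tuple layout and the output key names are hypothetical, releases are assumed to be numbered with integers, and a "new" developer is taken to mean one who did not modify the file in any earlier release. Because the creator of a file is not known, only modifications are counted.

```python
from collections import defaultdict


def developer_counts(changes, prior_release):
    """Compute per-file developer counts as of `prior_release`.

    changes: iterable of (release, file, developer) tuples (hypothetical layout).
    Changes made after `prior_release` are ignored.
    """
    prior = defaultdict(set)    # file -> developers who modified it in the prior release
    earlier = defaultdict(set)  # file -> developers who modified it before the prior release
    for release, path, developer in changes:
        if release == prior_release:
            prior[path].add(developer)
        elif release < prior_release:
            earlier[path].add(developer)

    counts = {}
    for path in set(prior) | set(earlier):
        p, e = prior[path], earlier[path]
        counts[path] = {
            "developers_prior_release": len(p),
            # Assumption: "new" means not seen on this file in any earlier release.
            "new_developers_prior_release": len(p - e),
            "cumulative_developers": len(p | e),
        }
    return counts
```

Keeping the prior-release and earlier developer sets separate makes all three counts simple set operations on the same two sets.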
  9. Cumulative Number of Developers After 20 Releases (526 Files, Mean 3.54)
  10. Mean Cumulative Number of Developers by File Age (Age 20 = 3.54)
  11. Proportion of Changed Files with Multiple Developers by File Age
  12. Proportion of Changed Files with at Least 1 New Developer by File Age
  13. Percentage Faults in Identified 20% Files

       Release Number   W/O Developers   With Developers
       6-10             78               79
       11-15            71               73
       16-20            84               86
       21-25            89               88
       26-30            90               91
       31-35            92               92
       Mean Rel 6-35    83.9             84.9
  14. Conclusions
       • Using developer information helps, but only a little bit. Factors like size and whether or not the file is new or changed are much more important.