ESEC/FSE 2007 presentation slide by Osamu Mizuno

  • My name is Osamu Mizuno, from Osaka University, Japan. The title of my talk is ....
  • This slide briefly shows what I tried. The main idea is the development of a new approach to detect fault-prone modules using a generic text discriminator such as a spam filter. To show its effectiveness, we performed an experiment: we used the CRM114 text discriminator as a spam filter, collected data on fault-prone modules from the Eclipse project, and performed the training-only-errors procedure. The result showed that we achieved high recall, but low precision.
  • Overview of this presentation.
  • In order to break through this situation, most spam filters nowadays implement Bayesian filters.
  • It is assumed that all e-mail messages can be classified into spam and ham: spam is undesired, and ham is desired e-mail. We then tokenize both spam and ham messages and learn them into corpuses. Incoming messages are then classified into spam or ham by the spam filter.
  • For fault-prone filtering, almost the same approach as spam filtering is adopted. First, we assume all software modules can be classified into bug-injected and not bug-injected; we call bug-injected modules fault-prone (FP) and not-bug-injected modules not fault-prone (NFP). Modules are then tokenized and learned into FP and NFP corpuses. Finally, newly developed modules are classified into FP or NFP by the filter. Note that the input of our approach is source code only.
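The tokenize-and-train step described above can be sketched as follows. This is a minimal Python illustration with a crude regex tokenizer; CRM114's actual tokenization is more sophisticated, so treat this only as a sketch of the idea.

```python
import re
from collections import Counter

def tokenize(source):
    """Split source code into identifiers, numbers, and single operator
    characters. A crude stand-in for CRM114's tokenizer, for illustration."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)

# Two corpuses: token frequency counts for FP and NFP modules.
corpus = {"FP": Counter(), "NFP": Counter()}

def train(label, source):
    """Learn a module's tokens into the corpus of its class."""
    corpus[label].update(tokenize(source))

# The slide's example: a fault-injected factorial (the ++ should be a
# decrement) and its revised version.
m_fp  = "int fact(int x) { if (x == 0) return 1; return x * fact(x++); }"
m_nfp = "int fact(int x) { if (x == 0) return 1; return x * fact(x - 1); }"

train("FP", m_fp)
train("NFP", m_nfp)
```

After training, the FP corpus counts the buggy `++` tokens while the NFP corpus counts the `-` of the fix; these differing counts are what the filter later exploits.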
  • Assume that two modules mFP and mNFP are created, and that we start with empty FP and NFP corpuses. In fact, mFP is a fault-injected module and mNFP is a revised version of mFP. The module is intended to calculate the factorial of a given integer x, but mFP has a bug: the ++ should be --. mNFP fixes this bug. Using CRM114, we obtain the tokens to be trained; the differences between the token sets TFP and TNFP are shown in red.
  • Assume that a new module mnew is created, intended to calculate the summation of a given integer x. This module also includes a bug: the operator ++ should be --. The token set for mnew is generated and applied to the FP filter, which calculates the probability that the module is fault-prone. As mentioned before, the FP and NFP corpuses include the token sets TFP and TNFP. Intuitively, by matching similarities among these token sets, the FP filter determines which corpus is more similar to Tnew; in this example, Tnew has more similarity with TFP than with TNFP (pointing at the slide). By the Bayesian formula we calculate the probability, and thus mnew is predicted as FP.
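The classification step can be sketched with a simple naive-Bayes combination of per-token probabilities. This is an illustrative approximation, not CRM114's actual formula; the tokenizer, smoothing, and tiny one-line "modules" are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(source):
    # Crude regex tokenizer standing in for CRM114's.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)

def p_fp(tokens, fp, nfp):
    """Naive-Bayes probability that a token stream is fault-prone,
    combining per-token likelihoods with add-one smoothing."""
    log_odds = 0.0
    for t in set(tokens):
        p_t_fp  = (fp[t]  + 1) / (sum(fp.values())  + 2)
        p_t_nfp = (nfp[t] + 1) / (sum(nfp.values()) + 2)
        log_odds += math.log(p_t_fp) - math.log(p_t_nfp)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Corpuses trained on the factorial example: FP holds the buggy `++`.
fp  = Counter(tokenize("return x * fact(x++);"))
nfp = Counter(tokenize("return x * fact(x - 1);"))

# New module: a summation with the same ++-for-decrement bug.
prob = p_fp(tokenize("return x + sum(x++);"), fp, nfp)
print(prob)  # above 0.5, so the new module is predicted as FP
```

The new module shares the `++` tokens with the FP corpus but shares nothing distinctive with the NFP corpus, so the combined probability exceeds 0.5.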
  • In order to show the usefulness of our approach, we have conducted an experiment. The target system is the Eclipse project.
  • For the case study, we have to collect FP modules from the target CVS repository. To do so, we adopt the algorithm presented by Sliwerski et al. at MSR 2005. Briefly speaking, we first search log messages for terms such as "issue", "problem", or "#", along with terms such as "fixed", "resolved", or "removed"; this identifies the revision in which a bug was removed. Next, we take the differences from the previous revision and identify the modified modules. Finally, we track back through CVS and identify the modules that had not been modified since the bug was reported; we consider these to be FP modules. The result of the collection: 1,973 bugs were found in the CVS log (42% of the total), yielding 9,547 FP modules and 86,770 NFP modules.
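The log-message search described above can be sketched as a pair of keyword patterns. The exact keyword sets below are illustrative assumptions; Sliwerski et al.'s algorithm is more involved than this one-function sketch.

```python
import re

# A log message signals a bug fix when it mentions a problem report
# (a bug keyword or an issue number like #1234) and a fixing action.
PROBLEM = re.compile(r"\b(bug|issue|problem)\b|#\d+", re.IGNORECASE)
ACTION  = re.compile(r"\b(fixed|resolved|removed)\b", re.IGNORECASE)

def is_fix_commit(log_message):
    """True when a CVS log message looks like a bug-removing revision."""
    return bool(PROBLEM.search(log_message) and ACTION.search(log_message))

print(is_fix_commit("Fixed issue #4711 in the parser"))  # True
print(is_fix_commit("Refactor build scripts"))           # False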
  • If an e-mail message from my boss is marked as spam, I have to click the "not junk" button, unwillingly. This is exactly "training in case of errors".
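The training-only-errors idea above (learn a module only when the filter's prediction turns out to be wrong, like clicking "not junk") can be sketched as a loop. The majority-class backend here is a toy stand-in for the real CRM114 filter, used only so the sketch runs.

```python
from collections import Counter

def training_on_errors(modules, classify, train):
    """Training Only Errors (TOE): classify each module in arrival order
    and train only when the prediction disagrees with the actual status."""
    results = []
    for source, actual in modules:   # actual status is "FP" or "NFP"
        predicted = classify(source)
        results.append((predicted, actual))
        if predicted != actual:      # the "not junk button" moment
            train(actual, source)
    return results

# Toy stand-in backend: predict the majority class trained so far.
seen = Counter()
def classify(src):
    return seen.most_common(1)[0][0] if seen else "NFP"
def train(label, src):
    seen[label] += 1

results = training_on_errors(
    [("module_a", "FP"), ("module_b", "FP"), ("module_c", "NFP")],
    classify, train)
```

Note that only the two mispredicted modules are ever learned; correctly predicted modules leave the corpuses untouched, which keeps training cheap.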
  • This slide shows the result of the training-only-errors experiment. All extracted modules are sorted by date and applied to the FP filter one by one, from the oldest. The graph shows the transitions of accuracy, recall, and precision: the x axis shows the cumulative number of modules applied to the FP filter, and the y axis shows the values of the rates. We can see that the prediction results become stable after about 50,000 module classifications; this is because there is little training data at the beginning of TOE.
  • The table shows the cumulative classification result at the final point of the experiment.
  • We have proposed a new approach to detect FP modules using a spam filter. The threat to construct validity is that it is not clear whether the collected FP modules really include faults; we need to investigate this threat as future work.
  • The result of cross validation is shown in this table. The columns show the number of modules by predicted status, and the rows by actual status. To evaluate the result, we introduce three measures. [click] Precision is the ratio of actual FP modules among the predicted FP modules. [click] Recall is the ratio of predicted FP modules among the actual FP modules. [click] Accuracy is the ratio of correctly predicted modules among all modules. Recall is an important measure for assuring quality; precision is important for the cost of testing. We can see that recall and accuracy are relatively high, but precision is low.
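The three measures can be written out directly from the 2x2 prediction table. The counts below are purely illustrative assumptions, not the experiment's actual numbers; they are chosen to reproduce the "high recall, low precision" pattern the talk reports.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, and accuracy from a 2x2 prediction table.
    tp: predicted FP, actually FP;  fp: predicted FP, actually NFP;
    fn: predicted NFP, actually FP; tn: predicted NFP, actually NFP."""
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Many false alarms (fp) but few missed faults (fn): recall and
# accuracy come out high while precision stays low.
precision, recall, accuracy = metrics(tp=80, fp=120, fn=20, tn=780)
print(precision, recall, accuracy)
```

With these counts, 80% of the truly faulty modules are caught (recall), yet only 40% of the FP warnings are real (precision).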
  • At this point, assume that the probability is calculated as 0.2 and the module is classified as NFP. As time passes, the fault-proneness of a module can be revealed; here, assume that a bug is detected in this module.

    1. Training on Errors Experiment to Detect Fault-Prone Software Modules by Spam Filter. Osamu Mizuno, Tohru Kikuno, Graduate School of Information Science and Technology, Osaka University, Japan. ESEC/FSE 2007 presentation. (C) 2007 Osamu Mizuno @ Osaka University / All rights reserved.
    2. What We Tried. Idea: fault-prone filtering, i.e., detecting fault-prone modules with a generic text discriminator such as a spam filter. Experiment: the spam filter CRM114 (a generic text discriminator), fault-proneness data from an OSS project (Eclipse), and the Training Only Errors (TOE) procedure. Result: high recall was achieved (despite low precision).
    3. Overview: Preliminary; Fault-Prone Filtering; Experiments; Training Only Errors (TOE) procedure; Results; Conclusions.
    4. Preliminary: Fault-Prone Modules. Fault-prone modules are software modules (a certain unit of source code) that may include faults. In this study: source code of Java methods that appears to include faults according to the information in a bug tracking system.
    5. Preliminary: Spam E-mail Filtering (1). Spam e-mail increases year by year; about 94% of all e-mail messages are spam. Various spam filters have been developed. The pattern-matching-based approach causes a rat race between spammers and developers, while the Bayesian-classification-based approach has been recognized as effective [1]. [1] P. Graham, Hackers and Painters: Big Ideas from the Computer Age, chapter 8, pp. 121-129, 2004.
    6. Preliminary: Spam E-mail Filtering (2). All e-mail messages can be classified into spam (undesired e-mail) and ham (desired e-mail). Learning (training): existing spam and ham messages are tokenized and learned as text data to construct a spam corpus and a ham corpus. Classification: incoming e-mail messages are classified into spam or ham by the spam filter.
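The learn-then-classify loop above can be sketched with a minimal Bayesian token classifier. This is an illustrative assumption only: CRM114's real classifiers (Markov, OSB, and others) are far more elaborate than this naive-Bayes toy.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal naive-Bayes text discriminator sketch. Each class keeps a
// token-count corpus; classification compares per-class token
// log-likelihoods with Laplace smoothing.
public class TinyFilter {
    final Map<String, Integer> spam = new HashMap<>();
    final Map<String, Integer> ham = new HashMap<>();
    int spamTotal = 0, hamTotal = 0;

    // Add a tokenized message to the spam or ham corpus.
    void train(String[] tokens, boolean isSpam) {
        Map<String, Integer> corpus = isSpam ? spam : ham;
        for (String t : tokens) corpus.merge(t, 1, Integer::sum);
        if (isSpam) spamTotal += tokens.length; else hamTotal += tokens.length;
    }

    // Probability-like score (0..1) that the token sequence is spam.
    double spamScore(String[] tokens) {
        int vocab = spam.size() + ham.size() + 1; // crude vocabulary estimate
        double logSpam = 0, logHam = 0;
        for (String t : tokens) {
            logSpam += Math.log((spam.getOrDefault(t, 0) + 1.0) / (spamTotal + vocab));
            logHam += Math.log((ham.getOrDefault(t, 0) + 1.0) / (hamTotal + vocab));
        }
        return 1.0 / (1.0 + Math.exp(logHam - logSpam)); // logistic of log-odds
    }
}
```

The same structure carries over to fault-prone filtering unchanged: only the labels (spam/ham vs. FP/NFP) and the input text (e-mail vs. source code) differ.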
    7. Overview: Preliminary; Fault-Prone Filtering; Experiments; Training Only Errors (TOE) procedure; Results; Conclusions.
    8. Fault-Prone Filtering. All software modules can be classified into bug-detected (fault-prone: FP) and not-bug-detected (not-fault-prone: NFP). Learning (training): existing FP and NFP code modules are tokenized and learned as text data to construct an FP corpus and an NFP corpus. Classification: newly developed modules are classified into FP or NFP by the FP filter.
    9. Fault-Prone Filtering: Spam Filter CRM114. CRM114 (http://crm114.sourceforge.net/) is a generic text discriminator for various purposes that implements several classifiers: Markov, OSB, kNN, and so on. Characteristic: generation of tokens. Tokens are generated by combinations of words (not single words). With the OSB tokenizer, return (x + 2 * y); yields the pairs: return x, x +, + 2, 2 *, * y (distance 1); return +, x 2, + *, 2 y (distance 2); return 2, x *, + y (distance 3); return *, x y (distance 4).
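The OSB pairing shown above can be sketched as follows. This is a simplification: real CRM114 OSB features also encode the skip distance and use CRM114's own tokenization rules, so treat it as illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of OSB-style token generation: every word is paired with each of
// the next WINDOW-1 words, so the features capture word order and context
// rather than isolated words.
public class OsbTokenizer {
    static final int WINDOW = 5; // pair each word with up to 4 following words

    static List<String> tokenize(String code) {
        // Crude word splitting for this example; CRM114 uses its own regex.
        String[] words = code.split("[^A-Za-z0-9_+*\\-]+");
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty()) continue;
            for (int j = i + 1; j < words.length && j <= i + WINDOW - 1; j++) {
                if (words[j].isEmpty()) continue;
                pairs.add(words[i] + " " + words[j]); // one sparse bigram feature
            }
        }
        return pairs;
    }
}
```

For the slide's example, return (x + 2 * y); splits into six words and produces the 14 pairs listed above.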
    10. Fault-Prone Filtering: Example (CRM114, OSB). Two modules are trained: source code mFP, public int fact(int x) { return (x<=1?1:x*fact(++x)); }, and source code mNFP, public int fact(int x) { return (x<=1?1:x*fact(--x)); }. Their OSB tokens (TFP: public int, public fact, public x, int fact, int int, ..., x ++, x x, * fact, * ++, * x, fact ++, fact x, ++ x; TNFP: the same with -- in place of ++) are trained into the initially empty FP corpus and NFP corpus, respectively.
    11. Fault-Prone Filtering: Example (CRM114, OSB), continued. A new module mnew, public int sigma(int x) { return (x<=0?0:x+sigma(++x)); }, is tokenized into Tnew: public int, public sigma, public x, int sigma, int int, ..., x ++, x x, + sigma, + ++, + x, sigma ++, sigma x, ++ x. Applying the FP filter with the trained FP and NFP corpora gives probability 0.52, predicted FP: mnew is predicted as FP because Tnew has more similarity to TFP than to TNFP.
    12. Overview: Preliminary; Fault-Prone Filtering; Experiments; Training Only Errors (TOE) procedure; Results; Conclusions.
    13. Experiments. Target: the Eclipse project, written in Java; "methods" in Java classes are considered as modules. Date of the snapshots of the CVS repository and the Bugzilla database: January 30, 2007. The CVS repository is large (about 14 GB), and faults are recorded precisely.
    14. Experiment: Collecting FP & NFP Modules. FP modules are tracked from the CVS log based on an algorithm by Sliwerski et al. [2]. Search the CVS log for terms such as "issue", "problem", "#", and bug ids, as well as "fixed", "resolved", or "removed", then identify the revision in which the bug was removed. Get the difference from the previous revision and identify the modified modules. Track back through the repository and identify the modules that had not been modified since the bug was reported: these are the FP modules. [2] J. Sliwerski et al., When do changes induce fixes? (On Fridays.), In Proc. of MSR 2005, pp. 24-28, 2005.
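The keyword search in the first step can be sketched like this. It is a hypothetical simplification of the heuristic described on the slide; the algorithm of Sliwerski et al. additionally links extracted bug ids against the Bugzilla database and validates them.

```java
import java.util.regex.Pattern;

// Flag a CVS log message as a likely bug-fixing commit when it contains
// both a bug reference and a fix-related word, as on the slide.
public class FixCommitDetector {
    static final Pattern BUG_REF =
        Pattern.compile("(?i)\\b(bug|issue|problem)\\b|#\\d+");
    static final Pattern FIX_WORD =
        Pattern.compile("(?i)\\b(fixed|resolved|removed)\\b");

    static boolean looksLikeFix(String logMessage) {
        return BUG_REF.matcher(logMessage).find()
            && FIX_WORD.matcher(logMessage).find();
    }
}
```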
    15. Experiment: Result of Module Collection. Bugs were extracted from the Bugzilla database of Eclipse under the following conditions: type of faults: Bugs; status: Resolved, Verified, or Closed; resolution: Fixed; severity: Blocker, Critical, Major, or Normal. Total # of faults: 40,627. Result of collection: # of faults found in the CVS log: 21,761 (52% of the total); # of fault-prone (FP) modules: 65,782; # of not-fault-prone (NFP) modules: 1,113,063.
    16. Overview: Preliminary; Fault-Prone Filtering; Experiments; Training Only Errors (TOE) procedure; Results; Conclusions.
    17. Training Only Errors Procedure. In spam filtering: apply e-mail messages to the spam filter in order of arrival, and train only the misclassified messages into the corpora (you may already follow this procedure in your daily e-mail sorting). In fault-prone filtering: apply software modules to the fault-prone filter in order of construction and modification, and train only the misclassified modules into the corpora.
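The TOE control flow can be sketched as follows. The Filter interface here is a hypothetical stand-in for CRM114; only the predict-first, train-on-error loop is the point.

```java
import java.util.List;

// Training-Only-Errors: modules arrive in chronological order, the filter
// predicts first, and a module is fed back into the corpora only when the
// prediction was wrong.
public class ToeLoop {
    interface Filter {
        boolean predictFaultProne(String module);
        void train(String module, boolean faultProne);
    }

    // Runs TOE over the module stream; returns the number of
    // misclassifications (i.e., the number of training events).
    static int run(Filter filter, List<String> modules, List<Boolean> actual) {
        int errors = 0;
        for (int i = 0; i < modules.size(); i++) {
            boolean predicted = filter.predictFaultProne(modules.get(i));
            if (predicted != actual.get(i)) {
                filter.train(modules.get(i), actual.get(i)); // train only on errors
                errors++;
            }
        }
        return errors;
    }
}
```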
    18. Overview: Preliminary; Fault-Prone Filtering; Experiments; Training Only Errors (TOE) procedure; Results; Conclusions.
    19. Evaluation Measurements. The result of prediction is tabulated against the actual status: N1 = actual NFP predicted NFP, N2 = actual NFP predicted FP, N3 = actual FP predicted NFP, N4 = actual FP predicted FP. Accuracy, the overall accuracy of prediction: (N1+N4) / (N1+N2+N3+N4). Recall, how many of the actual FP modules are predicted as FP: N4 / (N3+N4). Precision, how many of the predicted FP modules actually include faults: N4 / (N2+N4).
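Using the four cells of the matrix, the measures can be computed directly; plugging in the final TOE figures reported later in the deck reproduces precision 0.347, recall 0.728, and accuracy 0.908.

```java
// Confusion-matrix cells, following the slide's layout:
// N1 = actual NFP predicted NFP, N2 = actual NFP predicted FP,
// N3 = actual FP predicted NFP,  N4 = actual FP predicted FP.
public class Metrics {
    static double accuracy(long n1, long n2, long n3, long n4) {
        return (double) (n1 + n4) / (n1 + n2 + n3 + n4);
    }
    static double recall(long n3, long n4) {    // share of actual FP caught
        return (double) n4 / (n3 + n4);
    }
    static double precision(long n2, long n4) { // share of predicted FP correct
        return (double) n4 / (n2 + n4);
    }
}
```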
    20. Result of Experiment (Transition of Rates). All extracted modules are sorted by date and applied to the FP filter one by one, from the oldest. [Plot: accuracy, recall, precision, false-negative rate, and false-positive rate over the methods sorted by date, oldest to newest.] Observation: the prediction results become stable after 50,000 module classifications.
    21. Result of Experiment (Final Accuracy). Cumulative prediction result at the end of TOE (OSB): precision 0.347, recall 0.728, accuracy 0.908. Confusion matrix: actual NFP: 1,022,895 predicted NFP, 90,168 predicted FP; actual FP: 17,890 predicted NFP, 47,892 predicted FP. In other words, 72% of the actual FP modules are predicted as FP, and 34% of the modules predicted as FP actually include faults.
    22. Overview: Preliminary; Fault-Prone Filtering; Experiments; Training Only Errors (TOE) procedure; Results; Conclusions.
    23. Threats to Validity. Threats to construct validity: the collection of fault-prone modules from OSS projects could not cover all faults in the Bugzilla database; we have to collect more reliable data in future work. Threats to external validity: generalizability of the results; we have to apply fault-prone filtering to many projects, including industrial ones.
    24. Related Work. Much research has been done so far: logistic regression, CART, Bayesian classification, and more. Most of it uses software metrics (McCabe, Halstead, object-oriented, and so on). Intuitively speaking, our approach uses a new metric: the frequency of tokens.
    25. Conclusions. Summary: we proposed a new approach to detecting fault-prone modules using a spam filter, and the case study showed that our approach can predict fault-prone modules with high accuracy. Future work: using semantic parsing information instead of raw code, and using the differences between revisions as the input of fault-prone filtering, which seems more reasonable.
    26. Q&A. Thank you! Any questions?
    27. Result of Cross Validation (from a slide of MSR 2007). Result for the Eclipse BIRT plugin, 10-fold cross validation (OSB): precision 0.319, recall 0.786, accuracy 0.811. Confusion matrix: actual NFP: 70,369 predicted NFP, 16,011 predicted FP; actual FP: 2,039 predicted NFP, 7,501 predicted FP. Recall is important for quality assurance, while precision implies the cost of finding FP modules; here recall is rather high and precision is rather low.
    28. Training Only Errors Procedure, Case 2: the prediction matches the actual status. A new .java module is applied to the FP filter with the FP and NFP corpora and classified (probability 0.9, predicted FP). The actual status later turns out to be FP as well, so the module is not trained into the corpora.
    29. Training Only Errors Procedure, Case 1: the prediction does not match the actual status. A new .java module is applied to the FP filter and classified (probability 0.2, predicted NFP). Later, a bug is detected, so the actual status is FP; because the prediction was wrong, the misclassified module is trained into the FP corpus.
    30. Procedure of Experiment. Two experiments with different thresholds of probability (tFP) to determine FP and NFP; changing tFP may achieve higher recall. Experiment 1: TOE with the OSB classifier, tFP = 0.5. Experiment 2: TOE with the OSB classifier, tFP = 0.25, which predicts more modules as FP than Experiment 1.
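The threshold change amounts to nothing more than moving the cut point on the filter's output probability. A minimal sketch (the probability values in the usage below are made up for illustration, and whether the boundary itself counts as FP is an assumption):

```java
// The FP/NFP decision is a cut on the filter's FP probability; lowering
// tFP (0.5 in Experiment 1, 0.25 in Experiment 2) flags more modules as
// fault-prone, trading precision for recall.
public class ThresholdDemo {
    static boolean predictFP(double probability, double tFP) {
        return probability >= tFP;
    }

    // Number of modules that would be predicted FP at a given threshold.
    static int countFP(double[] probabilities, double tFP) {
        int n = 0;
        for (double p : probabilities) if (predictFP(p, tFP)) n++;
        return n;
    }
}
```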
    31. Result of Experiment (OSB, tFP = 0.25). Compared with the 0.50 threshold, precision becomes lower: only about 1/4 of the modules predicted as FP hit actual faulty modules. Recall becomes much higher: 83% of the actual faulty modules can be detected. [Plot: accuracy, recall, precision, false-negative and false-positive rates over the methods sorted by date.] TOE final (OSB): precision 0.232, recall 0.839, accuracy 0.835. Confusion matrix: actual NFP: 930,218 predicted NFP, 182,845 predicted FP; actual FP: 10,592 predicted NFP, 55,190 predicted FP.
