
International Journal of Information Technology & Management Information System (IJITMIS)
ISSN 0976 – 6405 (Print), ISSN 0976 – 6413 (Online)
Volume 4, Issue 3, September - December (2013), pp. 136-146
© IAEME: http://www.iaeme.com/IJITMIS.asp
Journal Impact Factor (2013): 5.2372 (Calculated by GISI), www.jifactor.com

PREDICTING SOFTWARE DEFECTS WITH ASSOCIATION MINING TO DISCOVER DEFECT PATTERN BY USING ABDP

Dr. R. Dillibabu1, K. Karnavel2, L. Sudha3
2 Research Scholar, Department of Industrial Engineering, Anna University, Chennai.
1 Associate Professor, Department of Industrial Engineering, Anna University, Chennai.
3 PG student, Department of Management Studies, Anna University, Chennai.

ABSTRACT

This paper presents an efficient problem-solving approach for detecting defects at an early stage, based on Action-Based Defect Prediction (ABDP) and Feature Subset Selection (FSS). Factors causing defects vary according to the attributes of a project, including the experience of the developers, the product complexity, the development tools and the schedule. The most significant challenge for a project manager is to identify actions that may incur defects before the action is performed. Actions performed in different projects may yield different results, which are hard to predict in advance. In practice, this approach: (i) accurately predicts actions that cause many defects by mining records of performed actions; (ii) applies under-sampling to the data set to increase the precision of predictions for subsequent actions; (iii) is applied to a business project, revealing that under-sampling with FSS successfully predicts the problematic actions during project execution.
This method applies association rule mining to find defect patterns, and multi-interval discretization to handle the continuous attributes of actions. The discovered patterns can then be applied to predict the defects generated by subsequent actions, so that the required corrective actions can be taken to avoid defects. In addition, to improve software product quality, when two or more failure modes have the same Risk Priority Number (RPN), they can be prioritized with the help of Failure Mode and Effects Analysis (FMEA) and the Risk Priority Code (RPC). If the RPC values also tie, a more detailed selection can be made. The proposed technique was applied to an industry project, giving good prediction results and indicating the efficiency of the proposed approach. The detected actions not only provide the information needed to avoid possible defects, but also facilitate software process improvement.
Keywords: FMEA; Software defect prediction; Classification; Mining rarity

INTRODUCTION

Software defects not only influence the quality of software products, but can also increase the effort involved in a project. Furthermore, undetected defects may need significant effort to detect and remove in subsequent software development stages (Pressman, 2001). A high defect rate is also an important factor in project cost and schedule overruns (Jones, 1994). Rather than detecting existing defects, defect prevention can be applied to prevent defects from occurring, as in Causal Analysis and Resolution (CAR) at maturity level 5 (ML 5) of the Capability Maturity Model Integration (CMMI). The objective of CAR is to detect, analyze, and prevent defects or other process problems from occurring in the future (Chrissis et al., 2003).

This study proposes an action-based defect prediction (ABDP) approach, which classifies the records of performed actions to predict whether subsequent actions in the same project will cause defects. An action is defined herein as an operation performed based on a task in the Work Breakdown Structure (WBS) of the project. Rather than focusing on the reported defects, ABDP mines the patterns of actions that may cause defects, and uses the analytical results to predict whether subsequent actions are likely to generate defects. Once actions with a high probability of causing defects are identified, stakeholders can review these actions carefully and take appropriate corrective actions. Newly performed actions are continually appended to the historic data set to construct a new prediction model for subsequent actions.
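The predict-then-append cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: the "model" here is a simple frequency table keyed by feature values, standing in for whatever classifier is actually used, and the feature tuples and labels in `stream` are hypothetical.

```python
from collections import Counter, defaultdict

HIGH, LOW = "High-defect", "Low-defect"


def build_model(history):
    """Count labels per feature tuple (a stand-in for the real classifier)."""
    table = defaultdict(Counter)
    for features, label in history:
        table[features][label] += 1
    return table


def predict(model, features, default=LOW):
    """Return the most frequent label seen for these features, else `default`."""
    counts = model.get(features)
    if not counts:
        return default
    return counts.most_common(1)[0][0]


def run_in_process(actions):
    """Predict each action before it runs, then append its outcome and rebuild."""
    history, predictions = [], []
    for features, actual_label in actions:
        model = build_model(history)  # built only from previously performed actions
        predictions.append(predict(model, features))
        history.append((features, actual_label))  # outcome is known after execution
    return predictions


# Hypothetical stream: (Action_Type, Action_Complexity) -> observed label
stream = [
    (("N", 1), HIGH),
    (("M", 0), LOW),
    (("N", 1), HIGH),
    (("M", 0), LOW),
]
predictions = run_in_process(stream)
```

Rebuilding the model on every step keeps the sketch short; a real pipeline would more likely retrain periodically.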
To address the imbalanced data set problem, where the number of actions causing defects is fairly small, this study applies under-sampling techniques to the data set and compares the results with those of over-sampling. The comparison indicates that under-sampling achieves more precise predictions than over-sampling. ABDP also adopts the Feature Subset Selection (FSS) technique to select the important attributes and thus improve the prediction accuracy. The advantages of applying ABDP to measure the process are as follows:

• In-process prediction: The data used to construct the prediction model are obtained from the same project, which decreases the variance between different projects.
• Requires less effort to collect data: Actions and defect reporting are common procedures for most software teams, and the required data can be collected from these reports.
• Reduces the effort in identifying problems in the process: The detected actions that are likely to cause defects can be further analyzed and reviewed in the causal analysis meeting, thus reducing the effort involved in identifying problematic actions.

FAILURE MODE AND EFFECTS ANALYSIS

Failure Mode and Effects Analysis (FMEA) is commonly defined as “a systematic process for identifying potential design and process failures before they occur, with the intent to eliminate them or minimize the risk associated with them”. The FMEA technique was first reported in the 1920s, but its use has only been significantly documented since the early 1960s. It was developed in the USA in the 1960s by the National Aeronautics and Space Administration (NASA) as a means of improving the reliability of military equipment. The criticality part of the analysis prioritizes the failures for corrective action based on the probability of the item’s failure mode and the severity of its effects. It uses linguistic terms to rank the probability of the failure mode occurrence, the severity of its failure effect and the probability of the failure being detected, each on a numeric scale from 1 to 10. These rankings are then multiplied to give the Risk Priority Number (RPN). Failure modes having a high RPN are assumed to be more important and are given a higher priority than those having a lower RPN [2].

In the RPN methodology, the parameters used to determine the “criticality” of an item failure mode are the severity of its failure effects, its frequency of occurrence, and the likelihood that subsequent testing of the design will detect that the potential failure mode actually occurs. Tables I, II and III show the qualitative scales commonly used for the severity, occurrence and detectability indexes [3]. Severity is ranked according to the seriousness of the failure mode effect on the next higher level assembly, the system or the user. Occurrence is ranked according to the failure probability, which represents the relative number of failures anticipated during the design life of the item. The effects of a failure mode are normally described by the effects on the user of the product, or as they would be seen by the user. Detectability is an assessment of the ability of a proposed design verification program to identify a potential weakness before the part or assembly is released for production. The RPN is the mathematical product of the severity, the occurrence and the detection. In equation form, RPN = S × O × D. The number is used to identify the most critical failure mode, leading to corrective action [4].
TABLE I: Detailed report of FMEA Severity

TABLE II: Detailed report of FMEA Probability

TABLE III: Detailed report of FMEA Detectability
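As a small worked illustration of the RPN product described in this section, the sketch below computes RPN = S × O × D for a few failure modes and ranks them from highest to lowest. The `FailureMode` class, the failure mode names and the rankings are hypothetical; only the 1 to 10 scale and the product come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # S, ranked 1..10
    occurrence: int  # O, ranked 1..10
    detection: int   # D, ranked 1..10

    def __post_init__(self):
        # Reject rankings outside the 1..10 scale used by the RPN method.
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: the product S * O * D."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Order failure modes from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical failure modes and rankings
modes = [
    FailureMode("missing input validation", 7, 4, 3),
    FailureMode("wrong default configuration", 5, 2, 6),
    FailureMode("unhandled timeout", 8, 3, 5),
]
ranked = rank_by_rpn(modes)
```

Because the score is a plain product, very different combinations of S, O and D can land on the same value, so a ranking like this one is best read as a first-pass ordering.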
DRAWBACKS OF TRADITIONAL FMEA APPROACH

The traditional FMEA has been a well-accepted safety analysis method; however, it suffers from several drawbacks. The first drawback is the method that the traditional FMEA employs to achieve a risk ranking. The purpose of ranking risks in order of importance is to assign limited resources to the most critical risk items [8]. The traditional FMEA approach uses an RPN to evaluate the risk level of a component or process. The RPN is obtained by multiplying three factors: the severity of the failure (S), the probability of occurrence (O) and the probability of detection (D). The most critical disadvantage of the traditional FMEA is that different sets of S, O and D may produce an identical RPN value even though the risk implications are totally different. For example, consider two different events having values of 2, 3, 2 and 4, 1, 3 for S, O and D respectively. Both events have an RPN of 12 (RPN1 = 2 × 3 × 2 = 12 and RPN2 = 4 × 1 × 3 = 12); however, the risk implications of these two events are not necessarily the same. This could entail a waste of resources and time or, in some cases, a high-risk event going unnoticed.

BACKGROUND

Causal analysis

Causal analysis is an approach used to identify the causes of defects. The main procedures of causal analysis are item selection and analysis (CMMI Product Team, 2001). To select the defect items for analysis, a defect classification schema can be adopted to categorize the reported defects (Chillarege et al., 1992), which can then be prioritized according to frequency of occurrence, defect severity, cost of impact and type of defect (Mohapatra and Mohanty, 2001).
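The item-selection step just described (categorize reported defects, then prioritize the categories) might be sketched as follows. The order of the sort keys, the record fields and the sample defect types are assumptions made for illustration; the text names the criteria but not how they are combined.

```python
from collections import defaultdict


def prioritize(defects):
    """Group defects by type, then sort the groups by frequency,
    worst severity and total cost (all descending)."""
    groups = defaultdict(lambda: {"count": 0, "max_severity": 0, "cost": 0.0})
    for d in defects:
        g = groups[d["type"]]
        g["count"] += 1
        g["max_severity"] = max(g["max_severity"], d["severity"])
        g["cost"] += d["cost"]
    return sorted(
        groups.items(),
        key=lambda kv: (kv[1]["count"], kv[1]["max_severity"], kv[1]["cost"]),
        reverse=True,
    )


# Hypothetical reported defects
defects = [
    {"type": "interface", "severity": 2, "cost": 1.5},
    {"type": "assignment", "severity": 3, "cost": 0.5},
    {"type": "interface", "severity": 4, "cost": 2.0},
    {"type": "algorithm", "severity": 5, "cost": 4.0},
]
selected = [name for name, _ in prioritize(defects)]
```

The defect types at the head of `selected` would be the first candidates for the causal analysis meeting.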
The selected defects can then be analyzed in detail in a causal analysis meeting, where brainstorming is a common approach. However, these methods focus on the reported defects rather than measuring the actions in advance, whereas measurement of actions can provide practical predictions that prevent defects from occurring. To define a schema of actions, the Multi-User Dimension (MUD) refines the process into tasks, transactions and actions that can be used to support the data collection stage of the software development process (Doppke et al., 1997).

The prediction model

Data mining techniques can be applied to build models that describe the behavior of the processes from the collected data and predict the possible results of subsequent actions. Classification with a decision tree is one of the common approaches for analyzing the data (Han and Kamber, 2001). To predict the actions that are likely to cause defects, two major problems have to be solved before applying the classification tree model for defect prediction: the rarity problem and the irrelevant feature problem. The rarity problem occurs because the number of actions that cause defects (the minority class) is small compared to the number of actions that do not cause any defects (the majority class). Sampling is commonly used to solve the rarity problem. Under-sampling can be used to reduce the number of instances in the majority class, while over-sampling is used to increase the number in the minority class (Weiss, 2004). Selected attributes for classification may be redundant or irrelevant, causing actions to be classified incorrectly. Feature subset selection can be applied to address the irrelevant feature problem, where only the relevant attributes are selected to construct the model (Dy and Brodley, 2000). The wrapper and the filter are two common approaches used for feature selection. The wrapper wraps the FSS and the induction algorithm as a black box, where the feature space is searched to find a good subset of features, which is then evaluated by the induction algorithm (Kohavi and John, 1996). To facilitate the feature selection process, a search strategy can be utilized to select a desired feature subset within a reasonable time, such as sequential forward search, hill-climbing search or best-first search. Best-first search with forward search is a common method applied with CFS (correlation-based feature selection), and achieves good results (Russell and Norvig, 1995).

PROPOSED METHODOLOGY

The proposed methodology has been developed based on Failure Mode and Effects Analysis and Action-Based Defect Prediction. The two are integrated at an early stage of defect prediction. The execution of a software process can be treated as a sequence of actions executed in sequence or in parallel to achieve the objective of the project (Fig. 1).

Fig. 1. FMEA-Action Based Defect Prediction Model

The ABDP approach proposed herein treats the action as the basic element used to execute a task of the WBS. An action can be as small as an operation to correct a bug, or as large as coding a module. The execution of an action can be divided into three stages, namely planning, execution and reporting. The planning stage plans the execution of the action, including the description of the action, the resources it requires and the work products to be produced. The stakeholders can then perform the planned action. The results of the performed action, such as the actual effort used to execute the action and the defects detected by the action, can be reported after execution. A set of features must be defined to collect the data from the actions.

PROCEDURE FOR EXECUTING AN ACTION

First, the action is planned to determine the values of predecessor features, such as the action description, action type, originator and state. The predecessor features are then submitted to the prediction engine, which applies an existing prediction model constructed from previously performed actions. The engine responds with a prediction for the submitted action. The submitted action may need to be re-planned if it is predicted to be a High-defect action, where an action is defined as High-defect if the number of defects generated by the action is greater than 3, and as Low-defect otherwise. By using the ABDP approach, the data set of performed actions can be generated to build the prediction model. The number of defects generated by an action can also be used to classify actions into three levels: low-defect (fewer than 3), medium-defect (between 3 and 5) and high-defect (more than 5). The prediction model can then be applied to predict whether the submitted action will cause many defects (Fig. 2).

Fig. 2. The execution of an action

For instance, an action to create a new module can be planned as follows (only some of the features are shown):

• Action_State = 0 (scheduled)
• Action_Type = N (create a new module)
• Action_Complexity = 0 (evaluated as Low complexity)
• Object_Type = 3 (work on application)
• Num_of_action_objects = 1 (one module will be worked on)
• Originator = 4 (performed by programmer)
• Link_By = - (this is a root action)
• Effort_Expected = 6 (the effort expected to execute the action)

The action submitted with the above features is predicted as High-defect according to the values of Action_Type, Object_Type, Action_Complexity and Effort_Expected. To avoid High-defect actions, the submitted action can be modified, for example by decomposing it into two or more actions to reduce the value of Effort_Expected (such as below 6).

THE MAIN FEATURES OF ACTIONS

Table IV lists the main features of the action. The expected effort and complexity of the action are evaluated by the actor in advance. The originator denotes the stakeholder who invokes the action. For instance, if the customer sends a change request, which is approved by the project manager, then the originator is the customer. The originator may not be the same person as the actor who performs the action. Although actions vary in size, to reduce the complexity of individual factors, this study stipulates that one action can only be performed by one person in one task.

Table IV: The main features of Actions
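To make the data preparation concrete, the sketch below labels performed actions with the High-defect rule stated in the procedure (more than 3 defects) and then randomly under-samples the majority class, as described in the background on the prediction model. The `Action` class, its lowercase field names, the 1:1 target ratio and the sample history are assumptions for illustration, not the study's actual schema or sampling ratio.

```python
import random
from dataclasses import dataclass

HIGH_DEFECT_THRESHOLD = 3  # from the text: more than 3 defects means High-defect


@dataclass
class Action:
    action_type: str
    action_complexity: int
    object_type: int
    effort_expected: float
    defects: int  # defects attributed to the action after execution


def label(action):
    """Two-class label using the threshold stated in the procedure."""
    return "High-defect" if action.defects > HIGH_DEFECT_THRESHOLD else "Low-defect"


def under_sample(actions, seed=0):
    """Randomly drop majority-class actions until both classes are the same size
    (the 1:1 ratio is an assumption of this sketch)."""
    rng = random.Random(seed)
    high = [a for a in actions if label(a) == "High-defect"]
    low = [a for a in actions if label(a) == "Low-defect"]
    minority, majority = sorted((high, low), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Hypothetical history: 2 High-defect actions among 10 performed actions
history = [Action("N", 1, 3, 8, 5), Action("N", 1, 3, 7, 4)] + [
    Action("M", 0, 3, 2, d) for d in (0, 1, 0, 2, 3, 0, 1, 0)
]
balanced = under_sample(history)
```

The balanced list would then be passed to feature selection and to the classifier; fixing `seed` keeps the sample reproducible between runs.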
THE MAIN FEATURES OF DEFECTS

Table V lists the main features of defects, where the expected and actual effort used to fix the defect can be retrieved from the expected and used effort of Act_removed (the action used to remove the defect).

Table V: The main features of Defects

CONCLUSION

This study presents an action-based defect prediction approach that can be applied to the software development process to detect actions that may cause many defects. The ABDP approach presented in this study classifies data collected from the reports of operations and defects of the project. ABDP has three main advantages. First, the actions likely to produce defects can be predicted prior to their execution, and the in-process analysis reduces the variance between different projects. Second, the features utilized in ABDP to build the prediction model can be adapted from the existing process, which reduces the effort involved in modifying the existing process for ABDP. Third, the latest model can predict submitted actions accurately and respond quickly. The detected actions not only provide the information needed to avoid possible defects, but also facilitate software process improvement.

REFERENCES

Aversano, L., Lucia, A.D., Gaeta, M., Ritrovato, P., Stefanucci, S., Villani, M.L., 2004. Managing coordination and cooperation in distributed software processes: The GENESIS Environment. Software Process Improvement and Practice 9, 239–263.
Boehm, B., Huang, L.G., 2003. Value-based software engineering: A case study. IEEE Computer 36 (3), 33–41.
Card, D.N., 1993. Defect-causal analysis drives down error rates. IEEE Software 15 (1), 88–89.
Card, D.N., 1998. Learning from our mistakes with defect causal analysis. IEEE Software 15 (1), 56–63.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 15, 321–357.
Chillarege, R., Bhandari, I.S., Chaar, J.K., Halliday, M.J., Moebus, D.S., Ray, B.K., Wong, M.-Y., 1992. Orthogonal defect classification – A concept for in-process measurements. IEEE Transactions on Software Engineering 18 (11), 943–956.
Chrissis, M.B., Konrad, M., Shrum, S., 2003. CMMI guidelines for process integration and product improvement. Addison-Wesley, MA, pp. 143–155.
CMMI Product Team, 2001. Capability Maturity Model Integration V1.1, Stage Representation. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, USA.
Doppke, J.C., Heimbigener, D.H., Wolf, A.L., 1997. Software process modeling and execution within virtual environments. ACM Transactions on Software Engineering and Methodology 7, 1–47.
Drummond, C., Holte, R.C., 2003. C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling. In: Proceedings of the Workshop on Learning from Imbalance Data Sets II, International Conf. on Machine Learning.
Dy, J.G., Brodley, C.E., 2000. Feature selection for unsupervised learning. Journal of Machine Learning Research 5, 845–889.
Fleming, Q.W., 1998. Cost/Schedule Control Systems Criteria: The Management Guide to C/SCSC. Probus.
Florac, W.A., Carleton, A.D., 1999. Measuring the Software Process. Addison-Wesley, MA.
K. G. Johnson and M. K. Khan, “A Study into the use of the Process Failure Mode and Effects Analysis (PFMEA) in the Automotive Industry in the UK”, Journal of Materials Processing Technology, 2003, vol. 139, pp. 348-356.
John B. Bowles and C. Enrique Peláez, “Fuzzy logic prioritization of failures in a system failure mode, effects and criticality analysis”, Journal of Reliability Engineering and System Safety, 1995, vol. 50, pp. 203-213.
D. H. Stamatis, Failure Mode and Effects Analysis: FMEA from Theory to Execution, Productivity Press India Pvt. Ltd., Madras, 1997.
Paul Palady, Failure Modes and Effects Analysis: Predicting & Preventing Problems before they Occur, PT Publications Inc., FL 33409, 1995.
Rudiger Wirth, Bernd Berthold, Anita Kramer and Gerhard Peter, “Knowledge-based Support of System Analysis for the Analysis of Failure Modes and Effects”, Journal of Artificial Intelligent, 1996, vol. 9, no. 3, pp. 219-229.
Fiorenzo Franceschini and Maurizio Galetto, “A New Approach for Evaluation of Risk Priorities of Failure Modes in FMEA”, International Journal of Production Research, 2001, vol. 39, no. 13, pp. 2991-3002.
N. Ravishankar and B. S. Prabhu, “Modified Approach for Prioritization of Failures in a System Failure Mode and Effects Analysis”, International Journal of Quality and Reliability Management, 2001, vol. 18, no. 3, pp. 324-335.
Anand Pillay and Jin Wang, “Modified failure mode and effects analysis using approximate reasoning”, Journal of Reliability Engineering and System Safety, 2003, vol. 79, pp. 69-85.
Seung J. Rhee and Kosuke Ishii, “Using Cost based FMEA to Enhance Reliability and Serviceability”, Journal of Advanced Engineering Informatics, 2003, vol. 17, pp. 179-188.
S. M. Seyed-Hosseini, N. Safaei and M. J. Asgharpour, “Reprioritization of failures in a system failure mode and effects analysis by decision making trial and evaluation laboratory technique”, Journal of Reliability Engineering & System Safety, 2006, vol. 91, issue 8, pp. 872-881.
V. P. Arunachalam and C. Jegadheesan, “Modified Failure Mode and effects Analysis: A Reliability and Cost-based Approach”, The ICFAI Journal of Operations Management, 2006, pp. 7-20.
Chensong Dong, “Failure mode and effects analysis based on fuzzy utility cost estimation”, International Journal of Quality & Reliability Management, 2007, vol. 24, issue 9, pp. 958-971.
Dillibabu R., Krishnaiah K., “Cost estimation of a software product using COCOMO II.2000 model – a case study”, International Journal of Project Management 23 (2005) 297-307.
Jih Kuang Chen, “Utility Priority Number Evaluation for FMEA”, Journal of Failure Analysis and Prevention, 2007, vol. 7, no. 5, pp. 321-328.
Ying-Ming Wang, Kwai-Sang Chin, Gary Ka Kwai Poon and Jian-Bo Tang, “Risk evaluation in failure mode and effects analysis using fuzzy weighted geometric mean”, Journal of Expert Systems with Applications, to be published.
A. Mariajayaprakash, Dr. T. Senthilvelan and K. P. Vivekananthan, “Optimisation of Shock Absorber Parameters using Failure Mode and Effect Analysis and Taguchi Method”, International Journal of Mechanical Engineering & Technology (IJMET), Volume 3, Issue 2, 2012, pp. 328-345, ISSN Print: 0976 – 6340, ISSN Online: 0976 – 6359.
J. Arun, S. Pravin Kumar, M. Venkatesh and A. S. Giridharan, “A Detailed Study on Process Failure Mode and Effect Analysis of Punching Process”, International Journal of Industrial Engineering Research and Development (IJIERD), Volume 4, Issue 3, 2013, pp. 1-12, ISSN Online: 0976 – 6979, ISSN Print: 0976 – 6987.
Pravin Kumar S., Venkatakrishnan R. and Vignesh Babu S., “Process Failure Mode and Effect Analysis on End Milling Process- A Critical Study”, International Journal of Mechanical Engineering & Technology (IJMET), Volume 4, Issue 5, 2013, pp. 191-199, ISSN Print: 0976 – 6340, ISSN Online: 0976 – 6359.