
International Journal of Information Technology & Management Information System (IJITMIS)
ISSN 0976 – 6405 (Print), ISSN 0976 – 6413 (Online)
Volume 4, Issue 3, September - December (2013), pp. 136-146
© IAEME: http://www.iaeme.com/IJITMIS.asp
Journal Impact Factor (2013): 5.2372 (Calculated by GISI)
www.jifactor.com

PREDICTING SOFTWARE DEFECTS WITH ASSOCIATION MINING
TO DISCOVER DEFECT PATTERN BY USING ABDP
Dr. R. Dillibabu1, K. Karnavel2, L. Sudha3

1 Associate Professor, Department of Industrial Engineering, Anna University, Chennai.
2 Research Scholar, Department of Industrial Engineering, Anna University, Chennai.
3 PG Student, Department of Management Studies, Anna University, Chennai.

ABSTRACT
This paper presents an efficient problem-solving approach for detecting defects at an
early stage, based on Action-Based Defect Prediction (ABDP) and Feature Subset Selection
(FSS). Factors causing defects vary according to the attributes of a project, including the
experience of the developers, the product complexity, the development tools and the
schedule. The most significant challenge for a project manager is to identify actions that
may incur defects before the actions are performed. Actions performed in different projects
may yield different results, which are hard to predict in advance. In practice, the approach
works as follows: (i) it predicts the actions that cause many defects by mining records of
performed actions; (ii) because such actions are rare in the records, under-sampling is
applied to the data set to increase the precision of predictions for subsequent actions; and
(iii) it is applied to a business project, revealing that under-sampling with FSS successfully
predicts the problematic actions during project execution. The method applies association
rule mining to discover defect patterns, and multi-interval discretization to handle the
continuous attributes of actions. The discovered patterns can then be applied to predict the
defects generated by subsequent actions, so that the required corrective actions can be taken
to avoid defects. In addition, to improve the quality of software products, if two or more
failure modes have the same RPN, they can be prioritized with the help of FMEA and the Risk
Priority Code (RPC). If there is a tie in RPC, a more detailed selection can be done. The
proposed technique is applied to an industry project, where it yields good prediction results
and demonstrates the efficiency of the proposed approach. The detected actions not only
provide the information to avoid possible defects, but also facilitate software process
improvement.



Keywords: FMEA; Software defect prediction; Classification; Mining rarity
INTRODUCTION
Software defects not only influence the quality of software products, but can also
increase the effort involved in a project. Furthermore, undetected defects may need
significant effort to detect and remove in subsequent software development stages (Pressman,
2001). Additionally, a high defect rate is an important factor in project cost and schedule
overruns (Jones, 1994). Rather than detecting existing defects, defect prevention can be
applied to prevent defects from occurring, as in the Causal Analysis and Resolution (CAR)
process area at maturity level 5 (ML 5) of the Capability Maturity Model Integration (CMMI).
The objective of CAR is to detect, analyze, and prevent defects or other process problems
from occurring in the future (Chrissis et al., 2003).
This study proposes an action-based defect prediction (ABDP) approach, which
classifies the records of the performed actions to predict whether the subsequent actions
cause defects in the same project. An action is defined herein as an operation performed
based on a task in the Work Breakdown Structure (WBS) of the project. Rather than focusing on
the reported defects, ABDP mines the patterns of actions that may cause defects, and uses the
analytical results to predict whether the subsequent actions are likely to generate defects.
Once actions with a high probability of causing defects are identified, stakeholders can
review these actions carefully and take appropriate corrective actions. The newly performed
actions are continually appended to the historic data set to construct a new prediction model
for subsequent actions. To address the imbalanced data set problem, where the number of
actions causing defects is fairly small, this study applies under-sampling techniques to the
data set, and compares the results with those of over-sampling. The comparison results
indicate that under-sampling achieves more precise predictions than over-sampling. ABDP also
adopts the Feature Subset Selection (FSS) technique to filter out the important attributes
and thus improve the prediction accuracy. The advantages of applying ABDP to measure the
process are as follows:
• In-process prediction: The data used to construct the prediction model are obtained from
the same project, which reduces the variance between different projects.
• Requires less effort to collect data: Actions and defect reporting are common procedures
for most software teams, and the required data can be collected from these reports.
• Reduces the effort in identifying the problem in the process: The detected actions that are
likely to cause defects can be further analyzed and reviewed in the causal analysis
meeting, thus reducing the effort involved in identifying problematic actions.
FAILURE MODE AND EFFECTS ANALYSIS
Failure Mode and Effects Analysis (FMEA) is commonly defined as “a systematic
process for identifying potential design and process failures before they occur, with the intent
to eliminate them or minimize the risk associated with them”. The FMEA technique was first
reported in the 1920s, but its use has only been significantly documented since the early
1960s. It was developed in the USA in the 1960s by the National Aeronautics and Space
Administration (NASA) as a means of improving the reliability of military equipment. The
criticality part of the analysis prioritizes the failures for corrective action based on the
probability of the item’s failure mode and the severity of its effects. It uses linguistic terms to
rank the probability of the failure mode occurrence, the severity of its failure effect and the
probability of the failure being detected on a numeric scale from 1 to 10. These rankings are
then multiplied to give the Risk Priority Number (RPN). Failure modes with a high RPN are
assumed to be more important and are given a higher priority than those with a lower RPN [2].
In the RPN methodology, the parameters used to determine the “criticality” of an item failure
mode are the severity of its failure effects, its frequency of occurrence, and the likelihood
that subsequent testing of the design will detect that the potential failure mode actually
occurs. Tables I, II and III show the qualitative scales commonly used for the severity, the
occurrence and the detectability indexes [3]. Severity is ranked according to the seriousness
of the failure mode effect on the next higher level assembly, the system or the user. The
effects of a failure mode are normally described by the effects on the user of the product or
as they would be seen by the user. Occurrence is ranked according to the failure probability,
which represents the relative number of failures anticipated during the design life of the
item. Detectability is an assessment of the ability of a proposed design verification
program to identify a potential weakness before the part or assembly is released for
production. The RPN is the mathematical product of the severity (S), the occurrence (O) and
the detection (D) rankings. In equation form, RPN = S × O × D. The number is used to identify
the most critical failure mode, leading to corrective action [4].
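The RPN calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `FailureMode` class, the `rank_by_rpn` helper and the three sample failure modes are assumptions made for the example and do not come from the study; only the 1 to 10 scales and the product RPN = S × O × D are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One failure mode with its three FMEA rankings (hypothetical record layout)."""
    name: str
    severity: int    # S, ranked on the 1-10 scale
    occurrence: int  # O, ranked on the 1-10 scale
    detection: int   # D, ranked on the 1-10 scale

    def rpn(self) -> int:
        """Risk Priority Number: the product S * O * D."""
        for rank in (self.severity, self.occurrence, self.detection):
            if not 1 <= rank <= 10:
                raise ValueError("rankings must lie on the 1-10 scale")
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return the failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda mode: mode.rpn(), reverse=True)


# Invented sample data, used only to exercise the functions.
modes = [
    FailureMode("hypothetical: wrong input accepted", 7, 4, 3),
    FailureMode("hypothetical: report layout broken", 3, 5, 2),
    FailureMode("hypothetical: data lost on save", 9, 2, 6),
]
ranked = rank_by_rpn(modes)
for mode in ranked:
    print(mode.name, mode.rpn())
```

Sorting by RPN gives the order in which failure modes would be considered for corrective action under the traditional scheme.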
TABLE I: Detailed report of FMEA Severity

TABLE II: Detailed report of FMEA Probability

TABLE III: Detailed report of FMEA Detectability


DRAWBACKS OF TRADITIONAL FMEA APPROACH
The traditional FMEA has been a well-accepted safety analysis method; however, it
suffers from several drawbacks. The first drawback is the method that the traditional FMEA
employs to achieve a risk ranking. The purpose of ranking risks in order of importance is to
assign the limited resources to the most critical risk items [8]. The traditional FMEA
approach uses an RPN to evaluate the risk level of a component or process. The RPN is
obtained by multiplying three factors: the severity of the failure (S), the probability of
occurrence (O) and the probability of detection (D). The most critical disadvantage of the
traditional FMEA is that different sets of S, O and D may produce an identical RPN value even
though their risk implications are totally different. For example, consider two different
events having values of 2, 3, 2 and 4, 1, 3 for S, O and D respectively. Both events have an
RPN of 12 (RPN1 = 2 × 3 × 2 = 12 and RPN2 = 4 × 1 × 3 = 12); however, the risk implications of
these two events are not necessarily the same. This could entail a waste of resources and
time or, in some cases, a high-risk event going unnoticed.
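The tie described above is easy to reproduce. The following sketch groups events that share an RPN; the function names and the third event are assumptions added for illustration, while the first two (S, O, D) triples are the ones used in the example above.

```python
from collections import defaultdict


def rpn(s, o, d):
    """Risk Priority Number as the product of the three rankings."""
    return s * o * d


def tied_groups(events):
    """Map each RPN value shared by two or more events to the names of those events.

    `events` maps an event name to its (S, O, D) triple.
    """
    groups = defaultdict(list)
    for name, (s, o, d) in events.items():
        groups[rpn(s, o, d)].append(name)
    return {value: names for value, names in groups.items() if len(names) > 1}


events = {
    "event 1": (2, 3, 2),  # from the text: RPN1 = 12
    "event 2": (4, 1, 3),  # from the text: RPN2 = 12
    "event 3": (5, 2, 2),  # hypothetical extra event with a different RPN
}
ties = tied_groups(events)
print(ties)
```

Events that land in the same group cannot be separated by the RPN alone, which is the situation in which a further prioritization step is needed.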
BACKGROUND
Causal analysis
Causal analysis is an approach used to identify the causes of defects. The main
procedures of causal analysis are item selection and analysis (CMMI Product Team,
2001). To select the defect items for analysis, a defect classification schema can be
adopted to categorize the reported defects (Chillarege et al., 1992), which can then be
prioritized according to frequency of occurrence, defect severity, cost of impact and type of
defect (Mohapatra and Mohanty, 2001).
The selected defects can then be analyzed in detail in a causal analysis meeting,
where brainstorming is a common approach. However, these methods focus on the reported
defects rather than measuring the actions in advance, whereas measurement of actions can
provide practical predictions to prevent defects from occurring. To define a schema of
actions, the Multi-User Dimension (MUD) refines the process into tasks, transactions and
actions that can be used to support the data collection stage of the software development
process (Doppke et al., 1997).
The prediction model
Data mining techniques can be applied to build models that describe the behavior of
the processes from the collected data, and to predict the possible results of subsequent
actions. Classification with a decision tree is one of the common approaches for analyzing
the data (Han and Kamber, 2001). Two major problems have to be solved before the
classification tree model can be applied to predict the actions that are likely to cause
defects: the rarity problem and the irrelevant feature problem. The rarity problem occurs
because the number of actions that cause defects (the minority class) is small compared to
the number of actions that do not cause any defects (the majority class). Sampling is
commonly used to solve the rarity problem. Under-sampling reduces the number of records in
the majority class, while over-sampling increases the number in the minority class (Weiss,
2004). The irrelevant feature problem occurs because the attributes selected for
classification may be redundant or irrelevant, causing actions to be classified incorrectly.
Feature subset selection can be applied to address this problem, so that only the relevant
attributes are selected to construct the model (Dy and Brodley, 2000). The wrapper and the
filter are two common approaches used for feature selection. The wrapper treats the FSS and
the induction algorithm as a black box: the feature space is searched for a good subset of
features, and each candidate subset is evaluated by the induction algorithm (Kohavi and John,
1996). To facilitate the feature selection process, a search strategy, such as sequential
forward search, hill-climbing search or best-first search, can be used to find a desired
feature subset within a reasonable time. Best-first search with forward search is a common
method applied with correlation-based feature selection (CFS), and achieves good results
(Russell and Norvig, 1995).
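As an illustration of how under-sampling addresses the rarity problem, the sketch below randomly discards majority-class records until both classes are the same size. It is a minimal sketch under stated assumptions: the 1:1 target ratio, the function name `undersample`, the record layout and the sample data are all choices made for this example, not details reported by the study.

```python
import random


def undersample(records, label_of, minority_label, seed=0):
    """Randomly drop majority-class records until both classes have equal size.

    `label_of` extracts the class label from a record. The 1:1 ratio is an
    assumption of this sketch; other ratios are equally possible.
    """
    minority = [r for r in records if label_of(r) == minority_label]
    majority = [r for r in records if label_of(r) != minority_label]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    kept = rng.sample(majority, min(len(minority), len(majority)))
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced


# Invented data set: 5 high-defect actions among 100 performed actions.
records = [{"id": i, "level": "High" if i < 5 else "Low"} for i in range(100)]
balanced = undersample(records, lambda r: r["level"], "High")
print(len(balanced))
```

All minority-class records are kept, so no information about the rare defect-causing actions is lost; the cost is that most majority-class records are ignored when the model is built.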
PROPOSED METHODOLOGY
The proposed methodology has been developed based on Failure Mode and Effects
Analysis and Action-Based Defect Prediction. The two techniques are integrated at an early
stage of defect prediction. The execution of a software process can be treated as a set of
actions executed in sequence or in parallel to achieve the objective of the project (Fig. 1).

Fig. 1. FMEA-Action Based Defect Prediction Model
The ABDP approach proposed herein treats the action as the basic element used to
execute a task of the WBS. The action can be as small as an operation to correct a bug, or
as large as coding a module. The execution of an action can be divided into three stages,
namely planning, execution and reporting. The planning stage is performed to plan the
execution of the action, including the description of the action, the resources required by
the action and the work products to be produced. The stakeholders can then perform the
planned action. The results of the performed action, such as the actual effort used to
execute the action and the defects detected by the action, can be reported after execution.
A set of features must be defined to collect the data from the actions.
PROCEDURE FOR EXECUTING AN ACTION
First, the action is planned to determine the values of the predecessor features, such
as the action description, action type, originator and state. The predecessor features are
then submitted to the prediction engine, which applies an existing prediction model
constructed from previously performed actions and responds with a prediction for the
submitted action. The submitted action may need to be re-planned if it is predicted to be a
High-defect action, where an action is defined as High-defect if the number of defects it
generates is greater than 3, and as Low-defect otherwise.
By using the ABDP approach, the data set of performed actions can be generated to
build the prediction model. The number of defects generated by an action can be used to
classify actions as low-defect (fewer than 3), medium-defect (between 3 and 5) and
high-defect (more than 5). The prediction model can then be applied to predict whether the
submitted action will cause many defects (Fig. 2).

Fig. 2. The execution of an action
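The three-level labelling of performed actions can be sketched as a small function. The sketch assumes that "between 3 and 5" is inclusive at both ends; the function name and the sample defect counts are hypothetical.

```python
def defect_level(num_defects: int) -> str:
    """Label a performed action by the number of defects it generated.

    Thresholds follow the three-level scheme in the text, with the
    'between 3 and 5' range assumed to be inclusive.
    """
    if num_defects < 0:
        raise ValueError("the number of defects cannot be negative")
    if num_defects < 3:
        return "low-defect"
    if num_defects <= 5:
        return "medium-defect"
    return "high-defect"


# Invented defect counts for six performed actions.
defect_counts = [0, 2, 3, 5, 6, 11]
labels = [defect_level(count) for count in defect_counts]
print(labels)
```

The resulting labels would serve as the class attribute of the data set from which the prediction model is built.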
For instance, an action to create a new module can be planned as follows (only some of the
features are shown):
• Action_State = 0 (scheduled)
• Action_Type = N (create a new module)
• Action_Complexity = 0 (evaluated as Low complexity)
• Object_Type = 3 (work on application)
• Num_of_action_objects = 1 (one module will be worked on)
• Originator = 4 (performed by programmer)
• Link_By = - (this is a root action)
• Effort_Expected = 6 (the efforts expected to execute the action)

The action submitted with the above features is predicted as High-defect according to
the values of Action_Type, Object_Type, Action_Complexity and Effort_Expected. To avoid
High-defect actions, the submitted action can be modified, for example by decomposing it
into two or more actions to reduce the value of Effort_Expected (for example, to below 6).
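The planned action and its decomposition can be sketched as follows. The field names mirror the features listed above, but everything else is an assumption: the study does not state the learned rule, so `looks_high_defect` is a hypothetical stand-in (not the actual prediction model), and `decompose` simply splits the expected effort evenly.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Action:
    """Predecessor features of a planned action (subset shown in the text)."""
    action_state: int
    action_type: str
    action_complexity: int
    object_type: int
    num_of_action_objects: int
    originator: int
    link_by: str
    effort_expected: int


def looks_high_defect(action, effort_limit=6):
    """Hypothetical stand-in rule, NOT the model learned in the study:
    flag new-module actions whose expected effort reaches the limit."""
    return action.action_type == "N" and action.effort_expected >= effort_limit


def decompose(action, parts):
    """Split the expected effort of an action across `parts` smaller actions.

    Simplification: all other features, including link_by, are copied unchanged.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(action.effort_expected, parts)
    return [
        replace(action, effort_expected=base + (1 if i < extra else 0))
        for i in range(parts)
    ]


planned = Action(
    action_state=0, action_type="N", action_complexity=0, object_type=3,
    num_of_action_objects=1, originator=4, link_by="-", effort_expected=6,
)
pieces = decompose(planned, 2)
print(looks_high_defect(planned), [looks_high_defect(p) for p in pieces])
```

In a real setting, each decomposed action would be resubmitted to the prediction engine rather than checked against a fixed rule.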
THE MAIN FEATURES OF ACTIONS
Table IV lists the main features of the action. The expected effort and complexity of
the action are evaluated by the actor in advance. The originator denotes the stakeholder who
invokes the action. For instance, if the customer sends a change request, which is approved
by the project manager, then the originator is the customer. The originator may not be the
same person as the actor who performs the action. Although actions vary in size, to reduce
the complexity of individual factors, this study stipulates that one action can only be
performed by one person in one task.
Table IV: The main features of Actions



THE MAIN FEATURES OF DEFECTS
Table V lists the main features of defects. The expected and actual effort used to fix
a defect can be retrieved from the expected and used effort of Act_removed (the action used
to remove the defect).
Table V: The main features of Defects

CONCLUSION
This study presents an action-based defect prediction (ABDP) approach that can be
applied to the software development process to detect actions that may cause many defects.
The ABDP approach classifies data collected from the reports of operations and defects of
the project. ABDP has three main advantages. First, the actions likely to produce defects
can be predicted prior to their execution, and the in-process analysis reduces the variance
between different projects. Second, the features utilized in ABDP to build the prediction
model can be adapted from the existing process, which reduces the effort involved in
modifying the existing process for ABDP. Third, the most recently built model can predict
submitted actions accurately and respond quickly. The detected actions not only provide the
information to avoid possible defects, but also facilitate software process improvement.
REFERENCES
1. Aversano, L., Lucia, A.D., Gaeta, M., Ritrovato, P., Stefanucci, S., Villani, M.L.,
2004. Managing coordination and cooperation in distributed software processes: The
GENESIS Environment. Software Process Improvement and Practice 9, 239–263.
2. Boehm, B., Huang, L.G., 2003. Value-based software engineering: A case study. IEEE
Computer 36 (3), 33–41.
3. Card, D.N., 1993. Defect-causal analysis drives down error rates. IEEE Software 15
(1), 88–89.
4. Card, D.N., 1998. Learning from our mistakes with defect causal analysis. IEEE
Software 15 (1), 56–63.
5. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: Synthetic
minority over-sampling technique. Journal of Artificial Intelligence Research 15, 321–
357.
6. Chillarege, R., Bhandari, I.S., Chaar, J.K., Halliday, M.J., Moebus, D.S., Ray, B.K.,
Wong, M.-Y., 1992.
7. Orthogonal defect classification – A concept for in-process measurements. IEEE
Transactions on Software Engineering 18 (11), 943–956.
8. Chrissis, M.B., Konrad, M., Shrum, S., 2003. CMMI guidelines for process integration
and product improvement. Addison-Wesley, MA, pp. 143–155.
9. CMMI Product Team. 2001. Capability Maturity Model Integration V1.1, Stage
Representation. Software Engineering Institute, Carnegie Mellon University,
Pittsburgh, USA.
10. Doppke, J.C., Heimbigener, D.H., Wolf, A.L., 1997. Software process modeling and
execution within virtual environments. ACM Transactions on Software Engineering
and Methodology 7, 1–47.
11. Drummond, C., Holte, R.C., 2003. C4.5, Class Imbalance, and Cost Sensitivity: Why
Under-Sampling beats Over-Sampling. In: Proceedings of the Workshop on Learning
from Imbalance Data Sets II, International Conf. on Machine Learning.
12. Dy, J.G., Brodley, C.E., 2000. Feature selection for unsupervised learning. Journal of
Machine Learning Research 5, 845–889.
13. Fleming, Q.W., 1998. Cost/Schedule Control Systems Criteria: The Management
Guide to C/SCSC. Probus.
14. Florac, W.A., Carleton, A.D., 1999. Measuring the Software Process. Addison-Wesley, MA.
15. K. G. Johnson and M. K. Khan, “A Study into the use of the Process Failure Mode and
Effects Analysis (PFMEA) in the Automotive Industry in the UK”, Journal of
Materials Processing Technology, 2003, vol. 139, pp. 348-356.
16. John B. Bowles and C. Enrique Peláez, “Fuzzy logic prioritization of failures in a
system failure mode, effects and criticality analysis”, Journal of Reliability
Engineering and System Safety, 1995, vol. 50, pp. 203-213.
17. D. H. Stamatis, Failure Mode and Effects Analysis: FMEA from Theory to Execution,
Productivity Press India Pvt. Ltd., Madras, 1997.
18. Paul Palady, Failure Modes and Effects Analysis: Predicting & Preventing Problems
before they Occur, PT Publications Inc., FL 33409, 1995.
19. Rudiger Wirth, Bernd Berthold, Anita Kramer and Gerhard Peter, “Knowledge-based
Support of System Analysis for the Analysis of Failure Modes and Effects”, Journal of
Artificial Intelligent, 1996, vol. 9, no. 3, pp. 219-229.
20. Fiorenzo Franceschini and Maurizio Galetto, “A New Approach for Evaluation of Risk
Priorities of Failure Modes in FMEA”, International Journal of Production Research,
2001, vol. 39, no. 13, pp. 2991-3002.
21. N. Ravishankar and B. S. Prabhu, “Modified Approach for Prioritization of Failures in
a System Failure Mode and Effects Analysis”, International Journal of Quality and
Reliability Management, 2001, vol. 18, no. 3, pp. 324-335.
22. Anand Pillay and Jin Wang, “Modified failure mode and effects analysis using
approximate reasoning”, Journal of Reliability Engineering and System Safety, 2003,
vol. 79, pp. 69-85.
23. Seung J. Rhee and Kosuke Ishii, “Using Cost based FMEA to Enhance Reliability and
Serviceability”, Journal of Advanced Engineering Informatics, 2003, vol. 17, pp. 179-188.


24. S. M. Seyed-Hosseini, N. Safaei and M. J. Asgharpour, “Reprioritization of failures in
a system failure mode and effects analysis by decision making trial and evaluation
laboratory technique”, Journal of Reliability Engineering & System Safety, 2006, vol.
91, issue 8, pp. 872 – 881.
25. V. P. Arunachalam and C. Jegadheesan, “Modified Failure Mode and effects Analysis:
A Reliability and Cost-based Approach”, The ICFAI Journal of Operations
Management, 2006, pp. 7-20.
26. Chensong Dong, “Failure mode and effects analysis based on fuzzy utility cost
estimation”, International Journal of Quality & Reliability Management, 2007, vol. 24,
issue 9, pp. 958 – 971.
27. Dillibabu R., Krishnaiah K., “Cost estimation of a software product using COCOMO
II.2000 model – a case study”, International Journal of Project Management 23 (2005)
297-307.
28. Jih Kuang Chen, “Utility Priority Number Evaluation for FMEA”, Journal of Failure
Analysis and Prevention, 2007, vol. 7, no. 5, pp. 321 – 328.
29. Ying-Ming Wang, Kwai-Sang Chin, Gary Ka Kwai Poon and Jian-Bo Tang, “Risk
evaluation in failure mode and effects analysis using fuzzy weighted geometric mean”,
Journal of Expert Systems with Applications, to be published.
30. A.Mariajayaprakash, Dr.T.Senthilvelan and K.P.Vivekananthan, “Optimisation of
Shock Absorber Parameters using Failure Mode and Effect Analysis and Taguchi
Method”, International Journal of Mechanical Engineering & Technology (IJMET),
Volume 3, Issue 2, 2012, pp. 328 - 345, ISSN Print: 0976 – 6340, ISSN Online:
0976 – 6359.
31. J. Arun, S. Pravin Kumar, M. Venkatesh and A.S. Giridharan, “A Detailed Study on
Process Failure Mode and Effect Analysis of Punching Process”, International Journal
of Industrial Engineering Research and Development (IJIERD), Volume 4, Issue 3,
2013, pp. 1 - 12, ISSN Online: 0976 - 6979, ISSN Print: 0976 – 6987.
32. Pravin Kumar.S, Venkatakrishnan.R and Vignesh Babu.S, “Process Failure Mode and
Effect Analysis on End Milling Process- A Critical Study”, International Journal of
Mechanical Engineering & Technology (IJMET), Volume 4, Issue 5, 2013,
pp. 191 - 199, ISSN Print: 0976 – 6340, ISSN Online: 0976 – 6359.
