A Machine Learning approach to predict Software
Defects
Chetan Hireholi
February 19, 2017
Abstract
Software engineering teams are not only involved in developing new versions of a
product but are often involved in fixing customer reported defects. Customers report issues
faced by a particular software and some of these issues may actually require the engineering
team to analyze and potentially provide a fix for the issue. The number of defects that a
software engineering team has to analyze is significant and teams often prioritize the order
of the defects based on the customer’s priority and to the extent the defect impacts the
business operations of the customer. Often, it is likely that the engineering team may not
truly understand the business impact that a defect is likely to have and this results in the
customer escalating the defect up the engineering team’s management chain seeking more
immediate attention to their problem.
Such escalated defects tend to consume a lot of engineering bandwidth
and increase defect handling costs; further, such escalations impact existing plans as all crit-
ical resources are diverted to handle these cases. Software escalations are classified under
three categories: Red, Yellow and Green. A defect report with high business impact is
prioritized and marked Red; Yellow is the neutral category, which may be escalated if
appropriate attention is not given; and Green reports have the lowest priority.
The engineering team understands the nature of the software escalation and allocates the
resources appropriately. The objective of this project is to be able to analyze software defects
and predict the defects that are likely to be escalated by the customer. This would permit
the engineering team to be alerted about the situation and take proactive measures that
will give better support to the customers. For the purpose of our analysis, we have used the
defects database provided by Hewlett-Packard (HP) India. We used R to clean and
pre-process the database. We then extracted keywords using natural language processing
and applied machine learning (J48 decision tree, Naïve Bayes and Simple K-Means) to
predict the escalations. Thus, by combining the extracted keywords with the tickets received
by the team, we can predict the nature of an escalation and alert the engineering team so
that they can take appropriate steps.
Acknowledgment
I would like to take this opportunity to thank the many eminent personalities without
whose constant encouragement and support this endeavor of mine would not have been successful.
Firstly, I would like to thank PES University for having the ”Final Year Project”
as a part of my curriculum, which gave me a wonderful opportunity to work on my research
and presentation abilities, and for providing excellent facilities, without which this project
could not have acquired the orientation it has now.
At the outset, I would like to thank Prof. Nitin V. Pujari, Chairperson, PES
University, who shaped my attitude towards the subject of this work.
It gives me immense pleasure to thank Dr. K. V. Subramaniam, Department
of Computer Science and Engineering, PES University for his continuous support, advice
and guidance.
I would also like to thank Dr. Jayashree R., Department of Computer Sci-
ence and Engineering, PES University for her initiative and support which made this work
possible.
Contents
Abstract i
Acknowledgment ii
List of Figures v
1 Introduction 1
1.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Literature Survey 4
2.1 A Probabilistic Model for Software Defect Prediction . . . . . . . . . . . . . 4
2.2 Predicting Bugs from History . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Exploring the Dataset 8
3.1 Incident Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Change Requests(CR) Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Cleaning the Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4 Defect Escalation Analysis 14
4.1 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.1.1 Analyzing the Incidents Dataset . . . . . . . . . . . . . . . . . . . . . 14
4.1.2 Analyzing Change Requests Dataset . . . . . . . . . . . . . . . . . . 21
4.2 Applying Machine Learning on the Dataset . . . . . . . . . . . . . . . . . . . 26
4.2.1 Classifying the Incidents Data . . . . . . . . . . . . . . . . . . . . . . 26
4.2.2 Clustering the Incidents Data . . . . . . . . . . . . . . . . . . . . . . 36
4.2.3 Text Mining and Natural Language Tool Kit (NLTK) . . . . . . . . . 42
5 Results and Conclusions 51
5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Bibliography 53
List of Figures
1 Predicting Bugs from History- Commonly used complexity metrics . . . . . . 5
2 Life Cycle of a Defect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3 Cleansing in OpenRefine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4 Data Transformation in Microsoft Excel . . . . . . . . . . . . . . . . . . . . 12
5 Distribution of Incident Escalations . . . . . . . . . . . . . . . . . . . . . . . 14
6 Analyzing RED Incidents: Customers vs Escalations . . . . . . . . . . . . . 15
7 Analyzing RED Incidents: Modules vs Escalations . . . . . . . . . . . . . . . 17
8 Analyzing RED Incidents: Software release vs Escalations . . . . . . . . . . 17
9 S/w release vs Escalations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
10 Analyzing RED Incidents: OS vs Escalations . . . . . . . . . . . . . . . . . . 18
11 Analyzing RED Incidents: Developer vs Escalations . . . . . . . . . . . . . . 19
12 Other observations made on Incidents . . . . . . . . . . . . . . . . . . . . . . 20
13 Analyzing CR data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
14 Analyzing CR data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
15 Analyzing CRs: Customers vs Escalations . . . . . . . . . . . . . . . . . . . 22
16 Analyzing CRs: Modules vs Escalations . . . . . . . . . . . . . . . . . . . . . 23
17 Analyzing CRs: S/w release vs Escalations . . . . . . . . . . . . . . . . . . . 23
18 Analyzing CRs: OS vs Escalations . . . . . . . . . . . . . . . . . . . . . . . 24
19 Analyzing CRs: Developer vs Escalations . . . . . . . . . . . . . . . . . . . . 25
20 Classifying using: J48 Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
21 Classifying using: J48 Tree: Prefuse Tree . . . . . . . . . . . . . . . . . . . . 28
22 Module as root node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
23 Probability for MODULE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
24 Probability distribution for SEVERITY SHORT . . . . . . . . . . . . . . 31
25 ESCALATION as the root node . . . . . . . . . . . . . . . . . . . . . . . . . 31
26 Probability distribution table for ESCALATION . . . . . . . . . . . . . . . . 32
27 Probability distribution table for EXPECTATION . . . . . . . . . . . . . . . 33
28 SEVERITY SHORT as the root node . . . . . . . . . . . . . . . . . . . . . . 33
29 Probability distribution for SEVERITY SHORT . . . . . . . . . . . . . . . . 34
30 Probability distribution for Customer Expectations . . . . . . . . . . . . . . 35
31 Final Cluster Centroids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
32 Model and evaluation on training set . . . . . . . . . . . . . . . . . . . . . . 37
33 Cluster Centroids I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
34 Cluster Centroids II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
35 Total number of Incidents and their Escalation count . . . . . . . . . . . 42
36 Words with highest frequency mined on GREEN tickets escalated to RED . 44
37 GREEN tickets escalated to RED . . . . . . . . . . . . . . . . . . . . . . . . 44
38 Words with highest frequency mined on GREEN tickets escalated to YELLOW 45
39 GREEN ticket escalated to YELLOW . . . . . . . . . . . . . . . . . . . . . . 45
40 Words with highest frequency mined on YELLOW tickets escalated to RED 46
41 YELLOW ticket escalated to RED . . . . . . . . . . . . . . . . . . . . . . . 46
42 Observations made on RED tickets which were escalated . . . . . . . . . 47
43 Plotting the highest mined words . . . . . . . . . . . . . . . . . . . . . . . . 47
44 Words with highest frequency mined . . . . . . . . . . . . . . . . . . . . . . 48
45 Plotting the words with highest frequency mined . . . . . . . . . . . . . . . . 48
46 Words with highest frequency mined . . . . . . . . . . . . . . . . . . . . . . 49
47 Plotting the words with highest frequency mined . . . . . . . . . . . . . . . . 49
48 Output of the program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
1 Introduction
In spite of diligent planning, documentation, and proper process adherence in
software development, occurrences of defects are inevitable. In today’s cutting edge com-
petition, it is important to make conscious efforts to control and minimize these defects by
using techniques to allow in-process quality monitoring and control. Predicting the total
number of defects before testing begins improves the quality of the product being delivered
and helps in planning and decision making for future project releases.
Defect prediction in software is viewed as one of the most useful and cost-efficient
operations. Software developers see it as a vital phase on which the quality of the product
being developed depends. It has played a major part in countering the allegation that the
software industry is incapable of delivering requirements within budget and on time. Besides
this, clients’ response regarding product quality has shown a large shift from unsatisfactory
to satisfactory.
Today, many data-mining approaches have replaced the earlier statistical approaches for defect
prediction. The basis of data mining is the classification model, which places a component
in one of two classes: fault-prone or non-fault-prone.
Defects reported to the engineering team carry important information that may
lead to various important decisions. These defects are reported as tickets. Each ticket
contains the nature of the escalation, which is based on the business value.
These tickets may also give valuable information regarding the manner in which:
• Engineering defects are collected as part of the Quality Assurance (QA) cycle of
a software product or software application
• Product management, which manages the usage and direction of the product
• Escalations: how often does an escalation happen? Which modules of the application
are buggy and require more attention?
• Who has worked on the code-base during development, QA and defect/ticket/inci-
dent fixing, etc.
1.1 Problem Definition
• Determine causes for Defects during the Engineering phase which may lead
to Escalation of Customer Support Cases
While most software defects are corrected and tested as part of the prolonged software
development cycle, enterprise software vendors often have to release software products
before all reported defects are corrected, due to deadlines and limited resources.
A small number of these reported defects will be escalated by customers whose busi-
nesses are seriously impacted. Escalated defects must be resolved immediately and
individually by the software vendors at a very high cost. The total costs can be even
greater, including loss of reputation, satisfaction, loyalty, and repeat revenue.
• Build a Recommendation Engine to alert on Escalation based on the nature
of defects, creating an alert system for the team, given one such escalation
that has happened
With this alerting mechanism, the team can take proactive steps to prevent an
escalation before it happens.
1.2 Motivation
• Market research companies, notably Gartner and Forrester, have conducted
surveys predicting that 80% of IT budgets go towards the maintenance of
applications
Of the new projects (funded from the remaining 20% of the IT budget), there is
a dismal success rate of only 2%
This research hopes to change the way ticket information (in our opinion a
wealth of information, which has been neglected so far) is looked at
• Mine information that is hidden and lost in the tickets dump. The mined infor-
mation will then be useful to deduce important details which would help the Project
Manager of the team to plan out the activities appropriately.
2 Literature Survey
In this process we found many works related to software bug prediction, which
helped us understand the kind of knowledge that can be captured from bugs. The following
is the work carried out in the area of Software Defect Prediction:
2.1 A Probabilistic Model for Software Defect Prediction
Although a number of approaches have been taken to quality prediction for software, none
have achieved widespread applicability. The author’s aim here is to produce a single model
to combine the diverse forms of, often causal, evidence available in software development
in a more natural and efficient way than done previously. The authors use graphical prob-
ability models (also known as Bayesian Belief Networks) as the appropriate formalism for
representing this evidence. The authors have used the subjective judgments of experienced
project managers to build the probability model and use this model to produce forecasts
about the software quality throughout the development life cycle. Moreover, the causal or
influence structure of the model more naturally mirrors the real world sequence of events and
relations than can be achieved with other formalisms. We used WEKA in order to apply the
Bayesian Network Classifier. We selected the attributes: Escalation, Expectation, Modules
& Severity. Then, by rotating the attributes as the root nodes, results were captured.
A disadvantage of a reliability model of this complexity is the amount of data that is needed
to support a statistically significant validation study. A more detailed description of the
application of Bayesian classification is covered in section 4.2.1. [PMDM]
2.2 Predicting Bugs from History
Version and bug databases contain a wealth of information about software failures — how
the failure occurred, who was affected, and how it was fixed. Such defect information can be
automatically mined from software archives; and it frequently turns out that some modules
are far more defect-prone than others. How do these differences come to be?
The authors have researched how code properties like (a) code complexity, (b) the problem
domain, (c) past history, or (d) process quality affect software defects, and how their cor-
relation with defects in the past can be used to predict future software properties — where
the defects are, how to fix them, as well as the associated cost.[PRBD]
Figure 1: Predicting Bugs from History - Commonly used complexity metrics
Learning from history means learning from successes and failures — and how to make the
right decisions in the future. In our case, the history of successes and failures is provided
by the bug database: systematic mining uncovers which modules are most prone to defects
and failures. Correlating defects with complexity metrics or the problem domain is useful
in predicting problems for new or evolved components. Learning from history has one big
advantage: one can focus on the aspect of history that is most relevant for the current situ-
ation. Thus the history data provided to us by Hewlett-Packard (HP), which consisted of
the incident and the change request data, proved helpful during the statistical analysis. More
descriptive coverage of the statistical analysis is given in section 4.1. The dataset given to
us by HP is explained in detail in section 3.
Some more research work carried out on similar problem domains:
• Work done using the Machine Learning approach:
Predicting Effort to Fix Software Bugs [PEFS]:
The authors have used the K-Nearest Neighbor approach to predict the effort
put in by an engineer to fix software bugs. In this study, their technique leverages
existing issue tracking systems: given a new issue report, they search for similar,
earlier reports and use their average time as a prediction. This technique is not
of much use in our problem domain, since the work focuses on the assignment of
a job to a resource for which the effort is estimated.
Cost-Sensitive Learning for Defect Escalation [CSDE]:
The authors have established that Cost-Sensitive Decision Trees are the best
method for producing the highest positive net profit. Our approach to the defect
reports is different, since we are not focusing on the cost aspect but on the
escalation of a defect report. We have used Apriori algorithms to derive rules
with respect to Escalation.
Predicting Failures with Hidden Markov Models [PHMM]:
The authors have come up with an approach using Hidden Markov Models (HMM)
to recognize the patterns in failures. Since HMMs give accurate outputs only on
fewer attributes, we could not use this approach to recognize the defect patterns
in the defect reports.
Data Mining Based Social Network Analysis from Online Behavior [DSNA]: The
authors have used neural networks and performed sentiment analysis on social
networks to predict the online behavior of people. The approach used in sentiment
analysis gave me insights into the Natural Language Tool Kit (NLTK) and how
NLTK can be used to find the sentiments in user data. This motivated me
to pick up NLTK to analyze the tickets dataset provided by Hewlett-Packard.
Detailed information on the application of NLTK is given in section 4.2.3.
3 Exploring the Dataset
This section describes exploring the dataset acquired from Hewlett-Packard (HP)
and closely analyzing it. This is the beginning phase of the project, where the data is under-
stood and meaningful analysis is done. It gives a high-level overview of the datasets.
HP provided two datasets: the Customer Incident dataset and the Change Requests (CR)
dataset.
The Incidents dataset contains the customer cases. These cases include troubleshooting
errors, field issues, installation issues, environment issues and all other cases related to
the software. When a customer logs a unique case that the team identifies as a Change
Request, a corresponding entry is made in the CR dataset.
Below are the characteristics of the Incident & CR datasets:
3.1 Incident Dataset
• Contains 153 headers with 6,433 entries
Few important characteristics are:
Each entry has the owner assigned to respective entries
Each entry has a unique ID called ISSUEID
The life line of each entry is captured and represented numerically (e.g.
DAYS TO OPEN, DAYS TO FIXED, etc.)
Each entry has an Escalation set by the CPE Support Team on consulting the
customer. The Escalation comes in 3 categories: RED, YELLOW & GREEN.
RED is the highest-priority ticket, YELLOW is a potentially important
ticket and GREEN is a ticket with lesser business impact compared to the
Red and Yellow tickets.
Each entry has an Expectation set by the CPE Support team based on the inputs
from the customer (e.g. Investigate Issue & Hotfix requested, Answer Question,
Create Enhancement, Investigate Issue, etc.)
The mail communication between the Developer and Customer can be found
under: NOTE CUSTOMER. This field contains all the information about the
Defect being tracked with the team.
Each entry has a Severity set by the CPE Support Team on consulting the cus-
tomer. The Severity comes in 3 stages: Low, Medium & High.
Each entry has a date describing when the case was escalated, called
QUIXY ESCALATED ON DATE & QUIXY ESCALATED YELLOW DATE.
Each entry has a RESOLUTION attribute, which describes the resolution of
the case.
On observing the Incident dataset, we needed to come up with a life cycle of how a defect
is tracked by the team.
The figure below illustrates the life cycle of a defect. This process is currently in use by
the team.
Figure 2: Life Cycle of a Defect
3.2 Change Requests(CR) Dataset
The CRs dataset contains the Incident Cases which were identified as Change Requests.
• Contains 153 headers with 11960 entries
Few important characteristics are:
Each entry has the owner assigned to respective entries
Each entry has a unique ID called ISSUEID
The life line of each entry is captured and represented numerically (e.g.
DAYS TO OPEN, DAYS TO FIXED, etc.)
Each entry has an Escalation set by the CPE Support Team on consulting the
customer. The Escalation comes as Y(Escalated) and N(Not Escalated)
Each entry has an Expectation set by the CPE Support team based on the inputs
from the customer (e.g. Investigate Issue & Hotfix requested, Answer Question,
Create Enhancement, Investigate Issue, etc.)
The mail communication between the Developer and Customer can be found
under: NOTE CUSTOMER. This field contains all the information about the
Defect being tracked with the team.
Each entry has a Severity set by the CPE Support Team on consulting the cus-
tomer. The Severity comes in 3 stages: Low, Medium & High.
Each entry has a date describing when the case was escalated, called
QUIXY ESCALATED ON DATE.
Each entry has a RESOLUTION attribute, which describes the resolution of
the case.
3.3 Cleaning the Dataset
After receiving the huge dataset, the next step was to clean the data. There were many
discrepancies in the dataset, viz. the presence of non-numeric values in the date fields, and
rows whose data were shifted to the left by 2-4 columns; due to this shift, the data were not
aligned to their specific headers. The following steps were performed to clean the dataset.
• Removing Discrepancies
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data:
cleaning it; transforming it from one format into another; and extending it with web
services and external data.
This tool greatly reduced the cleaning effort. OpenRefine helped to explore
large data sets with ease. It provides functions to transform the data to make it
uniform. E.g., in the ”Customer” column, there were different names for a single
company. The tool helps in organizing varied names into a single one, thus unifying
the words ”Vodafone”, ”Vodafone Inc” and ”vodafone” to the single name ”Vodafone”. It
also helps in removing special characters to make the text readable, and it takes care
of the case sensitivity of the content; i.e., we can edit the contents of multiple rows
using the ”Text Facet” feature, shown in Figure 3.
• Removing the Unwanted Data
Microsoft Excel helped in converting the whole dataset into a table. Then, by applying
filters to the table, all the noise was excluded. It was further possible to extract
pivot tables, which aided the statistical analysis of the dataset.
Figure 3: Cleansing in OpenRefine
Figure 4: Data Transformation in Microsoft Excel
• Removing Stop Words
The Text Mining (tm) package of the R language helped convert the dataset into
a corpus. The pre-processing of the data is efficiently done by the tm package in
R. The various text transformations offered by the tm package are: “removeNumbers”,
“removePunctuation”, “removeWords”, “stemDocument” and “stripWhitespace”.
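As an illustration, a minimal sketch of this pre-processing pipeline in R (the file name and
the NOTE CUSTOMER column follow the dataset description in section 3.1; the exact
names in the actual dump are assumptions):

    library(tm)

    # Load the incident dump; file and column names are assumptions
    # based on the dataset description in section 3.1.
    incidents <- read.csv("incidents.csv", stringsAsFactors = FALSE)
    corpus <- VCorpus(VectorSource(incidents$NOTE_CUSTOMER))

    # The tm transformations listed above, applied in sequence.
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removeNumbers)
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))
    corpus <- tm_map(corpus, stemDocument)
    corpus <- tm_map(corpus, stripWhitespace)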
4 Defect Escalation Analysis
After the data was cleaned, the next step was a statistical analysis of the data, to further
unveil some under-the-hood facts.
4.1 Statistical Analysis
The initial statistical analysis of the data received from HP (incidents.csv and crs.csv)
was carried out in Microsoft Excel (MS Excel). The pivot charts obtained from MS Excel
helped in graphically analyzing the huge datasets.
4.1.1 Analyzing the Incidents Dataset
The incidents.csv file contained all the customer incidents reported to the team.
Figure 5: Distribution of Incident Escalations in incidents.csv
There are in total 125 RED escalated incidents, 3831 GREEN incidents and 329 YELLOW
incidents in the dataset.
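These counts can be reproduced directly from the cleaned dump; a one-line sketch in R
(the ESCALATION column name is an assumption based on the attribute description in
section 3.1):

    # Tabulate incidents by escalation colour (column name assumed).
    table(incidents$ESCALATION)
    #  GREEN    RED YELLOW
    #   3831    125    329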
• Analyzing RED Incidents: Customers vs Escalations
In certain situations, the escalation of a task is necessary. For example, a user is performing
a task and is unable to complete it within a certain period of time. In such cases, where
the user/customer is completely blocked, a case is RED escalated. These RED escalated
incident cases are high-priority cases and have to be addressed with high severity.
The companies RHEINENERGIE, HEWLETT PACKARD and DEUTSCHE BANK had
the highest number of RED escalations.
Figure 6:
Analyzing RED Incidents: Customers vs Escalations
• Company behavior analysis: RHEINENERGIE
RHEINENERGIE had the maximum RED escalations among all customers. There
were around 28 incident cases registered with the Operations Team. The patterns ob-
served in those 28 incidents are:
Out of the 28 incidents, 6 were RED escalated.
There is a 21.28% chance that an incident logged in will be a RED escalation
Most reported modules:
Ops - Monitor Agent (opcmona) (7 cases), where 3 of them were RED
Installation (6 cases)
Perf – Collector (3 cases)
Average number of days a single incident was handled: 73.5 days
Number of incidents which moved to CR: 15; 53.57% of the incidents moved to CRs
• Analyzing RED Incidents: Modules vs Escalations
The software modules Ops - Action Agent (opcacta) & Installation had the
highest number of RED escalations reported to the Operations Team.
Figure 7:
Analyzing RED Incidents: Modules vs Escalations
• Analyzing RED Incidents: Software Release vs Escalations
Out of all the 125 RED escalated cases, Operations Agent version 8.6 had the
maximum number of incident reports, followed by versions 11.14 & 11.02. These
software release versions had the maximum escalations among the incidents.
Figure 8:
Analyzing RED Incidents: Software release vs Escalations
The incident frequencies of the other software versions are shown below:
Figure 9:
S/w release vs Escalations
• Analyzing RED Incidents: Operating System(OS) vs Escalations
It can be observed that 83 entries out of the 125 Red escalations had a ’blank’ OS field.
After talking to an HP developer, we got the input that customers tend
to skip the OS field. We further observed that no specific OS version, which
could aid the troubleshooting, was given: customers tend to enter just general OS
names, viz. Windows, Solaris, etc., instead of mentioning the entire name with the
version details.
Figure 10:
Analyzing RED Incidents: OS vs Escalations
• Analyzing RED Incidents: Developer vs Escalations
This describes the developer associated with each incident case. Below is the distribu-
tion of incidents among the developers. The developer who was assigned the highest
number of incident cases is prasad.m.k@hp.com
Figure 11:
Analyzing RED Incidents: Developer vs Escalations
• Other observations made on Incidents
Calculating the age of a single ticket was challenging. The data dump had two
columns, OPEN-IN-DATE and CLOSED-IN-DATE; the difference of those two
dates should give the total days taken to close the ticket. But this contradicted
the DAYS SUPPORT TO CPE column present in the table: the two values
did not match. Below is a pictorial representation of the same:
Figure 12:
Other observations made on Incidents
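A sketch of the age computation that exposed this mismatch, in R; the column names
follow the description above, and the date format is an assumption:

    # Ticket age = difference between close and open dates.
    open_d  <- as.Date(incidents$OPEN_IN_DATE,   format = "%d-%m-%Y")  # format assumed
    close_d <- as.Date(incidents$CLOSED_IN_DATE, format = "%d-%m-%Y")
    age     <- as.numeric(close_d - open_d)

    # Rows where the computed age disagrees with the pre-computed column,
    # as observed in Figure 12.
    mismatch <- which(age != incidents$DAYS_SUPPORT_TO_CPE)
    length(mismatch)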
4.1.2 Analyzing Change Requests Dataset
The second dataset provided by HP was the CR data (incident cases which were
identified as Change Requests). These are the cases which are added to the product backlog.
Each entry had an escalation attached to it. The three escalation values were:
Showstopper, Yes and No. Below is the distribution of CRs and the nature of the
escalations they carried. There were in total 10,387 CR entries. Of these, 10,219 cases did
not escalate, 75 cases escalated and 93 of them were marked as ”Showstopper”.
Figure 13:
Analyzing CR data
The Showstopper escalations were High Priority cases.
Figure 14: Analyzing CR data
• Analyzing CRs: Customers vs Escalations
The company TATA CONSULTANCY SERVICES LTD. had the maximum
Showstopper escalations, whereas Allegis, NORTHROP GRUMMAN and PepperWeed
were the companies with the highest ”Y” (Yes) escalations.
Figure 15:
Analyzing CRs: Customers vs Escalations
• Analyzing CRs: Modules vs Escalations
The software modules Ops - Monitor Agent (opcmona) & Installation had the
highest ”Showstopper” escalations, whereas the modules Installation & Lcore
had the highest number of ”Y” (Yes) escalations.
Figure 16:
Analyzing CRs: Modules vs Escalations
• Analyzing CRs: Software Release vs Escalations
Software Release Version 11 had the highest ”Showstopper” and ”Y” (Yes) escalations,
with Software Release Version 8.6 second highest in ”Showstopper” and ”Y” (Yes)
escalations.
Figure 17:
Analyzing CRs: S/w release vs Escalations
• Analyzing CRs: OS vs Escalations
The software running on Windows OS had the maximum number of both ”Showstop-
per” and ”Y” (Yes) escalations. Note: submitters of these tickets tend to fill the OS
field as they want; some choose the exact version on which the issue was seen or
reported, while others choose just a high-level name. No strict rules were observed.
Figure 18:
Analyzing CRs: OS vs Escalations
• Analyzing CRs: Developer vs Escalations
While analyzing the developers and the escalated tickets assigned to them,
swati.sinha@hp.com was assigned the highest number of ”Showstopper” escalations,
whereas umesh.sharoff@hp.com was assigned the highest number of ”Y” (Yes)
escalations.
Figure 19:
Analyzing CRs: Developer vs Escalations
After the statistical analysis, we used WEKA to apply a few machine learning algorithms
to the dataset. In the next section we use a few data mining concepts along with
machine learning algorithms to draw meaningful conclusions from the dataset.
4.2 Applying Machine Learning on the Dataset
4.2.1 Classifying the Incidents Data
In this phase, classification and clustering are applied to the data to identify the main
attributes responsible for triggering an escalation. By using the WEKA tool, which offers
various machine learning algorithms to use on the dataset, certain informative conclusions
have been drawn. Classification is a data mining function that assigns items in a collection
to target categories or classes. The goal of classification is to accurately predict the target
class for each case in the data.
Here the target class is the Escalation attribute of each ticket. The class assignments are
known, viz. the severity of a bug, the expectation on the defect resolution, the modules of
the software, etc. By computing these important attributes of a ticket, the classification
algorithm finds the relationship between these values and predicts the value of the target.
In this work, I have chosen the J48 Decision Tree algorithm and the Bayes Network
Classifier algorithm to predict the target class: Escalation.
Classifying using: J48 Tree
The J48 decision tree classifier follows a simple algorithm. In order to classify a
new item, it first creates a decision tree based on the attribute values of the avail-
able training data. Whenever it encounters a set of items (a training set), it identifies the
attribute that discriminates the various instances most clearly. This feature, which is able to
tell the most about the data instances so that they can be classified best, is said to have
the highest information gain. Among the possible values of this feature, if there is any
value for which there is no ambiguity, that is, for which the data instances falling within
its category have the same value for the target variable, then it terminates that branch and
assigns to it the target value that it has obtained.
We used WEKA to apply the decision tree on the Data set.
Attributes selected:
• Escalation (Yellow, Red)
• Expectation (contains the customer expectation regarding the resolution of the ticket
by the support team)
• Modules
• Severity
Results:
SEVERITY SHORT = Urgent
|   ESCALATION = Red: Ops - Monitor Agent (opcmona) (17.0/11.0)*
|   ESCALATION = Yellow: Ops - Logfile Encapsulator (opcle) (21.0/17.0)*
SEVERITY SHORT = High: Installation (92.0/74.0)*
SEVERITY SHORT = Medium: Installation (23.0/20.0)*
SEVERITY SHORT = Low: Ops - Message Agent (opcmsga) (1.0)
• Number of Leaves : 5
• Size of the tree : 7
• Correctly Classified Instances = 32 (20.7792 %)
• Incorrectly Classified Instances = 122 (79.2208 %)
• Kappa statistic = 0.0626
• Mean absolute error= 0.0595
• Root mean squared error= 0.1725
• Relative absolute error= 96.4179
• Root relative squared error= 98.5439
• Total Number of Instances= 154
* The first number is the total number of instances (weight of instances) reaching the leaf.
The second number is the number (weight) of those instances that are misclassified.
Figure 20:
Classifying using: J48 Tree
Figure 21:
Classifying using: J48 Tree: Prefuse Tree
Since the incorrectly classified instances outnumbered the cor-
rectly classified instances, the J48 decision tree did not yield the required answers.
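For reproducibility, the same experiment can be scripted outside the WEKA GUI, for
instance through the RWeka interface to WEKA; a sketch assuming the column names
described earlier:

    library(RWeka)

    # Keep only the attributes selected above; WEKA treats factors as
    # nominal attributes. Column names are assumptions.
    d <- incidents[, c("ESCALATION", "EXPECTATION", "MODULE", "SEVERITY_SHORT")]
    d[] <- lapply(d, as.factor)

    # J48 is WEKA's implementation of the C4.5 decision tree, with
    # ESCALATION as the target class.
    tree <- J48(ESCALATION ~ ., data = d)
    print(tree)
    evaluate_Weka_classifier(tree, numFolds = 10)  # cross-validated accuracy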
Bayes Network Classifier, a Supervised Learning Method
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier
with strong assumptions of independence among features, called naive Bayes, is competi-
tive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a
classifier with less restrictive assumptions can perform even better. The referenced work
evaluates approaches for inducing classifiers from data, based on the theory of learning
Bayesian networks. These networks are factored representations of probability distributions
that generalize the naive Bayesian classifier and explicitly represent statements about
independence.
We used WEKA to apply this classifier to the dataset.
Using this classifier, the probability distributions were found:
Attributes selected:
• ESCALATION (Yellow, Red)
• EXPECTATION (Contains the customer expectation from the support team)
• MODULES
• SEVERITY SHORT
Results I: MODULE as the root node
• Red escalation:
Highest probable modules: Lcore - Control, Ops - Message Interceptor (opcmsgi)
• Yellow escalation:
Highest probable modules: Ops - Trap Interceptor, Other, Lcore - BBC
Figure 22: Module as root node
Figure 23: Probability for MODULE
• The modules Ops - Action Agent (opcacta) & Installation have the highest
number of RED escalations reported to the Operations Team.
• When the SEVERITY is:
URGENT: Most probable module is : LCore BBC
HIGH: Most probable module is : Installation
MEDIUM: Most probable module is : LCore- XPL
LOW: Most probable module is : LCore-control, Opcmsgi, LCore-Security,
LCore-Deploy, Operation Agent, Agent Framework
The probability distribution of Modules vs Severity is shown below:
Figure 24: Probability distribution for SEVERITY SHORT
Results II: ESCALATION as the root node
• Correctly Classified Instances= 80; 51.9481%
• Incorrectly Classified Instances= 74; 48.0519%
Figure 25:
ESCALATION as the root node
• Probability that it would be a RED Escalation : 24.20%
• Probability that it would be a YELLOW Escalation : 75.80%
Figure 26:
Probability distribution table for ESCALATION
• Probability of the EXPECTATION from Customer when it’s a RED Escalation:
Answer Question: 64%
Investigation & Hotfix requested: 47.4%
Investigate issue: 39.7%
Create Enhancement: 64%
• Probability of the EXPECTATION from Customer when it’s a YELLOW Escala-
tion:
Answer Question: 63%
Investigation & Hotfix requested: 56.7%
Investigate issue: 32.4%
Create Enhancement: 46%
Figure 27:
Probability distribution table for EXPECTATION
• Probability of the SEVERITY of the incident when it’s a RED Escalation:
URGENT: 44.90%
HIGH: 44.90%
MEDIUM: 09.00%
LOW: 13.00%
• Probability of the SEVERITY of the incident when it’s a YELLOW Escalation:
URGENT: 18.10%
HIGH: 63.40%
MEDIUM: 17.20%
LOW: 13.00%
Results III: SEVERITY SHORT as the root node
• Correctly Classified Instances= 63; 40.9091%
• Incorrectly Classified Instances= 91; 59.0909%
Figure 28:
SEVERITY SHORT as the root node
• Probability distribution for SEVERITY SHORT
Probability that it would be URGENT is 24.7%
Probability that it would be HIGH is 59.3%
Probability that it would be MEDIUM is 15.1%
Probability that it would be LOW is 0.01%
Figure 29:
Probability distribution for SEVERITY SHORT
• Probability distribution for RED escalation:
URGENT: 44.9%
HIGH: 18.8%
MEDIUM: 14.6%
LOW: 25%
• Probability distribution for YELLOW escalation:
URGENT: 55.1%
HIGH: 81.2%
MEDIUM: 85.4%
LOW: 75%
• Probability distribution for Customer Expectations
URGENT: Investigate Issue & Hotfix request
HIGH: Investigate Issue & Hotfix request
MEDIUM: Investigate Issue
LOW: Investigate Issue
Figure 30:
Probability distribution for Customer Expectations
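As with J48, the Bayes network runs can also be scripted. RWeka does not pre-register
BayesNet, but any WEKA classifier can be wrapped by its Java class name; a sketch,
reusing the data frame d from the J48 example:

    library(RWeka)

    # Wrap WEKA's Bayes network classifier by its Java class name.
    BayesNet <- make_Weka_classifier("weka/classifiers/bayes/BayesNet")

    bn <- BayesNet(ESCALATION ~ EXPECTATION + MODULE + SEVERITY_SHORT, data = d)
    print(bn)                                    # network and probability tables
    evaluate_Weka_classifier(bn, numFolds = 10)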
4.2.2 Clustering the Incidents Data
Cluster analysis or clustering is the task of grouping a set of objects in such a way that
objects in the same group (called a cluster) are more similar (in some sense or another) to
each other than to those in other groups (clusters).
The Simple K-Means method is a method of vector quantization, originally from signal
processing, that is popular for cluster analysis in data mining. k-means clustering aims to
partition n observations into k clusters in which each observation belongs to the cluster
with the nearest mean, which serves as a prototype of the cluster. This results in a
partitioning of the data space into Voronoi cells.
We used WEKA to apply this method. The following are the results obtained from it:
• Number of iterations: 3
• Within cluster sum of squared errors: 261.0
• Cluster 0: Yellow,’Investigate Issue & Hotfix requested’, ’Ops - Trap Inter-
ceptor (opctrapi)’, Urgent
• Cluster 1: Red,’Investigate Issue & Hotfix requested’, Perf,High
Figure 31:
Final Cluster Centroids
Figure 32:
Model and evaluation on training set
In the cluster centroids below, the instances are divided according to the Escalation
type of the tickets:
Figure 33:
Cluster Centroids I
In the cluster centroids below, the instances are divided according to the Severity nature
of the tickets:
Figure 34:
Cluster Centroids II
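A sketch of the clustering step via RWeka; SimpleKMeans handles the nominal attributes
used here by taking the mode as a centroid value, and N sets the number of clusters
(data frame d as in the J48 example):

    library(RWeka)

    # Two clusters, matching the experiment above.
    km <- SimpleKMeans(d, control = Weka_control(N = 2))
    print(km)  # prints the final cluster centroids, as in Figure 31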
Predictive Apriori - An Apriori variant
The Predictive Apriori algorithm trades higher support against higher confidence
and calculates the expected accuracy in a Bayesian framework. The result of this algo-
rithm maximizes the expected accuracy of the association rules on future data.
We used WEKA to apply this algorithm to the incident Dataset.
Below are the findings:
(Try I)Attributes selected:
ESCALATION (Yellow, Red)
CUSTOMER ENTITLEMENT
SEVERITY SHORT
Best rules found:
1. CUSTOMER ENTITLEMENT = Premier & SEVERITY SHORT
= Medium (8) ==> ESCALATION = Yellow (8); Accuracy: 95.49%
2. CUSTOMER ENTITLEMENT = Premier & SEVERITY SHORT
= High (38) ==> ESCALATION = Yellow (34); Accuracy: 83.22%
The above rules describe that when the Customer Entitlement is Premier and
the Severity set on the ticket is Medium, there is a 95% chance that the Esca-
lation might be Yellow.
We again used WEKA to apply this algorithm on the Incidents dataset.
(Try II)Attributes selected:
ESCALATION (Yellow, Red)
CUSTOMER ENTITLEMENT
SEVERITY SHORT
MODULE
OPERATING SYSTEM
Best rules found:
1. CUSTOMER ENTITLEMENT = Premier & SEVERITY SHORT
= Medium (11) ==> ESCALATION = Yellow (11); Accuracy: 98.84%
2. CUSTOMER ENTITLEMENT = Premier & OS = Linux (11) ==> ES-
CALATION = Yellow (11); Accuracy: 98.46%
3. MODULE = Ops - Logfile Encapsulator [opcle] (10) ==> ESCALA-
TION = Yellow (10); Accuracy: 98.70%
The above rules describe that:
When the module is Ops - Logfile Encapsulator [opcle], there is a 98.70% chance
that the case will be Yellow escalated.
When the Customer Entitlement is Premier and the Severity of the ticket is
Medium, there is a 98.84% chance that it will be Yellow escalated.
Simple Apriori Algorithm - The Apriori algorithm is an association rule mining
algorithm that was introduced in 1994.
The Apriori algorithm works in several steps. First, the candidate item sets are generated.
Then the database is scanned to check the support of these item sets, which generates
the frequent 1-item sets: in this first scan, the 1-item sets are generated by eliminating
item sets with support below the threshold value. In later passes, the candidates become
k-item sets, generated from the frequent (k-1)-item sets found in the previous pass. The
iteration of database scanning and support counting yields the support and confidence of
each association rule found.
Attributes selected:
ESCALATION (Yellow, Red)
CUSTOMER ENTITLEMENT
MODULES
SEVERITY SHORT
OS
Best rules found:
1. OS is Linux (19); ESCALATION observed is Yellow (18). The con-
fidence achieved is 95%
2. CUSTOMER ENTITLEMENT is Premier & SEVERITY SHORT
is High (38); ESCALATION observed is Yellow (35). The confidence
achieved is 92%
The above rules describe that:
When the OS is Linux, there is 95% confidence that the Escalation is Yellow.
When the Customer Entitlement is Premier and the Severity of the ticket is High,
there is 92% confidence that the Escalation is Yellow.
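These rule-mining runs can likewise be scripted; a sketch via RWeka's Apriori interface
(PredictiveApriori can be wrapped the same way through make_Weka_associator; column
names are assumptions as before):

    library(RWeka)

    # Nominal attributes used in the rule-mining runs above.
    d2 <- incidents[, c("ESCALATION", "CUSTOMER_ENTITLEMENT", "MODULE",
                        "SEVERITY_SHORT", "OS")]
    d2[] <- lapply(d2, as.factor)

    # WEKA's Apriori: C = minimum confidence, N = number of rules to keep.
    rules <- Apriori(d2, control = Weka_control(C = 0.9, N = 10))
    print(rules)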
4.2.3 Text Mining and Natural Language Tool Kit (NLTK)
After evaluating the results acquired from the previous phases, a final conclusion could not
be drawn: they did not answer what actually triggers an incident to escalate. This phase
describes the use of text mining and natural language processing to determine the
triggering factor of an incident.
Figure 35: Total number of Incidents and their Escalation count
The purpose of Text Mining is to process unstructured (textual) information, extract
meaningful numeric indices from the text, and, thus, make the information contained
in the text accessible to the various data mining (statistical and machine learning)
algorithms. Information can be extracted to derive summaries for the words contained
in the documents or to compute summaries for the documents based on the words
contained in them.
We used the tm package in R for text mining the Incident tickets.
The tm package provides the methods for data import, corpus handling, pre-processing,
metadata management and creation of term-document matrices.
The main crux lay in finding out what made an Incident ticket get
RED escalated from other escalation states.
The following is the step-by-step process for finding out what might help to identify
the reason for an escalation. We took the dataset (Incidents.csv) and performed the
following tasks using R:
Data Import: Load the text to a corpus
Inspecting Corpora: to get a concise overview of the corpus
Transformations: Modify the corpus - e.g., stemming, stop-word removal, etc.
Creating Term-Document Matrices
Operations on Term-Document Matrices e.g.: Calculating the word fre-
quencies, Plot the word frequencies, word cloud, etc.
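Continuing the pre-processing sketch from section 3.3, the last two steps might look like
this in R (the frequency threshold is illustrative):

    # Build the term-document matrix from the pre-processed corpus.
    tdm <- TermDocumentMatrix(corpus)

    # Word frequencies, sorted in decreasing order.
    freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
    head(freq, 20)
    findFreqTerms(tdm, lowfreq = 50)  # words above an illustrative threshold

    # Plot the most frequent words, as in the figures that follow.
    barplot(head(freq, 15), las = 2)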
Observations made on GREEN tickets which were RED escalated: It was
observed that opcle was the most talked-about module. It can also be observed that the
words please, hotfix & support were used the most in the mail chain exchanged between
the customer and the developer.
Figure 36:
Words with highest frequency mined on GREEN tickets escalated to RED
Figure 37:
GREEN tickets escalated to RED
Observations made on GREEN tickets which were YELLOW escalated: It
was observed that support was the most frequent word. It can also be observed that the
words issue & time were used the most in the mail chain exchanged between the
customer and the developer.
Figure 38:
Words with highest frequency mined on GREEN tickets escalated to YELLOW
Figure 39:
GREEN ticket escalated to YELLOW
Observations made on YELLOW tickets which were RED escalated: This
corpus did not yield anything notable, but it did bring out the name of the
developer most associated with the resolution of the tickets.
Figure 40:
Words with highest frequency mined on YELLOW tickets escalated to RED
Figure 41:
YELLOW ticket escalated to RED
Observations made on RED tickets which were escalated: It was observed that opc-
mona was the most talked-about module. It can also be observed that the words waiting,
hotfix & issue were used the most in the mail chain exchanged between the customer
and the developer.
Figure 42: Observations made on RED tickets which were escalated
Figure 43:
Plotting the highest mined words
Observations made on the Whole RED Escalated Tickets: There were in total
125 RED escalated entries in the Incidents. Text mining on all of the 125 entries
revealed the details below:
Figure 44:
Words with highest frequency mined
The words issue, please, support & escalation were used the most in the mail
chain exchanged between the customer and the team.
Figure 45:
Plotting the words with highest frequency mined
Observations made on the Whole GREEN Tickets: There were in total
3831 GREEN entries in the Incidents. Text mining on all of these entries
revealed the details below:
Figure 46:
Words with highest frequency mined
Mining the whole GREEN Escalated tickets did not yield valuable information.
Figure 47:
Plotting the words with highest frequency mined
The above observations show that the mail chains in which the keywords please,
hotfix & support appear most frequently are the ones most likely to be converted to a RED
escalation. We then used these keywords and built a program which takes the Incident
data dump as input and scans the email chains. As the frequency of these keywords
increases, the program alerts the user when it crosses the threshold limit. The threshold
limit can be adjusted by the developer based on the ongoing trend.
Figure 48:
Output of the program
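The report's figure shows only the program's output; below is a minimal sketch of such a
keyword-threshold alert in R, with the keyword list taken from the observations above and
the threshold an adjustable assumption:

    # Alert when escalation-associated keywords dominate a ticket's mail chain.
    keywords  <- c("please", "hotfix", "support", "escalation", "waiting")
    threshold <- 0.05  # fraction of words that are keywords; tune to the trend

    alert_score <- function(text) {
      words <- tolower(unlist(strsplit(text, "[^[:alnum:]]+")))
      words <- words[nchar(words) > 0]
      score <- sum(words %in% keywords) / max(length(words), 1)
      if (score > threshold)
        message(sprintf("ALERT: keyword density %.1f%% exceeds threshold",
                        100 * score))
      score
    }

    scores <- vapply(incidents$NOTE_CUSTOMER, alert_score, numeric(1))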
5 Results and Conclusions
By text mining and applying machine learning algorithms on the incident dataset, we ob-
tained the below results:
• The mail chain of a ticket which is going to be escalated to Red will contain these
words in high occurrences : please, hotfix, support.
• The software modules: Ops - Logfile Encapsulator (Opcle), Ops - Action Agent (op-
cacta) & Installation had the highest number of red escalations reported to the team
as incidents. Whereas Ops - Monitor Agent (opcmona) & Installation had the highest
showstopper escalations for change requests.
• By applying the Predictive Apriori algorithm on the Incident dataset, we observed the
following:
We got a confidence of 98.70% when a module reported was ’Opcle’ and the
Escalation of the case was ’Yellow’.
We got a confidence of 98.84% for Escalation as ’Yellow’, when the Customer
Entitlement was ’Premier’ and Severity was ’Medium’
• We got two clusters formed by using Simple K - Means method:
Cluster 1: Escalation is ’Yellow’, Customer expectation is ’Investigate issue &
Hotfix required’, Software module is ’Ops - Trap Interceptor’ and Severity is
’Urgent’.
Cluster 2: Escalation is ’Red’, Customer expectation is ’Investigate issue & Hotfix
required’, Software module is ’Perf’ and Severity is ’High’.
For an engineering team it is really important to avoid any major Red escalations. The
team gets a lot of incidents which need to be resolved in a limited time. Since the number of
incidents is large, it becomes hard for the team to keep track of all the incident issues with
respect to the criticality and the severity of each incident. By implementing such predictive
mechanisms, an incident which will turn RED can be flagged to the team. This would
help the manager to allocate appropriate resources based on the criticality of the tickets
coming in. This would help in the resolution of the incident ticket within the stipulated
time, avoiding any unwanted escalations, and would indeed help in maintaining the trust
of the customer as well.
More accurate and varied results could have been achieved, but the missing data in the
dataset limited us. Due to the discrepancies in the data (data being shifted by 3-4
columns), we had to ignore such inconsistent entries in the dataset when performing the
statistical analysis.
5.1 Future Work
The use of NLTK proved to be very helpful in extracting meaningful conclusions
from the tickets dataset. NLTK can be used to analyze the real-time behavior of the tickets
coming in to the team. This analysis can be used to provide proactive resolutions to the
customers, thus preventing the tickets from getting escalated.
References
[BAG] Leo Breiman, Bagging Predictors, Machine Learning, 1996
[SFTWR] Ishani Arora, Vivek Tetarwal, Anju Saha, Software Defect Prediction
[PRBD] Thomas Zimmermann, Nachiappan Nagappan, Predicting Bugs from History
[SFRC] Stamatia Bibi, Grigorios Tsoumakas, Ioannis Vlahavas, Ioannis Stamelos, Prediction
Using Regression via Classification
[HIDM] Felix Salfner, Predicting Failures with Hidden Markov Models
[IEMD] Ian H. Witten, Eibe Frank, Mark A. Hall, Data Mining: Practical Machine Learning
Tools and Techniques, Third Edition, Elsevier Inc., 2011
[EFRB] Eibe Frank (Computer Science Department, University of Waikato, New Zealand) and
Remco R. Bouckaert (Xtal Mountain Information Technology, Auckland, New Zealand),
Naive Bayes for Text Classification with Unbalanced Classes
[JMJ11] Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques,
Third Edition, 2011
[IT2008] Irina Tudor, “Association Rule Mining as a Data Mining Technique”, 2008
[PMDM] Norman Fenton, Paul Krause and Martin Neil, A Probabilistic Model for Software
Defect Prediction, 2006
[BLMK] Billy Edward Hunt, Jr., Jennifer J. Kirkpatrick, Richard Allan Kloss et al., Software
Defect Prediction, US patent, 2014
[WEKA] WEKA Online, www.cs.waikato.ac.nz/ml/weka
[PEFS] Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, Andreas Zeller, Predicting Effort
to Fix Software Bugs, 2006
[CSDE] Victor S. Sheng, Bin Gu, Wei Fang, Jian Wu, Cost-Sensitive Learning for Defect
Escalation, 2001
[DSNA] Jaideep Srivastava, Muhammad A. Ahmad, Nishith Pathak, David Kuo-Wei Hsu, Data
Mining Based Social Network Analysis from Online Behavior, 2008
[PHMM] Felix Salfner, Predicting Failures with Hidden Markov Models, 2005

 
Machine_translation_for_low_resource_Indian_Languages_thesis_report
Machine_translation_for_low_resource_Indian_Languages_thesis_reportMachine_translation_for_low_resource_Indian_Languages_thesis_report
Machine_translation_for_low_resource_Indian_Languages_thesis_reportTrushita Redij
 
Design and implementation of a Virtual Reality application for Computational ...
Design and implementation of a Virtual Reality application for Computational ...Design and implementation of a Virtual Reality application for Computational ...
Design and implementation of a Virtual Reality application for Computational ...Lorenzo D'Eri
 
Abstract contents
Abstract contentsAbstract contents
Abstract contentsloisy28
 
Report on e-Notice App (An Android Application)
Report on e-Notice App (An Android Application)Report on e-Notice App (An Android Application)
Report on e-Notice App (An Android Application)Priyanka Kapoor
 
Specification of the Linked Media Layer
Specification of the Linked Media LayerSpecification of the Linked Media Layer
Specification of the Linked Media LayerLinkedTV
 

Similar to A Machine Learning approach to predict Software Defects (20)

E.M._Poot
E.M._PootE.M._Poot
E.M._Poot
 
Work Measurement Application - Ghent Internship Report - Adel Belasker
Work Measurement Application - Ghent Internship Report - Adel BelaskerWork Measurement Application - Ghent Internship Report - Adel Belasker
Work Measurement Application - Ghent Internship Report - Adel Belasker
 
QBD_1464843125535 - Copy
QBD_1464843125535 - CopyQBD_1464843125535 - Copy
QBD_1464843125535 - Copy
 
Chat Application [Full Documentation]
Chat Application [Full Documentation]Chat Application [Full Documentation]
Chat Application [Full Documentation]
 
Im-ception - An exploration into facial PAD through the use of fine tuning de...
Im-ception - An exploration into facial PAD through the use of fine tuning de...Im-ception - An exploration into facial PAD through the use of fine tuning de...
Im-ception - An exploration into facial PAD through the use of fine tuning de...
 
Agentless Monitoring with AdRem Software's NetCrunch 7
Agentless Monitoring with AdRem Software's NetCrunch 7Agentless Monitoring with AdRem Software's NetCrunch 7
Agentless Monitoring with AdRem Software's NetCrunch 7
 
diss
dissdiss
diss
 
AUGUMENTED REALITY FOR SPACE.pdf
AUGUMENTED REALITY FOR SPACE.pdfAUGUMENTED REALITY FOR SPACE.pdf
AUGUMENTED REALITY FOR SPACE.pdf
 
Security in mobile banking apps
Security in mobile banking appsSecurity in mobile banking apps
Security in mobile banking apps
 
bkremer-report-final
bkremer-report-finalbkremer-report-final
bkremer-report-final
 
Machine_translation_for_low_resource_Indian_Languages_thesis_report
Machine_translation_for_low_resource_Indian_Languages_thesis_reportMachine_translation_for_low_resource_Indian_Languages_thesis_report
Machine_translation_for_low_resource_Indian_Languages_thesis_report
 
Knapp_Masterarbeit
Knapp_MasterarbeitKnapp_Masterarbeit
Knapp_Masterarbeit
 
Master's Thesis
Master's ThesisMaster's Thesis
Master's Thesis
 
Design and implementation of a Virtual Reality application for Computational ...
Design and implementation of a Virtual Reality application for Computational ...Design and implementation of a Virtual Reality application for Computational ...
Design and implementation of a Virtual Reality application for Computational ...
 
Abstract contents
Abstract contentsAbstract contents
Abstract contents
 
Thesis_Report
Thesis_ReportThesis_Report
Thesis_Report
 
Malware Analysis
Malware Analysis Malware Analysis
Malware Analysis
 
Report on e-Notice App (An Android Application)
Report on e-Notice App (An Android Application)Report on e-Notice App (An Android Application)
Report on e-Notice App (An Android Application)
 
Specification of the Linked Media Layer
Specification of the Linked Media LayerSpecification of the Linked Media Layer
Specification of the Linked Media Layer
 
CS4099Report
CS4099ReportCS4099Report
CS4099Report
 

Recently uploaded

Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 

Recently uploaded (20)

9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 

A Machine Learning approach to predict Software Defects

  • 1. A Machine Learning approach to predict Software Defects Chetan Hireholi February 19, 2017
  • 2. Abstract Software engineering teams are not only involved in developing new versions of a product but are often involved in fixing customer reported defects. Customers report issues faced by a particular software and some of these issues may actually require the engineering team to analyze and potentially provide a fix for the issue. The number of defects that a software engineering team has to analyze is significant and teams often prioritize the order of the defects based on the customer’s priority and to the extent the defect impacts the business operations of the customer. Often, it is likely that the engineering team may not truly understand the business impact that a defect is likely to have and this results in the customer escalating the defect up the engineering team’s management chain seeking more immediate attention to their problem. Such escalated defects tend to consume a lot of engineering bandwidth and increase defect handling costs; further such escalations impact existing plans as all crit- ical resources are used to handle these cases. Software escalations are classified under three categories: Red, Yellow and Green. The software defect report containing a high business value is prioritized and is marked as Red, Yellow being the neutral one, which might be escalated if appropriate attention is not given and the Green reports have the low priority. The engineering team understands the nature of the software escalation and allocates the resources appropriately. The objective of this project is to be able to analyze software defects and predict the defects that are likely to be escalated by the customer. This would permit the engineering team to be alerted about the situation and take proactive measures that will give better support to the customers. For the purpose of our analysis, we have used the defects database provided by Hewlett-Packard (HP) India. We have used the concepts of R programming for cleaning the database and to pre-process it. We then extracted keywords using natural language processing and then used machine learning (J 48 decision tree, Na¨ıve Bayes and Simple K Means) to predict the escalations. Thus by combining the key words and the tickets received to the team, we can predict the nature of the Escalation and alert the engineering team so that they can take respective steps appropriately. i
  • 3. Acknowledgment I would like to take this to thank a lot of eminent personalities, without whose constant encouragement and support, the endeavor of mine would not have been successful. Firstly, I would like to thank the PES University, for having ”Final Year Project” as a part of my curriculum, which gave me a wonderful opportunity to work on research and presentation abilities, and provided excellent facilities, without which, this project could not have acquired the orientation, it has now. At the outset I would like to venerate Prof. Nitin V. Pujari, Chairperson, PES University, who toned me encompassing the attitude towards the subject implied in this literary work. It gives me immense pleasure to thank Dr. K. V. Subramaniam, Department of Computer Science and Engineering, PES University for his continuous support, advice and guidance. I would also like to thank Dr. Jayashree R., Department of Computer Sci- ence and Engineering, PES University for her initiative and support which made this work possible. ii
Abstract i
Acknowledgment ii
List of Figures v
1 Introduction 1
1.1 Problem Definition 2
1.2 Motivation 3
2 Literature Survey 4
2.1 A Probabilistic Model for Software Defect Prediction 4
2.2 Predicting Bugs from History 5
3 Exploring the Dataset 8
3.1 Incident Dataset 8
3.2 Change Requests (CR) Dataset 10
3.3 Cleaning the Dataset 11
4 Defect Escalation Analysis 14
4.1 Statistical Analysis 14
4.1.1 Analyzing the Incidents Dataset 14
4.1.2 Analyzing Change Requests Dataset 21
4.2 Applying Machine Learning on the Dataset 26
4.2.1 Classifying the Incidents Data 26
4.2.2 Clustering the Incidents Data 36
4.2.3 Text Mining and Natural Language Tool Kit (NLTK) 42
5 Results and Conclusions 51
5.1 Future Work 52
Bibliography 53
List of Figures

1 Predicting Bugs from History: commonly used complexity metrics 5
2 Life Cycle of a Defect 9
3 Cleansing in OpenRefine 12
4 Data Transformation in Microsoft Excel 12
5 Distribution of Incident Escalations 14
6 Analyzing RED Incidents: Customers vs Escalations 15
7 Analyzing RED Incidents: Modules vs Escalations 17
8 Analyzing RED Incidents: Software release vs Escalations 17
9 S/w release vs Escalations 18
10 Analyzing RED Incidents: OS vs Escalations 18
11 Analyzing RED Incidents: Developer vs Escalations 19
12 Other observations made on Incidents 20
13 Analyzing CR data 21
14 Analyzing CR data 21
15 Analyzing CRs: Customers vs Escalations 22
16 Analyzing CRs: Modules vs Escalations 23
17 Analyzing CRs: S/w release vs Escalations 23
18 Analyzing CRs: OS vs Escalations 24
19 Analyzing CRs: Developer vs Escalations 25
20 Classifying using: J48 Tree 28
21 Classifying using: J48 Tree: Prefuse Tree 28
22 Module as root node 29
23 Probability for MODULE 30
24 Probability distribution for SEVERITY SHORT 31
25 ESCALATION as the root node 31
26 Probability distribution table for ESCALATION 32
27 Probability distribution table for EXPECTATION 33
28 SEVERITY SHORT as the root node 33
29 Probability distribution for SEVERITY SHORT 34
30 Probability distribution for Customer Expectations 35
31 Final Cluster Centroids 36
32 Model and evaluation on training set 37
33 Cluster Centroids I 37
34 Cluster Centroids II 38
35 Total number of Incidents and their Escalation count 42
36 Words with highest frequency mined on GREEN tickets escalated to RED 44
37 GREEN tickets escalated to RED 44
38 Words with highest frequency mined on GREEN tickets escalated to YELLOW 45
39 GREEN ticket escalated to YELLOW 45
40 Words with highest frequency mined on YELLOW tickets escalated to RED 46
41 YELLOW ticket escalated to RED 46
42 Observations made on RED tickets that were escalated 47
43 Plotting the highest mined words 47
44 Words with highest frequency mined 48
45 Plotting the words with highest frequency mined 48
46 Words with highest frequency mined 49
47 Plotting the words with highest frequency mined 49
48 Output of the program 50
1 Introduction

In spite of diligent planning, documentation and proper process adherence in software development, occurrences of defects are inevitable. In today's cutting-edge competition, it is important to make conscious efforts to control and minimize these defects by using techniques that allow in-process quality monitoring and control. Predicting the total number of defects before testing begins improves the quality of the product being delivered, and helps in planning and decision making for future project releases.

Defect prediction in software is viewed as one of the most useful and cost-efficient operations. Software developers see it as a vital phase on which the quality of the product being developed depends. It has played a major part in countering the allegation that the software industry is incapable of delivering requirements within budget and on time. Besides this, client feedback on product quality has shown a large shift from unsatisfactory to satisfactory. Today, many data miners have replaced the earlier statistical approaches to defect prediction. The basis of data mining is the classification model, which places a component in one of two classes: fault-prone or non-fault-prone.

Defects reported to the engineering team carry important information that may lead to various important decisions. These defects are reported as tickets, and each ticket carries the nature of the escalation, which is based on the business value. These tickets may also give valuable information regarding:

• how engineering defects are collected as part of the Quality Assurance (QA) cycle of a software product or software application;
• product management, which manages the usage and direction of the product;
• escalations: how often an escalation happens, and which modules of the application are buggy and require more attention;
• who has worked on the code base during development, QA and defect/ticket/incident fixing.
1.1 Problem Definition

• Determine the causes of defects during the engineering phase that may lead to the escalation of customer support cases.
While most software defects are corrected and tested as part of the prolonged software development cycle, enterprise software vendors often have to release software products before all reported defects are corrected, due to deadlines and limited resources. A small number of these reported defects will be escalated by customers whose businesses are seriously impacted. Escalated defects must be resolved immediately and individually by the software vendors, at a very high cost. The total costs can be even greater when the loss of reputation, satisfaction, loyalty and repeat revenue is included.

• Build a recommendation engine that alerts on escalation, based on the nature of the defects, to create an alert system for the team given one such escalation that has happened.
With this alerting mechanism, the team can take proactive steps to prevent an escalation before it happens.
1.2 Motivation

• Market research companies, notably Gartner and Forrester, have conducted surveys predicting that 80% of IT budgets go toward the maintenance of applications, and that new projects (funded from the remaining 20% of the budget allocated to IT) have a dismal success rate of 2%. This research hopes to change the way ticket information, in our opinion a wealth of information that has been neglected so far, is looked at.

• Mine the information that is hidden and lost in the ticket dump. The mined information will then be useful to deduce important details, which would help the team's project manager to plan activities appropriately.
2 Literature Survey

During this survey we found many works related to software bug prediction, which helped us understand the kind of knowledge that can be captured from bugs. The following is the work carried out in the area of software defect prediction.

2.1 A Probabilistic Model for Software Defect Prediction

Although a number of approaches have been taken to quality prediction for software, none have achieved widespread applicability. The authors' aim here is to produce a single model that combines the diverse forms of, often causal, evidence available in software development in a more natural and efficient way than done previously. The authors use graphical probability models (also known as Bayesian Belief Networks) as the appropriate formalism for representing this evidence. They use the subjective judgments of experienced project managers to build the probability model, and use this model to produce forecasts about software quality throughout the development life cycle. Moreover, the causal or influence structure of the model mirrors the real-world sequence of events and relations more naturally than can be achieved with other formalisms. We used WEKA to apply the Bayesian network classifier: we selected the attributes Escalation, Expectation, Modules and Severity, and captured the results by rotating the attributes as the root nodes. A disadvantage of a reliability model of this complexity is the amount of data needed to support a statistically significant validation study. A more detailed description of the application of Bayesian classification is given in section 4.2.1. [PMDM]
2.2 Predicting Bugs from History

Version and bug databases contain a wealth of information about software failures: how the failure occurred, who was affected, and how it was fixed. Such defect information can be automatically mined from software archives, and it frequently turns out that some modules are far more defect-prone than others. How do these differences come to be? The authors have researched how code properties such as (a) code complexity, (b) the problem domain, (c) past history and (d) process quality affect software defects, and how their correlation with defects in the past can be used to predict future software properties: where the defects are, how to fix them, and the associated cost. [PRBD]

Figure 1: Commonly used complexity metrics
Learning from history means learning from successes and failures, and how to make the right decisions in the future. In our case, the history of successes and failures is provided by the bug database: systematic mining uncovers which modules are most prone to defects and failures. Correlating defects with complexity metrics or the problem domain is useful in predicting problems for new or evolved components. Learning from history has one big advantage: one can focus on the aspect of history that is most relevant to the current situation. Thus the history data provided to us by Hewlett-Packard (HP), consisting of the incident and change request data, proved helpful during the statistical analysis. A more descriptive coverage of the statistical analysis is given in section 4.1; the dataset given to us by HP is explained in detail in section 3.

Some more research work carried out on a similar problem domain, using the machine learning approach:

• Predicting Effort to Fix Software Bugs [PEFS]: The authors used the k-nearest-neighbor approach to predict the effort put in by an engineer to fix software bugs. Their technique leverages existing issue-tracking systems: given a new issue report, they search for similar, earlier reports and use their average time as a prediction. This technique is not very useful in our problem domain, since the work focuses on assigning the job to a resource for which the effort is estimated.

• Cost-Sensitive Learning for Defect Escalation [CSDE]: The authors established that cost-sensitive decision trees are the best method for producing the highest positive net profit. Our approach to the defect reports is different, since we focus not on the cost aspect but on the escalation of a defect report. We have used decision trees in the form of Apriori algorithms to derive rules with respect to escalation.
• Predicting Failures with Hidden Markov Models [PHMM]: The authors have come up with an approach using Hidden Markov Models (HMMs) to recognize the patterns in failures. Since HMMs give accurate outputs only on smaller attribute sets, we cannot use this approach to recognize the defect patterns in the defect reports.

• Data Mining Based Social Network Analysis from Online Behavior [DSNA]: The authors used neural networks and performed sentiment analysis on social networks to predict people's online behavior. The approach used in the sentiment analysis gave me insights into the Natural Language Toolkit (NLTK) and how it can be used to find the sentiment of user data. This motivated me to pick up NLTK to analyze the tickets dataset provided by Hewlett-Packard. Detailed information on the application of NLTK is given in section 4.2.3.
3 Exploring the Dataset

This section describes the exploration of the dataset acquired from Hewlett-Packard (HP) and its close analysis. This is the beginning phase of the project, where the data is understood and meaningful analysis is done; it gives a high-level overview of the datasets. HP provided two datasets: a customer Incident dataset and a Change Requests (CR) dataset. The Incident dataset held the customer cases, including troubleshooting errors, field issues, installation issues, environment issues and all other cases related to the software. When a customer logs a unique case that the team identifies as a change request, a corresponding entry is made in the CR dataset. The characteristics of the Incident and CR datasets are given below.

3.1 Incident Dataset

• Contains 153 headers with 6,433 entries. A few important characteristics are:
– Each entry has an owner assigned to it.
– Each entry has a unique ID called ISSUEID.
– The lifeline of each entry is captured and represented numerically (e.g. DAYS TO OPEN, DAYS TO FIXED, etc.).
– Each entry has an Escalation set by the CPE Support Team after consulting the customer. The escalation comes in three categories: RED, YELLOW and GREEN. RED is the highest-priority ticket, YELLOW is a potentially important ticket, and GREEN is a ticket with less business impact than the red and yellow tickets.
– Each entry has an Expectation set by the CPE Support Team based on the customer's inputs (e.g. Investigate Issue & Hotfix requested, Answer Question, Create Enhancement, Investigate Issue, etc.).
– The mail communication between the developer and the customer can be found under NOTE CUSTOMER. This field contains all the information about the defect being tracked by the team.
– Each entry has a Severity set by the CPE Support Team after consulting the customer. Severity comes in three stages: Low, Medium and High.
– Each entry has a date describing when the case was escalated, called QUIXY ESCALATED ON DATE and QUIXY ESCALATED YELLOW DATE.
– Each entry has a RESOLUTION attribute, which describes the resolution of the case.

On observing the Incident dataset, we needed to come up with a life cycle of how a defect is tracked by the team. Figure 2 illustrates the life cycle of a defect; this process is currently in use by the team.

Figure 2: Life Cycle of a Defect
3.2 Change Requests (CR) Dataset

The CR dataset contains the incident cases that were identified as change requests.

• Contains 153 headers with 11,960 entries. A few important characteristics are:
– Each entry has an owner assigned to it.
– Each entry has a unique ID called ISSUEID.
– The lifeline of each entry is captured and represented numerically (e.g. DAYS TO OPEN, DAYS TO FIXED, etc.).
– Each entry has an Escalation set by the CPE Support Team after consulting the customer; it is recorded as Y (escalated) or N (not escalated).
– Each entry has an Expectation set by the CPE Support Team based on the customer's inputs (e.g. Investigate Issue & Hotfix requested, Answer Question, Create Enhancement, Investigate Issue, etc.).
– The mail communication between the developer and the customer can be found under NOTE CUSTOMER. This field contains all the information about the defect being tracked by the team.
– Each entry has a Severity set by the CPE Support Team after consulting the customer. Severity comes in three stages: Low, Medium and High.
– Each entry has a date describing when the case was escalated, called QUIXY ESCALATED ON DATE.
– Each entry has a RESOLUTION attribute, which describes the resolution of the case.
3.3 Cleaning the Dataset

After receiving the huge dataset, the next step was to clean the data. There were many discrepancies in the dataset, viz. the presence of non-numeric values in the date fields, and rows whose data had shifted left by two to four columns; because of this shift, the data was not aligned with its headers. The following steps were performed to clean the dataset.

• Removing Discrepancies
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data. This tool brought down the immense cleaning effort and made it easy to explore large datasets. It provides functions to transform the data and make it uniform. For example, in the "Customer" column there were different names for a single company; OpenRefine helps collapse the varied names into one, unifying "Vodafone", "Vodafone Inc" and "vodafone" into the single name "Vodafone". It also helps remove special characters, make the text readable, and take care of case sensitivity: the contents of multiple rows can be edited using the "Text Facet" feature, shown in Figure 3.

• Removing the Unwanted Data
Microsoft Excel helped in converting the whole dataset into a table. Applying filters on the table and filtering the data excluded all the noise. It was then also possible to extract pivot tables, which supported the statistical analysis of the dataset.
Figure 3: Cleansing in OpenRefine

Figure 4: Data Transformation in Microsoft Excel
• Removing Stop Words
The text mining (tm) package for the R language helped convert the dataset into a corpus. The pre-processing of the data is done efficiently by the tm package. The various text transformations offered by the package include removeNumbers, removePunctuation, removeWords, stemDocument and stripWhitespace.
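As a minimal sketch, these transformations chain together as follows, assuming the ticket mail text sits in a NOTE_CUSTOMER column of incidents.csv (the file and column names here are illustrative, not the actual dataset headers):

    # Pre-processing the ticket text with the tm package.
    library(tm)
    library(SnowballC)  # stemDocument() relies on this stemmer

    incidents <- read.csv("incidents.csv", stringsAsFactors = FALSE)
    corpus <- VCorpus(VectorSource(incidents$NOTE_CUSTOMER))

    corpus <- tm_map(corpus, content_transformer(tolower))       # normalize case
    corpus <- tm_map(corpus, removeNumbers)
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))  # stop-word removal
    corpus <- tm_map(corpus, stemDocument)
    corpus <- tm_map(corpus, stripWhitespace)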
4 Defect Escalation Analysis

After cleaning the data, the next step was a statistical analysis of it, which unveils some under-the-hood facts.

4.1 Statistical Analysis

The initial statistical analysis of the data received from HP (incidents.csv and crs.csv) was carried out in Microsoft Excel (MS Excel). The pivot charts obtained from MS Excel helped in graphically analyzing the huge datasets.

4.1.1 Analyzing the Incidents Dataset

incidents.csv contained all the customer incidents reported to the team.

Figure 5: Distribution of Incident Escalations

There are in total 125 RED-escalated incidents, 3,831 GREEN incidents and 329 YELLOW incidents in the dataset.
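This distribution can be checked in one line of base R; the column name ESCALATION is an assumption, and the counts in the comment are those reported above:

    incidents <- read.csv("incidents.csv", stringsAsFactors = FALSE)
    table(incidents$ESCALATION)  # expected, per the figures above: Green 3831, Red 125, Yellow 329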
• Analyzing RED Incidents: Customers vs Escalations
In certain situations, the escalation of a task is necessary: for example, a user is doing a task and is unable to complete it within a certain period of time. In such cases, where the user/customer is completely blocked, a case is RED-escalated. These RED-escalated incident cases are high-priority cases and have to be addressed at high severity. The companies RHEINENERGIE, HEWLETT PACKARD and DEUTSCHE BANK had the highest numbers of RED escalations.

Figure 6: Analyzing RED Incidents: Customers vs Escalations
• Company behavior analysis: RHEINENERGIE
RHEINENERGIE had the maximum RED escalations among the customers. There were around 28 incident cases registered with the Operations Team. The patterns observed in those 28 incidents are:
– Out of the 28 incidents, 6 were RED-escalated: a 21.43% chance that an incident logged will be a RED escalation.
– Most reported modules: Ops - Monitor Agent (opcmona), 7 cases, of which 3 were RED; Installation, 6 cases; Perf - Collector, 3 cases.
– Average number of days a single incident was handled: 73.5 days.
– Number of incidents that moved to CRs: 15, i.e. 53.57% of the incidents moved to CRs.
• Analyzing RED Incidents: Modules vs Escalations
The software modules Ops - Action Agent (opcacta) and Installation had the highest numbers of RED escalations reported to the Operations Team.

Figure 7: Analyzing RED Incidents: Modules vs Escalations

• Analyzing RED Incidents: Software Release vs Escalations
Of all the 125 RED-escalated cases, Operations Agent version 8.6 had the maximum number of incident reports, followed by versions 11.14 and 11.02. These software release versions had the maximum escalations among the incidents.

Figure 8: Analyzing RED Incidents: Software release vs Escalations
The incident frequencies of the other software versions are shown below:

Figure 9: S/w release vs Escalations

• Analyzing RED Incidents: Operating System (OS) vs Escalations
It can be observed that 83 entries out of the 125 RED escalations had a blank OS field. After talking to an HP developer, we learned that customers tend to skip the OS field. We further observed that no specific OS version was given that could aid troubleshooting: customers tend to enter just the general OS name, viz. Windows, Solaris, etc., instead of the full name with version details.

Figure 10: Analyzing RED Incidents: OS vs Escalations
• Analyzing RED Incidents: Developer vs Escalations
This describes the developer associated with each incident case. Below is the distribution of incidents among the developers. The developer assigned the highest number of incident cases is prasad.m.k hp.com.

Figure 11: Analyzing RED Incidents: Developer vs Escalations

• Other observations made on Incidents
Calculating the age of a single ticket was challenging. The data dump had two columns, OPEN-IN-DATE and CLOSED-IN-DATE; the difference of these two dates should give the total days taken to close the ticket. But this contradicted the DAYS SUPPORT TO CPE column in the table: the two values did not match. Below is a pictorial representation of the same.
Figure 12: Other observations made on Incidents
4.1.2 Analyzing Change Requests Dataset

The second dataset provided by HP was the CR data (incident cases that were change requests). These are the cases that are added to the product backlog. Each entry had an escalation attached to it; the three escalation values were Showstopper, Yes and No. Below is the distribution of CRs and the nature of the escalation each carried. There were in total 10,387 CR entries: 10,219 cases did not escalate, 75 cases escalated and 93 were marked as Showstopper.

Figure 13: Analyzing CR data

The Showstopper escalations were high-priority cases.

Figure 14: Analyzing CR data
• Analyzing CRs: Customers vs Escalations
The company TATA CONSULTANCY SERVICES LTD. had the maximum Showstopper escalations, whereas Allegis, NORTHROP GRUMMAN and PepperWeed are the companies with the highest "Y" (yes) escalations.

Figure 15: Analyzing CRs: Customers vs Escalations

• Analyzing CRs: Modules vs Escalations
The software modules Ops - Monitor Agent (opcmona) and Installation had the highest Showstopper escalations, whereas the modules Installation and Lcore had the highest numbers of "Y" (yes) escalations.
Figure 16: Analyzing CRs: Modules vs Escalations

• Analyzing CRs: Software Release vs Escalations
Software release version 11 had the highest Showstopper and "Y" (yes) escalations, while software release version 8.6 was the second highest in both categories.

Figure 17: Analyzing CRs: S/w release vs Escalations
• Analyzing CRs: OS vs Escalations
The software running on the Windows OS had the maximum numbers of both Showstopper and "Y" (yes) escalations.
Note: submitters of these tickets tend to fill in the OS field as they please; some choose the exact version where the issue was seen or reported, and some choose only a high-level name. No strict rules were observed.

Figure 18: Analyzing CRs: OS vs Escalations

• Analyzing CRs: Developer vs Escalations
Analyzing the developers and the escalated tickets assigned to them: swati.sinha hp.com was assigned the highest number of Showstopper escalations, whereas umesh.sharoff hp.com was assigned the highest number of "Y" (yes) escalations.
Figure 19: Analyzing CRs: Developer vs Escalations

After the statistical analysis, we used WEKA to apply a few machine learning algorithms to the dataset. In the next section we use a few data mining concepts along with machine learning algorithms to draw meaningful conclusions from the dataset.
4.2 Applying Machine Learning on the Dataset

4.2.1 Classifying the Incidents Data

In this phase, classification and clustering are applied to the data to learn the main attributes responsible for triggering an escalation. Using the WEKA tool, which offers various machine learning algorithms to use on the dataset, certain informative conclusions have been drawn. Classification is a data mining function that assigns items in a collection to target categories or classes; the goal of classification is to accurately predict the target class for each case in the data. Here the target class is the Escalation attribute of each ticket. The class assignments are known, viz. the severity of a bug, the expectation on the defect resolution, the modules of the software, etc. By computing these important attributes of a ticket, the classification algorithm finds the relationship between their values and predicts the value of the target. In this work, I have chosen the J48 decision tree algorithm and the Bayes network classifier algorithm to predict the target class, Escalation.

Classifying using: J48 Tree

The J48 decision tree classifier follows a simple algorithm. To classify a new item, it first creates a decision tree based on the attribute values of the available training data. Whenever it encounters a training set, it identifies the attribute that discriminates the various instances most clearly; the feature that tells the most about the data instances, so that they can be classified best, is said to have the highest information gain. Among the possible values of this feature, if there is any value for which there is no ambiguity, that is, for which the data instances falling within its category all have the same value for the target variable, then that branch is terminated and assigned the target value obtained. We used WEKA to apply the decision tree to the dataset.

Attributes selected:

• Escalation (Yellow, Red)
• Expectation (the customer's expectation of the support team on the resolution of the ticket)
• Modules
• Severity

Results:

• SEVERITY SHORT = Urgent
– ESCALATION = Red: Ops - Monitor Agent (opcmona) (17.0/11.0)*
– ESCALATION = Yellow: Ops - Logfile Encapsulator (opcle) (21.0/17.0)*
• SEVERITY SHORT = High: Installation (92.0/74.0)*
• SEVERITY SHORT = Medium: Installation (23.0/20.0)*
• SEVERITY SHORT = Low: Ops - Message Agent (opcmsga) (1.0)
• Number of leaves: 5
• Size of the tree: 7
• Correctly classified instances = 32 (20.7792%)
• Incorrectly classified instances = 122 (79.2208%)
• Kappa statistic = 0.0626
• Mean absolute error = 0.0595
• Root mean squared error = 0.1725
• Relative absolute error = 96.4179%
• Root relative squared error = 98.5439%
• Total number of instances = 154
* The first number is the total weight of instances reaching the leaf; the second number is the weight of those instances that are misclassified.

Figure 20: Classifying using: J48 Tree

Figure 21: Classifying using: J48 Tree: Prefuse Tree

Observing that the incorrectly classified instances outnumbered the correctly classified ones, the J48 decision tree did not yield the required answers.
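For reference, the same J48 experiment can be reproduced outside the WEKA GUI through the RWeka bindings for R. This is only a sketch: the file and column names are assumptions meant to match the four attributes selected above.

    # J48 (C4.5) decision tree over the selected ticket attributes.
    library(RWeka)

    tickets <- read.csv("incidents.csv", stringsAsFactors = TRUE)
    tickets <- tickets[, c("ESCALATION", "EXPECTATION", "MODULE", "SEVERITY_SHORT")]

    j48 <- J48(ESCALATION ~ ., data = tickets)
    print(j48)    # the induced tree; leaves annotated weight/misclassified, as above
    summary(j48)  # accuracy, kappa statistic and error measures on the training set
    evaluate_Weka_classifier(j48, numFolds = 10)  # 10-fold cross-validation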
Bayes Network Classifier: A Supervised Learning Approach

Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong independence assumptions among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. The authors evaluate approaches for inducing classifiers from data based on the theory of learning Bayesian networks: networks that are factored representations of probability distributions, generalize the naive Bayesian classifier, and explicitly represent statements about independence. We used WEKA to apply this classifier to the dataset and find the probability distributions.

Attributes selected:
• ESCALATION (Yellow, Red)
• EXPECTATION (the customer's expectation of the support team)
• MODULES
• SEVERITY SHORT

Results I: MODULE as the root node
• Red escalation, highest-probability modules: Lcore - Control, Ops - Message Interceptor (opcmsgi)
• Yellow escalation, highest-probability modules: Ops - Trap Interceptor, Other, Lcore - BBC

Figure 22: Module as root node
Figure 23: Probability for MODULE

• The modules Ops - Action Agent (opcacta) and Installation have the highest numbers of RED escalations reported to the Operations Team.
• When the SEVERITY is:
– URGENT: the most probable module is LCore - BBC
– HIGH: the most probable module is Installation
– MEDIUM: the most probable module is LCore - XPL
– LOW: the most probable modules are LCore - Control, opcmsgi, LCore - Security, LCore - Deploy, Operation Agent, Agent Framework

The probability distribution of Modules vs Severity is shown in Figure 24.
Figure 24: Probability distribution for SEVERITY SHORT

Results II: ESCALATION as the root node
• Correctly classified instances = 80 (51.9481%)
• Incorrectly classified instances = 74 (48.0519%)

Figure 25: ESCALATION as the root node

• Probability that it would be a RED escalation: 24.20%
• Probability that it would be a YELLOW escalation: 75.80%
Figure 26: Probability distribution table for ESCALATION

• Probability of the EXPECTATION from the customer when it is a RED escalation:
– Answer Question: 64%
– Investigation & Hotfix requested: 47.4%
– Investigate issue: 39.7%
– Create Enhancement: 64%
• Probability of the EXPECTATION from the customer when it is a YELLOW escalation:
– Answer Question: 63%
– Investigation & Hotfix requested: 56.7%
– Investigate issue: 32.4%
– Create Enhancement: 46%
Figure 27: Probability distribution table for EXPECTATION

• Probability of the SEVERITY of the incident when it is a RED escalation:
– URGENT: 44.90%
– HIGH: 44.90%
– MEDIUM: 09.00%
– LOW: 13.00%
• Probability of the SEVERITY of the incident when it is a YELLOW escalation:
– URGENT: 18.10%
– HIGH: 63.40%
– MEDIUM: 17.20%
– LOW: 13.00%

Results III: SEVERITY SHORT as the root node
• Correctly classified instances = 63 (40.9091%)
• Incorrectly classified instances = 91 (59.0909%)

Figure 28: SEVERITY SHORT as the root node
• Probability distribution for SEVERITY SHORT:
– Probability that it would be URGENT: 24.7%
– Probability that it would be HIGH: 59.3%
– Probability that it would be MEDIUM: 15.1%
– Probability that it would be LOW: 0.01%

Figure 29: Probability distribution for SEVERITY SHORT

• Probability distribution for a RED escalation:
– URGENT: 44.9%
– HIGH: 18.8%
– MEDIUM: 14.6%
– LOW: 25%
• Probability distribution for a YELLOW escalation:
– URGENT: 55.1%
– HIGH: 81.2%
– MEDIUM: 85.4%
– LOW: 75%
• Probability distribution for customer expectations:
– URGENT: Investigate Issue & Hotfix requested
– HIGH: Investigate Issue & Hotfix requested
– MEDIUM: Investigate Issue
– LOW: Investigate Issue

Figure 30: Probability distribution for Customer Expectations
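The runs above used WEKA's Bayes network classifier; a closely related experiment, the naive Bayes special case, can be sketched in R with the e1071 package. The column names are again assumptions, and the factor levels in the prediction example are hypothetical.

    # Naive Bayes over the same four ticket attributes.
    library(e1071)

    tickets <- read.csv("incidents.csv", stringsAsFactors = TRUE)
    tickets <- tickets[, c("ESCALATION", "EXPECTATION", "MODULE", "SEVERITY_SHORT")]

    nb <- naiveBayes(ESCALATION ~ ., data = tickets)
    nb$apriori                # class priors (cf. the 24.2% Red / 75.8% Yellow split above)
    nb$tables$SEVERITY_SHORT  # severity distribution conditioned on the escalation class

    # Predicted escalation for a hypothetical urgent hotfix request:
    predict(nb, data.frame(EXPECTATION    = "Investigate Issue & Hotfix requested",
                           MODULE         = "Installation",
                           SEVERITY_SHORT = "Urgent"))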
4.2.2 Clustering the Incidents Data

Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). The Simple K Means method is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. We used WEKA to apply this method; the following results were obtained:

• Number of iterations: 3
• Within-cluster sum of squared errors: 261.0
• Cluster 0: Yellow, 'Investigate Issue & Hotfix requested', 'Ops - Trap Interceptor (opctrapi)', Urgent
• Cluster 1: Red, 'Investigate Issue & Hotfix requested', Perf, High

Figure 31: Final Cluster Centroids
Figure 32: Model and evaluation on training set

In the cluster centroids below, the instances are divided according to the escalation type of the tickets:

Figure 33: Cluster Centroids I
In the cluster centroids below, the instances are divided according to the severity of the tickets:

Figure 34: Cluster Centroids II
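WEKA's Simple K Means handles nominal attributes natively; base R's kmeans() expects numeric input, so a sketch of the same clustering in R first one-hot encodes the factors. The encoding step is our substitution, not part of the original WEKA run, and the column names are assumed.

    # k = 2 clusters over one-hot encoded ticket attributes.
    tickets <- read.csv("incidents.csv", stringsAsFactors = TRUE)
    tickets <- tickets[, c("ESCALATION", "EXPECTATION", "MODULE", "SEVERITY_SHORT")]

    X <- model.matrix(~ . - 1, data = tickets)  # dummy-code the nominal attributes
    set.seed(42)                                # k-means results depend on the seed
    km <- kmeans(X, centers = 2)

    km$centers                             # cluster centroids (cf. Figure 31)
    table(km$cluster, tickets$ESCALATION)  # how the clusters align with escalation type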
Predictive Apriori: An Apriori Variant

The Predictive Apriori algorithm trades larger support against higher confidence and calculates the expected accuracy in a Bayesian framework. The result of this algorithm maximizes the expected accuracy of the association rules on future data. We used WEKA to apply this algorithm to the Incidents dataset. Below are the findings.

(Try I) Attributes selected:
• ESCALATION (Yellow, Red)
• CUSTOMER ENTITLEMENT
• SEVERITY SHORT

Best rules found:
1. CUSTOMER ENTITLEMENT = Premier & SEVERITY SHORT = Medium (8) ==> ESCALATION = Yellow (8); accuracy: 95.49%
2. CUSTOMER ENTITLEMENT = Premier & SEVERITY SHORT = High (38) ==> ESCALATION = Yellow (34); accuracy: 83.22%

These rules describe that when the customer entitlement is Premier and the severity set on the ticket is Medium, there is a 95% chance that the escalation will be Yellow.

(Try II) Attributes selected:
• ESCALATION (Yellow, Red)
• CUSTOMER ENTITLEMENT
• SEVERITY SHORT
• MODULE
• OPERATING SYSTEM
Best rules found:
1. CUSTOMER ENTITLEMENT = Premier & SEVERITY SHORT = Medium (11) ==> ESCALATION = Yellow (11); accuracy: 98.84%
2. CUSTOMER ENTITLEMENT = Premier & OS = Linux (11) ==> ESCALATION = Yellow (11); accuracy: 98.46%
3. MODULE = Ops - Logfile Encapsulator (opcle) (10) ==> ESCALATION = Yellow (10); accuracy: 98.70%

These rules describe that:
• When the module is Ops - Logfile Encapsulator (opcle), there is a 98.70% chance that the ticket will be Yellow-escalated.
• When the customer entitlement is Premier and the OS is Linux, there is a 98.46% chance that the ticket will be Yellow-escalated.
Simple Apriori Algorithm

The Apriori algorithm is an association rule mining algorithm introduced in 1994. It works in several steps. First, the candidate item sets are generated; then the database is scanned to check the support of these item sets, which yields the frequent 1-item sets by eliminating item sets with support below the threshold value. In later passes, the candidates become k-item sets, generated after the (k-1)-item sets meeting the threshold are found. Iterating this database scanning and support calculation yields the support and confidence of each association rule found.

Attributes selected:
• ESCALATION (Yellow, Red)
• CUSTOMER ENTITLEMENT
• MODULES
• SEVERITY SHORT
• OS

Best rules found:
1. OS = Linux (19) ==> ESCALATION = Yellow (18); confidence achieved: 95%
2. CUSTOMER ENTITLEMENT = Premier & SEVERITY SHORT = High (38) ==> ESCALATION = Yellow (35); confidence achieved: 92%

These rules describe that:
• When the OS is Linux, there is 95% confidence that the escalation is Yellow.
• When the customer entitlement is Premier and the severity of the ticket is High, there is 92% confidence that the escalation is Yellow.
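The association rule runs above can be approximated in R with the arules package. This is a sketch: the column names are assumed, and the support/confidence thresholds are illustrative (WEKA's Apriori and Predictive Apriori tune these differently).

    # Mining ESCALATION rules from the nominal ticket attributes.
    library(arules)

    tickets <- read.csv("incidents.csv", stringsAsFactors = TRUE)
    tickets <- tickets[, c("ESCALATION", "CUSTOMER_ENTITLEMENT", "MODULE",
                           "SEVERITY_SHORT", "OPERATING_SYSTEM")]

    trans <- as(tickets, "transactions")  # each factor level becomes an item
    rules <- apriori(trans,
                     parameter  = list(supp = 0.05, conf = 0.90),
                     appearance = list(rhs = c("ESCALATION=Yellow", "ESCALATION=Red"),
                                       default = "lhs"))
    inspect(head(sort(rules, by = "confidence"), 5))  # top rules by confidence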
4.2.3 Text Mining and Natural Language Tool Kit (NLTK)

After evaluating the results acquired from Phase I, a final conclusion could not be drawn: they did not answer what actually triggers an incident to escalate. This phase describes the use of Text Mining and Natural Language Processing to determine the triggering factor of an incident.

Figure 35: Total number of Incidents and their Escalation count

The purpose of Text Mining is to process unstructured (textual) information, extract meaningful numeric indices from the text, and thus make the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms. Information can be extracted to derive summaries for the words contained in the documents, or to compute summaries for the documents based on the words contained in them. We used the tm package in R for text mining the incident tickets. The tm package provides methods for data import, corpus handling, pre-processing, metadata management and creation of term-document matrices.
The central question was: what made an Incident Ticket get RED escalated while tickets in other escalation states did not? We took the dataset (Incidents.csv) and performed the following step-by-step process in R to identify what might explain an Escalation:

• Data Import: load the text into a corpus
• Inspecting Corpora: get a concise overview of the corpus
• Transformations: modify the corpus - e.g., stemming, stop-word removal, etc.
• Creating Term-Document Matrices
• Operations on Term-Document Matrices - e.g., calculating word frequencies, plotting word frequencies, building a word cloud, etc.

A minimal R sketch of these steps follows.
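The sketch below uses the tm package as described above, assuming the mail-chain text lives in a column called NOTES (the column name is an assumption):

# tm workflow sketch: import, transform, and build the term-document
# matrix used for the frequency analyses that follow.
library(tm)          # text mining framework
library(SnowballC)   # provides the stemmer used by stemDocument

incidents <- read.csv("Incidents.csv", stringsAsFactors = FALSE)

# Data import: load the ticket text into a corpus.
corpus <- VCorpus(VectorSource(incidents$NOTES))

# Transformations: lower-case, strip punctuation and numbers,
# remove English stop words, stem, and squeeze whitespace.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, stripWhitespace)

# Create the term-document matrix.
tdm <- TermDocumentMatrix(corpus)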
Observations made on GREEN tickets which were RED escalated: it was observed that opcle was the most discussed module. It can also be observed that the words please, hotfix & support were used the most in the mail chain exchanged between the customer and the developer.

Figure 36: Words with highest frequency mined on GREEN tickets escalated to RED

Figure 37: GREEN tickets escalated to RED
Observations made on GREEN tickets which were YELLOW escalated: it was observed that support was the most used word. It can also be observed that the words issue & time were used the most in the mail chain exchanged between the customer and the developer.

Figure 38: Words with highest frequency mined on GREEN tickets escalated to YELLOW

Figure 39: GREEN tickets escalated to YELLOW
Observations made on YELLOW tickets which were RED escalated: this corpus did not yield anything notable, but it did bring out the name of the developer most frequently associated with the resolution of the tickets.

Figure 40: Words with highest frequency mined on YELLOW tickets escalated to RED

Figure 41: YELLOW ticket escalated to RED
Observations made on RED tickets which were escalated: it was observed that opcmona was the most discussed module. It can also be observed that the words waiting, hotfix & issue were used the most in the mail chain exchanged between the customer and the developer.

Figure 42: Observations made on RED tickets which were escalated

Figure 43: Plotting the highest-frequency mined words
Observations made on the whole set of RED escalated tickets: there were in total 125 RED escalated entries in the Incidents. Text mining all 125 entries revealed the details below:

Figure 44: Words with highest frequency mined

The words issue, please, support & escalation were used the most in the mail chains exchanged between the customer and the team.

Figure 45: Plotting the words with highest frequency mined
Observations made on the whole set of GREEN escalated tickets: there were in total 3831 GREEN escalated entries in the Incidents. Text mining all of these entries revealed the details below:

Figure 46: Words with highest frequency mined

Mining the whole set of GREEN escalated tickets did not yield valuable information.

Figure 47: Plotting the words with highest frequency mined
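The frequency counts and plots behind the figures above can be derived from the term-document matrix built earlier. A short sketch, reusing tdm from the previous snippet on whichever subset of tickets is being examined:

# Rank terms by total occurrences across the selected tickets.
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
head(freq, 10)   # the most frequent mined words

# Bar plot of the top terms, analogous to the frequency plots above.
barplot(head(freq, 10), las = 2, ylab = "Occurrences",
        main = "Words with highest frequency")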
The above observations show that mail chains in which the key words please, hotfix & support occur frequently are the most likely to be converted to a RED Escalation. We then used these key words to build a program which takes the Incident data dump as input and scans the email chains. As the frequency of these key words increases, the program alerts the user once it crosses a threshold limit. The threshold limit can be adjusted by the developer based on the ongoing trend.

Figure 48: Output of the program
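A minimal R sketch of such an alerting program, under stated assumptions: the mail-chain text sits in a NOTES column, the ticket identifier in INCIDENT_ID, and the threshold value is a placeholder to be tuned by the developer.

# Keyword-based escalation alert: flag tickets whose mail chains are
# dense in the trigger words mined above. Column names and the
# threshold are assumptions, not the project's exact implementation.
trigger_words <- c("please", "hotfix", "support")
threshold     <- 0.05   # fraction of words that are trigger words

keyword_density <- function(text) {
  words <- tolower(unlist(strsplit(text, "\\W+")))
  words <- words[nzchar(words)]
  if (length(words) == 0) return(0)
  sum(words %in% trigger_words) / length(words)
}

incidents <- read.csv("Incidents.csv", stringsAsFactors = FALSE)
incidents$risk  <- vapply(incidents$NOTES, keyword_density, numeric(1))
incidents$alert <- incidents$risk > threshold

# Tickets the team should look at proactively.
subset(incidents, alert, select = c(INCIDENT_ID, risk))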
5 Results and Conclusions

By text mining and applying machine learning algorithms to the incident dataset, we obtained the following results:

• The mail chain of a ticket that is going to be escalated to Red will contain the words please, hotfix and support with high frequency.
• The software modules Ops - Logfile Encapsulator (opcle), Ops - Action Agent (opcacta) & Installation had the highest number of Red escalations reported to the team as incidents, whereas Ops - Monitor Agent (opcmona) & Installation had the highest number of showstopper escalations for change requests.
• By applying the Predictive Apriori algorithm to the incident dataset, we observed the following: we got a confidence of 98.70% for Escalation 'Yellow' when the reported module was 'opcle', and a confidence of 98.84% for Escalation 'Yellow' when the Customer Entitlement was 'Premier' and the Severity was 'Medium'.
• Using the Simple K-Means method we got two clusters. Cluster 1: Escalation is 'Yellow', customer expectation is 'Investigate issue & Hotfix requested', software module is 'Ops - Trap Interceptor' and Severity is 'Urgent'. Cluster 2: Escalation is 'Red', customer expectation is 'Investigate issue & Hotfix requested', software module is 'Perf' and Severity is 'High'.

For an engineering team it is really important to avoid any major Red escalations. The team receives a lot of incidents which need to be resolved in a limited time. Since the number of incidents is large, it becomes hard for the team to keep track of all the incident issues with respect to the criticality and the severity of each incident.
By implementing predictive mechanisms such as these, an incident that is going to turn RED can be flagged to the team in advance. This would help the manager allocate appropriate resources based on the criticality of the incoming tickets, help resolve incident tickets within the stipulated time, and avoid unwanted escalations. This would indeed help in maintaining the trust of the customer as well.

More accurate and varied results could have been achieved, but missing data in the dataset prevented this. Due to discrepancies in the data (entries being shifted by 3-4 columns), we had to ignore such inconsistent entries in the dataset when performing the statistical analysis.

5.1 Future Work

The use of NLTK proved to be very helpful in extracting meaningful conclusions from the tickets dataset. NLTK can be used to analyze the real-time behavior of the tickets coming in to the team. This analysis can be used to provide proactive resolutions to the customers, thus preventing tickets from getting escalated.
References

[BAG] Leo Breiman, Bagging Predictors, Machine Learning, 24(2), 1996.
[SFTWR] Ishani Arora, Vivek Tetarwal, Anju Saha, Software Defect Prediction.
[PRBD] Thomas Zimmermann, Nachiappan Nagappan, Predicting Bugs from History.
[SFRC] Stamatia Bibi, Grigorios Tsoumakas, Ioannis Vlahavas, Ioannis Stamelos, Prediction Using Regression via Classification.
[HIDM] Felix Salfner, Predicting Failures with Hidden Markov Models.
[IEMD] Ian H. Witten, Eibe Frank, Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, Elsevier, 2011.
[EFRB] Eibe Frank (Computer Science Department, University of Waikato, New Zealand) and Remco R. Bouckaert (Xtal Mountain Information Technology, Auckland, New Zealand), Naive Bayes for Text Classification with Unbalanced Classes.
[JMJ11] Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, Third Edition, Morgan Kaufmann, 2011.
[IT2008] Irina Tudor, "Association Rule Mining as a Data Mining Technique", 2008.
[PMDM] Norman Fenton, Paul Krause, Martin Neil, A Probabilistic Model for Software Defect Prediction, 2006.
[BLMK] Billy Edward Hunt, Jr. (Overland Park, KS, US); Jennifer J. Kirkpatrick (Olathe, KS, US); Richard Allan Kloss, Software Defect Prediction (US patent), 2014.
[WEKA] WEKA Online, www.cs.waikato.ac.nz/ml/weka.
[PEFS] Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, Andreas Zeller, Predicting Effort to Fix Software Bugs, 2006.
[CSDE] Victor S. Sheng, Bin Gu, Wei Fang, Jian Wu, Cost-Sensitive Learning for Defect Escalation.
[DSNA] Jaideep Srivastava, Muhammad A. Ahmad, Nishith Pathak, David Kuo-Wei Hsu, Data Mining Based Social Network Analysis from Online Behavior, 2008.
[PHMM] Felix Salfner, Predicting Failures with Hidden Markov Models, 2005.