Dr. Chakkrit (Kla) Tantithamthavorn
Lecturer, Faculty of Information Technology, Monash University, Australia
@klainfo · http://chakkrit.com · chakkrit.tantithamthavorn@monash.edu

AI-Driven Software Quality Assurance in the Age of DevOps
The Australian Taxation Office’s service outages caused by software bugs led to $4.2 billion in losses
Software Quality Assurance practices like code review and testing are still time-consuming
- Large and complex: 1 billion lines of code
- Intense quality assurance activities:
  - 17K code reviews
  - 100 million test cases

It is infeasible to exhaustively review and test source code within the limited time and resources
Defect models play a significant role in software quality management
[Diagram: a defect model analyzes Modules A-D of a release and flags the likely-defective modules]
- Predict future software defects
- Explain what makes software fail
- Develop empirical theories related to software quality
Defect models play a significant role in software quality management, and have become widespread in many large software organisations
(Mockus et al., BLTJ’00; Ostrand et al., TSE’05; Nagappan et al., ICSE’06; Zimmermann et al., FSE’09; Lewis et al., ICSE’13; Caglayan et al., ICSE’15; Kim et al., FSE’15; Tan et al., ICSE’15; Shimagaki et al., ICSE’16)
- Predict future software defects
- Explain what makes software fail
- Develop empirical theories related to software quality
Today, research toolkits are easily accessible
In reality, defect modelling is detailed and complex
[Diagram: an end-to-end analytics pipeline from a software repository to insights — (1) Data Collection, (2) Data Cleaning and Filtration, (3) Metrics Extraction and Normalization, (4) Descriptive Analytics (relationship of studied and control metrics to the outcome), data sampling, (7) Model Construction (statistical model, training corpus, classifier parameters), (8) Model Validation (testing corpus, performance measures, performance estimates), and (9) Model Analysis and Interpretation (importance scores, patterns) — supporting predictive and prescriptive analytics]
A lack of practical guidelines for defect modelling has a negative impact on software quality management:
- Wrong predictions → developers waste time and resources
- Misleading insights → managers make wrong technical decisions

Empirical investigation is needed to derive practical guidelines for defect modelling
The Risks of Unsound Defect Models
- Data Quality: Issue Reports [ICSE’15]
- Metric Selection: Correlation [TSE’16], Feature Selection [ICSME’18], Control Metrics [ICSE-SEIP’18]
- Model Construction: Parameters [ICSE’16, TSE’18], Class Imbalance [Under Review], Universal Models [Under Review]
- Model Evaluation: Model Validation [TSE’17], Measures [ICSE-SEIP’18], Releases [Under Review], Time-Wise [On-going]
- Model Interpretation: Model Statistics [ICSE-SEIP’18], Interpretation [Under Review], Metrics [On-going], Ranking [On-going]
Defect prediction models have configurable parameters that control their characteristics
- Most of the widely-used classification techniques require at least one parameter setting (based on a literature analysis of 300+ defect studies), e.g., #trees for random forest, #clusters for k-nearest neighbors
- 80% of the top-50 highly-cited defect studies rely on a default setting [IST’16]
- Even within the R toolkit, 2 random forest packages have different default settings
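To make the default-vs-tuned distinction concrete, here is a minimal sketch (my own illustration, not the study's code, which used R): it trains one model with whatever defaults the library ships and one whose tree count is tuned by grid search, then compares AUC on held-out data. Library, dataset, and parameter grid are all illustrative assumptions.

```python
# Hypothetical sketch in Python/scikit-learn; the study itself used R packages.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a defect dataset: ~20% "defective" modules.
X, y = make_classification(n_samples=600, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Default setting: whatever the library ships with.
default = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Optimized setting: search a small grid for the number of trees.
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    {"n_estimators": [10, 50, 100, 300]},
                    scoring="roc_auc", cv=3).fit(X_tr, y_tr)

auc_default = roc_auc_score(y_te, default.predict_proba(X_te)[:, 1])
auc_tuned = roc_auc_score(y_te, grid.predict_proba(X_te)[:, 1])
print(f"default AUC={auc_default:.3f}, tuned AUC={auc_tuned:.3f}")
```

The point is not the specific numbers but that "the default" is a library choice, not a property of the technique, which is why different R packages can disagree on it.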
What is the impact of optimization techniques on defect models?
(RQ1) What is the impact of automated parameter optimization on the accuracy of defect models?
(RQ2) How much does the interpretation of defect models change when automated parameter optimization is applied?
(RQ3) What are the best classification techniques for defect models when automated parameter optimization is applied?

Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32.
A comprehensive framework to extensively evaluate the impact of optimization techniques on defect models
- Defect dataset → construct defect models with (a) the default setting and (b) an optimized setting
- Calculate accuracy → compare the accuracy of the optimized model against the accuracy of the default model
- Rank metrics by importance score → compare the ranking of metrics in the optimized model against the ranking in the default model

Study scope:
- 26 classification techniques (e.g., C5.0, Random Forest)
- 12 performance measures (e.g., AUC, F-measure)
- 4 optimization algorithms (e.g., genetic algorithm and differential evolution)
Case Study Setup
- 18 software releases, 600-10,000 modules, 11-48% defective ratios
- 15-38 software metrics:
  - Code metrics (e.g., lines of code, code complexity)
  - Process metrics (e.g., #commits, #added_lines)
  - Human factors (e.g., #dev, ownership)
(RQ1) What is the impact of automated parameter optimization on the accuracy of defect models?
Approach: Compute the accuracy difference between optimized models and default-setting models
[Figure: AUC difference (optimized minus default) per classification technique (C5.0, AdaBoost, AVNNet, CART, PCANNet, NNet, FDA, MLPWeightDecay, MLP, LMT, GPLS, LogitBoost, KNN, xGBTree, GBM, NB, RBF, SVMRadial, GAMBoost, ...), grouped into Large, Medium, and Small improvement]
9 of the 26 studied classification techniques have a large accuracy improvement

Optimization substantially improves the accuracy of defect prediction models
(RQ2) How much does the interpretation of defect models change when automated parameter optimization is applied?
Approach: Compute the percentage of the top-rank metrics in optimized models that appear at the top rank in default models
- Calculate an importance score for each metric in the optimized model and in the default model
- Rank the metrics by importance in each model
- Calculate the rank difference for each metric, e.g.:

  Rank difference   % of metrics
   0                60%
  -1                20%
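The rank-difference step above can be sketched as follows (my own illustration: the metric names, importance scores, and the sign convention — rank in the optimized model minus rank in the default model — are assumptions, not values from the study).

```python
# Hypothetical sketch of the rank-difference computation for metric importance.
from collections import Counter

importance_optimized = {"loc": 0.40, "complexity": 0.25, "n_commits": 0.20, "n_devs": 0.15}
importance_default   = {"loc": 0.35, "n_commits": 0.30, "complexity": 0.20, "n_devs": 0.15}

def ranks(scores):
    # Rank 1 = most important metric.
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {metric: i + 1 for i, metric in enumerate(ordered)}

r_opt, r_def = ranks(importance_optimized), ranks(importance_default)

# Rank difference per metric: 0 means the metric holds the same rank in both
# models; -1 means it sits one position lower in the optimized model's ranking.
diff = {m: r_opt[m] - r_def[m] for m in r_opt}
share = {d: c / len(diff) for d, c in Counter(diff.values()).items()}
print(diff, share)
```

Tabulating the shares of each difference value then yields a distribution like the one on the slide (most metrics at difference 0, a tail of shifted metrics).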
(RQ2) How much does the interpretation of defect models change when automated parameter optimization is applied?
Approach: Compute the percentage of the top-rank metrics in optimized models that appear at the top rank in default models
[Figure: histogram of rank differences (0 to -4) against the percentage of metrics (0-100%)]
36% of the top-rank metrics in optimized models do not appear at the top rank in default models

One-third of the most important metrics are missing if we do not apply parameter optimization
(RQ3) What are the best classification techniques for defect models when automated parameter optimization is applied?
Approach: Compute the average rank of classification techniques across the studied datasets
A JMLR’14 paper finds that random forest is the top-performing classifier for machine learning datasets
Top-ranked techniques in our study:
1. Optimized xGBTree
2. Optimized C5.0
3. Optimized RF
Surprisingly, the best technique in ML domains, such as random forest, is not always the best in SE domains

Domain-specific modelling guidelines are critically needed
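The average-rank approach can be sketched like this (my own illustration: the AUC values are invented toy numbers, and the ranking rule — rank 1 for the best AUC on each dataset, then averaging ranks across datasets — is my reading of the slide).

```python
# Hypothetical sketch of average-rank aggregation across datasets.
auc = {  # technique -> AUC on each of three toy datasets
    "xGBTree": [0.82, 0.79, 0.85],
    "C5.0":    [0.81, 0.80, 0.83],
    "RF":      [0.80, 0.78, 0.84],
}
n_datasets = 3

avg_rank = {}
for tech in auc:
    ranks = []
    for d in range(n_datasets):
        # Rank on dataset d: 1 plus the number of techniques strictly better.
        ranks.append(1 + sum(auc[t][d] > auc[tech][d] for t in auc))
    avg_rank[tech] = sum(ranks) / n_datasets

# Best technique overall = lowest average rank.
print(sorted(avg_rank, key=avg_rank.get))
```

Averaging ranks rather than raw AUCs keeps datasets with easy or hard prediction problems from dominating the comparison.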
What is the impact of optimization techniques on defect models?
- Optimization substantially improves the accuracy of defect prediction models
- One-third of the most important metrics are missing if we do not apply parameter optimization
- The best technique in the ML domain (RF) might not always be the best in the SE domain when parameter optimization is applied

Automated parameter optimization should be applied for defect prediction models
“This is a much needed contribution both for researchers and for practitioners: Researchers will find a checklist for the quality assurance of their defect modelling methods. Practitioners, that is, software quality experts in companies, will avoid a false interpretation of their data.”
- An Anonymous Reviewer -
Open Challenges: Faster development speed, but slower QA activities
- Software companies are shifting from long to rapid release cycles
- Large volume of code changes
- Slow CI builds and tests

How can we develop an AI agent to accurately and intelligently remove software defects prior to CI build runs?
Future Research Agenda: AI-Driven Software Quality Assurance in the Age of DevOps (“Bug Hunter”)
- Accurate predictions of future defects
- Explain the nature of software defects
- Generate actionable guidelines

This research project is expected to reduce software defects and operating costs, while accelerating development productivity for the Australian software industry
I'm actively recruiting Ph.D. students
Current Ph.D. student: Jirayus Jiarpakdee (Nov 2017 - current; publications: ICSME’18, ICSE’19, TSE’19, more to come)
Benefits:
- Tuition fee scholarships
- Full stipends ($27,353 per annum in 2018, indexed annually)
- International travel funding
- Work with experts in AI/ML/Data Mining/SE
- Access to Monash HPC clusters
- Possible domestic and international internships
Dr. Chakkrit (Kla) Tantithamthavorn
@klainfo · http://chakkrit.com · chakkrit.tantithamthavorn@monash.edu

AI-Driven Software Quality Assurance in the Age of DevOps
  • 1.
    Dr. Chakkrit (Kla)Tantithamthavorn Lecturer, Faculty of Information Technology,
 Monash University, Australia. @klainfohttp://chakkrit.com chakkrit.tantithamthavorn@monash.edu AI-Driven Software Quality Assurance 
 in the Age of DevOps
  • 2.
    Australian Taxation Office’sservice outages from software bugs lead to $4.2 billion dollars lost 2
  • 3.
    3 Software Quality Assurancepractices like 
 code review and testing are still time-consuming - Large and complex: 1 billion lines of code - Intense quality assurance activities: - 17K code reviews - 100 million test cases
  • 4.
    3 Software Quality Assurancepractices like 
 code review and testing are still time-consuming - Large and complex: 1 billion lines of code - Intense quality assurance activities: - 17K code reviews - 100 million test cases It is infeasible to exhaustively review and test 
 source code within the limited time and resources
  • 5.
    4 Defect models playa significant role in software quality management Release Defect
 Model Module A Module C Module B Module D Module A Module C Module B Module D
  • 6.
    4 Predict
 future software 
 defects Defectmodels play a significant role in software quality management Release Defect
 Model Module A Module C Module B Module D Module A Module C Module B Module D
  • 7.
    4 Predict
 future software 
 defects Explain
 what makes software fail Defect models play a significant role in software quality management Release Defect
 Model Module A Module C Module B Module D Module A Module C Module B Module D
  • 8.
    4 Predict
 future software 
 defects Explain
 what makes software fail Defect models play a significant role in software quality management Release Defect
 Model Module A Module C Module B Module D Module A Module C Module B Module D Develop 
 empirical theories related to software quality
  • 9.
    5 Defect models playa significant role in software quality management Lewis et al., ICSE’13 Mockus et al., BLTJ’00 Ostrand et al., TSE’05 Kim et al., FSE’15 Naggappan et al., ICSE’06 Zimmermann et al., FSE’09 Caglayan et al., ICSE’15 Tan et al., ICSE’15 Shimagaki et al., ICSE’16 Defect models become widespread in many large software organisations Explain 
 what makes software fail Develop 
 empirical theories related to software quality Predict
 future software 
 defects
  • 10.
    6 Today research toolkitsare easily accessible
  • 11.
    7 Statistical
 Model Training
 Corpus Classifier 
 Parameters (7) Model
 Construction Performance
 Measures Data
 Sampling (2) Data Cleaning and Filtration (3) Metrics Extraction and Normalization (4) Descriptive Analytics (+/-) Relationship to the Outcome Y X x Software
 Repository Software
 Dataset Clean
 Dataset Studied Dataset Outcome Studied Metrics Control Metrics +~ (1) Data Collection Predictive 
 Analytics Prescriptive Analytics (8) Model Validation (9) Model Analysis and Interpretation Importance 
 Score Testing
 Corpus PredictionsPerformance
 Estimates Patterns In reality, defect modelling is detailed and complex
  • 12.
    8 A lack ofpractical guidelines of defect modelling have a negative impact on software quality management Misleading 
 insights Managers take wrong technical decisions Developers waste time and resources Wrong 
 predictions
  • 13.
    8 A lack ofpractical guidelines of defect modelling have a negative impact on software quality management Misleading 
 insights Managers take wrong technical decisions Developers waste time and resources Wrong 
 predictions Empirical investigation is needed 
 to derive practical guidelines for defect modelling
  • 14.
    9 Metric Selection Model Construction ModelEvaluation Control Metrics
 [ICSE-SEIP’18] Correlation
 [TSE’16] Model Statistics
 [ICSE-SEIP’18] Model Interpretation Class Imbalance
 [Under Review] The Risks of Unsound
 Defect
 Models Data Quality Issue Reports
 [ICSE’15] Feature Selection
 [ICSME’18] Interpretation
 [Under Review] Model Validation
 [TSE’17] Measures
 [ICSE-SEIP’18] Releases
 [Under Review] Metrics
 [On-going] Universal Models
 [Under Review] Ranking 
 [On-going] Time-Wise
 [On-going] Parameters
 [ICSE’16,TSE’18]
  • 15.
    9 Metric Selection Model Construction ModelEvaluation Control Metrics
 [ICSE-SEIP’18] Correlation
 [TSE’16] Model Statistics
 [ICSE-SEIP’18] Model Interpretation Class Imbalance
 [Under Review] The Risks of Unsound
 Defect
 Models Data Quality Issue Reports
 [ICSE’15] Feature Selection
 [ICSME’18] Interpretation
 [Under Review] Model Validation
 [TSE’17] Measures
 [ICSE-SEIP’18] Releases
 [Under Review] Metrics
 [On-going] Universal Models
 [Under Review] Ranking 
 [On-going] Time-Wise
 [On-going] Parameters
 [ICSE’16,TSE’18]
  • 16.
    10 Defect prediction modelshave configurable parameters that control their characteristics Most of the widely-used classification techniques require at least one parameter setting Based on the literature analysis of 
 300+ defect studies
  • 17.
    10 Defect prediction modelshave configurable parameters that control their characteristics Most of the widely-used classification techniques require at least one parameter setting Based on the literature analysis of 
 300+ defect studies #trees for 
 random forest #clusters for 
 k-nearest neighbors
  • 18.
    10 Defect prediction modelshave configurable parameters that control their characteristics Most of the widely-used classification techniques require at least one parameter setting Based on the literature analysis of 
 300+ defect studies #trees for 
 random forest #clusters for 
 k-nearest neighbors 80% of top-50 highly-cited defect studies rely on a default setting [IST’16]
  • 19.
    10 Defect prediction modelshave configurable parameters that control their characteristics Most of the widely-used classification techniques require at least one parameter setting Based on the literature analysis of 
 300+ defect studies #trees for 
 random forest #clusters for 
 k-nearest neighbors 80% of top-50 highly-cited defect studies rely on a default setting [IST’16] Even within the R toolkit, 2 random forest packages have different default settings
  • 20.
    What is theimpact of optimization techniques on defect models? 11 Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32.
  • 21.
    What is theimpact of optimization techniques on defect models? 11 (RQ1) What is the impact of automated parameter optimization on the accuracy of defect models? Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32.
  • 22.
    What is theimpact of optimization techniques on defect models? 11 (RQ1) What is the impact of automated parameter optimization on the accuracy of defect models? (RQ2) How much does the interpretation of defect models change when automated parameter optimization is applied? Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32.
  • 23.
    What is theimpact of optimization techniques on defect models? 11 (RQ1) What is the impact of automated parameter optimization on the accuracy of defect models? (RQ2) How much does the interpretation of defect models change when automated parameter optimization is applied? (RQ3) What are the best classification techniques for defect models when automated parameter optimization is applied? Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32.
  • 24.
    12 A comprehensive frameworkto extensively evaluate the impact of optimization techniques on defect models Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32. Defect 
 Dataset
  • 25.
    12 A comprehensive frameworkto extensively evaluate the impact of optimization techniques on defect models Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32. Defect 
 Dataset Default 
 Setting Construct 
 Defect
 Models Optimized 
 Setting
  • 26.
    12 A comprehensive frameworkto extensively evaluate the impact of optimization techniques on defect models Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32. Defect 
 Dataset Default 
 Setting Construct 
 Defect
 Models Optimized 
 Setting Optimized
 Model Default-setting
 Model
  • 27.
    12 A comprehensive frameworkto extensively evaluate the impact of optimization techniques on defect models Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32. Defect 
 Dataset Default 
 Setting Construct 
 Defect
 Models Optimized 
 Setting Optimized
 Model Default-setting
 Model Calculate
 accuracy The accuracy of
 optimized model The accuracy of
 default model
  • 28.
    12 A comprehensive frameworkto extensively evaluate the impact of optimization techniques on defect models Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32. Defect 
 Dataset Default 
 Setting Construct 
 Defect
 Models Optimized 
 Setting Optimized
 Model Default-setting
 Model Calculate
 accuracy The accuracy of
 optimized model The accuracy of
 default model Rank metrics by importance score The ranking of
 metrics in 
 optimized model The ranking of
 metrics in 
 default model
  • 29.
12 A comprehensive framework to extensively evaluate the impact of optimization techniques on defect models
- 26 classification techniques (e.g., C5.0, Random Forest)
- 12 performance measures (e.g., AUC, F-measure)
- 4 optimization algorithms (e.g., genetic algorithm and differential evolution)
[Figure: a defect dataset is used to construct two defect models, one with the default setting and one with an optimized setting; the two models are then compared on (1) accuracy and (2) the ranking of metrics by importance score.]
Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32.
  • 30.
13 Case Study Setup
- 18 software releases
- 600 - 10,000 modules
- 11-48% defective ratios
- 15-38 software metrics:
  - Code Metrics (e.g., lines of code, code complexity)
  - Process Metrics (e.g., #commits, #added_lines)
  - Human Factors (e.g., #dev, ownership)
Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto. 2018. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Transactions on Software Engineering (TSE) (2018), pp. 1-32.
  • 31.
14 (RQ1) What is the impact of automated parameter optimization on the accuracy of defect models?
Approach: Compute the accuracy difference between optimized models and default-setting models.
[Figure: boxplots of the AUC difference (optimized - default) per classification technique (e.g., C5.0, AdaBoost, AVNNet, CART, PCANNet, NNet, FDA, MLPWeightDecay, MLP, LMT, GPLS, LogitBoost, KNN, xGBTree, GBM, NB, RBF, SVMRadial, GAMBoost), grouped into Large, Medium, and Small improvements.]
9 of the 26 studied classification techniques have a large accuracy improvement.
Optimization substantially improves the accuracy of defect prediction models.
  • 35.
15 (RQ2) How much does the interpretation of defect models change when automated parameter optimization is applied?
Approach: Compute the percentage of the top-ranked metrics in optimized models that appear at the top rank in default models.
1. Calculate an importance score for each metric in the optimized model and the default model.
2. Rank the metrics in each model by importance score.
3. Calculate the rank difference for each metric, e.g.:
   Rank difference | % of metrics
   0               | 60%
   -1              | 20%
  • 39.
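The three steps above can be sketched as follows; a minimal illustration assuming scikit-learn models, synthetic data, and hypothetical metric names, with random-forest feature importances standing in for the paper's importance scores:

```python
# Sketch of the RQ2 interpretation comparison: rank metrics by importance
# in an optimized and a default model, then compute per-metric rank
# differences. Illustrative only; names and data are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=1)
metrics = [f"metric_{i}" for i in range(X.shape[1])]  # hypothetical names

default_model = RandomForestClassifier(random_state=1).fit(X, y)
optimized_model = RandomForestClassifier(n_estimators=300, max_depth=5,
                                         random_state=1).fit(X, y)

def ranks(model):
    # Rank 1 = most important metric.
    order = np.argsort(-model.feature_importances_)
    r = np.empty_like(order)
    r[order] = np.arange(1, len(order) + 1)
    return dict(zip(metrics, r))

rank_default, rank_optimized = ranks(default_model), ranks(optimized_model)
# 0 means the metric holds the same rank in both models; a nonzero value
# means its importance ranking shifts when optimization is applied.
rank_diff = {m: int(rank_optimized[m] - rank_default[m]) for m in metrics}
print(rank_diff)
```

From `rank_diff`, the percentage of metrics at each rank difference (the table above) follows by counting.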
16 (RQ2) How much does the interpretation of defect models change when automated parameter optimization is applied?
Approach: Compute the percentage of the top-ranked metrics in optimized models that appear at the top rank for default models.
[Figure: bar chart of the percentage of metrics at each rank difference, from 0 to -4.]
36% of the top-ranked metrics in optimized models do not appear at the top rank for default models.
One-third of the most important metrics are missing if we do not apply parameter optimization.
  • 42.
17 (RQ3) What is the best classification technique for defect models when automated parameter optimization is applied?
Approach: Compute the average rank of classification techniques across the studied datasets.
A JMLR'14 paper finds that random forest is the top-performing classifier for machine learning datasets.
[Result: the top-ranked techniques are optimized xGBTree, optimized C5.0, and optimized RF.]
Surprisingly, the best technique in ML domains, like Random Forest, is not always the best in SE domains.
Domain-specific modelling guidelines are critically needed.
  • 45.
18 What is the impact of optimization techniques on defect models?
- Optimization substantially improves the accuracy of defect prediction models.
- One-third of the most important metrics are missing if we do not apply parameter optimization.
- The best technique (RF) in the ML domain might not always be the best in the SE domain when parameter optimization is applied.
Automated parameter optimization should be applied for defect prediction models.
  • 46.
19 “This is a much needed contribution both for researchers and for practitioners:

Researchers will find a checklist for the quality assurance of their defect modelling methods.

Practitioners, that is, software quality experts in companies, will avoid a false interpretation of their data.”

- An Anonymous Reviewer -
  • 47.
20 Open Challenges: Faster dev. speed, but Slower QA activities
- Software companies are shifting from long to rapid release cycles
- Large volume of code changes
- Slow CI builds and tests
How to develop an AI agent to accurately and intelligently remove software defects prior to CI build runs?
  • 51.
21 Future Research Agenda: AI-Driven Software Quality Assurance in the Age of DevOps
- Accurate Predictions of Future Defects
- Explain the Nature of Software Defects
- Generate Actionable Guidelines
- Bug Hunter
This research project is expected to reduce software defects and operating costs, while accelerating development productivity for the Australian software industry.
  • 57.
22 I'm actively recruiting Ph.D. students
Benefits:
- Tuition Fees Scholarships
- Full stipends ($27,353 per annum in 2018, indexed annually)
- International Travel Funding
- Work with Experts in AI/ML/Data Mining/SE
- Access to Monash HPC clusters
- Possible Domestic and International Internships
Current Ph.D. student: Jirayus Jiarpakdee, Nov 2017 - current
Publications: ICSME'18, ICSE'19, TSE'19, more to come ...
  • 58.
23 Dr. Chakkrit (Kla) Tantithamthavorn
@klainfo | http://chakkrit.com
chakkrit.tantithamthavorn@monash.edu