Towards Modeling the User-Perceived Quality of Source Code using Static Analysis Metrics
Valasia Dimaridou, Alexandros-Charalampos Kyprianidis, Michail Papamichail,
Themistoklis Diamantopoulos and Andreas Symeonidis
Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki
Intelligent Systems & Software Engineering Labgroup, Information Processing Laboratory
Thessaloniki, Greece
{valadima, alexkypr}@ece.auth.gr, {mpapamic, thdiaman}@issel.ee.auth.gr, asymeon@eng.auth.gr
12th International Conference on Software Technologies – ICSOFT 2017
2 Outline
The concept of user-perceived quality.
Ground truth.
The designed system.
Quality score formulation.
Principal Feature Analysis (PFA).
Quality assessment models training.
Evaluation.
Conclusion and Future work.
3 User-perceived quality
The extent to which a software component is adopted by developers.
Use of software components' popularity and degree of reuse as a code quality indicator.
Approach: Crowdsourcing Information + Static Analysis Metrics → Quality Indicator
But:
Crowdsourcing information cannot be used as the sole quality criterion:
- It is based on current trends.
- It depends on the programming language.
4 Designed System Overview
[Figure: system overview. The 100 most starred and forked GitHub repositories (100,000 classes) pass through static analysis to form the training set, while repository information forms the target set. Principal Feature Analysis and a one-class SVM map the area of high-quality code. ANN models then produce scores for Complexity, Coupling, Inheritance, Size, and Documentation, which are aggregated into the quality score.]
5 Target set formation
Use of GitHub stars and forks as ground truth information.
But:
GitHub stars/forks are given per repository (NOT per class).
Every class is of different importance.
There are big differences in the number of stars/forks between repositories.
[Figure: a repository with 6000 stars and 3000 forks; the stars (x1, x2) and forks (y1, y2) are split between Class A and Class B, and each class receives a Quality Score.]
6 Target set formation
For the j-th class of the i-th repository, the target is formulated as follows:
$$Q_{score}(i,j) = \log\big(S_{stars}(i,j) + S_{forks}(i,j)\big)$$
$$S_{stars}(i,j) = \big(1 + NPM(j)\big) \cdot \frac{Stars(i)}{N_{classes}(i)}$$
$$S_{forks}(i,j) = \big(1 + AD(j) + NM(j)\big) \cdot \frac{Forks(i)}{N_{classes}(i)}$$
Annotations: equal starting contribution; metrics-based contribution; smoothing factor.
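The target formulation on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the bracketed term multiplies the per-class share of stars or forks, and that the logarithm is the natural one (the slide does not state the base). The repository figures and metric values in the example are hypothetical.

```python
import math

def class_quality_score(stars, forks, n_classes, npm, ad, nm):
    """Target score for one class of a repository (sketch of the slide's formulas)."""
    # Per-class share of the repository's stars, scaled by public methods (NPM).
    s_stars = (1 + npm) * stars / n_classes
    # Per-class share of the repository's forks, scaled by AD and NM.
    s_forks = (1 + ad + nm) * forks / n_classes
    # Natural logarithm assumed; it damps large differences between repositories.
    return math.log(s_stars + s_forks)

# Hypothetical repository: 6000 stars, 3000 forks, 200 classes.
score_a = class_quality_score(6000, 3000, 200, npm=10, ad=0.5, nm=12)
score_b = class_quality_score(6000, 3000, 200, npm=0, ad=0.0, nm=0)
print(round(score_a, 3), round(score_b, 3))
```

A class with no public methods, no documentation, and no methods still receives the equal starting share, so its score stays finite.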
7 Principal Feature Analysis (PFA)
[Figure: PFA pipeline. Training dataset (54 metrics) → Principal Component Analysis → transformation matrix → hierarchical clustering → one metric per cluster → a set of the 15 most important features → one-class SVM classifier.]
8 Principal Component Analysis
[Figure: percentage of variance per principal component for the 54 metrics (x-axis: Principal Components, 1 to 54; y-axis: Percentage of Variance, 0 to 25).]
12 PCs retain 82.8% of the information.
        LCOM5    NL       …    WMC
LCOM5   0        0.726         0.3919
NL      0.726    0             0.5294
…       …
WMC     0.3919   0.5294        0
9 Hierarchical Clustering
Transformation matrix → hierarchical clustering (complete linkage) → one metric per cluster (15 in total).
[Figure: dendrogram of the 54 metrics (axis: Distance, from 0.0 to 0.8). Leaves, in extraction order: TNG, TNM, TNPM, NOC, NOD, CBOI, NII, AD, CD, TCD, NA, NPA, TNA, TNPA, TNLA, TNLPA, NLA, NLPA, NOP, DIT, NOA, McCC, NL, NLE, NLG, NLS, WMC, RFC, CBO, NOI, LOC, LLOC, NOS, NM, NPM, TNS, NG, NS, PDA, TCLOC, CLOC, DLOC, PUA, NLM, NLPM, LCOM5, NUMPAR, TNLG, TNLS, TNLM, TNLPM, TLOC, TLLOC, TNOS.]
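The clustering step can be illustrated with a small complete-linkage sketch over the three pairwise distances shown for LCOM5, NL, and WMC on the PCA slide. The cut threshold of 0.5 and the rule for picking a representative (alphabetically first member) are assumptions made only for this example; the slides do not state how the representative of each cluster is chosen.

```python
from itertools import combinations

def complete_linkage(names, dist, threshold):
    """Agglomerative clustering with complete linkage.

    Repeatedly merges the two closest clusters until the smallest
    cluster-to-cluster distance exceeds `threshold`.
    `dist` maps frozenset({a, b}) to the distance between metrics a and b.
    """
    clusters = [frozenset([n]) for n in names]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(dist[frozenset([a, b])] for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        if linkage(c1, c2) > threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

# Distances taken from the slide; the threshold is hypothetical.
dist = {
    frozenset(["LCOM5", "NL"]): 0.726,
    frozenset(["LCOM5", "WMC"]): 0.3919,
    frozenset(["NL", "WMC"]): 0.5294,
}
clusters = complete_linkage(["LCOM5", "NL", "WMC"], dist, threshold=0.5)
# Placeholder selection rule: alphabetically first metric of each cluster.
representatives = sorted(min(c) for c in clusters)
print(representatives)
```

With these three metrics, LCOM5 and WMC merge first, and NL stays separate because its farthest distance to that cluster exceeds the threshold.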
10 SVM one-class classifier
Uses the metrics previously identified by PFA to rule out classes of low quality.
Radial kernel function.
(Gamma, Nu, Tolerance) = (0.01, 0.1, 0.01)
Rules out 8,815 classes, representing 9.99% of the dataset.
Assessment using coding violations.
Mean violations:
Violation type   Rejected classes   Accepted classes
WarningInfo      83.0935            18.5276
Clone            20.9365            4.3106
Cohesion         0.7893             0.3225
Complexity       1.2456             0.0976
Coupling         1.5702             0.1767
Documentation    49.9751            12.5367
Inheritance      0.4696             0.0697
Size             8.1069             1.0134
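To read the table at a glance, the relative drop in mean violations from rejected to accepted classes can be computed directly from the values above. This is only a worked-arithmetic sketch: it compares the two means per violation type, which is not necessarily how the per-category percentages in the evaluation were derived.

```python
# Mean violations from the table: type -> (rejected classes, accepted classes).
mean_violations = {
    "WarningInfo": (83.0935, 18.5276),
    "Clone": (20.9365, 4.3106),
    "Cohesion": (0.7893, 0.3225),
    "Complexity": (1.2456, 0.0976),
    "Coupling": (1.5702, 0.1767),
    "Documentation": (49.9751, 12.5367),
    "Inheritance": (0.4696, 0.0697),
    "Size": (8.1069, 1.0134),
}

def relative_reduction(rejected, accepted):
    """Fraction by which the accepted mean is lower than the rejected mean."""
    return (rejected - accepted) / rejected

reductions = {t: relative_reduction(r, a) for t, (r, a) in mean_violations.items()}
for vtype, value in reductions.items():
    print(f"{vtype:<14} {value:6.1%}")
```

Every violation type shows a lower mean in the accepted classes, which is the pattern the slide relies on to assess the classifier.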
11 ANNs models construction
[Figure: the training dataset containing high-quality classes is split into five metric categories (Complexity, Coupling, Documentation, Inheritance, Size). PCA is applied to each category, followed by metrics selection using 2 PCs.]
12 ANNs models construction
Complexity-related metrics (NL, NLE, WMC, McCC) → PCA → selected metrics (NL, WMC, McCC).
[Figure: PCA plot of NL, NLE, WMC, and McCC on PC1 and PC2.]
13 ANNs models training
[Figure: bar chart of training error and testing error (axis from 0.00% to 30.00%); the values labeled on the chart are 11.35% and 8.79%.]
Metrics category   Input nodes   Hidden nodes   Output nodes
Complexity         3             1              1
Coupling           3             2              1
Documentation      3             2              1
Inheritance        2             2              1
Size               6             4              1
Two-layer feedforward network.
Levenberg-Marquardt algorithm (LMA) for adjusting the weights and the biases.
10-fold cross-validation.
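The architectures in the table are small enough to write out by hand. The sketch below counts the trainable parameters of each network and runs one forward pass for the 3-1-1 complexity model. The tanh hidden activation, the linear output, and every weight value are assumptions for illustration; the slides do not specify them, and Levenberg-Marquardt training is not implemented here.

```python
import math

ARCHITECTURES = {
    # category: (input nodes, hidden nodes, output nodes), from the table
    "Complexity": (3, 1, 1),
    "Coupling": (3, 2, 1),
    "Documentation": (3, 2, 1),
    "Inheritance": (2, 2, 1),
    "Size": (6, 4, 1),
}

def n_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected two-layer network."""
    return (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer, linear output (assumed activations)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

param_counts = {name: n_parameters(*arch) for name, arch in ARCHITECTURES.items()}

# Hypothetical inputs and weights for the 3-1-1 network (not trained values).
y = forward([0.2, -0.1, 0.4], w_hidden=[[0.5, -0.3, 0.8]], b_hidden=[0.1],
            w_out=[1.2], b_out=-0.05)
print(param_counts, round(y, 4))
```

Even the largest model (Size) has only a few dozen parameters under this reading of the table.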
14 ANNs models training
[Figure: histograms of training error and testing error (x-axis: Error, from -1.0 to 1.0; y-axis: Frequency) in six panels: Complexity, Coupling, Documentation, Inheritance, Size, and Final.]
15 Quality Score Aggregation
Metrics category   Weight
Complexity         0.207
Coupling           0.210
Documentation      0.197
Inheritance        0.177
Size               0.208
Use a weight for each category corresponding to the correlation of its metrics with the target score.
5 scores (one for each category) → Final Quality Score
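One plausible reading of the aggregation step is a weighted average of the five category scores, sketched below with the weights from the table. Dividing by the sum of the weights is an assumption (the listed weights add up to 0.999 rather than 1), and the category scores in the example are hypothetical.

```python
# Weights from the table on this slide.
WEIGHTS = {
    "Complexity": 0.207,
    "Coupling": 0.210,
    "Documentation": 0.197,
    "Inheritance": 0.177,
    "Size": 0.208,
}

def final_quality_score(category_scores, weights=WEIGHTS):
    """Weighted average of the per-category scores (normalization assumed)."""
    total_weight = sum(weights[c] for c in category_scores)
    weighted_sum = sum(weights[c] * s for c, s in category_scores.items())
    return weighted_sum / total_weight

# Hypothetical per-category scores for a single class.
example_scores = {
    "Complexity": 0.62,
    "Coupling": 0.55,
    "Documentation": 0.40,
    "Inheritance": 0.71,
    "Size": 0.48,
}
final = final_quality_score(example_scores)
print(round(final, 4))
```

Because the weights are close to uniform, the final score in this sketch stays near the plain mean of the five category scores.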
16 System Evaluation
Evaluation on two main axes:
1. The system's ability to distinguish high-quality classes.
2. The effectiveness of the models for estimating the quality of classes exceeding a certain quality threshold.
One-class classifier:
Using coding violations.
ANN models:
Assessment of whether the final score is reasonable from a quality perspective.
Manual examination of the metrics of classes receiving both high and low scores for each metric category.
17 Evaluation – One Class Classifier
Percentage of fewer violations (per category) in accepted vs. rejected classes:
Violation type   Percentage
WarningInfo      69.72%
Clone            77.72%
Cohesion         49.29%
Complexity       93.89%
Coupling         85.76%
Documentation    56.46%
Inheritance      77.20%
Size             87.55%
18 Evaluation – ANNs models
Size
[Figure: Size category plot with axes NPA, TNLS, TNG, TLLOC, TNA, NPAR, and score. Axis end values, in extraction order: 0.000 to 0.349, 0 to 11, 0 to 37, 3 to 135, 0.14 to 32.00, 1 to 14, and 0.191 to 0.489. Legend: Min, q1, med, q3, Max.]
Four behaviors that lead to the following scores:
• Low [min, q1)
• Low – moderate [q1, med)
• Moderate – high [med, q3)
• High [q3, max]
The TLLOC, TNA, and NPAR metrics seem to have a high influence on the score.
Classes with moderate size and many attributes or parameters seem to receive high quality scores.
Absence of information leads to a low score.
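The four score bands on this slide can be expressed as a small quartile lookup. This is a sketch under assumptions: the scores in the example are hypothetical, the quartiles come from Python's `statistics.quantiles` with its default method, and the top band is treated as closed so that the maximum score is also classified.

```python
import statistics

def quartile_band(score, q1, med, q3):
    """Map a score to one of the four bands listed on the slide."""
    if score < q1:
        return "Low"
    if score < med:
        return "Low - moderate"
    if score < q3:
        return "Moderate - high"
    # Top band treated as closed, so the maximum is covered (assumption).
    return "High"

# Hypothetical size scores, used only to illustrate the banding.
scores = [0.19, 0.23, 0.27, 0.31, 0.35, 0.38, 0.42, 0.49]
q1, med, q3 = statistics.quantiles(scores, n=4)
bands = [quartile_band(s, q1, med, q3) for s in scores]
print(bands)
```

Grouping classes this way makes it possible to inspect the metric values typical of each band, as the slide does for the size category.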
19 Evaluation – ANNs models
Complexity
Metric name   Min value   Max value   Class with high score   Class with low score
McCC          1           39          2.3                     8.5
WMC           0           498         273                     51
NL            0           55          4                       28
The more complex class received a lower quality score.
Higher WMC combined with low McCC leads to a higher score.
Higher NL values lead to a low score.
20 Conclusions and future work
Conclusions:
Successful identification of associations between static analysis metrics using PFA.
Reliable determination of the area of high-quality source code based on static analysis metrics.
Effective user-perceived source code quality estimation for five dominant source code properties.
Provision of a fully interpretable quality score as perceived by developers.
Future work:
Further investigation of the target variable for different scenarios and different application scopes.
Application of additional feature selection techniques in order to improve the current results.
21 Thank you!
Contact info:
Michail Papamichail
mpapamic@issel.ee.auth.gr