Slides related to the paper: "Machine learning based code smell detection through WekaNose", available at:
- http://essere.disco.unimib.it/wiki/wekanose (Project website)
- https://doi.org/10.1145/3183440.3194974 (DOI)
WekaNose presentation
1. Machine Learning based Code Smell detection through WekaNose
Research carried out by:
ESSeRE Lab - University of Milan Bicocca
2. What is a Code Smell?
A code smell is a surface indication that usually corresponds to a deeper problem in the system.
More specifically:
● A code smell has to be sniffable (something that's quick to spot);
● A code smell doesn't always indicate a problem.
1 Machine Learning based Code Smell detection through WekaNose September 2018
3. What is the problem with the state-of-the-art approaches?
● Code smells can be interpreted subjectively;
● Different detectors usually provide different results;
● The agreement among their results is scarce.
The metric and threshold selection problems
4. Why should we bother?
5. Why should we use Machine Learning algorithms?
● This approach makes it possible to exploit the full interpretability of Code Smells;
● This approach only requires describing the Code Smell, rather than formalizing a definition;
● The Machine Learning algorithm learns the concept from data.
6. What is WekaNose?
WekaNose is a tool that supports a workflow for training Machine Learning algorithms specifically for Code Smell detection.
The whole process is divided into three main parts:
● the creation of the dataset;
● the training and testing of the Machine Learning algorithms;
● the evaluation of the Machine Learning algorithms' performance in terms of Code Smell detection.
7. Main problems
● Creation of a balanced dataset, in terms of:
○ Labels;
○ Statistical properties;
● Machine Learning algorithm selection:
○ No free lunch theorem;
● Does high performance in the Machine Learning context imply high performance in actual Code Smell detection?
8. WekaNose’s workflow
● Describe the Code Smell
● Select a collection of heterogeneous systems
● Extract code metrics from all the systems
● Use Code Smell advisors to sample candidates
● Label the instances
● Choose the Machine Learning algorithms
● Perform the Machine Learning parameter optimization
● Compare the Machine Learning algorithms with each other
● Use the SonarQube plug-in to actually detect the Code Smell
Umberto Azadi, Francesca Arcelli Fontana, and Marco Zanoni. 2018. Machine learning based code smell detection through WekaNose. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (ICSE '18). ACM, New York, NY, USA, 288-289. DOI: https://doi.org/10.1145/3183440.3194974
9. WekaNose’s workflow: Describe the Code Smell
Data Class:
The Data Class Code Smell refers to classes that store data without providing complex functionality, and on which other classes strongly rely. A Data Class exposes many attributes, is not complex, and provides access to its data fields through accessor methods.
Switch Statements:
The Switch Statements Code Smell refers to methods that contain a complex switch operator or a sequence of if statements that compromises the readability and/or clarity of the code.
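To make the two descriptions above concrete, here is a small illustrative sketch. The slides target Java code; the Python class `Customer` and the function `shipping_cost` are hypothetical names invented for this example, not taken from the study.

```python
class Customer:
    """Data Class candidate: several attributes, trivial accessors, no behaviour."""

    def __init__(self, name, email, city):
        self._name = name
        self._email = email
        self._city = city

    def get_name(self):
        return self._name

    def get_email(self):
        return self._email

    def get_city(self):
        return self._city


def shipping_cost(customer, weight_kg):
    """Switch Statements candidate: a chain of if/elif branches on a data value."""
    city = customer.get_city()
    if city == "Milan":
        return 5.0 + 0.5 * weight_kg
    elif city == "Rome":
        return 6.0 + 0.6 * weight_kg
    elif city == "Turin":
        return 5.5 + 0.5 * weight_kg
    else:
        return 10.0 + 1.0 * weight_kg
```

Note that `shipping_cost` also relies heavily on `Customer`, which matches the "other classes strongly rely on them" part of the Data Class description.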
10. WekaNose’s workflow: Select a collection of heterogeneous systems
An example is the Qualitas Corpus, a curated collection of 112 software systems intended to be used for empirical studies of code artefacts.
11. WekaNose’s workflow: Extract code metrics from all the systems
● A large set of object-oriented metrics at method, class and package level has been taken into account; these metrics are the independent variables in our machine learning approach.
● All metrics have been computed through “Design Features and Metrics for Java” (DFMC4J) [1].
[1] http://essere.disco.unimib.it/wiki/jcodeodor_doc
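As a rough illustration of what extracting class-level metrics means, the sketch below counts a few simple properties of Python classes with the standard `ast` module. It is only a stand-in: DFMC4J works on Java code and computes a much larger metric set, and the metric names used here (`methods`, `accessors`, `attributes`) are assumptions made for this example.

```python
import ast


def class_metrics(source):
    """Return {class_name: {"methods": n, "accessors": n, "attributes": n}}."""
    results = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # Methods defined directly in the class body.
        methods = [n for n in node.body if isinstance(n, ast.FunctionDef)]
        # Accessors are approximated by a naming convention (assumption).
        accessors = [m for m in methods if m.name.startswith(("get_", "set_"))]
        attributes = set()
        for sub in ast.walk(node):
            # Count distinct `self.<name> = ...` assignment targets.
            if (isinstance(sub, ast.Attribute)
                    and isinstance(sub.ctx, ast.Store)
                    and isinstance(sub.value, ast.Name)
                    and sub.value.id == "self"):
                attributes.add(sub.attr)
        results[node.name] = {
            "methods": len(methods),
            "accessors": len(accessors),
            "attributes": len(attributes),
        }
    return results


SAMPLE = '''
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def get_x(self):
        return self.x
    def get_y(self):
        return self.y
'''
```

Calling `class_metrics(SAMPLE)` yields one row of metric values per class, which is the shape of data the later workflow steps operate on.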
12. WekaNose’s workflow: Use Code Smell advisors to sample candidates
An Advisor is a deterministic rule, implemented locally or in an external tool, that classifies a code element (method or class) as affected by a code smell or not.
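A minimal sketch of this idea, assuming each code element is represented as a dictionary of metric values: every advisor is a plain function returning True or False, and the number of positive answers can later guide candidate sampling. The two rules and their thresholds are hypothetical and are not the advisors used in WekaNose.

```python
def accessor_ratio_advisor(m):
    """Hypothetical rule: at least half of the methods are accessors."""
    return m["methods"] > 0 and m["accessors"] / m["methods"] >= 0.5


def many_attributes_advisor(m):
    """Hypothetical rule: the class exposes at least three attributes."""
    return m["attributes"] >= 3


ADVISORS = [accessor_ratio_advisor, many_attributes_advisor]


def positive_votes(m, advisors=ADVISORS):
    """Count how many advisors flag the element as a Data Class candidate."""
    return sum(1 for advisor in advisors if advisor(m))
```

Because each advisor is deterministic, the same metric values always produce the same vote count.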
13. WekaNose’s workflow: Label the instances
14. WekaNose’s workflow: Choose the Machine Learning algorithms
5 Machine Learning algorithms have been considered so far:
● J48 (x3)
● Random Forest
● Naïve Bayes
● JRip
● SVM (x10)
Each algorithm has been trained with and without the boosting technique AdaBoostM1.
Therefore 32 algorithms have been trained, tested and compared.
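The total of 32 follows from the list above if "(x3)" and "(x10)" are read as the number of configurations of J48 and SVM (an assumption; the slide does not spell the configurations out). A quick check of the arithmetic:

```python
# Number of configurations per base algorithm, as read from the slide.
base_variants = {
    "J48": 3,
    "Random Forest": 1,
    "Naive Bayes": 1,
    "JRip": 1,
    "SVM": 10,
}

plain = sum(base_variants.values())  # classifiers without boosting
total = plain * 2                    # each one with and without AdaBoostM1
```

Here `plain` is 16 and `total` is 32, matching the figure on the slide.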
15. WekaNose’s workflow: Perform the Machine Learning parameter optimization
The best parameters are those that maximise the performance of the machine learning algorithms.
WEKA provides a set of algorithms that perform a grid search for this purpose, and they can be used in WekaNose.
Eibe Frank, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition, 2016.
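The idea of a grid search can be sketched in a few lines: evaluate every combination of candidate parameter values and keep the one with the highest score. This is a generic illustration and not WEKA's API; `toy_score`, the parameter names `confidence` and `min_leaf`, and the value grid are all hypothetical stand-ins for a cross-validated performance measure.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every parameter combination and return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical score that peaks at confidence=0.25 and min_leaf=2.
    return 1.0 - abs(params["confidence"] - 0.25) - 0.1 * abs(params["min_leaf"] - 2)


GRID = {"confidence": [0.1, 0.25, 0.5], "min_leaf": [1, 2, 5]}
```

With this grid, `grid_search(toy_score, GRID)` evaluates all 9 combinations. The cost grows multiplicatively with each added parameter, which is why the grid is usually kept coarse.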
16. WekaNose’s workflow: Compare the Machine Learning algorithms with each other
The WEKA Experimenter [1] has been used to compare the Machine Learning algorithms. Specifically:
● 10-fold cross-validation with 10 repetitions was performed for each classifier, using the best parameters found;
● The performances were compared using the corrected paired t-test.
[1] Eibe Frank, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition, 2016.
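A minimal sketch of the comparison statistic, assuming that "corrected paired t-test" refers to the corrected resampled t-test of Nadeau and Bengio. With 10 repetitions of 10-fold cross-validation there are k = 100 paired score differences, and the test/train size ratio is 1/9. The function name and the sample differences are hypothetical.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(diffs, test_train_ratio):
    """t statistic for paired score differences from repeated cross-validation.

    The usual 1/k variance factor is replaced by (1/k + n_test/n_train) to
    account for the overlap between training sets across folds and runs.
    """
    k = len(diffs)
    var = variance(diffs)  # sample variance, k - 1 degrees of freedom
    if var == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / math.sqrt((1.0 / k + test_train_ratio) * var)


# Hypothetical accuracy differences between two classifiers over 100 folds.
example_diffs = [0.02, 0.04] * 50
t_corrected = corrected_resampled_t(example_diffs, 1 / 9)
t_uncorrected = mean(example_diffs) / math.sqrt(variance(example_diffs) / 100)
```

The corrected statistic is compared against a t distribution with k - 1 degrees of freedom. It is always smaller in magnitude than the uncorrected one, which makes the test more conservative.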
17. WekaNose’s workflow: Use the SonarQube plug-in to actually detect the Code Smell
Sonar-WekaNose-Plugin is a SonarQube plugin that analyzes Java code and reports the presence of Code Smells, using the Machine Learning algorithms trained through WekaNose.
Alessandro Polastri (2018), Bachelor Thesis: “SonarQube plugin for Code Smell detection through machine learning techniques”.
18. Results obtained so far: algorithm comparison
[1] Francesca Arcelli Fontana, Mika Mäntylä, Marco Zanoni, and Alessandro Marino (2015). Comparing and experimenting machine learning techniques for code smell detection. Empirical Software Engineering 21. DOI: https://doi.org/10.1007/s10664-015-9378-4
[2] Umberto Azadi (2017), Bachelor Thesis: “Code smell detection through machine learning techniques”.
19. Best Algorithm per Code Smell
Code Smell             Machine Learning Algorithm
Data Class             Boosted J48 Pruned
God Class              Naïve Bayes
Feature Envy           Boosted JRip
Long Method            Boosted J48 Unpruned
Long Parameter List    Boosted J48 Unpruned
Switch Statements      JRip
20. Threats to validity
● Threats to internal validity:
○ The manual evaluation of code smells is subject to a certain degree of error, which depends on the developer’s experience, knowledge, ability to understand design issues, and other factors.
● Threats to external validity:
○ Code Smell candidates were selected by random sampling, stratified according to the number of positive results of the code smell Advisors. This criterion increases the probability of selecting instances affected by code smells, but it could distort the training set, because the selection is only partly random.
● Experiment limitations:
○ The selected systems are all open source;
○ The metrics used are only those computed by DFMC4J.
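The stratified selection mentioned under external validity could look like the sketch below. It groups candidates by how many advisors flagged them and draws the same number at random from each group. The equal allocation per stratum, the function name and the fixed seed are assumptions; the slides do not specify how many instances were drawn from each stratum.

```python
import random
from collections import defaultdict


def stratified_sample(candidates, per_stratum, seed=0):
    """Sample up to `per_stratum` elements for each advisor vote count.

    `candidates` is a list of (element_id, positive_votes) pairs.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    strata = defaultdict(list)
    for element_id, votes in candidates:
        strata[votes].append(element_id)
    sample = []
    for votes in sorted(strata):
        group = strata[votes]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample
```

Since elements with many positive votes are usually rare, drawing equally from each stratum over-represents them compared with a purely random sample. This is the source of both the higher hit rate and the possible distortion described above.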
21. Future Development
● Evaluate the severity (not just the presence) of the Code Smell;
● Evaluate Active Learning techniques in this context;
● Use social platforms to collect data;
● Train new machine learning algorithms in order to increase the number of identified code smells;
● Understand the correlation between the performance measures used to evaluate the machine learning algorithms and the performance measures used to estimate the goodness of a code smell detector;
● Design and evaluate hybrid rules that use both the machine learning algorithms and the rules set by the user to perform a detection;
● Improve the comparison between machine learning-based and rule-based code smell detection:
○ Evaluate whether some algorithms tend to guarantee high performance in actual detection, by running comparisons on large software systems and on software systems belonging to a specific application domain;
○ Extend the comparison by considering other code smell detection tools, based on different techniques.
22. Thank you for your attention
http://essere.disco.unimib.it/wiki/wekanose