Machine Learning based Code Smell detection through WekaNose
Research carried out by:
ESSeRE Lab - University of Milan Bicocca
What is a Code Smell?
A code smell is a surface indication that usually corresponds to a deeper problem in the system.
More specifically:
● A code smell has to be sniffable (something that's quick to spot);
● A code smell doesn't always indicate a problem.
1 Machine Learning based Code Smell detection through WekaNose September 2018
What is the problem with state-of-the-art approaches?
● Code smells can be subjectively interpreted;
● Different detectors usually report different results;
● The agreement between their results is scarce;
The problem of selecting metrics and thresholds
Why should we bother?
Why should we use Machine Learning algorithms?
● This approach makes it possible to exploit the full interpretability of Code Smells;
● This approach only requires describing the Code Smell, rather than formalizing a definition;
● The Machine Learning algorithm learns the concept from the data;
What is WekaNose?
WekaNose is a tool that supports a workflow aimed at training Machine Learning algorithms specifically for Code Smell detection.
The whole process is divided into three main parts:
● the creation of the dataset;
● the training and testing of the Machine Learning algorithms;
● the evaluation of the Machine Learning algorithms' performance in terms of Code Smell detection.
Main problems
● Creation of a balanced dataset, in terms of:
○ Labels;
○ Statistical properties;
● Selection of the Machine Learning algorithms:
○ No free lunch theorem
● Does high performance in the Machine Learning sense imply high performance in actual Code Smell detection?
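To make the last question concrete: Machine Learning performance is usually summarised by measures computed from a confusion matrix, such as accuracy, precision, recall and F-measure. The minimal sketch below is not WekaNose code, and the names `Confusion`, `confusion` and `scores` are hypothetical; it computes these measures for a smelly/not-smelly classification.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, with 'smelly' as the positive class."""
    tp: int  # smelly elements correctly flagged
    fp: int  # clean elements wrongly flagged
    fn: int  # smelly elements missed
    tn: int  # clean elements correctly left alone


def confusion(actual: list[bool], predicted: list[bool]) -> Confusion:
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    return Confusion(
        tp=sum(a and p for a, p in pairs),
        fp=sum((not a) and p for a, p in pairs),
        fn=sum(a and (not p) for a, p in pairs),
        tn=sum((not a) and (not p) for a, p in pairs),
    )


def scores(c: Confusion) -> dict[str, float]:
    """Accuracy, precision, recall and F-measure; 0.0 when a ratio is undefined."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

This also shows why label balance matters. On a dataset with 5 smelly and 95 clean elements, `scores(confusion([True] * 5 + [False] * 95, [False] * 100))` reports an accuracy of 0.95 for a detector that never flags anything, while recall and F-measure are 0.0.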
WekaNose’s workflow
● Describe the Code Smell
● Select a collection of heterogeneous systems
● Extract code metrics from all the systems
● Use Code Smell advisors to sample candidates
● Label the instances
● Choose the Machine Learning algorithms
● Perform the Machine Learning parameter optimization
● Compare the Machine Learning algorithms with each other
● Use the SonarQube plug-in to actually detect the Code Smell
Umberto Azadi, Francesca Arcelli Fontana, and Marco Zanoni. 2018. Machine learning based code smell detection through WekaNose. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (ICSE '18). ACM, New York, NY, USA, 288-289. DOI: https://doi.org/10.1145/3183440.3194974
WekaNose’s workflow: Describe the Code Smell
Data Class:
The Data Class Code Smell refers to classes that store data without providing complex functionality, and on which other classes strongly rely. A Data Class exposes many attributes, is not complex, and provides access to its data fields through accessor methods.
Switch Statements:
The Switch Statements code smell refers to methods that contain a complex switch statement or a sequence of if statements that compromises the readability and/or clarity of the code.
WekaNose’s workflow: Select a collection of heterogeneous systems
An example is the Qualitas Corpus, a curated collection of 112 software systems intended for empirical studies of code artefacts.
WekaNose’s workflow: Extract code metrics from all the systems
● A large set of object-oriented metrics at the method, class and package levels has been taken into account; these metrics are the independent variables of our machine learning approach;
● All metrics have been computed with “Design Features and Metrics for Java” (DFMC4J) [1]
[1] http://essere.disco.unimib.it/wiki/jcodeodor_doc
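Because the metrics act as independent variables, each code element becomes one row of numeric features plus a class label. WEKA reads such datasets in its ARFF format; the sketch below writes one, assuming hypothetical metric names (`LOC`, `NOM`) and a hypothetical nominal class attribute `is_smelly`. It does not reproduce DFMC4J's actual output or WekaNose's dataset layout.

```python
def to_arff(relation: str, metric_names: list[str],
            rows: list[tuple[list[float], bool]]) -> str:
    """Serialise (metric values, label) pairs as a WEKA ARFF dataset string."""
    for name in [relation, *metric_names]:
        if not name.isidentifier():
            raise ValueError(f"name {name!r} would need quoting in ARFF")
    lines = [f"@relation {relation}", ""]
    # One numeric attribute per metric: the independent variables.
    lines += [f"@attribute {name} numeric" for name in metric_names]
    # Nominal class attribute; a plain string, so the braces are literal.
    lines.append("@attribute is_smelly {true,false}")
    lines += ["", "@data"]
    for values, smelly in rows:
        if len(values) != len(metric_names):
            raise ValueError("each row needs one value per metric")
        cells = [str(v) for v in values]
        cells.append("true" if smelly else "false")
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"
```

For example, `to_arff("data_class", ["LOC", "NOM"], [([120, 14], True)])` ends with the data row `120,14,true`.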
WekaNose’s workflow: Use Code Smell advisors to sample candidates
An Advisor is a deterministic rule, implemented locally or in an external tool, that classifies a code element (method or class) as affected by a code smell or not.
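As a minimal sketch, an Advisor can be modelled as a pure function from an element's metrics to a yes/no verdict, and several Advisors can be combined by counting their positive verdicts. The three Data Class rules below, their metric names (`NOPA`, `NOAM`, `NOM`, `WMC`) and their thresholds are hypothetical placeholders loosely based on the Data Class description; they are not the rules of WekaNose or of any real tool.

```python
from collections.abc import Callable, Mapping

Metrics = Mapping[str, float]
Advisor = Callable[[Metrics], bool]


# Hypothetical Data Class advisors: names and thresholds are illustrative only.
def many_public_attributes(m: Metrics) -> bool:
    return m["NOPA"] > 5  # number of public attributes


def mostly_accessors(m: Metrics) -> bool:
    return m["NOAM"] >= 0.5 * m["NOM"]  # accessor methods vs. all methods


def low_complexity(m: Metrics) -> bool:
    return m["WMC"] < 10  # weighted methods per class


ADVISORS: list[Advisor] = [many_public_attributes, mostly_accessors, low_complexity]


def positive_votes(m: Metrics, advisors: list[Advisor] = ADVISORS) -> int:
    """Number of advisors that classify the element as smelly."""
    return sum(1 for advisor in advisors if advisor(m))
```

Each rule is deterministic, so the same metrics always yield the same vote count; the count can then drive candidate sampling.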
WekaNose’s workflow: Label the instances
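The candidates to be labelled are chosen by random sampling stratified by the number of positive Advisor results. This can be sketched as follows; equal-size allocation per stratum, the fixed seed and the name `stratified_sample` are assumptions made for illustration. The sampled elements would then be labelled manually.

```python
import random
from collections import defaultdict


def stratified_sample(votes: dict[str, int], per_stratum: int, seed: int = 0) -> list[str]:
    """Randomly pick up to `per_stratum` elements from each group of elements
    that received the same number of positive advisor votes."""
    strata: dict[int, list[str]] = defaultdict(list)
    for element, n in votes.items():
        strata[n].append(element)
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample: list[str] = []
    for n in sorted(strata, reverse=True):  # most-voted strata first
        group = sorted(strata[n])  # fixed order before sampling
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample
```

Stratifying by vote count keeps elements that many Advisors flag from being drowned out by the far more numerous clean ones, which is also why the resulting training set is only partly random.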
WekaNose’s workflow: Choose the Machine Learning algorithms
Five Machine Learning algorithms have been considered so far:
● J48 (3 variants)
● Random Forest
● Naïve Bayes
● JRip
● SVM (10 variants)
Each variant has been trained both with and without the AdaBoostM1 boosting technique.
Therefore (3 + 1 + 1 + 1 + 10) × 2 = 32 classifiers have been trained, tested and compared.
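The count of 32 follows from 16 base configurations (3 J48, Random Forest, Naïve Bayes, JRip and 10 SVM), each trained with and without boosting. To illustrate what boosting adds, here is a compact from-scratch sketch of AdaBoost.M1 for two classes, using decision stumps as weak learners. It is not WEKA's `AdaBoostM1` implementation and omits its options; all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional

Point = list[float]


@dataclass
class Stump:
    """One-level decision tree: predicts `polarity` when x[feature] > threshold."""
    feature: int
    threshold: float
    polarity: bool

    def predict(self, x: Point) -> bool:
        return self.polarity if x[self.feature] > self.threshold else not self.polarity


def fit_stump(xs: list[Point], ys: list[bool], w: list[float]) -> tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted training error."""
    best: Optional[tuple[Stump, float]] = None
    for f in range(len(xs[0])):
        for thr in sorted({x[f] for x in xs}):
            for pol in (True, False):
                stump = Stump(f, thr, pol)
                err = sum(wi for x, y, wi in zip(xs, ys, w) if stump.predict(x) != y)
                if best is None or err < best[1]:
                    best = (stump, err)
    assert best is not None
    return best


def adaboost_m1(xs: list[Point], ys: list[bool], rounds: int = 10) -> list[tuple[Stump, float]]:
    """AdaBoost.M1, two-class case: returns (stump, vote weight) pairs."""
    if not xs:
        raise ValueError("need at least one training example")
    w = [1.0 / len(xs)] * len(xs)  # start from uniform instance weights
    ensemble: list[tuple[Stump, float]] = []
    for _ in range(rounds):
        stump, err = fit_stump(xs, ys, w)  # weights sum to 1, so err is a rate
        if err >= 0.5:
            break  # no better than chance: AdaBoost.M1 stops here
        beta = max(err, 1e-10) / (1.0 - err)  # clamp avoids log(1/0) for a perfect stump
        ensemble.append((stump, math.log(1.0 / beta)))
        if err == 0:
            break  # training data already fitted perfectly
        # Shrink the weights of correctly classified instances and renormalise,
        # so the next stump concentrates on the instances misclassified so far.
        w = [wi * beta if stump.predict(x) == y else wi for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: list[tuple[Stump, float]], x: Point) -> bool:
    """Weighted vote; an empty ensemble or a tie predicts False."""
    vote = sum(alpha if stump.predict(x) else -alpha for stump, alpha in ensemble)
    return vote > 0
```

Each round reweights the training instances rather than the features, which is why boosting can help weak base learners such as shallow trees.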
WekaNose’s workflow: Perform the Machine Learning parameter optimization
The best parameters are those that maximise the performance of each machine learning algorithm.
WEKA provides a set of algorithms that perform a grid search for this purpose, and they can be used in WekaNose.
Eibe Frank, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining:
Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition, 2016.
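A plain exhaustive grid search, one common way of doing this, evaluates every combination of candidate parameter values and keeps the best-scoring one. The sketch below is generic and is not WEKA's implementation: `evaluate` stands for any scoring procedure to maximise, for instance a cross-validated F-measure, and the SVM-like parameter names in the usage note are hypothetical.

```python
from collections.abc import Callable, Mapping, Sequence
from itertools import product
from typing import Any


def grid_search(grid: Mapping[str, Sequence[Any]],
                evaluate: Callable[[dict[str, Any]], float]) -> tuple[dict[str, Any], float]:
    """Evaluate every parameter combination and return the best one with its score."""
    names = list(grid)
    best_params: dict[str, Any] = {}
    best_score = float("-inf")
    # Cartesian product of all candidate values: one tuple per combination.
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # ties keep the first combination found
            best_params, best_score = params, score
    return best_params, best_score
```

With the toy objective `lambda p: 1 - abs(p["C"] - 1.0) - abs(p["gamma"] - 0.1)` over `{"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}`, all nine combinations are evaluated and `C = 1.0`, `gamma = 0.1` is selected with a score of 1.0. The cost grows multiplicatively with each added parameter, which is why the candidate ranges are usually kept small.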
WekaNose’s workflow: Compare the Machine Learning algorithms with each other
The WEKA Experimenter [1] has been used to compare the Machine Learning algorithms, specifically:
● A 10-fold cross-validation, repeated 10 times, was performed for each classifier using the best parameters found;
● The performances were compared using the corrected paired t-test.
[1] Eibe Frank, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition, 2016.
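The correction matters because the training sets of different cross-validation folds overlap, so fold results are not independent and the standard paired t-test tends to overstate significance. To our understanding, the corrected tester in the WEKA Experimenter is based on the corrected resampled t-test of Nadeau and Bengio, t = mean(d) / sqrt((1/k + n_test/n_train) · var(d)), where d are the k paired score differences. A minimal sketch of that statistic (the function name is hypothetical):

```python
import math
from statistics import mean, variance


def corrected_paired_t(scores_a: list[float], scores_b: list[float],
                       test_train_ratio: float = 1 / 9) -> float:
    """Corrected resampled t statistic for paired repeated-CV results.

    scores_a[i] and scores_b[i] must be measured on the same fold of the
    same run. In 10-fold CV each test fold is 1/10 of the data and the
    training set 9/10, hence the default ratio 1/9.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists of at least two scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    var = variance(diffs)  # sample variance, n - 1 denominator
    if var == 0:
        raise ValueError("all differences are equal; the statistic is undefined")
    # The extra test_train_ratio term inflates the variance to account for
    # overlapping training sets; without it this is the ordinary paired t-test.
    return mean(diffs) / math.sqrt((1 / k + test_train_ratio) * var)
```

With k = 100 paired results from 10 × 10 cross-validation, |t| can be compared with the two-sided 5% critical value of the t distribution with 99 degrees of freedom, approximately 1.984.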
WekaNose’s workflow: Use the SonarQube plug-in to actually detect the Code Smell
Sonar-WekaNose-Plugin is a SonarQube plugin that analyzes Java code and reports the presence of Code Smells using the Machine Learning algorithms trained through WekaNose.
Alessandro Polastri (2018), Bachelor Thesis: “SonarQube plugin for Code Smell
detection through machine learning techniques”.
Results obtained so far: algorithms comparison
[1] Francesca Arcelli Fontana, Mika Mäntylä, Marco Zanoni, and Alessandro Marino (2015). Comparing and experimenting machine learning techniques for code smell detection. Empirical Software Engineering 21. DOI: https://doi.org/10.1007/s10664-015-9378-4
[2] Umberto Azadi (2017), Bachelor Thesis: “Code smell detection through machine learning techniques”.
Best Machine Learning Algorithm per Code Smell
● Data Class: Boosted J48 (pruned)
● God Class: Naïve Bayes
● Feature Envy: Boosted JRip
● Long Method: Boosted J48 (unpruned)
● Long Parameter List: Boosted J48 (unpruned)
● Switch Statements: JRip
Threats to validity
● Threats to internal validity:
○ The manual evaluation of code smells is subject to a certain degree of error, which depends on the developer’s experience, knowledge, ability to understand design issues, and other factors.
● Threats to external validity:
○ Code Smell candidates were selected by random sampling, stratified according to the number of positive results returned by the code smell Advisors. This criterion increases the probability of selecting instances affected by the code smell, but the sampling method could distort the training set, because the selection criterion is only partly random.
● Limitations of the experiments:
○ The selected systems are all open source;
○ The only metrics used are those computed by DFMC4J.
Future Development
● Evaluate the severity (not just the presence) of the Code Smell;
● Evaluate Active Learning techniques in this context;
● Use social platforms to collect data;
● Train new machine learning algorithms in order to increase the number of code smells that can be detected;
● Understand the correlation between the performance measures used to evaluate the machine
learning algorithms and the performance measures used to estimate the goodness of a code
smell detector;
● Design and evaluate hybrid rules that use both the machine learning algorithms and the rules set
by the user to perform a detection;
● Improve the comparison between machine learning-based and rule-based code smell detection:
○ Evaluate whether some algorithms tend to guarantee high performance in actual detection, by running comparisons on large software systems and on software systems belonging to a specific application domain;
○ Extend the comparison to other code smell detection tools based on different techniques;
Thank you for your attention
http://essere.disco.unimib.it/wiki/wekanose

More Related Content

Similar to WekaNose presentation

An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
Valéry BERNARD
 
Janus - Automation of duplicate code detection and refactoring
Janus - Automation of duplicate code detection and refactoringJanus - Automation of duplicate code detection and refactoring
Janus - Automation of duplicate code detection and refactoring
UmbertoAzadi
 
Open Source Power Tools - Opensouthcode 2018-06-02
Open Source Power Tools - Opensouthcode 2018-06-02Open Source Power Tools - Opensouthcode 2018-06-02
Open Source Power Tools - Opensouthcode 2018-06-02
Jorge Hidalgo
 
What's the Difference Between Static Analysis and Compiler Warnings?
What's the Difference Between Static Analysis and Compiler Warnings?What's the Difference Between Static Analysis and Compiler Warnings?
What's the Difference Between Static Analysis and Compiler Warnings?
Andrey Karpov
 
AGRippin: A Novel Search Based Testing Technique for Android Applications
AGRippin: A Novel Search Based Testing Technique for Android ApplicationsAGRippin: A Novel Search Based Testing Technique for Android Applications
AGRippin: A Novel Search Based Testing Technique for Android Applications
REvERSE University of Naples Federico II
 
Presentation Verification & Validation
Presentation Verification & ValidationPresentation Verification & Validation
Presentation Verification & Validation
Elmar Selbach
 
A Novel Approach of Automation Test for Software Monitoring Solution - Tran S...
A Novel Approach of Automation Test for Software Monitoring Solution - Tran S...A Novel Approach of Automation Test for Software Monitoring Solution - Tran S...
A Novel Approach of Automation Test for Software Monitoring Solution - Tran S...
Ho Chi Minh City Software Testing Club
 
Diving into the World of Test Automation The Approach and the Technologies
Diving into the World of Test Automation The Approach and the TechnologiesDiving into the World of Test Automation The Approach and the Technologies
Diving into the World of Test Automation The Approach and the Technologies
QASymphony
 
Code coverage & tools
Code coverage & toolsCode coverage & tools
Code coverage & tools
Rajesh Kumar
 
Automated Software Testing Framework Training by Quontra Solutions
Automated Software Testing Framework Training by Quontra SolutionsAutomated Software Testing Framework Training by Quontra Solutions
Automated Software Testing Framework Training by Quontra Solutions
Quontra Solutions
 
Driving Risks Out of Embedded Automotive Software
Driving Risks Out of Embedded Automotive SoftwareDriving Risks Out of Embedded Automotive Software
Driving Risks Out of Embedded Automotive Software
Parasoft
 
GPCE16: Automatic Non-functional Testing of Code Generators Families
GPCE16: Automatic Non-functional Testing of Code Generators FamiliesGPCE16: Automatic Non-functional Testing of Code Generators Families
GPCE16: Automatic Non-functional Testing of Code Generators Families
Mohamed BOUSSAA
 
Machine Learning in Static Analysis of Program Source Code
Machine Learning in Static Analysis of Program Source CodeMachine Learning in Static Analysis of Program Source Code
Machine Learning in Static Analysis of Program Source Code
Andrey Karpov
 
Lesson 7. The issues of detecting 64-bit errors
Lesson 7. The issues of detecting 64-bit errorsLesson 7. The issues of detecting 64-bit errors
Lesson 7. The issues of detecting 64-bit errors
PVS-Studio
 
Achieving quality with tools case study
Achieving quality with tools case studyAchieving quality with tools case study
Achieving quality with tools case study
EosSoftware
 
Introduction to Parasoft C++TEST
Introduction to Parasoft C++TEST Introduction to Parasoft C++TEST
Introduction to Parasoft C++TEST
Engineering Software Lab
 
System imolementation(Modern Systems Analysis and Design)
System imolementation(Modern Systems Analysis and Design)System imolementation(Modern Systems Analysis and Design)
System imolementation(Modern Systems Analysis and Design)
United International University
 
Deepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningDeepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine Learning
IRJET Journal
 
Regular use of static code analysis in team development
Regular use of static code analysis in team developmentRegular use of static code analysis in team development
Regular use of static code analysis in team development
PVS-Studio
 
Regular use of static code analysis in team development
Regular use of static code analysis in team developmentRegular use of static code analysis in team development
Regular use of static code analysis in team development
Andrey Karpov
 

Similar to WekaNose presentation (20)

An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 
Janus - Automation of duplicate code detection and refactoring
Janus - Automation of duplicate code detection and refactoringJanus - Automation of duplicate code detection and refactoring
Janus - Automation of duplicate code detection and refactoring
 
Open Source Power Tools - Opensouthcode 2018-06-02
Open Source Power Tools - Opensouthcode 2018-06-02Open Source Power Tools - Opensouthcode 2018-06-02
Open Source Power Tools - Opensouthcode 2018-06-02
 
What's the Difference Between Static Analysis and Compiler Warnings?
What's the Difference Between Static Analysis and Compiler Warnings?What's the Difference Between Static Analysis and Compiler Warnings?
What's the Difference Between Static Analysis and Compiler Warnings?
 
AGRippin: A Novel Search Based Testing Technique for Android Applications
AGRippin: A Novel Search Based Testing Technique for Android ApplicationsAGRippin: A Novel Search Based Testing Technique for Android Applications
AGRippin: A Novel Search Based Testing Technique for Android Applications
 
Presentation Verification & Validation
Presentation Verification & ValidationPresentation Verification & Validation
Presentation Verification & Validation
 
A Novel Approach of Automation Test for Software Monitoring Solution - Tran S...
A Novel Approach of Automation Test for Software Monitoring Solution - Tran S...A Novel Approach of Automation Test for Software Monitoring Solution - Tran S...
A Novel Approach of Automation Test for Software Monitoring Solution - Tran S...
 
Diving into the World of Test Automation The Approach and the Technologies
Diving into the World of Test Automation The Approach and the TechnologiesDiving into the World of Test Automation The Approach and the Technologies
Diving into the World of Test Automation The Approach and the Technologies
 
Code coverage & tools
Code coverage & toolsCode coverage & tools
Code coverage & tools
 
Automated Software Testing Framework Training by Quontra Solutions
Automated Software Testing Framework Training by Quontra SolutionsAutomated Software Testing Framework Training by Quontra Solutions
Automated Software Testing Framework Training by Quontra Solutions
 
Driving Risks Out of Embedded Automotive Software
Driving Risks Out of Embedded Automotive SoftwareDriving Risks Out of Embedded Automotive Software
Driving Risks Out of Embedded Automotive Software
 
GPCE16: Automatic Non-functional Testing of Code Generators Families
GPCE16: Automatic Non-functional Testing of Code Generators FamiliesGPCE16: Automatic Non-functional Testing of Code Generators Families
GPCE16: Automatic Non-functional Testing of Code Generators Families
 
Machine Learning in Static Analysis of Program Source Code
Machine Learning in Static Analysis of Program Source CodeMachine Learning in Static Analysis of Program Source Code
Machine Learning in Static Analysis of Program Source Code
 
Lesson 7. The issues of detecting 64-bit errors
Lesson 7. The issues of detecting 64-bit errorsLesson 7. The issues of detecting 64-bit errors
Lesson 7. The issues of detecting 64-bit errors
 
Achieving quality with tools case study
Achieving quality with tools case studyAchieving quality with tools case study
Achieving quality with tools case study
 
Introduction to Parasoft C++TEST
Introduction to Parasoft C++TEST Introduction to Parasoft C++TEST
Introduction to Parasoft C++TEST
 
System imolementation(Modern Systems Analysis and Design)
System imolementation(Modern Systems Analysis and Design)System imolementation(Modern Systems Analysis and Design)
System imolementation(Modern Systems Analysis and Design)
 
Deepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine LearningDeepcoder to Self-Code with Machine Learning
Deepcoder to Self-Code with Machine Learning
 
Regular use of static code analysis in team development
Regular use of static code analysis in team developmentRegular use of static code analysis in team development
Regular use of static code analysis in team development
 
Regular use of static code analysis in team development
Regular use of static code analysis in team developmentRegular use of static code analysis in team development
Regular use of static code analysis in team development
 

Recently uploaded

E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
ICS
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Envertis Software Solutions
 
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
kalichargn70th171
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
Alina Yurenko
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
Sven Peters
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 

Recently uploaded (20)

E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
 
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 

WekaNose presentation

  • 1. Machine Learning based Code Smell detection through WekaNose A research accomplished by: ESSeRE Lab - University of Milan Bicocca
  • 2. What is a Code Smell? A code smell is a surface indication that usually corresponds to a deeper problem in the system. More specifically: ● A code smell has to be sniffable (something that's quick to spot); ● A code smell don't always indicate a problem. 1 Machine Learning based Code Smell detection through WekaNose September 2018
  • 3. What is the problem with state-of-the-art approaches? ● Code smells can be interpreted subjectively; ● The results provided by different detectors usually differ; ● The agreement among their results is scarce; ● The metrics and thresholds selection problem.
  • 4. Why should we bother?
  • 5. Why should we use Machine Learning algorithms? ● This approach allows us to exploit the full interpretability of Code Smells; ● This approach only requires describing the Code Smell, rather than formalizing a definition; ● The Machine Learning algorithm learns the concept from data.
  • 6. What is WekaNose? WekaNose is a tool that supports a workflow aimed at training Machine Learning algorithms specifically for Code Smell detection. The whole process is divided into three main parts: ● the creation of the dataset; ● the training and testing of the Machine Learning algorithms; ● the evaluation of the Machine Learning algorithms' performance in terms of Code Smell detection.
  • 7. Main problems ● Creation of a balanced dataset, in terms of: ○ Labels; ○ Statistical properties; ● Machine Learning algorithm selection: ○ No free lunch theorem; ● Does high performance in the Machine Learning context imply high performance in actual Code Smell detection?
  • 8. WekaNose’s workflow ● Describe the Code Smell; ● Select a collection of heterogeneous systems; ● Extract code metrics from all the systems; ● Use Code Smell advisors to sample candidates; ● Label the instances; ● Choose the Machine Learning algorithms; ● Perform the Machine Learning parameter optimization; ● Compare the Machine Learning algorithms with each other; ● Use the SonarQube plug-in to actually detect the Code Smell. Umberto Azadi, Francesca Arcelli Fontana, and Marco Zanoni. 2018. Machine learning based code smell detection through WekaNose. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (ICSE '18). ACM, New York, NY, USA, 288-289. DOI: https://doi.org/10.1145/3183440.3194974
  • 9. WekaNose’s workflow, step 1: Describe the Code Smell. Data Class: the Data Class Code Smell refers to classes that store data without providing complex functionality, and on which other classes strongly rely. A Data Class exposes many attributes, is not complex, and provides access to its data fields through accessor methods. Switch Statements: the Switch Statements Code Smell refers to methods that contain a complex switch operator, or a sequence of if statements, that compromises the readability and/or clarity of the code.
  • 10. WekaNose’s workflow, step 2: Select a collection of heterogeneous systems. An example is the Qualitas Corpus, a curated collection of 112 software systems intended to be used for empirical studies of code artefacts.
  • 11. WekaNose’s workflow, step 3: Extract code metrics from all the systems. ● A large set of object-oriented metrics at method, class and package level has been taken into account; these metrics are the independent variables of our machine learning approach; ● All metrics have been computed through “Design Features and Metrics for Java” (DFMC4J) [1]. [1] http://essere.disco.unimib.it/wiki/jcodeodor_doc
  • 12. WekaNose’s workflow, step 4: Use Code Smell advisors to sample candidates. An Advisor is a deterministic rule, implemented locally or in an external tool, that classifies a code element (method or class) as a code smell or not.
  • 13. WekaNose’s workflow, step 5: Label the instances.
  • 14. WekaNose’s workflow, step 6: Choose the Machine Learning algorithms. Five Machine Learning algorithms have been considered so far: ● J48 (x3); ● Random Forest; ● Naïve Bayes; ● JRip; ● SVM (x10). Each of these 16 configurations has been trained with and without the AdaBoostM1 boosting technique; therefore, 32 algorithms have been trained, tested and compared.
  • 15. WekaNose’s workflow, step 7: Perform the Machine Learning parameter optimization. The best parameters are those that maximise the performance of the Machine Learning algorithm. WEKA provides a set of algorithms that perform a grid search for this purpose, and they can be used in WekaNose. Eibe Frank, Mark A. Hall, and Ian H. Witten. 2016. The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Fourth Edition. Morgan Kaufmann.
  • 16. WekaNose’s workflow, step 8: Compare the Machine Learning algorithms with each other. The WEKA Experimenter [1] has been used to compare the Machine Learning algorithms, specifically: ● 10-fold cross-validation with 10 repetitions was performed for each classifier, using the best parameters found; ● the performances were compared using the corrected paired t-test. [1] Eibe Frank, Mark A. Hall, and Ian H. Witten. 2016. The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Fourth Edition. Morgan Kaufmann.
  • 17. WekaNose’s workflow, step 9: Use the SonarQube plug-in to actually detect the Code Smell. Sonar-WekaNose-Plugin is a SonarQube plugin that analyzes Java code and reports the presence of Code Smells using the Machine Learning algorithms trained through WekaNose. Alessandro Polastri. 2018. SonarQube plugin for Code Smell detection through machine learning techniques. Bachelor Thesis.
  • 18. Results obtained so far: Algorithm Comparison. [1] Francesca Arcelli Fontana, Mika Mäntylä, Marco Zanoni, and Alessandro Marino. 2015. Comparing and experimenting machine learning techniques for code smell detection. Empirical Software Engineering 21. DOI: https://doi.org/10.1007/s10664-015-9378-4 [2] Umberto Azadi. 2017. Code smell detection through machine learning techniques. Bachelor Thesis.
  • 19. Best Algorithm per Code Smell (Code Smell: Machine Learning Algorithm) ● Data Class: Boosted J48 Pruned; ● God Class: Naïve Bayes; ● Feature Envy: Boosted JRip; ● Long Method: Boosted J48 Unpruned; ● Long Parameter List: Boosted J48 Unpruned; ● Switch Statements: JRip.
  • 20. Threats to validity ● Threats to internal validity: ○ The manual evaluation of code smells is subject to a certain degree of error, which depends on the developer’s experience, knowledge and ability to understand design issues, among other factors. ● Threats to external validity: ○ Code Smell candidates were selected by random sampling, stratified according to the number of positive results of the code smell Advisors. This criterion increases the probability of selecting instances affected by code smells, but it could distort the training set, because the selection is only partly random. ● Limitations of the experiments: ○ The selected systems are all open source; ○ The metrics used are only those computed by DFMC4J.
  • 21. Future Development ● Evaluate the severity (not just the presence) of Code Smells; ● Evaluate Active Learning techniques in this context; ● Use social platforms to collect data; ● Train new machine learning algorithms to increase the number of code smells that can be identified; ● Understand the correlation between the performance measures used to evaluate the machine learning algorithms and those used to estimate the goodness of a code smell detector; ● Design and evaluate hybrid rules that use both the machine learning algorithms and user-defined rules to perform the detection; ● Improve the comparison between machine learning-based and rule-based code smell detection: ○ Evaluate whether some algorithms tend to guarantee high performance in actual detection, by running comparisons on large software systems and on software systems belonging to a specific application domain; ○ Extend the comparison to other code smell detection tools based on different techniques.
  • 22. Thank you for your attention http://essere.disco.unimib.it/wiki/wekanose
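Slide 3 attributes the disagreement among detectors to the choice of metrics and thresholds. The following Python sketch illustrates that effect under purely hypothetical assumptions. The two rule-based Long Method detectors (`detector_a`, `detector_b`), their metric names (`loc`, `cyclo`), their thresholds and the six sample methods are all invented for illustration and do not come from any real tool. Agreement is measured with Cohen's kappa, which is one common choice among several.

```python
"""Toy illustration of how threshold choices make rule-based detectors disagree."""

# Hypothetical method-level metrics: lines of code and cyclomatic complexity.
METHODS = [
    {"loc": 60, "cyclo": 2},
    {"loc": 40, "cyclo": 8},
    {"loc": 80, "cyclo": 9},
    {"loc": 10, "cyclo": 1},
    {"loc": 55, "cyclo": 3},
    {"loc": 35, "cyclo": 6},
]


def detector_a(method: dict) -> bool:
    # Hypothetical rule A: a method is a Long Method if it exceeds 50 LOC.
    return method["loc"] > 50


def detector_b(method: dict) -> bool:
    # Hypothetical rule B: lower LOC threshold, but also requires complexity above 5.
    return method["loc"] > 30 and method["cyclo"] > 5


def agreement(a: list[bool], b: list[bool]) -> float:
    """Fraction of code elements on which both detectors give the same verdict."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: list[bool], b: list[bool]) -> float:
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(a)
    observed = agreement(a, b)
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        # Both detectors return the same constant verdict, so agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    va = [detector_a(m) for m in METHODS]
    vb = [detector_b(m) for m in METHODS]
    print(f"agreement={agreement(va, vb):.2f} kappa={cohen_kappa(va, vb):.2f}")
    # prints: agreement=0.33 kappa=-0.33
```

On this toy sample the two rules agree on only two of the six methods. Kappa is negative, meaning agreement is below chance. With real detectors the figures would differ, but the mechanism is the same: each rule encodes its own metrics and thresholds.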
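Slide 11 treats code metrics as the independent variables of the learning problem. As a rough sketch of that idea only, the Python code below computes four simple class-level metrics from a toy class model and turns them into a fixed-order feature vector. Several parts are assumptions made for illustration:

- the `ClassModel` and `MethodModel` types;
- the metric names (`NOA`, `NOM`, `NOAM`, `WMC`);
- the name-based accessor heuristic.

DFMC4J computes its metrics from parsed Java code, and its API is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class MethodModel:
    name: str
    cyclomatic: int  # cyclomatic complexity of the method body


@dataclass
class ClassModel:
    name: str
    attributes: list[str]
    methods: list[MethodModel] = field(default_factory=list)


# Hypothetical feature order: attributes, methods, accessor methods, weighted methods.
FEATURES = ("NOA", "NOM", "NOAM", "WMC")


def is_accessor(method: MethodModel) -> bool:
    # Naive heuristic (assumption): getter/setter-style name and a straight-line body.
    return method.name.startswith(("get", "set", "is")) and method.cyclomatic == 1


def class_metrics(cls: ClassModel) -> dict[str, int]:
    """Compute a few simple class-level metrics from the toy model."""
    return {
        "NOA": len(cls.attributes),
        "NOM": len(cls.methods),
        "NOAM": sum(is_accessor(m) for m in cls.methods),
        "WMC": sum(m.cyclomatic for m in cls.methods),
    }


def to_feature_vector(metrics: dict[str, int]) -> list[float]:
    """Order the metrics consistently so that every class yields a comparable row."""
    return [float(metrics[name]) for name in FEATURES]
```

For example, a `Point` class with attributes `x` and `y` and methods `getX`, `getY`, `setX` (complexity 1 each) and `distanceTo` (complexity 2) would produce the row [2.0, 4.0, 3.0, 5.0]. One such row per code element, plus its label, is what a learner is trained on.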
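Slide 12 defines an Advisor as a deterministic rule that classifies a code element as smelly or not. The Python sketch below shows what a few Data Class advisors could look like as plain predicates over a metrics map, and how their positive verdicts can be counted per element. The metric names (`NOAM`, `WMC`, `NOPA`), the thresholds and all function names are hypothetical. They are not the rules of any specific tool used by WekaNose.

```python
from typing import Callable, Mapping, Sequence

Metrics = Mapping[str, float]
Advisor = Callable[[Metrics], bool]


# Hypothetical Data Class advisors: metric names and thresholds are illustrative only.
def many_accessors(m: Metrics) -> bool:
    return m["NOAM"] >= 3  # many accessor methods


def low_complexity(m: Metrics) -> bool:
    return m["WMC"] < 10  # low total method complexity


def exposes_public_fields(m: Metrics) -> bool:
    return m["NOPA"] >= 2  # several public attributes


DATA_CLASS_ADVISORS: list[Advisor] = [many_accessors, low_complexity, exposes_public_fields]


def positive_votes(m: Metrics, advisors: Sequence[Advisor] = DATA_CLASS_ADVISORS) -> int:
    """Count how many advisors classify the code element as a Data Class."""
    return sum(advisor(m) for advisor in advisors)
```

A per-element vote count of this kind is the quantity by which slide 20 says candidates were stratified before sampling.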
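Slide 14 states that every classifier was also trained with AdaBoostM1. For readers unfamiliar with the technique, here is a minimal, self-contained Python sketch of two-class AdaBoost.M1 (smelly / not smelly). It is not WEKA's implementation, and it makes several simplifications:

- it uses one-level decision stumps as the weak learner, whereas WekaNose boosted J48, JRip and the other listed learners;
- it reweights instances rather than resampling them;
- the names `Stump`, `fit_stump`, `adaboost_m1` and `predict_ensemble` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stump:
    """One-level decision tree on a single numeric feature (the weak learner)."""
    feature: int
    threshold: float
    polarity: int  # +1: predict smelly when x[feature] > threshold; -1: the opposite

    def predict(self, x: list[float]) -> bool:
        above = x[self.feature] > self.threshold
        return above if self.polarity == 1 else not above


def fit_stump(X, y, w):
    """Return the stump with the lowest weighted training error, and that error."""
    best, best_err = None, math.inf
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pol in (1, -1):
                stump = Stump(f, t, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump.predict(xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_m1(X, y, rounds=10):
    """Two-class AdaBoost.M1 with reweighting; y holds booleans (True = smelly)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # (vote weight, stump) pairs
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err == 0:
            # A perfect weak learner classifies the training set on its own.
            return [(1.0, stump)]
        if err >= 0.5:
            # No better than chance: stop, keeping at least one member.
            if not ensemble:
                ensemble.append((1.0, stump))
            break
        beta = err / (1.0 - err)
        # Shrink the weights of correctly classified instances, then renormalise.
        w = [wi * beta if stump.predict(xi) == yi else wi for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((math.log(1.0 / beta), stump))
    return ensemble


def predict_ensemble(ensemble, x) -> bool:
    """Weighted majority vote of the ensemble members."""
    score = sum(alpha if stump.predict(x) else -alpha for alpha, stump in ensemble)
    return score > 0
```

Each round multiplies the weight of correctly classified instances by β = ε/(1−ε), so the next weak learner concentrates on the instances the previous ones got wrong. Members vote with weight log(1/β). Training stops early when a weak learner is no better than chance (ε ≥ 0.5) or is perfect (ε = 0).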
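Slide 15 describes parameter optimization as finding the values that maximise an algorithm's performance. A minimal Python sketch of exhaustive grid search follows. It is generic and does not reproduce WEKA's own search meta-classifiers. In real use, `score` would return something like the cross-validated performance of a classifier trained with the given parameters. Here the parameter names (`C` and `gamma`, as for an SVM) and the toy scoring function are assumptions made only to keep the example runnable.

```python
from itertools import product
from typing import Any, Callable, Mapping, Sequence


def grid_search(
    grid: Mapping[str, Sequence[Any]],
    score: Callable[[dict[str, Any]], float],
) -> tuple[dict[str, Any], float]:
    """Evaluate every combination in the grid and return the best one with its score."""
    names = list(grid)
    best_params: dict[str, Any] = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:  # on ties, the first combination found is kept
            best_params, best_score = params, s
    return best_params, best_score


def toy_score(params: dict[str, Any]) -> float:
    # Hypothetical stand-in for cross-validated performance, peaking at C=1, gamma=0.1.
    return -(params["C"] - 1) ** 2 - (params["gamma"] - 0.1) ** 2


if __name__ == "__main__":
    print(grid_search({"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, toy_score))
```

Exhaustive search costs one evaluation per combination: nine here, but the count grows multiplicatively with every parameter added. This is why the grid is usually kept coarse.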
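The corrected paired t-test mentioned on slide 16 accounts for the fact that cross-validation runs share training data, which makes the standard paired t-test too optimistic. In the corrected resampled form due to Nadeau and Bengio, the statistic is t = d̄ / sqrt((1/k + n_test/n_train) · σ̂²). Here d̄ and σ̂² are the mean and sample variance of the k per-run performance differences. With 10 repetitions of 10-fold cross-validation, k = 100 and n_test/n_train = 1/9.

The Python sketch below computes only the statistic. The p-value would come from Student's t distribution with k − 1 degrees of freedom, which the standard library does not provide. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def corrected_paired_t(scores_a, scores_b, test_train_ratio=1 / 9):
    """Corrected resampled paired t statistic (Nadeau & Bengio) for two classifiers.

    scores_a, scores_b: per-run performance (e.g. accuracy) of the two classifiers,
    paired by run (same repetition and fold). test_train_ratio: n_test / n_train,
    which is 1/9 for 10-fold cross-validation.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least two paired runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    d_mean, d_var = mean(diffs), variance(diffs)  # variance uses the k - 1 denominator
    if d_var == 0:
        return 0.0 if d_mean == 0 else math.copysign(math.inf, d_mean)
    return d_mean / math.sqrt((1 / k + test_train_ratio) * d_var)


def naive_paired_t(scores_a, scores_b):
    """Standard paired t statistic (no correction), shown only for comparison."""
    return corrected_paired_t(scores_a, scores_b, test_train_ratio=0.0)
```

Setting the ratio to zero recovers the ordinary paired t statistic, which makes the effect of the correction easy to see. The extra n_test/n_train term inflates the variance and therefore shrinks |t|.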
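Slide 20 describes candidate selection as random sampling stratified by the number of Advisors that flag each code element. The slides do not say how many elements were drawn from each stratum. For illustration only, the Python sketch below assumes the same number per stratum, capped at the stratum size. The function name, its parameters and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, Mapping, Sequence


def stratified_sample(
    candidates: Sequence[Hashable],
    votes: Mapping[Hashable, int],
    per_stratum: int,
    seed: int = 0,
) -> list:
    """Draw up to `per_stratum` candidates uniformly at random from each vote stratum.

    votes maps each candidate to the number of Advisors that flag it as smelly.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata = defaultdict(list)
    for c in candidates:
        strata[votes[c]].append(c)
    sample = []
    for n_votes in sorted(strata):
        members = strata[n_votes]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample
```

Equal allocation illustrates the trade-off slide 20 warns about. A rarely flagged element with many positive votes becomes far more likely to be drawn than one with none, which raises the share of smelly instances but makes the training set less representative of the systems it came from.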