Machine Learning based Code Smell detection through WekaNose
Research carried out by:
ESSeRE Lab - University of Milan Bicocca
What is a Code Smell?
A code smell is a surface indication that usually corresponds to a deeper problem in the system.
More specifically:
● A code smell has to be sniffable (something that's quick to spot);
● A code smell doesn't always indicate a problem.
1 Machine Learning based Code Smell detection through WekaNose September 2018
What is the problem with state-of-the-art approaches?
● Code smells can be interpreted subjectively;
● Different detectors usually provide different results;
● Agreement among the results is scarce.
The problem of selecting metrics and thresholds
Why should we bother?
Why should we use Machine Learning algorithms?
● This approach makes it possible to exploit the full interpretability of Code Smells;
● This approach only requires describing the Code Smell, rather than formalizing a definition;
● The Machine Learning algorithm learns the concept from the data.
What is WekaNose?
WekaNose is a tool that supports a workflow that
aims to train Machine Learning algorithms specifically
for Code Smell detection.
This whole process is divided into three main parts:
● the creation of the dataset;
● the training and testing of the Machine Learning algorithms;
● the evaluation of the Machine Learning algorithms' performance in terms of Code Smell detection.
Main problems
● Creation of a balanced dataset, in terms of:
○ Labels;
○ Statistical properties;
● Machine Learning algorithm selection:
○ No free lunch theorem;
● Does high performance in the Machine Learning context imply high performance in actual Code Smell detection?
WekaNose’s workflow
● Describe the Code Smell
● Select a collection of heterogeneous systems
● Extract code metrics from all the systems
● Use Code Smell advisors to sample candidates
● Label the instances
● Choose the Machine Learning algorithms
● Perform the Machine Learning parameter optimization
● Compare the Machine Learning algorithms with each other
● Use the SonarQube plug-in to actually detect the Code Smell
Umberto Azadi, Francesca Arcelli Fontana, and Marco Zanoni. 2018. Machine learning based code smell detection through WekaNose. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (ICSE '18). ACM, New York, NY, USA, 288-289. DOI: https://doi.org/10.1145/3183440.3194974
WekaNose’s workflow: Describe the Code Smell
Data Class:
The Data Class Code Smell refers to classes that store data without complex functionality and on which other classes strongly rely. A Data Class exposes many attributes, is not complex, and provides access to its data fields through accessor methods.
Switch Statements:
The Switch Statements Code Smell refers to methods that contain a complex switch operator or a sequence of if statements that compromises the readability and/or clarity of the code.
WekaNose’s workflow: Select a collection of heterogeneous systems
An example is the Qualitas Corpus, a curated collection of 112 software systems intended for empirical studies of code artefacts.
WekaNose’s workflow: Extract code metrics from all the systems
● A large set of object-oriented metrics at the method, class, and package levels has been taken into account; these metrics are the independent variables of our machine learning approach;
● All metrics have been computed with “Design Features and Metrics for Java” (DFMC4J) [1].
[1] http://essere.disco.unimib.it/wiki/jcodeodor_doc
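Each code element becomes one instance described by its metrics, and WEKA reads datasets in its ARFF format. A minimal sketch of serializing extracted metrics to ARFF could look as follows. The metric names (`LOC`, `NOM`, `NOAM`, `WMC`), the `is_smell` label attribute, and the `to_arff` helper are illustrative assumptions. The real feature set is whatever DFMC4J produces.

```python
# Illustrative metric names only; the real feature set comes from DFMC4J.
METRICS = ["LOC", "NOM", "NOAM", "WMC"]


def to_arff(relation, rows, labels=("false", "true")):
    """Serialize metric dictionaries into a WEKA ARFF document (hypothetical helper).

    Each row maps every name in METRICS to a number (or None when missing)
    and carries its label under the hypothetical 'is_smell' key.
    """
    lines = [f"@RELATION {relation}", ""]
    for name in METRICS:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    # The label is a nominal attribute listing its allowed values.
    lines.append("@ATTRIBUTE is_smell {" + ",".join(labels) + "}")
    lines += ["", "@DATA"]
    for row in rows:
        # ARFF marks missing values with '?'.
        values = ["?" if row.get(name) is None else str(row[name]) for name in METRICS]
        values.append(row["is_smell"])
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


rows = [
    {"LOC": 120, "NOM": 14, "NOAM": 10, "WMC": 18, "is_smell": "true"},
    {"LOC": 300, "NOM": 9, "NOAM": None, "WMC": 45, "is_smell": "false"},
]
print(to_arff("data_class", rows))
```

Keeping the label as a nominal attribute lets WEKA treat the task as binary classification, with the metrics as the independent variables.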
WekaNose’s workflow: Use Code Smell advisors to sample candidates
An Advisor is a deterministic rule, implemented locally or in an external tool, that classifies a code element (method or class) as affected or not affected by a code smell.
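The threats to validity listed later in this deck say that candidates were drawn by random sampling, stratified on the number of Advisors giving a positive result. The sketch below is a minimal illustration of both ideas. Everything specific in it is hypothetical and does not come from WekaNose: the two advisors, their metrics (`NOAM`, `NOPA`, `WMC`), their thresholds, and the `stratified_sample` helper.

```python
import random
from collections import defaultdict


# Two hypothetical Data Class advisors; thresholds are illustrative only.
def accessor_advisor(m):
    """Flags classes with many accessors and low complexity."""
    return m["NOAM"] >= 5 and m["WMC"] <= 20


def exposure_advisor(m):
    """Flags classes exposing many public attributes and accessors."""
    return m["NOPA"] + m["NOAM"] >= 8


def stratified_sample(elements, advisors, per_stratum, seed=0):
    """Group elements by how many advisors flag them, then sample each group at random."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for name, metrics in elements.items():
        votes = sum(1 for advisor in advisors if advisor(metrics))
        strata[votes].append(name)
    sample = {}
    for votes in sorted(strata):
        names = sorted(strata[votes])  # deterministic order before sampling
        sample[votes] = rng.sample(names, min(per_stratum, len(names)))
    return sample


classes = {
    "A": {"NOAM": 10, "WMC": 5, "NOPA": 0},
    "B": {"NOAM": 6, "WMC": 30, "NOPA": 3},
    "C": {"NOAM": 1, "WMC": 40, "NOPA": 0},
    "D": {"NOAM": 0, "WMC": 2, "NOPA": 0},
}
print(stratified_sample(classes, [accessor_advisor, exposure_advisor], per_stratum=1))
```

Elements flagged by more advisors are more likely to be affected. Sampling from every stratum therefore keeps both likely positives and likely negatives among the instances to be labelled manually.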
WekaNose’s workflow: Label the instances
WekaNose’s workflow: Choose the Machine Learning algorithms
Five Machine Learning algorithms have been considered so far:
● J48 (x3)
● Random Forest
● Naïve Bayes
● JRip
● SVM (x10)
Counting the three J48 and ten SVM variants, this gives 16 configurations. Each configuration has been trained with and without the AdaBoostM1 boosting technique; therefore 32 classifiers have been trained, tested, and compared.
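The sketch below enumerates the trained configurations. It assumes that "(x3)" and "(x10)" denote three J48 and ten SVM parameter variants. The variant indices are placeholders for the actual settings, which the slide does not list.

```python
# Number of parameter variants per algorithm, as listed on the slide.
VARIANTS = {"J48": 3, "Random Forest": 1, "Naive Bayes": 1, "JRip": 1, "SVM": 10}


def configurations():
    """Every (algorithm, variant index, boosted) combination that gets trained."""
    return [
        (algorithm, variant, boosted)
        for algorithm, count in VARIANTS.items()
        for variant in range(count)
        for boosted in (False, True)  # plain and AdaBoostM1-boosted
    ]


configs = configurations()
print(len(configs))  # 16 base configurations x 2 = 32
```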
WekaNose’s workflow: Perform the Machine Learning parameter optimization
The best parameters are those that maximise the performance of the machine learning algorithm. WEKA provides a set of algorithms that perform a grid search for this purpose, and these can be used in WekaNose.
Eibe Frank, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition, 2016.
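Exhaustive grid search is one common way to carry out this optimization. The sketch below is minimal and library-free. `evaluate` is a hypothetical callback standing in for a cross-validated performance measure. The grid mirrors J48's confidence factor (`C`) and minimum instances per leaf (`M`), but its values and the toy scoring function are illustrative.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Score every parameter combination and return the best one with its score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # keep the first combination that reaches the maximum
            best_params, best_score = params, score
    return best_params, best_score


# Illustrative grid and a toy scoring function that peaks at C=0.25, M=5.
grid = {"C": [0.1, 0.25, 0.5], "M": [2, 5, 10]}


def toy_score(params):
    return 0.9 - (params["C"] - 0.25) ** 2 - (params["M"] - 5) ** 2 / 100


print(grid_search(grid, toy_score))
```

Grid search is exhaustive, so its cost grows with the product of the grid sizes. For this reason, the number of candidate values per parameter is usually kept small.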
WekaNose’s workflow: Compare the Machine Learning algorithms with each other
The WEKA Experimenter [1] has been used to compare the Machine Learning algorithms, specifically:
● ten repetitions of 10-fold cross-validation were performed for each classifier, using the best parameters found;
● performance was compared using the corrected paired t-test.
[1] Eibe Frank, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition, 2016.
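The corrected paired t-test accounts for the overlap between training sets across cross-validation runs. The sketch below computes the corrected resampled t-statistic of Nadeau and Bengio, which, to our understanding, is the basis of WEKA's corrected tester. With k paired scores, sample variance s² of their differences, and test/train size ratio n2/n1 (1/9 for 10-fold cross-validation), t = mean / sqrt((1/k + n2/n1) s²), with k - 1 degrees of freedom. The function name and the sample scores are illustrative.

```python
import math
from statistics import mean, variance


def corrected_paired_t(scores_a, scores_b, test_train_ratio=1 / 9):
    """Corrected resampled t-statistic for two classifiers evaluated on the same folds.

    test_train_ratio defaults to 1/9, the ratio for 10-fold cross-validation.
    Returns (t, degrees_of_freedom).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    var = variance(diffs)  # sample variance (k - 1 denominator)
    if var == 0:
        raise ValueError("score differences have zero variance")
    # The 1/k term alone would give the ordinary paired t-test; the extra
    # test/train ratio term inflates the variance to correct for overlap.
    t = mean(diffs) / math.sqrt((1 / k + test_train_ratio) * var)
    return t, k - 1


t, df = corrected_paired_t([2, 3, 4], [1, 1, 1])
print(t, df)
```

With 10 repetitions of 10-fold cross-validation, each list would hold 100 scores, giving 99 degrees of freedom. The resulting t is then compared with the Student's t critical value at the chosen significance level.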
WekaNose’s workflow: Use the SonarQube plug-in to actually detect the Code Smell
Sonar-WekaNose-Plugin is a SonarQube plugin that analyzes Java code and reports the presence of Code Smells using the Machine Learning algorithms trained through WekaNose.
Alessandro Polastri (2018), Bachelor Thesis: “SonarQube plugin for Code Smell detection through machine learning techniques”.
Results obtained so far: Algorithms Comparison
[1] Francesca Arcelli Fontana, Mika Mäntylä, Marco Zanoni, and Alessandro Marino (2015). Comparing and experimenting machine learning techniques for code smell detection. Empirical Software Engineering, 21. DOI: https://doi.org/10.1007/s10664-015-9378-4
[2] Umberto Azadi (2017), Bachelor Thesis: “Code smell detection through machine learning techniques”.
Best Algorithm per Code Smell
● Data Class: Boosted J48 Pruned
● God Class: Naïve Bayes
● Feature Envy: Boosted JRip
● Long Method: Boosted J48 Unpruned
● Long Parameter List: Boosted J48 Unpruned
● Switch Statements: JRip
Threats to validity
● Threats to internal validity:
○ The manual evaluation of code smells is subject to a certain degree of error, related to the developer's experience, knowledge, ability to understand design issues, and other factors.
● Threats to external validity:
○ Code Smell candidates were selected by random sampling, stratified according to the number of positive results from the code smell Advisors. This criterion increases the probability of selecting instances affected by a code smell, but the sampling method could distort the training set, because the selection criterion is only partly random.
● Experiment limitations:
○ The selected systems are all open source;
○ The only metrics used are those computed by DFMC4J.
Future Development
● Evaluate the severity (not just the presence) of the Code Smell;
● Evaluate Active Learning techniques in this context;
● Use social platforms to collect data;
● Train new machine learning algorithms to increase the number of code smells that can be identified;
● Understand the correlation between the performance measures used to evaluate the machine learning algorithms and those used to estimate the goodness of a code smell detector;
● Design and evaluate hybrid rules that combine the machine learning algorithms with user-defined rules to perform detection;
● Improve the comparison between machine learning-based and rule-based code smell detection:
○ Evaluate whether some algorithms tend to guarantee high performance in actual detection, by experimenting with comparisons on large software systems and on software systems belonging to a specific application domain;
○ Extend the comparison to other code smell detection tools based on different techniques.
Thank you for your attention
http://essere.disco.unimib.it/wiki/wekanose

Right Money Management App For Your Financial Goals
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 

WekaNose presentation

  • 1. Machine Learning based Code Smell detection through WekaNose. Research carried out by the ESSeRE Lab, University of Milan Bicocca
  • 2. What is a Code Smell? A code smell is a surface indication that usually corresponds to a deeper problem in the system. More specifically: ● A code smell has to be sniffable (something that's quick to spot); ● A code smell doesn't always indicate a problem. 1 Machine Learning based Code Smell detection through WekaNose September 2018
  • 3. What is the problem with the state-of-the-art approaches? ● Code smells can be interpreted subjectively; ● Different detectors usually produce different results; ● The agreement among their results is scarce. Underlying issue: the problem of selecting metrics and thresholds.
  • 4. Why should we bother?
  • 5. Why should we use Machine Learning algorithms? ● This approach makes it possible to exploit the full interpretability of Code Smells; ● This approach only requires describing the Code Smell, rather than formalizing a definition; ● The Machine Learning algorithm learns the concept from data.
  • 6. What is WekaNose? WekaNose is a tool that supports a workflow aimed at training Machine Learning algorithms specifically for Code Smell detection. The whole process is divided into three main parts: ● the creation of the dataset; ● the training and testing of the Machine Learning algorithms; ● the evaluation of the Machine Learning algorithms' performance in terms of Code Smell detection.
  • 7. Main problems ● Creation of a balanced dataset, in terms of: ○ labels; ○ statistical properties; ● Machine Learning algorithm selection: ○ No free lunch theorem; ● Does high performance in the Machine Learning context imply high performance in actual Code Smell detection?
  • 8. WekaNose’s workflow ● Describe the Code Smell; ● Select a collection of heterogeneous systems; ● Extract code metrics from all the systems; ● Use Code Smell advisors to sample candidates; ● Label the instances; ● Choose the Machine Learning algorithms; ● Perform the Machine Learning parameter optimization; ● Compare the Machine Learning algorithms with each other; ● Use the SonarQube plug-in to actually detect the Code Smell. Umberto Azadi, Francesca Arcelli Fontana, and Marco Zanoni. 2018. Machine learning based code smell detection through WekaNose. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings (ICSE '18). ACM, New York, NY, USA, 288-289. DOI: https://doi.org/10.1145/3183440.3194974
  • 9. WekaNose’s workflow: Describe the Code Smell. ● Data Class: the Data Class code smell refers to classes that store data without implementing complex functionality, and on which other classes strongly rely. A Data Class exposes many attributes, is not complex, and provides access to its data fields through accessor methods. ● Switch Statements: the Switch Statements code smell refers to methods that contain a complex switch operator or a sequence of if statements that compromises the readability and/or clarity of the code.
  • 10. WekaNose’s workflow: Select a collection of heterogeneous systems. An example is the Qualitas Corpus, a curated collection of 112 software systems intended for empirical studies of code artefacts.
  • 11. WekaNose’s workflow: Extract code metrics from all the systems. ● A large set of object-oriented metrics at the method, class, and package levels has been taken into account; these metrics are the independent variables of our machine learning approach. ● All metrics have been computed with “Design Features and Metrics for Java” (DFMC4J) [1]. [1] http://essere.disco.unimib.it/wiki/jcodeodor_doc
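In WEKA terms, each code element becomes one instance whose attributes are its metric values, followed by a class label once the instance has been labelled. The following is a minimal sketch that serialises such instances in WEKA's ARFF text format. It assumes hypothetical metric names (`NOAM`, `WMC`) and a hypothetical boolean class attribute `is_smelly`, and it is not DFMC4J's or WekaNose's actual export code.

```python
from typing import Dict, List


def to_arff(relation: str, metric_names: List[str],
            rows: List[Dict[str, float]], labels: List[bool]) -> str:
    """Serialise metric vectors plus a nominal class label as ARFF text."""
    if len(rows) != len(labels):
        raise ValueError("every row needs exactly one label")
    lines = [f"@relation {relation}", ""]
    # Each metric is a numeric independent variable.
    for name in metric_names:
        lines.append(f"@attribute {name} numeric")
    # The label is the dependent variable the classifier learns to predict.
    lines.append("@attribute is_smelly {true,false}")
    lines += ["", "@data"]
    for row, label in zip(rows, labels):
        values = [str(row[name]) for name in metric_names]
        values.append("true" if label else "false")
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


# Hypothetical example: two classes described by two metrics.
arff = to_arff(
    "data_class",
    ["NOAM", "WMC"],
    [{"NOAM": 10, "WMC": 8}, {"NOAM": 2, "WMC": 40}],
    [True, False],
)
print(arff)
```

The attribute order in the header must match the value order in each `@data` row. That is why the sketch iterates over `metric_names` for both.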
  • 12. WekaNose’s workflow: Use Code Smell advisors to sample candidates. An Advisor is a deterministic rule, implemented locally or in an external tool, that classifies a code element (method or class) as affected by a code smell or not.
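The advisor idea can be sketched as plain functions from a metric record to a boolean. Below is a minimal illustration for the Data Class smell. The metric fields and all thresholds are hypothetical assumptions chosen only to mirror the informal Data Class description (exposes much data, has little logic); they are not the rules WekaNose or any external tool actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClassMetrics:
    """Hypothetical metric record for one class (names are illustrative)."""
    name: str
    public_attributes: int   # attributes exposed directly
    accessor_methods: int    # getters and setters
    weighted_methods: int    # WMC-like complexity proxy


# An advisor is a deterministic rule: it maps a code element to smelly / not smelly.
Advisor = Callable[[ClassMetrics], bool]


def lenient_data_class_advisor(m: ClassMetrics) -> bool:
    # Hypothetical thresholds: exposes some data and has modest complexity.
    return m.public_attributes + m.accessor_methods > 5 and m.weighted_methods < 15


def strict_data_class_advisor(m: ClassMetrics) -> bool:
    # Hypothetical thresholds: exposes a lot of data and has very little logic.
    return m.public_attributes + m.accessor_methods > 8 and m.weighted_methods < 10


def positive_advisors(m: ClassMetrics, advisors: List[Advisor]) -> int:
    """How many advisors classify the element as a code smell."""
    return sum(1 for advisor in advisors if advisor(m))


advisors = [lenient_data_class_advisor, strict_data_class_advisor]
dto = ClassMetrics("CustomerDTO", public_attributes=2, accessor_methods=10, weighted_methods=8)
service = ClassMetrics("OrderService", public_attributes=0, accessor_methods=2, weighted_methods=40)
print(positive_advisors(dto, advisors), positive_advisors(service, advisors))  # 2 0
```

The per-element count of positive advisors is the quantity that the candidate sampling is stratified on, as described under the threats to validity.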
  • 13. WekaNose’s workflow: Label the instances.
  • 14. WekaNose’s workflow: Choose the Machine Learning algorithms. Five Machine Learning algorithms have been considered so far: ● J48 (x3) ● Random Forest ● Naïve Bayes ● JRip ● SVM (x10). Each algorithm has been trained both with and without the AdaBoostM1 boosting technique. Therefore, 32 algorithms (16 configurations, each with and without boosting) have been trained, tested, and compared.
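The count of 32 follows from 3 + 1 + 1 + 1 + 10 = 16 base configurations, each trained once plain and once boosted. Here is a small sketch that enumerates them. The slide does not say which parameters distinguish the three J48 and ten SVM variants, so they are only indexed here, and the learner names are plain labels, not WEKA class names.

```python
from itertools import product

# Configurations per base learner as listed on the slide; the variants are
# only numbered because their distinguishing parameters are not stated.
base_configurations = {"J48": 3, "RandomForest": 1, "NaiveBayes": 1, "JRip": 1, "SVM": 10}


def enumerate_classifiers(configs):
    """Yield (learner, variant, boosted) for every configuration,
    once as-is and once wrapped in AdaBoostM1 (boosted=True)."""
    for (learner, n_variants), boosted in product(configs.items(), (False, True)):
        for variant in range(n_variants):
            yield learner, variant, boosted


classifiers = list(enumerate_classifiers(base_configurations))
print(sum(base_configurations.values()), len(classifiers))  # 16 32
```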
  • 15. WekaNose’s workflow: Perform the Machine Learning parameter optimization. The best parameters are those that maximise the performance of the machine learning algorithm. WEKA provides a set of algorithms that search for such parameters, and they can be used in WekaNose. Eibe Frank, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition, 2016.
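One simple way to run such a search is an exhaustive grid search: evaluate every combination of candidate values and keep the best. The sketch below is a generic, standalone version and not WEKA's own meta-classifier. `toy_score` is a made-up stand-in for "train the classifier with these parameters and measure its cross-validated performance". The parameter names `C` and `M` only loosely echo J48's confidence factor and minimum leaf size.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def grid_search(grid: Dict[str, List], score: Callable[[Dict], float]) -> Tuple[Dict, float]:
    """Evaluate every combination in `grid` and return the best-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:  # strict '>' keeps the first best on ties
            best_params, best_score = params, s
    return best_params, best_score


# Made-up stand-in for cross-validated performance, peaking at C=0.25, M=2.
def toy_score(params: Dict) -> float:
    return 1.0 - (params["C"] - 0.25) ** 2 - 0.01 * (params["M"] - 2) ** 2


grid = {"C": [0.1, 0.25, 0.5], "M": [1, 2, 5]}
best, value = grid_search(grid, toy_score)
print(best, value)  # {'C': 0.25, 'M': 2} 1.0
```

Exhaustive search costs one evaluation per grid point, which is why it is typically paired with coarse grids. With cross-validation, each evaluation is itself several training runs.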
  • 16. WekaNose’s workflow: Compare the Machine Learning algorithms with each other. The WEKA Experimenter [1] has been used to compare the Machine Learning algorithms; specifically: ● 10-fold cross-validation with 10 repetitions was performed for each classifier, using the best parameters found; ● performance was compared using the corrected paired t-test. [1] Eibe Frank, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition, 2016.
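The corrected paired t-test (Nadeau and Bengio's corrected resampled t-test) enlarges the variance term of the ordinary paired t-test because the training sets of different folds overlap. With k paired per-fold differences d_j (k = 100 for 10 repetitions of 10-fold cross-validation), their sample variance s², and n_test / n_train = 1/9 for 10-fold cross-validation, the statistic is t = mean(d) / sqrt((1/k + n_test/n_train) · s²). It is compared against a Student t distribution with k - 1 degrees of freedom. Below is a minimal standalone sketch of the statistic, not WEKA's implementation.

```python
import math
from statistics import mean, variance
from typing import Sequence


def corrected_paired_t(scores_a: Sequence[float], scores_b: Sequence[float],
                       test_train_ratio: float = 1 / 9) -> float:
    """Corrected resampled paired t statistic.

    scores_a / scores_b: per-fold results of two classifiers on the same folds.
    test_train_ratio: n_test / n_train (1/9 for 10-fold cross-validation).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long score lists with at least two folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    var_d = variance(diffs)  # sample variance, divisor k - 1
    if var_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    # The plain paired t-test would use var_d / k; the correction adds
    # test_train_ratio to account for overlapping training sets.
    return mean(diffs) / math.sqrt((1 / k + test_train_ratio) * var_d)


# Toy example with only four folds, for illustration.
t = corrected_paired_t([0.90, 0.92, 0.91, 0.93], [0.88, 0.90, 0.90, 0.90])
print(round(t, 2))  # 4.08
```

Converting t into a p-value needs the Student t CDF, which the Python standard library does not provide. A statistics package or a table of critical values can be used for that step.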
  • 17. WekaNose’s workflow: Use the SonarQube plug-in to actually detect the Code Smell. Sonar-WekaNose-Plugin is a SonarQube plugin that analyzes Java code and reports the presence of Code Smells using the Machine Learning algorithms trained through WekaNose. Alessandro Polastri (2018), Bachelor Thesis: “SonarQube plugin for Code Smell detection through machine learning techniques”.
  • 18. Results obtained so far: algorithm comparison [1], [2]. [1] Francesca Arcelli Fontana, Mika Mäntylä, Marco Zanoni, and Alessandro Marino (2015). Comparing and experimenting machine learning techniques for code smell detection. Empirical Software Engineering, 21. DOI: https://doi.org/10.1007/s10664-015-9378-4; [2] Umberto Azadi (2017), Bachelor Thesis: “Code smell detection through machine learning techniques”.
  • 19. Best algorithm per Code Smell (Code Smell: Machine Learning algorithm) ● Data Class: Boosted J48 pruned ● God Class: Naïve Bayes ● Feature Envy: Boosted JRip ● Long Method: Boosted J48 unpruned ● Long Parameter List: Boosted J48 unpruned ● Switch Statements: JRip
  • 20. Threats to validity ● Threats to internal validity: ○ The manual evaluation of code smells is subject to a certain degree of error related to the developer’s experience, knowledge, ability to understand design issues, and other factors. ● Threats to external validity: ○ Code Smell candidates were selected by random sampling, stratified according to the number of positive results of the code smell Advisors. This criterion increases the probability of selecting instances affected by a code smell, but the sampling method could distort the training set, because the selection criterion is only partly random. ● Experiment limitations: ○ The selected systems are all open source; ○ The only metrics used are those computed by DFMC4J.
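The stratified sampling described above can be sketched as grouping candidates by their number of positive advisors and drawing at random within each group. The slide does not say how many elements were drawn per stratum, so this sketch assumes equal allocation (`per_stratum` from every group). The candidate names and vote counts are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def stratified_sample(positive_advisors: Dict[str, int], per_stratum: int,
                      seed: int = 0) -> List[str]:
    """Group candidates by how many advisors flagged them, then draw up to
    `per_stratum` candidates uniformly at random from each group."""
    strata: Dict[int, List[str]] = defaultdict(list)
    for element, votes in positive_advisors.items():
        strata[votes].append(element)
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    sample: List[str] = []
    for votes in sorted(strata):
        group = strata[votes]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


# Hypothetical candidates: most elements get no advisor votes, few get many.
votes = {f"Class{i}": 0 for i in range(6)}
votes.update({"Class6": 1, "Class7": 1, "Class8": 1, "Class9": 3})
picked = stratified_sample(votes, per_stratum=2)
print(picked)
```

Here `Class9` is 1 of 10 candidates but 1 of 5 sampled elements. This over-representation of elements with many positive votes is exactly the partly non-random selection the threat refers to.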
  • 21. Future Development ● Evaluate the severity (not just the presence) of the Code Smell; ● Evaluate Active Learning techniques in this context; ● Use social platforms to collect data; ● Train new machine learning algorithms in order to increase the number of identified code smells; ● Understand the correlation between the performance measures used to evaluate the machine learning algorithms and the performance measures used to estimate the goodness of a code smell detector; ● Design and evaluate hybrid rules that use both the machine learning algorithms and the rules set by the user to perform a detection; ● Improve the comparison between machine learning-based and rule-based code smell detection: ○ Evaluate whether some algorithms tend to guarantee high performance also in actual detection, by experimenting with comparisons on large software systems and on software systems belonging to a specific application domain; ○ Extend the comparison by considering other code smell detection tools based on different techniques.
  • 22. Thank you for your attention http://essere.disco.unimib.it/wiki/wekanose