IEEE Transactions on Software Engineering, 2019
Deep Learning based
Code Smell Detection
Hui Liu, Jiahao Jin, Zhifeng Xu, Yanzhen Zou, Yifan Bu and Lu Zhang
Presented By: Sayed Mohsin Reza PhD Student, CS, UTEP
Paper DOI: https://doi.org/10.1109/TSE.2019.2936376
What is a Code Smell?
• A code smell is a surface indication that usually
corresponds to a deeper/inner problem in the software
• Code smells suggest the possibility of refactoring
• Software refactoring is an effective means to improve
software quality
• Examples: Feature Envy, Large Class, Long Method etc.
Figure: Example of a Code Smell:
Duplicate Code
Introduction
• Background: Code smells need to be fixed to improve
software quality
• Problem: Manual identification of code smells is challenging
and tedious
• Solution: Use deep learning techniques to identify code
smells
• Motivation: Unmaintained code increases the actual cost of
development over time, and identifying code smells can
reduce that cost
Figure: Technical debt arises from unmaintained code [2]
[2] Falon Fatemi, "Technical Debt: The Silent Company Killer", Forbes Magazine Report
Objective
• Primary: Identify the following code smells using deep learning techniques
1. Feature Envy
2. Long Method
3. Large Class (God Class)
4. Misplaced Class
• Secondary: Suggest possible refactoring opportunities.
Selected Code Smell Definitions
1. Feature Envy - when a method uses more features (i.e.,
fields and methods) of another class than of its own
2. Long Method - when a method has too much
functionality and too much code
3. Large Class - when a class does too much and
contains too much code
4. Misplaced Class - when a class belongs to one
package but would fit better in another package
Figure: Examples of Code Smells: (1) Feature Envy, (2) Long Method, (3) Large Class
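To make the Feature Envy definition concrete, here is a minimal sketch. It is written in Python for brevity, although the subject systems in the paper are Java, and the `Customer` and `Invoice` classes are hypothetical names invented for this illustration. The method `shipping_label_envious` uses three fields of `Customer` and none of its own class, which is the pattern that typically suggests a Move Method refactoring.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    street: str
    city: str

    def shipping_label(self) -> str:
        # After a Move Method refactoring, the logic lives with the data it uses.
        return f"{self.name}\n{self.street}\n{self.city}"


@dataclass
class Invoice:
    customer: Customer
    total: float

    def shipping_label_envious(self) -> str:
        # Feature Envy: uses three fields of Customer and none of Invoice.
        c = self.customer
        return f"{c.name}\n{c.street}\n{c.city}"

    def shipping_label(self) -> str:
        # Refactored version: delegate to the class that owns the data.
        return self.customer.shipping_label()
```

Both versions return the same string; only the placement of the logic differs, which is why the smell is about design rather than behavior.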
Proposed Approach
Training Phase
Step 1: Download repositories from the corpus website [1]
Step 2: Generation of training data → labelled code smells
Step 3: Deep learning techniques → model
Testing Phase
Step 4: Provide a new software repository
Step 5: Generation of testing data
Output: Classify the code smells
• God Class
• Long Method
• Feature Envy
• Misplaced Class
[1] http://qualitascorpus.com/, a curated collection of software systems intended to be used for empirical studies of
code artefacts
Research Questions (RQs)
• Does the proposed approach
outperform the state-of-the-art
approaches in identifying code smells?
• feature envy (RQ1)
• long methods (RQ3)
• large class (RQ4)
• misplaced class (RQ5)
• Is the proposed approach accurate in
recommending destinations (target
classes) for methods associated with
feature envy (RQ2) and target packages for
misplaced classes (RQ6)?
Subject Codebases
• Considers open-source applications (Java only)
• Lets researchers repeat the
evaluation
• Popular and high-quality codebases
• Under development for more than 5
years
Table: Subject Codebases [3]
[3] http://qualitascorpus.com/
Generation of Training Data
Step 1: Import projects into the Eclipse JDT IDE
Step 2: Look for suggested refactoring opportunities using the Eclipse JDT refactoring tool
Step 3: Generate training data

File Name | Method name | Refactoring | Code Smell | ...
Model.java | login | Move method | Feature envy | ...
Model.java | login | Inline method | Long method | ...
... | ... | ... | ... | ...
Model.java | Model | Extract class | Large class | ...
Model.java | Model | Move class | Misplaced class | ...
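The labelling step above can be sketched as a simple lookup from refactoring type to code smell, using only the four pairs shown in the sample table. This is an illustrative sketch in Python, not the authors' tooling: the `Suggestion` type, the `label_suggestions` function, and the record layout are hypothetical names introduced here.

```python
from typing import Dict, List, NamedTuple

# Mapping taken from the sample table (refactoring -> code smell label).
REFACTORING_TO_SMELL: Dict[str, str] = {
    "move method": "feature envy",
    "inline method": "long method",
    "extract class": "large class",
    "move class": "misplaced class",
}


class Suggestion(NamedTuple):
    """Hypothetical record for one refactoring suggestion."""
    file_name: str
    element: str       # method or class name
    refactoring: str


def label_suggestions(suggestions: List[Suggestion]) -> List[dict]:
    """Turn refactoring suggestions into labelled training records."""
    records = []
    for s in suggestions:
        smell = REFACTORING_TO_SMELL.get(s.refactoring.strip().lower())
        if smell is None:
            continue  # refactoring type not covered by the four smells
        records.append({"file": s.file_name, "element": s.element, "smell": smell})
    return records
```

Suggestions whose refactoring type is not one of the four listed are skipped rather than guessed at, so the resulting labels stay limited to the smells the talk covers.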
Training Dataset Structure
Feature set columns: File Name, Enclosing Class (ec), Target Class (tc), Method (m), Dist(m, ec), Dist(m, tc), Lines of Code (LOC), Cohesion (COH), …, Number of Accessed Variables (NOAV), Number of Public Attributes (NOPA), Number of Methods (NOM)
Target set columns: Feature Envy, Long Method, Large Class, Misplaced Class

Sample rows (feature values | Feature Envy, Long Method, Large Class, Misplaced Class):
Model.java, Model, (Model3, Model4), login, 10, ... | 0, 0, 1, 0
Model2.java, Model2, null, login, 20, ... | 0, 0, 1, 1
...
Model3.java, Model3, 30, ... | 1, 1, 0, 0
Model4.java, Model4, 40, ... | 1, 0, 0, 0
Table: Deep Code Smell Dataset Structure
Dataset is available at: J. Jin, Z. Xu, and Y. Bu. (2019) Deep Smell Detector. [Online]. Available:
https://github.com/liuhuigmail/DeepSmellDetection
1 Feature Envy Classifier
Input Variables
1. Name(m) - name of the method under investigation
2. Name(ec) - name of the enclosing class
3. Name(tc) - name of the potential target class
4. Dist(m, ec) - distance between the method
and its enclosing class
5. Dist(m, tc) - distance between the method
and the target class
(See the detailed distance formula in Appendix A)
* ec = enclosing class, tc = target class, m = method
Figure: Classifier for Feature Envy Detection
2 Long Method Detection
Input Variables
1. LOC(m) - Lines of Code
2. LCOM(m) - Lack of Cohesion of Methods
3. COH(m) - Cohesion of the method
4. CC(m) - Class Cohesion
5. NOAV(m) - Number of Accessed Variables
6. CD(m) - Coupling Dispersion [measures
how much the coupling of the method
involves external classes]
7. MCN(m) - McCabe's Cyclomatic Number
[measures the complexity of the method]
Figure: Classifier for Long Method Detection
3 Large Class Classifier
Input Variables
1. ATFD(c) - Access to Foreign Data
2. DCC(c) - Direct Class Coupling
3. DIT(c) - Depth of Inheritance Tree
4. TCC(c) - Tight Class Cohesion
5. LCOM(c) - Lack of Cohesion of Methods
6. CAM(c) - Cohesion Among Methods
7. WMC(c) - Weighted Method Count
8. NOPA(c) - Number of Public Attributes
9. NOAM(c) - Number of Access Methods
10. NOA(c) - Number of Attributes
11. NOM(c) - Number of Methods
Figure: Classifier for Large Class
Detection
4 Misplaced Class Classifier
Input Variables
1. Name(c) - name of the class
2. Name(ep) - name of the enclosing package
3. Name(tp) - name of the target package
4. CBO(c, ep) - Coupling Between Objects
5. MPC(c, ep) - Max Message Passing
Coupling
* c = class, ep = enclosing package, tp = target
package
Figure: Classifier for Misplaced Class
Detection (figure labels: Enclosing Package, Target Package)
Evaluation
Compare the proposed approach against:
• JDeodorant, a refactoring tool that can detect Feature Envy.
• DECOR, a tool for detecting Long Methods and Large Classes.
• TACO, a text-based technique to detect Misplaced Classes.
Tool Demo Video: https://www.youtube.com/watch?v=LtH8uF0epV0
Evaluation Results: (1) Feature Envy
RQ1: Does the proposed approach outperform the state-of-the-art approaches in identifying feature envy?
Answer: The proposed approach significantly outperforms the state-of-the-art in identifying feature envy.
Table: Evaluation Results on Feature Envy Detection
Observations:
• The average F1 score of the proposed
approach is 51.91%, whereas the
average F1 score of JDeodorant
is 24.51%.
• The average recall of the proposed
approach is up to 88.11%,
whereas JDeodorant's is 16.6%.
Evaluation Results: (1) Feature Envy (Continued)
RQ2: Is the proposed approach accurate in
recommending destinations (target classes) for methods
associated with feature envy?
Answer: The proposed approach is more accurate in
recommending destinations for feature envy methods.
Table: Accuracy in Recommending Target Classes
Observation:
• Proposed approach is 27.25% more accurate than
JDeodorant in recommending destinations for smelly
methods.
Evaluation Results: (2) Long Methods
RQ3: Does the proposed approach outperform the state-of-the-art approaches in identifying long methods?
Result Summary: The proposed approach significantly outperforms the state-of-the-art in identifying long methods.
Table: Evaluation Results on Long Method Detection
Observations:
• Proposed approach
identifies most of the long
methods with average
recall 78.99% and F1
score 55.53%.
• DECOR improves
precision at the cost of
significant reduction in
recall.
Evaluation Results: (3) Large Class
RQ4: Does the proposed approach outperform the state-of-the-art approaches in identifying large class?
Result Summary: The proposed approach significantly outperforms the state-of-the-art in identifying large class.
Table: Evaluation Results on Large Class Detection
Observations:
• Proposed approach improves
recall (80.95%) significantly at
the cost of reduced precision
• Proposed approach
outperforms DECOR in F1
score, MCC, and AUC
Evaluation Results: (4) Misplaced Class
RQ5: Does the proposed approach outperform the state-of-the-art approaches in identifying Misplaced class?
Answer: The proposed approach significantly outperforms the state-of-the-art in identifying Misplaced class.
Table: Evaluation Results on Misplaced Class Detection
Observations:
• Proposed approach
outperforms TACO in F1
Score, MCC, and AUC.
• Proposed approach
improves both precision
and recall significantly.
Evaluation Results: (4) Misplaced Class
RQ6: Is the proposed approach accurate in
recommending target packages for misplaced classes?
Answer: The proposed approach outperforms the
baseline in identifying misplaced classes, and it is
comparable to the baseline in recommending target
packages.
Table: Accuracy in Recommending Target Packages
Observations:
1. Proposed approach results in greater number of
accepted recommendations
2. TACO is more accurate than the proposed
approach in recommending target packages
Conclusion & Future Work
• Proposed a deep learning-based approach to detect code smells
• Proposed a custom technique for creating a labeled training dataset
• Improves F-measure by 27.4% in feature envy detection, 15.11% in long method detection, 4.73% in
large class detection, and 48.18% in misplaced class detection
• Improves the state of the art in code smell detection
• Future work
• Detect additional categories of code smells: data clumps, lazy class, etc.
• Integration with an IDE may benefit developers who are looking for refactoring opportunities
My Critique
• In the evaluation, they use accuracy, recall, precision, and F1 scores. Other relevant and important
metrics should be included.
Suggestion: False Positive Rate (FPR) and False Negative Rate (FNR) can be included to show how
many false alarms and missed smells the models generate
• A relatively small dataset extracted from only 10 code repositories [1].
Suggestion: Include more codebases in the dataset
[1] http://qualitascorpus.com/, a curated collection of software systems intended to be used for empirical studies of code artefacts
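The two rates suggested in the critique follow directly from confusion-matrix counts. The sketch below uses the standard definitions; the function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """FPR = FP / (FP + TN): share of clean items flagged as smelly."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def false_negative_rate(fn: int, tp: int) -> float:
    """FNR = FN / (FN + TP): share of smelly items that were missed."""
    return fn / (fn + tp) if (fn + tp) else 0.0
```

Note that FNR equals 1 minus recall, so it can be derived from the reported recall figures, whereas FPR needs the true-negative count and therefore adds new information.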
Questions
Summary
• The proposed approach establishes a better technique for
identifying code smells
• The proposed approach is successful in suggesting possible
refactoring opportunities
Figure: Proposed Approach
Appendix A - Distance metrics formula
1. If method m does not belong to class C, the distance is computed as follows:
2. Otherwise, the distance is computed as follows:
where S = the set of entities at the method or class level, and
e = an entity (attribute or method)
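The formula images for this appendix did not survive in the text. As a hedged illustration only, the sketch below assumes a Jaccard-style distance over entity sets, a form commonly used in feature envy detection: 1 minus the ratio of shared entities to all entities, with the method itself excluded from the class's entity set when the method belongs to that class. Whether this matches the slide's formulas exactly is an assumption, and the function and parameter names are hypothetical.

```python
from typing import Set


def distance(method: str, method_entities: Set[str],
             class_entities: Set[str], belongs_to_class: bool) -> float:
    """Jaccard-style distance between a method and a class (assumed form)."""
    s_c = set(class_entities)
    if belongs_to_class:
        # Case 2: the method is itself an entity of the class, so exclude it.
        s_c.discard(method)
    union = method_entities | s_c
    if not union:
        return 1.0  # no entities at all: treat as maximally distant
    return 1.0 - len(method_entities & s_c) / len(union)
```

Under this assumed form, a method whose distance to a candidate target class is smaller than its distance to its enclosing class is a natural feature envy candidate, which is how Dist(m, ec) and Dist(m, tc) would inform the classifier.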
Appendix B - Performance Metrics
1. Accuracy is calculated as
2. Precision, recall, and F1 score are calculated as
3. The Matthews Correlation Coefficient (MCC) is calculated as
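The formula images for these metrics are missing from the text, so the sketch below restates the standard textbook definitions in terms of true/false positives and negatives (TP, TN, FP, FN). The function names and the choice to return 0.0 when a denominator is zero are conventions adopted here for illustration.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Because F1 is a harmonic mean, a detector with high recall but low precision still ends up with a modest F1 score, which is worth keeping in mind when reading the recall and F1 figures reported in the evaluation slides.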

More Related Content

What's hot

The Art Of Debugging
The Art Of DebuggingThe Art Of Debugging
The Art Of Debuggingsvilen.ivanov
 
Social network with microservices
Social network with microservicesSocial network with microservices
Social network with microservicesViet Tran
 
Intégration continue et déploiement continue avec Jenkins
Intégration continue et déploiement continue avec JenkinsIntégration continue et déploiement continue avec Jenkins
Intégration continue et déploiement continue avec JenkinsKokou Gaglo
 
Debugging in visual studio (basic level)
Debugging in visual studio (basic level)Debugging in visual studio (basic level)
Debugging in visual studio (basic level)Larry Nung
 
Git Tutorial For Beginners | What is Git and GitHub? | DevOps Tools | DevOps ...
Git Tutorial For Beginners | What is Git and GitHub? | DevOps Tools | DevOps ...Git Tutorial For Beginners | What is Git and GitHub? | DevOps Tools | DevOps ...
Git Tutorial For Beginners | What is Git and GitHub? | DevOps Tools | DevOps ...Simplilearn
 
TDD (Test Driven Developement) et refactoring
TDD (Test Driven Developement) et refactoringTDD (Test Driven Developement) et refactoring
TDD (Test Driven Developement) et refactoringneuros
 
Codemotion Madrid 2023 - Testcontainers y Spring Boot
Codemotion Madrid 2023 - Testcontainers y Spring BootCodemotion Madrid 2023 - Testcontainers y Spring Boot
Codemotion Madrid 2023 - Testcontainers y Spring BootIván López Martín
 
Resilience4j with Spring Boot
Resilience4j with Spring BootResilience4j with Spring Boot
Resilience4j with Spring BootKnoldus Inc.
 
Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners HubSpot
 
Microservices avec Spring Cloud
Microservices avec Spring CloudMicroservices avec Spring Cloud
Microservices avec Spring CloudFlorian Beaufumé
 
Formation Gratuite Total Tests par les experts Java Ippon
Formation Gratuite Total Tests par les experts Java Ippon Formation Gratuite Total Tests par les experts Java Ippon
Formation Gratuite Total Tests par les experts Java Ippon Ippon
 
Qu'est ce qu'un logiciel de qualité
Qu'est ce qu'un logiciel de qualitéQu'est ce qu'un logiciel de qualité
Qu'est ce qu'un logiciel de qualitéSylvain Leroy
 

What's hot (20)

The Art Of Debugging
The Art Of DebuggingThe Art Of Debugging
The Art Of Debugging
 
Social network with microservices
Social network with microservicesSocial network with microservices
Social network with microservices
 
Intégration continue et déploiement continue avec Jenkins
Intégration continue et déploiement continue avec JenkinsIntégration continue et déploiement continue avec Jenkins
Intégration continue et déploiement continue avec Jenkins
 
Debugging in visual studio (basic level)
Debugging in visual studio (basic level)Debugging in visual studio (basic level)
Debugging in visual studio (basic level)
 
Java 8 Lambda and Streams
Java 8 Lambda and StreamsJava 8 Lambda and Streams
Java 8 Lambda and Streams
 
Git Tutorial For Beginners | What is Git and GitHub? | DevOps Tools | DevOps ...
Git Tutorial For Beginners | What is Git and GitHub? | DevOps Tools | DevOps ...Git Tutorial For Beginners | What is Git and GitHub? | DevOps Tools | DevOps ...
Git Tutorial For Beginners | What is Git and GitHub? | DevOps Tools | DevOps ...
 
Dependency Injection
Dependency InjectionDependency Injection
Dependency Injection
 
TDD (Test Driven Developement) et refactoring
TDD (Test Driven Developement) et refactoringTDD (Test Driven Developement) et refactoring
TDD (Test Driven Developement) et refactoring
 
Codemotion Madrid 2023 - Testcontainers y Spring Boot
Codemotion Madrid 2023 - Testcontainers y Spring BootCodemotion Madrid 2023 - Testcontainers y Spring Boot
Codemotion Madrid 2023 - Testcontainers y Spring Boot
 
Sonarlint
SonarlintSonarlint
Sonarlint
 
Resilience4j with Spring Boot
Resilience4j with Spring BootResilience4j with Spring Boot
Resilience4j with Spring Boot
 
Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners Git 101: Git and GitHub for Beginners
Git 101: Git and GitHub for Beginners
 
Support JEE Servlet Jsp MVC M.Youssfi
Support JEE Servlet Jsp MVC M.YoussfiSupport JEE Servlet Jsp MVC M.Youssfi
Support JEE Servlet Jsp MVC M.Youssfi
 
Microservices avec Spring Cloud
Microservices avec Spring CloudMicroservices avec Spring Cloud
Microservices avec Spring Cloud
 
Java Quiz Questions
Java Quiz QuestionsJava Quiz Questions
Java Quiz Questions
 
Formation Gratuite Total Tests par les experts Java Ippon
Formation Gratuite Total Tests par les experts Java Ippon Formation Gratuite Total Tests par les experts Java Ippon
Formation Gratuite Total Tests par les experts Java Ippon
 
Introduction to Maven
Introduction to MavenIntroduction to Maven
Introduction to Maven
 
GitHub Presentation
GitHub PresentationGitHub Presentation
GitHub Presentation
 
Java
JavaJava
Java
 
Qu'est ce qu'un logiciel de qualité
Qu'est ce qu'un logiciel de qualitéQu'est ce qu'un logiciel de qualité
Qu'est ce qu'un logiciel de qualité
 

Similar to Deep learning based code smell detection - Qualifying Talk

DETECTION AND REFACTORING OF BAD SMELL CAUSED BY LARGE SCALE
DETECTION AND REFACTORING OF BAD SMELL CAUSED BY LARGE SCALEDETECTION AND REFACTORING OF BAD SMELL CAUSED BY LARGE SCALE
DETECTION AND REFACTORING OF BAD SMELL CAUSED BY LARGE SCALEijseajournal
 
Bad Code Smells
Bad Code SmellsBad Code Smells
Bad Code Smellskim.mens
 
Revisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingRevisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingLionel Briand
 
Ch 6 only 1. Distinguish between a purpose statement, research p
Ch 6 only 1. Distinguish between a purpose statement, research pCh 6 only 1. Distinguish between a purpose statement, research p
Ch 6 only 1. Distinguish between a purpose statement, research pMaximaSheffield592
 
Ch 6 only 1. distinguish between a purpose statement, research p
Ch 6 only 1. distinguish between a purpose statement, research pCh 6 only 1. distinguish between a purpose statement, research p
Ch 6 only 1. distinguish between a purpose statement, research pnand15
 
Software Systems as Cities: a Controlled Experiment
Software Systems as Cities: a Controlled ExperimentSoftware Systems as Cities: a Controlled Experiment
Software Systems as Cities: a Controlled ExperimentRichard Wettel
 
Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models IJECEIAES
 
SOFTWARE QUALITY ASSURANCE AND TESTING - SHORT NOTES
SOFTWARE QUALITY ASSURANCE AND TESTING - SHORT NOTESSOFTWARE QUALITY ASSURANCE AND TESTING - SHORT NOTES
SOFTWARE QUALITY ASSURANCE AND TESTING - SHORT NOTESsuthi
 
EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...
EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...
EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...ijseajournal
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directionsTao He
 
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...ijcnes
 
06 styles and_greenfield_design
06 styles and_greenfield_design06 styles and_greenfield_design
06 styles and_greenfield_designMajong DevJfu
 
Development Emails Content Analyzer: Intention Mining in Developer Discussions
Development Emails Content Analyzer: Intention Mining in Developer DiscussionsDevelopment Emails Content Analyzer: Intention Mining in Developer Discussions
Development Emails Content Analyzer: Intention Mining in Developer DiscussionsSebastiano Panichella
 
Code-Review-COW56-Meeting
Code-Review-COW56-MeetingCode-Review-COW56-Meeting
Code-Review-COW56-MeetingMasud Rahman
 

Similar to Deep learning based code smell detection - Qualifying Talk (20)

VISSOFTPresentation.pdf
VISSOFTPresentation.pdfVISSOFTPresentation.pdf
VISSOFTPresentation.pdf
 
Cser13.ppt
Cser13.pptCser13.ppt
Cser13.ppt
 
DETECTION AND REFACTORING OF BAD SMELL CAUSED BY LARGE SCALE
DETECTION AND REFACTORING OF BAD SMELL CAUSED BY LARGE SCALEDETECTION AND REFACTORING OF BAD SMELL CAUSED BY LARGE SCALE
DETECTION AND REFACTORING OF BAD SMELL CAUSED BY LARGE SCALE
 
Cser13.ppt
Cser13.pptCser13.ppt
Cser13.ppt
 
Bad Code Smells
Bad Code SmellsBad Code Smells
Bad Code Smells
 
Revisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingRevisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software Testing
 
Ch 6 only 1. Distinguish between a purpose statement, research p
Ch 6 only 1. Distinguish between a purpose statement, research pCh 6 only 1. Distinguish between a purpose statement, research p
Ch 6 only 1. Distinguish between a purpose statement, research p
 
Ch 6 only 1. distinguish between a purpose statement, research p
Ch 6 only 1. distinguish between a purpose statement, research pCh 6 only 1. distinguish between a purpose statement, research p
Ch 6 only 1. distinguish between a purpose statement, research p
 
Software Systems as Cities: a Controlled Experiment
Software Systems as Cities: a Controlled ExperimentSoftware Systems as Cities: a Controlled Experiment
Software Systems as Cities: a Controlled Experiment
 
ThesisPresentation
ThesisPresentationThesisPresentation
ThesisPresentation
 
Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models
 
SOFTWARE QUALITY ASSURANCE AND TESTING - SHORT NOTES
SOFTWARE QUALITY ASSURANCE AND TESTING - SHORT NOTESSOFTWARE QUALITY ASSURANCE AND TESTING - SHORT NOTES
SOFTWARE QUALITY ASSURANCE AND TESTING - SHORT NOTES
 
EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...
EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...
EFFECTIVE IMPLEMENTATION OF AGILE PRACTICES – OBJECT ORIENTED METRICS TOOL TO...
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directions
 
Refactoring
RefactoringRefactoring
Refactoring
 
Refactor to the Limit!
Refactor to the Limit!Refactor to the Limit!
Refactor to the Limit!
 
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
 
06 styles and_greenfield_design
06 styles and_greenfield_design06 styles and_greenfield_design
06 styles and_greenfield_design
 
Development Emails Content Analyzer: Intention Mining in Developer Discussions
Development Emails Content Analyzer: Intention Mining in Developer DiscussionsDevelopment Emails Content Analyzer: Intention Mining in Developer Discussions
Development Emails Content Analyzer: Intention Mining in Developer Discussions
 
Code-Review-COW56-Meeting
Code-Review-COW56-MeetingCode-Review-COW56-Meeting
Code-Review-COW56-Meeting
 

Recently uploaded

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 

Recently uploaded (20)

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 

Deep learning based code smell detection - Qualifying Talk

  • 1. IEEE Transactions on Software Engineering, 2019 Deep Learning based Code Smell Detection Hui Liu, Jiahao Jin, Zhifeng Xu, Yanzhen Zou, Yifan Bu and Lu Zhang Presented By: Sayed Mohsin Reza PhD Student, CS, UTEP Paper DOI: https://doi.org/10.1109/TSE.2019.2936376
  • 2. What is Code Smell? • A code smell is a surface indication that usually corresponds to a deeper/inner problem in the software • Code smells suggest the possibility of refactoring • Software refactoring is an effective means to improve software quality • Examples: Feature Envy, Large Class, Long Method etc. 2 Figure: Example of a Code Smell: Duplicate Code
  • 3. Introduction • Background: Code smells need to be fixed to improve software quality • Problem: Manual identification of code smells is challenging and tedious • Solution: Use deep learning techniques to identify code smells • Motivation: Unmaintained code increases the actual cost of development over time, and identifying code smells can reduce that cost Figure: Technical debt arises from unmaintained code (footnote 2: Falon Fatemi, "Technical Debt: The Silent Company Killer", Forbes Magazine Report)
  • 4. Objective • Primary: Identify the following code smells using deep learning techniques 1. Feature Envy 2. Long Method 3. Large Class (God Class) 4. Misplaced Class • Secondary: Suggest possible refactoring opportunities.
  • 5. Selected Code Smell Definitions 1. Feature Envy - when a method uses more features (i.e., fields and methods) of another class than of its own 2. Long Method - when a method has too many responsibilities and too much code 3. Large Class - when a class does too much and contains too much code 4. Misplaced Class - when a class belongs to one package but would fit better in another package Figure: Examples of Code Smells: (1) Feature Envy (2) Long Method (3) Large Class
  • 6. Proposed Approach Training Phase: Step 1: Download repositories from the corpus website (footnote 1) • Step 2: Generation of training data (labelled code smells) • Step 3: Deep learning techniques (model) Testing Phase: Step 4: Provide new software repository • Step 5: Generation of testing data • Classify the code smells: God Class, Long Method, Feature Envy, Misplaced Class Footnote 1: http://qualitascorpus.com/, a curated collection of software systems intended to be used for empirical studies of code artefacts
  • 7. Research Questions (RQs) • Does the proposed approach outperform the state-of-the-art approaches in identifying code smells? • feature envy (RQ1) • long methods (RQ3) • large classes (RQ4) • misplaced classes (RQ5) • Is the proposed approach accurate in recommending destinations (target classes) for methods associated with feature envy (RQ2) and target packages for misplaced classes (RQ6)?
  • 8. Subject Codebases • Open-source applications (Java only) • Open-source code makes it easier for researchers to repeat the evaluation • Popular and high-quality codebases • Under development for more than 5 years Table: Subject Codebases (footnote 3: http://qualitascorpus.com/)
  • 9. Generation of Training Data Step 1: Import projects into the Eclipse JDT IDE • Step 2: Look for suggested refactoring opportunities using the Eclipse JDT refactoring tool • Step 3: Generate training data
      File Name  | Method name | Refactoring   | Code Smell      | ...
      Model.java | login       | Move method   | Feature envy    | ...
      Model.java | login       | Inline method | Long method     | ...
      ...        | ...         | ...           | ...             | ...
      Model.java | Model       | Extract class | Large class     | ...
      Model.java | Model       | Move class    | Misplaced class | ...
  • 10. Training Dataset Structure Table: Deep Code Smell Dataset Structure Feature Sets: File Name, Enclosing Class (ec), Target Class (tc), Method (m), Dist(m,ec), Dist(m,tc), Lines of Code (LOC), Cohesion (COH), …, Number of Accessed Variables (NOAV), Number of Public Attributes (NOPA), Number of Methods (NOM) Target Sets: Feature Envy, Long Method, Large Class, Misplaced Class Sample rows (cells elided on the slide are shown as "..."): Model.java Model Model3,model4 login 10 ... 0 0 1 0 • Model2.java Model2 null login 20 ... 0 0 1 1 • Model3.java Model3 30 ... 1 1 0 0 • Model4.java Model4 40 ... 1 0 0 0 Dataset is available at - J. Jin, Z. Xu, and Y. Bu. (2019) Deep Smell Detector. [Online]. Available: https://github.com/liuhuigmail/DeepSmellDetection
  • 11. (1) Feature Envy Classifier Classifier Input Variables 1. Name(m) - name of the method under investigation 2. Name(ec) - name of the enclosing class 3. Name(tc) - name of the potential target class 4. Dist(m, ec) - distance between the method and its enclosing class 5. Dist(m, tc) - distance between the method and the target class (see the detailed distance formula in Appendix A) * ec = enclosing class, tc = target class, m = method Figure: Classifier for Feature Envy Detection
  • 12. (2) Long Method Classifier Classifier Input Variables 1. LOC(m) - Lines of Code 2. LCOM(m) - Lack of Cohesion of Methods 3. COH(m) - Cohesion of Method 4. CC(m) - Class Cohesion 5. NOAV(m) - Number of Accessed Variables 6. CD(m) - Coupling Dispersion [measures how much the coupling of the method involves external classes] 7. MCN(m) - McCabe's Cyclomatic Number [measures the complexity of the method] Figure: Classifier for Long Method Detection
  • 13. (3) Large Class Classifier Classifier Input Variables 1. ATFD(c) - Access to Foreign Data 2. DCC(c) - Direct Class Coupling 3. DIT(c) - Depth of Inheritance Tree 4. TCC(c) - Tight Class Cohesion 5. LCOM(c) - Lack of Cohesion of Methods 6. CAM(c) - Cohesion Among Methods 7. WMC(c) - Weighted Method Count 8. NOPA(c) - Number of Public Attributes 9. NOAM(c) - Number of Accessor Methods 10. NOA(c) - Number of Attributes 11. NOM(c) - Number of Methods Figure: Classifier for Large Class Detection
  • 14. (4) Misplaced Class Classifier Classifier Input Variables 1. Name(c) - name of the class 2. Name(ep) - name of the enclosing package 3. Name(tp) - name of the target package 4. CBO(c, ep) - Coupling Between Objects 5. MPC(c, ep) - Max Message Passing Coupling * c = class, ep = enclosing package, tp = target package Figure: Classifier for Misplaced Class Detection
  • 15. Evaluation Compare the proposed approach against • JDeodorant, a refactoring tool that can detect Feature Envy. • DECOR, a tool for detecting Long Method and Large Class. • TACO, a textual-based technique to detect Misplaced Class. Tool Demo Video: https://www.youtube.com/watch?v=LtH8uF0epV0
  • 16. Evaluation Results: (1) Feature Envy RQ1: Does the proposed approach outperform the state-of-the-art approaches in identifying feature envy? Answer: The proposed approach significantly outperforms the state-of-the-art in identifying feature envy. Table: Evaluation Results on Feature Envy Detection Observations: • The average F1 score of the proposed approach is 51.91%, whereas the average F1 score of JDeodorant is 24.51%. • The average recall of the proposed approach reaches 88.11%, whereas JDeodorant achieves 16.6%.
  • 17. Evaluation Results: (1) Feature Envy (continued) RQ2: Is the proposed approach accurate in recommending destinations (target classes) for methods associated with feature envy? Answer: The proposed approach is more accurate in recommending destinations for feature envy methods. Table: Accuracy in Recommending Target Classes Observation: • The proposed approach is 27.25% more accurate than JDeodorant in recommending destinations for smelly methods.
  • 18. Evaluation Results: (2) Long Methods RQ3: Does the proposed approach outperform the state-of-the-art approaches in identifying long methods? Result Summary: The proposed approach significantly outperforms the state-of-the-art in identifying long methods. Table: Evaluation Results on Long Method Detection Observations: • The proposed approach identifies most of the long methods, with an average recall of 78.99% and an F1 score of 55.53%. • DECOR improves precision at the cost of a significant reduction in recall.
  • 19. Evaluation Results: (3) Large Class RQ4: Does the proposed approach outperform the state-of-the-art approaches in identifying large classes? Result Summary: The proposed approach significantly outperforms the state-of-the-art in identifying large classes. Table: Evaluation Results on Large Class Detection Observations: • The proposed approach improves recall (80.95%) significantly at the cost of reduced precision. • The proposed approach outperforms DECOR in F1 score, MCC, and AUC.
  • 20. Evaluation Results: (4) Misplaced Class RQ5: Does the proposed approach outperform the state-of-the-art approaches in identifying misplaced classes? Answer: The proposed approach significantly outperforms the state-of-the-art in identifying misplaced classes. Table: Evaluation Results on Misplaced Class Detection Observations: • The proposed approach outperforms TACO in F1 score, MCC, and AUC. • The proposed approach improves both precision and recall significantly.
  • 21. Evaluation Results: (4) Misplaced Class (continued) RQ6: Is the proposed approach accurate in recommending target packages for misplaced classes? Answer: The proposed approach outperforms the baseline in identifying misplaced classes, and it is comparable to the baseline in recommending target packages. Table: Accuracy in Recommending Target Packages Observations: 1. The proposed approach results in a greater number of accepted recommendations. 2. TACO is more accurate than the proposed approach in recommending target packages.
  • 22. Conclusion & Future Work • Proposed a deep learning-based approach to detect code smells • Proposed a custom technique for creating a labeled training dataset • Improves F-measure by 27.4% in feature envy detection, 15.11% in long method detection, 4.73% in large class detection, and 48.18% in misplaced class detection • Improves the state-of-the-art in code smell detection • Future work • Detect additional categories of code smells: data clumps, lazy class, etc. • Integration with an IDE may benefit developers who are looking for refactoring opportunities
  • 23. My Critique • The evaluation uses accuracy, recall, precision, and F1 score; other relevant metrics should be included. Suggestion: report False Positive Rate (FPR) and False Negative Rate (FNR) to show how many false alarms and missed smells the models produce • The dataset is relatively small, extracted from only 10 code repositories (footnote 1). Suggestion: include more codebases in the dataset Footnote 1: http://qualitascorpus.com/, a curated collection of software systems intended to be used for empirical studies of code artefacts
  • 24. Questions Summary Figure: Proposed Approach Training Phase: Step 1: Download repositories from the corpus website • Step 2: Generation of training data (labelled code smells) • Step 3: Deep learning techniques (model) Testing Phase: Step 4: Provide new software repository • Step 5: Generation of testing data • Classify the code smells: God Class, Long Method, Feature Envy, Misplaced Class • The proposed approach establishes a better technique for identifying code smells • The proposed approach is successful in suggesting possible refactoring opportunities
  • 25. Appendix A - Distance metric formula 1. If method m does not belong to class C, the distance is computed as follows: [formula image not captured in the transcript] 2. Otherwise, the distance is computed as follows: [formula image not captured in the transcript] where S = set of entities at the method or class level and e = an entity (attribute or method)
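  • 26. Appendix B - Performance Metrics 1. Accuracy is calculated as [formula image not captured in the transcript] 2. Precision, recall, and F1 score are calculated as [formula image not captured in the transcript] 3. Matthews Correlation Coefficient is calculated as [formula image not captured in the transcript]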
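
The distance formulas in Appendix A did not survive in the transcript. The sketch below is an assumption, not a reconstruction of the slide: it uses the Jaccard-style distance commonly associated with JDeodorant, 1 - |S_m ∩ S_C| / |S_m ∪ S_C|, where the method itself is removed from the class's entity set when the method belongs to that class. The function name, parameter names, and sample entity names are hypothetical.

```python
def method_class_distance(method_name, method_entities, class_entities, belongs_to_class):
    """Jaccard-style distance between a method and a class (assumed formula).

    method_entities: entities (attributes/methods) the method accesses, S_m
    class_entities:  entities declared in the class, S_C
    belongs_to_class: True when the method is declared in that class
    """
    s_m = set(method_entities)
    s_c = set(class_entities)
    if belongs_to_class:
        # Case 2 of Appendix A: leave the method itself out of the class's entity set.
        s_c.discard(method_name)
    union = s_m | s_c
    if not union:
        # Assumption: with no entities at all, treat the pair as maximally distant.
        return 1.0
    return 1.0 - len(s_m & s_c) / len(union)


# Hypothetical example: Model.login mostly touches members of User.
accessed = {"User.name", "User.password", "Model.log"}
model_entities = {"Model.login", "Model.log", "Model.cache"}
user_entities = {"User.name", "User.password", "User.email"}

dist_ec = method_class_distance("Model.login", accessed, model_entities, True)
dist_tc = method_class_distance("Model.login", accessed, user_entities, False)
print(dist_ec, dist_tc)
```

In this toy case the method is closer to the candidate target class than to its enclosing class, which is the kind of signal Dist(m, ec) and Dist(m, tc) are meant to give the Feature Envy classifier.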

Editor's Notes

  1. Hello everyone, thank you for joining my qualifying talk. I am Sayed Mohsin Reza, and I am presenting my talk on "Deep Learning based Code Smell Detection". The paper was published in IEEE Transactions on Software Engineering in 2019. I have shared the link to these slides in the chat for your convenience.
  2. I will briefly describe code smells for those who are not familiar with the term. A code smell ….
  3. JUnit is a unit testing framework for the Java programming language.
  4. CNN layer - filters = 128, kernel size = 1 and activation = tanh, dense =128 neurons - The model employs binary crossentropy as the loss function. - A Dense layer feeds all outputs from the previous layer to all its neurons, each neuron providing one output to the next layer - A flatten layer collapses the spatial dimensions of the input into the channel dimension.
  5. - CNN layer - filters = 128, kernel size = 1 and activation = tanh, dense =128 neurons - The model employs binary crossentropy as the loss function. - A Dense layer feeds all outputs from the previous layer to all its neurons, each neuron providing one output to the next layer
  6. - Embedding layer: converts the words in identifiers into fixed-length numerical vectors using word2vec, a high-quality distributed vector representation - A Dense layer feeds all outputs from the previous layer to all its neurons, each neuron providing one output to the next layer - Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems - An LSTM layer above provides a sequence output, rather than a single-value output, to the LSTM layer below
  7. - Embedding layer: converts the words in identifiers into fixed-length numerical vectors using word2vec, a high-quality distributed vector representation - A convolutional layer contains a set of filters whose parameters need to be learned - A Dense layer feeds all outputs from the previous layer to all its neurons, each neuron providing one output to the next layer - A flatten layer collapses the spatial dimensions of the input into the channel dimension
  8. Demo Video: https://www.youtube.com/watch?v=LtH8uF0epV0
  9. Recall is the fraction of actual smelly classes that are predicted correctly; precision is the fraction of predicted smelly classes that are actually smelly. - F1 score is useful when you want a balance between precision and recall - MCC (Matthews Correlation Coefficient): a measure of the quality of binary (two-class) classifications - AUC: Area Under the Curve
  10. 1 - https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9
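
The metrics described in note 9 and listed in Appendix B can be sketched from confusion-matrix counts. The block below uses the standard textbook definitions of accuracy, precision, recall, F1, and MCC, and adds the false positive and false negative rates suggested in the critique slide. The function name and the sample counts are hypothetical and are not taken from the paper's results.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives, false negatives.
    Ratios with a zero denominator are reported as 0.0 (a convention chosen here).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "fpr": ratio(fp, fp + tn),  # false alarms among non-smelly items
        "fnr": ratio(fn, fn + tp),  # missed smells among smelly items
    }


# Hypothetical counts: 60 smelly items out of 1000, detector flags 100 items.
metrics = classification_metrics(tp=40, fp=60, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With imbalanced data like this, accuracy stays high even though precision is low, which is one reason F1 and MCC are reported alongside it.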