Presented by: Sayed Mohsin Reza, Ph.D. Student, Computer Science, University of Texas
Abstract:
Code smells are structures in the source code that suggest the possibility of refactorings. Consequently, developers may identify refactoring opportunities by detecting code smells. However, manual identification of code smells is challenging and tedious. To this end, a number of approaches have been proposed to identify code smells automatically or semi-automatically. Most of such approaches rely on manually designed heuristics to map manually selected source code metrics into predictions. However, it is challenging to manually select the best features. It is also difficult to manually construct the optimal heuristics. To this end, in this paper we propose a deep learning based novel approach to detecting code smells. The key insight is that deep neural networks and advanced deep learning techniques could automatically select features of source code for code smell detection, and could automatically build the complex mapping between such features and predictions. A big challenge for deep learning based smell detection is that deep learning often requires a large number of labeled training data (to tune a large number of parameters within the employed deep neural network) whereas existing datasets for code smell detection are rather small. To this end, we propose an automatic approach to generating labeled training data for the neural network based classifier, which does not require any human intervention. As an initial try, we apply the proposed approach to four common and well-known code smells, i.e., feature envy, long method, large class, and misplaced class. Evaluation results on open-source applications suggest that the proposed approach significantly improves the state-of-the-art.
Deep learning based code smell detection - Qualifying Talk
1. IEEE Transactions on Software Engineering, 2019
Deep Learning based
Code Smell Detection
Hui Liu, Jiahao Jin, Zhifeng Xu, Yanzhen Zou, Yifan Bu and Lu Zhang
Presented By: Sayed Mohsin Reza, PhD Student, CS, UTEP
Paper DOI: https://doi.org/10.1109/TSE.2019.2936376
2. What is Code Smell?
• A code smell is a surface indication that usually corresponds to a deeper/inner problem in the software
• Code smells suggest the possibility of refactoring
• Software refactoring is an effective means to improve software quality
• Examples: Feature Envy, Large Class, Long Method, etc.
Figure: Example of a Code Smell: Duplicate Code
3. Introduction
• Background: Code smells need to be fixed to improve software quality
• Problem: Manual identification of code smells is challenging and tedious
• Solution: Use deep learning techniques to identify code smells
• Motivation: Unmaintained code increases the actual cost of development over time, and identifying code smells can reduce that cost
Figure: Technical debt arises from unmaintained code [2]
[2] Falon Fatemi, "Technical Debt: The Silent Company Killer", Forbes Magazine Report
4. Objective
• Primary: Identify the following code smells using deep learning techniques
1. Feature Envy
2. Long Method
3. Large Class (God Class)
4. Misplaced Class
• Secondary: Provide suggestions for possible refactoring opportunities
5. Selected Code Smell Definition
1. Feature Envy - when a method uses more features (i.e., fields and methods) of another class than of its own
2. Long Method - when a method has too much functionality and too much code
3. Large Class - when a class does too much and contains too much code
4. Misplaced Class - when a class belongs to one package but would fit better in another package
Figure: Examples of Code Smells: (1) Feature Envy, (2) Long Method, (3) Large Class
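To make the Feature Envy definition concrete, here is a minimal sketch of a method that uses more of another class than of its own, followed by the Move Method refactoring that removes the smell. The classes `Customer` and `Invoice` and all their fields are hypothetical and invented for illustration. The deck's subject codebases are Java; Python is used here only for brevity.

```python
class Customer:
    def __init__(self, name, street, city):
        self.name = name
        self.street = street
        self.city = city

    # After Move Method: the logic lives next to the data it reads.
    def label(self):
        return f"{self.name}, {self.street}, {self.city}"


class Invoice:
    def __init__(self, customer, amount):
        self.customer = customer
        self.amount = amount

    # Feature Envy: reads three fields of Customer and none of Invoice.
    def shipping_label_envious(self):
        c = self.customer
        return f"{c.name}, {c.street}, {c.city}"

    # Refactored version: delegates to the class that owns the data.
    def shipping_label(self):
        return self.customer.label()


invoice = Invoice(Customer("Ada", "1 Main St", "El Paso"), 10.0)
print(invoice.shipping_label())
```

Both methods return the same string; the refactored one simply no longer depends on the internal layout of `Customer`.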
6. Proposed Approach
Training Phase
Step 1: Download repositories from corpus website [1]
Step 2: Generation of training data (output: labelled code smells)
Step 3: Deep learning techniques (output: model)
Testing Phase
Step 4: Provide new software repository
Step 5: Generation of testing data
Classify the code smells
• God Class
• Long Method
• Feature Envy
• Misplaced Class
[1] http://qualitascorpus.com/, curated collection of software systems intended to be used for empirical studies of code artefacts
7. Research Questions (RQs)
• Does the proposed approach outperform the state-of-the-art approaches in identifying code smells?
  • feature envy (RQ1)
  • long methods (RQ3)
  • large class (RQ4)
  • misplaced class (RQ5)
• Is the proposed approach accurate in recommending destinations (target classes) for methods associated with feature envy (RQ2) and target packages for misplaced classes (RQ6)?
8. Subject Codebases
• Consider open-source applications (Java only)
• Makes it easier for researchers to repeat the evaluation
• Popular and high-quality codebases
• Under development for more than 5 years
Table: Subject Codebases [3]
[3] http://qualitascorpus.com/
9. Generation of Training Data
Step 1: Import projects into Eclipse JDT IDE
Step 2: Look for suggested refactoring opportunities after importing, using the Eclipse JDT refactoring tool
Step 3: Generate training data
File Name | Method name | Refactoring | Code Smell | ...
Model.java | login | Move method | Feature envy | ...
Model.java | login | Inline method | Long method | ...
... | ... | ... | ... | ...
Model.java | Model | Extract class | Large class | ...
Model.java | Model | Move class | Misplaced class | ...
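The table above pairs each refactoring with a code smell label. A minimal sketch of that labelling step might look as follows; the function name `label_smell` and the row layout are assumptions made for illustration, not the authors' tooling, and only the four pairs shown in the table are encoded.

```python
from typing import Optional

# Refactoring -> smell pairs, copied from the table on this slide.
REFACTORING_TO_SMELL = {
    "move method": "feature envy",
    "inline method": "long method",
    "extract class": "large class",
    "move class": "misplaced class",
}


def label_smell(refactoring: str) -> Optional[str]:
    """Return the smell label for a refactoring, or None if it is not in the table."""
    return REFACTORING_TO_SMELL.get(refactoring.strip().lower())


# Hypothetical rows in the (file name, element name, refactoring) layout of the table.
rows = [
    ("Model.java", "login", "Move method"),
    ("Model.java", "Model", "Extract class"),
]
labelled = [(f, name, r, label_smell(r)) for f, name, r in rows]
print(labelled)
```

Normalising case and whitespace before the lookup keeps the mapping robust to small differences in how a tool reports refactoring names.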
10. Training Dataset Structure
Table: Deep Code Smell Dataset Structure
Feature Sets: File Name, Enclosing Class (ec), Target Class (tc), Method (m), Dist(m,ec), Dist(m,tc), Lines of Code (LOC), Cohesion (COH), …, Number of Accessed Variables (NOAV), Number of Public Attributes (NOPA), Number of Methods (NOM)
Target Sets: Feature Envy, Long Method, Large Class, Misplaced Class
Model.java Model Model3,model4 login 10 ... 0 0 1 0
Model2.java Model2 null login 20 ... 0 0 1 1
...
Model3.java Model3 30 ... 1 1 0 0
Model4.java Model4 40 ... 1 0 0 0
Dataset is available at: J. Jin, Z. Xu, and Y. Bu. (2019) Deep Smell Detector. [Online]. Available: https://github.com/liuhuigmail/DeepSmellDetection
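Each row of the dataset ends in four binary targets, one per smell. The sketch below shows one way to hold a row and derive that target vector. The `Sample` class, its field names, and the metric values are hypothetical; only the order of the targets (Feature Envy, Long Method, Large Class, Misplaced Class) is taken from the slide.

```python
from dataclasses import dataclass, field

# Target order as listed on the slide.
TARGET_ORDER = ("feature envy", "long method", "large class", "misplaced class")


@dataclass
class Sample:
    file_name: str
    enclosing_class: str
    metrics: dict = field(default_factory=dict)  # e.g. {"LOC": 120}; values are illustrative
    smells: frozenset = frozenset()

    def target_vector(self):
        """Binary vector with a 1 for every smell present in this sample."""
        return [1 if smell in self.smells else 0 for smell in TARGET_ORDER]


first = Sample("Model.java", "Model", smells=frozenset({"large class"}))
second = Sample("Model2.java", "Model2",
                smells=frozenset({"large class", "misplaced class"}))
print(first.target_vector(), second.target_vector())
```

The two samples reproduce the target columns of the first two rows in the table (0 0 1 0 and 0 0 1 1).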
11. 1 Feature Envy Classifier
Classifier
Input Variables
1. Name(m) - method under investigation
2. Name(ec) - name of the enclosing class
3. Name(tc) - potential target class
4. Dist(m, ec) - distance between method
and enclosing class
5. Dist(m, tc) - distance between method
and target class
(See the detailed distance formula in Appendix A)
* ec = enclosing class, tc = target class, m = method
Figure: Classifier for Feature Envy Detection
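As a rough illustration of the two distance inputs, the sketch below computes a Jaccard-style distance between the entity set of a method and the entity set of a class. It is an assumption that Dist on this slide follows this form (it is the form used by JDeodorant-style approaches); the function `dist` and the example entity sets are hypothetical.

```python
def dist(method: str, method_entities: set, class_entities: set) -> float:
    """Jaccard-style distance between a method and a class (assumed form).

    method_entities: attributes and methods that the method accesses.
    class_entities:  attributes and methods declared by the class.
    """
    # If the method belongs to the class, leave it out of the class's own set.
    others = class_entities - {method}
    union = method_entities | others
    if not union:
        return 1.0  # assumption: nothing to compare means maximal distance
    return 1.0 - len(method_entities & others) / len(union)


# Hypothetical example: login lives in Model but mostly uses members of User.
login_uses = {"User.name", "User.password", "Model.log"}
model = {"Model.login", "Model.log", "Model.count"}   # enclosing class (ec)
user = {"User.name", "User.password", "User.check"}   # potential target class (tc)

d_ec = dist("Model.login", login_uses, model)
d_tc = dist("Model.login", login_uses, user)
```

A smaller distance to the target class than to the enclosing class is the kind of signal that hints at feature envy.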
12. (2) Long Method Detection
Classifier
Input Variables
1. LOC(m) - Lines of Code
2. LCOM(m) - Lack of Cohesion of Methods
3. COH(m) - Cohesion of method
4. CC(m) - Class Cohesion
5. NOAV(m) - Number of Accessed Variables
6. CD(m) - Coupling Dispersion [measures
how much the coupling of the method
involves external classes]
7. MCN(m) - McCabe’s Cyclomatic Number
[measures the complexity of the method]
Figure: Classifier for Long Method Detection
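To make the MCN input concrete, here is a minimal sketch that approximates McCabe's cyclomatic number as one plus the number of decision points found in a method's source text. It is a crude token count, not how the authors or any metrics tool necessarily compute it: it ignores comments and string literals, and it would also count `?` in generic wildcards. The function name and the sample method are hypothetical.

```python
import re

# Tokens that open an additional execution path in Java-like source (approximation).
_DECISION = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\||\?")


def approx_cyclomatic_number(method_source: str) -> int:
    """Approximate McCabe's cyclomatic number as 1 + number of decision points."""
    return 1 + len(_DECISION.findall(method_source))


java_method = """
boolean login(String user, String pw) {
    if (user == null || pw == null) {
        return false;
    }
    for (Account a : accounts) {
        if (a.matches(user, pw)) {
            return true;
        }
    }
    return false;
}
"""
mcn = approx_cyclomatic_number(java_method)  # if, ||, for, if -> 4 decision points
```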
13. (3) Large Class Classifier
Classifier
Input Variables
1. ATFD(c) - Access to Foreign Data
2. DCC(c) - Direct Class Coupling
3. DIT(c) - Depth of Inheritance Tree
4. TCC(c) - Tight Class Cohesion
5. LCOM(c) - Lack of Cohesion of Methods
6. CAM(c) - Cohesion Among Methods
7. WMC(c) - Weighted Method Count
8. NOPA(c) - Number of Public Attributes
9. NOAM(c) - Number of Accessor Methods
10. NOA(c) - Number of Attributes
11. NOM(c) - Number of Methods
Figure: Classifier for Large Class
Detection
14. (4) Misplaced Class Classifier
Classifier
Input variables
1. Name(c) - name of class
2. Name(ep) - enclosing package
3. Name(tp) - target package
4. CBO(c, ep) - Coupling between objects
5. MPC (c,ep) – Max Message Passing
Coupling
* c= class, ep = enclosing package, tp = target
package
Figure: Classifier for Misplaced Class
Detection
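The name inputs are text, so they have to be broken into words before an embedding layer can map them to vectors. The sketch below shows one plausible way to do that for qualified Java names. The splitting rules (dots, underscores, camelCase boundaries) and the function name `split_identifier` are assumptions for illustration, not the authors' preprocessing.

```python
import re
from typing import List


def split_identifier(name: str) -> List[str]:
    """Split a (possibly qualified) Java name into lower-case words."""
    words: List[str] = []
    # Package separators, underscores and '$' delimit the coarse parts.
    for part in re.split(r"[._$]+", name):
        # Within a part, split on camelCase boundaries; keep acronyms and digits whole.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


class_words = split_identifier("LoginDialog")             # Name(c)
package_words = split_identifier("org.example.ui")        # Name(ep) or Name(tp)
acronym_words = split_identifier("parseXMLFile")
```

The resulting word lists are what a word embedding would turn into fixed-length vectors.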
15. Evaluation
Compare the proposed approach against
• JDeodorant, a refactoring tool that can detect Feature Envy.
• DECOR, a tool for detecting Long Method and Large Class.
• TACO, a text-based technique for detecting Misplaced Class.
Tool Demo Video: https://www.youtube.com/watch?v=LtH8uF0epV0
16. Evaluation Results: (1) Feature Envy
RQ1: Does the proposed approach outperform the state-of-the-art approaches in identifying feature envy?
Answer: The proposed approach significantly outperforms the state-of-the-art in identifying feature envy.
Table: Evaluation Results on Feature Envy Detection
Observations
• Average F1 score of the proposed approach is 51.91%, whereas the average F1 score of JDeodorant is 24.51%.
• Average recall of the proposed approach reaches 88.11%, whereas JDeodorant achieves 16.6%.
17. Evaluation Results: (1) Feature Envy (continued)
RQ2: Is the proposed approach accurate in
recommending destinations (target classes) for methods
associated with feature envy?
Answer: The proposed approach is more accurate than JDeodorant in recommending destinations for feature envy methods.
Table: Accuracy in Recommending Target Classes
Observation:
• Proposed approach is 27.25% more accurate than
JDeodorant in recommending destinations for smelly
methods.
18. Evaluation Results: (2) Long Methods
RQ3: Does the proposed approach outperform the state-of-the-art approaches in identifying long methods?
Result Summary: The proposed approach significantly outperforms the state-of-the-art in identifying long methods.
Table: Evaluation Results on Long Method Detection
Observations:
• Proposed approach
identifies most of the long
methods with average
recall 78.99% and F1
score 55.53%.
• DECOR improves
precision at the cost of
significant reduction in
recall.
19. Evaluation Results: (3) Large Class
RQ4: Does the proposed approach outperform the state-of-the-art approaches in identifying large class?
Result Summary: The proposed approach significantly outperforms the state-of-the-art in identifying large class.
Table: Evaluation Results on Large Class Detection
Observations:
• Proposed approach improves
recall (80.95%) significantly at
the cost of reduced precision
• Proposed approach outperforms DECOR in F1 score, MCC, and AUC
20. Evaluation Results: (4) Misplaced Class
RQ5: Does the proposed approach outperform the state-of-the-art approaches in identifying Misplaced class?
Answer: The proposed approach significantly outperforms the state-of-the-art in identifying Misplaced class.
Table: Evaluation Results on Misplaced Class Detection
Observations:
• Proposed approach
outperforms TACO in F1
Score, MCC, and AUC.
• Proposed approach
improves both precision
and recall significantly.
21. Evaluation Results: (4) Misplaced Class
RQ6: Is the proposed approach accurate in
recommending target packages for misplaced classes?
Answer: The proposed approach outperforms the baseline in identifying misplaced classes, and it is comparable to the baseline in recommending target packages.
Table: Accuracy in Recommending Target Packages
Observations:
1. Proposed approach results in greater number of
accepted recommendations
2. TACO is more accurate than the proposed
approach in recommending target packages
22. Conclusion & Future Work
• Proposed a deep learning-based approach to detect code smells
• Proposed a custom technique for creation of labeled training dataset
• Improves F-measure by 27.4% in feature envy detection, 15.11% in long method detection, 4.73% in large class detection, and 48.18% in misplaced class detection
• Improves the state-of-the-art in software code smell detection
• Future work
• Detect additional categories of code smells, e.g., data clumps and lazy class
• Integrate with an IDE to benefit developers who are looking for refactoring opportunities
23. My Critique
• In the evaluation, they use accuracy, recall, precision, and F1 score. Other relevant metrics should be included.
Suggestion: False Positive Rate (FPR) and False Negative Rate (FNR) can be included to show how many false alarms and missed smells the models produce
• A relatively small dataset, extracted from only 10 code repositories [1].
Suggestion: include more codebases in the dataset
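The two suggested rates follow directly from a confusion matrix. A minimal sketch, using the standard definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP); the function name and the counts in the example are hypothetical and are not results from the paper.

```python
from typing import Tuple


def false_alarm_rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (FPR, FNR) computed from confusion-matrix counts."""
    # FPR: share of clean entities that are wrongly flagged as smelly.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # FNR: share of smelly entities that the detector misses.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical counts for one detector on one codebase.
fpr, fnr = false_alarm_rates(tp=40, fp=30, tn=920, fn=10)
```

Because smelly entities are usually rare, FPR can look small even when precision is low, which is one reason to report both alongside precision and recall.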
[1] http://qualitascorpus.com/, a curated collection of software systems intended for empirical studies of code artefacts
24. Questions
Summary
Figure: Proposed Approach
Training phase
Step 1: Download repositories from the corpus website (http://qualitascorpus.com/)
Step 2: Generation of training data (labelled code smells)
Step 3: Deep learning techniques (model)
Testing phase
Step 4: Provide new software repository
Step 5: Generation of Training Data
Classify the code smells: God Class, Long Method, Feature Envy, Misplaced Class
• The proposed approach establishes a better technique for identifying code smells
• The proposed approach is successful in suggesting possible refactoring opportunities
25. Appendix A - Distance metrics formula
1. If method m does not belong to Class C, the distance is computed as follows:
2. Otherwise, the distance is computed as follows:
where S = the set of entities at the method or class level, and
e = an entity (attribute or method)
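The formula images are not reproduced in this transcript. A plausible reconstruction, assuming the slide follows the Jaccard-style distance used by JDeodorant-like approaches (an assumption that the surviving text does not confirm), is shown below. The symbols S_m, S_C and S'_C are introduced here for illustration: S_m is the entity set of method m and S_C is the entity set of class C.

```latex
\[
\mathrm{dist}(m, C) =
\begin{cases}
1 - \dfrac{|S_m \cap S_C|}{|S_m \cup S_C|}, & \text{if } m \notin C \\[2ex]
1 - \dfrac{|S_m \cap S'_C|}{|S_m \cup S'_C|}, \quad S'_C = S_C \setminus \{m\}, & \text{otherwise}
\end{cases}
\]
```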
26. Appendix B - Performance Metrics
1. Accuracy is calculated as
2. Precision, recall, and F1 score are calculated as
3. Matthews Correlation Coefficient is calculated as
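The formula images are missing from this transcript. The standard definitions, written in terms of true and false positives and negatives (TP, FP, TN, FN), are given below; the layout is a reconstruction, not a copy of the slide.

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall}     = \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{MCC} &= \frac{TP \cdot TN - FP \cdot FN}
                   {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{align*}
```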
Editor's Notes
Hello everyone,
Thank you for joining my qualifying talk.
I am Sayed Mohsin Reza, and I am presenting my talk on “deep learning-based code smell detection”. The paper was published in an IEEE Transactions journal in 2019.
I have shared the link to these slides in the chat for your convenience.
I will briefly describe code smells for those who are not familiar with the term.
A code smell ….
JUnit is a unit testing framework for the Java programming language.
CNN layer - filters = 128, kernel size = 1 and activation = tanh, dense =128 neurons
- The model employs binary crossentropy as the loss function.
- A Dense layer feeds all outputs from the previous layer to all its neurons, each neuron providing one output to the next layer
- A flatten layer collapses the spatial dimensions of the input into the channel dimension.
- CNN layer - filters = 128, kernel size = 1 and activation = tanh, dense =128 neurons
- The model employs binary crossentropy as the loss function.
- A Dense layer feeds all outputs from the previous layer to all its neurons, each neuron providing one output to the next layer
- Embedding layer - converts the words in identifiers into fixed-length numerical vectors using word2vec, a high-quality distributed vector representation
A Dense layer feeds all outputs from the previous layer to all its neurons, each neuron providing one output to the next layer
Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems.
An LSTM layer above provides a sequence output rather than a single value output to the LSTM layer below
- Embedding layer - converts the words in identifiers into fixed-length numerical vectors using word2vec, a high-quality distributed vector representation
- A convolutional layer contains a set of filters whose parameters need to be learned.
- A Dense layer feeds all outputs from the previous layer to all its neurons, each neuron providing one output to the next layer
- A flatten layer collapses the spatial dimensions of the input into the channel dimension.
Recall is the fraction of actual smelly classes that are correctly predicted as smelly.
Precision is the fraction of predicted smelly classes that are actually smelly.
The F1 score is used to balance precision and recall.
- MCC- Matthews Correlation Coefficient - measure of the quality of binary (two-class) classifications,
- AUC - Area Under Curve