Deep Learning Anti-patterns from Code Metrics History
35th IEEE International Conference on Software Maintenance and Evolution
September 30th - October 4th 2019
Cleveland, OH USA
Antoine Barbez, Foutse Khomh, Yann-Gaël Guéhéneuc
Problem
Definition
Anti-patterns
"structures in the design that indicate violation of fundamental design
principles and negatively impact design quality"
"Certain structures in the code that suggest (sometimes they scream for)
the possibility of refactoring."
Suryanarayana et al. (2014)
Fowler (1999)
1/16
Problem
Definition
Detection
- Rely on structural metrics, e.g., LOC, Cyclomatic Complexity …
- Computed for each code component to be classified
1. Structural Anti-patterns Detection
2/16
Problem
Definition
Detection
1. Structural Anti-patterns Detection
- Rely on structural metrics, e.g., LOC, Cyclomatic Complexity …
- Computed for each code component to be classified
Example 1: Rule-based approaches
Lanza and Marinescu (2007)
2/16
Problem
Definition
Detection
- Rely on structural metrics, e.g., LOC, Cyclomatic Complexity …
- Computed for each code component to be classified
Example 1: Rule-based approaches
Example 2: Machine-learning-based approaches
1. Structural Anti-patterns Detection
2/16
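To make the rule-based flavor concrete: a minimal sketch of a detection strategy in the spirit of Lanza and Marinescu (2007), combining per-class structural metrics against fixed thresholds. The metric triple and the threshold values are illustrative assumptions for this sketch, not the published rule.

```python
from dataclasses import dataclass

# Illustrative thresholds (assumptions for this sketch, not the
# published values from Lanza and Marinescu).
FEW = 5
WMC_VERY_HIGH = 47
ONE_THIRD = 1 / 3

@dataclass
class ClassMetrics:
    name: str
    atfd: int    # Access To Foreign Data
    wmc: int     # Weighted Method Count
    tcc: float   # Tight Class Cohesion

def is_god_class(m: ClassMetrics) -> bool:
    """Rule-based God Class detection: a class that uses many foreign
    attributes, is very complex, and has low cohesion."""
    return m.atfd > FEW and m.wmc >= WMC_VERY_HIGH and m.tcc < ONE_THIRD

# Usage: classify one class from its structural metrics.
print(is_god_class(ClassMetrics("Dog", atfd=12, wmc=80, tcc=0.1)))  # True
```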
Problem
Definition
Detection
- Anti-patterns affect how source code evolves over time when
changes are applied to the system
- Rely on an analysis of co-changes occurring between code
components
2. Historical Anti-patterns Detection
3/16
Problem
Definition
Detection
- Anti-patterns affect how source code evolves over time when
changes are applied to the system
- Rely on an analysis of co-changes occurring between code
components
Example: HIST (Historical Information for Smell deTection)
2. Historical Anti-patterns Detection
Palomba et al. (2013)
3/16
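To make the co-change idea concrete: a toy sketch that counts how often pairs of classes are modified together across the change history. The commit data below is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical change history: each commit is the set of classes it touches.
commits = [
    {"Dog", "Kennel"},
    {"Dog", "Owner", "Kennel"},
    {"Dog"},
    {"Owner", "Leash"},
    {"Dog", "Kennel"},
]

# Count how often each pair of classes changes together.
co_changes = Counter()
for commit in commits:
    for pair in combinations(sorted(commit), 2):
        co_changes[pair] += 1

# Frequently co-changing pairs hint at coupling that structural
# metrics alone may not reveal.
for pair, n in co_changes.most_common(3):
    print(pair, n)
```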
Problem
Definition
Limitations
Structural and historical detection techniques are
complementary.
HIST does not take into account the structural properties of
the changes.
4/16
Approach Convolutional Analysis of code Metrics Evolution (CAME)
Main idea:
- Analyze the history of source code metrics
- Use a Convolutional Neural Network to perform
classification
Relies on structural and historical information
Processes changes at a code-level granularity
A deep-learning-based approach
5/16
Approach Input: example
Let’s compute the history of the class Dog
for three metrics:
• Number of Methods Declared (NMD)
• Number of Attributes Declared (NAD)
• Lines Of Code (LOC)
With a history length of Lh = 10
6/16
Approach Input: example
NMD NAD LOC
2
Master (commit N)
7/16
Approach Input: example
NMD NAD LOC
2 2
Master (commit N)
7/16
Approach Input: example
NMD NAD LOC
2 2 6
Master (commit N)
7/16
Approach Input: example
NMD NAD LOC
2 2 6
2 2 6
Commit N - 1
7/16
Approach Input: example
NMD NAD LOC
2 2 6
2 2 6
1 2 6
Commit N - 2
7/16
Approach Input: example
NMD NAD LOC
2 2 6
2 2 6
1 2 6
1 1 3
Commit N - 3
7/16
Approach Input: example
NMD NAD LOC
2 2 6
2 2 6
1 2 6
1 1 3
1 1 3
Commit N - 4
7/16
Approach Input: example
NMD NAD LOC
2 2 6
2 2 6
1 2 6
1 1 3
1 1 3
1 1 3
Commit N - 5
7/16
Approach Input: example
404
File not found
NMD NAD LOC
2 2 6
2 2 6
1 2 6
1 1 3
1 1 3
1 1 3
0 0 0
0 0 0
0 0 0
0 0 0
Commit N - 6
7/16
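Putting the walkthrough together: below is a minimal sketch of how the Lh × 3 input matrix could be assembled, zero-padding the rows for commits at which the class's file does not exist (the 404 above). The helper metrics_at is a hypothetical stand-in for the actual metric extraction.

```python
import numpy as np

L_H = 10  # history length

def metric_history(commits, cls, metrics_at, l_h=L_H):
    """Build the (l_h, 3) input matrix for one class: one row of
    (NMD, NAD, LOC) per commit, newest first, zero-padded whenever
    the class's file does not exist."""
    rows = []
    for commit in commits[:l_h]:          # commits ordered newest first
        values = metrics_at(commit, cls)  # (NMD, NAD, LOC) or None
        rows.append(values if values is not None else (0, 0, 0))
    while len(rows) < l_h:                # history shorter than l_h
        rows.append((0, 0, 0))
    return np.array(rows, dtype=np.float32)

# Reproducing the walkthrough for the class Dog (commits N .. N-9):
dog = {0: (2, 2, 6), 1: (2, 2, 6), 2: (1, 2, 6),
       3: (1, 1, 3), 4: (1, 1, 3), 5: (1, 1, 3)}  # absent before N-5
print(metric_history(range(10), "Dog", lambda c, _: dog.get(c)))
```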
Approach Model
8/16
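A minimal sketch of what a convolutional classifier over the (Lh × #metrics) input could look like, written against the Keras API. The layer sizes and depth are illustrative assumptions, not CAME's published architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

L_H, N_METRICS = 10, 7  # history length x number of selected metrics

# Illustrative architecture: 1D convolutions slide along the history
# axis, treating the metrics as input channels.
model = models.Sequential([
    layers.Input(shape=(L_H, N_METRICS)),
    layers.Conv1D(filters=32, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of God Class
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
model.summary()
```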
Study Design God Class
"… one object with a lion's share of the responsibilities, while most
other objects only hold data or execute simple processes."
Brown et al. (1998)
9/16
Study Design God Class
"… one object with a lion's share of the responsibilities, while most
other objects only hold data or execute simple processes."
Brown et al. (1998)
Selected metrics:
• ATFD (Access To Foreign Data)
• LCOM5 (Lack of COhesion in Methods)
• LOC (Lines Of Code)
• NAD (Number of Attributes Declared)
• NADC (Number of Associated Data Classes)
• NMD (Number of Methods Declared)
• WMC (Weighted Method Count)
9/16
Study Design Studied Systems
System #Class #God Class
Android Opt Telephony 192 10
Android Support 109 4
Apache Ant 694 7
Apache Lucene 155 3
Apache Tomcat 925 5
Apache Xerces 512 15
ArgoUML 1230 22
Jedit 423 5
Total 4240 71
10/16
Study Design Studied Systems
Evaluation
System #Class #God Class
Android Opt Telephony 192 10
Android Support 109 4
Apache Ant 694 7
Apache Lucene 155 3
Apache Tomcat 925 5
Apache Xerces 512 15
ArgoUML 1230 22
Jedit 423 5
Total 4240 71
10/16
Study Design Studied Systems
Evaluation
Training
&
Tuning
System #Class #God Class
Android Opt Telephony 192 10
Android Support 109 4
Apache Ant 694 7
Apache Lucene 155 3
Apache Tomcat 925 5
Apache Xerces 512 15
ArgoUML 1230 22
Jedit 423 5
Total 4240 71
10/16
Study 1 Definition
RQ1: To what extent can historical values of source code metrics
improve detection performance?
Approach: Monitor the performance achieved by CAME for different
lengths of metrics history: Lh ∈ {1, 10, 50, 100, 250, 500, 1000}
For each value of Lh:
- Perform hyper-parameter tuning
- Build and train 10 distinct CNNs
- Retrieve the mean and standard deviation of precision, recall,
and F-measure (see the sketch below)
11/16
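A minimal sketch of this evaluation loop; build_came and evaluate are hypothetical helpers standing in for model construction/training and test-set scoring.

```python
import statistics

HISTORY_LENGTHS = (1, 10, 50, 100, 250, 500, 1000)

def run_study_1(build_came, evaluate, n_runs=10):
    """For each history length Lh, train n_runs distinct CNNs and report
    the mean and standard deviation of precision, recall, and F-measure."""
    results = {}
    for l_h in HISTORY_LENGTHS:
        runs = [evaluate(build_came(l_h)) for _ in range(n_runs)]
        results[l_h] = {
            key: (statistics.mean(r[key] for r in runs),
                  statistics.stdev(r[key] for r in runs))
            for key in ("precision", "recall", "f_measure")
        }
    return results
```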
Study 1 Results
RQ1: To what extent can historical values of source code metrics
improve detection performance?
12/16
Study 2 Definition
RQ2: How does CAME compare to other static ML algorithms?
• Decision Tree
• Multi Layer Perceptron (MLP)
• Support Vector Machine (SVM)
13/16
Study 2 Definition
RQ2: How does CAME compare to other static ML algorithms?
RQ3: How does CAME compare to existing detection techniques?
• Decision Tree
• Multi Layer Perceptron (MLP)
• Support Vector Machine (SVM)
• DECOR Moha et al. (2010)
• HIST Palomba et al. (2013)
• JDeodorant Fokaefs et al. (2011)
13/16
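A minimal sketch of how the three ML baselines could be set up with scikit-learn on per-class structural metrics; the feature matrix below is random placeholder data, for illustration only.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: one row of structural metrics per class,
# label 1 for God Class (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.random((200, 7))
y = (X[:, 0] > 0.8).astype(int)

baselines = {
    "Decision Tree": DecisionTreeClassifier(),
    "MLP": MLPClassifier(max_iter=1000),
    "SVM": SVC(),
}
for name, clf in baselines.items():
    f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: F-measure = {f1:.2f}")
```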
Study 2 Results
Approaches Precision Recall F-measure
Decision Tree 68 % 29 % 40 %
MLP 41 % 86 % 56 %
SVM 68 % 14 % 24 %
CAME 71 % 86 % 77 %
RQ2: How does CAME compare to other static ML algorithms?
14/16
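For reference, F-measure is the harmonic mean of precision and recall, F = 2 · P · R / (P + R); CAME's balanced precision and recall therefore translate into a much higher F-measure than the baselines' lopsided scores.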
Study 2 Results
Approaches Precision Recall F-measure
DECOR 24 % 36 % 29 %
HIST 20 % 43 % 27 %
JDeodorant 4 % 57 % 8 %
CAME 71 % 86 % 77 %
RQ3: How does CAME compare to existing detection techniques?
15/16
References
G. Suryanarayana, G. Samarthyam, and T. Sharma, Refactoring for Software Design Smells:
Managing Technical Debt. Morgan Kaufmann, 2014.
W. J. Brown, R. C. Malveau, W. H. Brown, H. W. McCormick III, and T. J. Mowbray,
AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis, 1st ed. John Wiley
and Sons, March 1998.
M. Fowler, Refactoring: Improving the Design of Existing Code. Boston, MA, USA:
Addison-Wesley, 1999.
F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, and D. Poshyvanyk, "Detecting
bad smells in source code using change history information," in ASE, 2013, pp. 268–278.
N. Moha, Y.-G. Guéhéneuc, L. Duchien, and A.-F. Le Meur, "DECOR: A method for
the specification and detection of code and design smells," IEEE Transactions on Software
Engineering (TSE), vol. 36, no. 1, pp. 20–36, 2010.
M. Fokaefs, N. Tsantalis, E. Stroulia, and A. Chatzigeorgiou, "JDeodorant: identification and
application of extract class refactorings," in Software Engineering (ICSE), 2011 33rd
International Conference on. IEEE, 2011, pp. 1037–1039.
M. Lanza and R. Marinescu, Object-Oriented Metrics in Practice: Using Software Metrics to
Characterize, Evaluate, and Improve the Design of Object-Oriented Systems. Springer Science
& Business Media, 2007.
16/16
