SlideShare a Scribd company logo
Dottorato in Scienze Computazionali e Informatiche, XXV Ciclo
Ph.D. Candidate:
Valerio Maggio
Thesis Advisors:
Dr. Sergio Di Martino
Dr. Anna Corazza
June 5th, 2013
UNSUPERVISED
MACHINE LEARNING
FOR SOFTWARE
MAINTENANCE
THESIS
G O A L
UNSUPERVISED
MACHINE LEARNING FOR
SOFTWARE MAINTENANCE
These solutions exploit (unsupervised) machine learning
techniques to mine information from the source code
Define and experimentally evaluate solutions (techniques and
prototype tools) for automatic software analysis to support
software maintenance activities.
There exist different types of
Software Maintenance.
SOFTWARE MAINTENANCE
“A software system must be continually adapted during its overall life cycle or it
progressively becomes less satisfactory.”
(cit. Lehman’s First Law of Software Evolution)
SOFTWARE MAINTENANCE
“A software system must be continually adapted during its overall life cycle or it
progressively becomes less satisfactory.”
(cit. Lehman’s First Law of Software Evolution)
Software Maintenance represents the most
expensive, time consuming and challenging
phase of the whole development process.
SOFTWARE MAINTENANCE
“A software system must be continually adapted during its overall life cycle or it
progressively becomes less satisfactory.”
(cit. Lehman’s First Law of Software Evolution)
Software Maintenance represents the most
expensive, time consuming and challenging
phase of the whole development process.
Software Maintenance could account up to the
85-90% of the total software costs.
ISSUES
SOFTWARE
MAINTENANCE
“Software Maintenance is about change!”
(cit. S. Jarzabek)
Change Analysis
ISSUES
SOFTWARE
MAINTENANCE
“Software Maintenance is about change!”
(cit. S. Jarzabek)
Change Analysis
Program
Comprehension
The documentation is usually
scarce or not up to date!
ISSUES
SOFTWARE
MAINTENANCE
The source code (usually) represents the most
reliable source of information about the system
“Software Maintenance is about change!”
(cit. S. Jarzabek)
Change Analysis
Program
Comprehension
The documentation is usually
scarce or not up to date!
ISSUES
SOFTWARE
MAINTENANCE
The source code (usually) represents the most
reliable source of information about the system
“Software Maintenance is about change!”
(cit. S. Jarzabek)
Change Analysis
Program
Comprehension
Reverse Engineering
The documentation is usually
scarce or not up to date!
REVERSE
E N G I N E
E R I N G
• Definition of tools and techniques to support maintenance activities
• Goal: Build higher-level software models in an automatic fashion
gathering information from the source code or any other document
• Goal: To aid the comprehension of the system
REVERSE
E N G I N E
E R I N G
• Definition of tools and techniques to support maintenance activities
• Goal: Build higher-level software models in an automatic fashion
gathering information from the source code or any other document
• Goal: To aid the comprehension of the system
STATIC
ANALYSIS
DYNAMIC
ANALYSIS
REVERSE
E N G I N E
E R I N G
• Definition of tools and techniques to support maintenance activities
• Goal: Build higher-level software models in an automatic fashion
gathering information from the source code or any other document
• Goal: To aid the comprehension of the system
STATIC
ANALYSIS
DYNAMIC
ANALYSIS
MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
• Requires many efforts in:
• the definition of the relevant information best suited for the specific task/domain
MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
• Requires many efforts in:
• the definition of the relevant information best suited for the specific task/domain
• the application of the learning algorithms to the considered data
UNSUPERVISEDLEARNING
• Supervised Learning:
• Learn from labelled samples
• Unsupervised Learning:
• Learn (directly) from the data
Learn by examples
UNSUPERVISEDLEARNING
• Supervised Learning:
• Learn from labelled samples
• Unsupervised Learning:
• Learn (directly) from the data
Learn by examples
UNSUPERVISEDLEARNING
• Supervised Learning:
• Learn from labelled samples
• Unsupervised Learning:
• Learn (directly) from the data
Learn by examples
(+) No cost of labeling samples
(-) Trade-off imposed on the quality of the data
THESIS
C O N T R
I B U T I
O N S
&
THESIS
C O N T R
I B U T I
O N S
Contributions to three relevant and related
open issues in Software Maintenance
&
THESIS
C O N T R
I B U T I
O N S
Contributions to three relevant and related
open issues in Software Maintenance
1. SOFTWARE
RE-MODULARIZATION
3. CLONE
DETECTION
2. SOURCE CODE
NORMALIZATION
&
ISSUE
#1
SOFTWARE
RE-MODULARIZATION
Software Classes
UI Process
Components
UI
Components
Data Access
Components
Data Helpers /
Utilities
Security
Operational Management
Communications
Business
Components
Application Facade
Buisiness
Workflows
Messages
Interfaces
Service Interfaces
Re-modularization provides a way to
support software maintainers by
automatically grouping together
(clustering) “related” software classes
SOFTWARE
RE-MODULARIZATION
PROBL
EM
S T A T E
M E N T
Re-modularization provides a way to
support software maintainers by
automatically grouping together
(clustering) “related” software classes
SOFTWARE
RE-MODULARIZATION
External
Systems
Service Consumers
Services
Service Interfaces
Messages
Interfaces
Cross Cutting
Security
OperationalManagement
Communications
Data
Data Access
Components
Data Helpers /
Utilities
Presentation
UI
Components
UI Process
Components
Business
Application Facade
Buisiness
Workflows
Business
Components
Clusters of Software Classes
PROBL
EM
S T A T E
M E N T
SOURCECODE
LEXICALINFORMATION
SOURCECODE
LEXICALINFORMATION
SOURCECODE
LEXICALINFORMATION (IDEA): Exploit the lexical information gathered from the source code
to produce clusters of classes that are lexically related.
SOURCECODE
LEXICALINFORMATION (IDEA): Exploit the lexical information gathered from the source code
to produce clusters of classes that are lexically related.
State of the Art: Information Retrieval (IR) based approaches.
IRINDEXING
Implicit assumption: The “same” words are used whenever a
particular concept occurs
IRINDEXING
1. Tokenization
2. Normalization
draw, the, are,
null, handl,
box, r,
rectangl, g,
graphic,
box,
display, box,
...
Draws, the, are,
NullHandle,
box, r,
Rectangle, g,
Graphics,
box,
displayBox,
...
Implicit assumption: The “same” words are used whenever a
particular concept occurs
IRINDEXING
SOURCECODEZONES
LEXICALINFORMATION
SOURCECODEZONES
LEXICALINFORMATION
Class Names
SOURCECODEZONES
LEXICALINFORMATION
Class Names
Attribute Names
SOURCECODEZONES
LEXICALINFORMATION
Class Names
Attribute Names
Method Names
Method Names
SOURCECODEZONES
LEXICALINFORMATION
Class Names
Attribute Names
Method Names
Parameter Names
Method Names
Parameter Names
SOURCECODEZONES
LEXICALINFORMATION
Class Names
Attribute Names
Method Names
Parameter Names
Comments
Comments
Method Names
Parameter Names
Comments
SOURCECODEZONES
LEXICALINFORMATION
Source Code
Class Names
Attribute Names
Method Names
Parameter Names
Comments
Comments
Method Names
Parameter Names
Source Code
Comments
ZONEINDEXING
ZONEINDEXING
RQ1: Do terms in different Zones provide different contributions?
1. Tokenization
2. Normalization
ZONEINDEXING
Draws, the, are,
NullHandle,
box, r,
draw, g,
Graphics,
color,
displayBox,
...
draw, the, are,
null, handl,
box, r,
draw, g,
graphic,
color,
display, box,
...
RQ1: Do terms in different Zones provide different contributions?
SOURCECODELEXICON JFreeChart JEdit
JUnit Xerces
SOURCECODELEXICON JFreeChart
• Good lexicon in every Zone!
SOURCECODELEXICON JEdit
• Very good in Comments
• Very poor in Method and Parameter names
SOURCECODELEXICON JUnit
• Good in Class and Attribute names
• No Comments at all
SOURCECODELEXICON Xerces
• Poor in Method, Parameter names and Comments
• Good in Method Code and Variable names
SOURCECODELEXICON JFreeChart JEdit
JUnit Xerces
RQ2: How to automatically weight the different contributions ?
CONTRI
BUTION
SOFTWARE RE-
MODULARIZATION
• We propose a source code Zone Indexing
Corazza A., Di Martino, S., Maggio, V., Scanniello, G.
Investigating the use of lexical information for software system clustering
(2011) Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR, art. no. 5741257, pp. 35-44.
ISSN: 15345351 ISBN: 978-076954343-7
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
Combining machine learning and information retrieval techniques for software clustering
(2012) Communications in Computer and Information Science, 255 CCIS, pp. 42-60. ISSN: 18650929 ISBN: 978-364228032-0
CONTRI
BUTION
SOFTWARE RE-
MODULARIZATION
• We propose a source code Zone Indexing
• An approach to automatically weight the contribution of different terms in
Zones
• Maximum-Likelihood Estimation (MLE) Approach
Corazza A., Di Martino, S., Maggio, V., Scanniello, G.
Investigating the use of lexical information for software system clustering
(2011) Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR, art. no. 5741257, pp. 35-44.
ISSN: 15345351 ISBN: 978-076954343-7
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
Combining machine learning and information retrieval techniques for software clustering
(2012) Communications in Computer and Information Science, 255 CCIS, pp. 42-60. ISSN: 18650929 ISBN: 978-364228032-0
CONTRI
BUTION
SOFTWARE RE-
MODULARIZATION
• We propose a source code Zone Indexing
• An approach to automatically weight the contribution of different terms in
Zones
• Maximum-Likelihood Estimation (MLE) Approach
• A variant of the K-medoid clustering algorithm
• Tailored to the software clustering task
Corazza A., Di Martino, S., Maggio, V., Scanniello, G.
Investigating the use of lexical information for software system clustering
(2011) Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR, art. no. 5741257, pp. 35-44.
ISSN: 15345351 ISBN: 978-076954343-7
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
Combining machine learning and information retrieval techniques for software clustering
(2012) Communications in Computer and Information Science, 255 CCIS, pp. 42-60. ISSN: 18650929 ISBN: 978-364228032-0
StackedVerticalBarRenderer
LineAndShapeRenderer
K-MEDOIDVARIANT
ArrayHolder
Plot
LinearPlotFitAlgorithm
LinearPlotFit
HorizontalNumberAxis
NumberAxisPropertyEditPanel
PlotFitAlgorithm
ChartFrame
GraphicsHolder
PropertyEditPanel
DateAxisPropertyEditPanel
StandardLegend
LegendItem
FigureLegend
VerticalNumberAxis
VerticalXYDataPlot
DataPlot
SampleXYDataset
SampleXYDatasetThread
VerticalPlot
FitAlgorithm
StackedVerticalBarRenderer
LineAndShapeRenderer
K-MEDOIDVARIANT
ArrayHolder
Plot
LinearPlotFitAlgorithm
LinearPlotFit
HorizontalNumberAxis
NumberAxisPropertyEditPanel
PlotFitAlgorithm
ChartFrame
GraphicsHolder
PropertyEditPanel
DateAxisPropertyEditPanel
StandardLegend
LegendItem
FigureLegend
VerticalNumberAxis
VerticalXYDataPlot
DataPlot
SampleXYDataset
SampleXYDatasetThread
Non-extreme Distribution
property
VerticalPlot
FitAlgorithm
StackedVerticalBarRenderer
LineAndShapeRenderer
K-MEDOIDVARIANT
ArrayHolder
Plot
LinearPlotFitAlgorithm
LinearPlotFit
HorizontalNumberAxis
NumberAxisPropertyEditPanel
PlotFitAlgorithm
ChartFrame
GraphicsHolder
PropertyEditPanel
DateAxisPropertyEditPanel
StandardLegend
LegendItem
FigureLegend
VerticalNumberAxis
VerticalXYDataPlot
DataPlot
SampleXYDataset
SampleXYDatasetThread
Non-extreme Distribution
property
VerticalPlot
FitAlgorithm
StackedVerticalBarRenderer
LineAndShapeRenderer
K-MEDOIDVARIANT
ArrayHolder
Plot
LinearPlotFitAlgorithm
LinearPlotFit
HorizontalNumberAxis
NumberAxisPropertyEditPanel
PlotFitAlgorithm
ChartFrame
GraphicsHolder
PropertyEditPanel
DateAxisPropertyEditPanel
StandardLegend
LegendItem
FigureLegend
VerticalNumberAxis
VerticalXYDataPlot
DataPlot
SampleXYDataset
SampleXYDatasetThread
Non-extreme Distribution
property
VerticalPlot
FitAlgorithm
Extreme
Extreme
StackedVerticalBarRenderer
LineAndShapeRenderer
K-MEDOIDVARIANT
ArrayHolder
Plot
LinearPlotFitAlgorithm
LinearPlotFit
HorizontalNumberAxis
NumberAxisPropertyEditPanel
PlotFitAlgorithm
ChartFrame
GraphicsHolder
PropertyEditPanel
DateAxisPropertyEditPanel
StandardLegend
LegendItem
FigureLegend
VerticalNumberAxis
VerticalXYDataPlot
DataPlot
SampleXYDataset
SampleXYDatasetThread
Non-extreme Distribution
property
VerticalPlot
FitAlgorithm
AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
0
0.2
0.4
0.6
0.8
1.0
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
0
0.2
0.4
0.6
0.8
1.0
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
0
0.2
0.4
0.6
0.8
1.0
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
0
0.2
0.4
0.6
0.8
1.0
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
ISSUE
#2
SOURCE CODE
NORMALIZATION
SOURCECODE
LEXICALINFORMATION Implicit assumption: The “same” words are used whenever a
particular concept occurs
SOURCECODE
LEXICALINFORMATION Implicit assumption: The “same” words are used whenever a
particular concept occurs
1. Tokenization
2. Normalization
1.5 Identifier
Splitting
• Splitting algorithms based on naming conventions
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• NullHandle ==> Null | Handle
• displayBox ==> display | Box
draw, the, are,
box, r,
rectangl, g,
graphic,
color,
....
STATEOFTHEARTTOOLS
display, box
null, handl
1. Tokenization
2. Normalization
1.5 Identifier
Splitting
• Splitting algorithms based on naming conventions
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• NullHandle ==> Null | Handle
• displayBox ==> display | Box
draw, the, are,
box, r,
rectangl, g,
graphic,
color,
....
STATEOFTHEARTTOOLS
display, box
null, handl
1. Tokenization
2. Normalization
1.5 Identifier
Splitting
• Splitting algorithms based on naming conventions
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• NullHandle ==> Null | Handle
draw, the, are,
box, r,
rectangl, g,
graphic,
color,
....
STATEOFTHEARTTOOLS
display, box
null, handl
1. Tokenization
2. Normalization
1.5 Identifier
Splitting
• Splitting algorithms based on naming conventions
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• NullHandle ==> Null | Handle
• displayBox ==> display | Box
draw, the, are,
box, r,
rectangl, g,
graphic,
color,
....
STATEOFTHEARTTOOLS
null, handl
display, box
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
IDENTIFIERSSPLITTING
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• drawXORRect ==> drawXOR | Rect
IDENTIFIERSSPLITTING
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• drawXORRect ==> drawXOR | Rect
• drawxorrect ==> NO SPLIT
Splitting algorithms based on naming conventions
are not robust enough
IDENTIFIERSSPLITTING
Splitting algorithms based on naming conventions
are not robust enough
ABBREVIATIONSEXPANSION
Splitting algorithms based on naming conventions
are not robust enough
• Heavy use of Abbreviations in the source code
• r as for Rectangle
• rect as for Rectangle
ABBREVIATIONSEXPANSION
Splitting algorithms based on naming conventions
are not robust enough
• Heavy use of Abbreviations in the source code
• r as for Rectangle
• rect as for Rectangle
ABBREVIATIONSEXPANSION
1. Tokenization
CODENORMALIZATION
2. Normalization
1.5 Code
Normalization
• AMAP (Hill and Pollock, 2008)
• SAMURAI (Enslen, et.al , 2011)
• TIDIER (Guerrouj, et.al , 2011)
• GenTest+Normalize (Lawrie and Binkley, 2011)
• LINSEN (Corazza, Di Martino and Maggio, 2012)
draw, the, are,
null, handl,
box,
rectangl,
graphic,
color,
display, box,
rectangl,
draw, xor,
rectangl
1. Tokenization
CODENORMALIZATION
2. Normalization
1.5 Code
Normalization
• AMAP (Hill and Pollock, 2008)
• SAMURAI (Enslen, et.al , 2011)
• TIDIER (Guerrouj, et.al , 2011)
• GenTest+Normalize (Lawrie and Binkley, 2011)
• LINSEN (Corazza, Di Martino and Maggio, 2012)
draw, the, are,
null, handl,
box,
rectangl,
graphic,
color,
display, box,
rectangl,
draw,xor,
rectangl
CONTRI
BUTION
SOURCE CODE
NORMALIZATION
• LINSEN: novel technique for code normalization
• Able to both Split Identifiers and Expand possible occurring
abbreviations
Corazza, A., Di Martino, S., Maggio, V.
LINSEN: An efficient approach to split identifiers and expand abbreviations
(2012) IEEE International Conference on Software Maintenance, ICSM, art. no. 6405277, pp. 233-242. ISBN: 978-146732312-3
CONTRI
BUTION
SOURCE CODE
NORMALIZATION
• LINSEN: novel technique for code normalization
• Able to both Split Identifiers and Expand possible occurring
abbreviations
• Based on an efficient String Matching technique:
Baeza-Yates&Perlberg Algorithm (BYP)
• Exploit different Sources of Information to find the matching
words
Corazza, A., Di Martino, S., Maggio, V.
LINSEN: An efficient approach to split identifiers and expand abbreviations
(2012) IEEE International Conference on Software Maintenance, ICSM, art. no. 6405277, pp. 233-242. ISBN: 978-146732312-3
SKETCHOFTHEALGORITHM
• NODES correspond to characters of the current identifier
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
5
7
8
2 63
10
SKETCHOFTHEALGORITHM
• ARCS corresponds to matchings between identifier substrings
and dictionary words
• NODES correspond to characters of the current identifier
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
5
7
8
2 63
10
SKETCHOFTHEALGORITHM
• ARCS corresponds to matchings between identifier substrings
and dictionary words
• Padding Arcs to ensure the Graph always connected
• NODES correspond to characters of the current identifier
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
“r”,C-MAX
“d”,C-MAX
5
7
8
2
“R”,C-MAX
6
“a”,C-MAX
3
“w”,C-MAX
“X”,C-MAX
“O”,C-MAX
“e”,C-MAX
10
“c”,C-MAX“t”,C-MAX
“raw”,c(“raw”)
“draw”,c(“draw”)
“or”,c(“or”)
“xor”,c(“xor”)
“rectangle”,c(“rectangle”)
“R”,C-MAX
SKETCHOFTHEALGORITHM
• Every Arc is Labelled with the corresponding dictionary word
• Weights represent the “cost” of each matching
• Cost function [c(“word”)] favors longest words and domain-
related information
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
“r”,C-MAX
“d”,C-MAX
5
7
8
2
“R”,C-MAX
6
“a”,C-MAX
3
“w”,C-MAX
“X”,C-MAX
“O”,C-MAX
“e”,C-MAX
10
“c”,C-MAX“t”,C-MAX
“raw”,c(“raw”)
“draw”,c(“draw”)
“or”,c(“or”)
“xor”,c(“xor”)
“rectangle”,c(“rectangle”)
“R”,C-MAX
SKETCHOFTHEALGORITHM
• The final Mapping Solution corresponds to the sequence of labels
in the path with the minimum cost (Djikstra Algorithm)
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
“r”,C-MAX
“d”,C-MAX
5
7
8
2
“R”,C-MAX
6
“a”,C-MAX
3
“w”,C-MAX
“X”,C-MAX
“O”,C-MAX
“e”,C-MAX
10
“c”,C-MAX“t”,C-MAX
“raw”,c(“raw”)
“draw”,c(“draw”)
“or”,c(“or”)
“xor”,c(“xor”)
“rectangle”,c(“rectangle”)
“R”,C-MAX
SPLITTING
Accuracy Rates for the
comparison with
DTW (Madani et. al 2010)
0
0.25
0.5
0.75
1
JhotDraw 5.1 Lynx 2.8.5
DTW LINSEN DTW LINSEN
DTW LINSEN
RE
SULTS
# 1
Accuracy Rates for the
comparison with GenTest
(Lawrie and Binkley, 2011)
0
0.25
0.5
0.75
1
which 2.20 a2ps 4.14
GenTest LINSEN
DTW O(n3)
Accuracy Rates for the
comparison with
AMAP (Hill and Pollock, 2008)
0
0.25
0.5
0.75
1
CW DL OO AC PR SL TOTAL (Avg)
AMAP
LINSEN
EXPANSION
RE
SULTS
# 2
CW: Combination Words
DL: Dropped Letters
OO: Others
AC: Acronyms
PR: Prefix
SL: Single Letters
ISSUE
#3
CLONE
DETECTION
PROBL
EM
S T A T E
M E N T
CLONE DETECTION
PROBL
EM
S T A T E
M E N T
CLONE DETECTION
Clones Textual Similarity
PROBL
EM
S T A T E
M E N T
CLONE DETECTION
Clones Functional Similarity
PROBL
EM
S T A T E
M E N T
CLONE DETECTION
Clones affect the reliability of the system!
Sneaky Bug!
Duplix
Scorpio
PMD
CCFinder
Dup
CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr
SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
STATEOFTHEARTTOOLS
Duplix
Scorpio
PMD
CCFinder
Dup
CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr
SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
Text Based Tools:
Text is compared line by line
STATEOFTHEARTTOOLS
Duplix
Scorpio
PMD
CCFinder
Dup
CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr
SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
Token Based Tools:
Token sequences are
compared to sequences
STATEOFTHEARTTOOLS
Duplix
Scorpio
PMD
CCFinder
Dup
CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr
SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
Syntax Based Tools:
Syntax subtrees are compared
to each other
STATEOFTHEARTTOOLS
Duplix
Scorpio
PMD
CCFinder
Dup
CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr
SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
Graph Based Tools:
(sub) graphs are compared to
each other
STATEOFTHEARTTOOLS
CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
• Structural and Lexical Information
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
• Structural and Lexical Information
• Application of Kernel Methods to Source Code Structures
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
• Structural and Lexical Information
• Application of Kernel Methods to Source Code Structures
• Typically applied in other field such as NLP or Bioinformatics
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
CODE
STRUCTURES
KERNELSFORSTRUCTURES
Computation of the dot product between (Graph) Structures
K( ),
CODE
STRUCTURES
KERNELSFORSTRUCTURES
Abstract Syntax Tree (AST)
Tree structure representing the syntactic structure of
the different instructions of a program (function)
Program Dependencies Graph (PDG)
(Directed) Graph structure representing the relationship
among the different statement of a program
Computation of the dot product between (Graph) Structures
K( ),
CODE
STRUCTURES
KERNELSFORSTRUCTURES
Abstract Syntax Tree (AST)
Tree structure representing the syntactic structure of
the different instructions of a program (function)
Program Dependencies Graph (PDG)
(Directed) Graph structure representing the relationship
among the different statement of a program
Computation of the dot product between (Graph) Structures
K( ),
CODE
KERNELFORCLONES
<
x y = =
x +
x 1
y -
y 1
while
block
while
block
block
if
>
b a = =
a +
a 1
b -
b 1
>
b 0 =
c 3
CODE AST
KERNELFORCLONES
<
x y = =
x +
x 1
y -
y 1
while
block
while
block
block
if
>
b a = =
a +
a 1
b -
b 1
>
b 0 =
c 3
CODE AST AST KERNEL
KERNELFORCLONES
<
block
while
= =
block
=
y -
=
x +
+
x 1
-
y 1
<
x y
>
b 0 =
c 3
if
block
>
b a
-
b 1
<
block
while
+
a 1
=
b -
=
a +
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
Instruction (I)
i.e., FOR, IF, WHILE, RETURN
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
Instruction (I)
i.e., FOR, IF, WHILE, RETURN
Context (C)
i.e., Instruction Class of
the closer statement node
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
Instruction (I)
i.e., FOR, IF, WHILE, RETURN
Context (C)
i.e., Instruction Class of
the closer statement node
Lexemes (Ls)
Lexical information gathered
(recursively) from leaves
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES
IC = Conditional-Expr
I = Less-operator
C = Loop
Ls= [x,y]
IC = Loop
I = while-loop
C = Function-Body
Ls= [x, y]
Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
Instruction (I)
i.e., FOR, IF, WHILE, RETURN
Context (C)
i.e., Instruction Class of
the closer statement node
Lexemes (Ls)
Lexical information gathered
(recursively) from leaves
IC = Block
I = while-body
C = Loop
Ls= [ x ]
CLONE DETECTION
• Comparison with another (pure) AST-based clone detector
• Comparison on a system with randomly seeded clones
0
0.25
0.5
0.75
1
Precision Recall F-measure
CloneDigger Tree Kernel Tool
RE
SULTS
Results refer to clones where code
fragments have been modified by adding/
removing or changing code statements
CONCLUSIONS
CONCLUSIONS
CONCLUSIONS
CONCLUSIONS
CONCLUSIONS
CONCLUSIONS
FUTURE RESEARCH DIRECTIONS
• Re-modularization:
• Integrate code normalization algorithm (LINSEN) to lexical analysis
• Code Normalization:
• Investigate the contributions of different sources of information
used for disambiguations
• Clone Detection:
• Combine Static (and Kernel methods) with Dynamic Analysis
THANK YOU
June 5th, 2013 - Ph.D. Defense
Valerio Maggio
Ph.D. Student, University of Naples “Federico II”
valerio.maggio@unina.it

More Related Content

What's hot

Software Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersSoftware Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that Matters
Tao Xie
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and Security
Tao Xie
 
Using electronic laboratory notebooks in the academic life sciences: a group ...
Using electronic laboratory notebooks in the academic life sciences: a group ...Using electronic laboratory notebooks in the academic life sciences: a group ...
Using electronic laboratory notebooks in the academic life sciences: a group ...
SC CTSI at USC and CHLA
 
Big(ger) Data in Software Engineering
Big(ger) Data in Software EngineeringBig(ger) Data in Software Engineering
Big(ger) Data in Software Engineering
Mehdi Mirakhorli
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
prashant 100702007
 
Towards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that MattersTowards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that Matters
Tao Xie
 
Bug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram IdfBug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram Idf
Hideaki Hata
 
Mining Software Repositories
Mining Software RepositoriesMining Software Repositories
Mining Software Repositories
Israel Herraiz
 
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
inside-BigData.com
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
Aly Abdelkareem
 
Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models
IJECEIAES
 
Transferring Software Testing Tools to Practice
Transferring Software Testing Tools to PracticeTransferring Software Testing Tools to Practice
Transferring Software Testing Tools to Practice
Tao Xie
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
Isabelle Augenstein
 
Requirementv4
Requirementv4Requirementv4
Requirementv4stat
 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine Learning
Kuppusamy P
 
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...
Manoj895639
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
Dr. Ananth Krishnamoorthy
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
Dr. Ananth Krishnamoorthy
 
Towards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniquesTowards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniques
Pvrtechnologies Nellore
 
Assessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsAssessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsPeter van Heusden
 

What's hot (20)

Software Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersSoftware Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that Matters
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and Security
 
Using electronic laboratory notebooks in the academic life sciences: a group ...
Using electronic laboratory notebooks in the academic life sciences: a group ...Using electronic laboratory notebooks in the academic life sciences: a group ...
Using electronic laboratory notebooks in the academic life sciences: a group ...
 
Big(ger) Data in Software Engineering
Big(ger) Data in Software EngineeringBig(ger) Data in Software Engineering
Big(ger) Data in Software Engineering
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
Towards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that MattersTowards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that Matters
 
Bug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram IdfBug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram Idf
 
Mining Software Repositories
Mining Software RepositoriesMining Software Repositories
Mining Software Repositories
 
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models
 
Transferring Software Testing Tools to Practice
Transferring Software Testing Tools to PracticeTransferring Software Testing Tools to Practice
Transferring Software Testing Tools to Practice
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
 
Requirementv4
Requirementv4Requirementv4
Requirementv4
 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine Learning
 
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
Towards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniquesTowards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniques
 
Assessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsAssessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformatics
 

Viewers also liked

Unit testing with Junit
Unit testing with JunitUnit testing with Junit
Unit testing with Junit
Valerio Maggio
 
Unsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detectionUnsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detection
Valerio Maggio
 
Machine Maintenance and Simple Troubleshooting
Machine Maintenance and Simple TroubleshootingMachine Maintenance and Simple Troubleshooting
Machine Maintenance and Simple Troubleshooting
CJ F.
 
Junit
JunitJunit
LINSEN an efficient approach to split identifiers and expand abbreviations
LINSEN an efficient approach to split identifiers and expand abbreviationsLINSEN an efficient approach to split identifiers and expand abbreviations
LINSEN an efficient approach to split identifiers and expand abbreviations
Valerio Maggio
 
JavaScript Refactoring
JavaScript RefactoringJavaScript Refactoring
JavaScript Refactoring
Krzysztof Szafranek
 
Unit testing and scaffolding
Unit testing and scaffoldingUnit testing and scaffolding
Unit testing and scaffolding
Valerio Maggio
 
Junit
JunitJunit
Maintenance of Machine & Main Functioning Parts of Machine
Maintenance of Machine & Main Functioning Parts of MachineMaintenance of Machine & Main Functioning Parts of Machine
Maintenance of Machine & Main Functioning Parts of Machine
Rohit Sorte
 
Basic Machine Shop Safety And Maintenance Tips
Basic Machine Shop Safety And Maintenance TipsBasic Machine Shop Safety And Maintenance Tips
Basic Machine Shop Safety And Maintenance Tips
Whitney Mike Segura
 
Database Refactoring
Database RefactoringDatabase Refactoring
Database Refactoring
Anton Keks
 
Maintenance Management
Maintenance ManagementMaintenance Management
Maintenance Management
Bisina Keshara
 

Viewers also liked (13)

Unit testing with Junit
Unit testing with JunitUnit testing with Junit
Unit testing with Junit
 
Unsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detectionUnsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detection
 
Machine Maintenance and Simple Troubleshooting
Machine Maintenance and Simple TroubleshootingMachine Maintenance and Simple Troubleshooting
Machine Maintenance and Simple Troubleshooting
 
Junit
JunitJunit
Junit
 
LINSEN an efficient approach to split identifiers and expand abbreviations
LINSEN an efficient approach to split identifiers and expand abbreviationsLINSEN an efficient approach to split identifiers and expand abbreviations
LINSEN an efficient approach to split identifiers and expand abbreviations
 
JavaScript Refactoring
JavaScript RefactoringJavaScript Refactoring
JavaScript Refactoring
 
Unit testing and scaffolding
Unit testing and scaffoldingUnit testing and scaffolding
Unit testing and scaffolding
 
Junit
JunitJunit
Junit
 
Maintenance of Machine & Main Functioning Parts of Machine
Maintenance of Machine & Main Functioning Parts of MachineMaintenance of Machine & Main Functioning Parts of Machine
Maintenance of Machine & Main Functioning Parts of Machine
 
Basic Machine Shop Safety And Maintenance Tips
Basic Machine Shop Safety And Maintenance TipsBasic Machine Shop Safety And Maintenance Tips
Basic Machine Shop Safety And Maintenance Tips
 
Database Refactoring
Database RefactoringDatabase Refactoring
Database Refactoring
 
Sap plant maintenance
Sap plant maintenanceSap plant maintenance
Sap plant maintenance
 
Maintenance Management
Maintenance ManagementMaintenance Management
Maintenance Management
 

Similar to Improving Software Maintenance using Unsupervised Machine Learning techniques

Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
Tao Xie
 
Software Development Life Cycle
Software Development Life Cycle Software Development Life Cycle
Software Development Life Cycle
Dr. Ranjan Kumar Mishra
 
sadfinal2007-121022230733-phpapp01.pdf
sadfinal2007-121022230733-phpapp01.pdfsadfinal2007-121022230733-phpapp01.pdf
sadfinal2007-121022230733-phpapp01.pdf
shoukatali154717
 
V1_I2_2012_Paper3.doc
V1_I2_2012_Paper3.docV1_I2_2012_Paper3.doc
V1_I2_2012_Paper3.doc
praveena06
 
Improvement of Software Maintenance and Reliability using Data Mining Techniques
Improvement of Software Maintenance and Reliability using Data Mining TechniquesImprovement of Software Maintenance and Reliability using Data Mining Techniques
Improvement of Software Maintenance and Reliability using Data Mining Techniques
ijdmtaiir
 
Intro
IntroIntro
Intro
hinaaaa123
 
Hardware Design Practices For Modern Hardware
Hardware Design Practices For Modern HardwareHardware Design Practices For Modern Hardware
Hardware Design Practices For Modern Hardware
Winstina Kennedy
 
Unit 3 3 architectural design
Unit 3 3 architectural designUnit 3 3 architectural design
Unit 3 3 architectural designHiren Selani
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)
Tao Xie
 
Over view of system analysis and design
Over view of system analysis and designOver view of system analysis and design
Over view of system analysis and design
Saroj Dhakal
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Ali Alkan
 
CASE tools and their effects on software quality
CASE tools and their effects on software qualityCASE tools and their effects on software quality
CASE tools and their effects on software quality
Utkarsh Agarwal
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
javed75
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
Miroslaw Staron
 
System engineering
System engineeringSystem engineering
System engineering
Lisa Elisa
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
Massimiliano Di Penta
 
H1803044651
H1803044651H1803044651
H1803044651
IOSR Journals
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)
HPCC Systems
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
IJERD Editor
 

Similar to Improving Software Maintenance using Unsupervised Machine Learning techniques (20)

Introduction
IntroductionIntroduction
Introduction
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
 
Software Development Life Cycle
Software Development Life Cycle Software Development Life Cycle
Software Development Life Cycle
 
sadfinal2007-121022230733-phpapp01.pdf
sadfinal2007-121022230733-phpapp01.pdfsadfinal2007-121022230733-phpapp01.pdf
sadfinal2007-121022230733-phpapp01.pdf
 
V1_I2_2012_Paper3.doc
V1_I2_2012_Paper3.docV1_I2_2012_Paper3.doc
V1_I2_2012_Paper3.doc
 
Improvement of Software Maintenance and Reliability using Data Mining Techniques
Improvement of Software Maintenance and Reliability using Data Mining TechniquesImprovement of Software Maintenance and Reliability using Data Mining Techniques
Improvement of Software Maintenance and Reliability using Data Mining Techniques
 
Intro
IntroIntro
Intro
 
Hardware Design Practices For Modern Hardware
Hardware Design Practices For Modern HardwareHardware Design Practices For Modern Hardware
Hardware Design Practices For Modern Hardware
 
Unit 3 3 architectural design
Unit 3 3 architectural designUnit 3 3 architectural design
Unit 3 3 architectural design
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)
 
Over view of system analysis and design
Over view of system analysis and designOver view of system analysis and design
Over view of system analysis and design
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
 
CASE tools and their effects on software quality
CASE tools and their effects on software qualityCASE tools and their effects on software quality
CASE tools and their effects on software quality
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
System engineering
System engineeringSystem engineering
System engineering
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
H1803044651
H1803044651H1803044651
H1803044651
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 

More from Valerio Maggio

Number Crunching in Python
Number Crunching in PythonNumber Crunching in Python
Number Crunching in Python
Valerio Maggio
 
Clone detection in Python
Clone detection in PythonClone detection in Python
Clone detection in Python
Valerio Maggio
 
A Tree Kernel based approach for clone detection
A Tree Kernel based approach for clone detectionA Tree Kernel based approach for clone detection
A Tree Kernel based approach for clone detection
Valerio Maggio
 
Refactoring: Improve the design of existing code
Refactoring: Improve the design of existing codeRefactoring: Improve the design of existing code
Refactoring: Improve the design of existing code
Valerio Maggio
 
Scaffolding with JMock
Scaffolding with JMockScaffolding with JMock
Scaffolding with JMock
Valerio Maggio
 
Junit in action
Junit in actionJunit in action
Junit in action
Valerio Maggio
 
Design patterns and Refactoring
Design patterns and RefactoringDesign patterns and Refactoring
Design patterns and Refactoring
Valerio Maggio
 
Test Driven Development
Test Driven DevelopmentTest Driven Development
Test Driven Development
Valerio Maggio
 
Web frameworks
Web frameworksWeb frameworks
Web frameworks
Valerio Maggio
 

More from Valerio Maggio (9)

Number Crunching in Python
Number Crunching in PythonNumber Crunching in Python
Number Crunching in Python
 
Clone detection in Python
Clone detection in PythonClone detection in Python
Clone detection in Python
 
A Tree Kernel based approach for clone detection
A Tree Kernel based approach for clone detectionA Tree Kernel based approach for clone detection
A Tree Kernel based approach for clone detection
 
Refactoring: Improve the design of existing code
Refactoring: Improve the design of existing codeRefactoring: Improve the design of existing code
Refactoring: Improve the design of existing code
 
Scaffolding with JMock
Scaffolding with JMockScaffolding with JMock
Scaffolding with JMock
 
Junit in action
Junit in actionJunit in action
Junit in action
 
Design patterns and Refactoring
Design patterns and RefactoringDesign patterns and Refactoring
Design patterns and Refactoring
 
Test Driven Development
Test Driven DevelopmentTest Driven Development
Test Driven Development
 
Web frameworks
Web frameworksWeb frameworks
Web frameworks
 

Recently uploaded

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 

Recently uploaded (20)

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 

Improving Software Maintenance using Unsupervised Machine Learning techniques