This document discusses unsupervised machine learning techniques for clone detection in source code. It begins by defining different types of code clones and describing current state-of-the-art clone detection tools. It then argues that machine learning approaches, such as using kernel methods to compare abstract syntax trees, can provide more computationally efficient and accurate clone detection compared to traditional text-, token-, and syntax-based techniques. The document provides examples of using kernel functions to compute similarities between code structure representations like ASTs to enable unsupervised machine learning for clone detection.
2. General Disclaimer:
All the Maths appearing in the next slides is only intended to better introduce the considered case studies. Speakers are not responsible for any possible disease or “brain consumption” caused by too many formulas.
So BEWARE; use this information at your own risk!
Its intention is solely educational. We would strongly encourage you to use this information in cooperation with a medical or health professional.
AwfulMaths
3. Number one in the stink parade is duplicated code.
If you see the same code structure in more than one
place, you can be sure that your program will be better
if you find a way to unify them.
7. PROBLEM STATEMENT: CLONE DETECTION
Software clones are fragments of code that are similar according to some predefined measure of similarity
I.D. Baxter, 1998
13. THE ORIGINAL ONE
# Original Fragment
def do_something_cool_in_Python(filepath, marker='---end---'):
    lines = list()
    with open(filepath) as report:
        for l in report:
            if l.endswith(marker):
                lines.append(l)  # Stores only lines that end with "marker"
    return lines  # Return the list of different lines
14. TYPE 1: Exact Copy
• Identical code segments except for differences in
layout, whitespace, and comments
15. # Type 1 Clone
def do_something_cool_in_Python (filepath, marker='---end---'):
    lines = list()  # This list is initially empty
    with open(filepath) as report:
        for l in report:  # It goes through the lines of the file
            if l.endswith(marker):
                lines.append(l)
    return lines
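A minimal sketch of how a Type 1 check could be automated, assuming Python's standard `tokenize` module is an acceptable normalizer. The helper names `normalized_tokens` and `is_type1_clone` and the two sample fragments are illustrative and do not come from the slides. Comments and blank lines are dropped, and layout tokens are compared by kind only.

```python
import io
import tokenize

# Layout tokens: their text depends on formatting, so only their kind is kept.
_LAYOUT = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source):
    """Return the token stream of `source` without comments and blank lines."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue  # comments and blank lines are irrelevant for Type 1
        text = "" if tok.type in _LAYOUT else tok.string
        tokens.append((tok.type, text))
    return tokens


def is_type1_clone(source_a, source_b):
    """True when the fragments differ only in layout, whitespace, comments."""
    return normalized_tokens(source_a) == normalized_tokens(source_b)


ORIGINAL = '''
def collect(filepath, marker='---end---'):
    lines = list()
    with open(filepath) as report:
        for l in report:
            if l.endswith(marker):
                lines.append(l)  # keep matching lines
    return lines
'''

COPY = '''
def collect (filepath, marker='---end---'):
  lines = list()  # initially empty

  with open(filepath) as report:
    for l in report:  # walk the file
      if l.endswith(marker):
        lines.append(l)
  return lines
'''

print(is_type1_clone(ORIGINAL, COPY))
```

Renaming any identifier changes a `NAME` token, so such a pair is no longer reported; that case belongs to Type 2.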
16. TYPE 2: Parameter Substituted
• Structurally identical segments except for differences in identifiers, literals,
layout, whitespace, and comments
17. # Type 2 Clone
def do_something_cool_in_Python(path, end='---end---'):
    targets = list()
    with open(path) as data_file:
        for t in data_file:
            if t.endswith(end):
                targets.append(t)  # Stores only lines that end with "end"
    # Return the list of different lines
    return targets
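One possible way to test for Type 2 clones, sketched under the assumption that blanking identifiers and literals in the tree produced by Python's `ast` module is a good enough notion of "structurally identical". The names `normalized_dump` and `is_type2_clone` and the sample fragments are hypothetical.

```python
import ast


def normalized_dump(source):
    """Dump the AST of `source` with identifiers and literals blanked out."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            node.name = "_"
        elif isinstance(node, ast.Constant):
            node.value = "_"
    # ast.dump omits line numbers by default, so layout and comments vanish too.
    return ast.dump(tree)


def is_type2_clone(source_a, source_b):
    """True when the fragments match up to identifiers, literals and layout."""
    return normalized_dump(source_a) == normalized_dump(source_b)


ORIGINAL = '''
def collect(filepath, marker='---end---'):
    lines = list()
    with open(filepath) as report:
        for l in report:
            if l.endswith(marker):
                lines.append(l)
    return lines
'''

RENAMED = '''
def gather(path, end='###'):
    targets = list()
    with open(path) as data_file:
        for t in data_file:
            if t.endswith(end):
                targets.append(t)  # same structure, new names
    return targets
'''

print(is_type2_clone(ORIGINAL, RENAMED))
```

Attribute names such as `endswith` and `append` are deliberately left untouched in this sketch, so replacing a method call with a different one still counts as a structural change.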
18. TYPE 3: Structure Substituted
• Similar segments with further modifications such as changed, added (or deleted) statements, in addition to variations in identifiers, literals, layout and comments
19. # Type 3 Clone
import os
def do_something_with(path, marker='---end---'):
    # Check if the input path corresponds to a file
    if not os.path.isfile(path):
        return None
    bad_ones = list()
    good_ones = list()
    with open(path) as report:
        for line in report:
            line = line.strip()
            if line.endswith(marker):
                good_ones.append(line)
            else:
                bad_ones.append(line)
    # Return the lists of different lines
    return good_ones, bad_ones
23. TYPE 4: “Functional” Copies
• Semantically equivalent segments that perform the same
computation but are implemented by different syntactic variants
24. # Type 4 Clone
def do_always_the_same_stuff(filepath, marker='---end---'):
    report = open(filepath)
    file_lines = report.readlines()
    report.close()
    # Keeps only the lines ending with marker
    # (list() is needed on Python 3, where filter returns an iterator)
    return list(filter(lambda l: len(l) and l.endswith(marker), file_lines))
40. STATE OF THE ART TECHNIQUES
• String/Token based Techniques:
  • Pros: Run very fast
  • Cons: Too many false clones
• Syntax based (AST) Techniques:
  • Pros: Well suited to detect structural similarities
  • Cons: Not properly suited to detect Type 3 Clones
• Graph based Techniques:
  • Pros: The only ones able to deal with Type 4 Clones
  • Cons: Performance issues
46. USE MACHINE LEARNING, LUKE
• Provides computationally effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
• Requires considerable effort in:
  • the definition of the relevant information best suited for the specific task/domain
  • the application of the learning algorithms to the considered data
48. UNSUPERVISED LEARNING
• Supervised Learning:
  • Learn from labelled samples
• Unsupervised Learning:
  • Learn (directly) from the data
Learn by examples
(+) No cost of labeling samples
(-) Trade-off imposed on the quality of the data
50. CODE STRUCTURES: KERNELS FOR STRUCTURES
Abstract Syntax Tree (AST)
Tree structure representing the syntactic structure of the different instructions of a program (function)
Program Dependence Graph (PDG)
(Directed) Graph structure representing the relationships among the different statements of a program
Computation of the dot product between (Graph) Structures: K(G1, G2), with G1 and G2 the two structures being compared
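To make the "kernel as a dot product between structures" idea concrete, here is a deliberately crude sketch. It assumes a feature map that only counts AST node types, which is far simpler than the kernels discussed in the slides. The function names are illustrative, and the trees come from Python's own `ast` module.

```python
import ast
import math
from collections import Counter


def node_type_counts(source):
    """Feature map: how many times each AST node type occurs in `source`."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def kernel(source_a, source_b):
    """K(a, b): dot product of the two node-type count vectors."""
    fa, fb = node_type_counts(source_a), node_type_counts(source_b)
    return sum(count * fb[name] for name, count in fa.items())


def normalized_kernel(source_a, source_b):
    """Cosine-style normalisation, so that K(a, a) is 1."""
    return kernel(source_a, source_b) / math.sqrt(
        kernel(source_a, source_a) * kernel(source_b, source_b)
    )


LOOP_A = "while x < y:\n    x = x + 1\n    y = y - 1\n"
LOOP_B = "while b > a:\n    a = a + 1\n    b = b - 1\n"
OTHER = "print('hello')\n"

print(normalized_kernel(LOOP_A, LOOP_B), normalized_kernel(LOOP_A, OTHER))
```

Because the function is an explicit dot product in a feature space, it is symmetric and positive semi-definite, which is what lets kernel-based learning algorithms consume it directly.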
52. KERNEL FOR CLONES
[Figure: panels "CODE" and "AST". Two code fragments shown with their abstract syntax trees: a while loop on x < y whose block contains x = x + 1 and y = y - 1, and a while loop on b > a whose block contains a = a + 1 and b = b - 1, together with an if on b > 0 that assigns c = 3.]
53. KERNEL FOR CLONES
[Figure: panels "CODE", "AST" and "AST KERNEL". The same two ASTs, followed by the subtrees extracted from them (for example x < y, x + 1, y - 1, b > a, a + 1, b - 1, b > 0, c = 3) that the AST kernel compares.]
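The subtree comparison suggested by the figure can be approximated as follows. This is a naive sketch that counts pairs of identical subtrees after blanking variable names; it is not the tree kernel used by the authors. The two fragments are Python transliterations of the loops in the figure, and all function names are hypothetical.

```python
import ast
from collections import Counter


def subtree_counts(source):
    """Multiset of serialised statement and expression subtrees, names blanked."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"  # x + 1 and a + 1 should count as the same subtree
    return Counter(
        ast.dump(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.stmt, ast.expr))
    )


def subtree_kernel(source_a, source_b):
    """Count the pairs of identical subtrees shared by the two fragments."""
    ca, cb = subtree_counts(source_a), subtree_counts(source_b)
    return sum(count * cb[dump] for dump, count in ca.items())


FIRST = "while x < y:\n    x = x + 1\n    y = y - 1\n"
SECOND = (
    "while b > a:\n    a = a + 1\n    b = b - 1\n"
    "if b > 0:\n    c = 3\n"
)

print(subtree_kernel(FIRST, SECOND))
```

Exact matching of whole subtrees is strict: the two `while` statements above do not match each other as wholes because their conditions use different operators, yet their assignments and sub-expressions still contribute to the score.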
59. KERNELS FOR CODE STRUCTURES: AST
KERNEL FEATURES
• Instruction Class (IC): e.g., LOOP, CALL, CONDITIONAL_STATEMENT
• Instruction (I): e.g., FOR, IF, WHILE, RETURN
• Context (C): the Instruction Class of the closest statement node
• Lexemes (Ls): lexical information gathered (recursively) from the leaves
Example (AST of a while loop with condition x < y and a block):
• Node "while": IC = Loop, I = while-loop, C = Function-Body, Ls = [x, y]
• Node "<": IC = Conditional-Expr, I = Less-operator, C = Loop, Ls = [x, y]
• Node "block": IC = Block, I = while-body, C = Loop, Ls = [x]
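The four features can be extracted from a Python AST roughly as sketched below. The `INSTRUCTION_CLASS` table is a hypothetical mapping chosen for illustration, and the Python node type name stands in for the instruction (I). Python's `ast` module has no explicit block node, so the "block" row of the slide has no counterpart here.

```python
import ast

# Hypothetical mapping from Python AST node types to instruction classes.
INSTRUCTION_CLASS = {
    ast.While: "LOOP",
    ast.For: "LOOP",
    ast.If: "CONDITIONAL_STATEMENT",
    ast.Call: "CALL",
    ast.Compare: "CONDITIONAL_EXPR",
    ast.Assign: "ASSIGNMENT",
    ast.Return: "RETURN",
    ast.FunctionDef: "FUNCTION_BODY",
}


def lexemes(node):
    """Identifiers and literals gathered recursively from the leaves of `node`."""
    found = []
    for child in ast.walk(node):
        if isinstance(child, ast.Name):
            found.append(child.id)
        elif isinstance(child, ast.Constant):
            found.append(repr(child.value))
    return found


def node_features(source):
    """Return one (IC, I, C, Ls) tuple for every node with a known class."""
    features = []

    def visit(node, context):
        ic = INSTRUCTION_CLASS.get(type(node))
        if ic is not None:
            features.append((ic, type(node).__name__, context, lexemes(node)))
        # A classified statement becomes the context of everything nested in it.
        child_context = ic if isinstance(node, ast.stmt) and ic else context
        for child in ast.iter_child_nodes(node):
            visit(child, child_context)

    visit(ast.parse(source), "MODULE")
    return features


for feature in node_features("while x < y:\n    x = x + 1\n"):
    print(feature)
```

A kernel over such tuples can then compare nodes feature by feature instead of requiring identical subtrees, which is one way to tolerate the statement-level edits typical of Type 3 clones.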
60. CLONE DETECTION RESULTS
• Comparison with another (pure) AST-based clone detector
• Comparison on a system with randomly seeded clones
[Figure: bar chart of Precision, Recall and F-measure (scale 0 to 1) for CloneDigger and the Tree Kernel Tool.]
Results refer to clones where code fragments have been modified by adding, removing or changing code statements
61. Precision, Recall and F-Measure
[Figure: Precision, Recall and F1 curves (vertical axis 0 to 1) plotted over horizontal-axis values from 0.6 to 0.98 in steps of 0.02.]
Precision: How accurate are the obtained results?
(Altern.) How many errors do they contain?
Recall: How complete are the obtained results?
(Altern.) How many clones have been retrieved w.r.t. Total Clones?
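As a worked example of the three measures, the sketch below computes them over sets of clone pairs using the standard definitions. The clone pairs are made-up identifiers, not results from the slides.

```python
def precision_recall_f1(reported, actual):
    """Standard precision, recall and F1 over two collections of clone pairs."""
    reported, actual = set(reported), set(actual)
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: four real clone pairs, three pairs reported by a tool.
actual = {("f1", "f2"), ("f3", "f4"), ("f5", "f6"), ("f7", "f8")}
reported = {("f1", "f2"), ("f3", "f4"), ("f9", "f10")}

precision, recall, f1 = precision_recall_f1(reported, actual)
print(precision, recall, f1)
```

Here two of the three reported pairs are real (precision 2/3) and two of the four real pairs are found (recall 1/2), giving an F1 of 4/7.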
64. CODE STRUCTURES: PDG
NODES AND EDGES
• Two Types of Nodes
  • Control Nodes (Dashed ones)
    • e.g., if - for - while - function calls...
  • Data Nodes
    • e.g., expressions - parameters...
• Two Types of Edges (i.e., dependencies)
  • Control edges (Dashed ones)
  • Data edges
[Figure: example PDG with the nodes while, call-site, arg and expr.]
66. KERNELS FOR CODE STRUCTURES: PDG
GRAPH KERNELS FOR PDG
• Features of nodes:
  • Node Label
    • e.g., WHILE, CALL-SITE, EXPR, ...
  • Node Type
    • i.e., Data Node or Control Node
• Features of edges:
  • Edge Type
    • i.e., Data Edge or Control Edge
Example: the node "while" has Node Label = WHILE and Node Type = Control Node
[Figure: example PDG with the nodes while, call-site, arg, expr and expr, connected by Control Edges and Data Edges.]
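A small data-structure sketch of a PDG carrying exactly these node and edge features. The class names (`Kind`, `Node`, `PDG`) and the wiring of the example graph are assumptions made for illustration; the slides do not specify which edges connect the nodes in the figure.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    """Used both for node types and for edge types."""
    CONTROL = "control"
    DATA = "data"


@dataclass(frozen=True)
class Node:
    name: str   # unique identifier within one graph
    label: str  # e.g. WHILE, CALL-SITE, EXPR
    kind: Kind  # Control Node or Data Node


@dataclass
class PDG:
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: list = field(default_factory=list)  # (source, target, Kind)

    def add_node(self, name, label, kind):
        self.nodes[name] = Node(name, label, kind)

    def add_edge(self, source, target, kind):
        self.edges.append((source, target, kind))

    def successors(self, name):
        """Targets reachable in one step, with the type of the edge used."""
        return [(t, k) for s, t, k in self.edges if s == name]


# Hypothetical wiring of the nodes named in the figure.
pdg = PDG()
pdg.add_node("w", "WHILE", Kind.CONTROL)
pdg.add_node("c", "CALL-SITE", Kind.CONTROL)
pdg.add_node("a", "ARG", Kind.DATA)
pdg.add_node("e", "EXPR", Kind.DATA)
pdg.add_edge("w", "c", Kind.CONTROL)
pdg.add_edge("c", "a", Kind.DATA)
pdg.add_edge("a", "e", Kind.DATA)

print(pdg.nodes["w"], pdg.successors("w"))
```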
67. GRAPH KERNELS FOR PDG
[Figure: two PDGs compared side by side. The first contains the nodes while, call-site, arg, expr and expr; the second contains while, call-site, arg, expr and call-site.]
• Goal: Identify common subgraphs
• Selectors: Compare nodes with each other and explore the subgraphs of only “compatible” nodes (i.e., nodes of the same type)
• Context: The subgraph of a node (with paths whose lengths are at most L, to avoid loops)
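The three ingredients above (goal, selectors, bounded context) can be sketched as a simple path-counting kernel. This is an illustrative simplification, not the authors' kernel: each graph is a pair of plain dictionaries, compatibility is taken to mean equal label and node type, and the example wiring is hypothetical.

```python
def compatible(node_a, node_b):
    """Selector: nodes are compatible when label and node type both match."""
    return node_a == node_b


def common_paths(graph_a, start_a, graph_b, start_b, max_len):
    """Count matching paths of at most `max_len` edges from two start nodes."""
    nodes_a, edges_a = graph_a
    nodes_b, edges_b = graph_b
    if not compatible(nodes_a[start_a], nodes_b[start_b]):
        return 0
    total = 1  # the compatible pair itself
    if max_len == 0:
        return total  # the bound L guarantees termination on cyclic graphs
    for next_a, kind_a in edges_a.get(start_a, []):
        for next_b, kind_b in edges_b.get(start_b, []):
            if kind_a == kind_b:  # follow only edges of the same type
                total += common_paths(graph_a, next_a, graph_b, next_b, max_len - 1)
    return total


def graph_kernel(graph_a, graph_b, max_len=2):
    """Sum the common bounded contexts over every pair of nodes."""
    return sum(
        common_paths(graph_a, a, graph_b, b, max_len)
        for a in graph_a[0]
        for b in graph_b[0]
    )


# Each graph: ({name: (label, node type)}, {name: [(target, edge type)]}).
G1 = (
    {"w": ("WHILE", "control"), "c": ("CALL-SITE", "control"),
     "a": ("ARG", "data"), "e1": ("EXPR", "data"), "e2": ("EXPR", "data")},
    {"w": [("c", "control"), ("e1", "control")],
     "c": [("a", "data")], "a": [("e2", "data")]},
)
G2 = (
    {"w": ("WHILE", "control"), "c": ("CALL-SITE", "control"),
     "a": ("ARG", "data"), "e": ("EXPR", "data"), "c2": ("CALL-SITE", "control")},
    {"w": [("c", "control"), ("c2", "control")],
     "c": [("a", "data")], "a": [("e", "data")]},
)

print(graph_kernel(G1, G1), graph_kernel(G1, G2))
```

Skipping incompatible node pairs early is what keeps the comparison cheap: whole regions of the two graphs are never explored once their roots differ in label or type.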
74. PROBLEM STATEMENT: (MODEL) CLONE DETECTION
Models: models are typically represented visually, as box-and-arrow diagrams, and the clones we are searching for are similar subgraphs of these diagrams.
Model Granularity: models could be represented at different levels of granularity (such as the source code) corresponding to different syntactic (and semantic) units.
Model clones are categorized into (three) different types
76. TYPE 1 CLONES: (MODEL) CLONE DETECTION
• Type 1 (exact) model clones: Identical model fragments except for variations in visual presentation, layout and formatting.
77. TYPE 2 CLONES: (MODEL) CLONE DETECTION
Type 2 (renamed) model clones: Structurally identical model fragments except for variations in labels, values, types, visual presentation, layout and formatting.
[Figure: model@Friction Mode Logic/Break Apart Detection and model@Friction Mode Logic/Lockup Detection/Required Friction for Lockup]
78. TYPE 3 CLONES: (MODEL) CLONE DETECTION
Type 3 (near-miss) model clones: Model fragments with further modifications, such as changes in position or connection with respect to other model fragments and small additions or removals of blocks or lines in addition to variations in labels, values, types, visual presentation, layout and formatting.
[Figure: model@Speed.speed_estimation and model@Throttle.throttle_estimation]