This presentation was given at the Working Conference on Reverse Engineering (WCRE 2012).
The paper title is: "Detecting Clones across Microsoft .NET Programming Languages"
Abstract:
The Microsoft .NET framework and its language family focus on multi-language development to support interoperability across several programming languages. The framework allows for the development of similar applications in different languages through the reuse of core libraries. As a result of such a multi-language development, the identification and traceability of similar code fragments (clones) becomes a key challenge. In this paper, we present a clone detection approach for the .NET language family. The approach is based on the Common Intermediate Language, which is generated by the .NET compiler for the different languages within the .NET framework. In order to achieve an acceptable recall while maintaining the precision of our detection approach, we define a set of filtering processes to reduce noise in the raw data. We show that these filters are essential for Intermediate Language-based clone detection, without significantly affecting the precision of the detection approach. Finally, we study the quantitative and qualitative performance aspects of our clone detection approach. We evaluate the number of reported candidate clone-pairs, as well as the precision and recall (using manual validation) for several open source cross-language systems, to show the effectiveness of our proposed approach.
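As a rough illustration of the idea in the abstract (and not the authors' implementation), the sketch below compares two methods by their Common Intermediate Language opcode sequences after a simple noise filter. The `NOISE` set, the opcode-family normalization, the 0.8 threshold, the function names, and the hand-written IL listings are all assumptions made for this example; the abstract does not say which filters or similarity measure the paper uses.

```python
from difflib import SequenceMatcher

# Hypothetical noise filter: opcodes assumed to carry little clone-relevant
# information. The paper's actual filters are not given in the abstract.
NOISE = {"nop"}


def normalize(il_lines):
    """Reduce IL instructions to opcode families and drop noise opcodes."""
    opcodes = []
    for line in il_lines:
        line = line.strip()
        # Drop an optional "IL_0001:" style label.
        if line.startswith("IL_") and ":" in line:
            line = line.split(":", 1)[1].strip()
        parts = line.split()
        if not parts:
            continue
        # Assumed normalization: "ldarg.0" -> "ldarg", "add.ovf" -> "add".
        family = parts[0].split(".")[0]
        if family not in NOISE:
            opcodes.append(family)
    return opcodes


def similarity(il_a, il_b):
    """Similarity ratio in [0, 1] between two filtered opcode sequences."""
    return SequenceMatcher(None, normalize(il_a), normalize(il_b)).ratio()


def is_clone_candidate(il_a, il_b, threshold=0.8):
    """Report a candidate clone pair when the ratio reaches the threshold."""
    return similarity(il_a, il_b) >= threshold


# Hand-written IL for an "add two integers" method, roughly as compilers for
# two .NET languages might emit it (illustrative, not real compiler output).
csharp_il = ["IL_0000: nop", "IL_0001: ldarg.0", "IL_0002: ldarg.1",
             "IL_0003: add", "IL_0004: ret"]
vb_il = ["IL_0000: ldarg.0", "IL_0001: ldarg.1", "IL_0002: add.ovf",
         "IL_0003: ret"]

print(is_clone_candidate(csharp_il, vb_il))
```

Without the filter and the family normalization, the `nop` and `add.ovf` differences would lower the ratio for what is the same method in two source languages, which is the kind of noise the abstract refers to.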
Sentence-to-Code Traceability Recovery with Domain Ontologies, by Shinpei Hayashi
The document describes a technique for recovering traceability between natural language sentences and source code using domain ontologies. An automated tool was implemented and evaluated on a case study using the JDraw software. Results showed the technique worked well, recovering traceability between 7 sentences and code with higher accuracy than without using the ontology. The ontology helped improve recall and detect traceability in cases where word similarity alone did not work well. Future work is needed to evaluate on larger cases and domains.
Learning from other's mistakes: Data-driven code analysis, by Andreas Dewes
Static code analysis is a useful tool that can help to detect bugs early in the software development life cycle. I will explain the basics of static analysis and show the challenges we face when analyzing Python code. I will introduce a data-driven approach to code analysis that makes use of public code and example-based learning and show how it can be applied to analyzing Python code.
The document outlines the syllabus for a course on digital system design taught by Dr. R. Prakash Rao. The syllabus covers topics like Boolean algebra, logic gates, combinational logic systems, sequential logic systems, and binary codes. It also discusses digital representation of signals, error detection and correction codes, and Boolean algebra manipulation.
Code is not text! How graph technologies can help us to understand our code b..., by Andreas Dewes
Today, we almost exclusively think of code in software projects as a collection of text files. The tools that we use (version control systems, IDEs, code analyzers) also use text as the primary storage format for code. In fact, the belief that “code is text” is so deeply ingrained in our heads that we never question its validity or even become aware of the fact that there are other ways to look at code.
In my talk I will explain why treating code as text is a very bad idea which actively holds back our understanding and creates a range of problems in large software projects. I will then show how we can overcome (some of) these problems by treating and storing code as data, and more specifically as a graph. I will show specific examples of how we can use this approach to improve our understanding of large code bases, increase code quality and automate certain aspects of software development.
Finally, I will outline my personal vision of the future of programming, which is a future where we no longer primarily interact with code bases using simple text editors. I will also give some ideas on how we might get to that future.
The document compares and contrasts the Java and C# programming languages. It summarizes that Java is not fully object-oriented as it uses primitive types, while C# makes all types objects. It also discusses various language features introduced over time, showing that C# often introduced useful features earlier than Java, such as generics and LINQ. The document provides code examples to demonstrate how tasks can be expressed more declaratively and concisely in C# compared to Java.
This document provides an introduction to the C# programming language. It outlines prerequisites, learning objectives, and an agenda. The agenda includes an overview of C# design goals like component orientation and everything being an object. It also covers C# fundamentals like types, program structure, statements, operators, and using Visual Studio.NET and the .NET framework. Key points are made about C# having a unified type system where all data is an object and value and reference types can be treated polymorphically using boxing and unboxing.
C# is a component-oriented language that introduces object-oriented improvements to the C/C++ family of languages. Everything in C# is an object, providing a unified type system without performance costs. C# aims to produce robust, durable software using techniques like garbage collection and exceptions, while preserving investments in existing C/C++ code through interoperability. The document provides an overview of key C# concepts like classes, interfaces, attributes, and events to illustrate how C# supports component-based development.
In this chapter we will understand how to define custom classes and their elements. We will learn to declare fields, constructors and properties for the classes. We will revise what a method is and we will broaden our knowledge about access modifiers and methods.
The document discusses the C programming language. It notes that C is commonly used for embedded systems as it produces efficient code and is portable across processors. The history of C is described, including its origins in the late 1960s/early 1970s at Bell Labs as a system programming language derived from B and BCPL. Examples of Hello World and Euclid's algorithm in C are provided to illustrate basic C syntax and constructs. Key aspects of C like types, expressions, functions, and memory management are summarized.
03 and 04. Operators, Expressions, working with the console and conditional s..., by Intro C# Book
The document discusses Java syntax and concepts including:
1. It introduces primitive data types in Java like int, float, boolean and String.
2. It covers variables, operators, and expressions - how they are used to store and manipulate data in Java.
3. It explains console input and output using Scanner and System.out methods for reading user input and printing output.
4. It provides examples of using conditional statements like if and if-else to control program flow based on conditions.
The document discusses primitive data types in C#, including integer, floating-point, boolean, character, and string types. It defines each type, provides examples of declaring variables of each type and assigning values, and describes literals that can be used to represent values of different types. Key points covered include the name, size, and default value of each primitive type as well as demonstrations of declaring and initializing variables in C#.
The document summarizes the International Data Encryption Algorithm (IDEA) and RSA encryption algorithms.
IDEA is a symmetric-key block cipher that operates on 64-bit blocks using a 128-bit key. It consists of 8 identical rounds plus an output transformation. Each round uses operations like XOR, expansion, and multiplication modulo 2^16+1. IDEA was intended as a replacement for DES and was used in PGP v2.0. RSA is an asymmetric algorithm that uses a public/private key pair based on the difficulty of factoring large numbers that are products of two prime numbers. It involves key generation using prime numbers, computing the modulus and keys, and can encrypt/decrypt messages using those keys. Both
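To make two of the operations mentioned above concrete, here is a minimal sketch of IDEA-style multiplication modulo 2^16+1 and of toy RSA key generation, encryption, and decryption. The function names, the tiny primes, and the exponent 17 are choices made for this example; real RSA keys use primes that are hundreds of digits long together with a padding scheme, and nothing here is taken from the summarized document.

```python
from math import gcd


def idea_mul(a, b):
    """Multiply two 16-bit words modulo 2**16 + 1, as IDEA does.

    The all-zero word stands for 2**16, so every word is invertible.
    """
    a = a or 0x10000
    b = b or 0x10000
    return ((a * b) % 0x10001) & 0xFFFF  # a result of 2**16 is stored as 0


def rsa_keypair(p, q, e=17):
    """Build a toy RSA key pair from two small primes p and q."""
    n = p * q                      # modulus, part of both keys
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1) * (q - 1)")
    d = pow(e, -1, phi)            # modular inverse, needs Python 3.8+
    return (e, n), (d, n)


public, private = rsa_keypair(61, 53)
message = 65
ciphertext = pow(message, *public)     # c = m^e mod n
recovered = pow(ciphertext, *private)  # m = c^d mod n
print(ciphertext, recovered)
```

The message must be smaller than the modulus `n`; with these toy primes that means smaller than 3233.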
Algorithm and Programming (Introduction of dev pascal, data type, value, and ..., by Adam Mukharil Bachtiar
This file introduces Dev Pascal, data types, values, and identifiers. It was used in my Algorithm and Programming class.
This document provides an introduction to Python programming for an artificial intelligence lab course. It covers downloading and installing Python and the Anaconda distribution, using Spyder as an IDE, variables and data types in Python, and basic concepts like indentation, variable naming conventions, and Python keywords. The goal is to prepare students to use Python for labs related to artificial intelligence topics.
C# is a component-oriented language that introduces object-oriented improvements to the C/C++ family of languages. Key features include garbage collection, exceptions, type safety, and preservation of C++ investments like namespaces and enums. Everything in C# is an object, unifying value and reference types without performance penalties. The language supports robust features like properties, events, generics and attributes to enable component-based development.
Esoft Metro Campus - Diploma in Information Technology - (Module IX) Programming with C#.NET
(Template - Virtusa Corporate)
Contents:
Introduction to .NET Framework
.NET Framework Platform Architecture
Microsoft Visual Studio
C# Language
C#, VS and .NET Framework Versions
Your First C# Application
Printing Statements
Comments in C#
Common Type System
Value Types and Reference Type
Variables Declaration in C#
Type Conversion
Arithmetic Operators
Assignment Operators
Comparison Operators
Logical Operators
If Statement
If… Else Statement
If… Else if… Else Statement
Nested If Statement
Switch Statement
While Loop
Do While Loop
For Loop
Arrays
Accessing Arrays using foreach Loop
Two Dimensional Arrays
Classes and Objects in C#
Inheritance in C#
Partial Classes
Namespaces
Windows Forms Applications
Using Buttons, Labels and Text Boxes
Displaying Message Boxes
Error Handling with Try… Catch… finally…
Using Radio Buttons
Using Check Boxes
Using List Boxes
Creating Menus
Creating ToolStrips
MDI Forms
Database Application in C#
Creating a Simple Database Application
SQL Insert / Update / Retrieving / Delete
SQL Command Execute Methods
Data Sets
The document introduces C# as the first component-oriented language in the C/C++ family. It discusses key features of C# including everything being an object, robust and durable software through garbage collection and exceptions, and preservation of investment from C++. It provides examples of basic C# concepts like classes, structs, interfaces, enums, delegates, properties and events.
The document introduces C# and discusses its key features. It describes C# as the first component-oriented language in the C/C++ family, where everything is an object and it enables robust and durable software through features like garbage collection and exceptions. It also discusses how C# preserves investments in existing software and languages through interoperability. The document provides overviews of major C# concepts like its type system, classes, interfaces, attributes and statements.
This document compares key features of the C# and Java programming languages, including differences in their type systems, generics, keywords, exceptions handling, and specific features like anonymous classes, properties, delegates, and LINQ. It outlines common conventions and pitfalls between the two languages and provides code examples to illustrate differences in generics, constraints, exceptions, and language features like using blocks and lambda expressions.
This document provides a summary of a presentation on object-oriented programming (OOP) and clean code given at IPB Computer Science on March 28, 2017. It introduces the speaker, Ifnu Bima, and his background working at Deutsche Bank and blibli.com. The presentation covers topics like code quality metrics, meaningful naming conventions, high-quality functions, comments, and unit testing. It emphasizes writing code that is easy to maintain and modify over time to prevent issues like bugs and technical debt.
Java ppt Gandhi Ravi (gandhiri@gmail.com), by Gandhi Ravi
This document provides an introduction to the Java programming language. It discusses that Java is an object-oriented, platform-independent language that uses a runtime environment and API. It explains how Java code is compiled and executed. It also covers Java applications, data types, operators, keywords, variables, constructors, inheritance, polymorphism, arrays and other core Java concepts. The document provides examples to illustrate different Java programming concepts.
Learn the fundamentals of Java that matter for company interviews. Topics include the JRE, the JDK, Java keywords, primitive data types, types of variables, how the logical, shift and bitwise operators work, command-line arguments, handling arrays, array copy, and various programs and output-based programs.
The document summarizes the evolution and future directions of the C# programming language. It discusses new features in recent versions such as generics in C# 2.0, language integrated query in C# 3.0, and dynamic programming in C# 4.0. It also covers trends toward declarative programming, concurrency, and compiler as a service. The presentation provides examples and demos of new C# 4.0 features like dynamic typing, optional and named parameters, and covariance and contravariance.
The document outlines a proposal for a tool called CnP to detect and prevent errors from copy-and-pasted code during software development. It describes how copy-pasted code can lead to inconsistencies when modified. It then details a proof-of-concept tool called CReN that tracks copy-pasted code, automatically renames identifiers consistently across clones when modified, and demonstrates how it could catch errors in examples from literature. It proposes evaluating CReN and exploring using it to detect other types of inconsistencies from copy-pasted code.
Renaming parts of identifiers inconsistently within code clones can introduce errors. The CReN and LexId tools help address this issue by tracking code clones and consistently renaming all instances of an identifier when one instance is edited. A user study found LexId helped programmers rename identifiers more quickly and consistently compared to performing renames manually without tool support.
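The consistent-renaming idea can be sketched in a few lines. This is only an illustration under simple assumptions (a clone is a known range of lines, and an identifier is matched as a whole word with a regular expression); it is not how CReN or LexId are implemented, and `rename_in_clone` is a hypothetical helper name.

```python
import re


def rename_in_clone(lines, start, end, old, new):
    """Rename every whole-word occurrence of `old` in lines[start:end].

    Lines outside the tracked clone region are returned unchanged.
    """
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    return [
        pattern.sub(lambda match: new, line) if start <= index < end else line
        for index, line in enumerate(lines)
    ]


code = [
    "total = 0",
    "for item in items:",
    "    total += item.price",
    "subtotal = total",
]

# Treat the first three lines as the tracked clone; line 3 is outside it.
for line in rename_in_clone(code, 0, 3, "total", "price_sum"):
    print(line)
```

Matching on word boundaries keeps `subtotal` intact, and the untouched last line shows why the extent of the clone region matters: a rename that stops at the region boundary can itself leave an inconsistency behind.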
This document discusses automatically detecting package clones and inferring security vulnerabilities. It proposes using statistical classification techniques to identify cloned code between software packages. Features like common filenames, hashes, and fuzzy content would be used for classification. Packages found to share code could then be checked against known vulnerabilities to see if any vulnerabilities may affect the cloned code. The approach aims to scale the analysis to thousands of packages and help identify vulnerabilities in packages with cloned code that may not otherwise be tracked.
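Two of the features named above, common filenames and common content hashes, can be computed as follows. This is a minimal sketch: packages are modeled as hypothetical in-memory dictionaries mapping file names to bytes, Jaccard overlap is an assumed choice of measure, and the fuzzy-content feature and the statistical classifier itself are left out.

```python
import hashlib


def file_features(package):
    """Return the set of file names and the set of SHA-256 content hashes."""
    names = set(package)
    hashes = {hashlib.sha256(data).hexdigest() for data in package.values()}
    return names, hashes


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def clone_features(pkg_a, pkg_b):
    """Feature vector that a classifier could consume for one package pair."""
    names_a, hashes_a = file_features(pkg_a)
    names_b, hashes_b = file_features(pkg_b)
    return {
        "filename_overlap": jaccard(names_a, names_b),
        "hash_overlap": jaccard(hashes_a, hashes_b),
    }


# Made-up packages: `app` embeds a partly patched copy of `lib`.
lib = {"inflate.c": b"A", "deflate.c": b"B"}
app = {"inflate.c": b"A", "deflate.c": b"B-patched", "main.c": b"C"}

print(clone_features(lib, app))
```

Exact hashes miss the patched file, which is why a fuzzy content feature is listed alongside them.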
Cloning means the use of copy-paste as a method in developing software artefacts. This practice has several problems, such as an unnecessary increase in the size of these artefacts, and thereby increased comprehension and change efforts, as well as potential inconsistencies. The automatic detection of clones has been a topic for research for several years now and we have made huge progress in terms of precision and recall. This led to a series of empirical analyses we have performed on the effects and the amount of cloning in code, models and requirements. We continue to investigate the effects of cloning and work on extending clone detection to functionally similar code. This talk will give insights into how clone detection works and the empirical results we have gathered.
"Clone detection in Python": Slides presented at EuroPython 2012
Clone Detection in Python covers code duplication detection using Machine Learning techniques.
Some examples of duplications in Python code and in the CPython implementation are reported as well.
C# and the Evolution of a Programming Language, by Jacinto Limjap
This presentation gives an overview of the .NET framework, a little history of C#, and the evolution of C# from its early days up to its current form, including a preview of C# 7.0.
How To Code in C# The Complete Course. From data types to object orientation. Includes code samples and exercises.
Topics
Getting Started with C#
C# Language Fundamentals
Branching
Operators
Object-Orientated Programming
Classes and Objects
Inside Methods
Debugging
Inheritance and Polymorphism
Operator Overloading
Structs
Interfaces
Arrays
Collection Interfaces and Types
Strings
Throwing and Catching Exceptions
Delegates and Events
Generics
New Language Features
Presented on 27th September 2017 to a joint meeting of 'Cork Functional Programmers' and the 'Cork Java Users Group'
Based on the Kotlin Language programming course from Instil. For more details see https://instil.co/courses/kotlin-development/
The document discusses compilers and their role in translating high-level programming languages into machine-readable code. It provides examples of Fortran code being compiled into Java byte code. A compiler consists of a front end that analyzes syntax and semantics and a back end that generates machine code. The front end performs lexical analysis, parsing, and generates an abstract syntax tree and symbol table. The back end then takes this intermediate representation and converts it into executable machine instructions.
The document provides an overview of the C# programming language, covering topics such as data types, operators, expressions, statements, console I/O, loops, arrays, and methods. It describes the various primitive data types in C#, including integer, floating-point, fixed-point, boolean, character, string, and object types. It also discusses variables and identifiers in C#, explaining how to declare variables and the syntax rules for identifiers.
The document provides an introduction and overview of the C# programming language. It covers topics such as types, expressions, declarations, classes, structs, namespaces, assemblies, attributes, threads, and XML comments. It compares C# to languages like Java and C++, and outlines new features in C# like reference and output parameters, objects on the stack, rectangular arrays, and generics. It also provides a basic "Hello World" example and discusses how C# programs are typically structured across multiple files.
The document provides an overview of the Kotlin programming language, including what it is, who created it, when it was created, where it can be used, and why it is useful. Specifically:
- Kotlin is a programming language created by JetBrains as an alternative to Java that compiles to JVM bytecode.
- It was created in 2010 and became open source in 2012, with stable releases beginning in 2016 and support on Android announced by Google in 2017.
- Kotlin can run on the JVM for Android and server-side applications as well as JavaScript and native platforms, though cross-platform code cannot use Java libraries.
- Advantages include being modern, concise,
This document provides an overview and introduction to debugging iOS applications using Xcode. It discusses the Xcode debugging environments, setting exception and symbolic breakpoints, editing and managing breakpoints, breakpoint actions including using AppleScript, and expressions. The document is intended to help new developers quickly learn the basics of debugging iOS applications in Xcode. It is part of a three part series on iOS debugging.
Scala Intro training @ Lohika, Odessa, UA.
This is a basic Scala Programming Language overview intended to evangelize the language among any-language programmers.
Chapter i c#(console application and programming)Chhom Karath
A console application is a C# application that runs in a console/command line interface rather than a graphical user interface. It consists of C# code that is compiled into an assembly and then into native code using a just-in-time compiler. The native code executes in the context of the Common Language Runtime. Console applications allow simple text-based input and output and are useful for tasks like data processing or automation.
The document appears to be notes from a .NET conference covering various topics related to C# and .NET concepts like inheritance, equality comparisons, enums, interfaces, partial classes, and type constructors. It includes code examples and explanations of the output for each example. Multiple choice questions are provided with the correct answers explained briefly.
The document discusses Ron Munitz's background and expertise, which includes distributed fault tolerant systems, highly distributed video routers, real-time embedded systems, Android, and enterprise mobility and security. It provides an agenda for a talk on the Java Native Interface (JNI) and native Android apps. The agenda includes an introduction to JNI theory and a "Hello World" tutorial, followed by a discussion of JNI in the Android Open Source Project (AOSP) and writing native Android apps.
The document provides an overview of MSIL (Microsoft Intermediate Language):
- MSIL is a CPU-independent bytecode that is generated by .NET compilers instead of native code. It targets the CLR for execution.
- The article explains MSIL's stack-based approach, data types, instruction types, and how instructions are executed. It also demonstrates simple MSIL code examples from C# code.
- The Ildasm tool can be used to examine the MSIL code generated from C# programs, and help debug issues by viewing the low-level operations.
DieHard: Probabilistic Memory Safety for Unsafe LanguagesEmery Berger
DieHard uses randomization and replication to transparently make C and C++ programs tolerate a wide range of errors, including buffer overflows and dangling pointers. Instead of crashing or running amok, DieHard lets programs continue to run correctly in the face of memory errors with high probability. Using DieHard also makes programs highly resistant to heap-based hacker attacks. Downloadable at www.diehard-software.org.
This document discusses how Pragmatic Smalltalk aims to allow Smalltalk code to play well with other languages by compiling to native code compatible with Objective-C. It describes how Smalltalk code can directly call C functions and interoperate with Objective-C code and libraries. Key aspects covered include compiling Smalltalk blocks and memory management to be compatible with Objective-C, and allowing Smalltalk code to be used from the terminal and shell scripts.
This document discusses IronSmalltalk, which aims to provide a Smalltalk environment that runs on the Microsoft .NET DLR framework. It covers the history and motivation behind IronSmalltalk, provides an overview of key .NET DLR concepts like message sends, call sites, and call site binders. It also discusses how IronSmalltalk implements expression trees, the code pipeline from Smalltalk to MSIL, and techniques like polymorphic inline caching. The presentation concludes by discussing the project's goals and some future directions.
Kotlin is a language developed by JetBrains that compiles to JVM bytecode and JavaScript. It is statically typed, supports functional and object-oriented programming, and is fully interoperable with Java. The document discusses Kotlin's advantages over Java for Android development, including null safety, named arguments, and extension functions. It also covers Kotlin libraries and tools that improve Android development, such as the Kotlin standard library, Kotlin extensions for Android, Anko, and Dagger 2 integration. The author shares their experience of migrating an Android project to Kotlin in a incremental, test-driven manner.
The document discusses the need for a new mainstream programming language from the perspective of game developers. It outlines the typical processes, challenges, and types of code involved in game development. Key points are that current languages fail at concurrency, reliability, and performance when developing modern, complex games. A new language is needed that enables safe concurrency, eliminates common bugs through strong typing, and supports parallelism across CPUs and GPUs.
The Next Mainstream Programming Language: A Game Developer’s Perspectiveguest4fd7a2
Tim Sweeney\'s talk at the Symposium on Principles of Programming Languages 2006. Tim is the founder of Epic Games and the lead architect of the Unreal series of engines
Similar to Detecting Clones across Microsoft .NET Programming Languages (WCRE2012) (20)
Detecting Clones across Microsoft .NET Programming Languages (WCRE2012)
1. This is not the original version given in the WCRE 2012 conference (no animation etc.)
Detecting Clones across
Microsoft .NET Programming Languages
Farouq Al-omari
Iman Keivanloo
Chanchal K. Roy
Juergen Rilling
Contact:
keivanloo@ieee.org
Working Conference on Reverse Engineering, Canada, Kingston 18 October 2012 – I. Keivanloo
3. Clone Detection across Languages
General Solution
• C#
• VB.NET
• J#
• F#
• COBOL (.NET)
• Java
Intermediate Language (IL) (low level)
The solution is to use this (instead of dealing with several languages)
4. Clone Detection across Languages using IL
Is there any chance to work?
Input Data Type: CIL vs. Source Code
Dataset | CIL: # Clone Class | CIL: # Clone Fragment | Source Code: # Clone Class | Source Code: # Clone Fragment
ASXGUI  | 9                  | 393                   | 69                         | 261
Mono    | 37                 | 4373                  | 369                        | 1523
• Up to 3 times more cloned fragments detected using IL
5. Clone Detection across Languages using IL
Observed Challenges (using an example)
VB.NET
Sub Main()
    Dim x As Integer
    x = 10
    If x < 0 Then
        x += 1
    Else
        Console.WriteLine("Positive number")
    End If
End Sub
C#
static void Main(string[] args){
    int x=10;
    if(x<0)
        x++;
    else
        Console.WriteLine("Positive number");
}
6. Clone Detection across Languages using IL
Observed Challenges (using an example)
VB | IL from VB | C# | IL from C#
7. Clone Detection across Languages using IL
Observed Challenges
1- Larger unpredictable size at IL level [Keivanloo IWSC’12]
2- Higher dissimilarity at IL level
8. Observed Challenges #2: High Dissimilarity
Noise
• Major noise types:
  • Line numbers
  • Pointers to line number
  • Push, Pop …
  • Detailed Data Type data
• Sample IL
IL_000c: ldloc.0
IL_000d: ldc.i4.1
IL_000e: add.ovf
IL_000f: stloc.0
IL_0010: br.s IL_0024
IL_0012: nop
IL_0013: ldstr "Positive number"
IL_0018: call void [mscorlib]System.Console::WriteLine(string)
9. Clone Detection across Languages using IL
The Core Solution
• The Challenge: Noise
• Solution: Data cleansing (filtering noises)
• Why? (Answer: to increase recall)
Source Code → IL + noise → IL - noise
10. Our Filter Set
Filters for noise reduction
Filter   | Example: Before Filtering                                      | Example: After Filtering | Description
Filter 1 | IL_0003: stloc.0                                               | stloc.0                  | IL_0003 (instruction address)
Filter 2 | brtrue.s IL_0015                                               | brtrue.s                 | IL_0015 is the address of the branch destination
Filter 3 | ldarg 3                                                        | ldarg                    | The values 3 and 1 represent the argument number
         | starg 1                                                        | starg                    |
Filter 4 | ldc.i4.s 10                                                    | ldc.i4.s                 | 10 is the number (pushed to the stack)
Filter 5 | ldstr "Positive number"                                        | ldstr                    | "Positive number" is the printed string constant
Filter 6 | stloc 7                                                        | stloc                    | 7 represents the variable index
Filter 7 | ldc.i4.s 10                                                    | ldc                      | i4 represents the int32 data type in CIL and s stands for short
Filter 8 | IL_0011: add                                                   | add                      | Note that Filter 8 is just a nickname. Refer to the Filter 8 description section for more details.
         | IL_0012: stloc.0                                               | stloc                    |
         | IL_0013: br.s IL_0020                                          | br                       |
         | IL_001a: call void [mscorlib]System.Console::WriteLine(string) | call                     |
11. Clone Detection across Languages using IL
Filtering Advantage: Recall Improvement
Before filtering noise (VB.NET vs. C#): ~50% similarity
After filtering: ~90% similarity
12. Disadvantage of Noise Reduction
Danger!
• Data loss
• What if we remove important data during data cleansing?
• Might mislead the detection by making non-cloned pairs identical
Possible negative effect on precision
Filtering Color Data
13. RQ: Are They (Filters) Dangerous?
Evaluation Preparation
1. Filter Contribution Formula:
2. Dataset preparation:
– Controlled dataset (iText.NET J#) 25 pairs * 3 Lang.
1. The Cloned Dataset (VB-C#, VB-J#, and C#-J#)
2. The Noncloned Dataset (VB-C#, VB-J#, and C#-J#)
14. RQ: Are They (Filters) Dangerous?
Filter Contribution - Study #1
• Are they harmful? (The answer is NO: based on the following graphs, the filters do not remove a similar amount of data from actual clones and from non-cloned code fragments)
Cloned Dataset (0.3) vs. NonCloned Dataset (0.2): a strong threshold for the judge to decide
15. RQ: Are They (Filters) Dangerous?
Filter Contribution - Study #2
• Are they useful? (The answer is YES: based on the given figure, our filters help to discriminate between actual clones and non-cloned fragments; therefore it is possible to separate them with high confidence using the chosen threshold)
16. RQ: Are They (Filters) Dangerous?
Filter Contribution - Study #3
• Does filtering make actual clone pairs and non-cloned pairs similar? (We used Chernoff faces, a kind of glyph, to see whether the filters make non-cloned pairs similar to cloned code. Each face represents a pair. Faces in Group A differ from those in Group B in most cases.)
Final Conclusion:
Filters help to discriminate between cloned and non-cloned fragments
17. An Interesting Unexpected Discovery
Language dependency!
Corresponding faces in each group are not similar, even though all of them are extracted from a single language (IL). Look especially at the C#-J# faces: all of them differ from the other groups. This is an interesting discovery: the original high-level programming languages affect similarity at the IL level.
18. Clone Detection across Languages using IL
Our Clone Detection Framework
Input: .NET Code → CIL Manipulation for Clone Detection → Clone Detection Algorithms → Clone Analysis → Reporting
• Input: .NET Code: Source Code; MS .NET EXE & DLL
• CIL Manipulation for Clone Detection: IlDasm.exe; CIL (plain text); Proposed Filtering Mechanism
• Clone Detection Algorithms: LCS-based (from NiCad); SimHash-based (from SimCad); Levenshtein Distance-based
• Clone Analysis: Clone Clusters; Merging; Source Code Mapping
• Reporting: Report (CIL); Report (Src Code)
19. The Selected Datasets for Performance Evaluation
Dataset    | Language | File | LOC    | Method
ASXGUI 2.5 | VB.NET   | 47   | 32,594 | 303
ASXGUI 3.0 | C#       | 19   | 2088   | 78

Dataset    | Language | File | LOC | Method
Mono 2.10  | VB.NET   | 375  | -   | -
Mono 2.10  | C#       | 57   | -   | -
Total      |          | 432  | -   | 4998

Dataset    | Language | File  | LOC   | Method
iText      | C#       | -     | -     | -
iText.NET  | J#       | -     | -     | -
Total      |          | 2.5 K | 600 K |

4th Dataset: iText.NET dataset from 1st case study
20. Clone Detection across Languages using IL
Our Clone Detection Framework Performance
Pay attention to
changes within
0.6 … 0.8
21. Clone Detection across Languages using IL
Our Clone Detection Framework
• 2K clone-pairs manually investigated
Thresholds: 0.6 Normal, 0.7 High, 0.8 Extreme
TP = {E and S}
Precision: The optimum, considering the trade-off between precision and recall, was achieved using Levenshtein Distance-based comparison with the High threshold (80% TP)
Recall (iText.NET API): 76% using the High threshold between three languages (C#, J#, and VB.NET).
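The Levenshtein Distance-based comparison with a threshold, as named on this slide, can be sketched roughly as follows. This is only an illustration, not the authors' implementation: it assumes that each fragment is a list of (filtered) CIL instruction tokens, that the distance is normalized by the longer fragment's length, and that 0.6/0.7/0.8 are minimum similarity values. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (0 if equal)
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Threshold names and values as listed on the slide; treating them as
# minimum similarity values is an assumption.
THRESHOLDS = {"Normal": 0.6, "High": 0.7, "Extreme": 0.8}


def is_clone_candidate(a, b, level="High"):
    """Report a pair when its similarity reaches the chosen threshold."""
    return similarity(a, b) >= THRESHOLDS[level]
```

Under these assumptions, two fragments that differ by a single extra nop among six instructions would still be reported at every threshold level.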
22. An Interesting Clone Detected by Our Approach
C#
private static string filename_nodir(string name)
{
    int slash = -1, len = name.Length;
    for (int i = 0; i < len; i++)
    {
        string sub = name.Substring(i, 1);
        if (sub == "" || sub == "/")
            slash = i;
    }
    slash++;
    return name.Substring(slash, len - slash);
}
VB.NET
Function Filename_Nodir() As String
    Dim intFileName As Integer, intSlash As Integer, strFilename As String
    strFileName = editvid.video
    For intFilename = 1 To len(strFileName)
        If mid(strfilename, intfilename, 1) = "" Or mid(strfilename, intfilename, 1) = "/" Then
            intslash = intFilename
        End If
    Next
    Return mid(strFileName, intSlash + 1, len(strFilename) - intSlash)
End Function
*The matching algorithm was limited to the content available within the boxes (it was NOT aware of same method names)
23. Summary
• The first comprehensive research focusing on,
(1) .NET clone detection,
(2) across programming languages,
and (3) using Intermediate Language
• Identified challenges in cross language clone detection + IL
24. Related Publication
Iman Keivanloo, Chanchal K. Roy, Juergen Rilling,
“Java Bytecode Clone Detection via Relaxation on Code
Fingerprint and Semantic Web Reasoning,”
6th International Workshop on Software Clones (IWSC), 2012.
In this paper we answered some of the very basic research questions related to this topic.
A general clone detection framework
This talk is about source code clones, and I am going to use sheep to represent clones. Suppose that there is a sheep which is doing merge sort, and the other sheep are also doing merge sort. I can detect them as a clone group since they are identical. So far there is no problem; however, it becomes challenging when we want to find sheep from other planets which are doing merge sort as well.
Two code fragments that share some degree of similarity are typically considered a clone pair. Based on their actual similarity, clone pairs can be categorized [5, 8] as Type-1, Type-2, Type-3, and Type-4 clones. Type-1 clones are exact copies of each other, except for possible differences in whitespace, layout and comments. Type-2 clones are syntactically identical fragments except for variations in identifiers, literals, data types, whitespace, layout and comments. Copied fragments (e.g., Type-1 and Type-2 clones) with further modifications such as additions, deletions and changes of statements are called Type-3 clones. Type-2 and Type-3 clones are also known as near-miss clones. Code fragments that perform the same computation (e.g., semantically similar) but are implemented through different syntactic variations are called Type-4 clones. Note that all of these definitions were originally introduced for clone pairs implemented in the same programming language. In our cross-language clone research these definitions are no longer applicable as-is, and have to be refined to meet our research context. For example, the VB and C# fragments in Fig. 1 would be considered Type-1 clones in cross-language clone detection, since they essentially perform the same task implemented in different programming languages.
To the best of our knowledge, C2D2 [10] is the only tool capable of detecting cross-language clones. It uses the NRefactory library to generate the Unified CodeDOM graph for both C# and VB.NET. A string is generated by traversing this graph and targeted to a string matching algorithm (focusing on single-language clone detection, mostly Java). One of the first studies on Intermediate Language clone detection is by Baker [9]. After some preprocessing (e.g., remapping offsets), she uses three comparison techniques (e.g., Diff [22]) to find similar fragments. Davis and Godfrey [23] use the disassembler for both Java and C/C++ to detect clones in a single language. Selim et al. use "Jimple" [24]. Juricic [26] uses Intermediate Language code to detect plagiarism and similarities. The approach is based on Levenshtein Distance as the similarity measure to compare disassembled C# binaries, and applies some primitive preprocessing techniques which are comparable to two of our filters. There are also some formal approaches, such as the one by Cuomo et al. [27], that transform Java bytecode to mathematical models for clone detection.
.NET targets multi-language development, whereas Java goes in the multi-platform direction.
Now the problem is reduced to single-language clone detection, so the problem seems to be solved (it seems easy). Actually, it is possible but it is not easy, and in this paper we show why this is the case.
.NET: Contrary to Java, which targets application development using one language on several platforms, .NET aims for multi-language development on a single platform. It provides language interoperability, with each program module being able to use code written in the other languages.
As long as it finds something, it is worth a try; it is tempting to give it a try.
In this research we have observed interesting challenges in this problem; I am going to show some of them using an example. Even in this simple example, we can clearly see serious challenges to be addressed.
Challenges:
1- Larger unpredictable size [sebyte] (being a lower-level language turns 5 LOC into 20 LOC)
2- High dissimilarity at bytecode level, even in cases of semantically identical source code fragments
Additional info:
Being a lower-level representation, CIL code size tends to be much larger than traditional high-level source code. Fig. 1 (the first two columns) shows a comparison between a VB code fragment (a small VB method) and its corresponding CIL representation. In this example the method body with five lines of code has been transformed to more than twenty lines of code in CIL. This creates an additional challenge, making clone detection on binary rather different from source code. Nevertheless, this common representation of code fragments written in different programming languages provides the ability to use CIL for clone detection across .NET languages. However, a key challenge is the fact that it is possible to have some dissimilarity at CIL level, even in cases of semantically identical source code fragments (written in different .NET languages). The first four columns of Fig. 1 (the Raw Data section) provide an example of such dissimilarities. Both the VB and C# methods implement the same program, following a similar coding pattern and structure as much as possible. However, when we compare the CIL pairs, there are three key sections clearly distinguishable: (1) identical CIL content, which is marked by the first dashed area, (2) the first point of dissimilarity, which is flagged by the italic font style, and (3) the rest of the content, marked by the second dashed box, which covers CIL content with considerable dissimilarity. In general, this example highlights the key challenge in binary clone detection: the possibility of facing dissimilarity when exploiting the .NET Intermediate Language, even for semantically (and almost syntactically) identical fragments in a cross-language context.
Filter 1: Removal of the instruction address (IL_xxxx:) at the beginning of each CIL instruction, eliminating dissimilarities due to application/environment-specific variations.
Filter 2: Removal of the instruction address (IL_xxxx:) for branching statements. As part of this filtering step we cover all 33 branching statements (e.g., beq, beq.s, bge).
Filter 3: Removal of integer values that represent the argument number in CIL, e.g., ldarg 3 is interpreted in CIL as load argument number 3 onto the stack. Instructions included in this filter are: starg, starg.s, ldarg, ldarg.s, ldarga, and ldarga.s.
Filter 4: This filter eliminates constants in the CIL code, e.g., "ldc.i4 num", which corresponds to pushing num of type int32 onto the stack as int32. Instructions covered by this filter are ldc.i4, ldc.i8, ldc.r4, ldc.r8, and ldc.i4.s.
Filter 5: This filter removes all print literals in the CIL code, which are identified through ldstr statements.
Filter 6: This filter removes all variable indexes, as in stloc index, which corresponds to popping a value from the stack into a local variable. Among the instructions covered by this filter are: ldloc, ldloc.s, ldloca.s, stloc and stloc.s.
Filter 7: This filter removes some additional data types and constant integers, such as i4 from "ldc.i4.1". The complete command pushes 1 as an int32 onto the stack.
Filter 8: This is not actually a new filter; it combines all seven filtering techniques mentioned above, including the preprocessing tasks, in one single filter.
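To make the filters concrete, here is a minimal sketch of Filter 1, Filter 2, and a rough stand-in for the combined Filter 8. It is not the authors' tool: the regular expressions assume ILDASM-style labels of the form IL_xxxx with four hex digits, and the combined filter simply keeps the base opcode mnemonic, which matches the four Filter 8 examples on slide 10 but may not match the real filter in other cases. All function names are hypothetical.

```python
import re

# Assumption: instruction addresses look like "IL_0013:" (four hex digits).
LABEL = re.compile(r"^\s*IL_[0-9A-Fa-f]{4}:\s*")
# Assumption: a branch destination is a trailing "IL_xxxx" operand.
TARGET = re.compile(r"\s+IL_[0-9A-Fa-f]{4}\s*$")


def filter1(line: str) -> str:
    """Filter 1: drop the instruction address at the start of the line."""
    return LABEL.sub("", line).strip()


def filter2(line: str) -> str:
    """Filter 2: drop the branch-destination address at the end of the line."""
    return TARGET.sub("", line).strip()


def combined(line: str) -> str:
    """Rough stand-in for Filter 8: keep only the base opcode mnemonic."""
    body = filter1(line)
    if not body:
        return ""
    opcode = body.split()[0]     # e.g. "br.s" from "br.s IL_0020"
    return opcode.split(".")[0]  # e.g. "br" from "br.s"


if __name__ == "__main__":
    sample = [
        "IL_0011: add",
        "IL_0012: stloc.0",
        "IL_0013: br.s IL_0020",
        "IL_001a: call void [mscorlib]System.Console::WriteLine(string)",
    ]
    for line in sample:
        print(combined(line))
```

Reducing every instruction to its mnemonic is the most aggressive choice possible; it is used here only because it is easy to check against the slide's examples.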
Before filtering (~50% similarity); after filtering, almost similar.
We address this challenge by creating a set of cleaning and filtering steps for CIL to improve the performance of Type-1, Type-2, Type-3 and Type-4 clone detection in the CIL code. The filters are designed to improve the detection rate (i.e., recall), since the CIL data contains a significant amount of noise (e.g., reference numbers to string tables, which are compilation-context dependent). Due to such noise in the CIL files, two semantically identical source code fragments might no longer be considered highly similar at the CIL level (e.g., content-similar VB and C# methods might have less than 50% similarity at the CIL level, see Fig. 1).
Filters increase recall but might decrease precision drastically.
A major threat to any filter-based approach is the loss of precision by filtering out essential data. As a result, excessive or improper loss of data (due to filtering) can lead to a situation where non-answers and actual answers become similar to the decision-making algorithm, which eventually leads to an increase in the false positive ratio.
Filter Contribution measures the effectiveness of each filter, that is, how much it increases the content similarity after filtering compared to before.
iText.NET (J#); 25 Feature Code Samples (C#, VB.NET, J#) -> 75 code fragments, which mutually created three true positive clone-pair sets (VB-C#, VB-J#, and C#-J#).
The second dataset (a.k.a. the NonclonedFragments Dataset) contains 25 non-clone classes and, as well, 75 false positive clone-pair candidates created in the same manner as the clone classes.
Additional Info: To answer this question, we defined a metric called Filter Contribution that measures the effectiveness of each filter. The underlying idea is to measure the similarity degree of candidate clone-pairs before and after applying different filters. The measure will indicate how much a particular filter increases the similarity value between two fragments. Note that in the ideal case, we expect that a filter would increase the similarity values of true positive cases significantly more than the ones for false positive cases. Otherwise, a particular filter would not be useful to discriminate (with high confidence) against false positives. The Filter Contribution (FltrCntrb) function is defined in Eq. 2, which is based on LCS-based similarity. Its symbols denote the participant fragments in the clone-pair under investigation and the filter function, with x being the filter number.
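Since Eq. 2 itself is not reproduced in these notes, the following Python sketch only illustrates the underlying idea of comparing similarity before and after filtering. It assumes that the contribution is the difference of the two similarity values and that LCS-based similarity is normalized as 2·LCS / (|a| + |b|) over lines; both choices, and all function names, are assumptions rather than the definition in the paper.

```python
from typing import Callable, Sequence


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence, row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(a: Sequence, b: Sequence) -> float:
    """Assumed normalization: 2 * LCS / (len(a) + len(b)), in [0, 1]."""
    if not a and not b:
        return 1.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


def filter_contribution(frag_a: Sequence[str], frag_b: Sequence[str],
                        flt: Callable[[str], str]) -> float:
    """Similarity gain obtained by applying filter `flt` to both fragments."""
    before = lcs_similarity(frag_a, frag_b)
    after = lcs_similarity([flt(l) for l in frag_a], [flt(l) for l in frag_b])
    return after - before
```

Under this reading, a useful filter yields a clearly larger value for true clone-pairs than for non-cloned pairs.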
It has no negative effect. In most cases, the filters increased the similarity by up to ~0.2 (max) for non-cloned pairs while improving the similarity of cloned pairs by at least ~0.3.
F8: non-cloned pairs less than 0.5, while for the majority of cloned pairs the similarity increases between 0.5 and 0.8.
This result supports our research hypothesis that filtering increases the similarity values for true positive cases (the cloned dataset) with a higher ratio than the false positive cases (the non-cloned dataset).
Not only does filtering have no negative effect, it also helps to discriminate between them.
To support our claim, we conducted another case study on the same dataset to determine if our filters can be used to identify an appropriate similarity threshold. Fig. 3 summarizes the findings, showing that before applying our filters, there was no clear distinction between similarity values of actual clone-pairs (true positives) and false positives. Therefore it is impossible to determine an adequate threshold that allows separating actual clones from false positives. In contrast, Fig. 3 shows that filters address this problem by increasing the distance between the two groups (tagged on the right side of Fig. 3). For example, using our filters, a threshold from 0.4 to 0.55 can separate true positives from false positives with high confidence.
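The threshold argument can be made concrete with a small Python sketch. It assumes the two groups are given as plain lists of similarity values and that a threshold "separates" them when it lies strictly between the highest non-clone value and the lowest clone value; the function name is hypothetical and the study itself reads the range off Fig. 3.

```python
from typing import Optional, Sequence, Tuple


def separating_interval(clone_sims: Sequence[float],
                        nonclone_sims: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return the open interval of thresholds separating both groups, if any."""
    low = max(nonclone_sims)    # highest similarity among false positives
    high = min(clone_sims)      # lowest similarity among true clone-pairs
    return (low, high) if low < high else None
```

When the groups overlap, as described for the unfiltered data, the function returns `None`, meaning no single threshold separates them.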
Chernoff faces, invented by Herman Chernoff, display multivariate data in the shape of a human face. The individual parts, such as eyes, ears, mouth and nose, represent values of the variables by their shape, size, placement and orientation. The idea behind using faces is that humans easily recognize faces and notice small changes without difficulty.
Glyphs: we produced seven face features for each pair by calculating Filter Contribution on all seven filters separately. That is, each pair can be modeled using a vector in a multi-dimensional space (in our case, seven dimensions).
Filter 1, 2, and 5, since they are mapped to: (1) the face size, (2) the distance between forehead and jaw, and (3) the distance between the eyes, respectively. Therefore, it is also possible to intuitively observe that Filter 1, 2, and 5 (including Filter 7, observed in Fig. 5) play the major role in the characterization of true positives.
The participant source code affects the similarity at the IL level. A new interesting discovery: “Is filtering neutral to the participating programming languages of clone-pairs (in the cross-language clone detection context)?” That is, most of the faces are not round-shaped compared to the two other groups.
Using three edit-distance methods (LCS, LEV, SimHash) to avoid dependency on a single comparison function in further case studies.
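For reference, two of the three measures can be sketched in Python as follows (the LCS-based one is omitted for brevity). The normalizations and the SimHash construction (MD5-derived 64-bit token hashes, unweighted tokens) are assumptions chosen for illustration; the notes do not specify how the study configured these measures.

```python
import hashlib
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Classic edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def lev_similarity(a: Sequence, b: Sequence) -> float:
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def simhash(tokens: Sequence[str], bits: int = 64) -> int:
    """Unweighted SimHash fingerprint; bits must be a multiple of 8, at most 128."""
    weights = [0] * bits
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def simhash_similarity(a: Sequence[str], b: Sequence[str], bits: int = 64) -> float:
    """1 - normalized Hamming distance between the two fingerprints."""
    diff = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1.0 - diff / bits
```

Levenshtein works on sequences (e.g., filtered CIL lines), whereas SimHash ignores order and compares token bags, which is one reason results can differ between the measures.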
The noticeable difference in project metrics (e.g., LOC) can be attributed to (1) dissimilarities in the programming languages, and (2) re-engineering and refactoring tasks.
PDF library called iText and iText.NET. While their project names are similar, both projects are completely independent from each other. We created our third dataset from the iText (C# branch) and iText.NET (J#) source code.
(1) It is possible to detect numerous candidate clone-pairs, even for the cross-language case, regardless of the underlying algorithm.
Additional Info: (2) no candidate clone-pair is detected for cross-language using 1.0 as the Similarity Factor (i.e., the decision-making threshold), which would only report clone-pairs with completely identical content. Therefore, even using filtering on highly similar cross-language clone-pairs (e.g., Fig. 1), some dissimilarities will have to be handled by the clone detection approach. However, this is not the case for single-language clone detection (shown in Fig. 7); (3) for all datasets, we can observe a major decrease in the number of candidates when the threshold value is set to a range between 0.6 and 0.8 (marked by ovals).
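The relation between the Similarity Factor and the number of reported candidates can be illustrated with a short Python sketch. It assumes a pair is reported when its similarity is greater than or equal to the threshold, which matches the remark that 1.0 only reports identical content; the function name is hypothetical and the values used in the checks are invented, not results from the study.

```python
from typing import Dict, Iterable, Sequence


def candidates_per_threshold(similarities: Sequence[float],
                             thresholds: Iterable[float]) -> Dict[float, int]:
    """Count candidate clone-pairs reported at each similarity threshold."""
    return {t: sum(1 for s in similarities if s >= t) for t in thresholds}
```

Sweeping the threshold this way over the pairwise similarities of a dataset gives the kind of curve on which the drop between 0.6 and 0.8 would show up.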
Quality evaluation is inherently challenging in our research since there is no clear agreement on what constitutes true positives (TP) and the various clone type definitions. Therefore, we applied in our qualitative evaluation the following approach: (1) since it is possible to easily locate with confidence false positives among candidate clone-pairs, we first tag all false positives; (2) we assume the rest to be true positives. However, in order to provide a more in-depth quality assessment, we also analyze the quality of the reported true positives.
Fig. 9 reviews the findings of our quality evaluation from manually assessing ~2K candidate clone-pairs (answering RQ4). In general, using the Normal threshold, all candidate clone-pairs that were reported are true positives (100% TP). The quality decreases with less restrictive thresholds. For example, using SimHash and the Extreme threshold, the reported TP reduces to ~40%. The optimum, considering the trade-off between precision and recall, was achieved using Levenshtein Distance-based comparison with the High threshold (80% TP). Nevertheless, this result is not 100% precise.
Why we need such a topic in general, from an industry point of view; these constitute our motivation.
Applications being developed in different languages (customer/contract: iText, iText.NET; legal issues; community: Hibernate > NHibernate).
It is a detailed study of the challenges, possible solutions, evaluation, and final results. It is not only a clone detection approach but also an important study that gives insight for future research.
To the best of our knowledge, C2D2 [10] is the only tool capable of detecting cross-language clones. It uses the NRefactory Library to generate the Unified CodeDOM graph for both C# and VB.NET. A string is generated by traversing this graph and passed to a string matching algorithm.
(Focusing on single-language clone detection, mostly Java.) One of the first studies on Intermediate Language clone detection is by Baker [9]. After some preprocessing (e.g., remapping offsets), she uses three comparison techniques (e.g., Diff [22]) to find similar fragments. Davis and Godfrey [23] use the disassembler for both Java and C/C++ to detect clones in a single language. Selim et al. use “Jimple” [24].
Juricic [26] uses Intermediate Language code to detect plagiarism and similarities. The approach is based on Levenshtein Distance as the similarity measure to compare disassembled C# binaries, and applies some primitive preprocessing techniques which are comparable to two of our filters. There are also some formal approaches, such as by Cuomo et al. [27], that transform Java bytecode to mathematical models for clone detection.