This paper proposes a tool for detecting plagiarism in Java source code. The tool first normalizes the input code and the original code by removing whitespace, comments, keywords and operators, and by standardizing identifiers. It then uses the Levenshtein distance algorithm to compute the edit distance between the two normalized codes. From this distance and the code lengths, it calculates a plagiarism percentage. When tested on sample code pairs, the tool reported lower plagiarism percentages than existing tools, and the authors conclude that it is better suited to detecting plagiarism in Java code.
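The normalization, distance and percentage steps summarized above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the authors' tool. The class name `PlagiarismSketch` is hypothetical. So are the following choices, which the summary does not specify: the operator set, replacing every identifier with a single placeholder `I`, and the percentage formula `(1 - d / max(len1, len2)) × 100`. Comment removal is regex-based and does not handle comment markers inside string literals.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: normalize two Java snippets, then score them with Levenshtein distance. */
public class PlagiarismSketch {

    // Java reserved words plus the literals true/false/null; dropped during normalization.
    private static final Set<String> KEYWORDS = Set.of(
            "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
            "class", "const", "continue", "default", "do", "double", "else", "enum",
            "extends", "final", "finally", "float", "for", "goto", "if", "implements",
            "import", "instanceof", "int", "interface", "long", "native", "new", "package",
            "private", "protected", "public", "return", "short", "static", "strictfp",
            "super", "switch", "synchronized", "this", "throw", "throws", "transient",
            "try", "void", "volatile", "while", "true", "false", "null");

    // Assumed operator symbols to drop; punctuation such as { } ( ) ; , . is kept.
    private static final String OPERATORS = "+-*/%=<>!&|^~?:";

    // A token is a word, a run of digits, or any other single non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\\S");

    /** Removes comments, whitespace, keywords and operators; maps every identifier to "I". */
    public static String normalize(String code) {
        // Naive comment removal: ignores "//" or "/*" appearing inside string literals.
        String noComments = code.replaceAll("(?s)/\\*.*?\\*/", " ")
                                .replaceAll("//[^\\n]*", " ");
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(noComments);
        while (m.find()) {  // whitespace is never matched, so it disappears
            String tok = m.group();
            if (KEYWORDS.contains(tok)) continue;
            if (tok.length() == 1 && OPERATORS.indexOf(tok.charAt(0)) >= 0) continue;
            out.append(Character.isJavaIdentifierStart(tok.charAt(0)) ? "I" : tok);
        }
        return out.toString();
    }

    /** Classic dynamic-programming edit distance, keeping only two rows. */
    public static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;  // prev now holds row i
        }
        return prev[t.length()];
    }

    /** Assumed formula: similarity relative to the longer normalized code, in percent. */
    public static double plagiarismPercent(String original, String suspect) {
        String a = normalize(original);
        String b = normalize(suspect);
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 100.0;  // two empty inputs are treated as identical
        return (1.0 - (double) levenshtein(a, b) / maxLen) * 100.0;
    }

    public static void main(String[] args) {
        String original = "int sum(int a, int b) { return a + b; } // adds two ints";
        String suspect  = "int total(int x, int y) { /* renamed */ return x + y; }";
        // Both normalize to "I(I,I){II;}", so this prints "Plagiarism: 100.0%"
        System.out.println(String.format(Locale.ROOT, "Plagiarism: %.1f%%",
                plagiarismPercent(original, suspect)));
    }
}
```

Because every identifier collapses to the same placeholder, simple renaming and comment edits do not lower the score. The trade-off is that structurally similar but independently written code may also score high. A consistent renaming scheme (for example `v1`, `v2` in order of first use) would be a stricter alternative.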
2. 244 S. Srivastava et al.
Plagiarism in coding is not a new phenomenon. Researchers have studied it before to establish how serious the problem is [1, 2]. In programming assignments, plagiarism covers not only copying source code but also copying comments and input data. Students plagiarize for many reasons; sometimes they simply do not feel like writing the code themselves. Plagiarism in code is usually hard to detect because similar code is used for the same application: it is easy to commit but difficult to catch. Students copy all or part of a program from one or more sources and submit it as their own work. This includes students who work as a team and submit near-identical work. Such plagiarism is considered common, even though the true level of similarity is hard to assess.
When a teacher in a programming course assigns the same problem to all students, some students write the source code on their own, while others obtain the code and merely change the variable names, the order of statements, or the functions and variables of a class. Such modifications are difficult to detect. Source code modifications fall into two categories: lexical changes and structural changes. Lexical changes can be made without any prior programming knowledge, whereas structural changes require knowledge of the programming language. Changes to iterations and conditional statements, changes to the order of statements, converting a procedure into a function and vice versa, and adding comments are structural changes.
For the code in Fig. 1, a student can arrive at the same logic without ever seeing this code, and this is certainly not plagiarism. Such a scenario can be handled by placing constraints on the size of the matching code; for example, two codes could be considered copied only if n consecutive lines are similar in both. We therefore need a system that calculates the percentage of similarity between two Java files. We propose a plagiarism detection system, based on a novel normalization process, that assesses the originality of a student's code by comparing the input code with the original code. Teachers can use it to decide whether a student has committed plagiarism: the plagiarism percentage is estimated for the two Java files, and if it is below a specified threshold, the input code is accepted; otherwise it is not.
Fig. 1 Sample code
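The n-consecutive-lines constraint mentioned above could be checked along the following lines. This is a minimal sketch under our own assumptions, not part of the proposed tool: the class and method names are hypothetical, "similar" is taken to mean identical after trimming, and blank lines are ignored.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the "n consecutive similar lines" constraint.
class ConsecutiveLineCheck {

    // Returns true if some run of n consecutive non-blank lines (compared after
    // trimming) occurs in both files. Requires n >= 1.
    static boolean sharesRun(String first, String second, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> a = significantLines(first);
        List<String> b = significantLines(second);

        // Collect every window of n consecutive lines from the first file.
        Set<List<String>> windowsOfA = new HashSet<>();
        for (int i = 0; i + n <= a.size(); i++) {
            windowsOfA.add(new ArrayList<>(a.subList(i, i + n)));
        }
        // Report a match as soon as the second file contains one of those windows.
        for (int j = 0; j + n <= b.size(); j++) {
            if (windowsOfA.contains(b.subList(j, j + n))) {
                return true;
            }
        }
        return false;
    }

    // Splits code into lines, trims each one and drops blank lines.
    private static List<String> significantLines(String code) {
        List<String> lines = new ArrayList<>();
        for (String line : code.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String original = "int a = 1;\nint b = 2;\nint c = a + b;";
        String submitted = "int x = 0;\n  int b = 2;\nint c = a + b;\n";
        System.out.println(sharesRun(original, submitted, 2)); // true
        System.out.println(sharesRun(original, submitted, 3)); // false
    }
}
```

Choosing n trades false alarms (small n flags short, independently written idioms such as the code in Fig. 1) against missed copies (large n is defeated by inserting a line every few statements).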
3. A Tool to Detect Plagiarism in Java Source Code 245
The rest of the paper is organized as follows. Section 2 reviews previous work on plagiarism detection, Sect. 3 presents the proposed method, Sect. 4 discusses the results, and Sect. 5 concludes the paper.
2 Related Work
Many researchers have proposed methods for plagiarism detection in text and programming code [3, 4, 5, 6], while others have compared different plagiarism detection tools [7, 1, 2]. Nurhayati and Busman [8] applied the Levenshtein distance (LD) algorithm to plagiarism detection in documents and developed software for Android smartphones. The LD algorithm yields a string metric that measures the distance between two strings. In [9], the authors created an application that uses the LD algorithm to identify similarity in Java code. A semantics-based technique for detecting plagiarism between C++ and Java code was proposed in [10] as a multimedia-based e-learning and smart assessment method. The input code is transformed into tokens, which are compared semantically token by token, and the semantic similarity of the whole input code is then estimated. Many similarity detection algorithms exist in the literature; building on several of them, researchers developed a similarity detection system called SCSDS [11]. SCSDS was slower than existing methods, and fusing various similarity detection algorithms made its speed and performance even worse, so SCSDS requires improvement in both respects. In [12], the plagiarism detection system treated source code as plain text, without considering the syntactic structure of the programming language. The authors normalized commonly used identifiers to detect pairs of programs that have the same objective, and showed that removing these normalized terms improves the system.
3 Proposed Method
The proposed system estimates the plagiarism percentage of a given input code. The user first supplies the input code to be checked for plagiarism. Previously available code, referred to here as the original code, is used for comparison. The two codes are stored in separate variables and then converted into a form that can easily be used for plagiarism detection. This is done in the normalization step, which performs the following operations on the code:
• Removing white spaces
• Removing comments
• Removing all the keywords
• Removing all the operators
• Replacing all the identifiers with **identifier**
• Sorting
Removing white spaces
Programmers generally put spaces before and after operators to improve readability. When code is copied from an online platform, users often adjust this spacing so that the code does not look copied. The extra spaces are also unnecessary for the comparison and only lengthen the string; since the LD algorithm has a complexity of $O(n^2)$, longer strings make it slower.
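A minimal sketch of this step in Java is shown below; the class name is hypothetical. It deletes every run of whitespace, including line breaks. Because this merges adjacent tokens (`int total` becomes `inttotal`), our own sketches of comment and keyword removal assume they run before this step, although the paper lists whitespace removal first.

```java
// Minimal sketch of the whitespace-removal step (class name is hypothetical).
class WhitespaceStripper {

    // \s+ matches runs of spaces, tabs and line breaks; all of them are deleted.
    static String removeWhitespace(String code) {
        return code.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(removeWhitespace("int total = a + b;\n")); // inttotal=a+b;
    }
}
```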
Removing comments
Comments do not affect how code runs; they only help readers understand complex or long code. We remove them because a plagiarist can easily add new comments or edit copied ones, and since the LD algorithm compares strings character by character, such edits would distort the result of our plagiarism detection tool. The following regular expression, written as a Java string literal, is used to detect and remove the comments:
replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)", "")
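A self-contained version of the comment-removal step, using a regular expression of this form, might look like the sketch below (the class name is hypothetical). The first alternative matches block comments, including multi-line and `/** ... */` comments; the second matches `//` comments up to the line break. This is why, in our sketches, comments are assumed to be stripped while line breaks still exist: once they are gone, a `//` comment would swallow the rest of the file. The pattern is also unaware of string literals, so text such as `"http://..."` inside a string would be cut.

```java
// Minimal sketch of the comment-removal step (class name is hypothetical).
class CommentStripper {

    // Alternative 1: block comments /* ... */ (a run of '*' is allowed only if
    // it is not followed by '/'). Alternative 2: line comments // ... up to,
    // but not including, the line break.
    private static final String COMMENT_REGEX =
        "(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)";

    static String removeComments(String code) {
        return code.replaceAll(COMMENT_REGEX, "");
    }

    public static void main(String[] args) {
        String code = "int a; // counter\n/* block\n comment */int b;";
        System.out.print(removeComments(code));
    }
}
```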
Removing all the keywords
This is the most significant step: it removes all the keywords of the language. Since our proposal checks plagiarism only in Java code, we remove all Java keywords. Solutions to the same program generally use similar data types and built-in functions, so these words lengthen the strings without helping to distinguish them, and longer strings again increase the cost of the $O(n^2)$ LD algorithm. Removing keywords therefore saves time and space. Moreover, a student who has copied code and understood it sometimes substitutes different data types and functions to disguise the copy. Removing all the keywords helps reveal the genuine similarity index.
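One way to implement keyword removal is sketched below; the class name is hypothetical, and the list covers the reserved keywords of the Java Language Specification but not contextual keywords such as `var` or `record`, nor literals such as `true` and `null`. Matching whole words with `\b` keeps identifiers such as `interval` intact. The sketch assumes it runs before whitespace removal, because once spaces are deleted, `public static void` becomes a single run that a word-boundary match cannot split.

```java
import java.util.List;
import java.util.regex.Pattern;

// Minimal sketch of the keyword-removal step (class name and keyword list are assumptions).
class KeywordStripper {

    // Reserved Java keywords (contextual keywords and literals are not included).
    private static final List<String> KEYWORDS = List.of(
        "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
        "class", "const", "continue", "default", "do", "double", "else", "enum",
        "extends", "final", "finally", "float", "for", "goto", "if", "implements",
        "import", "instanceof", "int", "interface", "long", "native", "new", "package",
        "private", "protected", "public", "return", "short", "static", "strictfp", "super",
        "switch", "synchronized", "this", "throw", "throws", "transient", "try", "void",
        "volatile", "while");

    // \b ... \b matches whole words only, so "int" is removed but "interval" is kept.
    private static final Pattern KEYWORD_PATTERN =
        Pattern.compile("\\b(?:" + String.join("|", KEYWORDS) + ")\\b");

    static String removeKeywords(String code) {
        return KEYWORD_PATTERN.matcher(code).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeKeywords("public static int interval = 0;").trim()); // interval = 0;
    }
}
```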
Removing all the operators
Solutions to the same program generally use the same kinds and the same number of operators, even when they are not copied. Operators therefore add little information for telling codes apart while increasing the time and space cost of the comparison. To avoid this, we remove all the operators.
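A simple character-level version of operator removal is sketched below. The class name and the exact character set are our assumptions: it deletes every character that appears in Java's arithmetic, relational, logical, bitwise, assignment and ternary operators (and in `->` and `::`), while keeping separators such as `;`, `,`, `.`, parentheses and braces. Being character-based, it would also strip the `-` in a literal such as `1e-5`, and it assumes comments have already been removed, since `/` and `*` are deleted too.

```java
// Minimal sketch of the operator-removal step (class name and character set are assumptions).
class OperatorStripper {

    // Every character that occurs in a Java operator; separators are left alone.
    private static final String OPERATOR_CHARS = "[-+*/%=<>!&|^~?:]";

    static String removeOperators(String code) {
        return code.replaceAll(OPERATOR_CHARS, "");
    }

    public static void main(String[] args) {
        System.out.println(removeOperators("x += y * (z - 1) >= 0 ? 1 : 0;"));
    }
}
```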
Replacing all the identifiers with **identifier**
Users often rename the identifiers in copied code to evade plagiarism detection. We therefore replace every identifier in both codes, the original code and the code to be checked, with “**identifier**”.
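The renaming step could be sketched as follows; the class name and the token rule are assumptions. Any Java-style identifier (a letter, `_` or `$`, followed by letters, digits, `_` or `$`) is replaced, and a negative lookbehind prevents a match from starting inside another token, so the `L` in the literal `10L` is left alone. The sketch assumes keywords were removed earlier; otherwise they would be renamed as well, and it also renames method names such as `println`.

```java
import java.util.regex.Pattern;

// Minimal sketch of the identifier-replacement step (class name and token rule are assumptions).
class IdentifierNormalizer {

    // The lookbehind (?<![\w$]) stops a match from beginning in the middle of a
    // token such as 10L or 0x1F; [\w$]* then consumes the rest of the identifier.
    private static final Pattern IDENTIFIER =
        Pattern.compile("(?<![\\w$])[A-Za-z_$][\\w$]*");

    // Assumes keywords have already been removed from the code.
    static String normalizeIdentifiers(String code) {
        return IDENTIFIER.matcher(code).replaceAll("**identifier**");
    }

    public static void main(String[] args) {
        System.out.println(normalizeIdentifiers("total = count + 10L;"));
    }
}
```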
Sorting
Both strings, the original code and the code to be checked, are sorted alphabetically. A user can move copied code (functions, classes, etc.) and sometimes also changes the order of statements; sorting both strings lets plagiarism be detected even when the copied code has been rearranged. The sorted results for the original code and the code to be checked are stored separately. This completes the normalization step.
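The paper does not state which unit is sorted. Purely as an assumption, the sketch below treats each `;`-separated piece as a statement, sorts the pieces alphabetically and joins them again, so that reordered statements produce the same string. Sorting the individual characters would be the other literal reading, but it discards almost all structure. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Minimal sketch of the sorting step; splitting on ';' is an assumption, not the paper's rule.
class StatementSorter {

    // Splits the code into ';'-separated pieces, sorts them alphabetically and
    // rejoins them. String.split drops trailing empty pieces, so a final ';' is not kept.
    static String sortStatements(String code) {
        String[] statements = code.split(";");
        Arrays.sort(statements);
        return String.join(";", statements);
    }

    public static void main(String[] args) {
        String original = "a=1;b=2;c=a+b;";
        String reordered = "b=2;a=1;c=a+b;";
        System.out.println(sortStatements(original).equals(sortStatements(reordered))); // true
    }
}
```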
After all these steps, the normalized code is again stored in a variable. We then apply the LD algorithm [8] to the two normalized strings, store its result in a variable, and use that result to calculate the plagiarized value.
Levenshtein Algorithm
The LD algorithm [8] computes a distance that measures the dissimilarity between two sequences. This distance is called the Levenshtein distance or edit distance, although the term edit distance may also denote a larger family of distance metrics. It is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other.
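For reference, the standard dynamic-programming computation of the Levenshtein distance can be written in Java as below. The class name is hypothetical, and the two-row layout is our choice: it keeps the $O(mn)$ running time of the textbook algorithm (the $O(n^2)$ cost cited above for strings of similar length) but needs only $O(n)$ extra memory.

```java
// Textbook Levenshtein distance with two rolling rows (class name is hypothetical).
class Levenshtein {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning "" into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1,   // deletion
                                            curr[j - 1] + 1), // insertion
                                   prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```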
Calculating Plagiarism
After normalization, we obtain the normalized original code and the normalized code to be checked as strings. The original code is referred to as the source string (δ) and the code to be checked as the target string (ε). These two strings are fed to the LD algorithm, which returns a numeric value corresponding to the difference between them. This value is called the LD distance ($\bar{d}$) and is defined as:

$$\bar{d} = \mathrm{LD}(\delta, \varepsilon) \qquad (1)$$
Using the plagiarized-value formula, we can then calculate the plagiarism between these two strings. The plagiarized value ($\wp$) is calculated as:

$$\wp = \left(1 - \frac{\bar{d}}{\max(|\delta|, |\varepsilon|)}\right) \times 100 \qquad (2)$$

where $\bar{d}$ is the LD distance, δ represents the original code, ε is the code to be checked for plagiarism, and $\max(|\delta|, |\varepsilon|)$ is the greater of the lengths of δ and ε. Figure 2 shows the working of the proposed plagiarism detection system.
Fig. 2 Framework of the proposed plagiarism detection system
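The last stage turns the LD distance into a percentage and compares it with a threshold. The sketch below assumes the plagiarized value is the similarity percentage 100 × (1 − d̄ / max(|δ|, |ε|)); this is our reading of how the LD distance and the maximum length are combined, not a formula quoted verbatim from the paper. The class and method names are hypothetical, and the threshold of 60 in the usage line is arbitrary, since the paper does not give one.

```java
// Hypothetical helpers for turning an LD distance into a percentage and applying a threshold.
class PlagiarismScore {

    // Similarity percentage, assuming p = 100 * (1 - d / max(|source|, |target|)).
    static double plagiarizedValue(int distance, int sourceLength, int targetLength) {
        int longest = Math.max(sourceLength, targetLength);
        if (longest == 0) {
            return 100.0; // two empty strings are treated as identical
        }
        return 100.0 * (1.0 - (double) distance / longest);
    }

    // The input code is accepted when its plagiarized value is below the threshold.
    static boolean isAcceptable(double plagiarizedValue, double threshold) {
        return plagiarizedValue < threshold;
    }

    public static void main(String[] args) {
        // Example: "kitten" vs "sitting" has LD distance 3 and lengths 6 and 7.
        double p = plagiarizedValue(3, 6, 7);
        System.out.printf("%.2f%% -> acceptable: %b%n", p, isAcceptable(p, 60.0));
    }
}
```

Dividing by the longer length keeps the value between 0 and 100, because the LD distance can never exceed the length of the longer string.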
4 Results and Findings
To estimate the plagiarism percentage of a given input code, the user first supplies the input code to be checked for plagiarism along with the original code. Figures 3 and 4 show samples of the original code and the code to be checked, respectively. The code is fed into the normalization step, which produces the normalized code. The LD algorithm [8] is then applied to the normalized code, and its result is used to estimate the plagiarized value. Figure 5 shows the user interface of the proposed system, Fig. 6 shows the interface after the code has been entered in the text areas, and Fig. 7 shows the estimated plagiarism after clicking the "check fraud" button. From Fig. 8, it can be observed that standard plagiarism detection software is not suitable for assessing the originality of Java code: programmers necessarily share the keywords of the language, so merely detecting identical words is not a correct criterion for investigating the originality of source code. As Figs. 7 and 8 show, the standard software (Turnitin) reports a similarity index of 78% for the same code, whereas the proposed system reports 51%. Table 1 and Fig. 9 compare the similarity indexes calculated by the proposed method and the standard software. On this basis, we state that the proposed system is more suitable than other software for assessing the originality of Java source code.
Fig. 3 Sample original code
Fig. 4 Sample code to be checked
Fig. 5 User interface
Fig. 6 After filling both the text areas accordingly
Fig. 7 After clicking on check fraud
Fig. 8 Plagiarism report of a standard plagiarism detection software
Table 1 Comparison of similarity indexes of proposed system and existing software

Input     Similarity index, proposed system (%)    Similarity index, existing software (%)
Code 1    51.85                                    78
Code 2    54.76                                    80
Code 3    57.29                                    83
Code 4    53.26                                    81
Fig. 9 Comparison of similarity indexes of proposed system and existing software
5 Conclusion
We have proposed a tool that can efficiently check whether an input Java code is plagiarized. To carry out plagiarism detection, the code is first preprocessed through normalization, which consists of the following steps: removing white spaces, removing comments, removing all the keywords, removing all the operators, replacing all the identifiers with **identifier**, and sorting. The normalized code is then fed into the LD algorithm to obtain the LD distance, and the value returned by the LD algorithm is used to calculate the plagiarized value. The proposed tool currently works only on Java source code; in the future, it could be extended to other programming languages. The plagiarized value was calculated for four codes with both the proposed system and an existing system. From the results, we conclude that the proposed system is more suitable than the existing system for assessing the originality of Java source code.
References
1. Foltýnek T, Meuschke N, Gipp B (2019) Academic plagiarism detection: a systematic literature review. ACM Comput Surv (CSUR) 52(6):1–42
2. Naik RR, Landge MB, Mahender CN (2015) A review on plagiarism detection tools. Int J Comput Appl 125(11)
3. Ghanem B, Arafeh L, Rosso P, Sánchez-Vega F (2018) HYPLAG: hybrid Arabic text plagiarism detection system. In: International conference on applications of natural language to information systems. Springer, Cham, pp 315–323
4. Jadalla A, Elnagar A (2008) PDE4Java: plagiarism detection engine for java source code: a clustering approach. IJBIDM 3(2):121–135
5. Alzahrani SM, Salim N, Abraham A (2011) Understanding plagiarism linguistic patterns, textual features, and detection methods. IEEE Trans Syst Man Cybern Part C (Appl Rev) 42(2):133–149
6. Sulistiani L, Karnalim O (2019) ES-Plag: efficient and sensitive source code plagiarism detection tool for academic environment. Comput Appl Eng Educ 27(1):166–182
7. Ali AM, Abdulla HM, Snasel V (2011) Overview and comparison of plagiarism detection tools. In: DATESO, pp 161–172
8. Nurhayati B, Busman B (2017) Development of document plagiarism detection software using levensthein distance algorithm on Android smartphone. In: 2017 5th International conference on cyber and IT service management (CITSM), pp 1–6
9. Liaqat AG, Ahmad A (2011) Plagiarism detection in java code
10. Ullah F, Wang J, Farhan M, Jabbar S, Wu Z, Khalid S (2018) Plagiarism detection in students’ programming assignments based on semantics: multimedia e-learning based smart assessment methodology. In: Multimedia tools and applications, pp 1–18
11. Ðurić Z, Gašević D (2013) A source code similarity system for plagiarism detection. Comput J 56(1):70–86
12. Heblikar S, Sharma P, Munnangi M, Bankapur C (2015) Normalization based stop-word approach to source code plagiarism detection. In: FIRE workshops, pp 6–9