A Tool to Detect Plagiarism in Java
Source Code
Swati Srivastava, Akshit Rai, and Mahima Varshney
Abstract Plagiarism occurs when an author uses another author's intellectual ideas without permission. In academia, students regularly submit assignments in the form of program code or text documents. This work focuses on program code. A major concern is that students may copy code from a source without appropriately crediting the original writer or programmer. Of the many programming languages in use, such as C, C++, Java, and Python, this paper deals with Java code files. The objective of this work is to estimate the percentage of plagiarism in a given input program.
Keywords Plagiarism · Java code · Normalization · Levenshtein distance ·
Similarity index
1 Introduction
Copying another person's work without permission is referred to as plagiarism. It is like stealing someone's car, watch, cell phone, or other gadgets, which is punishable by law. Likewise, stealing an intellectual's idea is considered unlawful. This does not mean that scholars should not consult diverse works or sources for reference. Enhancing one's knowledge by taking opinions and ideas from experts is a good thing. What matters most is to ensure that the sources and references are properly credited.
S. Srivastava (B) · A. Rai · M. Varshney
Department of Computer Engineering and Applications, GLA University, Mathura, Uttar Pradesh,
India
e-mail: swati.srivastava@gla.ac.in
A. Rai
e-mail: akshit.rai_cs16@gla.ac.in
M. Varshney
e-mail: mahima.varshney_cs16@gla.ac.in
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Singapore Pte Ltd. 2021
G. Ranganathan et al. (eds.), Inventive Communication and Computational
Technologies, Lecture Notes in Networks and Systems 145,
https://doi.org/10.1007/978-981-15-7345-3_20
Plagiarism in coding is not a new phenomenon. Researchers have studied it earlier to assess the severity of the problem [1, 2]. Plagiarism in programming assignments covers not only the replication of source code; copied comments and input data are also considered plagiarism. Students get involved in plagiarism for many reasons; sometimes they are simply too lazy to write their own code. Plagiarism in coding is usually hard to detect, since similar code is used for the same application. It is straightforward to commit but tricky to detect. Students copy all or part of a program from one or more sources and present the copy as their own work. This includes students who work as a team and submit similar work. Such plagiarism is believed to be common, even though the true similarity level is hard to assess. When a teacher in a programming course sets a common problem, all students have to work on the same problem. Some students write the source code on their own, while others just obtain the code and change the variable names, the order of statements, or the functions and variables of a class. Such modifications in source code are difficult to catch. There are two categories of source code modification: lexical changes and structural changes. Lexical changes can be made without any prior programming knowledge. Structural changes require prior knowledge of the programming language. Changing the number of iterations, the conditional statements, or the order of statements, converting a procedure to a function or vice versa, and adding comments are structural changes.
For the code in Fig. 1, one can arrive at the same logic without ever having seen this code. This is certainly not plagiarism. Such a scenario can be handled by putting constraints on the size of the matched code, for example, by treating two codes as copied only if n consecutive lines are similar. We need a system that calculates the percentage of similarity between two Java files. We propose a plagiarism detection system, based on a novel normalization process, that assesses the originality of a student's code by comparing the input code with the original code. Teachers may use it to check whether a student has committed plagiarism. The plagiarism is estimated for a pair of Java files. If the percentage of plagiarism is below a specified threshold, the input code is acceptable; otherwise it is not.
Fig. 1 Sample code
The rest of the paper is organized as follows. Section 2 reviews previous work on plagiarism detection. Section 3 presents the proposed work. The results are discussed in Sect. 4. Section 5 concludes the paper.
2 Related Work
Many researchers have proposed methods for plagiarism detection in text and programming code [3–6], while others have compared different plagiarism detection tools [1, 2, 7]. Nurhayati and Busman [8] applied the Levenshtein Distance (LD) algorithm to plagiarism detection in documents and developed software for Android smartphones. The LD algorithm yields a string metric, which is one way to measure the distance between two strings. In [9], the authors created an application that uses the LD algorithm to identify similarity in Java code. A semantics-based technique for uncovering plagiarism between C++ and Java code was proposed in [10]. It is a multimedia-based e-learning and smart assessment method. The input code is transformed into tokens, semantic similarity is determined token by token, and the semantic similarity is then estimated for the whole input code. Many similarity detection algorithms exist in the literature. Based on these algorithms, researchers developed a similarity detection system referred to as SCSDS [11]. SCSDS was slower than existing methods, and fusing several similarity detection algorithms made its speed and performance even worse, so SCSDS requires speed and performance improvement. In [12], the plagiarism detection system considered only text documents for plagiarism tasks; no consideration was given to the syntactical structure of the formal programming language. The authors normalized commonly used identifiers to detect pairs of programs that have the same objective. They showed that removing these normalized operations improves the system.
3 Proposed Method
The proposed system aims to estimate the plagiarism percentage of a given input code. First, the user supplies the input code that is to be checked for plagiarism. The already available codes, referred to here as original codes, are used for comparison. The two codes are stored in separate variables and then converted to a form that can easily be used for detecting plagiarism. This is done in the normalization step. The following steps are performed to normalize the code:
• Removing white spaces
• Removing comments
• Removing all the keywords
• Removing all the operators
• Replacing all the identifiers with **identifier**
• Sorting.
Removing white spaces
Generally, there are white spaces before and after an operator to enhance readability. When code is copied from an online platform, users often alter these spaces so that the code does not look copied. Extra spaces are therefore not needed, and they only increase the length of our string. As the length of the string increases, so does the cost of the LD algorithm, whose complexity is $O(n^2)$.
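The paper does not show its code for this step. A minimal sketch, assuming that every whitespace character (spaces, tabs, and line breaks) is to be deleted, could look as follows; the class and method names are hypothetical.

```java
// Hypothetical helper: the paper does not list its whitespace-removal code.
final class WhitespaceStripper {

    // Deletes every whitespace character: spaces, tabs, and line breaks.
    static String strip(String code) {
        return code.replaceAll("\\s+", "");
    }
}
```

For example, `WhitespaceStripper.strip("int a = b  +\tc;\n")` yields `inta=b+c;`. Deleting whitespace also fuses adjacent tokens, so a practical pipeline might apply the token-based steps (keywords, identifiers) before this one.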
Removing comments
Comments do not affect the actual functioning of the code; they are merely there to aid understanding of long and complex code. We remove comments because someone can add an extra comment or edit a copied comment. Since the LD algorithm checks similarity character by character, such changes would affect the result of our plagiarism detection tool. The following regular expression is used to detect and remove the comments.
replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)", "")
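As a sketch of how the comment-removal expression might be used, the following wraps it in a small helper, written with the backslash escapes that a Java string literal requires. The class and method names are hypothetical.

```java
// Hypothetical helper around the comment-removal regular expression.
final class CommentStripper {

    // First alternative: block comments (/* ... */), which may span lines.
    // Second alternative: line comments (// up to the end of the line).
    private static final String COMMENT_REGEX =
            "(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)";

    static String strip(String code) {
        return code.replaceAll(COMMENT_REGEX, "");
    }
}
```

Note that a purely regex-based approach does not recognize string literals, so text such as `"http://example"` inside a string would be cut at the `//`. Handling that case would need a real tokenizer.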
Removing all the keywords
This is the most significant step. It involves removing all the keywords that belong to a language. Since our proposal checks plagiarism only in Java code, we remove all the keywords of the Java language. We remove keywords because different solutions to the same program will generally use the same data types and built-in functions. These only increase the length of our string, which again affects the running time because of the $O(n^2)$ complexity. So, to save time and space, we remove keywords. Sometimes a user who has understood the copied code edits it, using different data types and functions, to evade detection, although the code is still copied. Removing all the keywords helps in detecting the genuine similarity index.
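One way this step could be sketched is with a word-boundary regular expression built from a keyword list. The list below is an abbreviated, illustrative subset rather than the full set of Java keywords, and the class name is hypothetical.

```java
import java.util.List;

// Hypothetical helper: removes whole-word occurrences of Java keywords.
final class KeywordStripper {

    // Illustrative subset only; a complete tool would list every Java keyword.
    private static final List<String> KEYWORDS = List.of(
            "public", "private", "static", "final", "class", "void",
            "int", "double", "boolean", "if", "else", "for", "while",
            "return", "new");

    // \b anchors keep "int" from matching inside words such as "printer".
    private static final String KEYWORD_REGEX =
            "\\b(?:" + String.join("|", KEYWORDS) + ")\\b";

    static String strip(String code) {
        return code.replaceAll(KEYWORD_REGEX, "");
    }
}
```

The word boundaries only work while tokens are still separated, which is why this sketch assumes it runs before whitespace is deleted.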
Removing all the operators
Generally, codes for the same program use the same type and number of operators even if they are not copied. Operators only increase the time and space required by our tool. To avoid this, we remove all the operators.
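A minimal sketch of operator removal, assuming that operators are deleted character by character, is shown below. The set of operator symbols is an assumption, since the paper does not list it, and the class name is hypothetical.

```java
// Hypothetical helper: deletes the characters that make up Java operators.
final class OperatorStripper {

    // Assumed operator characters: + - * / % = < > ! & | ^ ~ ? :
    private static final String OPERATOR_REGEX = "[+\\-*/%=<>!&|^~?:]";

    static String strip(String code) {
        return code.replaceAll(OPERATOR_REGEX, "");
    }
}
```

Because `/` and `*` are deleted here, this step has to run after comment removal; otherwise the comment delimiters would be destroyed before they can be matched.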
Replacing all the identifiers with **identifier**
Users generally change the names of the identifiers in a code to dodge plagiarism detection. So, we rename all the identifiers in both codes, that is, the original code and the code to be checked, to “**identifier**”.
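The renaming could be sketched as follows, assuming that keywords have already been removed, so that every remaining word starting with a letter or underscore is an identifier. The class name and the identifier pattern are assumptions for illustration.

```java
// Hypothetical helper: replaces every identifier with a fixed placeholder.
final class IdentifierNormalizer {

    // Placeholder text used by the paper.
    private static final String PLACEHOLDER = "**identifier**";

    // A word that starts with a letter or underscore; numeric literals
    // such as 0x1F are left alone because they start with a digit.
    private static final String IDENTIFIER_REGEX = "\\b[A-Za-z_][A-Za-z0-9_]*\\b";

    static String normalize(String code) {
        return code.replaceAll(IDENTIFIER_REGEX, PLACEHOLDER);
    }
}
```

For example, `IdentifierNormalizer.normalize("sum(a, b1) + 0x1F")` yields `**identifier**(**identifier**, **identifier**) + 0x1F`. If keywords were still present, they would be replaced as well.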
Sorting
Both strings, the one containing the original code and the one containing the code to be checked, are sorted alphabetically. A user can change the position of copied code (function, class, etc.) and sometimes also the position of statements. Sorting both strings therefore lets us detect plagiarism even if the user has changed the position of the copied code. The result of sorting is stored separately for the original code and the code to be checked. This completes the normalization step.
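The paper does not state the granularity at which the strings are sorted. The sketch below takes one possible reading, sorting the individual characters of the normalized string; sorting whole statements would be an alternative. The class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: sorts the characters of a normalized string.
final class CharSorter {

    // Assumption: "sorting alphabetically" means ordering the characters
    // by their char value, so reordered code yields the same string.
    static String sort(String normalized) {
        char[] chars = normalized.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }
}
```

Under this reading, `CharSorter.sort("cba")` and `CharSorter.sort("bca")` both yield `abc`, which illustrates why reordering copied code no longer changes the compared strings.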
After performing all these steps, we obtain the normalized code, which is again stored in a variable. We then apply the LD algorithm [8], store its result in a variable, and use that result to calculate the plagiarized value.
Levenshtein Algorithm
The LD algorithm [8] computes a distance that measures the dissimilarity between two sequences. This distance is referred to as the Levenshtein distance or edit distance; the term edit distance may also denote a larger family of distance metrics. Between two words, it is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.
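The distance can be computed with the standard dynamic-programming scheme. The following is a minimal sketch of that textbook algorithm, not necessarily the authors' implementation; it keeps only two rows of the table, and the class name is hypothetical.

```java
// Textbook dynamic-programming Levenshtein distance (two-row variant).
final class Levenshtein {

    static int distance(String source, String target) {
        int[] prev = new int[target.length() + 1];
        int[] curr = new int[target.length() + 1];

        // Turning the empty prefix into target[0..j) takes j insertions.
        for (int j = 0; j <= target.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= source.length(); i++) {
            curr[0] = i; // deleting i characters yields the empty prefix
            for (int j = 1; j <= target.length(); j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insertion
                                 prev[j] + 1),      // deletion
                        prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[target.length()];
    }
}
```

For example, `Levenshtein.distance("kitten", "sitting")` returns 3. The running time is proportional to the product of the two string lengths, which matches the quadratic cost the normalization steps try to reduce.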
Calculating Plagiarism
After normalization, we have the normalized codes as strings, both for the original code and for the code to be checked. The string for the original code is referred to as the source string (δ), and the string for the code to be checked is referred to as the target string (ε). We then feed these two strings to the LD algorithm. It gives us a numeric value that corresponds to the difference between the two strings. This is called the LD distance ($\bar{d}$) and is defined as:
(1)
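The display for Eq. (1) is not legible in this copy. As a reference point, the textbook recurrence for the Levenshtein distance between δ and ε can be written as below. That the authors' Eq. (1) takes exactly this form is an assumption. Here $\delta_i$ and $\varepsilon_j$ denote the i-th and j-th characters, and the bracket equals 1 when they differ and 0 otherwise.

```latex
\mathrm{lev}(i, j) =
\begin{cases}
  \max(i, j) & \text{if } \min(i, j) = 0, \\
  \min
  \begin{cases}
    \mathrm{lev}(i-1, j) + 1 \\
    \mathrm{lev}(i, j-1) + 1 \\
    \mathrm{lev}(i-1, j-1) + [\delta_i \neq \varepsilon_j]
  \end{cases} & \text{otherwise,}
\end{cases}
\qquad
\bar{d} = \mathrm{lev}(|\delta|, |\varepsilon|)
```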
Now, using the plagiarized value formula, we can calculate the plagiarism between these two strings. The plagiarized value (ƥ) can be calculated as:
(2)
where $\bar{d}$ is the LD distance, δ represents the original code, ε is the code to be checked for plagiarism, and max(δ, ε) is the larger of the lengths of δ and ε. Figure 2 shows the working of the proposed plagiarism detection system.
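The display for Eq. (2) is likewise not legible in this copy. The sketch below assumes the commonly used normalization ƥ = (1 − d̄ / max(|δ|, |ε|)) × 100, which uses exactly the quantities named above; whether this is the authors' formula is an assumption. The class and method names are hypothetical.

```java
// Hypothetical helper: turns an edit distance into a percentage.
final class PlagiarismScore {

    // Assumed formula: (1 - distance / max(sourceLength, targetLength)) * 100.
    static double plagiarizedValue(int distance, int sourceLength, int targetLength) {
        int maxLength = Math.max(sourceLength, targetLength);
        if (maxLength == 0) {
            return 100.0; // two empty strings are identical
        }
        return (1.0 - (double) distance / maxLength) * 100.0;
    }
}
```

With a distance of 3 between strings of lengths 6 and 7, this gives about 57.14%. Under this assumed form, identical strings score 100% and strings with nothing in common score 0%.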
Fig. 2 Framework of the proposed plagiarism detection system
4 Results and Findings
To estimate the plagiarism percentage of a given input code, the user first supplies the input code that is to be checked for plagiarism along with the original code. Figures 3 and 4 show samples of the original code and the code to be checked, respectively. These codes are passed to the normalization step, which produces the normalized codes. The LD algorithm [8] is then applied to the normalized codes, and the plagiarized value is estimated from its result. Figure 5 shows the user interface of the proposed system. Figure 6 shows the interface after the codes have been entered in the specified areas. Figure 7 shows the plagiarism estimated after clicking the check fraud button. From Fig. 8, it can be observed that standard plagiarism detection software is not suitable for assessing the originality of Java code, since programmers necessarily share the common keywords of the programming language. Therefore, merely detecting identical words is not the correct criterion for assessing the originality of source code. As can be seen from Figs. 7 and 8, the standard software (Turnitin) gives a similarity index of 78%, whereas the proposed system gives a similarity index of 51% for the same code. The similarity indexes calculated by the proposed method and the standard software are compared in Table 1 and in Fig. 9. Thus, it can be stated that the proposed system is more suitable than other software for originality detection of Java source code.
Fig. 3 Sample original code
Fig. 4 Sample code to be checked
Fig. 5 User interface
Fig. 6 After filling both the text areas accordingly
Fig. 7 After clicking on check fraud
Fig. 8 Plagiarism report of a standard plagiarism detection software
Table 1 Comparison of similarity indexes of proposed system and existing software
Input     Similarity index (proposed system) (%)     Similarity index (existing software) (%)
Code 1    51.85                                      78
Code 2    54.76                                      80
Code 3    57.29                                      83
Code 4    53.26                                      81
Fig. 9 Comparison of similarity indexes of proposed system and existing software
5 Conclusion
We have proposed a tool that can be used to check whether an input Java code is plagiarized. To carry out plagiarism detection, the code is first preprocessed through normalization, which consists of the following steps: removing white spaces, removing comments, removing all the keywords, removing all the operators, replacing all the identifiers with **identifier**, and sorting. The normalized code is then fed into the LD algorithm to obtain the LD distance, which is used to calculate the plagiarized value. The proposed tool works only on Java source code; it could be extended to other programming languages. The plagiarized value has been calculated for four codes with the proposed system as well as with the existing system. From the results, it can be concluded that the proposed system is more suitable than the existing system for originality detection of Java source code.
References
1. Foltýnek T, Meuschke N, Gipp B (2019) Academic plagiarism detection: a systematic literature review. ACM Comput Surv (CSUR) 52(6):1–42
2. Naik RR, Landge MB, Mahender CN (2015) A review on plagiarism detection tools. Int J Comput Appl 125(11)
3. Ghanem B, Arafeh L, Rosso P, Sánchez-Vega F (2018) HYPLAG: hybrid Arabic text plagiarism detection system. In: International conference on applications of natural language to information systems. Springer, Cham, pp 315–323
4. Jadalla A, Elnagar A (2008) PDE4Java: plagiarism detection engine for java, source code: a clustering approach. IJBIDM 3(2):121–135
5. Alzahrani SM, Salim N, Abraham A (2011) Understanding plagiarism linguistic patterns, textual features, and detection methods. IEEE Trans Syst Man Cybern Part C (Appl Rev) 42(2):133–149
6. Sulistiani L, Karnalim O (2019) ES-Plag: efficient and sensitive source code plagiarism detection tool for academic environment. Comput Appl Eng Educ 27(1):166–182
7. Ali AM, Abdulla HM, Snasel V (2011) Overview and comparison of plagiarism detection tools. In: DATESO, pp 161–172
8. Nurhayati B, Busman B (2017) Development of document plagiarism detection software using levensthein distance algorithm on Android smartphone. In: 2017 5th International conference on cyber and IT service management (CITSM), pp 1–6
9. Liaqat AG, Ahmad A (2011) Plagiarism detection in java code
10. Ullah F, Wang J, Farhan M, Jabbar S, Wu Z, Khalid S (2018) Plagiarism detection in students’ programming assignments based on semantics: multimedia e-learning based smart assessment methodology. In: Multimedia tools and applications, pp 1–18
11. Ðurić Z, Gašević D (2013) A source code similarity system for plagiarism detection. Comput J 56(1):70–86
12. Heblikar S, Sharma P, Munnangi M, Bankapur C (2015) Normalization based stop-word approach to source code plagiarism detection. In: FIRE workshops, pp 6–9

A-Guide-to-Reading-and-Writing-Japanese.pdf.pdf
 
Associating to Create Unique Tourist Experiences of Small Wineries in Contine...
Associating to Create Unique Tourist Experiences of Small Wineries in Contine...Associating to Create Unique Tourist Experiences of Small Wineries in Contine...
Associating to Create Unique Tourist Experiences of Small Wineries in Contine...
 
Academic Reference Management.pdf
Academic Reference Management.pdfAcademic Reference Management.pdf
Academic Reference Management.pdf
 

Recently uploaded

Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 

Recently uploaded (20)

Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 

A Tool to Detect Plagiarism in Java Source Code.pdf

A Tool to Detect Plagiarism in Java Source Code

Swati Srivastava, Akshit Rai, and Mahima Varshney

Abstract Plagiarism occurs when an author uses another author's intellectual ideas without permission. In academia, students regularly submit assignments in the form of program code or text documents. This work focuses on program code. A major concern is that students may copy code from a source without properly crediting the original writer or programmer. Of the many programming languages in use, such as C, C++, Java, and Python, this paper deals with Java code files. The objective of this work is to estimate the percentage of plagiarism in a given input program.

Keywords Plagiarism · Java code · Normalization · Levenshtein distance · Similarity index

1 Introduction

The act of copying a person's work without permission is referred to as plagiarism. It is like stealing another person's car, watch, cell phone, or other gadgets, which is punishable by law. Likewise, stealing an intellectual idea is considered unlawful. This does not mean that scholars should not consult other works or sources for reference. Enhancing one's knowledge by drawing on the opinions and ideas of experts is a good thing. What matters most is that the sources and references are properly credited.

S. Srivastava (B) · A. Rai · M. Varshney
Department of Computer Engineering and Applications, GLA University, Mathura, Uttar Pradesh, India
e-mail: swati.srivastava@gla.ac.in
A. Rai
e-mail: akshit.rai_cs16@gla.ac.in
M. Varshney
e-mail: mahima.varshney_cs16@gla.ac.in

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
G. Ranganathan et al. (eds.), Inventive Communication and Computational Technologies, Lecture Notes in Networks and Systems 145, https://doi.org/10.1007/978-981-15-7345-3_20
Plagiarism in coding is not a new phenomenon. Researchers have studied it earlier to gauge the severity of the problem [1, 2]. In programming assignments, plagiarism covers not only the copying of source code but also the copying of comments and input data. Students plagiarize for many reasons; sometimes they are simply too lazy to write their own code. Plagiarism in coding is easy to commit but hard to detect, since similar code tends to be written for the same application. Students copy all or part of a program from one or more sources and submit the copy as their own work. This includes students who work as a team and submit similar work. Such plagiarism is believed to be common, even though its true extent is hard to assess.

When a teacher in a programming course assigns a common problem, all students work on the same problem. Some students write the source code on their own, while others obtain someone else's code and change the variable names, the order of statements, or the functions and variables of a class. Such modifications are difficult to catch. Source code modifications fall into two categories: lexical changes and structural changes. Lexical changes can be made without any programming knowledge. Structural changes require knowledge of the programming language; they include changing the number of iterations, changing conditional statements, reordering statements, converting a procedure to a function and vice versa, and adding comments.

For short code such as that in Fig. 1, two students can arrive at the same logic without one having seen the other's code, and this is certainly not plagiarism. Such cases can be handled by placing constraints on the size of the code, for example by treating two programs as copied only if n consecutive lines are similar in both.

We need a system to calculate the similarity percentage of code between two Java files. We propose a plagiarism detection system based on a novel normalization process that assesses the originality of a student's code by comparing the input code with the original code. Teachers may use it to detect whether a student has committed plagiarism. Plagiarism is estimated for a pair of Java files: if the plagiarism percentage is below a specified threshold, the input code is accepted; otherwise it is not.

Fig. 1 Sample code
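The size constraint mentioned above (n consecutive similar lines) could be checked along the following lines. This is only a minimal sketch: the class and method names are hypothetical, and "similar" is assumed here to mean that two lines are equal after trimming surrounding white space, which the paper does not specify.

```java
import java.util.List;

// Hypothetical helper, not part of the paper's tool.
final class ConsecutiveLineCheck {

    /** Length of the longest run of consecutive lines that occurs in both files. */
    static int longestCommonRun(List<String> a, List<String> b) {
        int best = 0;
        // prev[j] = length of the common run ending at line i-1 of a and line j of b
        int[] prev = new int[b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            int[] curr = new int[b.size() + 1];
            for (int j = 1; j <= b.size(); j++) {
                // Assumption: two lines are "similar" when equal after trimming.
                if (a.get(i - 1).trim().equals(b.get(j - 1).trim())) {
                    curr[j] = prev[j - 1] + 1;
                    best = Math.max(best, curr[j]);
                }
            }
            prev = curr;
        }
        return best;
    }

    /** True when at least n consecutive lines are shared by the two files. */
    static boolean exceedsLimit(List<String> a, List<String> b, int n) {
        return longestCommonRun(a, b) >= n;
    }
}
```

Blank lines also count as matching lines in this sketch, so a practical check would probably drop them before comparing.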
The rest of the paper is organized as follows. Section 2 reviews previous work on plagiarism detection. Section 3 presents the proposed method. The results are discussed in Sect. 4. Section 5 concludes the paper.

2 Related Work

Many researchers have proposed methods for plagiarism detection in text and programming code [3, 4, 5, 6], while others have compared different plagiarism detection tools [7, 1, 2]. Nurhayati and Busman [8] used the Levenshtein Distance (LD) algorithm for plagiarism detection in documents and developed software for Android smartphones. The LD algorithm yields a string metric, which is one way to measure the distance between two strings. In [9], the authors created an application that uses the LD algorithm to identify similarity in Java code. A semantics-based technique for uncovering plagiarism between C++ and Java code was proposed in [10]. It is a multimedia-based e-learning and smart assessment method. The input code is transformed into tokens, semantic similarity is determined token by token, and the semantic similarity of the whole input code is then estimated. Many similarity detection algorithms exist in the literature. Building on these algorithms, researchers developed a similarity detection system called SCSDS [11]. SCSDS was slower than existing methods, and fusing several similarity detection algorithms degraded its speed and performance further, so it needed improvement on both counts. The plagiarism detection system in [12] considered only text documents and gave no consideration to the syntactic structure of a formal programming language. The authors normalized commonly used identifiers to detect pairs of programs with the same objective, and showed that removing these normalized operations improves the system.
3 Proposed Method

The proposed system aims to estimate the plagiarism percentage of a given input code. First, the user supplies the input code to be checked for plagiarism. The codes that are already available, referred to here as original codes, are used for comparison. The two codes are stored in separate variables and are then converted to a form that is convenient for detecting plagiarism. This is done in the normalization step. The following steps are performed to normalize the code:

• Removing white spaces
• Removing comments
• Removing all the keywords
• Removing all the operators
• Replacing all the identifiers with **identifier**
• Sorting.

Removing white spaces

White space is generally placed before and after operators to improve readability. Users who copy code from an online platform often adjust this spacing so that the code does not look copied. The extra spaces carry no information and only increase the length of the string, which in turn slows the LD algorithm, whose complexity is O(n²).

Removing comments

Comments do not affect the behaviour of the code; they only help the reader understand long or complex code. We remove them because a user can add an extra comment or edit a copied one, and since the LD algorithm compares strings character by character, such edits would distort the result of the plagiarism detection tool. The following regular expression is used to detect the comments.

replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)", "")

Removing all the keywords

This is the most significant step. It removes all the keywords of the language. Our proposal checks plagiarism only in Java code, so we remove all Java keywords. Programs that solve the same problem generally use the same data types and built-in functions, so the keywords mostly add to the length of the string, which again affects the O(n²) running time. Removing them saves time and space. In addition, users who understand the copied code sometimes substitute different data types and functions to disguise the copy. Removing all the keywords helps reveal the genuine similarity index.
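The three steps described so far might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class `BasicNormalizer` and its method names are hypothetical, the keyword list is the set of Java reserved words, and the sketch removes comments and keywords before white space (unlike the order of the list above) because line comments end at a line break and keyword matching relies on word boundaries.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the first three normalization steps.
final class BasicNormalizer {

    // Block comments (/* ... */) and line comments (// ...), as in the text.
    private static final Pattern COMMENTS =
        Pattern.compile("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)");

    // Java reserved words, matched only as whole words.
    private static final Pattern KEYWORDS = Pattern.compile(
        "\\b(?:abstract|assert|boolean|break|byte|case|catch|char|class|const|continue"
        + "|default|do|double|else|enum|extends|final|finally|float|for|goto|if"
        + "|implements|import|instanceof|int|interface|long|native|new|package"
        + "|private|protected|public|return|short|static|strictfp|super|switch"
        + "|synchronized|this|throw|throws|transient|try|void|volatile|while)\\b");

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String removeComments(String code) {
        return COMMENTS.matcher(code).replaceAll("");
    }

    static String removeKeywords(String code) {
        return KEYWORDS.matcher(code).replaceAll("");
    }

    static String removeWhitespace(String code) {
        return WHITESPACE.matcher(code).replaceAll("");
    }

    /** Assumed order: comments, then keywords, then white space. */
    static String normalize(String code) {
        return removeWhitespace(removeKeywords(removeComments(code)));
    }
}
```

A regular expression cannot tell a comment from a string literal that happens to contain `//` or `/*`, so this sketch would also strip such text; a tokenizer-based approach would be needed to avoid that.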
Removing all the operators

Programs that solve the same problem generally use the same types and numbers of operators even when they are not copied. The operators only add to the time and space cost of the comparison, so we remove them all.

Replacing all the identifiers with **identifier**

Users generally rename the identifiers in a code to evade plagiarism detection. We therefore replace every identifier in both codes, the original code and the code to be checked, with "**identifier**".
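The two steps above could look roughly like this in Java. It is a sketch under assumptions rather than the paper's code: the class `TokenNormalizer` is hypothetical, the operator set is a single character class chosen for illustration, and an identifier is taken to be any remaining word that starts with a letter or underscore (which presumes keywords were removed earlier).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating operator removal and identifier replacement.
final class TokenNormalizer {

    // Assumed operator characters; compound operators such as "+=" are
    // covered because each of their characters is removed.
    private static final Pattern OPERATORS = Pattern.compile("[-+*/%=<>!&|^~?:]");

    // Assumed identifier shape: a word starting with a letter or underscore.
    private static final Pattern IDENTIFIERS =
        Pattern.compile("\\b[A-Za-z_][A-Za-z0-9_]*\\b");

    private static final String PLACEHOLDER = "**identifier**";

    static String removeOperators(String code) {
        return OPERATORS.matcher(code).replaceAll("");
    }

    static String replaceIdentifiers(String code) {
        // quoteReplacement guards against '$' or '\' being read as group references.
        return IDENTIFIERS.matcher(code).replaceAll(Matcher.quoteReplacement(PLACEHOLDER));
    }
}
```

Operators have to be removed before the identifiers are replaced, as in the order given in the text; otherwise the asterisks of the placeholder would themselves be stripped. Note also that once white space and operators are gone, neighbouring names such as `a+b` merge into a single word `ab`, which this sketch then treats as one identifier.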
Sorting

Both strings, the one containing the original code and the one containing the code to be checked, are sorted alphabetically. A user can move copied code (a function, a class, etc.) to a different position, and sometimes also reorders statements. Sorting both strings, and storing each sorted result separately, allows plagiarism to be detected even when the position of the copied code has changed.

This completes the normalization step. The normalized code obtained after these steps is again stored in a variable. We then apply the LD algorithm [8], store its result in a variable, and use that result to calculate the plagiarized value.

Levenshtein Algorithm

The LD algorithm [8] computes a distance that measures the dissimilarity between two sequences. This distance is referred to as the Levenshtein distance or edit distance; the term edit distance may also denote a larger family of distance metrics. The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.

Calculating Plagiarism

After normalization, both the original code and the code to be checked are available as normalized strings. The original code is referred to as the source string (δ) and the code to be checked as the target string (ε). These two strings are fed to the LD algorithm, which returns a numeric value corresponding to the difference between them. This value is called the LD distance (d̄) and is defined as:

\[
\bar{d} = \operatorname{lev}_{\delta,\varepsilon}(|\delta|, |\varepsilon|), \qquad
\operatorname{lev}_{\delta,\varepsilon}(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0, \\
\min \begin{cases}
\operatorname{lev}_{\delta,\varepsilon}(i - 1, j) + 1 \\
\operatorname{lev}_{\delta,\varepsilon}(i, j - 1) + 1 \\
\operatorname{lev}_{\delta,\varepsilon}(i - 1, j - 1) + [\delta_i \neq \varepsilon_j]
\end{cases} & \text{otherwise,}
\end{cases}
\tag{1}
\]

where δ_i and ε_j are the i-th and j-th characters of the source and target strings, and [δ_i ≠ ε_j] is 1 when the two characters differ and 0 otherwise.

Using the LD distance, the plagiarized value (ƥ) between the two strings is calculated as:

\[
\text{ƥ} = \left(1 - \frac{\bar{d}}{\max(\delta, \varepsilon)}\right) \times 100
\tag{2}
\]

where d̄ is the LD distance, δ represents the original code, ε is the code to be checked for plagiarism, and max(δ, ε) is the maximum of the lengths of δ and ε.
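The sorting, distance, and plagiarized-value steps described above can be illustrated with a short Java sketch. Several points are assumptions rather than details confirmed by the paper: the class `LevenshteinSimilarity` and its method names are hypothetical, "sorting alphabetically" is read as sorting the characters of the normalized string, and the plagiarized value is taken to be (1 − d / max length) × 100, so that identical strings score 100.

```java
import java.util.Arrays;

// Hypothetical helper; a sketch of the comparison stage, not the authors' code.
final class LevenshteinSimilarity {

    /** Assumption: "sorting alphabetically" means sorting the characters of the string. */
    static String sortCharacters(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    /** Classic dynamic-programming Levenshtein distance using two rows. */
    static int distance(String source, String target) {
        int[] prev = new int[target.length() + 1];
        int[] curr = new int[target.length() + 1];
        for (int j = 0; j <= target.length(); j++) {
            prev[j] = j; // turning an empty prefix into j characters takes j insertions
        }
        for (int i = 1; i <= source.length(); i++) {
            curr[0] = i; // turning i characters into an empty prefix takes i deletions
            for (int j = 1; j <= target.length(); j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[target.length()];
    }

    /** Assumed form of the plagiarized value: (1 - d / max length) * 100. */
    static double plagiarizedValue(String source, String target) {
        int maxLen = Math.max(source.length(), target.length());
        if (maxLen == 0) {
            return 100.0; // two empty strings are treated as identical
        }
        return (1.0 - (double) distance(source, target) / maxLen) * 100.0;
    }
}
```

Keeping only two rows of the table leaves the running time at O(n²) but reduces the memory needed to O(n), which matters for the long strings that whole source files produce.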
Figure 2 shows the working of the proposed plagiarism detection system.
Fig. 2 Framework of the proposed plagiarism detection system
4 Results and Findings

To estimate the plagiarism percentage of a given input code, the user first supplies the input code to be checked along with the original code. Figures 3 and 4 show samples of the original code and the code to be checked, respectively. Both codes are passed through the normalization step, which produces the normalized code. The LD algorithm [8] is then applied to the normalized code, and its result is used to estimate the plagiarized value. Figure 5 shows the user interface of the proposed system. Figure 6 shows the interface after the code has been entered in the specified areas. Figure 7 shows the plagiarism estimated after clicking the check fraud button.

Figure 8 shows that standard plagiarism detection software is not well suited to assessing the originality of Java code. Because programmers share the common keywords of a programming language, merely detecting identical words is not a sound criterion for judging the originality of source code. As Figs. 7 and 8 show, the standard software (Turnitin) gives a similarity index of 78%, whereas the proposed system gives a similarity index of 51% for the same code. Table 1 compares the similarity indexes calculated by the proposed method and the standard software, and Fig. 9 shows the same comparison. Thus, the proposed system is more suitable than other software for detecting the originality of Java source code.

Fig. 3 Sample original code
Fig. 4 Sample code to be checked
Fig. 5 User interface
Fig. 6 After filling both the text areas accordingly
Fig. 7 After clicking on check fraud
Fig. 8 Plagiarism report of a standard plagiarism detection software

Table 1 Comparison of similarity indexes of proposed system and existing software

Input     Similarity index (proposed system) (%)     Similarity index (existing software) (%)
Code 1    51.85                                      78
Code 2    54.76                                      80
Code 3    57.29                                      83
Code 4    53.26                                      81

Fig. 9 Comparison of similarity indexes of proposed system and existing software
5 Conclusion

We have proposed a tool that can be used efficiently to check whether an input Java code is plagiarized. To detect plagiarism, the code is first preprocessed through normalization, which consists of the following steps: removing white spaces, removing comments, removing all the keywords, removing all the operators, replacing all the identifiers with **identifier**, and sorting. The normalized code is then fed to the LD algorithm to obtain the LD distance, and the value returned by the LD algorithm is used to calculate the plagiarized value. The proposed tool currently works only on Java source code; it could be extended to other programming languages. The plagiarized value was calculated for four codes with both the proposed system and the existing system. The results indicate that the proposed system is more suitable than the existing system for detecting the originality of Java source code.

References

1. Foltýnek T, Meuschke N, Gipp B (2019) Academic plagiarism detection: a systematic literature review. ACM Comput Surv (CSUR) 52(6):1–42
2. Naik RR, Landge MB, Mahender CN (2015) A review on plagiarism detection tools. Int J Comput Appl 125(11)
3. Ghanem B, Arafeh L, Rosso P, Sánchez-Vega F (2018) HYPLAG: hybrid Arabic text plagiarism detection system. In: International conference on applications of natural language to information systems. Springer, Cham, pp 315–323
4. Jadalla A, Elnagar A (2008) PDE4Java: plagiarism detection engine for Java source code: a clustering approach. IJBIDM 3(2):121–135
5. Alzahrani SM, Salim N, Abraham A (2011) Understanding plagiarism linguistic patterns, textual features, and detection methods. IEEE Trans Syst Man Cybern Part C (Appl Rev) 42(2):133–149
6. Sulistiani L, Karnalim O (2019) ES-Plag: efficient and sensitive source code plagiarism detection tool for academic environment. Comput Appl Eng Educ 27(1):166–182
7. Ali AM, Abdulla HM, Snasel V (2011) Overview and comparison of plagiarism detection tools. In: DATESO, pp 161–172
8. Nurhayati B, Busman B (2017) Development of document plagiarism detection software using levensthein distance algorithm on Android smartphone. In: 2017 5th International conference on cyber and IT service management (CITSM), pp 1–6
9. Liaqat AG, Ahmad A (2011) Plagiarism detection in Java code
10. Ullah F, Wang J, Farhan M, Jabbar S, Wu Z, Khalid S (2018) Plagiarism detection in students' programming assignments based on semantics: multimedia e-learning based smart assessment methodology. In: Multimedia tools and applications, pp 1–18
11. Ðurić Z, Gašević D (2013) A source code similarity system for plagiarism detection. Comput J 56(1):70–86
12. Heblikar S, Sharma P, Munnangi M, Bankapur C (2015) Normalization based stop-word approach to source code plagiarism detection. In: FIRE workshops, pp 6–9