A Review of Plagiarism Detection Based On
Lexical and Semantic Approach
Shameem Yousuf
Student
Department of Information Technology,
Central University of Kashmir
Srinagar, Jammu and Kashmir, India
Email: bhat.shameem@gmail.com
Muzamil Ahmad
Student
Department of Information Technology,
Central University of Kashmir
Srinagar, Jammu and Kashmir, India
Email: muzamilahmad87@gmail.com
Sheikh Nasrullah
Assistant Professor
Department of Information Technology,
Central University of Kashmir
Srinagar, Jammu and Kashmir, India
Email: nasrullah@cukashmir.ac.in
Abstract—Due to easy availability of documents over the web,
plagiarism has become a serious problem for teachers, researchers
and publishers. In this paper, we discuss the plagiarism
process, its types and detection methodologies. Further, we
classify the different plagiarism detection techniques based on
the Lexical and Semantic Approaches. Finally, we present a brief
study of the different tools (offline and online) available for
detecting plagiarism.
I. INTRODUCTION
Plagiarism is the act of claiming authorship of someone
else’s work, wholly or in part, without proper authorization.
With the evolution of the modern web, information is available
to everybody and is just a click away. This information boom
has also brought problems: many people copy and paste material
from digital documents that already exist on the web, and this
cloning of others’ work without giving proper credit to the
author is regarded as plagiarism. Formally, it can be defined as
an act or instance of using or closely imitating the language
and thoughts of another author without authorization and the
representation of that author’s work as one’s own, as by not
crediting the original author.¹
Plagiarism can be found in many areas, including
literature, music, software, scientific articles, research papers,
newspapers, advertisements and websites. A study carried out in
the United States shows that, among 18,000 university students,
almost 40% had plagiarized at least once.²
Plagiarism is a growing challenge in modern society. To
maintain academic integrity, the use of plagiarism detection
tools has become the norm in many higher education
institutions, but the effectiveness of detection depends on the
type of algorithm used and on the obfuscation strategy
employed by the plagiarist to create the plagiarised text.
A. Plagiarising Process
The process of plagiarising is the simple task of reusing
existing content in a way that meets the requirements
of the given task.
¹ http://dictionary.reference.com/browse/plagiarism
² D. McCabe. Research Report of the Center for Academic Integrity.
http://www.academicintegrity.org, 2005
The task includes searching for related content on the web,
in text documents or in any other resources, and then copying
and pasting this content into the newly created document,
using different obfuscation strategies to disguise the
plagiarism from detection. The result is a new plagiarised
version of the existing document, which can be used wherever
required to fulfil the task. The whole process is illustrated
in figure 1.
Fig. 1. The basic steps of Plagiarizing.
B. Types of Plagiarism
Plagiarism is vast and dynamic, i.e. there exist a number
of obfuscation strategies that help to create plagiarised
text. Plagiarism.org has classified and ranked the different
types of plagiarism based on the severity of the intent.
According to plagiarism.org,³ the 10 most common
types of plagiarism are illustrated in figure 2.
³ http://www.plagiarism.org
1) Clone: The plagiarist submits another person’s work,
word for word, as his or her own work.
2) Ctrl-C: The plagiarised text contains significant
portions of the original text without any alterations.
3) Find-Replace: The plagiarist changes keywords in the
text but retains the original content of the source.
4) Remix: The plagiarist paraphrases material from
multiple sources and combines it in a single
document.
5) Recycle: An author reuses his or her own previous work,
without proper citation, to form a new document. This
type is sometimes also called self-plagiarism.
6) Hybrid: The plagiarist copies different passages from
multiple cited sources, without proper citation.
7) Mashup: The plagiarist mixes content copied from
different sources.
8) 404 Error: The plagiarist includes citations to
non-existent sources or inaccurate information about
sources.
9) Aggregator: The paper includes proper citations to its
sources, but contains no original work.
10) Re-Tweet: The author includes proper citations but
relies too closely on the source’s original wording or
structure.
Section II discusses the plagiarism detection methodologies,
Section III reviews plagiarism detection techniques, and
Section IV provides a brief study of various plagiarism
detection tools.
Fig. 2. Types of Plagiarism.
II. PLAGIARISM DETECTION
Plagiarising means reusing someone else’s work without
proper citation and presenting it as one’s own work. Text
plagiarism is one of the oldest forms of plagiarism and remains
difficult to identify in practice to this day. The challenges
in automatic plagiarism detection have been widely discussed
in [1].
To create an automatic plagiarism detection system, we need
an existing corpus and detection techniques. The general
detection process is shown in figure 3. The detection process
is divided into three tasks:
• Pre Processing
• Intermediate Processing
• Post Processing
Pre Processing includes uploading the source document and
retrieving suspicious documents from the corpus based on the
source document. Once the relevant documents have been
acquired, they are passed on for intermediate processing. The
design issue of this stage is how accurately the documents are
retrieved from the corpus.
Intermediate Processing includes the detailed comparison of
the source and the suspicious documents by the detection
algorithm. The design issues of this stage include the running
time and the effectiveness of the comparison logic.
Post Processing, the final stage of the process, includes
preparing the results of the detection task; based on those
results we can decide whether the source document is
plagiarised or not.
Fig. 3. Generic retrieval process.
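The three stages above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a published system: the function names (retrieve_candidates, compare, report), the shared-word retrieval heuristic and the 0.5 threshold are all hypothetical choices made for the example.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase word tokens of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_candidates(query, corpus, min_shared=3):
    """Pre processing: keep corpus documents that share at least
    min_shared distinct words with the uploaded document."""
    query_words = set(words(query))
    return {
        name: text
        for name, text in corpus.items()
        if len(query_words & set(words(text))) >= min_shared
    }


def compare(query, candidate):
    """Intermediate processing: similarity in [0, 1] over word sequences."""
    return SequenceMatcher(None, words(query), words(candidate)).ratio()


def report(query, corpus, threshold=0.5):
    """Post processing: candidates at or above the (assumed) threshold,
    most similar first."""
    candidates = retrieve_candidates(query, corpus)
    scored = [(name, compare(query, text)) for name, text in candidates.items()]
    flagged = [(name, score) for name, score in scored if score >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Toy corpus invented for the example.
corpus = {
    "doc_a": "Plagiarism is the act of claiming authorship over the work of someone else.",
    "doc_b": "The vector space model represents documents as weighted term vectors.",
}
submitted = "Plagiarism is the act of claiming authorship over another person's work."
flagged = report(submitted, corpus)
print(flagged)
```

Here only doc_a survives the retrieval step and is then flagged by the comparison step. A real system would replace each stage with a stronger technique, which is where the design issues named above (retrieval accuracy, running time, comparison quality) come in.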
To detect plagiarism in program source code and in free
text (plain natural language text), the following
methodologies can be used:
• Manual detection: Plagiarism detection is carried out
by humans, who compare and verify the given set of
documents. This process requires considerable
expertise in the particular field, as the reviewer has to
look for the various plagiarism strategies. This
type of detection is suitable for checking class work,
articles and short notes, but it is impractical for
verifying a large number of documents and infeasible
in terms of cost and time.
• Computer aided detection: This type of detection
refers to the process of detecting plagiarism
with the help of a computer system equipped with a
plagiarism detection algorithm. Since automated
detection emerged, different approaches have been
followed to get the job done efficiently. Compared
with manual detection, this approach has both
advantages and disadvantages, and there is always a
trade-off between the performance and the cost
associated with it.
III. PLAGIARISM DETECTION TECHNIQUES
In this section, we review the various plagiarism detection
techniques that have been developed. The techniques are
classified into Source Code Plagiarism Detection and Free
Text Plagiarism Detection.
Fig. 4. Classification of Plagiarism Detection.
A. Source Code Plagiarism Detection
The different techniques that are used to create the plagia-
rised source code include comment removal, identifier renam-
ing, structured constant renaming, and removing debugging
information.
The source code plagiarism detection techniques include:
i) Textual Based Approach: This approach is based on
comparing lines or string sequences in the code, and it
usually works on the raw source code. An example of
this approach is the diff⁴ file comparison utility, which
finds the differences between two files by computing
their longest common subsequence.
ii) Token Based Approach: In this approach the source
code is parsed into a sequence of tokens according to
the rules of the programming language. An example of
such a detector is the Java-based detection tool JPlag
[2].
iii) Tree Based Approach: In this approach the source
code is parsed into a parse tree or Abstract Syntax Tree
(AST). The AST represents the syntactic structure of the
parsed source code with an abstract representation of
every element; a tree matching algorithm is then used to
search for similar subtrees in order to detect code
clones. Clone Digger [3] is an example of such a tool.
iv) PDG based/Semantics-Aware Approach: This approach
aims at analysing the behaviour of the source code rather
than its syntactic features. In this method a highly
abstracted source code representation called a PDG
(Program Dependency Graph) is obtained, which carries the
semantic information of the source code. The PDG captures
the data and control flow and thus ignores the syntactic
structure. After the PDG is obtained, a subgraph matching
algorithm is applied to discover similar subgraphs, which
are then returned as clones. Scorpio [4] is an example of
such a tool.
⁴ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html
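The difference between the textual and the token based approaches can be illustrated with a short Python sketch. It is not how JPlag or diff are implemented: it assumes Python source as input, uses the standard tokenize module as the tokenizer, and compares sequences with difflib.SequenceMatcher. The placeholder names ID, NUM and STR and both sample functions are invented for the example.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Token types that carry no program logic (comments and layout).
SKIPPED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source):
    """Token stream that ignores comments, layout and identifier names."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            result.append("ID")    # any identifier
        elif tok.type == tokenize.NUMBER:
            result.append("NUM")   # any numeric constant
        elif tok.type == tokenize.STRING:
            result.append("STR")   # any string constant
        else:
            result.append(tok.string)  # keywords and operators kept as-is
    return result


def textual_similarity(a, b):
    """Textual approach: compare the raw source line by line."""
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def token_similarity(a, b):
    """Token based approach: compare the normalized token streams."""
    return SequenceMatcher(None, normalized_tokens(a), normalized_tokens(b)).ratio()


original = '''
def total(values):
    # add up all values
    result = 0
    for v in values:
        result += v
    return result
'''

# Same logic after comment removal and identifier renaming.
disguised = '''
def s(xs):
    acc = 0
    for x in xs:
        acc += x
    return acc
'''

print(textual_similarity(original, disguised))
print(token_similarity(original, disguised))
```

Comment removal and identifier renaming leave almost no identical lines, so the line-based score is low, while the normalized token streams are identical and the token score is 1.0. Structural changes such as reordering statements would need the tree or PDG based approaches.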
B. Free Text Plagiarism Detection
i) Lexical Approach: This approach to plagiarism detection
uses the lexical features of the text, which operate at
the character or the word level of the document [5], to
trace plagiarism in suspicious documents. It tries to
enhance standard string matching comparison in order to
detect plagiarism. The pre-processing on which this
approach relies includes tokenization, lowercasing,
punctuation removal and stemming [6], although it varies
from technique to technique. The comparison units adopted
for detecting plagiarism also differ from one technique to
another; they include words, sentences, passages, a
human-defined sliding window or n-grams. Work that has
been done using these techniques includes [7], [8], [9],
[10], [11], [12], [13]. As technology evolves, researchers
are breaking the problem down into simpler forms to obtain
efficient results. To achieve this goal, researchers are
merging different detection approaches, for example using
Natural Language Processing (NLP) to extract key features
of the text.
ii) Semantic Approach: Plagiarists sometimes use
sophisticated obfuscation techniques to create plagiarised
text, such as replacing words or phrases with others of
similar meaning, and in this way they create plagiarised
copies of the original text. The basic detection method
using the semantic approach is illustrated in Figure 5.
Fig. 5. Hypothetical semantic retrieval approach.
In this approach different semantic features (synonyms,
hyponyms, hypernyms, semantic dependencies) [5] are
extracted from the source documents, and these features are
then used to trace plagiarism cases in the corpus and in the
fact database built up from already existing documents.
The semantic approach aims to attain high detection
performance, and should address the issues of polysemy
(the same word referring to different things depending on
the context, like mouse the computer input device and
mouse the rodent) and synonymy (different words referring
to the same thing, like car and automobile) that are not
handled by the lexical (straightforward term matching)
approach.
Chen et al. [14] have explored semantic similarity, using
lexical databases such as Stanford Wordnet⁵ to acquire
synonyms. Another algorithm that can be used to extract
the semantic features of sentences is Latent Dirichlet
Allocation [15]. Another novel way of representing a
document is the RDF framework.⁶ In this approach a
document is represented as RDF triples; an RDF triple has
the format (subject, predicate, object). For example, in
the triple (john, livesIn, ohio), john and ohio are known
as entities and livesIn (the predicate) is a relation
between the two entities. A predicate has a domain
(restricting the set of subjects) and a range (restricting
the set of objects). The predicate livesIn, for instance,
has the domain humans and the range locations; this is
denoted by livesIn(humans, locations), which is a binary
relation. A set of RDF facts is referred to as an ontology
and can be extracted from text documents.
The semantic approach is not widely used because of the
difficulty involved; however, work that has been done in
this area includes [16].
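The contrast between the lexical and the semantic approach can be made concrete with a small Python sketch. It assumes word trigrams as the comparison unit and Jaccard overlap as the similarity measure, and it stands in for a lexical database with a tiny hand-written synonym table (SYNONYMS is purely hypothetical). Stemming is omitted to keep the example short.

```python
import re

# Hypothetical synonym table mapping words to one canonical form;
# a real system would consult a lexical database instead.
SYNONYMS = {"automobile": "car", "purchased": "bought", "quick": "fast"}


def tokens(text, synonyms=None):
    """Lowercasing, punctuation removal and tokenization; optionally
    replace each word by its canonical synonym."""
    found = re.findall(r"[a-z0-9]+", text.lower())
    if synonyms:
        found = [synonyms.get(word, word) for word in found]
    return found


def ngrams(sequence, n=3):
    """Set of word n-grams of a token sequence."""
    return {tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(x, y, n=3, synonyms=None):
    return jaccard(ngrams(tokens(x, synonyms), n), ngrams(tokens(y, synonyms), n))


source = "He bought a fast car last year."
rewritten = "He purchased a quick automobile last year."

lexical = similarity(source, rewritten)                      # plain term matching
semantic = similarity(source, rewritten, synonyms=SYNONYMS)  # after synonym mapping
print(lexical, semantic)
```

With plain trigram matching the two sentences share nothing (score 0.0), while mapping synonyms to a canonical word first makes them identical (score 1.0). The table handles synonymy only; polysemy would require deciding which sense of a word is meant in context, which a simple lookup cannot do.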
IV. TOOLS
Various tools have been developed so far to detect
plagiarism. The tools are classified as online and offline.
Offline tools are plagiarism detection tools that can be run
in an offline environment to perform the detection process.
These tools usually include a built-in corpus against which
the suspicious document is checked, which can be a
limitation of this kind of tool. Online tools, on the other
hand, perform detection in an online environment and check
documents against indexed documents. These tools
constantly build up their corpus by indexing the web. Hence,
their detection is fairly good compared with offline tools.
A. Offline Tools
Some of the offline tools for plagiarism detection are given
below:
i) CopyCatch: CopyCatch is a plagiarism detection tool that
evolved from WordCheck. Earlier it was a primary
plagiarism detection tool for research papers, essays,
etc. Its algorithm relied on the principle of hapax
legomena. These are the words that appear only once in a
text, so instead of counting the occurrences of every
word, it returned a list of the words that occur only once
in the document. If a document shared over 50 percent of
its hapax legomena with another, it was marked as
possible plagiarism. The idea was based on research
that found that independently written texts on the same
subject can have up to 50 percent hapax legomena overlap,
but any more indicates potential plagiarism [17].
⁵ http://ai.stanford.edu/ rion/swn/
⁶ http://www.w3.org/RDF/
ii) SCAM: SCAM (Stanford Copy Analysis Mechanism)
[18] is a plagiarism detection tool that first appeared in
1995. Its algorithm is much more statistics oriented
than those of many other programs. It first gathers a
list of word occurrences, exactly like WordCheck. Then
it statistically normalises the list based on the number
of occurrences, essentially fitting the data to a bell
shaped curve. The list is then stored as a vector. SCAM
then uses the vector space model [19] to compare this
vector with the vectors of other documents. The
vectors are compared using a dot product or a cosine
function for similarity. In other words, if the
distribution of words is similar, then the documents are
taken to be similar.
iii) CHECK: CHECK is a tool that combines statistical
analysis with computer science techniques. It still has
to maintain a huge database of documents against which to
compare a submission. However, it narrows down the search
by first attempting to determine the content and
semantics of the paper. So instead of comparing the
submitted paper with every document in its database, it
only has to compare it with those that were determined
to be of similar content.
CHECK determines the semantics of the paper by creating
its document tree. The CHECK algorithm creates a
tree from a document by layering sections, subsections,
paragraphs, sentences, etc.
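The two statistical ideas described for CopyCatch and SCAM can be sketched in Python as follows. The sketch follows the descriptions given here rather than the tools' actual implementations: hapax_overlap is one possible reading of "shared hapax legomena", and cosine_similarity works on raw word counts without the normalisation step. All function names and both sample sentences are assumptions made for the example.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Number of occurrences of every word in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def hapax_legomena(text):
    """Words that occur exactly once in the text."""
    return {word for word, count in word_counts(text).items() if count == 1}


def hapax_overlap(doc, other):
    """Fraction of doc's hapax legomena that are also hapax legomena
    of other (an assumed definition of the overlap)."""
    own = hapax_legomena(doc)
    if not own:
        return 0.0
    return len(own & hapax_legomena(other)) / len(own)


def cosine_similarity(doc, other):
    """Cosine of the angle between the two word-count vectors."""
    a, b = word_counts(doc), word_counts(other)
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = "the cat sat on the mat"
doc2 = "the cat sat on the rug"

overlap = hapax_overlap(doc1, doc2)
cosine = cosine_similarity(doc1, doc2)
print(overlap, cosine)
```

For these two sentences three of the four once-only words of doc1 also occur once in doc2, giving an overlap of 0.75, which is above the 50 percent level mentioned for CopyCatch. The cosine of the count vectors is 7/8 = 0.875. On texts this short the numbers mean little; they only show how the two measures are computed.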
B. Online Tools
The various online tools for detecting plagiarism include:
i) TurnItIn: It was designed by four UC Berkeley graduate
students as a peer review application for use in their
classes. Eventually, that prototype developed into one
of the most recognizable names in plagiarism detection.
TurnItIn, which processed over 60 million academic
papers in 2011, is accessible for a fee per educator.
Students can use TurnItIn’s WriteCheck service to
maintain proper citations and to access various writing
tools.
ii) iThenticate: Like TurnItIn, iThenticate is a service
offered by Plagiarism.org, but is geared more toward
professional writing and scholarly research. Publishers
like Oxford University Press use iThenticate for its Cross
Check software, which includes a database of more than
31 million articles and 67,664 books and journals.
iii) Viper: Viper calls itself the "Free TurnItIn Alternative."
It scans a large database of academic essays and other
online sources, offering side-by-side comparisons for
plagiarism. Its limitation is that it is available to
Microsoft Windows users only.
iv) PlagiarismChecker.com: PlagiarismChecker.com makes
it simple for educators to check for copied work by
pasting phrases from a student’s paper into a search box.
The system can search through either Google or Yahoo.
Users can also use the "Author" option to check if others
have plagiarized their work online.
v) PlagiarismDetect.com: PlagiarismDetect.com scans
text at a rate of $0.50 per page. The system
takes about 5-7 minutes per page, which makes for a
thorough examination. According to the website,
PlagiarismDetect.com has recently updated its system
with a new advanced algorithm, combining multi-layered
technology and SMART scanning (which supposedly
scans papers like humans).
vi) Plagiarisma.net: Plagiarisma.net has a search box as well
as a software download available for Windows. Users can
also search for entire URLs and files in HTML, DOC,
DOCX, RTF, TXT, ODT and PDF formats.
vii) PlagiarismSoftware.net: Formerly known as
Duplichecker, this minimalistic checker lets users
search for text and upload text files.
viii) CheckForPlagiarism.net: CheckForPlagiarism.net claims
its licensing fees are, on average, between 35% and 70%
lower than competing services. Its basic account, meant
for high school students, costs $20 and allows users
to scan five documents. The service can scan multiple
languages, and users can compare papers.
ix) Essay Verification Engine (EVE2): The EVE plagiarism
detection system is one of the older services on this
list, having performed almost 150 million scans since its
creation in 2000. It costs users $29.99 for unlimited
use and includes a 10-day money-back guarantee.
REFERENCES
[1] P. Clough, “Old and new challenges in automatic
plagiarism detection,” in National Plagiarism Advisory Service;
http://ir.shef.ac.uk/cloughie/index.html, 2003, pp. 391–407.
[2] L. Prechelt, G. Malpohl, and M. Philippsen, “JPlag: Finding plagiarisms
among a set of programs,” Tech. Rep., 2000.
[3] P. Bulychev and M. Minea, “An evaluation of duplicate code detection using
anti-unification,” in Proceedings of the 3rd International Workshop on
Software Clones at CSMR, 2009.
[4] Y. Higo and S. Kusumoto, “Code clone detection on specialized PDGs with
heuristics,” 2011 15th European Conference on Software Maintenance
and Reengineering, vol. 0, pp. 75–84, 2011.
[5] S. M. Alzahrani, N. Salim, and A. Abraham, “Understanding plagiarism
linguistic patterns, textual features, and detection methods,” Trans. Sys.
Man Cyber Part C, vol. 42, no. 2, pp. 133–149, Mar. 2012. [Online].
Available: http://dx.doi.org/10.1109/TSMCC.2011.2134847
[6] M. Chong and L. Specia, “Lexical generalisation for word-level match-
ing in plagiarism detection,” in RANLP, 2011, pp. 704–709.
[7] S. Brin, J. Davis, and H. Garcia-Molina, “Copy detection mechanisms
for digital documents,” in SIGMOD Conference, 1995, pp. 398–409.
[8] D. R. White and M. Joy, “Sentence-based natural language plagiarism
detection,” ACM Journal of Educational Resources in Computing,
vol. 4, no. 4, pp. 1–20, 2004.
[9] S. Niezgoda and T. P. Way, “Snitch: a software tool for detecting cut
and paste plagiarism,” in SIGCSE, 2006, pp. 51–55.
[10] A. Barrón-Cedeño and P. Rosso, “On automatic plagiarism detection
based on n-grams comparison,” in ECIR, 2009, pp. 696–700.
[11] M. S. Pera and Y.-K. Ng, “A naïve Bayes classifier for web document
summaries created by using word similarity and significant factors,”
International Journal on Artificial Intelligence Tools, vol. 19, no. 4, pp.
465–486, 2010.
[12] E. Stamatatos, “Plagiarism detection using stopword n-grams,” JASIST,
vol. 62, no. 12, pp. 2512–2527, 2011.
[13] J. Grman and R. Ravas, “Improved implementation for finding text
similarities in large sets of data - notebook for pan at clef 2011,” in
CLEF (Notebook Papers/Labs/Workshop), 2011.
[14] H.-H. Chen, M.-S. Lin, and Y.-C. Wei, “Novel association measures
using web search with double checking,” in ACL, 2006.
[15] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,”
Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[16] G. Tsatsaronis, I. Varlamis, and M. Vazirgiannis, “Text relatedness based
on a word thesaurus,” J. Artif. Intell. Res. (JAIR), vol. 37, pp. 1–39,
2010.
[17] P. Clough, “Plagiarism in natural and programming languages: an
overview of current tools and technologies,” 2000.
[18] N. Shivakumar and H. Garcia-Molina, “SCAM: A copy detection
mechanism for digital documents,” in Proceedings of the Second
Annual Conference on the Theory and Practice of Digital Libraries,
1995. [Online]. Available: http://ilpubs.stanford.edu:8090/95/
[19] G. Salton, A. Wong, and C. S. Yang, “A vector space model for
automatic indexing,” Commun. ACM, vol. 18, no. 11, pp. 613–620, Nov.
1975. [Online]. Available: http://doi.acm.org/10.1145/361219.361220

More Related Content

Similar to A Review Of Plagiarism Detection Based On Lexical And Semantic Approach

‘CodeAliker’ - Plagiarism Detection on the Cloud
‘CodeAliker’ - Plagiarism Detection on the Cloud ‘CodeAliker’ - Plagiarism Detection on the Cloud
‘CodeAliker’ - Plagiarism Detection on the Cloud acijjournal
 
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...cscpconf
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM cscpconf
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM csandit
 
A Literature Review on Plagiarism Detection in Computer Programming Assignments
A Literature Review on Plagiarism Detection in Computer Programming AssignmentsA Literature Review on Plagiarism Detection in Computer Programming Assignments
A Literature Review on Plagiarism Detection in Computer Programming AssignmentsIRJET Journal
 
Review of plagiarism detection and control & copyrights in India
Review of plagiarism detection and control & copyrights in IndiaReview of plagiarism detection and control & copyrights in India
Review of plagiarism detection and control & copyrights in Indiaijiert bestjournal
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET Journal
 
IRJET - Online Assignment Plagiarism Checking using Data Mining and NLP
IRJET -  	  Online Assignment Plagiarism Checking using Data Mining and NLPIRJET -  	  Online Assignment Plagiarism Checking using Data Mining and NLP
IRJET - Online Assignment Plagiarism Checking using Data Mining and NLPIRJET Journal
 
A Hybrid Approach For Phishing Website Detection Using Machine Learning.
A Hybrid Approach For Phishing Website Detection Using Machine Learning.A Hybrid Approach For Phishing Website Detection Using Machine Learning.
A Hybrid Approach For Phishing Website Detection Using Machine Learning.vivatechijri
 
A Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid TechniqueA Novel Approach for Code Clone Detection Using Hybrid Technique
A Novel Approach for Code Clone Detection Using Hybrid TechniqueINFOGAIN PUBLICATION
 
A STUDY ON PLAGIARISM CHECKING WITH APPROPRIATE ALGORITHM IN DATAMINING
A STUDY ON PLAGIARISM CHECKING WITH APPROPRIATE ALGORITHM IN DATAMININGA STUDY ON PLAGIARISM CHECKING WITH APPROPRIATE ALGORITHM IN DATAMINING
A STUDY ON PLAGIARISM CHECKING WITH APPROPRIATE ALGORITHM IN DATAMININGAllison Thompson
 
@@@Rf8 polymorphic worm detection using structural infor (control flow gra...
@@@Rf8 polymorphic worm detection using structural infor    (control flow gra...@@@Rf8 polymorphic worm detection using structural infor    (control flow gra...
@@@Rf8 polymorphic worm detection using structural infor (control flow gra...zeinabmovasaghinia
 
data mining for terror attacks
data mining for terror attacksdata mining for terror attacks
data mining for terror attacksNilu Desai
 
Online Plagiarism Checker
Online Plagiarism CheckerOnline Plagiarism Checker
Online Plagiarism CheckerIRJET Journal
 
Malicious-URL Detection using Logistic Regression Technique
Malicious-URL Detection using Logistic Regression TechniqueMalicious-URL Detection using Logistic Regression Technique
Malicious-URL Detection using Logistic Regression TechniqueDr. Amarjeet Singh
 
Groundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitterGroundhog day: near duplicate detection on twitter
Groundhog day: near duplicate detection on twitterDan Nguyen
 
Improving Intrusion Detection with Deep Packet Inspection and Regular Express...
Improving Intrusion Detection with Deep Packet Inspection and Regular Express...Improving Intrusion Detection with Deep Packet Inspection and Regular Express...
Improving Intrusion Detection with Deep Packet Inspection and Regular Express...IJCSIS Research Publications
 

Similar to A Review Of Plagiarism Detection Based On Lexical And Semantic Approach (20)

‘CodeAliker’ - Plagiarism Detection on the Cloud
‘CodeAliker’ - Plagiarism Detection on the Cloud ‘CodeAliker’ - Plagiarism Detection on the Cloud
‘CodeAliker’ - Plagiarism Detection on the Cloud
 
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
 
A Literature Review on Plagiarism Detection in Computer Programming Assignments
A Literature Review on Plagiarism Detection in Computer Programming AssignmentsA Literature Review on Plagiarism Detection in Computer Programming Assignments
A Literature Review on Plagiarism Detection in Computer Programming Assignments
 
P33077080
P33077080P33077080
P33077080
 
Review of plagiarism detection and control & copyrights in India
Review of plagiarism detection and control & copyrights in IndiaReview of plagiarism detection and control & copyrights in India
Review of plagiarism detection and control & copyrights in India
 
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
IRJET - An Automated System for Detection of Social Engineering Phishing Atta...
 
IRJET - Online Assignment Plagiarism Checking using Data Mining and NLP
IRJET -  	  Online Assignment Plagiarism Checking using Data Mining and NLPIRJET -  	  Online Assignment Plagiarism Checking using Data Mining and NLP
IRJET - Online Assignment Plagiarism Checking using Data Mining and NLP
 
A Hybrid Approach For Phishing Website Detection Using Machine Learning.
A Hybrid Approach For Phishing Website Detection Using Machine Learning.A Hybrid Approach For Phishing Website Detection Using Machine Learning.



A Review Of Plagiarism Detection Based On Lexical And Semantic Approach

  • 1.
A Review of Plagiarism Detection Based On Lexical and Semantic Approach

Shameem Yousuf, Student, Department of Information Technology, Central University of Kashmir, Srinagar, Jammu and Kashmir, India. Email: bhat.shameem@gmail.com
Muzamil Ahmad, Student, Department of Information Technology, Central University of Kashmir, Srinagar, Jammu and Kashmir, India. Email: muzamilahmad87@gmail.com
Sheikh Nasrullah, Assistant Professor, Department of Information Technology, Central University of Kashmir, Srinagar, Jammu and Kashmir, India. Email: nasrullah@cukashmir.ac.in

Abstract: Due to the easy availability of documents on the web, plagiarism has become a serious problem for teachers, researchers and publishers. In this paper, we discuss the plagiarism process, its types and the detection methodologies. Further, we classify the different plagiarism detection techniques into lexical and semantic approaches. Finally, we present a brief study of the different tools (offline and online) available for detecting plagiarism.

I. INTRODUCTION

Plagiarism is the act of claiming authorship of someone else's work, wholly or in part, without proper authorization. With the evolution of the modern web, information is available to everybody and is just a click away. This information boom has brought problems of its own: many people copy and paste material from digital documents that already exist on the web, and this cloning of others' work without giving proper credit to the author can be regarded as plagiarism. Formally, it can be defined as an act or instance of using or closely imitating the language and thoughts of another author without authorization and the representation of that author's work as one's own, as by not crediting the original author¹.
Plagiarism can be found in literature, music, software, scientific articles, research papers, newspapers, advertisements, websites, etc. A study carried out in the United States shows that, among 18000 university students, almost 40% had plagiarized at least once². Plagiarism is a growing challenge in modern society and, in order to maintain academic integrity, the use of plagiarism detection tools has become the norm in many higher education institutions. The effectiveness of detection, however, depends on the type of algorithm used and on the obfuscation strategy the plagiarist employed to create the plagiarised text.

A. Plagiarising Process

The process of plagiarising is the simple task of reusing existing content in a way that meets the requirements of a given task. The task includes searching for related content on the web, in text documents or in other resources, and then copying and pasting this content into the newly created document, using obfuscation strategies to hide the plagiarism from detection. The result is a new, plagiarised version of the existing document, which can be used wherever it is required to fulfil the task. The whole process is illustrated in figure 1.

¹ http://dictionary.reference.com/browse/plagiarism
² D. McCabe. Research Report of the Center for Academic Integrity. http://www.academicintegrity.org, 2005

Fig. 1. The basic steps of Plagiarizing.

B. Types of Plagiarism

Plagiarism is vast and dynamic, i.e., there exist a number of obfuscation strategies that help to create plagiarised text. Plagiarism.org has classified and ranked different types of plagiarism based on the severity of the intent. According to plagiarism.org³, the 10 most common types of plagiarism are those illustrated in figure 2.
³ http://www.plagiarism.org
  • 2.
1) Clone: The plagiarist submits another person's work, word for word, as his or her own.
2) Ctrl-C: The plagiarised text contains significant portions of the original text without any alteration.
3) Find-Replace: Keywords in the text are changed, but the original content of the source is retained.
4) Remix: The plagiarist paraphrases documents from multiple sources and combines them into a single document.
5) Recycle: An author reuses his or her own previous work, without proper citation, to form a new document. This type is sometimes also called self-plagiarism.
6) Hybrid: The plagiarist copies passages from multiple cited sources without proper citation.
7) Mashup: Content from different sources is mixed together.
8) 404 Error: The plagiarist includes citations to non-existent sources or inaccurate information about sources.
9) Aggregator: The paper cites its sources properly but contains no original work.
10) Re-Tweet: The author cites properly but relies too closely on the original author's wording or structure.

In section II we discuss the plagiarism detection methodologies. Section IV provides a brief study of various plagiarism detection tools.

Fig. 2. Types of Plagiarism.

II. PLAGIARISM DETECTION

Plagiarising means reusing someone else's work without proper citation and presenting it as one's own. Text plagiarism is one of the oldest forms of plagiarism and remains difficult to identify in practice to this day. The challenges in automatic plagiarism detection have been widely discussed in [1]. In order to create an automatic plagiarism detection system, we need an existing corpus and detection techniques. The general detection process is shown in figure 3.
The detection process is divided into 3 tasks:
• Pre Processing
• Intermediate Processing
• Post Processing

Pre Processing includes uploading the source document and retrieving suspicious documents from the corpus based on the source document. Once the relevant data has been acquired, it is sent for intermediate processing. The design issue at this stage is how accurately documents are retrieved from the corpus.

Intermediate Processing includes the detailed detection and comparison of the source and the suspicious documents, based on the algorithm in use. The design issues at this stage include the running time and the effectiveness of the comparison logic.

Post Processing is the final stage of the process. It includes preparing the results of the detection task; based on those results, we can decide whether the source document is plagiarised or not.

Fig. 3. Generic retrieval process.

In order to detect plagiarism in program source code and in free text (plain natural language text), the following methodologies can be used:
• Manual detection: Plagiarism detection is done manually by humans, who compare and verify the given set of documents. This process requires a lot of expertise in the particular subject area, as the examiner has to look for the various plagiarism strategies. This type of detection is suitable for checking class work, articles and short notes, but it is impractical for verifying a large number of documents and infeasible in terms of cost and time.
• Computer aided detection: Plagiarism is detected with the help of a computer system equipped with a plagiarism detection algorithm. Since detection of this kind came into use, different approaches have been followed to get the job done efficiently.
Computer aided detection has both advantages and disadvantages compared with manual detection, and there is always a trade-off between performance and the associated cost.
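The three-stage process described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the design of any particular tool: the function names, the use of cosine similarity for retrieval, the word trigram overlap for the detailed comparison, and the 0.5 threshold are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pre_process(submitted, corpus, top_k=2):
    """Stage 1: retrieve the top_k corpus documents closest to the submission."""
    query = Counter(tokenize(submitted))
    ranked = sorted(
        corpus,
        key=lambda name: cosine(query, Counter(tokenize(corpus[name]))),
        reverse=True,
    )
    return ranked[:top_k]


def intermediate_process(submitted, candidate, n=3):
    """Stage 2: fraction of the submission's word n-grams found in the candidate."""
    def ngrams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    sub = ngrams(tokenize(submitted))
    cand = ngrams(tokenize(candidate))
    return len(sub & cand) / len(sub) if sub else 0.0


def post_process(scores, threshold=0.5):
    """Stage 3: keep only the candidates whose score reaches the threshold."""
    return {name: s for name, s in scores.items() if s >= threshold}


def detect(submitted, corpus, top_k=2, threshold=0.5):
    candidates = pre_process(submitted, corpus, top_k)
    scores = {c: intermediate_process(submitted, corpus[c]) for c in candidates}
    return post_process(scores, threshold)


# Toy corpus, invented for this example.
corpus = {
    "fox": "the quick brown fox jumps over the lazy dog",
    "tools": "plagiarism detection tools compare documents against a corpus",
    "pasta": "cooking pasta at home takes about twenty minutes",
}
report = detect("The quick brown fox jumps over the lazy cat", corpus)
print(report)  # only "fox" is reported: 6 of its 7 trigrams match
```

Splitting the work this way mirrors the design issues named above: the retrieval stage can be cheap and approximate, while the more expensive comparison runs only on the few candidates it returns.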
  • 3.
III. PLAGIARISM DETECTION TECHNIQUES

In this section, we present a review of the various plagiarism detection techniques that have been developed. The techniques are classified into Source Code Plagiarism Detection and Free Text Plagiarism Detection.

Fig. 4. Classification of Plagiarism Detection.

A. Source Code Plagiarism Detection

The techniques used to create plagiarised source code include comment removal, identifier renaming, structured constant renaming, and removing debugging information. The source code plagiarism detection techniques include:

i) Textual Based Approach: This approach is based on the comparison of line or string sequences in the code, and it usually works with raw source code. An example is the diff⁴ file comparison utility, which finds the differences between two files by finding their longest common subsequence.
   ⁴ http://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html
ii) Token Based Approach: The source code is parsed into a sequence of tokens according to the rules of the programming language. An example of such a detector is the Java-based detection tool JPLAG [2].
iii) Tree Based Approach: The source code is parsed into a parse tree or Abstract Syntax Tree (AST). The AST represents the syntactic structure of the parsed source code with an abstract representation of every element; a tree matching algorithm is then used to search for similar subtrees in order to detect code clones. Clone Digger [3] is an example of such a tool.
iv) PDG based / Semantics-Aware Approach: This approach analyses the behaviour of the source code rather than its syntactic features. A highly abstracted source code representation called a PDG (Program Dependency Graph) is obtained, which carries the semantic information of the source code. The PDG contains the data and control flow and thus ignores the syntactic structure.
After obtaining the PDG, a subgraph matching algorithm is applied to discover similar subgraphs, which are then returned as clones. Scorpio [4] is an example of such a tool.

B. Free Text Plagiarism Detection

i) Lexical Approach: This approach uses the lexical features of the text, which operate at the character or word level of the document [5], in order to trace plagiarism in the suspicious documents. It tries to enhance standard string matching comparison in order to detect plagiarism. The processing techniques this approach relies upon include tokenization, lowercasing, punctuation removal and stemming [6], although they vary from technique to technique. The comparison units adopted also differ from one technique to another; they include words, sentences, passages, a human-defined sliding window, or n-grams. Work that has been done using these techniques includes [7], [8], [9], [10], [11], [12], [13]. Over time, researchers have been breaking the problem down into simpler forms in order to obtain efficient and accurate results. To achieve this goal, they are merging different detection approaches, for example by using Natural Language Processing (NLP) to extract key features of the text.
ii) Semantic Approach: Plagiarists sometimes use sophisticated obfuscation techniques to create plagiarised text, such as replacing words or phrases with others of similar meaning, and in this way they create plagiarised copies of the original text. The basic detection method using the semantic approach is illustrated in figure 5.

Fig. 5. Hypothetical semantic retrieval approach.
In this approach, different semantic features (synonyms, hyponyms, hypernyms, semantic dependencies) [5] are extracted from the source documents. These features are then used to trace plagiarism cases in the corpus and in the fact database built up from already existing documents.
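To see why synonym features matter, the following minimal sketch compares two sentences by word bigram overlap, first on the raw words and then after mapping synonyms to a common form. The synonym table, the function names and the example sentences are hypothetical and were invented for this illustration; a real system would draw its synonyms from a lexical database rather than a hand-written dictionary.

```python
import re

# Hypothetical toy synonym table: each word maps to one canonical form.
SYNONYMS = {"automobile": "car", "purchased": "bought"}


def word_ngrams(text, n=2, synonyms=None):
    """Return the set of word n-grams, optionally after synonym normalisation."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if synonyms:
        tokens = [synonyms.get(t, t) for t in tokens]
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


original = "She bought the car yesterday"
rewritten = "She purchased the automobile yesterday"

# Purely lexical comparison: every bigram contains a substituted word.
lexical_score = jaccard(word_ngrams(original), word_ngrams(rewritten))

# Comparison after mapping synonyms to a shared canonical word.
normalised_score = jaccard(
    word_ngrams(original, synonyms=SYNONYMS),
    word_ngrams(rewritten, synonyms=SYNONYMS),
)

print(lexical_score)     # 0.0
print(normalised_score)  # 1.0
```

The two substitutions are enough to drive the lexical score to zero, while the normalised comparison still treats the sentences as identical.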
  • 4.
The semantic approach aims to attain high detection performance, and should address the issues of polysemy (the same word referring to different things depending on the context, like mouse the computer input device and mouse the rodent) and synonymy (different words referring to the same thing, like car and automobile), which are not handled by the lexical (straightforward term matching) approach. Lin et al. [14] explored semantic similarity using lexical databases such as Stanford Wordnet⁵ to acquire synonyms. Another algorithm that can be used to extract the semantic features of sentences is Latent Dirichlet Allocation [15]. Another novel way of representing a document uses the RDF framework⁶. In this approach a document is represented as RDF triples; an RDF triple has the format (subject, predicate, object). For example, in the triple (john, livesIn, ohio), john and ohio are known as entities and livesIn (the predicate) is a relation between the two entities. A predicate has a domain (restricting the set of subjects) and a range (restricting the set of objects). The predicate livesIn, for instance, has the domain humans and the range locations; this is denoted by livesIn(humans, locations), which is a two-place relation. A set of RDF facts is referred to as an ontology and can be extracted from text documents. The semantic approach is not widely used because of the difficulty involved; however, work that has been done in this area includes [16].

IV. TOOLS

Various tools have been developed so far to detect plagiarism. The tools are classified as online and offline. Offline tools are plagiarism detection tools that can be run in an offline environment to perform the detection process. These tools usually include an inbuilt corpus against which the suspicious document is checked, and this can be one of the limitations of this kind of tool.
On the other hand, online tools perform detection in an online environment and check documents against indexed documents. These tools constantly build up their corpus by indexing the web. Hence, detection is fairly good compared with offline tools.

⁵ http://ai.stanford.edu/~rion/swn/
⁶ http://www.w3.org/RDF/

A. Offline Tools

Some of the offline tools for plagiarism detection are given below:

i) CopyCatch: CopyCatch is a plagiarism detection tool that evolved from WordCheck. Earlier it was a primary plagiarism detection tool used for research papers, essays, etc. Its algorithm relied on the principle of hapax legomena. These are the words that appear only once in a text; so, instead of counting the occurrences of every word, it returned a list of the words that occur only once in the document. If a document shared over 50 percent of its hapax legomena with another, it was marked as possible plagiarism. The idea was based on research which found that independently written texts on the same subject can have up to 50 percent hapax legomena overlap, but any more indicates potential plagiarism [17].
ii) SCAM: SCAM (Stanford Copy Analysis Mechanism) [18] is a plagiarism detection tool that first appeared in 1995. Its algorithm was much more statistics oriented than many of the other programs. It first gathers the list of word occurrences, exactly like WordCheck. It then statistically normalises the list based on the number of occurrences, essentially fitting the data to a bell-shaped curve. The list is then stored as a vector. SCAM then uses the vector space model [19] to compare this vector with the vectors of other documents. The vectors are compared using a dot product or a cosine function for similarity. In other words, if the distribution of words is similar, the documents are assumed to be similar.
iii) CHECK: CHECK is a tool that combines statistical analysis with computer science techniques.
It still has to maintain a huge database of documents to compare against the submission. However, it narrows down the search by first attempting to determine the content and semantics of the paper. So instead of comparing the submitted paper with every document in its database, it only has to compare it with those determined to be of similar content. CHECK determines the semantics of a paper by creating its document tree. The CHECK algorithm creates a tree from a document by layering sections, subsections, paragraphs, sentences, etc.

B. Online Tools

The various online tools for detecting plagiarism include:

i) TurnItIn: It was designed by four UC Berkeley graduate students as a peer-review application to use for their classes. Eventually, that prototype developed into one of the most recognizable names in plagiarism detection. TurnItIn, which processed over 60 million academic papers in 2011, is accessible for a fee per educator. Students can use TurnItIn’s WriteCheck service to maintain proper citations and to access various writing tools.

ii) iThenticate: Like TurnItIn, iThenticate is a service offered by Plagiarism.org, but it is geared more toward professional writing and scholarly research. Publishers like Oxford University Press use iThenticate for its CrossCheck software, which includes a database of more than 31 million articles and 67,664 books and journals.

iii) Viper: Viper calls itself the “Free TurnItIn Alternative.” It scans a large database of academic essays and other online sources, offering side-by-side comparisons for plagiarism. Its limitation is that it is available for Microsoft Windows users only.

iv) PlagiarismChecker.com: PlagiarismChecker.com makes it simple for educators to check for copied work by
pasting phrases from a student’s paper into a search box. The system can search through either Google or Yahoo. Users can also use the “Author” option to check whether others have plagiarized their work online.

v) PlagiarismDetect.com: PlagiarismDetect.com scans text at a rate of $0.50 per page. The system takes about 5 to 7 minutes per page, which allows for a thorough examination. According to the website, PlagiarismDetect.com has recently updated its system with a new advanced algorithm, combining multi-layered technology and SMART scanning (which supposedly scans papers like humans).

vi) Plagiarisma.net: Plagiarisma.net has a search box as well as a software download available for Windows. Users can also search for entire URLs and files in HTML, DOC, DOCX, RTF, TXT, ODT and PDF formats.

vii) PlagiarismSoftware.net: Formerly known as Duplichecker, this minimalistic checker lets users search for text and upload text files.

viii) CheckForPlagiarism.net: CheckForPlagiarism.net claims its licensing fees are, on average, between 35% and 70% lower than those of competing services. Its basic account, meant for high school students, costs $20 and allows users to scan five documents. The service can scan multiple languages, and users can compare papers.

ix) Essay Verification Engine (EVE2): The EVE plagiarism detection system is one of the older services on this list, having performed almost 150 million scans since its creation in 2000. It costs users $29.99 for unlimited use and includes a 10-day money-back guarantee.

REFERENCES

[1] P. Clough, “Old and new challenges in automatic plagiarism detection,” in National Plagiarism Advisory Service, 2003; http://ir.shef.ac.uk/cloughie/index.html, 2003, pp. 391–407.
[2] L. Prechelt, G. Malpohl, and M. Philippsen, “JPlag: Finding plagiarisms among a set of programs,” Tech. Rep., 2000.
[3] M. M.
Peter Bulychev, “An evaluation of duplicate code detection using anti-unification,” in Proceedings of the 3rd International Workshop on Software Clones at CSMR, 2009.
[4] Y. Higo and S. Kusumoto, “Code clone detection on specialized PDGs with heuristics,” in 2011 15th European Conference on Software Maintenance and Reengineering, 2011, pp. 75–84.
[5] S. M. Alzahrani, N. Salim, and A. Abraham, “Understanding plagiarism linguistic patterns, textual features, and detection methods,” Trans. Sys. Man Cyber Part C, vol. 42, no. 2, pp. 133–149, Mar. 2012. [Online]. Available: http://dx.doi.org/10.1109/TSMCC.2011.2134847
[6] M. Chong and L. Specia, “Lexical generalisation for word-level matching in plagiarism detection,” in RANLP, 2011, pp. 704–709.
[7] S. Brin, J. Davis, and H. Garcia-Molina, “Copy detection mechanisms for digital documents,” in SIGMOD Conference, 1995, pp. 398–409.
[8] D. R. White and M. Joy, “Sentence-based natural language plagiarism detection,” ACM Journal of Educational Resources in Computing, vol. 4, no. 4, pp. 1–20, 2004.
[9] S. Niezgoda and T. P. Way, “Snitch: a software tool for detecting cut and paste plagiarism,” in SIGCSE, 2006, pp. 51–55.
[10] A. Barrón-Cedeño and P. Rosso, “On automatic plagiarism detection based on n-grams comparison,” in ECIR, 2009, pp. 696–700.
[11] M. S. Pera and Y.-K. Ng, “A naïve Bayes classifier for web document summaries created by using word similarity and significant factors,” International Journal on Artificial Intelligence Tools, vol. 19, no. 4, pp. 465–486, 2010.
[12] E. Stamatatos, “Plagiarism detection using stopword n-grams,” JASIST, vol. 62, no. 12, pp. 2512–2527, 2011.
[13] J. Grman and R. Ravas, “Improved implementation for finding text similarities in large sets of data - notebook for PAN at CLEF 2011,” in CLEF (Notebook Papers/Labs/Workshop), 2011.
[14] H.-H. Chen, M.-S. Lin, and Y.-C. Wei, “Novel association measures using web search with double checking,” in ACL, 2006.
[15] D. M. Blei, A. Y.
Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[16] G. Tsatsaronis, I. Varlamis, and M. Vazirgiannis, “Text relatedness based on a word thesaurus,” J. Artif. Intell. Res. (JAIR), vol. 37, pp. 1–39, 2010.
[17] P. Clough, “Plagiarism in natural and programming languages: an overview of current tools and technologies,” 2000.
[18] N. Shivakumar and H. Garcia-Molina, “SCAM: A copy detection mechanism for digital documents,” in Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995. [Online]. Available: http://ilpubs.stanford.edu:8090/95/
[19] G. Salton, A. Wong, and C. S. Yang, “A vector space model for automatic indexing,” Commun. ACM, vol. 18, no. 11, pp. 613–620, Nov. 1975. [Online]. Available: http://doi.acm.org/10.1145/361219.361220
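As a supplementary illustration of the CopyCatch and SCAM descriptions in Section IV-A, the two measures can be sketched as follows. This is a sketch under stated assumptions, not either tool's actual implementation: the tokenizer is a simple regular expression, "shared" hapax legomena are taken to be words that occur exactly once in both texts, and the cosine is computed on raw word counts without SCAM's normalisation step. All function names are hypothetical.

```python
import math
import re
from collections import Counter

HAPAX_THRESHOLD = 0.5  # the 50 percent figure reported for CopyCatch


def tokens(text):
    """Lowercase word tokens (a simple tokenizer chosen for illustration)."""
    return re.findall(r"[a-z']+", text.lower())


def hapax_legomena(text):
    """Words that occur exactly once in the text."""
    counts = Counter(tokens(text))
    return {word for word, count in counts.items() if count == 1}


def hapax_overlap(doc, other):
    """Fraction of doc's hapax legomena that are also hapax legomena of other."""
    own = hapax_legomena(doc)
    if not own:
        return 0.0
    return len(own & hapax_legomena(other)) / len(own)


def cosine_similarity(doc, other):
    """Cosine of the angle between the raw word-count vectors of two texts."""
    a, b = Counter(tokens(doc)), Counter(tokens(other))
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


first = "the cat sat on the mat"
second = "the cat lay on the rug"

overlap = hapax_overlap(first, second)
flagged = overlap > HAPAX_THRESHOLD  # flag only when overlap exceeds 50 percent
similarity = cosine_similarity(first, second)
print(overlap, flagged, round(similarity, 2))
```

In this toy pair, half of the first text's once-only words also occur once in the second, so the overlap sits exactly at the threshold and the pair is not flagged; the cosine value reflects the shared distribution of "the", "cat" and "on".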