SlideShare a Scribd company logo
Plagiarism Detection
Techniques
Nimisha .T
13MCA11030
Contents:
 Introduction
 Definition of Plagiarism
 Avoiding plagiarism
 Text based plagiarism detection techniques
 Tools used for text based plagiarism
 Source code based plagiarism detection techniques
 Tools used for code based plagiarism
 Disadvantages of the plagiarism detection technology
 Conclusion
1
Plagiarism Detection Techniques
Introduction
 Plagiarism is a significant problem on almost every
college and university campus.
 The problems of plagiarism go beyond the campus, and
have become an issue in industry, journalism, and
government activities.
2
Plagiarism Detection Techniques
Definition of Plagiarism
Plagiarize according to the Merriam-Webster Online
dictionary is:
 To steal and pass off the idea or words of another as one’s
own.
 To use another’s production without crediting the source
 To commit literary theft
 To present as new and original idea or product derived
from an existing source.
Plagiarism Detection Techniques 3
The following are considered
as Plagiarism:
 Turning in someone else’s work as your own.
 Copying words or ideas from someone else without
giving credit.
 Failing to put a quotation in quotation marks
 Giving incorrect information about the source of a
quotation.
 Changing words but copying sentence structure.
 Copying so many words or ideas from a source that it
makes up the majority of your work, even though by
credit.
4
Plagiarism Detection Techniques
Deliberate and Accidental
Plagiarism
Deliberate (intentional) Plagiarism :
Steals the property of somebody else and claims it
to be his own.
Accidental (unintentional) Plagiarism :
Somebody unknowingly cites a phrase or copies
words without acknowledging the author of the
material.
5
Plagiarism Detection Techniques
Deliberate and Accidental
Plagiarism
6
Plagiarism Detection Techniques
Avoiding plagiarism
Two methods :
 Plagiarism prevention
 Plagiarism detection
7
Plagiarism Detection Techniques
Plagiarism Prevention
 Collaborative effort for recognize and counter plagiarism
at every level.
 Educate students about the appropriate use of intellectual
material.
 Minimize the possibility of submission of plagiarized
content.
 Plagiarism prevention is difficult to achieve & also take
a long time.
8
Plagiarism Detection Techniques
Plagiarism Detection
9
Plagiarism Detection Techniques
Culwin and Lancaster’s four stages of
detecting plagiarism:
Plagiarism Detection technique
10
Plagiarism Detection Techniques
 Text based plagiarism detection techniques
 Source code based plagiarism detection techniques
Text based plagiarism
detection techniques
 Substring matching
 Keyword similarity
 Exact fingerprint match
 Text parsing
11
Plagiarism Detection Techniques
Substring Matching
 Try to identify maximal matches in pairs
 which then are used as plagiarism indicators.
 Typically, the substrings are represented in suffix trees.
 Graph-based measures are employed to capture the
fraction of the plagiarized sections.
12
Plagiarism Detection Techniques
Keyword Similarity
 Extract topic identifying keywords from a document.
 Compare with keywords of other document.
 If the similarity exceeds a threshold, the candidate
documents are divided into smaller pieces.
 which are then compared recursively.
 This approach assumes that plagiarism usually happens
in topically similar documents.
13
Plagiarism Detection Techniques
Exact Fingerprint Match
 The documents are partitioned into term sequences
called chunks.
 which then are used as plagiarism indicators.
 from which digital digests are computed that form the
document’s fingerprint.
 digests are inserted into a hash table then collisions
indicate matching sequences.
 For the fingerprint computation some standard hashing
suffers from two severe problems:
 Computationally expensive,
 A small chunk size (3-10 words) must be chosen
to identify matching passages
14
Plagiarism Detection Techniques
Text parsing
 Any sentence of the text can be automatically
represented in the form of the tree.
 which reflects the structure of the sentence
 Example: The phrase the monkey ate the banana will
be parsed by such software as,
┌──────SENTENCE─────┐
SUBJECT └─VERB OBJECT
ARTICLE └─ ate ARTICLE
└─ the └─ the
NOUN NOUN
└─monkey └─ banana
15
Plagiarism Detection Techniques
Text parsing (Continue…)
 Once a parse tree is created, we can invoke a tree
matching procedure
 Initially the algorithm builds a flowchart-styled parse
tree for each file to be analyzed
 Then for each pair of files, the algorithm performs a
rough “abstract comparison”, when only types of the
parse tree elements ( like Assignment, Loop, Branching)
are taken into account.
 This is done recursively for the each level of tree nodes
If the similarity percentage becomes lower, the trees are
immediately treated as not similar.
16
Plagiarism Detection Techniques
Text parsing (Continue…)
 If the abstract comparison indicates enough similarity,
a special low-level “micro comparison” procedure is
invoked.
 Each node represents an individual statement
 Each tree node turns into a separate sub tree that has to
be compared with the corresponding sub tree taken from
another file.
 E.g. the phrases the monkey ate the banana and the
banana was eaten by the monkey will be very close after
the tokenization.
17
Plagiarism Detection Techniques
Tools used for text based plagiarism
Some tools are:
 PlagAware
 PlagScan
 CheckForPlagiarism.net
 iThenticate
 PlagiarismDetection.org
18
Plagiarism Detection Techniques
PlagAware
 Is an online-service used for plagiarism detection
 It can search, find, analyze and trace plagiarism in the
specified topic similar to the topics
 PlagAware is a search Engine
 provide different types of report that help the user to
decide that is his document has been plagiarized or not
 Mainly used in academic filed
 Multiple Document Comparison
 Does not support synonym and sentence structure
checking.
19
Plagiarism Detection Techniques
PlagScan
 It is online software used for textual plagiarism checker
 Complex algorithms for checking and analyzing
uploaded document
 Unique signature extracted from the document’s
structure that is then compared with PlagScan database
and millions of online documents.
 Detect most of plagiarism types either directs copy and
paste or words switching
 PlagScan supports all the language that use the
international UTF-8 encoding and all language with
Latin or Arabic characters
20
Plagiarism Detection Techniques
CheckForPlagiarism.net
 One of the best online plagiarism checkers that used to
stop or prevention of online plagiarism.
 The fingerprint-based approach used to analyze and
summarize collection of document and create a kind of
fingerprint for it.
 Uses its own database that include millions documents
and articles over World Wide Web
 Support synonym and sentence structure checking
 Can compare set of different documents simultaneously
with other documents
21
Plagiarism Detection Techniques
iThenticate
 One of the application or services designed especially
for the researchers and authors’ publisher
 It have own database that contain millions of documents
 Users who have account can do either online and
offline comparison of submitted documents against it
and to identify plagiarized content.
 Considered as the first online plagiarism checker
 Document to document and multiple documents checking
 Supports more than 30 languages
 Does not support synonym and sentence structure
checking
22
Plagiarism Detection Techniques
PlagiarismDetection.org
 It is an online service provides high level of accuracy
result in plagiarism detection
 Use its own database that contains millions of documents
 Supports English languages and all languages that using
Latin characters
 Does not support multiple document comparison
 Does not support synonym and sentence structure
checking
23
Plagiarism Detection Techniques
Source code based plagiarism
detection techniques
 Lexical Similarities
 Parse Tree Similarities
 Program Dependence Graphs
 Metrics
24
Plagiarism Detection Techniques
Lexical Similarities
 Converts source code into a stream of lexical tokens
from which compiler extract meaning from the source.
 During the lexical analysis phase, the source code
undergoes a series of transformation
 Some of these transformations, such as the identification
of reserved words, identifiers are beneficial for plagiarism
detection.
Plagiarism Detection Techniques 25
Lexical Similarities (Continue…)
 Consider the following two snippets of Java Code:
Plagiarism Detection Techniques 26
int[] A = {1,2,3,4};
for(int i = 0; i < A.length; i++)
{
A[i] = A[i] + 1;
}
int[] B = {1, 2, 3, 4};
for(int j = 0; j < B.length; j++)
{
B[j] = B[j] + 1;
}
Lexical Similarities (Continue…)
The lexical stream of the 2 snippets of code is :
LITERAL_int LBRACK RBRACK IDENT ASSIGN
LCURLY NUM_INT COMMA NUM_INT COMMA
NUM_INT COMMA NUM_INT RCURLY SEMI
LITERAL_for LPAREN LITERAL_int IDENT ASSIGN
NUM_INT SEMI IDENT LT IDENT DOT IDENT SEMI
IDENT INC RPAREN LCURLY NUM_INT SEMI
Both the java snippets will have the exact lexical stream
Plagiarism Detection Techniques 27
0
Parse Tree Similarities
 The parse tree or derivation tree built from the lexical
for a program also exhibits structure for a given program
 A compiler, during the compilation process builds a
parse tree which represents the program.
 The parse tree will have the same structure for both the
snippet of code as the lexical streams are same.
 An algorithm for detecting plagiarism using this method
would first, parse each program.
Next, for each pair of parse trees, it attempts to find as
many common sub trees as possible.
Use this number as a measure of similarity between the
two programs
Plagiarism Detection Techniques 28
Plagiarism Detection Techniques 29
Program Dependence Graphs(PDG)
 PDG is a graph representation of the source Code
 It is a directed, labeled graph which represents the data
and the control dependencies within one procedure.
 Basic statements like variable declarations, assignments,
and procedure calls are represented by vertices in PDGs.
 It depicts how the data flows between statements and
how statements are controlled by other statements.
 The data and control dependencies between statements
are represented by edges between vertices in PDGs
 Data and control dependencies are plotted in solid and
dashed lines respectively.
Plagiarism Detection Techniques 30
Program Dependence Graphs
(Continue…)
Plagiarism Detection Techniques 31
Example:
Metrics
 Plagiarism detection by similarity analysis using
software metrics.
 Software metrics are:
 Number of function calls
 Number of used or defined local variables
 Number of used or defined non-local variables
 Number of parameters
 Number of statements
 Number of branches
 Number of loops
Plagiarism Detection Techniques 32
Metrics (Continue…)
 Each fragment characterized by a set of features
measured by metrics
 Metrics computation requires the parsing of source code
to identifying interesting fragments
 Metrics are simple to calculate and can be compared
quickly
 False positives: two fragments with the same scores on a
set of metrics may do entirely different things.
Plagiarism Detection Techniques 33
Tools used for code based
plagiarism
 MOSS
 JPlag
 CodeMatch
Plagiarism Detection Techniques 34
MOSS (Measure of Software Similarity)
 Can be applied to a range of programming languages
 Registered instructors can submit batches of programs to
the moss server.
 Result is placed on a web page on the moss web server.
 A link to that web page is returned when checking the
document is finished.
 The MOSS database stores an internal representation of
programs, and then looks for similarities between them.
35Plagiarism Detection Techniques
JPlag
 JPlag compares submitted programs in pairs
 It assumes that plagiarists may vary the names of
variables or classes, but they are least likely to change
the control structure of a program.
 It presents its results as a set of HTML pages.
 The pages are sent back to the client and stored locally.
 JPlag was easier to use but supported fewer languages
than MOSS
36Plagiarism Detection Techniques
CodeMatch
 Compares thousands of source code files in multiple
directories and subdirectories
 Determine those files which are closely correlated.
 Useful for finding open source code within
proprietary code.
 Discovering common standard algorithms within
different programs.
37Plagiarism Detection Techniques
Disadvantages of the plagiarism
detection technology
 Plagiarism detection systems are built based on a few
languages. To check for plagiarism with the same
software can be difficult.
 Most of the detection software checking is done with
some repository situated in an organization. Other
people are unable to access it and verify for plagiarism.
 As the number of digital copies are going up the
repository size should be large and the plagiarism
detection software should be able to handle it.
 Some software ask us to load a file to their link .The
file is copied to their database . This cause our data
being leaked or hacked for other purposes.
38
Plagiarism Detection Techniques
Conclusion
 Plagiarism is rampant now. With most of the data
available to us in digital format the venues for
plagiarism is opening up.
 To avoid this kind of cheating and to acknowledge
the originality of the author new detection techniques
are to be created.
 To protect the intellectual property source code new
techniques are to be developed and implemented.
39Plagiarism Detection Techniques
References
39Plagiarism Detection Techniques
 http://www.ijirae.com/volumes/vol1/issue7/AUCS10085.
06.pdf
 http://dspace.cusat.ac.in/jspui/bitstream/123456789/3618/
1/PDT.pdf
 http://elearningindustry.com/top-10-free-plagiarism-
detection-tools-for-teachers
 http://www.plagiarism.org/plagiarism-101/what-is-
plagiarism/
 http://www.cs.uu.nl/research/techreps/repo/CS2010/2010-
015.Pdf
 https://en.wikipedia.org/wiki/Category:Plagiarism_detecto
rs
10
Plagiarism Detection Techniques
?
10
Plagiarism Detection Techniques

More Related Content

What's hot

Web of Science
Web of ScienceWeb of Science
Web of Science
Shuvra Ghosh
 
Scientific misconduct
Scientific misconductScientific misconduct
Scientific misconduct
Pokhara University, Pokhara, Nepal
 
Plagiarism:-Types and Causes
Plagiarism:-Types and CausesPlagiarism:-Types and Causes
Plagiarism:-Types and Causes
VANDANAKELKAR
 
Impact Factor
Impact Factor Impact Factor
Impact Factor
haroldtaylor1113
 
Plagiarism ppt
Plagiarism pptPlagiarism ppt
Plagiarism ppt
Rocío Bautista
 
Types of plagiarism
Types of plagiarismTypes of plagiarism
Types of plagiarism
gopinathannsriramachandraeduin
 
Presentation on journal suggestion tool and journal finder
Presentation on journal suggestion tool and journal finderPresentation on journal suggestion tool and journal finder
Presentation on journal suggestion tool and journal finder
shilpasharma203749
 
Open Access Publications PPT.pdf
Open Access Publications PPT.pdfOpen Access Publications PPT.pdf
Open Access Publications PPT.pdf
Ramesh K Kuri
 
Plagiarism
PlagiarismPlagiarism
Plagiarism
Dr Usha (Physio)
 
Publication ethics
Publication ethicsPublication ethics
Publication ethics
Roger Watson
 
Complaints appeals example and fraud from india and abroad
Complaints appeals  example and fraud from india and abroadComplaints appeals  example and fraud from india and abroad
Complaints appeals example and fraud from india and abroad
geerijalavania
 
Predatory publishers and journals
Predatory publishers and journalsPredatory publishers and journals
Predatory publishers and journals
NEETHU P RAJEEV
 
Research Methodology-02: Quality Indices
Research Methodology-02: Quality IndicesResearch Methodology-02: Quality Indices
Research Methodology-02: Quality Indices
Bhasker Vijaykumar Bhatt
 
Citation Database
Citation Database Citation Database
Citation Database
Dr. Utpal Das
 
Research Metrics
Research MetricsResearch Metrics
Research Metrics
Surendra Kumar Pal
 
Author Level Metrics
Author Level MetricsAuthor Level Metrics
Author Level Metrics
Dr. M Vijayakumar
 
Publication ethics: Definitions, Introduction and Importance
Publication ethics: Definitions, Introduction and ImportancePublication ethics: Definitions, Introduction and Importance
Publication ethics: Definitions, Introduction and Importance
Vasantha Raju N
 
Selective Reporting and Misrepresentation.pptx
Selective Reporting and Misrepresentation.pptxSelective Reporting and Misrepresentation.pptx
Selective Reporting and Misrepresentation.pptx
DrDollyThankachan
 
Citation indexing
Citation indexingCitation indexing
Citation indexing
silambu111
 

What's hot (20)

Web of Science
Web of ScienceWeb of Science
Web of Science
 
Scientific misconduct
Scientific misconductScientific misconduct
Scientific misconduct
 
Plagiarism:-Types and Causes
Plagiarism:-Types and CausesPlagiarism:-Types and Causes
Plagiarism:-Types and Causes
 
Impact Factor
Impact Factor Impact Factor
Impact Factor
 
Plagiarism ppt
Plagiarism pptPlagiarism ppt
Plagiarism ppt
 
Types of plagiarism
Types of plagiarismTypes of plagiarism
Types of plagiarism
 
Presentation on journal suggestion tool and journal finder
Presentation on journal suggestion tool and journal finderPresentation on journal suggestion tool and journal finder
Presentation on journal suggestion tool and journal finder
 
Open Access Publications PPT.pdf
Open Access Publications PPT.pdfOpen Access Publications PPT.pdf
Open Access Publications PPT.pdf
 
Citation indexing
Citation indexingCitation indexing
Citation indexing
 
Plagiarism
PlagiarismPlagiarism
Plagiarism
 
Publication ethics
Publication ethicsPublication ethics
Publication ethics
 
Complaints appeals example and fraud from india and abroad
Complaints appeals  example and fraud from india and abroadComplaints appeals  example and fraud from india and abroad
Complaints appeals example and fraud from india and abroad
 
Predatory publishers and journals
Predatory publishers and journalsPredatory publishers and journals
Predatory publishers and journals
 
Research Methodology-02: Quality Indices
Research Methodology-02: Quality IndicesResearch Methodology-02: Quality Indices
Research Methodology-02: Quality Indices
 
Citation Database
Citation Database Citation Database
Citation Database
 
Research Metrics
Research MetricsResearch Metrics
Research Metrics
 
Author Level Metrics
Author Level MetricsAuthor Level Metrics
Author Level Metrics
 
Publication ethics: Definitions, Introduction and Importance
Publication ethics: Definitions, Introduction and ImportancePublication ethics: Definitions, Introduction and Importance
Publication ethics: Definitions, Introduction and Importance
 
Selective Reporting and Misrepresentation.pptx
Selective Reporting and Misrepresentation.pptxSelective Reporting and Misrepresentation.pptx
Selective Reporting and Misrepresentation.pptx
 
Citation indexing
Citation indexingCitation indexing
Citation indexing
 

Similar to plagiarism detection tools and techniques

A Survey On Plagiarism Detection
A Survey On Plagiarism DetectionA Survey On Plagiarism Detection
A Survey On Plagiarism Detection
Karla Adamson
 
A Review Of Plagiarism Detection Based On Lexical And Semantic Approach
A Review Of Plagiarism Detection Based On Lexical And Semantic ApproachA Review Of Plagiarism Detection Based On Lexical And Semantic Approach
A Review Of Plagiarism Detection Based On Lexical And Semantic Approach
Courtney Esco
 
A framework for plagiarism
A framework for plagiarismA framework for plagiarism
A framework for plagiarism
csandit
 
‘CodeAliker’ - Plagiarism Detection on the Cloud
‘CodeAliker’ - Plagiarism Detection on the Cloud ‘CodeAliker’ - Plagiarism Detection on the Cloud
‘CodeAliker’ - Plagiarism Detection on the Cloud
acijjournal
 
RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docxRESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
infantkimber
 
Cognitive Security: How Artificial Intelligence is Your New Best Friend
Cognitive Security: How Artificial Intelligence is Your New Best FriendCognitive Security: How Artificial Intelligence is Your New Best Friend
Cognitive Security: How Artificial Intelligence is Your New Best Friend
SparkCognition
 
Malwise-Malware Classification and Variant Extraction
Malwise-Malware Classification and Variant ExtractionMalwise-Malware Classification and Variant Extraction
Malwise-Malware Classification and Variant Extraction
IOSR Journals
 
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Waqas Tariq
 
Importance of String in Programming Languages.pptx
Importance of String in Programming Languages.pptxImportance of String in Programming Languages.pptx
Importance of String in Programming Languages.pptx
helloprassy
 
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
cscpconf
 
csmalware_malware
csmalware_malwarecsmalware_malware
csmalware_malwareJoshua Saxe
 
A Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdfA Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdf
Kayla Smith
 
Combining Approximate String Matching Algorithms and Term Frequency In The De...
Combining Approximate String Matching Algorithms and Term Frequency In The De...Combining Approximate String Matching Algorithms and Term Frequency In The De...
Combining Approximate String Matching Algorithms and Term Frequency In The De...
CSCJournals
 
Plagiarism Preventive Initiatives and AI Tools......docx
Plagiarism Preventive Initiatives and AI Tools......docxPlagiarism Preventive Initiatives and AI Tools......docx
Plagiarism Preventive Initiatives and AI Tools......docx
Northern University,Bangladesh.
 
FNC Corporate Protect Workshop
FNC Corporate Protect WorkshopFNC Corporate Protect Workshop
FNC Corporate Protect Workshopforensicsnation
 
Classification with R
Classification with RClassification with R
Classification with R
Najima Begum
 
03.fnc corporate protect workshop new
03.fnc corporate protect workshop new03.fnc corporate protect workshop new
03.fnc corporate protect workshop newforensicsnation
 

Similar to plagiarism detection tools and techniques (20)

A Survey On Plagiarism Detection
A Survey On Plagiarism DetectionA Survey On Plagiarism Detection
A Survey On Plagiarism Detection
 
A Review Of Plagiarism Detection Based On Lexical And Semantic Approach
A Review Of Plagiarism Detection Based On Lexical And Semantic ApproachA Review Of Plagiarism Detection Based On Lexical And Semantic Approach
A Review Of Plagiarism Detection Based On Lexical And Semantic Approach
 
Plag detection
Plag detectionPlag detection
Plag detection
 
A framework for plagiarism
A framework for plagiarismA framework for plagiarism
A framework for plagiarism
 
‘CodeAliker’ - Plagiarism Detection on the Cloud
‘CodeAliker’ - Plagiarism Detection on the Cloud ‘CodeAliker’ - Plagiarism Detection on the Cloud
‘CodeAliker’ - Plagiarism Detection on the Cloud
 
RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docxRESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
RESPOND TO THIS DISCUSSION POST BASED ON THE TOPIC Compare and co.docx
 
Cognitive Security: How Artificial Intelligence is Your New Best Friend
Cognitive Security: How Artificial Intelligence is Your New Best FriendCognitive Security: How Artificial Intelligence is Your New Best Friend
Cognitive Security: How Artificial Intelligence is Your New Best Friend
 
Malwise-Malware Classification and Variant Extraction
Malwise-Malware Classification and Variant ExtractionMalwise-Malware Classification and Variant Extraction
Malwise-Malware Classification and Variant Extraction
 
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
Comparing Three Plagiarism Tools (Ferret, Sherlock, and Turnitin)
 
Research Report
Research ReportResearch Report
Research Report
 
Importance of String in Programming Languages.pptx
Importance of String in Programming Languages.pptxImportance of String in Programming Languages.pptx
Importance of String in Programming Languages.pptx
 
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
EVALUATION OF THE SHAPD2 ALGORITHM EFFICIENCY IN PLAGIARISM DETECTION TASK US...
 
csmalware_malware
csmalware_malwarecsmalware_malware
csmalware_malware
 
A Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdfA Tool to Detect Plagiarism in Java Source Code.pdf
A Tool to Detect Plagiarism in Java Source Code.pdf
 
Combining Approximate String Matching Algorithms and Term Frequency In The De...
Combining Approximate String Matching Algorithms and Term Frequency In The De...Combining Approximate String Matching Algorithms and Term Frequency In The De...
Combining Approximate String Matching Algorithms and Term Frequency In The De...
 
Plagiarism Preventive Initiatives and AI Tools......docx
Plagiarism Preventive Initiatives and AI Tools......docxPlagiarism Preventive Initiatives and AI Tools......docx
Plagiarism Preventive Initiatives and AI Tools......docx
 
FNC Corporate Protect Workshop
FNC Corporate Protect WorkshopFNC Corporate Protect Workshop
FNC Corporate Protect Workshop
 
Classification with R
Classification with RClassification with R
Classification with R
 
03.fnc corporate protect workshop new
03.fnc corporate protect workshop new03.fnc corporate protect workshop new
03.fnc corporate protect workshop new
 
FNC Corporate Protect
FNC Corporate ProtectFNC Corporate Protect
FNC Corporate Protect
 

Recently uploaded

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 

Recently uploaded (20)

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 

plagiarism detection tools and techniques

  • 2. Contents:  Introduction  Definition of Plagiarism  Avoiding plagiarism  Text based plagiarism detection techniques  Tools used for text based plagiarism  Source code based plagiarism detection techniques  Tools used for code based plagiarism  Disadvantages of the plagiarism detection technology  Conclusion 1 Plagiarism Detection Techniques
  • 3. Introduction  Plagiarism is a significant problem on almost every college and university campus.  The problems of plagiarism go beyond the campus, and have become an issue in industry, journalism, and government activities. 2 Plagiarism Detection Techniques
  • 4. Definition of Plagiarism Plagiarize according to the Merriam-Webster Online dictionary is:  To steal and pass off the idea or words of another as one’s own.  To use another’s production without crediting the source  To commit literary theft  To present as new and original idea or product derived from an existing source. Plagiarism Detection Techniques 3
  • 5. The following are considered as Plagiarism:  Turning in someone else’s work as your own.  Copying words or ideas from someone else without giving credit.  Failing to put a quotation in quotation marks  Giving incorrect information about the source of a quotation.  Changing words but copying sentence structure.  Copying so many words or ideas from a source that it makes up the majority of your work, even though by credit. 4 Plagiarism Detection Techniques
  • 6. Deliberate and Accidental Plagiarism Deliberate (intentional) Plagiarism : Steals the property of somebody else and claims it to be his own. Accidental (unintentional) Plagiarism : Somebody unknowingly cites a phrase or copies words without acknowledging the author of the material. 5 Plagiarism Detection Techniques
  • 8. Avoiding plagiarism Two methods :  Plagiarism prevention  Plagiarism detection 7 Plagiarism Detection Techniques
  • 9. Plagiarism Prevention  Collaborative effort for recognize and counter plagiarism at every level.  Educate students about the appropriate use of intellectual material.  Minimize the possibility of submission of plagiarized content.  Plagiarism prevention is difficult to achieve & also take a long time. 8 Plagiarism Detection Techniques
  • 10. Plagiarism Detection 9 Plagiarism Detection Techniques Culwin and Lancaster’s four stages of detecting plagiarism:
  • 11. Plagiarism Detection technique 10 Plagiarism Detection Techniques  Text based plagiarism detection techniques  Source code based plagiarism detection techniques
  • 12. Text based plagiarism detection techniques  Substring matching  Keyword similarity  Exact fingerprint match  Text parsing 11 Plagiarism Detection Techniques
  • 13. Substring Matching  Try to identify maximal matches in pairs  which then are used as plagiarism indicators.  Typically, the substrings are represented in suffix trees.  Graph-based measures are employed to capture the fraction of the plagiarized sections. 12 Plagiarism Detection Techniques
  • 14. Keyword Similarity  Extract topic identifying keywords from a document.  Compare with keywords of other document.  If the similarity exceeds a threshold, the candidate documents are divided into smaller pieces.  which are then compared recursively.  This approach assumes that plagiarism usually happens in topically similar documents. 13 Plagiarism Detection Techniques
  • 15. Exact Fingerprint Match  The documents are partitioned into term sequences called chunks.  which then are used as plagiarism indicators.  from which digital digests are computed that form the document’s fingerprint.  digests are inserted into a hash table then collisions indicate matching sequences.  For the fingerprint computation some standard hashing suffers from two severe problems:  Computationally expensive,  A small chunk size (3-10 words) must be chosen to identify matching passages 14 Plagiarism Detection Techniques
  • 16. Text parsing  Any sentence of the text can be automatically represented in the form of the tree.  which reflects the structure of the sentence  Example: The phrase the monkey ate the banana will be parsed by such software as, ┌──────SENTENCE─────┐ SUBJECT └─VERB OBJECT ARTICLE └─ ate ARTICLE └─ the └─ the NOUN NOUN └─monkey └─ banana 15 Plagiarism Detection Techniques
  • 17. Text parsing (Continue…)  Once a parse tree is created, we can invoke a tree matching procedure  Initially the algorithm builds a flowchart-styled parse tree for each file to be analyzed  Then for each pair of files, the algorithm performs a rough “abstract comparison”, when only types of the parse tree elements ( like Assignment, Loop, Branching) are taken into account.  This is done recursively for the each level of tree nodes If the similarity percentage becomes lower, the trees are immediately treated as not similar. 16 Plagiarism Detection Techniques
  • 18. Text parsing (Continue…)  If the abstract comparison indicates enough similarity, a special low-level “micro comparison” procedure is invoked.  Each node represents an individual statement  Each tree node turns into a separate sub tree that has to be compared with the corresponding sub tree taken from another file.  E.g. the phrases the monkey ate the banana and the banana was eaten by the monkey will be very close after the tokenization. 17 Plagiarism Detection Techniques
  • 19. Tools used for text based plagiarism Some tools are:  PlagAware  PlagScan  CheckForPlagiarism.net  iThenticate  PlagiarismDetection.org 18 Plagiarism Detection Techniques
  • 20. PlagAware  Is an online-service used for plagiarism detection  It can search, find, analyze and trace plagiarism in the specified topic similar to the topics  PlagAware is a search Engine  provide different types of report that help the user to decide that is his document has been plagiarized or not  Mainly used in academic filed  Multiple Document Comparison  Does not support synonym and sentence structure checking. 19 Plagiarism Detection Techniques
  • 21. PlagScan  It is online software used for textual plagiarism checker  Complex algorithms for checking and analyzing uploaded document  Unique signature extracted from the document’s structure that is then compared with PlagScan database and millions of online documents.  Detect most of plagiarism types either directs copy and paste or words switching  PlagScan supports all the language that use the international UTF-8 encoding and all language with Latin or Arabic characters 20 Plagiarism Detection Techniques
  • 22. CheckForPlagiarism.net  One of the best online plagiarism checkers that used to stop or prevention of online plagiarism.  The fingerprint-based approach used to analyze and summarize collection of document and create a kind of fingerprint for it.  Uses its own database that include millions documents and articles over World Wide Web  Support synonym and sentence structure checking  Can compare set of different documents simultaneously with other documents 21 Plagiarism Detection Techniques
  • 23. iThenticate  One of the application or services designed especially for the researchers and authors’ publisher  It have own database that contain millions of documents  Users who have account can do either online and offline comparison of submitted documents against it and to identify plagiarized content.  Considered as the first online plagiarism checker  Document to document and multiple documents checking  Supports more than 30 languages  Does not support synonym and sentence structure checking 22 Plagiarism Detection Techniques
  • 24. PlagiarismDetection.org  It is an online service provides high level of accuracy result in plagiarism detection  Use its own database that contains millions of documents  Supports English languages and all languages that using Latin characters  Does not support multiple document comparison  Does not support synonym and sentence structure checking 23 Plagiarism Detection Techniques
  • 25. Source code based plagiarism detection techniques  Lexical Similarities  Parse Tree Similarities  Program Dependence Graphs  Metrics 24 Plagiarism Detection Techniques
  • 26. Lexical Similarities  Converts source code into a stream of lexical tokens from which compiler extract meaning from the source.  During the lexical analysis phase, the source code undergoes a series of transformation  Some of these transformations, such as the identification of reserved words, identifiers are beneficial for plagiarism detection. Plagiarism Detection Techniques 25
  • 27. Lexical Similarities (Continue…)  Consider the following two snippets of Java Code: Plagiarism Detection Techniques 26 int[] A = {1,2,3,4}; for(int i = 0; i < A.length; i++) { A[i] = A[i] + 1; } int[] B = {1, 2, 3, 4}; for(int j = 0; j < B.length; j++) { B[j] = B[j] + 1; }
  • 28. Lexical Similarities (Continue…) The lexical stream of the 2 snippets of code is : LITERAL_int LBRACK RBRACK IDENT ASSIGN LCURLY NUM_INT COMMA NUM_INT COMMA NUM_INT COMMA NUM_INT RCURLY SEMI LITERAL_for LPAREN LITERAL_int IDENT ASSIGN NUM_INT SEMI IDENT LT IDENT DOT IDENT SEMI IDENT INC RPAREN LCURLY NUM_INT SEMI Both the java snippets will have the exact lexical stream Plagiarism Detection Techniques 27 0
  • 29. Parse Tree Similarities  The parse tree or derivation tree built from the lexical for a program also exhibits structure for a given program  A compiler, during the compilation process builds a parse tree which represents the program.  The parse tree will have the same structure for both the snippet of code as the lexical streams are same.  An algorithm for detecting plagiarism using this method would first, parse each program. Next, for each pair of parse trees, it attempts to find as many common sub trees as possible. Use this number as a measure of similarity between the two programs Plagiarism Detection Techniques 28
  • 31. Program Dependence Graphs(PDG)  PDG is a graph representation of the source Code  It is a directed, labeled graph which represents the data and the control dependencies within one procedure.  Basic statements like variable declarations, assignments, and procedure calls are represented by vertices in PDGs.  It depicts how the data flows between statements and how statements are controlled by other statements.  The data and control dependencies between statements are represented by edges between vertices in PDGs  Data and control dependencies are plotted in solid and dashed lines respectively. Plagiarism Detection Techniques 30
  • 32. Program Dependence Graphs (Continue…) Plagiarism Detection Techniques 31 Example:
  • 33. Metrics  Plagiarism detection by similarity analysis using software metrics.  Software metrics are:  Number of function calls  Number of used or defined local variables  Number of used or defined non-local variables  Number of parameters  Number of statements  Number of branches  Number of loops Plagiarism Detection Techniques 32
  • 34. Metrics (Continue…)  Each fragment characterized by a set of features measured by metrics  Metrics computation requires the parsing of source code to identifying interesting fragments  Metrics are simple to calculate and can be compared quickly  False positives: two fragments with the same scores on a set of metrics may do entirely different things. Plagiarism Detection Techniques 33
  • 35. Tools used for code based plagiarism  MOSS  JPlag  CodeMatch Plagiarism Detection Techniques 34
  • 36. MOSS (Measure of Software Similarity)  Can be applied to a range of programming languages  Registered instructors can submit batches of programs to the moss server.  Result is placed on a web page on the moss web server.  A link to that web page is returned when checking the document is finished.  The MOSS database stores an internal representation of programs, and then looks for similarities between them. 35Plagiarism Detection Techniques
  • 37. JPlag  JPlag compares submitted programs in pairs  It assumes that plagiarists may vary the names of variables or classes, but they are least likely to change the control structure of a program.  It presents its results as a set of HTML pages.  The pages are sent back to the client and stored locally.  JPlag was easier to use but supported fewer languages than MOSS 36Plagiarism Detection Techniques
  • 38. CodeMatch  Compares thousands of source code files in multiple directories and subdirectories  Determine those files which are closely correlated.  Useful for finding open source code within proprietary code.  Discovering common standard algorithms within different programs. 37Plagiarism Detection Techniques
  • 39. Disadvantages of the plagiarism detection technology  Plagiarism detection systems are built based on a few languages. To check for plagiarism with the same software can be difficult.  Most of the detection software checking is done with some repository situated in an organization. Other people are unable to access it and verify for plagiarism.  As the number of digital copies are going up the repository size should be large and the plagiarism detection software should be able to handle it.  Some software ask us to load a file to their link .The file is copied to their database . This cause our data being leaked or hacked for other purposes. 38 Plagiarism Detection Techniques
  • 40. Conclusion  Plagiarism is rampant now. With most of the data available to us in digital format the venues for plagiarism is opening up.  To avoid this kind of cheating and to acknowledge the originality of the author new detection techniques are to be created.  To protect the intellectual property source code new techniques are to be developed and implemented. 39Plagiarism Detection Techniques
  • 41. References 39Plagiarism Detection Techniques  http://www.ijirae.com/volumes/vol1/issue7/AUCS10085. 06.pdf  http://dspace.cusat.ac.in/jspui/bitstream/123456789/3618/ 1/PDT.pdf  http://elearningindustry.com/top-10-free-plagiarism- detection-tools-for-teachers  http://www.plagiarism.org/plagiarism-101/what-is- plagiarism/  http://www.cs.uu.nl/research/techreps/repo/CS2010/2010- 015.Pdf  https://en.wikipedia.org/wiki/Category:Plagiarism_detecto rs