SlideShare a Scribd company logo
1 of 3
Abstract - This paper introduces a computer
based plagiarism detection technique which
combines the functionality of substring
matching and keyword similarity to give more
accurate results. Also to make the algorithm
more efficient clustering is done in which
cluster of similar documents is created using
LCS (Longest Common Subsequence Method).
1. INTRODUCTION
Different people define plagiarism in different
ways. In layman language, plagiarism is an
unacknowledged, act of copying someone’s
work. Technically, as described by Wikipedia
[1], it is “wrongful appropriation" and "stealing
and publication" of another author’s "language,
thoughts, ideas, or expressions" and the
representation of them as one's own original
work”.
So, to detect plagiarism one can use either
manual or computer based techniques. As
manual techniques to detect plagiarism are
difficult to implement so there is a need to
develop computer based techniques which
would efficiently detect plagiarism.
So, this paper introduces an efficient technique
to detect plagiarism for text which uses:
1. Clustering using longest common substring
algorithm.
2. Substring Matching
3. Keyword Similarity
A. CLUSTERING
If all documents present are compared with the
reference document it would require a lot of time.
So to save time clustering based plagiarism
detection technique is used in which a cluster or
group of similar kind of documents is created
among which document is compared. These
similar documents are than compared by other
plagiarism detection techniques like substring
matching to get more efficient result of plagiarism.
B. SUBSTRING MATCHING
Here, in this technique, a pattern or a string is
compared with the document. The document is
divided using any indicator like ‘.’ , ’,’ , ’?’ etc.
It plays an important role in detecting
plagiarism in application source codes.
C. KEYWORD SIMILARITY
In this method of plagiarism detection, a
keyword is given and based on that the
similarity between the document is calculated.
2. PROPOSED WORK
Our main aim is to develop an efficient
algorithm to determine text based plagiarism.
We have developed a plagiarism detection
application in which clustering based on longest
common subsequence (LCS) [2], keyword
similarity [3] and substring matching [4]
algorithms are used. We have implemented it by
using C++ and Python as a programming
language.
Plagiarism Detection Technique
Vibhanshu
Manav Bagai
Siddharth Gupta
Department of Computer Engineering,
Zakir Hussain College of Engineering and Technology,
Aligarh.
A. FLOW OF THE SYSTEM
Fig. 1. Flow of the system.
Fig. 1, illustrates the general outline how the
whole algorithm works which start from
clustering which develop a cluster of similar
documents on which substring matching and at
last keyword similarity is applied to detect
plagiarism. Detailed outline of each process will
be given below.
B. CLUSTERING
The longest common subsequence (or LCS)
algorithm [4] finds the longest string between
two given strings that are common between the
two groups and in the same order in each string.
To add to the functionality and accuracy of
above algorithm clustering is proposed by us,
As there are many documents to be compared so
this may take a lot of time. So, to solve this
problem clustering is used. Here a cluster is
created which mainly contain those files which
are similar to the document to be compared by
Longest Common Subsequence Method.
Let the two documents compared be X and Y
where X is reference document and Y is
document to be compared having length m and
n respectively.
From longest common subsequence method we
will get the length of lowest common
subsequence, let it be lcs. To find the similarity
between the documents following method is
used:
R=lcs/m (1.1)
S= lcs/n (1.2)
Then we will find F which is equal to
F= ((1+(B*B))*R*S)/(R+B*B*S) (1.3)
Where, B= S/R
What is F, R,S AND B? *mention equation nos.
in the para all mention fig2*
The document which is having more value of F
is more similar. If documents are completely
same we get the value of F equal to 1. A cluster
is made for documents which are more similar
to reference document so that next step is
applied to only those document.
Fig. 2. xxxxxxx
C. SUBSTRING MATCHING AND
KEYWORD SIMILARITY
After getting the clusters of the similar
documents using LCS, they are no compared
using substring matching technique. In this
method, we will break the strings from both the
Clustering
Substring
Matching
Keyword
Similarity
Finding length of
longest common
substring
Perform Calculation
and find F
Compare F for
different
documents and
creating a cluster
among them.
documents into substring based on ‘.’, ‘!’, ‘?’.
Then, we will compare all the substrings from
both the documents (the reference and the
documents from the cluster). If the substrings
are found similar, we will increase the
plagiarism count. After this, we will apply
keyword similarity method, we will ask for a
keyword of the document and then using that
keyword we will find the sentences from both
the documents with that keyword. Then, we will
compare those sentences again and if found
same we will add them to the plagiarism set.
This is shown in the Algorithm [5] below:
Terms Used:
Suspected document - Q;
Reference document - D;
Sentences in Q - {q1,q2,…qn};
Sentences in reference document - {d1,
d2,…dn};
Plagiarism set - P=Null;
Input: Q.
For Q
Separate sentences Q= {q1,q2,….,qn};
For every q in Q,
Compare with reference
document D,
If (q==d)
Add sentence to plagiarized set
P,
Update result. P=P+q;
End if
End for
If (P==NULL)
Display “document is plagiarism free”
Else
Display,
set P as a plagiarized sentences.
Highlight all P in Q as plagiarism text.
End else
End for
3. CONCLUSION
In this paper we have proposed a new
plagiarism detection method which is a
combination of LCS algorithm, Substring
matching and Keyword similarity. This
algorithm is much more efficient than the
traditional substring based algorithm [6] as it
firstly forms the cluster of more similar
documents and then apply a hybrid algorithm of
substring matching and keyword similarity to
get more accurate results.
4. REFERENCES
[1] Font size :10pt, Font type: Times new
roman, First authors, title of research
reference,volume.
[2] G. Eason, B. Noble, and I.N. Sneddon, “On
certain integrals of Lipschitz-Hankel type
involving products of Bessel functions,” Phil.
Trans. Roy. Soc. London, vol. A247, pp. 529-
551, April 1955.
[3] Sangeetha Jamal, on “Plagiarism Detection
Techniques”, Cochin University of Science and
Technology, Cochin 682022 , 2010.
[4]Du Zou, Wei-jiang Long, Zhang Ling “A
Cluster Based Plagiarism Detection Technique”
for PAN , CLEF, 2010.
[5] Chow Kok Kent, Naomi Salim “Features
Based Text Similarity Detection”, Faculty of
Computer Science and Informatics System,
University Teknologi Malaysia, 81310 Skudai,
Johor Malaysia, Journal of Computing , vol 2,
issue 1, January 2010.
[6]Sudhir D. Salukhe, S. Z. Gawali, “A
Plagiarism Detection Technique Using
Reinforcement Learning”, International Journal
of Advanced Research in Computer Science and
Management Studies, vol. 1 ,issue 6, November
2013.

More Related Content

What's hot

Centrality-Based Network Coder Placement For Peer-To-Peer Content Distribution
Centrality-Based Network Coder Placement For Peer-To-Peer Content DistributionCentrality-Based Network Coder Placement For Peer-To-Peer Content Distribution
Centrality-Based Network Coder Placement For Peer-To-Peer Content DistributionIJCNCJournal
 
Enhancing security in cloud storage
Enhancing security in cloud storageEnhancing security in cloud storage
Enhancing security in cloud storageShivam Singh
 
Solutions crypto4e
Solutions crypto4eSolutions crypto4e
Solutions crypto4eJack Ndahiro
 
Analysis of rsa algorithm using gpu
Analysis of rsa algorithm using gpuAnalysis of rsa algorithm using gpu
Analysis of rsa algorithm using gpuIJNSA Journal
 
Analysis of Searchable Encryption
Analysis of Searchable EncryptionAnalysis of Searchable Encryption
Analysis of Searchable EncryptionNagendra Posani
 
Info mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copyInfo mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copySelva Raj
 
Info mimi-hop-by-hop authentication
Info mimi-hop-by-hop authenticationInfo mimi-hop-by-hop authentication
Info mimi-hop-by-hop authenticationSelva Raj
 
Towards Practical Homomorphic Encryption with Efficient Public key Generation
Towards Practical Homomorphic Encryption with Efficient Public key GenerationTowards Practical Homomorphic Encryption with Efficient Public key Generation
Towards Practical Homomorphic Encryption with Efficient Public key GenerationIDES Editor
 
Genetic Algorithm Based Cryptographic Approach using Karnatic Music
Genetic Algorithm Based Cryptographic Approach using  Karnatic  MusicGenetic Algorithm Based Cryptographic Approach using  Karnatic  Music
Genetic Algorithm Based Cryptographic Approach using Karnatic MusicIRJET Journal
 
Identity-Based Blind Signature Scheme with Message Recovery
Identity-Based Blind Signature Scheme with Message Recovery Identity-Based Blind Signature Scheme with Message Recovery
Identity-Based Blind Signature Scheme with Message Recovery IJECEIAES
 
Exploiting tls to disrupt privacy of web application's traffic
Exploiting tls to disrupt privacy of web application's trafficExploiting tls to disrupt privacy of web application's traffic
Exploiting tls to disrupt privacy of web application's trafficSandipan Biswas
 
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...Based on the Influence Factors in the Heterogeneous Network t-path Similarity...
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...IJRESJOURNAL
 

What's hot (18)

75227-144257-1-PB
75227-144257-1-PB75227-144257-1-PB
75227-144257-1-PB
 
Centrality-Based Network Coder Placement For Peer-To-Peer Content Distribution
Centrality-Based Network Coder Placement For Peer-To-Peer Content DistributionCentrality-Based Network Coder Placement For Peer-To-Peer Content Distribution
Centrality-Based Network Coder Placement For Peer-To-Peer Content Distribution
 
Enhancing security in cloud storage
Enhancing security in cloud storageEnhancing security in cloud storage
Enhancing security in cloud storage
 
Distributed Hash Table
Distributed Hash TableDistributed Hash Table
Distributed Hash Table
 
Solutions crypto4e
Solutions crypto4eSolutions crypto4e
Solutions crypto4e
 
Analysis of rsa algorithm using gpu
Analysis of rsa algorithm using gpuAnalysis of rsa algorithm using gpu
Analysis of rsa algorithm using gpu
 
N sys s 32
N sys s 32N sys s 32
N sys s 32
 
15 82-87
15 82-8715 82-87
15 82-87
 
Analysis of Searchable Encryption
Analysis of Searchable EncryptionAnalysis of Searchable Encryption
Analysis of Searchable Encryption
 
poster
posterposter
poster
 
Info mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copyInfo mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copy
 
Info mimi-hop-by-hop authentication
Info mimi-hop-by-hop authenticationInfo mimi-hop-by-hop authentication
Info mimi-hop-by-hop authentication
 
Spatial approximate string search
Spatial approximate string searchSpatial approximate string search
Spatial approximate string search
 
Towards Practical Homomorphic Encryption with Efficient Public key Generation
Towards Practical Homomorphic Encryption with Efficient Public key GenerationTowards Practical Homomorphic Encryption with Efficient Public key Generation
Towards Practical Homomorphic Encryption with Efficient Public key Generation
 
Genetic Algorithm Based Cryptographic Approach using Karnatic Music
Genetic Algorithm Based Cryptographic Approach using  Karnatic  MusicGenetic Algorithm Based Cryptographic Approach using  Karnatic  Music
Genetic Algorithm Based Cryptographic Approach using Karnatic Music
 
Identity-Based Blind Signature Scheme with Message Recovery
Identity-Based Blind Signature Scheme with Message Recovery Identity-Based Blind Signature Scheme with Message Recovery
Identity-Based Blind Signature Scheme with Message Recovery
 
Exploiting tls to disrupt privacy of web application's traffic
Exploiting tls to disrupt privacy of web application's trafficExploiting tls to disrupt privacy of web application's traffic
Exploiting tls to disrupt privacy of web application's traffic
 
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...Based on the Influence Factors in the Heterogeneous Network t-path Similarity...
Based on the Influence Factors in the Heterogeneous Network t-path Similarity...
 

Similar to 2 column paper

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
The Search of New Issues in the Detection of Near-duplicated Documents
The Search of New Issues in the Detection of Near-duplicated DocumentsThe Search of New Issues in the Detection of Near-duplicated Documents
The Search of New Issues in the Detection of Near-duplicated Documentsijceronline
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
 
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE ijdms
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And ClusteringDataminingTools Inc
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clusteringguest0edcaf
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And ClusteringDatamining Tools
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONIJDKP
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureIOSR Journals
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...IJCSEA Journal
 
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSTEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSijdms
 

Similar to 2 column paper (20)

International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
L0261075078
L0261075078L0261075078
L0261075078
 
L0261075078
L0261075078L0261075078
L0261075078
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
The Search of New Issues in the Detection of Near-duplicated Documents
The Search of New Issues in the Detection of Near-duplicated DocumentsThe Search of New Issues in the Detection of Near-duplicated Documents
The Search of New Issues in the Detection of Near-duplicated Documents
 
TEXT CLUSTERING.doc
TEXT CLUSTERING.docTEXT CLUSTERING.doc
TEXT CLUSTERING.doc
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 
Ju3517011704
Ju3517011704Ju3517011704
Ju3517011704
 
F0431025031
F0431025031F0431025031
F0431025031
 
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATIONONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity Measure
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
 
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKSTEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
TEXT ADVERTISEMENTS ANALYSIS USING CONVOLUTIONAL NEURAL NETWORKS
 
P13 corley
P13 corleyP13 corley
P13 corley
 
P33077080
P33077080P33077080
P33077080
 

2 column paper

  • 1. Abstract - This paper introduces a computer based plagiarism detection technique which combines the functionality of substring matching and keyword similarity to give more accurate results. Also to make the algorithm more efficient clustering is done in which cluster of similar documents is created using LCS (Longest Common Subsequence Method). 1. INTRODUCTION Different people define plagiarism in different ways. In layman language, plagiarism is an unacknowledged, act of copying someone’s work. Technically, as described by Wikipedia [1], it is “wrongful appropriation" and "stealing and publication" of another author’s "language, thoughts, ideas, or expressions" and the representation of them as one's own original work”. So, to detect plagiarism one can use either manual or computer based techniques. As manual techniques to detect plagiarism are difficult to implement so there is a need to develop computer based techniques which would efficiently detect plagiarism. So, this paper introduces an efficient technique to detect plagiarism for text which uses: 1. Clustering using longest common substring algorithm. 2. Substring Matching 3. Keyword Similarity A. CLUSTERING If all documents present are compared with the reference document it would require a lot of time. So to save time clustering based plagiarism detection technique is used in which a cluster or group of similar kind of documents is created among which document is compared. These similar documents are than compared by other plagiarism detection techniques like substring matching to get more efficient result of plagiarism. B. SUBSTRING MATCHING Here, in this technique, a pattern or a string is compared with the document. The document is divided using any indicator like ‘.’ , ’,’ , ’?’ etc. It plays an important role in detecting plagiarism in application source codes. C. KEYWORD SIMILARITY In this method of plagiarism detection, a keyword is given and based on that the similarity between the document is calculated. 2. PROPOSED WORK Our main aim is to develop an efficient algorithm to determine text based plagiarism. We have developed a plagiarism detection application in which clustering based on longest common subsequence (LCS) [2], keyword similarity [3] and substring matching [4] algorithms are used. We have implemented it by using C++ and Python as a programming language. Plagiarism Detection Technique Vibhanshu Manav Bagai Siddharth Gupta Department of Computer Engineering, Zakir Hussain College of Engineering and Technology, Aligarh.
  • 2. A. FLOW OF THE SYSTEM Fig. 1. Flow of the system. Fig. 1, illustrates the general outline how the whole algorithm works which start from clustering which develop a cluster of similar documents on which substring matching and at last keyword similarity is applied to detect plagiarism. Detailed outline of each process will be given below. B. CLUSTERING The longest common subsequence (or LCS) algorithm [4] finds the longest string between two given strings that are common between the two groups and in the same order in each string. To add to the functionality and accuracy of above algorithm clustering is proposed by us, As there are many documents to be compared so this may take a lot of time. So, to solve this problem clustering is used. Here a cluster is created which mainly contain those files which are similar to the document to be compared by Longest Common Subsequence Method. Let the two documents compared be X and Y where X is reference document and Y is document to be compared having length m and n respectively. From longest common subsequence method we will get the length of lowest common subsequence, let it be lcs. To find the similarity between the documents following method is used: R=lcs/m (1.1) S= lcs/n (1.2) Then we will find F which is equal to F= ((1+(B*B))*R*S)/(R+B*B*S) (1.3) Where, B= S/R What is F, R,S AND B? *mention equation nos. in the para all mention fig2* The document which is having more value of F is more similar. If documents are completely same we get the value of F equal to 1. A cluster is made for documents which are more similar to reference document so that next step is applied to only those document. Fig. 2. xxxxxxx C. SUBSTRING MATCHING AND KEYWORD SIMILARITY After getting the clusters of the similar documents using LCS, they are no compared using substring matching technique. In this method, we will break the strings from both the Clustering Substring Matching Keyword Similarity Finding length of longest common substring Perform Calculation and find F Compare F for different documents and creating a cluster among them.
  • 3. documents into substring based on ‘.’, ‘!’, ‘?’. Then, we will compare all the substrings from both the documents (the reference and the documents from the cluster). If the substrings are found similar, we will increase the plagiarism count. After this, we will apply keyword similarity method, we will ask for a keyword of the document and then using that keyword we will find the sentences from both the documents with that keyword. Then, we will compare those sentences again and if found same we will add them to the plagiarism set. This is shown in the Algorithm [5] below: Terms Used: Suspected document - Q; Reference document - D; Sentences in Q - {q1,q2,…qn}; Sentences in reference document - {d1, d2,…dn}; Plagiarism set - P=Null; Input: Q. For Q Separate sentences Q= {q1,q2,….,qn}; For every q in Q, Compare with reference document D, If (q==d) Add sentence to plagiarized set P, Update result. P=P+q; End if End for If (P==NULL) Display “document is plagiarism free” Else Display, set P as a plagiarized sentences. Highlight all P in Q as plagiarism text. End else End for 3. CONCLUSION In this paper we have proposed a new plagiarism detection method which is a combination of LCS algorithm, Substring matching and Keyword similarity. This algorithm is much more efficient than the traditional substring based algorithm [6] as it firstly forms the cluster of more similar documents and then apply a hybrid algorithm of substring matching and keyword similarity to get more accurate results. 4. REFERENCES [1] Font size :10pt, Font type: Times new roman, First authors, title of research reference,volume. [2] G. Eason, B. Noble, and I.N. Sneddon, “On certain integrals of Lipschitz-Hankel type involving products of Bessel functions,” Phil. Trans. Roy. Soc. London, vol. A247, pp. 529- 551, April 1955. [3] Sangeetha Jamal, on “Plagiarism Detection Techniques”, Cochin University of Science and Technology, Cochin 682022 , 2010. [4]Du Zou, Wei-jiang Long, Zhang Ling “A Cluster Based Plagiarism Detection Technique” for PAN , CLEF, 2010. [5] Chow Kok Kent, Naomi Salim “Features Based Text Similarity Detection”, Faculty of Computer Science and Informatics System, University Teknologi Malaysia, 81310 Skudai, Johor Malaysia, Journal of Computing , vol 2, issue 1, January 2010. [6]Sudhir D. Salukhe, S. Z. Gawali, “A Plagiarism Detection Technique Using Reinforcement Learning”, International Journal of Advanced Research in Computer Science and Management Studies, vol. 1 ,issue 6, November 2013.