SlideShare a Scribd company logo
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5237
Analysis of Paraphrase Detection using NLP Techniques
Mrunal Badade1, Vaibhav Adsul 2, Jagruti Thombare3, Akshata Deshpande4,
Prof. Mansi Kulkarni5
1,2,3,4Student, Computer Engineering, Pillai College of Engineering, Maharashtra, India
5Faculty, Computer Engineering, Pillai College of Engineering, Maharashtra, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Major difficulties faced in natural language
processing are ambiguity where the same text has several
possible interpretations. The paraphrase isawayofconveying
the same content without compromising the meaning. It is an
alternate form in the same language stating the same
semantic content with the help of reframing or rearranging
the phrases of a sentence. Paraphrases can occur at the word
level, phrase level or even sentence level. Paraphrasing is of
two types: Paraphrase Generation and Paraphrase Detection.
We propose an application to detect the semantic similarity
between two texts of the same language to establish the
similarity. The proposed solutionoftacklingsimilarcontent or
text can be used as an application to detect plagiarism as well
as conduct an evaluation for a machine translation system.
Not only Paraphrase Detection can be used to tackle the
uniqueness of a text and retain its meaning, but also provide a
measure to assess the Machine Translations of a text. Since,
current available applications fall short to verify the integrity
of a text if it is paraphrased and fails to mark it as plagiarized.
We will be using already established traditional algorithmsto
detect if the content is a duplication of an already existing
work and on top of that, we will be using our application to
measure if the content has been paraphrased in any way and
on the basis of the performance we would evaluate our
system’s efficiency compared to the state of the art systems
that already exists.
Keywords - Ambiguity, Paraphrase Generation, Paraphrase
Detection, Semantic similarity, Plagiarism, Machine
Translations, Integrity.
1. INTRODUCTION
Paraphrase detection, which means analyzingsentencesthat
are semantically identical. We propose to detect the
semantic similarity between two texts of the same language
to establish the similarity. To find related sentences written
in natural language is complex for various applications, like
text summarization, plagiarism detection, information
retrieval and question answering system etc. Realizing this
gravity, we examine in particularhowtomark thechallenges
with detecting paraphrases using Natural Language
Processing Techniques. We proposethemulti-headattention
mechanism helps the model learn the words relevant
information in different presentation subspaces.
1.1 Fundamentals Natural
Natural language processing (NLP)isa subfieldoflinguistics,
computer science, information engineering, and artificial
intelligence concerned with the interactions between
computers and human(natural) languages,inparticularhow
to program computers to process andanalyzelargeamounts
of natural language data. [1]
Challenges in NLP often refer to natural language
understanding, speech recognition, and natural language
generation.
Fig-1: NLP Processing
Figure 1 shows the basic processing of an input or source
sentence in natural language which is done by the machine.
1.2 Terminologies
● Tokenization -Tokenization is, generally, an early
step in the NLP process, a step which splits longer
strings of text into smaller pieces, or tokens. Larger
chunks of text can be tokenized into sentences,
sentences can be tokenized into words, etc. Further
processing is usually achieved after a piece of text
has been properly tokenized.
● Normalization - Before processing ahead, text must
be normalized. Normalization usually refers to a
sequence of related tasks involved to put all text on
a level playing field: converting all text to the same
case (upper or lower), eliminating punctuation,
increasing contractions, converting numbers to
their word identification, and so on. Normalization
brings all words on the same footing, and allows
processing to continue consistently.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5238
● Stemming - Stemming is the process of discarding
affixes from a word in order to gain a word stem.
● Corpus -In linguistics and NLP, corpus refers to a
collection of texts. Such collections maybe builtona
single language of texts, or can span different
languages -- there are various reasons for which
multilingual corpora may be useful. Corpora may
also subsist of the media texts. Corpora are
frequently individually used for statistical linguistic
analysis and hypothesis testing.
● Stop Words - Stop words are those wordswhichare
clean out of text, since these words share limited to,
overall meanings, given that they are usually the
most familiar words in a language. For example,
"the," "and," and "a," while all needed words in an
appropriate passage, don't usually contribute
greatly to one's understanding of content.
● Parts-of-speech(POS)Tagging-POStaggingsubsists
of appointing a category tag to the tokenized parts
of a sentence. The most famous POS tagging would
be classified words as nouns, verbs, adjectives, etc.
● Bag of Words- Bag of words is a special
representative model used to clarify the details of a
selection of text. The bag of words model discards
grammar and word order, but is attentive in the
number of existence of words within the text. The
entire illustration of the document selection is that
of a bag of words.
1.3 Similarity Measures
There are various similarity measures whichcanbe enforced
to NLP. What are we measuring the similarity of? Mostly,
strings.
● Levenshtein - The number of characters that must
be deleted, inserted, or substituted inordertomake
a pair of strings equal.
● Jaccard - The limit of overlay between 2 sets; in the
case of Natural language processing (NLP),
generally, documents are sets of words.
● Smith Waterman –Smith Waterman is identical to
Levenshtein, but with costs appointed to
substitution, insertion, and deletion.
1.4 Syntactic Analysis
Also introduced as parsing, syntactic analysis is the function
of analyzing strings as symbols, and providing them
according to a well-establishedsetofgrammatical rules.This
step must, out of necessity, come beforeanyfurtheranalysis,
which attempts to extract insight from the text--semantic,
sentiment, etc. -- treating it as something beyond symbols.
Fig-2: Semantic Textual Similarity
2. Proposed System Architecture
Fig-3: Proposed System Architecture
● Tokenization
Tokenization is the procedure of splitting up the given text
into units called tokens.
Algorithm:
Input: A single sentence
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5239
Output: A list of tokens
● Stopwords
Stop words are words which are cleaned out before or after
preparing natural language data (text).
Algorithm:
1.Input
2.if words in sentence == stopwords list then goto step-4 3.
else message (“No stopwords”) then goto step-4
4.output
5.Exit Stopword
Input:
Output:
● Stemming
A stemming is a procedure in which the various forms of
similar words are reduced to a frequent form.
Algorithm:
1.Input
2.if words suffix in sentence == suffix list then goto step-4 3.
else message (“No stopwords”) then goto step-4
4.output
5.Exit
Input:
Output:
● POS Tagging
A POS Tagger is a sample of software that study text in some
language and assign POS to each word, such as noun, verb,
etc., despite usually computing operations use moreover
fine-grained Part of speech tags like 'noun-plural'.
Input:
Output:
● Siamese Deep Neural Networks for semantic
similarity (SNN)
A Siamese neural network (SNN) is an artificial neural
network that uses equal pressure while working in tandem
on two distinct input vectors to compute proportional
output vectors. Generally, one of the output vectors is
precomputed, thus constructing a baseline across which the
other output vector is correlated.[3] This is identical to
match fingerprints, but can be expressedmoretechnicallyas
a distance function for locality-sensitive hashing.
Fig-4: Siamese Deep Neural Network
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5240
SNN is a class of neural network architectures that consistof
two or more duplicate subnetworks. Duplicate here means
they have a similar configuration with a similar framework
and density. Framework renewing is illustrated over both
subnetworks. Siamese NNs are famous amongfunctionsthat
require finding the same or a relation between two
proportionate things. Some examples are paraphrased
scoring, where the inputs are two sentences and the output
is a score of how similar they are; or signature verification,
where they figure out whether two signatures are from the
same person. Generally, in such tasks, two identical
subnetworks are used toprocessthetwoinputs,andanother
module will take their outputs and produce the final output.
● Multi-head Attention Network
Attention takes two sentences, changes them into a matrix
where the altercation of one sentence form thecolumns,and
the altercations of another sentence form therows,andthen
it makes correlations, finding related context. This is
especially helpful in machine translation. Youdon’tjusthave
to use attention to correspond meaning between sentences
in two distinct languages. You can also put the similar
sentence simultaneously the columns and the rows, inorder
to figure out how few parts of that sentence refer to others.
So, study allows you to view at the completion of a sentence,
to make relations between any specific word and its related
context. This is highly diverse from the small-memory,
upstream-focused RNNs, and also quite specific from the
proximity-focused convolutional networks.
Fig-5: Multi-head Attention Network
3. Sample Dataset Used
Table -1: Sample Dataset Used for Experiment
Dataset Source Items Type
SNLI The Standford Natural
Language Processing
Group
570,152
Sentence pairs
Corpus
QQP Quora Question Pairs 400,000
Sentence pairs
Corpus
● Corpus
Corpus is a collection of recorded utterances used as a basis
for the descriptive analysis of a language. A corpus may
contain texts in a single language (monolingual corpus) or
text data in multiple languages (multilingual corpus).
Multilingual corpora that have uniquely formatted for side-
by-side identification are known as aligned parallel corpora.
There are two prime forms of parallel corpora which have
texts in two languages. In a translation corpus, the texts in
one language are translations of texts in the other language.
In a comparable corpus, the texts are of the same kind and
cover the same content, but they are not translations ofeach
other.[6] To utilize a parallel text, some kind of text
arrangement classifying identical text portions is essential
for analysis. Machine translation methods for converting
between two languages are usually prepared using parallel
fragments composing a first and a second language corpus
which is an element-for-element conversion of the first
language corpus.
4. LITERATURE SURVEY
1. Paraphrase Detection in Hindi Language using
Syntactic Features of Phrase by Rupal Bhargava,
Anushka Baoni, Harshit Jain &YashvardhanSharma
in 2016.
Observations:Afeaturevector-basedapproachwith
3 options (POS Tags, Word Stems and Soundex
codes) is mentioned for paraphrase detection of
Hindi Language. when extracting these 3 options,
the python package "fuzzywuzzy" is employed to
calculate the similarity scores.LevenshteinDistance
wants to calculate the variations between string
sequences.
2. Malayalam Paraphrase Detection by Sindhu.L &
Sumam Mary Idicula in 2016.
Observations: South Dravidian isemployedas input
language. The projected linguistics approach for
distinguishing the paraphrases includes 3 phases –
matching identical tokens, matching lemmas &
matching with synonyms replaced. Similarity
comparison is performed at the sentence level
mistreatment the Jaccard, Containment, Overlap
and cos similarity metrics.
3. Learning Paraphrase Identification with Structural
Alignment by Chen Liang, Praveen Paritosh,Vinodh
Rajendran, and Kenneth D. Forbus in 2016.
Observations: Planned a brand-new alignment-
based approach to find out linguistics similarity of
texts. Used a hybrid illustration, attributed relative
graphs, to inscribe native and structural data.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5241
4. Convolutional Neural Network for Paraphrase
Identification by Wenpeng Yin & Hinrich Schutze in
2015.
Observations: Conferred the deep learning design
"Bi-CNN-MI" for paraphrase identification (PI). Bi-
CNN-MI: “Bi-CNN” stands for double CNNs, “MI” for
multi granular interaction options. supported the
insight that PI needs examination 2 sentences on
multiple levels of coarseness, we havea tendencyto
learn multi granular sentence representations
exploitation convolutional neural network (CNN)
and model interaction options at every level.
5. Paraphrase DetectionBased onIdentical Phraseand
SimilarWordMatching byHoang-QuocNguyen-Son,
Yusuke Miyao, & Isao Echizen in 2015.
Observations: Developed a similarity matching
(SimMat) metric for police work paraphrases that's
supported matching identical phrases and similar
words and quantifying the minor words. It's
calculated by matching identical phrases and
similar words. Phrase-by-phrase matching is
completed employing a "Heuristic algorithm". Word
matching is completed using the "Kuhn-Munkres
algorithm".
6. Paraphrase Detection Using RecursiveAutoencoder
by Eric Huang in 2011.
Observations:Theautoencoderlearningalgorithmic
rule is an associate degree approach to
mechanically extract options from inputs in an
associate degree unattended manner. Applied
autoencoders recursively to the input. algorithmic
Autoencoder "RAE" used. For the combination
options, SVMs are used as the classifier.
7. A Semantic Similarity Approach to Paraphrase
Detection by Samuel Fernando &Mark Stevensonin
2009.
Observations: Planned technique makes use of
WordNet-based lexical similarity measures applied
otherwise from previous approaches. The system
was evaluated on the MicrosoftanalysisParaphrase
Corpus and located to shell antecedentlyreportable
approaches.
8. A Metric for Paraphrase Detection by JoaoCordeiro,
Gael Dias, & Pavel Brazdil in 2007.
Observations: Planned a brand-new metric, the
Sumo-Metric, for finding paraphrases. "Sumo-
Metric" outperforms all progressivemetricsoverall
corpora in conducted experiments by the authors.
conjointly performed a comparative studybetween
already existing metrics and new tailored ones and
planned a brand-new benchmark of paraphrase
take a look at corpora.
9. Methods for Detecting Paraphrase Plagiarism by
Victor U Thompson & Chris Bowerman in 2018.
Observations: To sight lexical substitutions, a
mixture of question growth and word try similarity
measure mistreatment WordNet and therefore the
word2vec word embedding model is employed.
question growth is employed to come up with
synonyms for every question wordina verysuspect
sentence. The word2vec model transforms the
synonyms into word vectors and compares them
with word vectors of the supply sentence
mistreatment cos similarity.
10. A Survey on Paraphrase Detection Techniques for
Indian Regional Languages by Shruti Srivastava &
Sharvari Govilkar in 2017.
Observations: Techniques like Vector based mostly
"Bi-CNN-MI" approach & MT approach "SimMat
Metric, grappling Metric" yield the foremost
effective results for paraphrase detection in West
Germanic.
11. The Study and Review of Paraphrase Detection
Techniques in Machine Learning by Darshana S
Bhole, Sandip S. Patil in 2017.
Observations: Stemmer algorithm used for
removing the suffixes. The 2 sentencesparaphrased
or not is calculated by token count, token matching
and synonyms token matching done by the
classifier.
12. Detecting Paraphrases inTamil LanguageSentences
by Dr.S.V.Kogilavani1, Dr.R.Thangarajan,
Dr.C.S.Kanimozhiselvi, Dr.S.Malliga in 2017.
Observations: The sentences area unit classified
mistreatment 2 supervised machine learning
algorithms like SVM and goop Entropy utilize
sixteen completely different syntactical and
linguistics options to best represent the similarity
between sentences.
13. Detecting Paraphrases in Indian Languages Using
Multinomial Logistic Regression Model by Kamal
Sarkar in 2016.
Observations: The foremost usually used corpora
for paraphrase detection is the MSRP corpus
(Microsoft Research Paraphrase Corpus).
"Word2Vec" model (Python) accustomed to sight
linguistics. Similarity between word vectors for the
words.
14. Overview of Shared Task on Detecting Paraphrases
in Indian Languages (DPIL) by M. Anand Kumar,
Shivkaran Singh, Kavirajan B & Soman K P in 2016.
Observations: Four Indian languages (Hindi,
Punjabi, Tamil and Malayalam) thought of. No
annotated corpora or automatic linguistics
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5242
interpretation systems like MSRP offeredforIndian
languages thus Created benchmark knowledge for
paraphrases. Vocabulary size for Hindi & Punjabi
languages is a smaller amount than Tamil and
Malayalam. Tamil and Malayalam are extremely
agglutinative in nature. Tamil & Malayalam
language accuracy is low as compared to the
accuracy obtained by Hindi & Punjabi language.
15. A Novel Approach to Paraphrase Hindi Sentences
using Natural Language Processing by Nandini
Sethi, Prateek Agrawal, Vishu Madaan & Sanjay
Kumar Singh in 2016.
Observations: Input taken as "Hindi" sentence.
Segmentation, Parsing & linguistics Analysis
performed step by step. Applied Reframing Rules
with the assistance of "Synonym Replacement" &
"Antonym Replacement" Combined the results to
make a new paragraph. Technique used wasthe"N-
Gram" technique.
5. CONCLUSION
Given a set of two sentences, finding semantic similarity
between those sentences is a difficult task as it requires
analyzing a language which the system doesnotunderstand.
(Human natural language). Paraphrase Detection has
multiple real-world applications thatsolvereal lifeproblems
as discussed earlier. We propose a system to detect
paraphrasing of sentences in natural language using SNN in
TensorFlow. By training the system in its initial phase via
deep learning architecture such as CNN, RNN & MAN, pre-
processing the sentence to extract features & derive results
through a classifier in testing phase.
ACKNOWLEDGEMENT
We have taken efforts in building this paper. However, it
would not have been possible without the kind of support
and encouragement from many individuals and college. We
would first like to extend our sincere thanks to all of them.
A hearty thanks to our project supervisor Prof. Manasi
Kulkarni for the constant supervision and guidance in all
difficulties faced during the project and report making.
We are thankful to H.O.D of the Computer Department Dr.
Sharvari Govilkar for providing the necessary information
and resources.
We are highly indebted to the Principal Dr. SandeepJoshifor
the constant motivation and encouragement which helpeda
lot in completion of this report.
We would like to express our gratitude towards our parents
for their kind cooperation. We would also like to thank the
authors whose books we have referred to in making the
project and report.
REFERENCES
[1] Natural language processing.
https://en.wikipedia.org/wiki/Natural_language_proces
sing
[2] Victor U Thompson & Chris Bowerman, (2018)
“Methods for Detecting Paraphrase Plagiarism”.
[3] Domain generalization in sketch-based image retrieval
systems.
http://reports.ias.ac.in/report/20604/domain-
generalization-in-sketch-based-image-retrieval-sbir-
systems
[4] Shruti Srivastava & Sharvari Govilkar, (2017) “A Survey
on Paraphrase DetectionTechniquesforIndianRegional
Languages”.
[5] Darshana S Bhole, Sandip S. Patil, (2017)“TheStudyand
Review of Paraphrase Detection Techniques in Machine
Learning”.
[6] Textcorpus.https://en.wikipedia.org/wiki/Text_corpus
[7] Dr.S.V.Kogilavani1, Dr.R.Thangarajan,
Dr.C.S.Kanimozhiselvi, Dr.S.Malliga, (2017) “Detecting
Paraphrases in Tamil Language Sentences”.
[8] Kamal Sarkar, (2016) “Detecting Paraphrases in Indian
Languages Using Multinomial Logistic Regression
Model”.
[9] Nandini Sethi, Prateek Agrawal, Vishu Madaan & Sanjay
Kumar Singh, (2016) “A Novel Approach to Paraphrase
Hindi Sentences using Natural Language Processing”.
[10] M. Anand Kumar, Shivkaran Singh, KavirajanB&Soman
K P, (2016) “Overview of Shared Task on Detecting
Paraphrases in Indian Languages (DPIL)”.
BIOGRAPHIES
Mrunal Badade, Student of Pillai
College of Engineering, completing
last year of my degree in Computer
Engineering department.
Vaibhav Adsul, Student of Pillai
College of Engineering, completing
last year of my degree in Computer
Engineering department.
Jagruti Thombare, Student of Pillai
College of Engineering, completing
last year of my degree in Computer
Engineering department.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5243
Akshata Deshpande, Student of
Pillai College of Engineering,
completing last year of my degree
in Computer Engineering
department.
Prof. Mansi Kulkarni, Faculty of
Pillai College of Engineering.
Assistant professor in department
of Information Technology.

More Related Content

What's hot

Modeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert SystemModeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
csandit
 
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural NetworkSentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
kevig
 
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
Hiroki Shimanaka
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
kevig
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
Journal For Research
 
Requirement Analysis - ijcee 2(3)
Requirement Analysis - ijcee 2(3)Requirement Analysis - ijcee 2(3)
Requirement Analysis - ijcee 2(3)
IT Industry
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ijaia
 
L1803058388
L1803058388L1803058388
L1803058388
IOSR Journals
 
Wsd final paper
Wsd final paperWsd final paper
Wsd final paper
Milind Gokhale
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
ijtsrd
 
IRJET- Survey on Generating Suggestions for Erroneous Part in a Sentence
IRJET- Survey on Generating Suggestions for Erroneous Part in a SentenceIRJET- Survey on Generating Suggestions for Erroneous Part in a Sentence
IRJET- Survey on Generating Suggestions for Erroneous Part in a Sentence
IRJET Journal
 
IRJET-Semantic Similarity Between Sentences
IRJET-Semantic Similarity Between SentencesIRJET-Semantic Similarity Between Sentences
IRJET-Semantic Similarity Between Sentences
IRJET Journal
 
IRJET - Text Optimization/Summarizer using Natural Language Processing
IRJET - Text Optimization/Summarizer using Natural Language Processing IRJET - Text Optimization/Summarizer using Natural Language Processing
IRJET - Text Optimization/Summarizer using Natural Language Processing
IRJET Journal
 
Improving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior KnowledgeImproving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior Knowledge
Gaetano Rossiello, PhD
 
Performance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganographyPerformance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganography
journalBEEI
 
Analysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion MiningAnalysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion Mining
mlaij
 
Supervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured TextSupervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured Text
International Journal of Engineering Inventions www.ijeijournal.com
 

What's hot (17)

Modeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert SystemModeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
 
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural NetworkSentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
 
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
 
DOMAIN BASED CHUNKING
DOMAIN BASED CHUNKINGDOMAIN BASED CHUNKING
DOMAIN BASED CHUNKING
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
 
Requirement Analysis - ijcee 2(3)
Requirement Analysis - ijcee 2(3)Requirement Analysis - ijcee 2(3)
Requirement Analysis - ijcee 2(3)
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
 
L1803058388
L1803058388L1803058388
L1803058388
 
Wsd final paper
Wsd final paperWsd final paper
Wsd final paper
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
 
IRJET- Survey on Generating Suggestions for Erroneous Part in a Sentence
IRJET- Survey on Generating Suggestions for Erroneous Part in a SentenceIRJET- Survey on Generating Suggestions for Erroneous Part in a Sentence
IRJET- Survey on Generating Suggestions for Erroneous Part in a Sentence
 
IRJET-Semantic Similarity Between Sentences
IRJET-Semantic Similarity Between SentencesIRJET-Semantic Similarity Between Sentences
IRJET-Semantic Similarity Between Sentences
 
IRJET - Text Optimization/Summarizer using Natural Language Processing
IRJET - Text Optimization/Summarizer using Natural Language Processing IRJET - Text Optimization/Summarizer using Natural Language Processing
IRJET - Text Optimization/Summarizer using Natural Language Processing
 
Improving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior KnowledgeImproving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior Knowledge
 
Performance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganographyPerformance analysis on secured data method in natural language steganography
Performance analysis on secured data method in natural language steganography
 
Analysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion MiningAnalysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion Mining
 
Supervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured TextSupervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured Text
 

Similar to IRJET - Analysis of Paraphrase Detection using NLP Techniques

IRJET- Factoid Question and Answering System
IRJET-  	  Factoid Question and Answering SystemIRJET-  	  Factoid Question and Answering System
IRJET- Factoid Question and Answering System
IRJET Journal
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
IRJET Journal
 
Language and Offensive Word Detection
Language and Offensive Word DetectionLanguage and Offensive Word Detection
Language and Offensive Word Detection
IRJET Journal
 
IRJET - Threat Prediction using Speech Analysis
IRJET - Threat Prediction using Speech AnalysisIRJET - Threat Prediction using Speech Analysis
IRJET - Threat Prediction using Speech Analysis
IRJET Journal
 
Contextual Emotion Recognition Using Transformer-Based Models
Contextual Emotion Recognition Using Transformer-Based ModelsContextual Emotion Recognition Using Transformer-Based Models
Contextual Emotion Recognition Using Transformer-Based Models
IRJET Journal
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET Journal
 
Reviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringReviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clustering
IRJET Journal
 
An Overview Of Natural Language Processing
An Overview Of Natural Language ProcessingAn Overview Of Natural Language Processing
An Overview Of Natural Language Processing
Scott Faria
 
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
IRJET Journal
 
IRJET - Voice based Natural Language Query Processing
IRJET -  	  Voice based Natural Language Query ProcessingIRJET -  	  Voice based Natural Language Query Processing
IRJET - Voice based Natural Language Query Processing
IRJET Journal
 
IRJET - Audio Emotion Analysis
IRJET - Audio Emotion AnalysisIRJET - Audio Emotion Analysis
IRJET - Audio Emotion Analysis
IRJET Journal
 
Automatic Text Summarization using Natural Language Processing
Automatic Text Summarization using Natural Language ProcessingAutomatic Text Summarization using Natural Language Processing
Automatic Text Summarization using Natural Language Processing
IRJET Journal
 
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
IJET - International Journal of Engineering and Techniques
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
eSAT Publishing House
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
eSAT Journals
 
IRJET- A Novel Approch Automatically Categorizing Software Technologies
IRJET- A Novel Approch Automatically Categorizing Software TechnologiesIRJET- A Novel Approch Automatically Categorizing Software Technologies
IRJET- A Novel Approch Automatically Categorizing Software Technologies
IRJET Journal
 
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLEMULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
IRJET Journal
 
Text Document Classification System
Text Document Classification SystemText Document Classification System
Text Document Classification System
IRJET Journal
 
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMCANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
IRJET Journal
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdf
NohaGhoweil
 

Similar to IRJET - Analysis of Paraphrase Detection using NLP Techniques (20)

IRJET- Factoid Question and Answering System
IRJET-  	  Factoid Question and Answering SystemIRJET-  	  Factoid Question and Answering System
IRJET- Factoid Question and Answering System
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
 
Language and Offensive Word Detection
Language and Offensive Word DetectionLanguage and Offensive Word Detection
Language and Offensive Word Detection
 
IRJET - Threat Prediction using Speech Analysis
IRJET - Threat Prediction using Speech AnalysisIRJET - Threat Prediction using Speech Analysis
IRJET - Threat Prediction using Speech Analysis
 
Contextual Emotion Recognition Using Transformer-Based Models
Contextual Emotion Recognition Using Transformer-Based ModelsContextual Emotion Recognition Using Transformer-Based Models
Contextual Emotion Recognition Using Transformer-Based Models
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine Learning
 
Reviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringReviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clustering
 
An Overview Of Natural Language Processing
An Overview Of Natural Language ProcessingAn Overview Of Natural Language Processing
An Overview Of Natural Language Processing
 
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
 
IRJET - Voice based Natural Language Query Processing
IRJET -  	  Voice based Natural Language Query ProcessingIRJET -  	  Voice based Natural Language Query Processing
IRJET - Voice based Natural Language Query Processing
 
IRJET - Audio Emotion Analysis
IRJET - Audio Emotion AnalysisIRJET - Audio Emotion Analysis
IRJET - Audio Emotion Analysis
 
Automatic Text Summarization using Natural Language Processing
Automatic Text Summarization using Natural Language ProcessingAutomatic Text Summarization using Natural Language Processing
Automatic Text Summarization using Natural Language Processing
 
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
IRJET- A Novel Approch Automatically Categorizing Software Technologies
IRJET- A Novel Approch Automatically Categorizing Software TechnologiesIRJET- A Novel Approch Automatically Categorizing Software Technologies
IRJET- A Novel Approch Automatically Categorizing Software Technologies
 
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLEMULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
 
Text Document Classification System
Text Document Classification SystemText Document Classification System
Text Document Classification System
 
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMCANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdf
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
IRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
IRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
IRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
IRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
IRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
IRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
IRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
IRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
rpskprasana
 
Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
mahammadsalmanmech
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
MIGUELANGEL966976
 
Recycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part IIRecycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part II
Aditya Rajan Patra
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
gerogepatton
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
KrishnaveniKrishnara1
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
Madan Karki
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
IJNSA Journal
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
Hitesh Mohapatra
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
camseq
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
gerogepatton
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
IJECEIAES
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
mamunhossenbd75
 

Recently uploaded (20)

CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
 
Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
 
Recycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part IIRecycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part II
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
 

IRJET - Analysis of Paraphrase Detection using NLP Techniques

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5237 Analysis of Paraphrase Detection using NLP Techniques Mrunal Badade1, Vaibhav Adsul 2, Jagruti Thombare3, Akshata Deshpande4, Prof. Mansi Kulkarni5 1,2,3,4Student, Computer Engineering, Pillai College of Engineering, Maharashtra, India 5Faculty, Computer Engineering, Pillai College of Engineering, Maharashtra, India ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract - Major difficulties faced in natural language processing are ambiguity where the same text has several possible interpretations. The paraphrase isawayofconveying the same content without compromising the meaning. It is an alternate form in the same language stating the same semantic content with the help of reframing or rearranging the phrases of a sentence. Paraphrases can occur at the word level, phrase level or even sentence level. Paraphrasing is of two types: Paraphrase Generation and Paraphrase Detection. We propose an application to detect the semantic similarity between two texts of the same language to establish the similarity. The proposed solutionoftacklingsimilarcontent or text can be used as an application to detect plagiarism as well as conduct an evaluation for a machine translation system. Not only Paraphrase Detection can be used to tackle the uniqueness of a text and retain its meaning, but also provide a measure to assess the Machine Translations of a text. Since, current available applications fall short to verify the integrity of a text if it is paraphrased and fails to mark it as plagiarized. We will be using already established traditional algorithmsto detect if the content is a duplication of an already existing work and on top of that, we will be using our application to measure if the content has been paraphrased in any way and on the basis of the performance we would evaluate our system’s efficiency compared to the state of the art systems that already exists. Keywords - Ambiguity, Paraphrase Generation, Paraphrase Detection, Semantic similarity, Plagiarism, Machine Translations, Integrity. 1. INTRODUCTION Paraphrase detection, which means analyzingsentencesthat are semantically identical. We propose to detect the semantic similarity between two texts of the same language to establish the similarity. To find related sentences written in natural language is complex for various applications, like text summarization, plagiarism detection, information retrieval and question answering system etc. Realizing this gravity, we examine in particularhowtomark thechallenges with detecting paraphrases using Natural Language Processing Techniques. We proposethemulti-headattention mechanism helps the model learn the words relevant information in different presentation subspaces. 1.1 Fundamentals Natural Natural language processing (NLP)isa subfieldoflinguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human(natural) languages,inparticularhow to program computers to process andanalyzelargeamounts of natural language data. [1] Challenges in NLP often refer to natural language understanding, speech recognition, and natural language generation. Fig-1: NLP Processing Figure 1 shows the basic processing of an input or source sentence in natural language which is done by the machine. 1.2 Terminologies ● Tokenization -Tokenization is, generally, an early step in the NLP process, a step which splits longer strings of text into smaller pieces, or tokens. Larger chunks of text can be tokenized into sentences, sentences can be tokenized into words, etc. Further processing is usually achieved after a piece of text has been properly tokenized. ● Normalization - Before processing ahead, text must be normalized. Normalization usually refers to a sequence of related tasks involved to put all text on a level playing field: converting all text to the same case (upper or lower), eliminating punctuation, increasing contractions, converting numbers to their word identification, and so on. Normalization brings all words on the same footing, and allows processing to continue consistently.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5238 ● Stemming - Stemming is the process of discarding affixes from a word in order to gain a word stem. ● Corpus -In linguistics and NLP, corpus refers to a collection of texts. Such collections maybe builtona single language of texts, or can span different languages -- there are various reasons for which multilingual corpora may be useful. Corpora may also subsist of the media texts. Corpora are frequently individually used for statistical linguistic analysis and hypothesis testing. ● Stop Words - Stop words are those wordswhichare clean out of text, since these words share limited to, overall meanings, given that they are usually the most familiar words in a language. For example, "the," "and," and "a," while all needed words in an appropriate passage, don't usually contribute greatly to one's understanding of content. ● Parts-of-speech(POS)Tagging-POStaggingsubsists of appointing a category tag to the tokenized parts of a sentence. The most famous POS tagging would be classified words as nouns, verbs, adjectives, etc. ● Bag of Words- Bag of words is a special representative model used to clarify the details of a selection of text. The bag of words model discards grammar and word order, but is attentive in the number of existence of words within the text. The entire illustration of the document selection is that of a bag of words. 1.3 Similarity Measures There are various similarity measures whichcanbe enforced to NLP. What are we measuring the similarity of? Mostly, strings. ● Levenshtein - The number of characters that must be deleted, inserted, or substituted inordertomake a pair of strings equal. ● Jaccard - The limit of overlay between 2 sets; in the case of Natural language processing (NLP), generally, documents are sets of words. ● Smith Waterman –Smith Waterman is identical to Levenshtein, but with costs appointed to substitution, insertion, and deletion. 1.4 Syntactic Analysis Also introduced as parsing, syntactic analysis is the function of analyzing strings as symbols, and providing them according to a well-establishedsetofgrammatical rules.This step must, out of necessity, come beforeanyfurtheranalysis, which attempts to extract insight from the text--semantic, sentiment, etc. -- treating it as something beyond symbols. Fig-2: Semantic Textual Similarity 2. Proposed System Architecture Fig-3: Proposed System Architecture ● Tokenization Tokenization is the procedure of splitting up the given text into units called tokens. Algorithm: Input: A single sentence
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5239 Output: A list of tokens ● Stopwords Stop words are words which are cleaned out before or after preparing natural language data (text). Algorithm: 1.Input 2.if words in sentence == stopwords list then goto step-4 3. else message (“No stopwords”) then goto step-4 4.output 5.Exit Stopword Input: Output: ● Stemming A stemming is a procedure in which the various forms of similar words are reduced to a frequent form. Algorithm: 1.Input 2.if words suffix in sentence == suffix list then goto step-4 3. else message (“No stopwords”) then goto step-4 4.output 5.Exit Input: Output: ● POS Tagging A POS Tagger is a sample of software that study text in some language and assign POS to each word, such as noun, verb, etc., despite usually computing operations use moreover fine-grained Part of speech tags like 'noun-plural'. Input: Output: ● Siamese Deep Neural Networks for semantic similarity (SNN) A Siamese neural network (SNN) is an artificial neural network that uses equal pressure while working in tandem on two distinct input vectors to compute proportional output vectors. Generally, one of the output vectors is precomputed, thus constructing a baseline across which the other output vector is correlated.[3] This is identical to match fingerprints, but can be expressedmoretechnicallyas a distance function for locality-sensitive hashing. Fig-4: Siamese Deep Neural Network
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5240 SNN is a class of neural network architectures that consistof two or more duplicate subnetworks. Duplicate here means they have a similar configuration with a similar framework and density. Framework renewing is illustrated over both subnetworks. Siamese NNs are famous amongfunctionsthat require finding the same or a relation between two proportionate things. Some examples are paraphrased scoring, where the inputs are two sentences and the output is a score of how similar they are; or signature verification, where they figure out whether two signatures are from the same person. Generally, in such tasks, two identical subnetworks are used toprocessthetwoinputs,andanother module will take their outputs and produce the final output. ● Multi-head Attention Network Attention takes two sentences, changes them into a matrix where the altercation of one sentence form thecolumns,and the altercations of another sentence form therows,andthen it makes correlations, finding related context. This is especially helpful in machine translation. Youdon’tjusthave to use attention to correspond meaning between sentences in two distinct languages. You can also put the similar sentence simultaneously the columns and the rows, inorder to figure out how few parts of that sentence refer to others. So, study allows you to view at the completion of a sentence, to make relations between any specific word and its related context. This is highly diverse from the small-memory, upstream-focused RNNs, and also quite specific from the proximity-focused convolutional networks. Fig-5: Multi-head Attention Network 3. Sample Dataset Used Table -1: Sample Dataset Used for Experiment Dataset Source Items Type SNLI The Standford Natural Language Processing Group 570,152 Sentence pairs Corpus QQP Quora Question Pairs 400,000 Sentence pairs Corpus ● Corpus Corpus is a collection of recorded utterances used as a basis for the descriptive analysis of a language. A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus). Multilingual corpora that have uniquely formatted for side- by-side identification are known as aligned parallel corpora. There are two prime forms of parallel corpora which have texts in two languages. In a translation corpus, the texts in one language are translations of texts in the other language. In a comparable corpus, the texts are of the same kind and cover the same content, but they are not translations ofeach other.[6] To utilize a parallel text, some kind of text arrangement classifying identical text portions is essential for analysis. Machine translation methods for converting between two languages are usually prepared using parallel fragments composing a first and a second language corpus which is an element-for-element conversion of the first language corpus. 4. LITERATURE SURVEY 1. Paraphrase Detection in Hindi Language using Syntactic Features of Phrase by Rupal Bhargava, Anushka Baoni, Harshit Jain &YashvardhanSharma in 2016. Observations:Afeaturevector-basedapproachwith 3 options (POS Tags, Word Stems and Soundex codes) is mentioned for paraphrase detection of Hindi Language. when extracting these 3 options, the python package "fuzzywuzzy" is employed to calculate the similarity scores.LevenshteinDistance wants to calculate the variations between string sequences. 2. Malayalam Paraphrase Detection by Sindhu.L & Sumam Mary Idicula in 2016. Observations: South Dravidian isemployedas input language. The projected linguistics approach for distinguishing the paraphrases includes 3 phases – matching identical tokens, matching lemmas & matching with synonyms replaced. Similarity comparison is performed at the sentence level mistreatment the Jaccard, Containment, Overlap and cos similarity metrics. 3. Learning Paraphrase Identification with Structural Alignment by Chen Liang, Praveen Paritosh,Vinodh Rajendran, and Kenneth D. Forbus in 2016. Observations: Planned a brand-new alignment- based approach to find out linguistics similarity of texts. Used a hybrid illustration, attributed relative graphs, to inscribe native and structural data.
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5241 4. Convolutional Neural Network for Paraphrase Identification by Wenpeng Yin & Hinrich Schutze in 2015. Observations: Conferred the deep learning design "Bi-CNN-MI" for paraphrase identification (PI). Bi- CNN-MI: “Bi-CNN” stands for double CNNs, “MI” for multi granular interaction options. supported the insight that PI needs examination 2 sentences on multiple levels of coarseness, we havea tendencyto learn multi granular sentence representations exploitation convolutional neural network (CNN) and model interaction options at every level. 5. Paraphrase DetectionBased onIdentical Phraseand SimilarWordMatching byHoang-QuocNguyen-Son, Yusuke Miyao, & Isao Echizen in 2015. Observations: Developed a similarity matching (SimMat) metric for police work paraphrases that's supported matching identical phrases and similar words and quantifying the minor words. It's calculated by matching identical phrases and similar words. Phrase-by-phrase matching is completed employing a "Heuristic algorithm". Word matching is completed using the "Kuhn-Munkres algorithm". 6. Paraphrase Detection Using RecursiveAutoencoder by Eric Huang in 2011. Observations:Theautoencoderlearningalgorithmic rule is an associate degree approach to mechanically extract options from inputs in an associate degree unattended manner. Applied autoencoders recursively to the input. algorithmic Autoencoder "RAE" used. For the combination options, SVMs are used as the classifier. 7. A Semantic Similarity Approach to Paraphrase Detection by Samuel Fernando &Mark Stevensonin 2009. Observations: Planned technique makes use of WordNet-based lexical similarity measures applied otherwise from previous approaches. The system was evaluated on the MicrosoftanalysisParaphrase Corpus and located to shell antecedentlyreportable approaches. 8. A Metric for Paraphrase Detection by JoaoCordeiro, Gael Dias, & Pavel Brazdil in 2007. Observations: Planned a brand-new metric, the Sumo-Metric, for finding paraphrases. "Sumo- Metric" outperforms all progressivemetricsoverall corpora in conducted experiments by the authors. conjointly performed a comparative studybetween already existing metrics and new tailored ones and planned a brand-new benchmark of paraphrase take a look at corpora. 9. Methods for Detecting Paraphrase Plagiarism by Victor U Thompson & Chris Bowerman in 2018. Observations: To sight lexical substitutions, a mixture of question growth and word try similarity measure mistreatment WordNet and therefore the word2vec word embedding model is employed. question growth is employed to come up with synonyms for every question wordina verysuspect sentence. The word2vec model transforms the synonyms into word vectors and compares them with word vectors of the supply sentence mistreatment cos similarity. 10. A Survey on Paraphrase Detection Techniques for Indian Regional Languages by Shruti Srivastava & Sharvari Govilkar in 2017. Observations: Techniques like Vector based mostly "Bi-CNN-MI" approach & MT approach "SimMat Metric, grappling Metric" yield the foremost effective results for paraphrase detection in West Germanic. 11. The Study and Review of Paraphrase Detection Techniques in Machine Learning by Darshana S Bhole, Sandip S. Patil in 2017. Observations: Stemmer algorithm used for removing the suffixes. The 2 sentencesparaphrased or not is calculated by token count, token matching and synonyms token matching done by the classifier. 12. Detecting Paraphrases inTamil LanguageSentences by Dr.S.V.Kogilavani1, Dr.R.Thangarajan, Dr.C.S.Kanimozhiselvi, Dr.S.Malliga in 2017. Observations: The sentences area unit classified mistreatment 2 supervised machine learning algorithms like SVM and goop Entropy utilize sixteen completely different syntactical and linguistics options to best represent the similarity between sentences. 13. Detecting Paraphrases in Indian Languages Using Multinomial Logistic Regression Model by Kamal Sarkar in 2016. Observations: The foremost usually used corpora for paraphrase detection is the MSRP corpus (Microsoft Research Paraphrase Corpus). "Word2Vec" model (Python) accustomed to sight linguistics. Similarity between word vectors for the words. 14. Overview of Shared Task on Detecting Paraphrases in Indian Languages (DPIL) by M. Anand Kumar, Shivkaran Singh, Kavirajan B & Soman K P in 2016. Observations: Four Indian languages (Hindi, Punjabi, Tamil and Malayalam) thought of. No annotated corpora or automatic linguistics
  • 6. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5242 interpretation systems like MSRP offeredforIndian languages thus Created benchmark knowledge for paraphrases. Vocabulary size for Hindi & Punjabi languages is a smaller amount than Tamil and Malayalam. Tamil and Malayalam are extremely agglutinative in nature. Tamil & Malayalam language accuracy is low as compared to the accuracy obtained by Hindi & Punjabi language. 15. A Novel Approach to Paraphrase Hindi Sentences using Natural Language Processing by Nandini Sethi, Prateek Agrawal, Vishu Madaan & Sanjay Kumar Singh in 2016. Observations: Input taken as "Hindi" sentence. Segmentation, Parsing & linguistics Analysis performed step by step. Applied Reframing Rules with the assistance of "Synonym Replacement" & "Antonym Replacement" Combined the results to make a new paragraph. Technique used wasthe"N- Gram" technique. 5. CONCLUSION Given a set of two sentences, finding semantic similarity between those sentences is a difficult task as it requires analyzing a language which the system doesnotunderstand. (Human natural language). Paraphrase Detection has multiple real-world applications thatsolvereal lifeproblems as discussed earlier. We propose a system to detect paraphrasing of sentences in natural language using SNN in TensorFlow. By training the system in its initial phase via deep learning architecture such as CNN, RNN & MAN, pre- processing the sentence to extract features & derive results through a classifier in testing phase. ACKNOWLEDGEMENT We have taken efforts in building this paper. However, it would not have been possible without the kind of support and encouragement from many individuals and college. We would first like to extend our sincere thanks to all of them. A hearty thanks to our project supervisor Prof. Manasi Kulkarni for the constant supervision and guidance in all difficulties faced during the project and report making. We are thankful to H.O.D of the Computer Department Dr. Sharvari Govilkar for providing the necessary information and resources. We are highly indebted to the Principal Dr. SandeepJoshifor the constant motivation and encouragement which helpeda lot in completion of this report. We would like to express our gratitude towards our parents for their kind cooperation. We would also like to thank the authors whose books we have referred to in making the project and report. REFERENCES [1] Natural language processing. https://en.wikipedia.org/wiki/Natural_language_proces sing [2] Victor U Thompson & Chris Bowerman, (2018) “Methods for Detecting Paraphrase Plagiarism”. [3] Domain generalization in sketch-based image retrieval systems. http://reports.ias.ac.in/report/20604/domain- generalization-in-sketch-based-image-retrieval-sbir- systems [4] Shruti Srivastava & Sharvari Govilkar, (2017) “A Survey on Paraphrase DetectionTechniquesforIndianRegional Languages”. [5] Darshana S Bhole, Sandip S. Patil, (2017)“TheStudyand Review of Paraphrase Detection Techniques in Machine Learning”. [6] Textcorpus.https://en.wikipedia.org/wiki/Text_corpus [7] Dr.S.V.Kogilavani1, Dr.R.Thangarajan, Dr.C.S.Kanimozhiselvi, Dr.S.Malliga, (2017) “Detecting Paraphrases in Tamil Language Sentences”. [8] Kamal Sarkar, (2016) “Detecting Paraphrases in Indian Languages Using Multinomial Logistic Regression Model”. [9] Nandini Sethi, Prateek Agrawal, Vishu Madaan & Sanjay Kumar Singh, (2016) “A Novel Approach to Paraphrase Hindi Sentences using Natural Language Processing”. [10] M. Anand Kumar, Shivkaran Singh, KavirajanB&Soman K P, (2016) “Overview of Shared Task on Detecting Paraphrases in Indian Languages (DPIL)”. BIOGRAPHIES Mrunal Badade, Student of Pillai College of Engineering, completing last year of my degree in Computer Engineering department. Vaibhav Adsul, Student of Pillai College of Engineering, completing last year of my degree in Computer Engineering department. Jagruti Thombare, Student of Pillai College of Engineering, completing last year of my degree in Computer Engineering department.
  • 7. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 5243 Akshata Deshpande, Student of Pillai College of Engineering, completing last year of my degree in Computer Engineering department. Prof. Mansi Kulkarni, Faculty of Pillai College of Engineering. Assistant professor in department of Information Technology.