The document describes a system for detecting paraphrases in the Marathi language. It calculates both statistical and semantic similarity between pairs of Marathi sentences. Statistical similarity looks at word order, distance, and vectors, while semantic similarity generates Universal Networking Language (UNL) graphs to compare meaning. The total paraphrase score combines the statistical and semantic similarity scores to judge whether sentences are paraphrases or not. The system aims to address the lack of existing paraphrase detection capabilities for Marathi.
GPT-2: Language Models are Unsupervised Multitask LearnersYoung Seok Kim
Review of paper
Language Models are Unsupervised Multitask Learners
(GPT-2)
by Alec Radford et al.
Paper link: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
YouTube presentation: https://youtu.be/f5zULULWUwM
(Slides are written in English, but the presentation is done in Korean)
"Attention Is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, https://bit.ly/2y7yAD2 presented by Maroua Maachou (Veepee)
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validityalessio_ferrari
Complete lecture on controlled experiments in software engineering. It explains practical guidelines on conducting controlled experiments and describes the concepts of dependent, independent, and control variables, significance, and p-value. It also explains how to select the appropriate statistic test for a hypothesis, and gives example of data for different typical tests.
Finally, it discusses threats to validity in controlled experiments and gives indications for reporting.
Find the video lectures here: https://www.youtube.com/playlist?list=PLSKM4VZcJjV-P3fFJYMu2OhlTjEr9Bjl0
RoFormer: Enhanced Transformer with Rotary Position Embeddingtaeseon ryu
안녕하세요 딥러닝 논문읽기 모임입니다 오늘 업로드된 논문 리뷰 영상은 올해 발표된, RoFormer: Enhanced Transformer with Rotary Position Embedding 라는 제목의 논문입니다.
해당 논문은 Rotary Position Embedding을 이용하여 Transformer를 개선 시킨 논문입니다. Position embedding은 Self attention의 포지션에 대한 위치를 기억 시키기 위해 사용이 되는 중요한 요소중 하나 인대요, Rotary Position Embedding은 선형대수학 시간때 배우는 회전행렬을 사용하여 위치에 대한 정보를 인코딩 하는 방식으로 대체하여 모델의 성능을 끌어 올렸습니다.
논문에 대한 백그라운드 부터, 수식에 대한 디테일한 리뷰까지,
논문 리뷰를 자연어 처리 진명훈님이 디테일한 논문 리뷰 도와주셨습니다!
GPT-2: Language Models are Unsupervised Multitask LearnersYoung Seok Kim
Review of paper
Language Models are Unsupervised Multitask Learners
(GPT-2)
by Alec Radford et al.
Paper link: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
YouTube presentation: https://youtu.be/f5zULULWUwM
(Slides are written in English, but the presentation is done in Korean)
"Attention Is All You Need" (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, https://bit.ly/2y7yAD2 presented by Maroua Maachou (Veepee)
Controlled experiments, Hypothesis Testing, Test Selection, Threats to Validityalessio_ferrari
Complete lecture on controlled experiments in software engineering. It explains practical guidelines on conducting controlled experiments and describes the concepts of dependent, independent, and control variables, significance, and p-value. It also explains how to select the appropriate statistic test for a hypothesis, and gives example of data for different typical tests.
Finally, it discusses threats to validity in controlled experiments and gives indications for reporting.
Find the video lectures here: https://www.youtube.com/playlist?list=PLSKM4VZcJjV-P3fFJYMu2OhlTjEr9Bjl0
RoFormer: Enhanced Transformer with Rotary Position Embeddingtaeseon ryu
안녕하세요 딥러닝 논문읽기 모임입니다 오늘 업로드된 논문 리뷰 영상은 올해 발표된, RoFormer: Enhanced Transformer with Rotary Position Embedding 라는 제목의 논문입니다.
해당 논문은 Rotary Position Embedding을 이용하여 Transformer를 개선 시킨 논문입니다. Position embedding은 Self attention의 포지션에 대한 위치를 기억 시키기 위해 사용이 되는 중요한 요소중 하나 인대요, Rotary Position Embedding은 선형대수학 시간때 배우는 회전행렬을 사용하여 위치에 대한 정보를 인코딩 하는 방식으로 대체하여 모델의 성능을 끌어 올렸습니다.
논문에 대한 백그라운드 부터, 수식에 대한 디테일한 리뷰까지,
논문 리뷰를 자연어 처리 진명훈님이 디테일한 논문 리뷰 도와주셨습니다!
This slide introduces transformer-xl which is the base paper for xl-net. You can understand what is the major contribution of this paper using this slide. This slide also explains the transformer for comparing differences between transformer and transformer-xl. Happy NLP!
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-sze
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Vivienne Sze, Associate Professor at MIT, presents the "Approaches for Energy Efficient Implementation of Deep Neural Networks" tutorial at the May 2018 Embedded Vision Summit.
Deep neural networks (DNNs) are proving very effective for a variety of challenging machine perception tasks. But these algorithms are very computationally demanding. To enable DNNs to be used in practical applications, it’s critical to find efficient ways to implement them.
This talk explores how DNNs are being mapped onto today’s processor architectures, and how these algorithms are evolving to enable improved efficiency. Sze explores the energy consumption of commonly used CNNs versus their accuracy, and provides insights on "energy-aware" pruning of these networks.
Paraphrase detection is an academically challenging NLP problem of detecting whether multiple phrases have the same meaning. In this talk, we’ll go through the existing traditional and deep learning approaches for this task, and see how they apply in practice as a silver-winning solution to the popular Kaggle Quora Question Pairs competition.
DeBERTA : Decoding-Enhanced BERT with Disentangled Attentiontaeseon ryu
오늘 소개해 드릴 논문은 구글의 BERT와 페이스북 현재 메타의 RoBERTa를 기반으로 만들어진 모델입니다. RoBERTa + Disentangled Attention과 enhanced mask decode
두가지의 핵심 기술로 RoBERTa를 더욱 개선 시킨 모델이라고 이해하시면 될 것 같습니다. 추가적으로 Scale Invariant Fine Tuning을 도입하여 RoBERTa를 상당히 많은 테스크에서, NLU 테스크에서는 RoBERTa, BERT이상의 성능을 보여준 논문이기도 합니다.
논문의 자세한 리뷰부터, 백그라운드 지식까지, 자연어처리팀 진명훈님이 도와주셨습니다.
A Simple Introduction to Word EmbeddingsBhaskar Mitra
In information retrieval there is a long history of learning vector representations for words. In recent times, neural word embeddings have gained significant popularity for many natural language processing tasks, such as word analogy and machine translation. The goal of this talk is to introduce basic intuitions behind these simple but elegant models of text representation. We will start our discussion with classic vector space models and then make our way to recently proposed neural word embeddings. We will see how these models can be useful for analogical reasoning as well applied to many information retrieval tasks.
Paraphrasing refers to writing that either differs in its textual content or is dissimilar in rearrangement of words
but conveys the same meaning. Identifying a paraphrase is exceptionally important in various real life applications
such as Information Retrieval, Plagiarism Detection, Text Summarization and Question Answering. A large amount
of work in Paraphrase Detection has been done in English and many Indian Languages. However, there is no
existing system to identify paraphrases in Marathi. This is the first such endeavor in the Marathi Language.
A paraphrase has differently structured sentences, and since Marathi is a semantically strong language, this
system is designed for checking both statistical and semantic similarities of Marathi sentences. Statistical similarity
measure does not need any prior knowledge as it is only based on the factual data of sentences. The factual data
is calculated on the basis of the degree of closeness between the word-set, word-order, word-vector and worddistance. Universal Networking Language (UNL) speaks about the semantic significance in sentence without any
syntactic points of interest. Hence, the semantic similarity calculated on the basis of generated UNL graphs for two
Marathi sentences renders semantic equality of two Marathi sentences. The total paraphrase score was calculated
after joining statistical and semantic similarity scores, which gives a judgment on whether there is paraphrase or
non-paraphrase about the Marathi sentences in question.
Word sense disambiguation using wsd specific wordnet of polysemy wordsijnlc
This paper presents a new model of WordNet that is used to disambiguate the correct sense of polysemy
word based on the clue words. The related words for each sense of a polysemy word as well as single sense
word are referred to as the clue words. The conventional WordNet organizes nouns, verbs, adjectives and
adverbs together into sets of synonyms called synsets each expressing a different concept. In contrast to the
structure of WordNet, we developed a new model of WordNet that organizes the different senses of
polysemy words as well as the single sense words based on the clue words. These clue words for each sense
of a polysemy word as well as for single sense word are used to disambiguate the correct meaning of the
polysemy word in the given context using knowledge based Word Sense Disambiguation (WSD) algorithms.
The clue word can be a noun, verb, adjective or adverb.
This slide introduces transformer-xl which is the base paper for xl-net. You can understand what is the major contribution of this paper using this slide. This slide also explains the transformer for comparing differences between transformer and transformer-xl. Happy NLP!
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-sze
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Vivienne Sze, Associate Professor at MIT, presents the "Approaches for Energy Efficient Implementation of Deep Neural Networks" tutorial at the May 2018 Embedded Vision Summit.
Deep neural networks (DNNs) are proving very effective for a variety of challenging machine perception tasks. But these algorithms are very computationally demanding. To enable DNNs to be used in practical applications, it’s critical to find efficient ways to implement them.
This talk explores how DNNs are being mapped onto today’s processor architectures, and how these algorithms are evolving to enable improved efficiency. Sze explores the energy consumption of commonly used CNNs versus their accuracy, and provides insights on "energy-aware" pruning of these networks.
Paraphrase detection is an academically challenging NLP problem of detecting whether multiple phrases have the same meaning. In this talk, we’ll go through the existing traditional and deep learning approaches for this task, and see how they apply in practice as a silver-winning solution to the popular Kaggle Quora Question Pairs competition.
DeBERTA : Decoding-Enhanced BERT with Disentangled Attentiontaeseon ryu
오늘 소개해 드릴 논문은 구글의 BERT와 페이스북 현재 메타의 RoBERTa를 기반으로 만들어진 모델입니다. RoBERTa + Disentangled Attention과 enhanced mask decode
두가지의 핵심 기술로 RoBERTa를 더욱 개선 시킨 모델이라고 이해하시면 될 것 같습니다. 추가적으로 Scale Invariant Fine Tuning을 도입하여 RoBERTa를 상당히 많은 테스크에서, NLU 테스크에서는 RoBERTa, BERT이상의 성능을 보여준 논문이기도 합니다.
논문의 자세한 리뷰부터, 백그라운드 지식까지, 자연어처리팀 진명훈님이 도와주셨습니다.
A Simple Introduction to Word EmbeddingsBhaskar Mitra
In information retrieval there is a long history of learning vector representations for words. In recent times, neural word embeddings have gained significant popularity for many natural language processing tasks, such as word analogy and machine translation. The goal of this talk is to introduce basic intuitions behind these simple but elegant models of text representation. We will start our discussion with classic vector space models and then make our way to recently proposed neural word embeddings. We will see how these models can be useful for analogical reasoning as well applied to many information retrieval tasks.
Paraphrasing refers to writing that either differs in its textual content or is dissimilar in rearrangement of words
but conveys the same meaning. Identifying a paraphrase is exceptionally important in various real life applications
such as Information Retrieval, Plagiarism Detection, Text Summarization and Question Answering. A large amount
of work in Paraphrase Detection has been done in English and many Indian Languages. However, there is no
existing system to identify paraphrases in Marathi. This is the first such endeavor in the Marathi Language.
A paraphrase has differently structured sentences, and since Marathi is a semantically strong language, this
system is designed for checking both statistical and semantic similarities of Marathi sentences. Statistical similarity
measure does not need any prior knowledge as it is only based on the factual data of sentences. The factual data
is calculated on the basis of the degree of closeness between the word-set, word-order, word-vector and worddistance. Universal Networking Language (UNL) speaks about the semantic significance in sentence without any
syntactic points of interest. Hence, the semantic similarity calculated on the basis of generated UNL graphs for two
Marathi sentences renders semantic equality of two Marathi sentences. The total paraphrase score was calculated
after joining statistical and semantic similarity scores, which gives a judgment on whether there is paraphrase or
non-paraphrase about the Marathi sentences in question.
Word sense disambiguation using wsd specific wordnet of polysemy wordsijnlc
This paper presents a new model of WordNet that is used to disambiguate the correct sense of polysemy
word based on the clue words. The related words for each sense of a polysemy word as well as single sense
word are referred to as the clue words. The conventional WordNet organizes nouns, verbs, adjectives and
adverbs together into sets of synonyms called synsets each expressing a different concept. In contrast to the
structure of WordNet, we developed a new model of WordNet that organizes the different senses of
polysemy words as well as the single sense words based on the clue words. These clue words for each sense
of a polysemy word as well as for single sense word are used to disambiguate the correct meaning of the
polysemy word in the given context using knowledge based Word Sense Disambiguation (WSD) algorithms.
The clue word can be a noun, verb, adjective or adverb.
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...dannyijwest
Considerable research in the field of ontology matching has been performed where information sharing
and reuse becomes necessary in ontology development. Measurement of lexical similarity in ontology
matching is performed using synset, defined in WordNet. In this paper, we defined a Super Word Set,
which is an aggregate set that includes hypernym, hyponym, holonym, and meronym sets in WordNet.
The Super Word Set Similarity is calculated by the rate of words of concept name and synset’s words
inclusion in the Super Word Set. In order to measure of Super Word Set Similarity, we first extracted
Matched Concepts(MC), Matched Properties(MP) and Property Unmatched Concepts(PUC) from the
result of ontology matching. We compared these against two ontology matching tools – COMA++ and
LOM. The Super Word Set Similarity shows an average improvement of 12% over COMA++ and 19%
over LOM.
A survey on phrase structure learning methods for text classificationijnlc
Text classification is a task of automatic classification of text into one of the predefined categories. The
problem of text classification has been widely studied in different communities like natural language
processing, data mining and information retrieval. Text classification is an important constituent in many
information management tasks like topic identification, spam filtering, email routing, language
identification, genre classification, readability assessment etc. The performance of text classification
improves notably when phrase patterns are used. The use of phrase patterns helps in capturing non-local
behaviours and thus helps in the improvement of text classification task. Phrase structure extraction is the
first step to continue with the phrase pattern identification. In this survey, detailed study of phrase structure
learning methods have been carried out. This will enable future work in several NLP tasks, which uses
syntactic information from phrase structure like grammar checkers, question answering, information
extraction, machine translation, text classification. The paper also provides different levels of classification
and detailed comparison of the phrase structure learning methods.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Rule-based Prosody Calculation for Marathi Text-to-Speech SynthesisIJERA Editor
This research paper presents two empirical studies that examine the influence of different linguistic aspects on
prosody in Marathi. First, we analyzed a Marathi corpus with respect to the effect of syntax and information
status on prosody. Second, we conducted a listening test which investigated the prosodic realisation of
constituents in the Marathi depending on their information status. The results were used to improve the prosody
prediction in the Marathi text-to-speech synthesis system MARY.
Cross lingual similarity discrimination with translation characteristicsijaia
In cross-lingual plagiarism detection, the similarity between sentences is the basis of judgment. This paper
proposes a discriminative model trained on bilingual corpus to divide a set of sentences in target language
into two classes according their similarities to a given sentence in source language. Positive outputs of the
discriminative model are then ranked according to the similarity probabilities. The translation candidates
of the given sentence are finally selected from the top-n positive results. One of the problems in model
building is the extremely imbalanced training data, in which positive samples are the translations of the
target sentences, while negative samples or the non-translations are numerous or unknown. We train models
on four kinds of sampling sets with same translation characteristics and compare their performances.
Experiments on the open dataset of 1500 pairs of English Chinese sentences are evaluated by three metrics
with satisfying performances, much higher than the baseline system.
Anaphora resolution in hindi language using gazetteer methodijcsa
Anaphora resolution is one of the active research areas within the realm of natural language processing.
Resolution of anaphoric reference is one of the most challenging and complex task to be handled. This
paper completely emphasis on pronominal anaphora resolution for Hindi Language. There are various
methodologies for resolving anaphora. This paper presents a computational model for anaphora resolution
in Hindi that is based on Gazetteer method. Gazetteer method is a creation of lists and then applies
operations to classify elements present in the list. There are many salient factors for resolving anaphora.
The proposed model resolves anaphora by using two factors that is Animistic and Recency. Animistic factor
always represent living things and non living things whereas Recency describes that the referents
mentioned in current sentence tends to have higher weights than those in previous sentence. This paper
demonstrate the experiments conducted on short Hindi stories ,news articles and biography content from
Wikipedia, its result & future directions to improve accuracy.
Implementation Of Syntax Parser For English Language Using Grammar RulesIJERA Editor
From many years we have been using Chomsky‟s generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. Syntactic parsing mainly works with syntactic structure of a sentence. The 'syntax' refers to the grammatical and syntactical arrangement of words in a sentence and their relationship with other words. The main focus of syntactic analysis is important to find syntactic structure of a sentence which usually is represented as a tree structure. To identify the syntactic structure is useful in determining the meaning of a sentence Natural language processing processes the data through lexical analysis, Syntax analysis, Semantic analysis, and Discourse processing, Pragmatic analysis. This paper gives various parsing methods. The algorithm in this paper splits the English sentences into parts using POS (Parts Of Speech) tagger, It identifies the type of sentence (Simple, Complex, Interrogate, Facts, active, passive etc.) and then parses these sentences using grammar rules of Natural language. As natural language processing becomes an increasingly relevant, there is a need for tree banks catered to the specific needs of more individualized systems. Here, we present the open source technique to check and correct the grammar. The methodology will give appropriate grammatical suggestions.
Comparative performance analysis of two anaphora resolution systemsijfcstjournal
Anaphora Resolution is the process of finding referents in the given discourse. Anaphora Resolution is one
of the complex tasks of linguistics. This paper presents the performance analysis of two computational
models that uses Gazetteer method for resolving anaphora in Hindi Language. In Gazetteer method
different classes (Gazettes) of elements are created. These Gazettes are used to provide external knowledge
to the system. The two models use Recency and Animistic factor for resolving anaphors. For Recency factor
first model uses the concept of centering approach and second uses the concept of Lappin Leass approach.
Gazetteers are used to provide Animistic knowledge. This paper presents the experimental results of both
the models. These experiments are conducted on short Hindi stories, news articles and biography content
from Wikipedia. The respective accuracy for both the model is analyzed and finally the conclusion is drawn
for the best suitable model for Hindi Language.
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGEScsandit
ABSTRACT
Natural Language processing is an interdisciplinary branch of linguistic and computer science studied under the Artificial Intelligence (AI) that gave birth to an allied area called
‘Computational Linguistic’ which focuses on processing of natural languages on computational devices. A natural language consists of a large number of sentences which are linguistic units involving one or more words linked together in accordance with a set of predefined rules called grammar. Grammar checking is the task of validating sentences syntactically and is a prominent tool within language engineering. Our review draws on the recent development of various grammar checkers to look at past, present and the future in a new light. Our review covers grammar checkers of many languages with the aim of seeking their approaches, methodologies for developing new tool and system as a whole. The survey concludes with the discussion of various features included in existing grammar checkers of foreign languages as well as a few Indian Languages.
In this paper we discuss the difficulties in processing the Malayalam texts for Statistical Machine Translation (SMT), especially the verb forms. Mostly the agglutinative nature of Malayalam is the main issue with the processing of text. We mainly focus on the verbs and its contribution in adding the difficulty in processing. The verb plays a crucial role in defining the sentence structure. We illustrate the issues with the existing google translation system and the trained MOSES system using limited set of English- Malayalam parallel corpus. Our reference for analysis is English-Malayalam language pair.
Author Credits - Maaz Anwar Nomani
Semantic Role Labeler (SRL) is a semantic parser which can automatically identify and then classify arguments of a verb in a natural language sentence for Hindi and Urdu. For e.g. in the natural language sentence “Sara won the competition because of her hard work.”, ‘won’ is the main verb and there are 3 arguments for this verb; ‘Sara’ (Agent), ‘hard work’ (Reason) and ‘competition’ (Theme). The problem statement of a SRL revolves around the fact that how will you make a machine identify and then classify the arguments of a verb in a natural language sentence.
Since there are 2 sub problem statements here (Identification and Classification), our SRL has a pipeline architecture in which a binary classifier (Logistic Regression) is first trained to identify whether a word is an argument to a verb in a sentence or not (Yes or No) and subsequently a multi-class classifier (SVM with Linear kernel) is trained to classify the identified arguments by above binary classifier into one of the 20 classes. These 20 classes are the various notions present in a natural language sentence (for e.g. Agent, Theme, Location, Time, Purpose, Reason, Cause etc.). These ‘notions’ are called Propbank labels or semantic labels present in a Proposition Bank which is a collection of hand-annotated sentences.
In essence, SRL felicitates Semantic Parsing which essentially is the research investigation of identifying WHO did WHAT to WHOM, WHERE, HOW, WHY and WHEN etc. in a natural language sentence.
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.
Similar to Detecting Paraphrases in Marathi Language (20)
Fault tolerance is an important issue in the field of cloud computing which is concerned with the techniques or mechanism needed to enable a system to tolerate the faults that may encounter during its functioning. Fault tolerance policy can be categorized into three categories viz. proactive, reactive and adaptive. Providing a systematic solution the loss can be minimized and guarantee the availability and reliability of the critical services. The purpose and scope of this study is to recommend Support Vector Machine, a supervised machine learning algorithm to proactively monitor the fault so as to increase the availability and reliability by combining the strength of machine learning algorithm with cloud computing.
. In the Namibian educational environment during COVID-19 many schools were affected as a result of
COVID-19 such as primary school, secondary school, as well as tertiary institutions experiencing challenges of
eLearning platform usage as a means of facilitating teaching and learning among learners and students as most
of them have to adapt to the new environment of the online platform. However, despite some schools had adopted
and implemented eLearning the study discovered that many schools including universities do not fully utilize the
platform implemented in their schools and as such many schools have been struggling to adapt to the new environment of online learning
The study aimed to investigate into the impact of a National COVID-19 Health contact tracing and monitoring system for Namibia. The study used qualitative methods as a research strategy. Qualitative data was collected
through zoom meeting and a Google form link was distributed to the participants. The findings of the study revealed
that a total of 18 participants responded to the semi-structured questions of which 38.9% represents male while
female 61.1%. The age group between 18–25 response rate were 22.2%, age group between 26–35 response rate were
55.6%, age group between 36–45 response rate were 16.7% and the age group between 46 and above response rate
was 10% represented in green colour to represent participants who fall in the age group between 46 and above
The purpose of the study was to examine the effect of information and communication technology on
service delivery in the Telecommunications industry. A descriptive survey research method was used as it helped
the researcher to determine how the use of ICTs can improve service delivery at MTC, Namibia.
Findings showed that variables of information and communication technology have positive effect on service
delivery. Therefore, the study concluded that training collectively with changes in corporate policies and support,
can result in better service delivery. It was recommended that employees have to acquire skills on how to use computers and communication software in order to offer efficient services
Covid-19 is the destructive world’s most recent pandemic that is experienced in every part of the world.
This deadly virus affects different people in different ways. Most infected people will develop mild to moderate
illness and recover without hospitalisation. Covid-19 most common symptoms include fever, dry and tiredness.
It is against this background that in Namibian health environment the country uses a manual system to record
public member’s demographic information when visiting public places which do not allow tracing and monitoring of every public member who visited the 14 regions in the country. Therefore, the present study developed a
National COVID-19 health contact tracing and monitoring system which will allow every public member who visits
an enclosed public place by capturing their demographic information as well as the date and time the facility was
visited. The system replaces the paper-based method of recording the information of people visiting public places
with an entrance that allows the coming in and out of people. The system will also allow for real-time monitoring of
temperature changes of individuals.
Abstract. The Ministry of Health and Social Services in Namibia under the division of epidemiology uses a manual
paper-based approach to capture disease surveillance data through 5 levels of reporting which include the community level, the health facility level, the district level, and the national level. As a result, this method of communicating
and exchanging disease surveillance information is cost and time consuming, which delay disease surveillance information from reaching the head office on time.
In today’s world information and communication technology (ICT) play a crucial role and at the same time, it affects our lives every day. In the current digital age, many organisations across the globe make use of ICT as a tool to facilities teaching and learning(Bosamia,2018). These technologies have been used to enable end-user to access content materials offered online such as portable devices such smartphone, laptops and so on which operate for information, speed, and communication anywhere and anytime without physically visiting the location where the service is offered. With the use of ICT, e-commerce comes into play which enables end-user to send an email, market shopping to on-line shopping, classroom learning to e-learning where class are conducted over the internet.
This research is proposed to design air monitoring system using IOT. The goal of building a smart device to improve the quality of life. We have used several sensors to identify the quality of air on real time basis. IOT based air monitoring system is used to monitor the air quality over the app using internet. It will show the air quality in PPM on LCD. And also, if level is exceeding the normal rate then it will notify the respective person who is the user of that app, an emergency message to let them know that they should take symptoms like wearing a mask etc. To protect them from bad air quality.
Solving geometric problems of special importance is the transformation of a plane called inversion. When solving geometric problems, the transformation of the plane called inversion is of special importance. An inversion is a mapping of a plane that a set of directions and a circle maps to the same set, and in doing so it can map a line either to a line or to a circle, and it can also map a circle to either a line or a circle.
We present the solution in several ways, i.e., one solution from geometry, four solutions are clearly geometric, i.e., one is trigonometric and three are analytical (two on complex level and one in Deck level).
More from BOHR International Journal of Smart Computing and Information Technology (10)
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
2. 8 Shruti Srivastava and Sharvari Govilkar
Phrase level: Phrasal paraphrase states that different
phrases shared same semantic content. The phrasal para-
phrases include syntactic phrases as well as pattern forma-
tion with connected factors.
Example S1: ֆ֚֡֔֠ֈ֚֞֞
ե֊֠ ֑֒֞֞օ ֛֧֟֔֟֔. (Tulsidas wrote
Ramayana.)
S2: ֑֒֞֞օ ֆ֚֡֔֠ֈ֚֞֞
ե֊֠ ֆ֚֡֔֠ֈ֚֞֞
ե֊֠. (Ramayana written by
Tulsidas.)
S3: ֆ֚֡֔֠ֈ֚֞֞
ե֊֠ ֔ոշ
֧ ֡
ֆ֚֔֠ֈ֚֞ ֧
ը֛ֆ. (Ramayana’s author is
Tulsidas.)
S4: ֡
ֆ֚֔֠ֈ֚֞֞ե֊֠ ֑֒֞֞օ֞ռ֠ ֒ռ֊֞ շ֔֠
֧ . (Tulsidas composed the
Ramayana.)
Sentential paraphrases: In this type of paraphrase one sen-
tence is totally replaced by other sentence but still both
sentences keeping the same meaning.
Example –
S1: ֧ ֒֞ւ֠ ֟֊֒֞֕֠ ֧
ը֛. (Each region of
Maharashtra has a different Marathi language)
S2: ֒֞ւ֠ ֏֞֙֞ ֚֗
Sentential paraphrases: In this type of paraphrase one sentence is totally replaced by
keeping the same meaning.
Example –
S1: मिाराष्ट्रातील प्रत्येक प्राांताची मराठी भाषा लनराळी आिे. (Each region of Maharashtra has a d
S2: मराठी भाषा सर्व मिाराष्ट्राची जरी एक असली तरी तीच दर बारा कोसार्र बदलते.
(Marathi language is the only one in the whole of Maharashtra, the same rate varies fro
II. Literature Survey
This section presents various techniques for paraphrase detection in Indian languages. Par
applied on various languages to detect whether two sentences were paraphrase of each oth
Table 2.1: Summary of Literature Review
SNo Paper
Language: English
1
Title Paraphrase detection using Semantic relatedness based on
Synset Shortest path in WordNet
Semant
approac
Author Jun Choi Lee & Yu-N Cheah, August 2016
2
Title Learning Paraphrase Identification with structural
alignment Alignm
Approa
Author Chen Liang, Praveen Paritosh, Vinodh Rajendran, Kenneth
D. Forbus, July 2016
3
Title Paraphrase Detection based on Identical Phrase and
Similar Word matching
SimMat
combin
metrics
Author Hoang-Quoc, Nguyen-Son,Yusuke Miyao, Isao
Echizen,2015
4 Title Paraphrase detection using string similarity with synonyms Use of s
similari
Author Jun Choi Lee & Yu-N Cheah,2015
5
Title Re-examining Machine Translation Metrics for Paraphrase
Identification
Combin
metrics.
Author Nitin Madnani, Joel Tetreault, Martin Chodorow, 2012
6
Title Paraphrase Identification Using Weighted Dependencies
and word semantics
Calcula
dissimil
using de
Author Mihai Lintean and Vaile Rus, 2009
7 Title A Semantic Similarity Approach to paraphrase Detection JCN w
with ma
Author Fernando and Stevenson, August 2008
8 Title A Metric for Paraphrase Detection Paraphr
using S
Author Cordeiro, Dias, Brazdil , 2007(IEEE)
9
Title Paraphrase Identification on the Basis of Supervised
Machine Learning Techniques
Combin
and sem
Author Kozareva and Montoyo,2006
10
Title CFILT-CORE: Semantic Textual Similarity using
Universal Networking Language
Describ
similari
extracti
Univers
finding
pair of s
Author Avishek Dan & Pushpak Bhattacharyya, IIT Bombay
Mumbai
Language : Hindi
11
Title A Novel approach to Paraphrase Hindi Sentences using
NLP
Creating
and anto
Author N.Sethi,P. Agrawal, V. Madaan, S.K. Singh, 2016
12 Title Paraphrase Detection in Hindi Language using
Syntactic Features of Phrase
Random
classific
method
for ide
languag
Author
Rupal Bhargava, Anushka Baoni, Harshit Jain, 2016
13 Title Hindi Paraphrase Detection using Natural Language
Processing Techniques & Semantic Similarity
Computations
Discove
semanti
done by
.
(Marathi language is the only one in the whole of Maha-
rashtra, the same rate varies from 12 to 12 degrees.)
2 Literature survey
This section presents various techniques for paraphrase
detection in Indian languages. Paraphrase detection tech-
niques have been applied on various languages to detect
whether two sentences were paraphrase of each other
or not.
Table 1. Summary of literature review.
SNo Paper Algorithm Accuracy F-score
Language: English
1 Title Paraphrase detection using Semantic relatedness
based on Synset Shortest path in WordNet
Semantic relatedness
approach
71.1% 81.1%
Author Jun Choi Lee & Yu-N Cheah, August 2016
2 Title Learning Paraphrase Identification with structural
alignment
Alignment based
Approach
78.6% 85.3%
Author Chen Liang, Praveen Paritosh, Vinodh Rajendran,
Kenneth D. Forbus, July 2016
3 Title Paraphrase Detection based on Identical Phrase
and Similar Word matching
SimMat metric
combined with 8 MT
metrics
77.6% 83.9%
Author Hoang-Quoc, Nguyen-Son,Yusuke Miyao, Isao
Echizen, 2015
4 Title Paraphrase detection using string similarity with
synonyms
Use of synonyms in text
similarity metrics
70.6% 80.4%
Author Jun Choi Lee & Yu-N Cheah, 2015
5 Title Re-examining Machine Translation Metrics for
Paraphrase Identification
Combination of 8 MT
metrics
77.4% 84.1%
Author Nitin Madnani, Joel Tetreault, Martin Chodorow,
2012
6 Title Paraphrase Identification Using Weighted
Dependencies and word semantics
Calculate similarity and
dissimilarity scores
using dependencies
72.41% 81.32%
Author Mihai Lintean and Vaile Rus, 2009
7 Title A Semantic Similarity Approach to paraphrase
Detection
JCN word similarity
with matrix
74.1% 82.4%
Author Fernando and Stevenson, August 2008
8 Title A Metric for Paraphrase Detection Paraphrase detection
using SumoMetric
78.19% 80.92%
Author Cordeiro, Dias, Brazdil, 2007 (IEEE)
9 Title Paraphrase Identification on the Basis of
Supervised Machine Learning Techniques
Combination of lexical
and semantic features
76.6% 79.6%
Author Kozareva and Montoyo, 2006
10 Title CFILT-CORE: Semantic Textual Similarity using
Universal Networking Language
Described a syntactic and word level similarity method
combining with a semantic extraction method which was based
on Universal Networking Language (UNL) for finding
semantic similarity score between a pair of sentences
Author Avishek Dan & Pushpak Bhattacharyya, IIT
Bombay Mumbai
(Continued)
3. Detecting Paraphrases in Marathi Language 9
Table 1. Continued
SNo Paper Algorithm Accuracy F-score
Language: Hindi
11 Title A Novel approach to Paraphrase Hindi Sentences using NLP Creating paraphrase by applying synonyms and
antonyms replacement
Author N. Sethi, P. Agrawal, V. Madaan, S.K. Singh, 2016
12 Title Paraphrase Detection in Hindi Language using Syntactic
Features of Phrase
Random forest classifier used for classification and
Leven-shtein Distance method used for similarity
score calculation for identifying paraphrases in the
Hindi language
Author Rupal Bhargava, Anushka Baoni, Harshit Jain, 2016
13 Title Hindi Paraphrase Detection using Natural Language
Processing Techniques & Semantic Similarity Computations
Discovered synonym WordNet using a semantic
similarity metric and classification done by
paraphrase detection decision tree classifier
Author Vani K, Deepa Gupta, 2016
14 Title A Novel Paraphrase Detection Method in Hindi Language
using Machine Learning
Random forest classifier and different machine
learning algorithms were used for removing
overlapping words and calculated normalized IDF
score for paraphrase detection
Author Anuj Saini, 2016
Language: Malayalam
15 Title Malayalam Paraphrase Detection Procured CUSAT Malayalam Wordnet for calculating
sentence similarity using matching tokens, lemmas
and synonyms
Author Sindhu L. and Sumam Mary Idicula, 2016
16 Title Detecting Paraphrase in Indian Languages – Malayalam Calculated cosine similarity score using model named
Bag of Word and a threshold for calculating
paraphrase score
Author Manju K and Sumam Mary Idicula, 2016
17 Title Paraphrase Identification using Malayalam sentences Detecting paraphrase by Statistical measure and
semantic analysis
Author Ditty Mathew, Dec. 2013 (IEEE)
Language : Tamil
18 Title Detection of Paraphrases on Indian Languages Tamil Shallow parser used for extracting sixteen
diverse morphological features.
Author R. Thangarajan, S. V. Kogilavani, A. Karthic, S. Jawahar, 2016
Languages (Tamil, Malayalam, Hindi, Punjabi)
19 Title Detecting Paraphrases in Indian Languages based on
Gradient Tree Boosting
Employed a Cosine, Dice Distance, Jaccard Coefficient
and METEOR features and Gradient Boosting Tree
supervised classification method
Author Leilei Kong, Zhenyuan Hao, Kaisheng Chen, Zhongyuan
Han, Liuyang Tian, Haoliang Qi, 2016
20 Title Language Independent Paraphrases Detection Probabilistic Neural Network (PNN) has been used to
train Jaccard, Cosine Similarity and length normalized
Edit Distance language independent feature-set to
detect paraphrases in Indian languages
Author Sandip Sarkar, Partha Pakray, Saurav Saha, Dipankar Das,
Jereemi Bentham, Alexander Gelbukh, 2016
21 Title Detecting Paraphrases in Indian Languages Using
Multinomial Logistic Regression Model
The paper described about lexical and semantic level
similarities between two sentences using a trained
multinomial logistic regression model for paraphrase
identification and attain maximum f-measure
Author Kamal Sarkar, 2016
22 Title Paraphrase Detection in Indian Languages – A Machine
Learning Approach
The team worked on Hindi, Tamil, Punjabi and
Malayalam sentences. The paraphrase decision is
taken into consideration by various similarity
measures, machine translation evaluation metrics and
machine learning framework
Author Tanik Saikh, Sudip Kumar, Naskar Sivaji, Bandyopadhyay,
2016
4. 10 Shruti Srivastava and Sharvari Govilkar
3 Proposed system
The suggested system is designed to take the input as set
of two Marathi sentences. The pre-processing can be done
on sentences for collecting actual root words. Then the pre-
processed data is given as input to the different statistical
and semantic similarity to find paraphrases. The final simi-
larity score is generated by combining maximum statistical
similarity score and semantic similarity score. The final
paraphrase judgement is decided based on a set threshold
value which gives the decision of Paraphrase (P) or Non-
Paraphrase (NP).
In this approach system takes one original Marathi sen-
tences and one suspicious paraphrase sentence as an input.
The sentences then pre-process through various phases
including pre-processing by tokenization and input valida-
tion by stop word removing, stemming and morphological
analysis. After which the pre-processed data is fetched to
the matrix calculation for paraphrase detection.
3.1 Pre-processing
The first step is to present two Marathi sentences into text
format.
Sentence 1:
होळीचा सण साजरा करता
ंना काळजी . (The environment
should be taking care of while celebrating Holi festival.)
Sentence 2:
पयावणाच भान ठ
ेवून होळीचा सण साजरा करावा. (Celebrate the Holi fes-
tival after keeping environment in mind.)
Figure 1. Proposed system architecture.
The pre-processing of the documents involves following
methods:
Tokenization
Tokenization is the method of identification and separation
of tokens from two Marathi input sentences. Lexicon is the
individual word of the sentence. The word boundaries in
the Marathi language are fixed as it is a segmented lan-
guage. The spaces between the words are used to separate
one word from another. The process of separation of tokens
involves removal of delimiters like white spaces and punc-
tuation marks.
Validation of Devanagari script
A Marathi language is written in Devanagari script. Input
sentences validation is an important phase due to the
language dependent information and type of query pro-
vided to the system. The characters in Devanagari script
are recognized with Unicode values from UTF-8. Com-
paring each and every character with UTF-8 list, invalid
Devanagari characters were removed from the sentences.
The valid characters in Devanagari script were retained for
next phase.
Stop word elimination
Stop words are the utmost irrelevant words which delay
the sentence processing. These stop words are most often
occurring words in any document. The list of stop words
includes articles, prepositions and other function words.
Hence to enhance the processing speed of the system, the
stop words needs to identified and removed properly.
Stemming
Stemming is a very important step in the system. In this
phase, suffix list is used to remove suffixes from words for
creating exact stem word as the stem may not be the lin-
guistic root of the word.
Morphological analysis
The morphological analyzer identifies the inner structure
of the word. The root words from the given are expected to
produce from a morphological analyzer. After stemming,
the words are evaluated for any inflection. The perfect root
words can be produced by generating and matching rules.
Addition or replacement of characters to the inflected stem
word results into the precise core word.
3.2 Statistical similarity
The tokens created in tokenization process are the root
words but it is necessary to distinguish the tokens. Sta-
tistical similarity is calculated by considering tokens of
5. Detecting Paraphrases in Marathi Language 11
both sentences and compares them on the basis of word-
set, word-order, word-vector and sumo-metric (word dis-
tance). The system uses 4 statistical methods for statistical
comparisons which are explained as follows:
A. Jaccard similarity
It is based on the similarity and differences between two
word sets from the pair of sentences [10]. Jaccard similarity
coefficient is calculated by taking ratio of the total number
of matched unigrams to the total number of unmatched
words in both the sentences. The Jaccard similarity is
defined using following equation;
Jaccard Similarity (S1, S2) =
S1 ∩ S2
S1 ∪ S2
Example: Jaccard Similarity
Sentences Score
S1 होळी सण साजरा करता
ं ना S1 ∪ S2 = 4
काळजी {होळी, सण, साजरा, }
S2 पयावण भान ठ
ेवून होळी सण साजरा S1 ∩ S2 = 9
Jaccard Similarity
(S1, S2) = 4/9 = 0.4
B. Cosine similarity
Cosine similarity [10] is the most common statistical fea-
ture to measure the similarity between word vectors of two
sentences. For each sentence, a word vector of root words
of those sentences is formed which interprets the frequency
of words in the sentences. Cosine similarity is the ratio of
the dot product of those two word vectors and the product
of their lengths.
Cosine Similarity (S1, S2) =
S1 · S2
|S1| |S2|
Hence, Cosine similarity (S1, S2) = 0.91
C. Word order similarity
The vectors of the two sentences were considered for cal-
culating Word order similarity between them. The vectors
of two sentences constructed for the calculating using fol-
lowing equations:
L(Sa) = {(wa1, wa2), (wa1, wa3), . . . , (wa(i−1), wai)}
L(Sb) = {(wb1, wb2), (wb1, wb3), . . . , (wb(i−1), wbi)}
Where vector (wa1, wa2, . . . , wai) and vector (wb1,
wb2, . . . , wbi) constructed from sentences Sa and sen-
tence Sb tokens respectively. wx is before wy in (wx, wy) ∈
L (Sa) ∪ L(Sb). The similarity calculation between Sa and
Sb can be done [10] on the basis of following equation:
WordOrder(Sa, Sb) =
|L(Sa) ∩ L(Sb)|
|L(Sa) ∪ L(Sb)|
L(Sa) ∩ L(Sb) = {होळी सण, होळी साजरा, सण साजरा, होळी
,साजरा ,सण } = 6
WordOrder (Sa, Sb) =
|L(Sa) ∩ L(Sb)|
|L(Sa) ∪ L(Sb)|
=
6
30
= 0.17
D. Sumo metric
The Sumo-Metric [12] is on the word distance. This met-
ric finds the special lexical links between root words of
a pair of Marathi sentences. This metric not only identify
paraphrases but also identifies the exact and quasi-exact
matches. It can be considered as 1-gram exclusive overlap.
S1 and S2 is a pair of sentences given as an input to the
system. Let x and y are the numbers of words in those two
sentences. The number of words in the intersection of the
two sentences, excluding repetitions is represented by λ.
To calculate the Sumo-Metric S(.,.), first evaluate the
function S(x, y, λ)
S(x, y, λ) = α log 2
x
λ
+ β log 2
y
λ
Where α, β ∈ [0, 1] and α + β = 1. Now the Sumo-Metric
S (.,.) can be calculated by the following equation:
S(S1, S2) =
(
S(x, y, λ) if S(x, y, λ) 1.0
e−k∗S(x,y,λ)
otherwise
α and β are two core components which involved in the
calculation. The exact and quasi exact match pairs are grad-
ually penalize by log2 (.) function as for equal pairs the
sumo-metric result is exactly zero.
Sentence 1: होळी सण साजरा करता
ं ना काळजी
Sentence 2: भान ठ
े वून होळी सण साजरा
λ = number of words in the intersection of the two sen-
tences = {होळी, सण, साजरा, } = 4
Hence, if α and β = 0.5
Then,
S(x, y, λ) = 0.5 ∗ log2
7
4
+ 0.5 ∗ log2
6
4
= 0.7
3.3 Semantic similarity
Universal Networking language (UNL) generates a graph
of semantic representation of sentences. In this method,
UNL relation signifies semantic information in a by draw-
ing a hyper-graph. In this hyper-graph nodes represents
concepts and arcs signifies relations. The hyper-graph con-
tains a set of directed binary relations between two con-
cepts of a sentence.
UNL relation involves mainly 3 important terms which
are as follows:
6. 12 Shruti Srivastava and Sharvari Govilkar
1. Universal Words (UWs): These are the root word
which signifies word meaning.
2. Relation Labels: These labels tag the various interac-
tions between Universal Words.
3. Attribute Labels: It gives extra information for the
Universal Words present in a sentence.
Two UNL graphs are compared for the similarity indi-
rectly compares the semantic content of two sentences. This
comparison is useful for finding semantic similarity score.
UNL Expression
The directed graph generated in this method is known as
a “UNL Expression” or “UNL Graph” which gives a long
list of the binary relations.
The general format of UNL Expression is the following:
relation:[scope-ID](from-node, to-node )
A UW is described in each of from-node and to-
node.
An example for the UNL expression of the sentence
“Bharat is writing a novel.”
agt(write(icldo)@entry.@present.@progress,
Bharat (iofperson))
obj(write(icldo)@entry.@present.@progress,
novel(iclbook))
UNL En-conversion process
UNL en-cnoversion is a process of translating natural lan-
guage text into UNL for syntactic and semantic evaluation.
The En-converter scanned the sentence from left to right
then the word dictionary is used for matching of mor-
phemes and they will be the candidate morphemes for
further processing. By using defined rules a syntactic tree
is built and a UNL format semantic network is designed
for the input sentence until all the words are inputted.
UNL graph based Similarity
The generated UNL graphs of two sentences are compared
for computing semantic similarity using the formula given
below. The average of relation match scores and universal
word match scores is the actual relation score. The match-
ing attributes of the corresponding universal word decides
the universal word match score.
Figure 2. UNL graph example.
Semsim(S1, S2)
=
S1 ∩ S2
S1 ∪ S2
+
X
a∈S1,b∈S2
0.75 ∗
[[a.w1 = b.w1a.w2 = b.w2]]
|S1 ∪ S2|
+ 0.75 ∗
[[a.r = b.ra.Ew1 = b.Ew1a.Ew2 = b.Ew2]]
|S1 ∪ S2|
+ 0.6 ∗
[[a.Ew1 = b.Ew1a.Ew2 = b.Ew2]]
|S1 ∪ S2|
!
Example: Input sentence1: होळी सण साजरा करता
ंना
काळजी
——————UNL expression for sentence1——————
{unl} mod (काळजी (iofaction):01, )iofnature):
00@entry@pred)
seq (काळजी (iofaction):01, साजरा)icldo):02@present)
agt (सण(iofevent):02@def, साजरा)icldo):02@present)
aoj (होळी (iclevent):03@def, सण)iofevent):02@def) {/unl}
——————UNL expression for sentence1——————
Input sentence2: भान ठ
े वून होळी सण साजरा
——————UNL expression for sentence2——————
{unl} mod (भान (iofaction):01, )iofnature):
00@entry@pred)
seq (भान (iofaction):01, साजरा)icldo):02@present)
agt (सण (iofevent):02@def, साजरा)icldo):02@present)
aoj (होळी (iclevent):03@def, सण)iofevent):02@def) {/unl}
——————UNL expression for sentence2——————
Figure 3. UNL1 graph for sentence1.
Figure 4. UNL2 graph for sentence2.
7. Detecting Paraphrases in Marathi Language 13
Table 2. Similarity measure score.
Types of
similarity
Measure Value Similarity measure
score to consider
Statistical
similarity
measures
Jaccard Similarity
(Word set)
0.4 Maximum statistical
similarity Statsim(S1,
S2) = 0.91
Cosine
Similarity (Word
Vector)
0.91
Word order
Similarity
0.17
Sumo Metric 0.7
Semantic
similarity
measure
UNL graph
based similarity
0.59 Semantic similarity
Semsim(S1, S2) =
0.59
Overall
similarity
If p = 0.5, q = 0.5 then, Sim (S1, S2) = 0.5 * 0.91
+ 0.5 * 0.59 = 0.75
UNL1 and UNL2 denotes the UNL graph for sentence1 and
for sentence2 respectively. Hence the semantic Similarity
score can be calculated by using following formula,
S1 ∩ S2 = {होळी, सण, साजरा, पयावण} = 4
S1 ∪ S2 = {होळी, सण, साजरा, पयावण, काळजी, भान} = 6
Hence, Semsim (S1, S2) = 0.59
3.4 Overall similarity
The scores of maximum statistical similarity and seman-
tic similarity are combined using following formula for
overall similarity calculation of two sentences, Sim (S1, S2)
=p*Statsim(S1, S2)+q*Semsim(S1, S2).
The values of p and q are considered in such a way that,
p + q = 1; p, q ∈ [0, 1] as p and q are the constants which
represents involvement of each part to the overall similar-
ity calculation,
3.5 Paraphrase judgement
As the proposed system produced overall similarity score
of 0.75, it can easily conclude that the Marathi sentence
pair judged for paraphrasing identifies as a paraphrase
pair. Taking into consideration all 5 statistical and 1 seman-
tic measures, it can be detected that there are variation in
every similarity measure score. However, the overall sim-
ilarity score providesan enhanced outcome for paraphrase
detection process.
4 Results and discussion
4.1 Working of the system
The input to the Paraphrase Identification system will be
two Marathi text documents in “.txt” format. One docu-
ment will be original sentence and other will be suspicious
paraphrase sentence.
Figure 5. GUI of the system.
Figure 6. Overall similarity Paraphrase Judgement.
Since the paraphrase detection system is designed exclu-
sively for Marathi sentences, all characters which do not
belong to Devanagari script must be eliminated to save
processing time and memory. The pre-processing of sen-
tences undergoes further processing steps like tokeniza-
tion, stop word removal, stemming and morphological
analysis and will result into validated output.
The overall similarity calculation is done by adding
values of maximum statistical similarity and semantic sim-
ilarity. The values of both similarities are multiplied by
a smoothing factor before addition for considering the
contribution of each similarity in overall similarity. The
sentence pair is paraphrase or not is decided on basis of
score of overall similarity. If the score is greater than 0.8
then the sentences are paraphrase of each other. On click-
ing of overall similarity button, system generates overall
similarity score and on basis of that it generates paraphrase
judgement.
4.2 Experimental analysis
All sentence pairs from dataset of five domains were tested
domain wise and similarity wise. The graphs were created
for comparative study of each similarity measure and to
find the best for the system. The best method taking it into
consideration while calculating overall similarity score for
the system.
8. 14 Shruti Srivastava and Sharvari Govilkar
0
0.5
1
1.5
SL1 SL2 SL3 SL4 SL5 SL6 SL7 SL8
PARAPHRASE
SIMILARITY
SCORE
4 STASTICAL AND SEMANTIC (UNL) SIMILARITY
MEASURES
LITERATURE DOMAIN
Jaccard Similarity Cosine Similarity
Word Order Similarity Sumo Matrix
UNL
Figure 7. Comparison of Statistical and semantic similarity for
literature domain.
4.2.1 Domain wise statistical and semantic similarity
The sentence pairs from the political, sports, film, litera-
ture and miscellaneous domains were tested separately for
both types of similarities. The statistical similarity deals
with statistical structure between words of the sentences
whereas semantic similarity deals with the meaning. For
literature domain and miscellaneous domain, statistical
similarity score lies between 0 and 1. The semantic simi-
larity score lies between 0 to 2 for literature domain and
ranges from 0 to 3 for miscellaneous domain.
4.2.2 Overall similarity
Out of 4 statistical similarities, maximum statistical simi-
larity is calculated for each sentence pair. Combination of
the semantic similarity and maximum statistical similarity
considered as overall similarity. The overall similarity score
is used for setting the threshold value and give the judg-
ment for identifying paraphrase (P) and non-paraphrase
pair (NP).
0
0.5
1
1.5
2
2.5
PARAPHRASE
SIMILARITY
SCORE
SEMANTIC (UNL), MAXIMUM STASTICAL ,
OVERALL SIMILARITY MEASURES
SPORTS DOMAIN
UNL Max. Stastical OVERALL
Figure 8. Overall similarity measures for sports domain.
4.3 Threshold value generation
From obtained different similarity scores, it is necessary
to highlight a classical problem in classification – thresh-
olds. Usually, thresholds are some parameter upon which
system takes some decision. After computation and com-
parison of all similarity measures, it is essential to decide
upon some threshold value which decides whether a sen-
tence pair is a paraphrase or not.
Thresholds are the parameters which unease the pro-
cess of evaluation. In the evaluation process, no threshold
value is predefined. Instead, from overall comparison of
each similarity on each domain, best threshold is computed
upon which each domain is agreed.Threshold computa-
tion may be a classical issue of function maximization or
optimization.
The threshold values plays important role in paraphrase
judgement as the sentence pair can divide into 3 groups
as per their overall similarity scores viz, Paraphrase (P),
Semi-paraphrase (SP) and Non-Paraphrase (NP). The fol-
lowing Table 3 gives an example of overall similarity score
Table 3. Threshold selection.
Sentence pair Overall similarity score Threshold value Paraphrase judgement
Sentence 1: हा माणसाचा े आह
(Book is the best friend of the man)
Sentence 2: जवळचा आह
The book is a close friend of man.
1.22 (SM1 from
miscellaneous domain)
(score = 0.5) Paraphrase (P)
Sentence 1: मराठी भाषा िनराळी आह
Each region of Maharashtra has a different Marathi language
Sentence 2: मराठी भाषा सव जरी एक असली तरी तीच दर बारा कोसावर
बदलत
Marathi language is the only one in the whole of Maharashtra,
the same rate varies from 12 to 12 degrees
0.48 (SL5 from literature
domain)
(score 0.5
score 0.4)
Semi-paraphrase (SP)
Sentence 1: वेळी झोळी भरावी
Beans are to be filled with virtues at the time of Holi.
Sentence 2: होळीसाठी वाईट व मोळी बांधावी
Build bad rallies and rituals for Holi.
0.0 (SM4 from
miscellaneous domain)
(score = 0.0) Non-paraphrase (NP)
9. Detecting Paraphrases in Marathi Language 15
Table 4. Confusion matrix for performance evaluation.
Actual results Predicted results Total dataset
Similar Dissimilar
Similar 46 TP 12 FN 58
Dissimilar 1 FP 15 TN 16
of sentence pair and role of threshold for deciding its
section of Paraphrase (P), Semi-paraphrase (SP) and Non-
Paraphrase (NP).
4.4 Performance evaluation
The performance of the system is evaluated on 4 basic eval-
uation measures. These measures are accuracy, precision,
recall and F-measure. For calculation of these four perfor-
mance measures, the four parameters TP, TN, FP, and FN
must be calculated first.
The performance of the system was evaluated by creat-
ing a confusion matrix and it is given in Table 4.
The 46 sentence pairs classified as similar and 12 as dis-
similar out of 58 similar sentence pairs by the proposed
system. Out of actual 16 dissimilar sentence pairs, the 15
sentence pairs classified as dissimilar and 1 as similar in
the used data-set. In Table 4, True positive (TP) and True
Negative (TN) are the observations which are correctly pre-
dicted hence shown in green color. False positive (FP) and
False Negative (FN) must be minimized in order to get a
good system so they are shown in red color.
True positive (TP) and True Negative (TN) values occur
when actual results obtained from the system agreed with
the predicted results.
False positives (FP) and False negatives (FN), these val-
ues occur when the actual results contradicts with the
predicted results.
Accuracy
Accuracy is a ratio of correctly predicted observation to the
total observations.
Precision
Precision is obtained by dividing the number of cor-
rectly predicted paraphrases by the number of predicted
paraphrases. High precision relates to the low false posi-
tive rate.
Recall
Recall calculation is done taking the ratio of correctly pre-
dicted paraphrases to the all referenced paraphrases in
actual class.
F-measure
F-measure considers false positives and false negatives
into account and it is the weighted average of Precision and
Table 5. Performance analysis of overall similarity.
Measure Value
ACCURACY 0.82
PRECISION 0.98
RECALL 0.79
F-MEASURE 0.89
Recall. Accuracy gives best value when the values of false
positives and false negatives are similar. When the cost of
these two is different then consider values of both Precision
and Recall which in turn the value of F-measure.
Although there is deviation in the similarity score
of each method, the overall similarity score provedan
improved result. This system considered all the statistical
and semantic concerns.
The data obtained from the Table 5 prove that the
proposed system performed sound work of Paraphrase
Identification of Marathi sentences.
5 Dataset and testing
5.1 Dataset
The dataset used in this work consist of sentence pairs from
political, film, sports, literautre, miselleneous domains.
There are 74 marathi sentence pairs in the dataset and
each sentence pair has a binary decision accompanying
with it which gives the parahrase judgement for being
paraphrase pair or not.
5.2 Testing
To test the accuracy and F-measure, dataset was divided
into a ratio of 75% and 25% for training and testing respec-
tively. The performance measures (Accuracy, Precision,
Recall and F-Measure) were evaluated for the different
similarity measures.
A quartile is a form of quantile. 25% is set as the first
quartile (Q1) as it is the mid of the smallest number and
the median of the data set. The median of the data-set is
the second quartile (Q2). The mid value between median
and highest value of the data-set is considered as the third
quartile (Q3).
The decision of paraphrase of the sentence pair is
depend on the score of first quartile. If the similarity score
is greater than Q1 then the sentence pair is labelled as
paraphrase otherwise non-paraphrase. On the basis of this
decision accuracy, precision, recall and F-measure was cal-
culated and compared for each similarity measure.
In paraphrase identification of Marathi sentences, 5 sim-
ilarity measures were performed to check similarity score
of the sentences. The similarity measures were performed
on 58 sentence pairs to identify the highest score among
10. 16 Shruti Srivastava and Sharvari Govilkar
Table 6. Performance of the similarity measures.
Similarity measure Min Quartile 25% Quartile 50% Median Quartile 75% Max
Jaccard 0 0.18 0.29 0.4 0.86
Cosine 0.74 0.885 0.91 0.94 1
Word order 0 0.065 0.12 0.24 4
Sumo metric 0 0.01 0.04 0.72 1
Semantic (UNL) 0 0.235 0.38 0.79 2.58
Overall 0.42 0.58 0.64 0.87 0.87
Table 7. Performance measures analysis.
MEASURE JACCARD COSINE WORD-ORDER SUMO METRIC UNL OVERALL
ACCURACY 0.77 0.87 0.8 0.83 0.77 0.82
F-MEASURE 0.85 0.92 0.87 0.89 0.85 0.89
RECALL 0.77 0.95 0.75 0.86 0.77 0.79
PRECISION 0.94 0.9 1 0.92 0.94 0.98
4 statistical similarities and the performance of semantic
similarity.
5.2.1 Analysis of performance measures
As first quartile value is decided threshold for each simi-
larity, the confusion matrix is created and values of perfor-
mance measures are calculated according to it.
From Table 6 it is clear that cosine similarity gives the
best score among all similarity methods. Hence for find-
ing paraphrases in Marathi Sentences, the Cosine similarity
method can be proved effectively accurate.
From testing of all similarity measures, it can be clearly
convey that cosine similarity is best method among all sta-
tistical similarity techniques for finding similarity. When
considering all sentences in given dataset, other statistical
techniques also performed well for different sentence pair.
Word-order similarity is the best to find dissimilar pairs.
For taking dissimilarity into consideration, overall simi-
larity score is resulted to 0 if any of the statistical similarity
or semantic similarity resulted into 0. Otherwise it is cal-
0.77
0.87
0.8 0.83
0.77
0
0.2
0.4
0.6
0.8
1
ACCURACY
0
0
0
0
ACCURACY
SCALE
JACCORD COSINE WORD-ORDER
SUMO METRIC UNL
Figure 9. Accuracy comparisons of all similarity techniques.
culated by combining the maximum statistical similarity
score and semantic similarity score.
6 Conclusion and future scope
Mostly the research has been done for English and Indian
regional languages like Hindi, Malayalam, Punjabi and
Tamil. However no paraphrase detection work has been
yet done for Marathi language. This is the first footstep
towards detecting Paraphrases in Marathi sentences. The
development of social media in Marathi brings the avail-
ability of massive data in Marathi languages. Analysis of
social data like daily tweets in Twitter is the interest grow-
ing field for different purposes. Identifying paraphrases
should be widely explored to help many more arenas of
Natural language processing field.
The proposed system provides paraphrase identifica-
tion tool for Marathi sentences using incorporation of both
statistical and semantic features. The detailed analysis and
performance comparison of all statistical methods con-
cludes that cosine similarity measure outperforms among
all statistical measures. Semantic similarity carried out
through a UNL method plays an important role in achiev-
ing 82% accuracy and 89% F-measure. This score proves
that the designed tool is extremely supportive for Para-
phrase Identification of Marathi Sentences.
The system can be enhanced for capturing the similar-
ity between Marathi paragraphs and Marathi documents.
There is possibility of improving statistical methods to
detect more structural similarity types. Misspelled words
in the sentences will yield wrong result even if the sen-
tences are similar. Pre-processing steps can be enriched for
capturing the spelling mistakes and can deal accordingly.
Semantic similarity measure score can be improved by con-
sidering synonyms of the universal words. The current
model is trained on fewer data sets which can be extended
11. Detecting Paraphrases in Marathi Language 17
to incorporate more data. The model constructed can also
be changed from static to online Paraphrase Identification
tool for Marathi documents.
Bibliography
[1] Lee, Jun Choi and Cheah, Yu-N (2016) Paraphrase Detection
using Semantic Relatedness based on Synset Shortest Path in
WordNet. In: International Conference on Advanced Infor-
matics: Concepts, Theory and Applications, 16-17 August
2016, Parkroyal Penang Resort.
[2] Chen Liang, Praveen Paritosh, Vinodh Rajendran, Kenneth
D. Forbus, Learning Paraphrase Identification with Struc-
tural Alignment Conference: IJCAI 2016, At New York.
[3] Hoang-Quoc Nguyen-Son, Yusuke Miyao, Isao Echizen,
Paraphrase Detection Based on Identical Phrase and Similar
Word Matching, 29th Pacific Asia Conference on Language,
2015.
[4] J.C. Lee, and Y. Cheah. “Paraphrase Detection using String
Similarity with Synonyms.” The Fourth Asian Conference
on Information Systems, ACIS 2015.
[5] Nitin Madnani, Joel Tetreault, Martin Chodorow, Re-
examining Machine Translation Metrics for Paraphrase
Identification, Conference of the North American Chapter of
the ACL: 2012 Association for Computational Linguistics.
[6] Mihai Lintean and Vaile Rus, Paraphrase Identification
Using Weighted Dependencies and word semantics, Pro-
ceedings of the Twenty-Second International FLAIRS Con-
ference (2009).
[7] Fernando and Stevenson, 2008. A Semantic Similarity
Approach to paraphrase Detection, In Proceedings of the
11th Annual Research Colloquium of the UK Special Interest
Group for Computational Linguistics, pages 45–52. Citeseer.
[8] Cordeiro, Dias, Brazdil A Metric for Paraphrase Detection
Proceedings of the International Multi-Conference on Com-
puting in the Global Information Technology (ICCGI’07),
IEEE.
[9] Kozareva and Montoyo, Paraphrase identification on the
basis of supervised machine learning techniques, Advances
in Natural Language Processing: 5th International Confer-
ence on NLP (FinTAL 2006), Turku, Finland, 524–533.
[10] Avishek Dan Pushpak Bhattacharyya, “CFILT-CORE:
Semantic Textual Similarity using Universal Networking
Language” 2013 Association for Computational.
[11] Nandini Sethi, Prateek Agrawal, Vishu Madaan and Sanjay
Kumar Singh, A Novel Approach to Paraphrase Hindi Sen-
tences using Natural Language Processing, Indian Journal of
Science and Technology, Vol 9(28), July 2016.
[12] Rupal Bhargava, Anushka Baoni, Harshit Jain
“BITS_PILANI@DPIL-FIRE 2016: Paraphrase Detection
in Hindi Language using Syntactic Features of Phrase”,
DPIL 2016.
[13] Vani K, Deepa Gupta. “ASE@DPIL-FIRE2016: Hindi Para-
phrase Detection using Natural Language Processing Tech-
niques Semantic Similarity Computations” DPIL 2016.
[14] Anuj Saini “Anuj@DPIL-FIRE2016: A Novel Paraphrase
Detection Method in Hindi Language using Machine Learn-
ing”, DPIL 2016.
[15] Sindhu L. and Sumam Mary Idicula “CUSAT_NLP@DPIL-
FIRE2016:Malayalam Paraphrase Detection”, DPIL 2016.
[16] Manju K. and Sumam Mary Idicula,” CUSAT TEAM@DPIL-
FIRE2016: Detecting Paraphrase in Indian Languages-
Malayalam”, DPIL 2016.
[17] Ditty Mathew and Dr. Suman Mary Idicula,” Paraphrase
identification of Malayalam sentences – an experience”,
IEEE 2013.
[18] R. Thangarajan, S. V. Kogilavani, A. Karthic, S. Jawahar,
“KEC@DPIL-FIRE2016: Detection of Paraphrases on Indian
Languages”, DPIL 2016.
[19] Leilei Kong, Zhenyuan Hao, Kaisheng Chen, Zhongyuan
Han, Liuyang Tian, Haoliang Qi “HIT2016@DPIL-FIRE2016:
Detecting Paraphrases in Indian Languages based on “Gra-
dient Tree Boosting”, DPIL 2016.
[20] Sandip Sarkar, Partha Pakray, Saurav Saha, Dipankar Das,
Jereemi Bentham, Alexander Gelbukh “NLP-NITMZ@DPIL-
FIRE 2016: Language Independent Paraphrases Detection”,
DPIL 2016.
[21] Kamal Sarkar “KS_JU@DPIL-FIRE2016: Detecting Para-
phrases in Indian Languages using Multinomial Logistic
Regression Model”, DPIL 2016.
[22] Tanik Saikh, Sudip Kumar, Naskar Sivaji, Bandyopadhyay
“JU_NLP@DPIL-FIRE2016: Paraphrase Detection in Indian
Languages - A Machine Learning Approach”, DPIL 2016.