The document discusses analyzing Tamil lyrics to determine word frequency, rhyme patterns, and concept co-occurrence. It presents an analysis of over 2,000 Tamil songs to identify the ten most commonly used words, rhyme pairs, and co-occurring concepts. The analysis found that the lyrics most commonly expressed happiness and love. Future work could examine identifying emotions by genre, as well as genre-specific rhyming patterns and concept relationships.
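The word-frequency part of such an analysis can be sketched in a few lines; the two-song corpus and the Tamil words in it are invented stand-ins for the 2,000-song data set.

```python
from collections import Counter

def top_words(songs, n=10):
    """Count word frequencies across a list of song lyrics
    and return the n most common (word, count) pairs."""
    counts = Counter()
    for lyrics in songs:
        counts.update(lyrics.split())
    return counts.most_common(n)

# Toy corpus standing in for the real data set.
songs = ["kadhal kadhal inbam", "inbam kadhal vaanam"]
print(top_words(songs, 2))  # [('kadhal', 3), ('inbam', 2)]
```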
Anaphora resolution in Hindi language using Gazetteer method (ijcsa)
Anaphora resolution is one of the active research areas within the realm of natural language processing, and resolving anaphoric references is among the most challenging and complex tasks to be handled. This paper focuses on pronominal anaphora resolution for the Hindi language. Among the various methodologies for resolving anaphora, this paper presents a computational model for Hindi based on the Gazetteer method, which builds lists of elements and then applies operations to classify the elements in those lists. Of the many salient factors for resolving anaphora, the proposed model uses two: animacy and recency. The animacy factor distinguishes living things from non-living things, while the recency factor gives referents mentioned in the current sentence higher weights than those in previous sentences. The paper describes experiments conducted on short Hindi stories, news articles, and biography content from Wikipedia, their results, and future directions for improving accuracy.
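The interplay of the two factors can be illustrated with a toy resolver; the gazette entries, the animacy flag, and the 1/(1+distance) weighting below are hypothetical simplifications, not the paper's actual lists or weights.

```python
# Hypothetical gazettes of animate and inanimate nouns.
ANIMATE_GAZETTE = {"ram", "sita", "dog"}
INANIMATE_GAZETTE = {"book", "table"}

def resolve(pronoun_is_animate, candidates):
    """candidates: list of (noun, sentence_distance) pairs,
    distance 0 = current sentence. Pick the candidate that
    matches the pronoun's animacy, preferring recent ones."""
    best, best_score = None, -1.0
    for noun, distance in candidates:
        animate = noun in ANIMATE_GAZETTE
        if animate != pronoun_is_animate:
            continue                     # animacy filter via gazette lookup
        score = 1.0 / (1 + distance)     # recency weighting
        if score > best_score:
            best, best_score = noun, score
    return best

print(resolve(True, [("book", 0), ("ram", 1)]))  # ram
```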
Comparative performance analysis of two anaphora resolution systems (ijfcstjournal)
Anaphora resolution is the process of finding referents in a given discourse and is one of the complex tasks of linguistics. This paper presents a performance analysis of two computational models that use the Gazetteer method for resolving anaphora in the Hindi language. In the Gazetteer method, different classes (gazettes) of elements are created; these gazettes provide external knowledge to the system. Both models use the recency and animacy factors for resolving anaphors. For the recency factor, the first model uses the centering approach and the second the Lappin and Leass approach; gazetteers provide the animacy knowledge. The paper presents experimental results for both models on short Hindi stories, news articles, and biography content from Wikipedia. The accuracy of each model is analyzed, and a conclusion is drawn about which model is better suited to the Hindi language.
Anaphora resolution is the process of finding referents in discourse; in computational linguistics it is a complex and challenging task. This paper focuses on pronominal anaphora resolution, a subtask in which pronouns are resolved to noun referents. Incorporating anaphora resolution into applications such as automatic summarization, opinion mining, machine translation, and question answering can increase their accuracy by around 10%. Related work in this field exists for many languages; this paper focuses on resolving anaphora for the Punjabi language. A model is proposed and an experiment is conducted to measure the system's accuracy. The model uses two factors: recency, based on the Lappin and Leass approach, and animacy knowledge, introduced via the Gazetteer method. The experiment is conducted on a Punjabi story of more than 1,000 words, and results are reported along with future directions.
Paraphrases are sentences that differ in their wording or in the arrangement of their words but convey the same meaning. Identifying paraphrases is exceptionally important in real-life applications such as information retrieval, plagiarism detection, text summarization, and question answering. A large amount of paraphrase-detection work has been done for English and many Indian languages; however, no existing system identifies paraphrases in Marathi.
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI (ijnlc)
Machine transliteration has emerged as a very important research area in the field of machine translation. Transliteration aims to preserve the phonological structure of words, and proper transliteration of named entities plays a significant role in improving machine-translation quality. In this paper we perform machine transliteration for the English-Punjabi language pair using a rule-based approach. We have constructed rules for syllabification, the process of separating the syllables of a word. We calculate probabilities for named entities (proper names and locations); for words that do not fall into this category, separate probabilities are calculated by relative frequency through the statistical machine translation toolkit MOSES. Using these probabilities we transliterate the input text from English to Punjabi.
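The relative-frequency estimate mentioned above can be sketched as follows; the aligned syllable pairs are invented examples, and a real system would derive them from MOSES-trained alignments.

```python
from collections import Counter, defaultdict

def relative_frequencies(pairs):
    """Estimate P(target | source) by relative frequency from
    aligned (source_syllable, target_syllable) training pairs."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    return {src: {t: c / sum(tgts.values()) for t, c in tgts.items()}
            for src, tgts in counts.items()}

# Toy aligned data (hypothetical English-Gurmukhi syllable pairs).
pairs = [("sin", "ਸਿੰ"), ("sin", "ਸਿੰ"), ("gh", "ਘ"), ("sin", "ਸੀਨ")]
probs = relative_frequencies(pairs)
print(probs["sin"]["ਸਿੰ"])  # 0.666...
```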
Analysis of anaphora resolution system for … (ijitjournal)
Anaphora resolution, the problem of identifying referents in discourse, is a complex problem in linguistics that has attracted the attention of many researchers and plays an important role in natural language processing tasks. This paper focuses on pronominal anaphora resolution for the English language, in which pronouns refer to the intended nouns in discourse. Two computational models are proposed. Among the various factors on which anaphora resolution can be based, these models use the recency factor and animacy knowledge. The recency factor is implemented using the Lappin and Leass approach in the first model and the centering approach in the second; information about animacy is obtained with the Gazetteer method, and identifying animate elements improves the system's accuracy. The paper reports experiments conducted with both models on data sets from different domains; comparative results are summarized and a conclusion is drawn about the more suitable model.
Abstract
Part-of-speech tagging plays an important role in developing natural language processing software. It means assigning a part-of-speech tag to each word of a sentence: the tagger takes a sentence as input and assigns the appropriate tag to each of its words. This article surveys the work done on Odia POS tagging.
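The tagger interface described above (sentence in, one tag per word out) can be illustrated with a toy lookup tagger; a real Odia tagger would be trained statistically, and the lexicon below is purely illustrative.

```python
# Hypothetical tag lexicon; unknown words fall back to "UNK".
LEXICON = {"the": "DET", "cat": "NOUN", "sleeps": "VERB"}

def pos_tag(sentence):
    """Return a (word, tag) pair for each word of the sentence."""
    return [(w, LEXICON.get(w, "UNK")) for w in sentence.split()]

print(pos_tag("the cat sleeps"))
# [('the', 'DET'), ('cat', 'NOUN'), ('sleeps', 'VERB')]
```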
________________________________________________
Stemming is the process of clipping affixes off an input word to obtain its root word, but stemming does not necessarily produce a genuine, meaningful root. To overcome this problem we use a lemmatizer, which carves the lemma out of a given word and can apply additional rules to turn the clipped word into a proper stem. In this paper we have created an inflectional lemmatizer that generates rules for extracting suffixes and adds rules for producing a proper, meaningful root word.
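A rule-based inflectional lemmatizer of this kind can be sketched with a small suffix table plus a repair rule; the English rules below are illustrative, not the paper's actual rule set.

```python
# Strip a suffix, then repair the clipped form into a real word.
SUFFIX_RULES = [
    ("ies", "y"),    # "studies" -> "study"
    ("ing", ""),     # "clipping" -> "clipp" (repaired below)
    ("s", ""),       # "words" -> "word"
]

def lemmatize(word):
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)] + replacement
            if len(stem) > 2 and stem[-1] == stem[-2]:
                stem = stem[:-1]      # undouble the final consonant
            return stem
    return word

print(lemmatize("studies"))   # study
print(lemmatize("clipping"))  # clip
```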
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of engineering and technology.
Abstract: The use of regular expressions to search text is a well-known and useful technique. Regular expressions are generic representations of a string or a collection of strings, and they are among the most useful tools in computer science. NLP, as an area of computer science, has greatly benefited from regexps: they are used in phonology, morphology, text analysis, information extraction, and speech recognition. This paper gives the reader a general review of the use of regular expressions, illustrated with examples from natural language processing, and discusses different approaches to regular expressions in NLP. Keywords: Regular Expression, Natural Language Processing, Tokenization, Longest common subsequence alignment, POS tagging
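As one concrete NLP use of regexps, a minimal word tokenizer might look like this (the pattern is an illustrative choice, not one taken from the paper):

```python
import re

# One alternation: a word, optionally with an apostrophe part
# (keeps contractions together), or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    return TOKEN_RE.findall(text)

print(tokenize("Regexps aren't just for grep: they tokenize, too."))
```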
----------------------------
NLP techniques are used in spell checking to find errors in written words and to suggest relevant replacement words.
Algorithms: Jaccard coefficient, Levenshtein distance
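Both of the named algorithms are standard; a minimal sketch of each (the character-bigram choice for Jaccard is one common option, not prescribed by the source):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance: minimum
    number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def jaccard(a, b):
    """Jaccard coefficient over character bigrams."""
    A = {a[i:i + 2] for i in range(len(a) - 1)}
    B = {b[i:i + 2] for i in range(len(b) - 1)}
    return len(A & B) / len(A | B) if A | B else 1.0

print(levenshtein("speling", "spelling"))  # 1
```

A spell checker can rank candidate corrections by ascending edit distance or descending Jaccard similarity.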
This paper deals with sentiment analysis of Manipuri articles. The language is highly agglutinative in nature. The documents are letters to the editor of a few local daily newspapers. The text is processed for part-of-speech (POS) tagging using a Conditional Random Field (CRF). A lexicon of verbs is manually annotated with sentiment polarity (positive, negative, or neutral). With the POS tagger, the verbs of each sentence are identified, and the annotated verb lexicon determines the polarity of each sentence. The counts for each polarity category (positive, negative, and neutral) are tallied separately, and the highest of the three decides the sentiment polarity of the document. The system shows a recall of 72.10%, a precision of 78.14%, and an F-measure of 75.00%.
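The reported scores are internally consistent: the F-measure is the harmonic mean of precision and recall, which can be checked directly.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

# Plugging in the paper's precision and recall:
print(round(f_measure(0.7814, 0.7210), 4))  # 0.75
```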
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN... (csandit)
Sentiment analysis (SA) and machine learning techniques are combined to understand the attitude of a text's writer as implied in a particular text. Although SA is challenging in itself, it is especially challenging for the Arabic language. In this paper we enhance sentiment analysis for Arabic. Our approach begins with special preprocessing steps; we then adopt the sentiment keyword co-occurrence measure (SKCM) as a sentiment-based feature-selection method. This feature-selection method is applied to three sentiment corpora using an SVM classifier. We compare our approach with some traditional methods followed by most SA works. The experimental results are very promising for enhancing SA accuracy.
Quality estimation of machine translation outputs through stemming (ijcsa)
Machine translation is a challenging problem for Indian languages. New machine translators appear every day, but high-quality automatic translation is still a very distant dream, and a correctly translated Hindi sentence is rarely found. In this paper we focus on the English-Hindi language pair: to identify the best MT output, we present a ranking system that employs machine learning techniques and morphological features. No human intervention is required for ranking. We have also validated our results by comparing them with human rankings.
Sanskrit in Natural Language Processing (Hitesh Joshi)
Sanskrit is the most unambiguous language compared to other natural languages. As stated by Rick Briggs of NASA, it is the most suitable natural language for computer processing.
Implementation Of Syntax Parser For English Language Using Grammar Rules (IJERA Editor)
For many years we have used Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. Syntactic parsing works with the syntactic structure of a sentence: 'syntax' refers to the grammatical arrangement of words in a sentence and their relationships with other words. The main goal of syntactic analysis is to find the syntactic structure of a sentence, usually represented as a tree; identifying this structure helps determine the sentence's meaning. Natural language processing processes data through lexical analysis, syntax analysis, semantic analysis, discourse processing, and pragmatic analysis. This paper surveys various parsing methods. The algorithm in this paper splits English sentences into parts using a POS (part-of-speech) tagger, identifies the sentence type (simple, complex, interrogative, factual, active, passive, etc.), and then parses the sentences using grammar rules of natural language. As natural language processing becomes increasingly relevant, there is a need for treebanks catered to the specific needs of more individualized systems. Here we present an open-source technique to check and correct grammar; the methodology gives appropriate grammatical suggestions.
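Parsing with hand-written grammar rules can be sketched with a toy recursive-descent recognizer; the three-rule grammar below is an invented miniature, not the paper's rule set.

```python
# Toy CFG: uppercase symbols are nonterminals, lowercase are words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "DET": [["the"]], "N": [["dog"], ["cat"]], "V": [["saw"], ["sleeps"]],
}

def parse(symbol, words, i):
    """Return (end index, tree) if `symbol` derives a prefix of
    words[i:], trying productions in order; else None."""
    for production in GRAMMAR[symbol]:
        j, children, ok = i, [], True
        for part in production:
            if part in GRAMMAR:                  # nonterminal: recurse
                result = parse(part, words, j)
                if result is None:
                    ok = False
                    break
                j, subtree = result
                children.append(subtree)
            elif j < len(words) and words[j] == part:
                children.append(part)            # terminal: consume word
                j += 1
            else:
                ok = False
                break
        if ok:
            return j, (symbol, children)
    return None

words = "the dog saw the cat".split()
result = parse("S", words, 0)
print(result is not None and result[0] == len(words))  # True
```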
CLASSIFICATION OF TAMIL POETRY USING CONTEXT FREE GRAMMAR USING TAMIL GRAMMAR... (cscpconf)
Context-free grammar is a prime tool for specifying rules to verify the syntax of any language. This paper aims to classify Tamil poems into their subclasses by framing their CFGs. Initially, tokenization is performed, in which vowels (short syllables) and consonants (long syllables) are identified, followed by 'asai' and 'seer' analysis. We then compute 'thalai', which is essential to distinguish among the 'paas', and verify the identified 'paa' against the determined 'osai'.
A statistical model for morphology inspired by the Amis language (IJwest)
We introduce a statistical model for analysing the morphology of natural languages based on their affixes. The model was inspired by the analysis of Amis, an Austronesian language with a rich morphology. As words contain a root and potential affixes, we associate three vectors with each word: one for the root, one for the prefixes, and one for the suffixes. The morphology captures semantic notions, and we show how to approximately predict some of them, for example the type of simple sentences, using prefixes and suffixes only. We then define a sentence vector s for each sentence, built from the prefixes and suffixes of that sentence, and show how to approximately predict a derivation tree in a grammar.
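The three-vector word representation can be sketched as follows; the Amis roots and affixes listed are hypothetical placeholders, as is the one-hot/count encoding, since the paper's exact indexing scheme is not given here.

```python
import numpy as np

roots = ["kaen", "nengneng"]   # hypothetical root inventory
prefixes = ["mi-", "ma-"]      # hypothetical prefix inventory
suffixes = ["-en", "-an"]      # hypothetical suffix inventory

def word_vectors(root, word_prefixes, word_suffixes):
    """Associate three vectors with a word: one over the root
    inventory, one over prefixes, one over suffixes."""
    r = np.zeros(len(roots))
    r[roots.index(root)] = 1
    p = np.zeros(len(prefixes))
    for pre in word_prefixes:
        p[prefixes.index(pre)] += 1
    s = np.zeros(len(suffixes))
    for suf in word_suffixes:
        s[suffixes.index(suf)] += 1
    return r, p, s

r, p, s = word_vectors("kaen", ["mi-"], [])
print(r.tolist(), p.tolist(), s.tolist())
# [1.0, 0.0] [1.0, 0.0] [0.0, 0.0]
```

A sentence vector could then be built by summing the prefix and suffix vectors of the sentence's words.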
Developing links of compound sentences for parsing through Marathi link gramm... (ijnlc)
Marathi is a verb-final language with a relatively free word order. Complex sentences are one of the major sentence types commonly used in any language. This paper explores the structure of complex sentences in Marathi, proposes various links for complex-sentence clauses, and models complex sentences with the proposed links in the Link Grammar framework for parsing purposes.
Automatic Generation of Compound Word Lexicon for Marathi Speech Synthesis (iosrjce)
IOSR Journal of VLSI and Signal Processing (IOSRJVSP) is a double-blind, peer-reviewed international journal that publishes articles contributing new results in all areas of VLSI design and signal processing. The goal of the journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI design and signal-processing concepts and to establish new collaborations in these areas. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.
Phonetic Recognition In Words For Persian Text To Speech Systemspaperpublications3
Abstract:The interest in text to speech synthesis increased in the world .text to speech have been developed for many popular languages such as English, Spanish and French and many researches and developments have been applied to those languages. Persian on the other hand, has been given little attention compared to other languages of similar importance and the research in Persian is still in its infancy. Persian languages possess many difficulty and exceptions that increase complexity of text to speech systems. For example: short vowels is absent in written text or existence of homograph words. in this paper we propose a new method for Persian text to phonetic that base on pronunciations by analogy in words, semantic relations and grammatical rules for finding proper phonetic.Keywords:PbA, text to speech, Persian language, Phonetic recognition.
Title:Phonetic Recognition In Words For Persian Text To Speech Systems
Author:Ahmad Musavi Nasab, Ali Joharpour
International Journal of Recent Research in Mathematics Computer Science and Information Technology (IJRRMCSIT)
Paper Publications
Analysis of lexico syntactic patterns for antonym pair extraction from a turk...csandit
Extraction of semantic relations from various sources such as corpora, web pages, dictionary definitions etc. is one of the most important issues in the study of Natural Language Processing (NLP). Various methods have been used to extract semantic relations from various sources; the pattern-based approach is one of the most popular among them. In this study, we propose a model to extract antonym pairs from a Turkish corpus automatically. Using a set of seeds, we automatically extract lexico-syntactic patterns (LSPs) for the antonym relation from the corpus. A reliability score is calculated for each pattern, and the most reliable patterns are used to generate new antonym pairs. The study is conducted on adjective-adjective and noun-noun pairs only. Noun and adjective target words are used to measure the success of the method, and candidate antonyms are generated using reliable patterns. For each antonym pair consisting of a candidate antonym and a target word, an antonym score is calculated; pairs reaching a certain score are accepted as antonym pairs. The proposed method shows good performance with 77.2% average accuracy.
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAijistjournal
Ontologies have been applied to many applications in recent years, especially on the Semantic Web and in Information Retrieval, Information Extraction, and Question Answering. The purpose of a domain-specific ontology is to eliminate conceptual and terminological confusion. It accomplishes this by specifying a set of generic concepts that characterizes the domain, along with their definitions and interrelationships. This paper describes algorithms for identifying semantic relations and constructing an Information Technology ontology while extracting concepts and objects from different sources. The ontology is constructed from three main resources: ACM, Wikipedia and unstructured files from the ACM Digital Library. Our algorithms combine Natural Language Processing and Machine Learning. We use NLP tools such as OpenNLP and the Stanford lexical dependency parser to explore sentences, and then extract sentences matching English patterns to build a training set. We use a random sample from 245 ACM categories to evaluate our results. The results generated show that our system yields superior performance.
An implementation of apertium based assamese morphological analyzerijnlc
Morphological analysis is an important branch of linguistics for any Natural Language Processing technology. Morphology studies the word structure and formation of the words of a language. In the current scenario of NLP research, morphological analysis techniques are becoming more popular day by day; for processing any language, the morphology of its words must first be analyzed. The Assamese language has a very complex morphological structure. In our work we have used Apertium-based finite-state transducers to develop a morphological analyzer for Assamese within a limited domain, achieving 72.7% accuracy.
Lyric Mining: Word, Rhyme & Concept
Co-occurrence Analysis
Karthika Ranganathan, T.V Geetha, Ranjani Parthasarathi & Madhan Karky
Tamil Computing Lab (TaCoLa),
College of Engineering Guindy, Anna University, Chennai.
karthika.cyr@gmail.com, madhankarky@gmail.com
ABSTRACT
Computational creativity is one area of NLP which requires extensive analysis of large datasets. The LaaLaLaa framework [1] for lyric analysis and generation proposed a lyric analysis subsystem that required statistical analysis of Tamil lyrics. In this paper, we propose a data analysis model for words, rhymes and their usage in Tamil lyrics. The proposed analysis model extracts the root words from lyrics using a morphological analyzer [2] to compute word frequency across the lyric dataset. The words in their unanalyzed form are used for computing frequent rhyme, alliteration and end-rhyme pairs using an adapted Apriori algorithm. Frequent co-occurring concepts in lyrics are also computed using Agaraadhi, an on-line Tamil dictionary. Presenting the results, this paper concludes by discussing the need for such an analysis to compute the freshness and pleasantness of a lyric, and the use of these statistics for lyric generation.
Keywords: Tamil Lyrics, Morphological Analyser, Apriori Algorithm.
1. INTRODUCTION
Tamil is one of the world's oldest languages and holds classical-language status. Numerous forms of literature exist in the Tamil language, of which lyrics play a vital role in taking the language to every household in the form of original film soundtracks, jingles, private albums, and commercials. With thousands of lyrics being created every year, we still lack proper tools to model and analyse them. Such an analysis framework would enable one to see the various patterns of words, combinations and thoughts used over time. It would also make it possible to generate fresh lyrics, where freshness can be associated with the concepts and thoughts expressed in the lyric.
In this paper, we discuss Tamil lyric analysis on the basis of word usage, rhyme usage and the co-occurrence of words. The frequency of word usage is identified by considering the morphological root of each word, obtained using a morphological analyser, instead of surface terms. For analysing frequent rhyme, alliteration and end-rhyme pairs, we adapted the Apriori algorithm [4]. To identify co-occurring concepts in lyrics, we used "Agaraadhi", an on-line Tamil dictionary framework [3], and a new algorithm has been proposed to compute the frequent usage of co-occurring concepts in lyrics.
The rest of this paper is organized as follows. In Section 2, we explain the algorithm and tools. In Section 3, we explain the methodology, and in Section 4, we discuss our results. Conclusions and future extensions of this work are presented in Section 5.
2. MORPHOLOGICAL ANALYSIS AND APRIORI ALGORITHM
Morphological analysis is the process of segmenting words into morphemes and identifying their grammatical categories. For a given word, a morphological analyser (MA) generates its root word and its grammatical information. The role of the morphological analyser in the proposed work is to identify the noun and verb morphology of a given word, as in the following examples:
Example 1, for noun morphology:
இராமைன (Ramanai)
இராம (Raman) + ஐ (ai)
Entity + Accusative Case
Example 2, for verb morphology:
ெச றா (Senraan)
ெச (sel) + (R) + ஆ (Aan)
Verb + Past Tense Marker + Third Person Masculine Singular Suffix
In the proposed work, the frequency of a word is identified by considering the variations of a noun in terms of its morphology. For instance, variations of Ramanai such as Ramanaal, Ramanukku, Ramanin and Ramanadhu are also counted towards the word Raman.
The Apriori algorithm is an influential algorithm for mining frequent item sets for Boolean association rules [4]. Apriori uses a "bottom-up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found. It uses two objective measures: support and confidence. The support of an association pattern is the percentage of task-relevant data transactions in which the pattern appears. Confidence is a measure of the certainty or trustworthiness associated with each discovered pattern.
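These two measures can be sketched in a few lines of Python over a toy "transaction" set, where each transaction is the set of words in one lyric line. The function names and the sample data are illustrative only, not part of the original system:

```python
# Sketch of Apriori's two objective measures. Each transaction is modelled
# as a set of items (here: the words of one lyric line).
def support(pattern, transactions):
    """Fraction of transactions containing every item of the pattern."""
    hits = sum(1 for t in transactions if pattern <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the full pattern divided by support of the antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

lines = [{"kadhal", "nilavu"}, {"kadhal", "vaanam"}, {"nilavu"}]
print(support({"kadhal"}, lines))                 # 2 of the 3 lines contain "kadhal"
print(confidence({"kadhal"}, {"nilavu"}, lines))  # how often "kadhal" co-occurs with "nilavu"
```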
3. LYRIC ANALYSIS
(i) Word Analysis
The frequency of words is used to associate a popularity score with each word. This score is proposed to be used by the lyric generation part of the framework proposed in [1]. In this work, the popularity score of a word has been identified from lyrics. In lyrics, words mostly occur with suffixes attached, so the root words are taken into consideration for determining the frequency count. The root words are identified using the morphological analyser. The algorithm to find word usage is illustrated below:
Algorithm
Let LD be the set of lyrics in the dataset, LS the set of sentences of the lyric dataset and LW the set of words in the lyric dataset. WC denotes the word count across the whole lyric dataset. Let m be the total number of sentences in the lyrics and n the total number of words in the lyrics.
a) Given a lyric dataset LD
b) For each LSi ← 1 to m
Split the sentence LSi into words LW
c) For each LWj ← 1 to n
RW ← ProcessMorphAnalyser(LWj)
d) Let RW be the root word
if RW already exists in the word count list (WC), then increment its count
else add RW to the word count list (WC) with count 1
e) Return WC.
Here ProcessMorphAnalyser(LWj ) returns the root of the given word.
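A minimal Python sketch of this algorithm is given below. The morphological analyser of [2] is replaced here by a hypothetical `process_morph_analyser` stub, a naive suffix stripper over a few romanized Tamil case markers, used only to make the sketch self-contained:

```python
from collections import Counter

# Illustrative stand-in for the real Tamil morphological analyser:
# strip a few romanized noun case markers to recover a "root".
SUFFIXES = ("ai", "aal", "ukku", "in", "adhu")

def process_morph_analyser(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def word_usage(lyric_dataset):
    counts = Counter()                       # WC: word count across the dataset
    for sentence in lyric_dataset:           # each sentence LS in LD
        for word in sentence.split():        # each word LW in LS
            counts[process_morph_analyser(word)] += 1  # count the root RW
    return counts

wc = word_usage(["ramanai ramanukku", "ramanin raman"])
print(wc["raman"])  # all four surface variants are counted under the root
```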
(ii) Rhyme Analysis
Alliteration (monai) is the repetition of the same letter at the beginning of words. Rhyme (edhugai) is defined as the repetition of the same letter at the second position of words. End-rhyme (iyaibu) is defined as the repetition of the same letter at the last position of words. Examples of alliteration, rhyme and end-rhyme are given below:
Examples:
உய and உ rhyme in alliteration (monai) as they start with the same letter.
இதய and காத rhyme in rhyme (edhugai) as they share the same second letter.
யா ைக and வா ைக rhyme in end – rhyme (iyaibu) as they share the same last letter.
We have adapted the Apriori algorithm to find the frequency counts of rhyme, alliteration and end-rhyme pairs in Tamil lyrics, as illustrated below:
Algorithm:
Let LD be the set of lyrics in the dataset and LS the set of sentences of the lyric dataset. Let m be the total number of sentences in the lyrics. Let PC1 denote the count of alliteration, PC2 the count of rhyme and PC3 the count of end-rhyme.
a) Given a lyric dataset LD.
b) For each LS ← 1 to m
Join consecutive sentences into pairs (LP)
c) For each LP
Consider the first (LP1) and last (LP2) words
d) Rhyme (LP1, LP2)
e) Return PC1, PC2 and PC3
Algorithm: Rhyme (LP1, LP2)
a) Let k denote the character position within a word. Let ML denote the alliteration (monai) list, RL the rhyme (edhugai) list and EL the end-rhyme (iyaibu) list.
b) For every position k:
if k = 1 and LP1(k) = LP2(k), then add the pair to the ML list and increment PC1
if k = 2 and LP1(k) = LP2(k), then add the pair to the RL list and increment PC2
if k is the last position and LP1(k) = LP2(k), then add the pair to the EL list and increment PC3
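The Rhyme step can be sketched as follows. Note that this sketch compares raw characters; real Tamil text would first be segmented into grapheme clusters (letters with their vowel signs), which is an assumption left out here for brevity:

```python
# Classify a word pair as monai / edhugai / iyaibu by comparing the
# first, second and last letters, mirroring the Rhyme(LP1, LP2) step.
def rhyme(w1, w2, counts, lists):
    if w1[0] == w2[0]:                                   # k = 1: alliteration (monai)
        lists["ML"].append((w1, w2)); counts["PC1"] += 1
    if len(w1) > 1 and len(w2) > 1 and w1[1] == w2[1]:   # k = 2: rhyme (edhugai)
        lists["RL"].append((w1, w2)); counts["PC2"] += 1
    if w1[-1] == w2[-1]:                                 # k = last: end-rhyme (iyaibu)
        lists["EL"].append((w1, w2)); counts["PC3"] += 1

counts = {"PC1": 0, "PC2": 0, "PC3": 0}
lists = {"ML": [], "RL": [], "EL": []}
rhyme("uyir", "ulagam", counts, lists)    # shares the first letter -> monai
rhyme("idhayam", "adhigam", counts, lists)  # shares second and last letters -> edhugai and iyaibu
print(counts)
```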
(iii) Co-occurrence concept Analysis
Co-occurrence describes the frequent occurrence of two terms from a text corpus alongside each other in a certain order. This information is extremely valuable in NLP systems: it is important for resolving the ambiguity and polysemy of words and thereby improving the accuracy of the entire system [5]. In this method, to improve the effectiveness of co-occurrence analysis, we consider the concept of each word. The concept for each word is identified using Agaraadhi, an on-line Tamil dictionary. An example of a concept word found in a lyric is given below:
Example: The word "நில ”which has the concept ெவ ண லா, மதி, மாத ,
ைண ேகா , ெவ ணல ,அ லி, அ லிமா .
By considering these concepts, we determine the co-occurring words using our own algorithm, described below:
Algorithm :
Let LD denote the lyric dataset and W each word in a lyric. Let CW denote the set of concepts for each word and WC the co-occurrence count.
a) If the word maps to a concept in CW, count the pair at the concept level: consider the next word and increment the count WC.
b) Else keep the word W itself, consider the next word and increment the count WC.
c) Return WC.
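A minimal sketch of this concept-level counting is given below. The Agaraadhi lookup is mocked by a small in-memory dictionary with hypothetical entries; the real system queries the online dictionary [3]:

```python
from collections import Counter

# Hypothetical concept table: several surface words map to one concept.
CONCEPTS = {"nilavu": "moon", "madhi": "moon", "venmadhi": "moon"}

def cooccurrence(lyric_words):
    wc = Counter()                                      # WC: co-occurrence count
    for w1, w2 in zip(lyric_words, lyric_words[1:]):    # adjacent word pairs
        c1 = CONCEPTS.get(w1, w1)   # map each word to its concept if known,
        c2 = CONCEPTS.get(w2, w2)   # else fall back to the word itself
        wc[(c1, c2)] += 1
    return wc

wc = cooccurrence(["nilavu", "madhi", "kadhal", "venmadhi"])
print(wc[("moon", "moon")])  # "nilavu madhi" is counted at the concept level
```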
4. RESULTS
A lyric corpus of more than two thousand songs was analysed for word usage, rhyme usage and co-occurring concept usage. The results of the analysis are given below:
Table 1 shows the list of the top 10 most used words in lyrics.
WORDS USAGE WORDS USAGE
ந 2009 வா 1062
எ 1941 ஒ 987
நா 1645 க 965
உ 1556 857
காத 1153 இ ைல 793
Table 2 shows the list of top 10 rhyme words in lyrics.
EDHUGAI USAGE MONAI USAGE IYAIBU USAGE
எ ,உ 107975 எ ,எ ைன 33492 எ ,உ 111552
நா ,எ 80125 உ ைன,உ 26289 நா ,எ 83435
நா ,உ 61204 எ த ,எ 16478 நா ,உ 63543
எ ,உ ைன 33731 எ ,எ ன 15405 எ ,உ த 18411
எ ,எ ைன 32570 உ த ,உ 14001 எ த ,எ 16478
உ ைன,உ 25747 உய ,உ 11640 உ த ,உ 14001
எ ைன,உ 24867 இ த,இ 10993 எ த ,உ 12524
நா ,உ ைன 19297 என ,எ 9985 நா ,உ த 10524
நா ,எ ைன 18935 எ த,எ 9976 எ த ,நா 9367
எ ,எ ன 14486 ந,ந 9962 இ த,அ த 4818
Table 3 shows the list of top 10 co-occurring concept words in lyrics.
CO-OCCURRING WORDS USAGE
அ ேப,அ ேப 638
சி ன,சி ன 530
வா,வா 506
எ ,காத 478
ஒேர,ஒ 469
ந ,நா 434
உ ைன,நா 419
தமி ,எ க 367
ஒ ,நா 302
ந,எ ைன 287
Analysing this data shows that most of the lyrics express the emotions of happiness and love. In the adapted Apriori algorithm, support represents the number of word pairs relative to the total number of sentence combinations, and confidence describes the number of word pairs relative to the total number of sentence pairs. The results may vary if the number of lyrics used for the analysis is increased.
5. CONCLUSIONS AND FUTURE WORK
In this paper, we discussed the usage of words and rhymes in the lyric dataset. The adapted Apriori algorithm has been used to determine the frequency counts for rhyme, alliteration and end-word pairs. This analysis is mainly used in lyric generation and in computing a freshness score for lyrics. Frequent co-occurring concepts have also been identified for the development of lyric extraction, semantic relationships, word-sense identification and sentence similarity. Possible extensions of this work include detecting emotions in lyrics by genre classification and identifying genre-specific rhymes and concept co-occurrence.
REFERENCES
[1] Sowmiya Dharmalingam, Madhan Karky, "LaaLaLaa – A Tamil Lyric Analysis and Generation Framework", World Classical Tamil Conference, June 2010, Coimbatore.
[2] Anandan P, Ranjani Parthasarathy, Geetha T.V., "Morphological Analyzer for Tamil", ICON 2002.
[3] Agaraadhi Online Tamil Dictionary. http://www.agaraadhi.com, last accessed 25th April 2011.
[4] HAN Feng, ZHANG Shu-mao, DU Ying-shuang, "The Analysis and Improvement of Apriori Algorithm", Journal of Communication and Computer, ISSN 1548-7709, USA, Sep. 2008, Volume 5, No. 9 (Serial No. 46).
[5] El-Sayed Atlam, Elmarhomy Ghada, Masao Fuketa, Kazuhiro Morita and Jun-ichi Aoe, "New Hierarchy Technique Using Co-Occurrence Word Information", International Journal of Information Processing and Management, Volume 40, Issue 6, November 2004.