Phrase identification is one of the most critical and widely studied tasks in Natural Language Processing (NLP). Identifying verb phrases within a sentence is useful for a variety of NLP applications. One of the core enabling technologies required in NLP applications is morphological analysis. This paper presents a Myanmar verb phrase identification and translation algorithm and develops a Markov model with morphological analysis. The system is based on a rule-based maximum matching approach. In machine translation, a large amount of information is needed to guide the translation process. Myanmar is an inflected language, and compared with other languages such as English, French and Czech, very few lexicons have been created or researched for Myanmar. Therefore, this system proposes a Myanmar verb phrase identification and translation model based on the syntactic structure and morphology of the Myanmar language, using a Myanmar-English bilingual lexicon. A Markov model is also used to reformulate the translation probability of phrase pairs. Experimental results showed that the proposed system can improve translation quality by applying morphological analysis to the Myanmar language.
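The rule-based maximum matching approach mentioned above is commonly implemented as greedy forward longest-match against a lexicon. The sketch below assumes a toy lexicon of illustrative romanized Myanmar syllables ("kyun taw" = I, "thwa mae" = will go); a real system would match Myanmar-script syllables against the Myanmar-English bilingual lexicon, and the paper's exact rules are not given in the abstract.

```python
def max_match(syllables, lexicon, max_len=4):
    """Greedy forward maximum matching over a syllable sequence.

    At each position, take the longest run of syllables found in the
    lexicon; if nothing matches, emit a single syllable and move on.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking until a lexicon hit.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = tuple(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(" ".join(candidate))
                i += n
                break
    return words


# Hypothetical toy lexicon of romanized Myanmar entries (illustrative only).
LEXICON = {("kyun", "taw"), ("thwa", "mae")}

print(max_match(["kyun", "taw", "thwa", "mae"], LEXICON))
# ['kyun taw', 'thwa mae']
```

Matching the longest entry first is what makes the verb and its following marker ("thwa mae") come out as one unit rather than two single syllables.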
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
Neural machine translation is a newer approach to machine translation that has shown effective results for high-resource languages. Recently, attention-based neural machine translation trained on large-scale parallel corpora has played an important role in achieving high translation performance. In this research, a parallel corpus for the Myanmar-English language pair is prepared, and attention-based neural machine translation models are introduced at the word-to-word, character-to-word, and syllable-to-word levels. We evaluate the proposed models on translating long sentences and on addressing morphological problems. To mitigate the low-resource problem, source-side monolingual data are also used. In short, this work investigates how to improve Myanmar-to-English neural machine translation. The experimental results show that the syllable-to-word level neural machine translation model obtains an improvement over the baseline systems.
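The abstract does not say which attention variant is used. As an assumption, the sketch below shows Luong-style dot-product attention for a single decoder step, where each key/value is the encoder state of one source unit (a word, character or syllable, depending on the model level).

```python
import math


def dot_product_attention(query, keys, values):
    """Dot-product attention for one decoder step.

    query:  decoder hidden state, list of floats (length d)
    keys:   encoder hidden states, list of n vectors (length d each)
    values: vectors to average, usually the same encoder states
    Returns (context_vector, attention_weights).
    """
    # Alignment score of the decoder state against each source position.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Softmax, shifted by the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector = attention-weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return context, weights


# Three hypothetical 2-d encoder states, e.g. one per source syllable.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
context, weights = dot_product_attention([1.0, 0.0], states, states)
print(weights)
```

The decoder combines the context vector with its own state to predict the next target word; finer-grained source units simply mean more positions to attend over.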
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
This study proposes a method for developing neural morphological analyzers for Japanese Hiragana sentences using the BiLSTM-CRF model. Morphological analysis is a technique that divides text into words and assigns information such as parts of speech. In Japanese natural language processing systems, it plays an essential role in downstream applications because Japanese does not use delimiters between words. Hiragana is a Japanese phonographic script used in texts for children and for people who cannot read Chinese characters (kanji). Morphological analysis of Hiragana sentences is more difficult than that of ordinary Japanese sentences because they provide fewer cues for dividing words. For morphological analysis of Hiragana sentences, we demonstrated the effectiveness of fine-tuning a model trained on ordinary Japanese text and examined the influence of training data from texts of various genres.
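In a BiLSTM-CRF, the BiLSTM produces per-character emission scores, and the CRF layer adds learned tag-transition scores. Decoding then uses the Viterbi algorithm. The sketch below assumes a simple B/I word-boundary tag set and invented scores; the paper's actual tag set (which may also carry POS information) and its learned scores are not given in the abstract.

```python
def viterbi(emissions, transitions, tags):
    """Max-sum Viterbi decoding for a linear-chain CRF output layer.

    emissions:   list over positions of {tag: score} (e.g. BiLSTM outputs)
    transitions: {(prev_tag, tag): score}; missing pairs are disallowed
    Returns the highest-scoring tag sequence.
    """
    neg_inf = float("-inf")
    best = [{t: emissions[0][t] for t in tags}]
    back = [{}]
    for i in range(1, len(emissions)):
        best.append({})
        back.append({})
        for t in tags:
            # Best previous tag for arriving at tag t at position i.
            score, prev = max(
                (best[i - 1][p] + transitions.get((p, t), neg_inf), p) for p in tags
            )
            best[i][t] = score + emissions[i][t]
            back[i][t] = prev
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def tags_to_words(chars, tags):
    """B starts a new word; I continues the current one."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


TAGS = ["B", "I"]
TRANS = {("B", "B"): 0.0, ("B", "I"): 0.5, ("I", "B"): 0.0, ("I", "I"): 0.2}
EMIS = [{"B": 2, "I": -1}, {"B": 0, "I": 1}, {"B": 0, "I": 1}, {"B": 1, "I": 0}]
path = viterbi(EMIS, TRANS, TAGS)
print(path)                          # ['B', 'I', 'I', 'B']
print(tags_to_words("わたしは", path))  # ['わたし', 'は']
```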
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
Phonetic typing of Bangla using the English alphabet has become widely popular on social media and chat services. As a result, texts that mix English and Bangla words and phrases have become increasingly common, and existing transliteration tools perform poorly on such texts. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that can satisfactorily transliterate both English words and phonetically typed Bangla words. This is achieved with a hybrid of dictionary-based and rule-based techniques. Experimental results confirm the superiority of THT, which significantly outperforms the benchmark transliteration tool.
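The abstract does not describe what each of the three stages does. The sketch below *assumes*, purely for illustration, that the stages are: English dictionary lookup, then phonetic-Bangla dictionary lookup, then a rule-based greedy longest-match fallback. All dictionaries and rules are hypothetical stand-ins. Real rules must also handle inherent vowels, independent vowel forms and conjuncts.

```python
# Hypothetical stand-ins for the framework's resources.
ENGLISH_DICT = {"computer": "কম্পিউটার"}   # English word -> Bangla script
BANGLA_PHONETIC_DICT = {"ami": "আমি"}      # phonetic Bangla -> Bangla script
RULES = {"kh": "খ", "k": "ক", "m": "ম", "a": "া", "i": "ি"}  # chunk -> Bangla


def rule_based(word):
    """Stage 3 (assumed): greedy longest-match over character rules."""
    out, i = [], 0
    longest = max(len(key) for key in RULES)
    while i < len(word):
        for n in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += n
                break
        else:
            out.append(word[i])  # no rule applies: copy the character unchanged
            i += 1
    return "".join(out)


def transliterate(word):
    w = word.lower()
    if w in ENGLISH_DICT:            # Stage 1 (assumed): English dictionary
        return ENGLISH_DICT[w]
    if w in BANGLA_PHONETIC_DICT:    # Stage 2 (assumed): phonetic Bangla dictionary
        return BANGLA_PHONETIC_DICT[w]
    return rule_based(w)             # Stage 3 (assumed): rule-based fallback


print([transliterate(w) for w in ["ami", "Computer", "mi"]])
```

Putting the dictionaries ahead of the rules lets known English words keep their conventional Bangla spellings, and only unseen words fall through to the rules.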
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). A one-pass recognition system converts the spoken input into phone sequences. The acoustic models, which emulate multiple hidden Markov models (HMMs), are robust enough to handle multiple languages. To determine phoneme similarity among the target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LMs) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) learns to recognize the phonetic information of mixed-language speech from the recognized phone sequences. With the SVM making the back-end decision, the likelihood scores of segments with monolingual phone occurrence are used to classify language identity. The system was tested on a speech corpus of Sepedi and English, two languages that are often mixed, and it was evaluated by measuring ASR performance and LID performance separately. The systems obtained promising ASR accuracy with a data-driven phone-merging approach modelled using 16 Gaussian mixtures per state. On both code-switched and monolingual speech segments, the proposed systems achieved acceptable ASR and LID accuracy.
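The phone bigram language models described above can be sketched as follows. This is a simplification: the paper feeds likelihood scores to an SVM back-end, and this sketch simply picks the language whose add-one smoothed bigram model gives the segment the highest log-likelihood. The training sequences are toy phone strings, not data from the paper.

```python
import math
from collections import Counter


def train_bigram(phone_seqs):
    """Collect counts for an add-one smoothed phone bigram model."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for seq in phone_seqs:
        padded = ["<s>"] + seq + ["</s>"]
        vocab.update(padded)
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams, vocab


def log_likelihood(seq, model):
    bigrams, unigrams, vocab = model
    padded = ["<s>"] + seq + ["</s>"]
    v = len(vocab) + 1  # +1 reserves probability mass for unseen phones
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
        for a, b in zip(padded, padded[1:])
    )


def identify(seq, models):
    """Return the language whose bigram model scores the segment highest."""
    return max(models, key=lambda lang: log_likelihood(seq, models[lang]))


# Toy phone sequences per language (hypothetical training data).
models = {
    "sepedi": train_bigram([["d", "u", "m", "e", "l", "a"], ["l", "e", "b", "o", "g", "a"]]),
    "english": train_bigram([["h", "e", "l", "ou"], ["th", "a", "ng", "k", "s"]]),
}
print(identify(["d", "u", "m", "e", "l", "a"], models))  # sepedi
```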
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime and game characters. Conventional morphological analyzers such as MeCab segment words with high accuracy, but they cannot segment broken expressions or utterance endings that are not listed in their dictionaries, and such expressions often appear in the lines of anime and game characters. To overcome this challenge, we propose segmenting the lines of Japanese anime and game characters into subword units, which were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF/IDF according to gender, age, and individual anime character, and we show that they are linguistic speech patterns specific to each of these features. Additionally, a classification experiment shows that the model using subword units outperformed the model using the conventional method.
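TF/IDF weighting by group can be sketched by treating each group (a character, a gender or an age band) as one "document" of subword units. This is a minimal sketch under stated assumptions: the subword tokens are already produced by a subword segmenter, the token lists are hypothetical, and the paper's exact TF/IDF variant is not specified, so raw relative frequency times log(N/df) is used here.

```python
import math
from collections import Counter


def tfidf(groups):
    """groups: {group_name: list of subword tokens from that group's lines}.

    Each group is treated as one document.
    Returns {group_name: {token: tf-idf weight}}.
    """
    n_docs = len(groups)
    df = Counter()
    for tokens in groups.values():
        df.update(set(tokens))  # document frequency: groups containing the token
    weights = {}
    for name, tokens in groups.items():
        tf = Counter(tokens)
        total = len(tokens)
        weights[name] = {
            tok: (count / total) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        }
    return weights


# Hypothetical subword units from two characters' lines.
w = tfidf({"char_a": ["だ", "ぜ", "だぜ", "よ"], "char_b": ["です", "わ", "よ"]})
print(w["char_a"]["だぜ"], w["char_a"]["よ"])
```

A unit shared by every group (here "よ") gets zero weight. Units used by only one group rise to the top as candidate character-specific speech patterns.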
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
Manipuri is both a minority language and a morphologically rich one, with genetic features similar to those of Tibeto-Burman languages. It has Subject-Object-Verb (SOV) word order and agglutinative verb morphology, and it is monosyllabic. Morphology and syntax are not clearly distinguished in this language. Natural Language Processing (NLP) is a research field of computer science that deals with processing large amounts of natural language text. NLP applications include E-Dictionary, Morphological Analyzer, Reduplicated Multi-Word Expression (RMWE), Named Entity Recognition (NER), Part of Speech (POS) Tagging, Machine Translation (MT), WordNet, Word Sense Disambiguation (WSD), etc. In this paper, we present a study of the advancements in NLP applications for the Manipuri language, together with a comparison table of the approaches and techniques adopted and the results obtained for each application, followed by a detailed discussion of each work.
A Marathi Hidden-Markov Model Based Speech Synthesis System
IOSR Journal of VLSI and Signal Processing (IOSRJVSP) is a double-blind peer-reviewed international journal that publishes articles contributing new results in all areas of VLSI design and signal processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI design and signal processing concepts and to establish new collaborations in these areas.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
Improving accuracy of part-of-speech (POS) tagging using hidden Markov model ...
In Natural Language Processing (NLP), word segmentation and Part-of-Speech (POS) tagging are fundamental tasks. POS information is also needed for preprocessing in NLP applications such as machine translation (MT) and information retrieval (IR). Currently, many research efforts develop word segmentation and POS tagging separately, using different methods to obtain high performance and accuracy. For the Myanmar language, there are likewise separate word segmenters and POS taggers based on statistical approaches such as Neural Networks (NN) and Hidden Markov Models (HMMs). However, because of the complex morphological structure of Myanmar, the out-of-vocabulary (OOV) problem still exists. To avoid errors and to improve segmentation by using POS information, segmentation and tagging should be performed at the same time. The main goal of developing a POS tagger for any language is to improve tagging accuracy and to remove ambiguity in sentences that arises from the language's structure. This paper focuses on developing word segmentation and a POS tagger for the Myanmar language, and it presents a comparison of separate word segmentation and POS tagging with joint word segmentation and POS tagging.
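A common way to perform segmentation and tagging "at the same time" is to give each character (or syllable) a joint label that encodes both its position in the word and the word's POS. One sequence tagger, such as an HMM, can then predict both at once. The abstract does not say which scheme the paper uses, so BIES prefixes are an assumption here, and the placeholder letters stand in for Myanmar characters or syllables.

```python
def to_joint_labels(tagged_words):
    """Convert [(word, pos), ...] into per-character joint labels.

    BIES scheme: B(egin), I(nside), E(nd), S(ingle-character word),
    each combined with the word's POS, e.g. 'B-n'.
    """
    labels = []
    for word, pos in tagged_words:
        if len(word) == 1:
            labels.append((word, "S-" + pos))
            continue
        for i, ch in enumerate(word):
            prefix = "B" if i == 0 else "E" if i == len(word) - 1 else "I"
            labels.append((ch, f"{prefix}-{pos}"))
    return labels


def from_joint_labels(labels):
    """Inverse: rebuild [(word, pos), ...] from per-character joint labels."""
    words = []
    for ch, label in labels:
        prefix, pos = label.split("-", 1)
        if prefix in ("B", "S") or not words:
            words.append([ch, pos])  # a new word starts here
        else:
            words[-1][0] += ch       # continue the current word
    return [tuple(w) for w in words]


sentence = [("abc", "n"), ("d", "v")]  # placeholder units and tags
print(to_joint_labels(sentence))
# [('a', 'B-n'), ('b', 'I-n'), ('c', 'E-n'), ('d', 'S-v')]
```

Because word boundaries and POS are decoded together, a POS-level cue can correct a boundary error that a separate segmenter would otherwise pass on to the tagger.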
Tamil-English Document Translation Using Statistical Machine Translation Appr...
This paper presents a new method for translating a text document from Tamil to English. The method is based on the Statistical Machine Translation (SMT) approach combined with morphological analysis, because Tamil is a highly inflected language. The paper presents a slight modification of SMT that makes the approach more efficient and effective, and the experimental results show the method to be fast and accurate in the translation process.
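For reference, SMT is usually framed as the noisy-channel decision below. The abstract does not describe how the paper modifies it, so the morphological step appears here only as an *assumed* preprocessing function $\mathrm{seg}(\cdot)$. It splits each inflected Tamil word of the source $f$ into stem and suffix morphemes, so that the translation model $P(f' \mid e)$ is estimated over morphemes, while $P(e)$ remains the English language model.

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} P(f \mid e)\, P(e)
        \approx \arg\max_{e} P(f' \mid e)\, P(e),
\qquad f' = \mathrm{seg}(f)
```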
A Corpus-Based Concatenative Speech Synthesis System for Marathi
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Language is an effective medium of communication that conveys the ideas and expressions of the human mind. There are more than 5,000 languages in the world, and learning all of them is not a practical solution to the language barrier in communication. In this multilingual world, where huge amounts of digitized information are exchanged between regions and in different languages, automated processes for converting from one language to another have become necessary. Natural Language Processing (NLP) is an active area of research that explores how computers can be used to understand and manipulate natural language text or speech. In the proposed system, a hybrid approach to transliterating proper nouns from Punjabi to Hindi is developed. The hybrid approach combines direct mapping, a rule-based approach and a Statistical Machine Translation (SMT) approach. The proposed system was tested on various proper nouns from different domains, and its accuracy is reported to be very good.
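The direct-mapping component can be sketched using the fact that Unicode encodes the Gurmukhi block (U+0A00 to U+0A7F) and the Devanagari block (U+0900 to U+097F) with a largely parallel layout, so most Gurmukhi letters map to Devanagari by subtracting 0x100. The two Punjabi-specific rules below (tippi/bindi to anusvara, addak to consonant doubling) illustrate what a rule-based layer might do. They are assumptions, not the paper's rule set, and the SMT component is not shown.

```python
TIPPI, BINDI, ADDAK = "\u0a70", "\u0a02", "\u0a71"
ANUSVARA, VIRAMA = "\u0902", "\u094d"


def is_gurmukhi_consonant(ch):
    # Base consonants U+0A15..U+0A39 plus nukta forms U+0A59..U+0A5E.
    return "\u0a15" <= ch <= "\u0a39" or "\u0a59" <= ch <= "\u0a5e"


def direct_map(ch):
    """Map a Gurmukhi code point to the parallel Devanagari one (-0x100)."""
    if "\u0a01" <= ch <= "\u0a6f":
        return chr(ord(ch) - 0x100)
    return ch  # outside the parallel range: leave unchanged


def punjabi_to_hindi(word):
    out = []
    pending_addak = False
    for ch in word:
        if ch == ADDAK:
            pending_addak = True  # gemination mark: doubles the next consonant
            continue
        if ch in (TIPPI, BINDI):
            out.append(ANUSVARA)  # both nasal marks -> Devanagari anusvara
        elif pending_addak and is_gurmukhi_consonant(ch):
            c = direct_map(ch)
            out.extend([c, VIRAMA, c])  # consonant + virama + consonant
        else:
            out.append(direct_map(ch))
        pending_addak = False
    return "".join(out)


print(punjabi_to_hindi("\u0a15\u0a2e\u0a32"))  # ਕਮਲ (Kamal) -> कमल
```

Real proper-noun transliteration also has to handle cases that this simple rule layer does not cover, such as aspirated consonants under addak and spellings that Hindi conventionally writes differently. Handling that variation is where the statistical component of the hybrid would come in.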
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar is a low-resource language, and this is one of the main reasons why Myanmar Natural Language Processing has lagged behind that of other languages. Currently, there is no publicly available named entity corpus for the Myanmar language. As part of this work, the first manually annotated named-entity-tagged corpus for Myanmar was developed and proposed to support the evaluation of named entity extraction. At present, the corpus contains approximately 170,000 named entities and 60,000 sentences. This work also contributes the first evaluation of various deep neural network architectures on Myanmar Named Entity Recognition. Experimental results of 10-fold cross-validation revealed that syllable-based neural sequence models without additional feature engineering give better results than the baseline CRF model. This work also aims to show the effectiveness of neural network approaches to textual processing for Myanmar and to promote future research on this understudied language.
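Evaluating a syllable-level NER model usually means turning predicted tags back into entity spans and scoring exact span matches. The abstract names neither the tag scheme nor the metric, so this sketch assumes BIO tags over syllables and CoNLL-style entity-level precision, recall and F1. The tag sequences are hypothetical.

```python
def bio_spans(tags):
    """Extract (start, end_exclusive, type) entity spans from BIO tags."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a final span
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if tag == "O" or starts_new:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag != "O":  # B-X, or an I-X that cannot continue the open span
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold_seqs, pred_seqs):
    """Exact-match entity-level precision, recall and F1 over all sentences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = set(bio_spans(gold)), set(bio_spans(pred))
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(entity_f1(gold, pred))
```

In 10-fold cross-validation, this score would be computed on each held-out fold and then averaged.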
Survey on Indian CLIR and MT systems in Marathi Language
Cross Language Information Retrieval (CLIR) deals with retrieving relevant information stored in a language different from the language of the user's query. This helps users express their information needs in their native languages. The machine-translation-based (MT-based) approach to CLIR uses existing machine translation techniques to translate queries automatically. This paper covers the research work done on CLIR and MT systems for the Marathi language in India.
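The MT-based approach first translates the query and then retrieves documents in the target language. The toy sketch below shows that pipeline under stated assumptions: a hypothetical Marathi-English lookup table stands in for a real MT system, and simple term-overlap counting stands in for a real retrieval model.

```python
def translate_query(query_terms, bilingual_dict):
    """Stand-in for the MT step: map each Marathi query term to English.

    Terms missing from the dictionary (e.g. names) are kept as-is.
    """
    return [bilingual_dict.get(t, t) for t in query_terms]


def rank_documents(query_terms, documents):
    """Rank English documents by how many translated query terms they contain."""
    scores = {
        doc_id: sum(term in text.lower().split() for term in query_terms)
        for doc_id, text in documents.items()
    }
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical resources for illustration.
MR_EN = {"पाणी": "water", "शेती": "farming"}
DOCS = {"d1": "Rainfall and water supply for farming", "d2": "Cricket scores today"}

english_query = translate_query(["शेती", "पाणी"], MR_EN)
print(english_query, rank_documents(english_query, DOCS))
```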
A New Approach to Parts of Speech Tagging in Malayalam
Parts-of-speech tagging is the process of labeling each word in a sentence with a tag that indicates the word's usage in that sentence. These tags usually indicate a syntactic class such as noun or verb, and sometimes include additional information such as case markers (number, gender, etc.) and tense markers. Many current language processing systems use a parts-of-speech tagger for preprocessing.
There are two main approaches to parts-of-speech tagging: the rule-based approach and the stochastic approach. The rule-based approach uses predefined handwritten rules; it is the oldest approach and uses a lexicon or dictionary for reference. The stochastic approach uses probabilistic and statistical information to assign tags to words. Because it relies on a large corpus, its time and space complexity are high, whereas the rule-based approach has lower time and space complexity. The stochastic approach is the most widely used today because of its accuracy.
Malayalam is a Dravidian language that inflects by attaching suffixes to root word forms. The algorithms currently in use are efficient machine learning algorithms, but they were not built for Malayalam, which affects the accuracy of Malayalam POS tagging.
My proposed approach uses dictionary entries along with adjacent tag information, and the algorithm uses multithreading. Tagging is done using the probability of occurrence of the sentence structure together with the dictionary entry.
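The description above leaves the exact scoring open. A minimal sketch under stated assumptions is shown below. It uses a hypothetical tag dictionary (romanized Malayalam: *avan* "he", *vannu* "came", *veedu* "house") and hypothetical tag-bigram probabilities. Each word gets the dictionary tag most probable after the previous tag, and independent sentences are tagged concurrently with a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical resources: candidate tags per word, and tag-bigram
# probabilities P(tag | previous tag) estimated from a tagged corpus.
TAG_DICT = {"avan": ["PRON"], "vannu": ["VERB"], "veedu": ["NOUN"]}
TRANSITIONS = {("<s>", "PRON"): 0.4, ("<s>", "NOUN"): 0.3,
               ("PRON", "VERB"): 0.5, ("PRON", "NOUN"): 0.3,
               ("NOUN", "VERB"): 0.6}


def tag_sentence(words):
    """Greedy left-to-right tagging: among the dictionary's candidate tags,
    pick the one most probable after the previously assigned tag."""
    tags, prev = [], "<s>"
    for w in words:
        candidates = TAG_DICT.get(w, ["NOUN"])  # unknown words default to NOUN
        best = max(candidates, key=lambda t: TRANSITIONS.get((prev, t), 1e-6))
        tags.append(best)
        prev = best
    return list(zip(words, tags))


def tag_corpus(sentences, workers=4):
    """Tag independent sentences concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tag_sentence, sentences))


print(tag_corpus([["avan", "vannu"], ["veedu"]]))
```

In CPython, threads give little speedup for purely CPU-bound work like this because of the global interpreter lock. `ProcessPoolExecutor` has the same `map` interface and is the usual substitute when parallel speedup matters.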
Machine Translation Approaches and Design Aspects
Machine translation (MT) is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another. At a basic level, MT performs simple substitution of words in one natural language for words in another, but that alone usually cannot produce a good translation of a text, because whole phrases and their closest counterparts in the target language must be recognized. The paper focuses on an Example-Based Machine Translation (EBMT) system that translates sentences from English to Hindi. Developing an MT system typically demands a large volume of computational resources. For example, rule-based MT systems require syntactic and semantic knowledge to be extracted in the form of rules, and statistics-based MT systems require a huge parallel corpus containing sentences in the source language and their translations in the target language. EBMT requires far fewer such resources. This makes EBMT systems feasible for English-to-Hindi translation, where large-scale computational resources are still scarce. Example-based machine translation relies on a database of examples for its translations, and the frequency of word occurrence is important for translation.
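The retrieve-and-adapt step of EBMT can be sketched as follows. The sketch assumes a one-entry example base and a small word lexicon (hypothetical data, not the paper's database) and uses crude word-overlap similarity. Real EBMT systems rely on word alignments within each example pair rather than on plain string replacement.

```python
EXAMPLES = {"I eat rice": "मैं चावल खाता हूँ"}  # English -> Hindi example base
LEXICON = {"rice": "चावल", "bread": "रोटी"}    # word-level English -> Hindi


def translate(sentence):
    """Retrieve the most similar stored example and, if exactly one word
    differs, substitute that word's translation into the stored output."""
    words = sentence.split()

    def overlap(example):
        return len(set(example.split()) & set(words))

    best = max(EXAMPLES, key=overlap)
    if best == sentence:
        return EXAMPLES[best]
    src = best.split()
    if len(src) == len(words):
        diffs = [(a, b) for a, b in zip(src, words) if a != b]
        if len(diffs) == 1:
            old, new = diffs[0]
            if old in LEXICON and new in LEXICON:
                return EXAMPLES[best].replace(LEXICON[old], LEXICON[new])
    return None  # no usable example; a real system would fall back further


print(translate("I eat bread"))
```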
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of engineering and technology.
Quality estimation of machine translation outputs through stemming
Machine translation is a challenging problem for Indian languages. New machine translators are developed every day, but high-quality automatic translation is still a distant goal, and correctly translated Hindi sentences are rarely produced. In this paper, we focus on the English-Hindi language pair. To select the correct MT output, we present a ranking system that employs machine learning techniques and morphological features. The ranking requires no human intervention. We have also validated our results by comparing them with human rankings.
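The abstract does not say how the system ranking was compared with the human ranking. One standard choice, used here as an assumption, is Spearman's rank correlation between the two rankings of the same set of MT outputs.

```python
def spearman_rho(system_ranks, human_ranks):
    """Spearman's rank correlation for two rankings without ties:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i the rank difference."""
    n = len(system_ranks)
    if n < 2 or n != len(human_ranks):
        raise ValueError("need two rankings of equal length >= 2")
    d2 = sum((s - h) ** 2 for s, h in zip(system_ranks, human_ranks))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Ranks assigned to four candidate Hindi outputs (hypothetical).
print(spearman_rho([1, 2, 3, 4], [2, 1, 3, 4]))
```

A value of 1 means the automatic ranking matches the human one exactly, and -1 means it is fully reversed.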
SURVEY ON MACHINE TRANSLITERATION AND MACHINE LEARNING MODELS
Globalization and the growth of Internet users demand that almost all Internet-based applications support local languages. Support for local languages can be provided in Internet-based applications by means of machine transliteration and machine translation. This paper provides a thorough survey of machine transliteration models and of the machine learning approaches used for machine transliteration over a period of more than two decades, for internationally used languages as well as Indian languages. The survey shows that linguistic approaches give better results for closely related languages, while probability-based statistical approaches work well when one of the languages is phonetic and the other is non-phonetic. Better accuracy can be achieved only by using hybrid and combined models.
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
We describe in detail a Grapheme-to-Phoneme (G2P) converter required for developing a good-quality Marathi Text-to-Speech (TTS) system. The Festival and Festvox framework is chosen for developing the Marathi TTS system. Since Festival does not provide complete language processing support specific to various languages, it needs to be augmented to facilitate the development of TTS systems in certain new languages. For this reason, a generic G2P converter has been developed. In the customized Marathi G2P converter, we have handled schwa deletion and compound word extraction. In experiments testing the Marathi G2P on a text segment of 2,485 words, a word phonetisation accuracy of 91.47% was obtained. This Marathi G2P has been used to phonetise large text corpora, which in turn are used to design an inventory of phonetically rich sentences. These sentences ensured good coverage of the phonetically valid diphones while using only 1.3% of the complete text corpora.
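Schwa deletion decides which inherent vowels written in the script are not pronounced. The paper's Marathi rules are not given, so this sketch applies a common Hindi-style heuristic as an assumption. It operates on a romanized phone list where "a" is the inherent vowel (schwa). First it drops a word-final schwa after a consonant. Then, scanning right to left, it drops a schwa in the context vowel, consonant, _, consonant, vowel. The examples are toy phone strings.

```python
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}
SCHWA = "a"


def delete_schwas(phones):
    """Heuristic schwa deletion on a list of romanized phones.

    1. Drop a word-final schwa that follows a consonant.
    2. Scanning right to left, drop a schwa in the context V C _ C V.
    """
    p = list(phones)
    if len(p) > 2 and p[-1] == SCHWA and p[-2] not in VOWELS:
        p.pop()
    # i ranges so that p[i-2] and p[i+2] always exist, even after deletions.
    for i in range(len(p) - 3, 1, -1):
        if (p[i] == SCHWA and p[i - 2] in VOWELS and p[i - 1] not in VOWELS
                and p[i + 1] not in VOWELS and p[i + 2] in VOWELS):
            del p[i]
    return p


print(delete_schwas(["k", "a", "m", "a", "l", "a"]))   # ['k', 'a', 'm', 'a', 'l']
print(delete_schwas(["l", "a", "d", "a", "k", "aa"]))  # ['l', 'a', 'd', 'k', 'aa']
```

Compound words are one reason a G2P converter also needs compound word extraction. Applied across a morpheme boundary, a context rule like this one can delete a schwa that is actually pronounced.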
Natural Language Processing: State of The Art, Current Trends and Challenges
Diksha Khurana (1), Aditya Koli (1), Kiran Khatter (1,2) and Sukhdev Singh (1,2)
(1) Department of Computer Science and Engineering, Manav Rachna International University, Faridabad-121004, India
(2) Accendere Knowledge Management Services Pvt. Ltd., India
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
Stemming is the process of term conflation: it conflates all variants of a word to a common form called the stem. It plays a significant role in numerous Natural Language Processing (NLP) applications such as morphological analysis, parsing, document summarization, text classification, part-of-speech tagging, question answering, machine translation, word sense disambiguation and information retrieval (IR). Each of these tasks requires some preprocessing, and stemming is one of the important building blocks for all of them. This paper presents an overview of various stemming techniques, evaluation criteria for stemmers, and various existing stemmers for Indic languages.
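A common family of stemmers works by suffix stripping. The minimal sketch below strips the longest matching suffix from a small, hypothetical list of romanized Hindi inflectional suffixes. Real Indic stemmers operate on native script and use much larger, often rule-conditioned, suffix lists. The `min_stem` guard keeps very short words from being over-stemmed.

```python
# Hypothetical romanized Hindi inflectional suffixes, tried longest first.
SUFFIXES = sorted(["on", "en", "e", "a", "iyan", "i"], key=len, reverse=True)


def stem(word, min_stem=2):
    """Remove the longest listed suffix that leaves a stem of at least
    `min_stem` characters; otherwise return the word unchanged."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)]
    return word


# ladka / ladke / ladkon ("boy", inflected forms) conflate to one stem.
print([stem(w) for w in ["ladka", "ladke", "ladkon"]])  # ['ladk', 'ladk', 'ladk']
```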
Improving accuracy of part-of-speech (POS) tagging using hidden markov model ...IJECEIAES
In Natural Language Processing (NLP), Word segmentation and Part-ofSpeech (POS) tagging are fundamental tasks. The POS information is also necessary in NLP’s preprocessing work applications such as machine translation (MT), information retrieval (IR), etc. Currently, there are many research efforts in word segmentation and POS tagging developed separately with different methods to get high performance and accuracy. For Myanmar Language, there are also separate word segmentors and POS taggers based on statistical approaches such as Neural Network (NN) and Hidden Markov Models (HMMs). But, as the Myanmar language's complex morphological structure, the OOV problem still exists. To keep away from error and improve segmentation by utilizing POS data, segmentation and labeling should be possible at the same time.The main goal of developing POS tagger for any Language is to improve accuracy of tagging and remove ambiguity in sentences due to language structure. This paper focuses on developing word segmentation and Part-of- Speech (POS) Tagger for Myanmar Language. This paper presented the comparison of separate word segmentation and POS tagging with joint word segmentation and POS tagging.
Tamil-English Document Translation Using Statistical Machine Translation Appr... baskaran_md
The paper presents a new method for translating a text document from Tamil to English. The method is based on the Statistical Machine Translation (SMT) approach combined with morphological analysis, because Tamil is a highly inflected language. The paper presents a slight modification of SMT to make the approach more efficient and effective, and the experimental results indicate that the method is fast and accurate in the translation process.
A Corpus-Based Concatenative Speech Synthesis System for Marathi iosrjce
IOSR journal of VLSI and Signal Processing (IOSRJVSP) is a double-blind peer-reviewed international journal that publishes articles which contribute new results in all areas of VLSI Design & Signal Processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI Design & Signal Processing concepts and to establish new collaborations in these areas.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach IJERA Editor
Language is an effective medium of communication that conveys the ideas and expressions of the human mind. There are more than 5000 languages in the world, and knowing all of them is not a practical solution to the language barrier in communication. In this multilingual world, a huge amount of information is exchanged in digitized form between various regions and in different languages. It has therefore become necessary to find an automated process to convert text from one language to another. Natural Language Processing (NLP) is one of the hot areas of research; it explores how computers can be utilized to understand and manipulate natural-language text or speech. In the proposed system, a hybrid approach to transliterate proper nouns from Punjabi to Hindi is developed. The hybrid approach combines Direct Mapping, a rule-based approach and the Statistical Machine Translation (SMT) approach. The proposed system was tested on various proper nouns from different domains, and the authors report that its accuracy is very good.
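The Direct Mapping component can be illustrated because the Unicode Gurmukhi block (U+0A00 to U+0A7F) and the Devanagari block (U+0900 to U+097F) share an ISCII-derived layout. As a result, many letters differ by exactly 0x100. The sketch below is a minimal direct mapper with two illustrative rules: addak doubles the following consonant, and tippi is written as anusvara. Both rules are assumptions about what a rule layer might contain, not the paper's actual rule set, and the SMT component is omitted entirely.

```python
OFFSET = 0x100       # distance between parallel Gurmukhi and Devanagari code points
ADDAK = "\u0A71"     # Gurmukhi addak: geminates the following consonant
TIPPI = "\u0A70"     # Gurmukhi tippi: nasalisation mark
VIRAMA = "\u094D"    # Devanagari virama (halant)
ANUSVARA = "\u0902"  # Devanagari anusvara

def map_char(ch):
    """Direct mapping: shift Gurmukhi letters, signs and digits into Devanagari."""
    if "\u0A01" <= ch <= "\u0A6F":
        return chr(ord(ch) - OFFSET)
    return ch  # characters outside the mapped range pass through unchanged

def transliterate(word):
    out = []
    for i, ch in enumerate(word):
        if ch == ADDAK and i + 1 < len(word):
            # Illustrative rule: addak + C becomes C + virama, then C follows
            out.append(map_char(word[i + 1]) + VIRAMA)
        elif ch == TIPPI:
            out.append(ANUSVARA)  # illustrative rule: tippi written as anusvara
        else:
            out.append(map_char(ch))
    return "".join(out)

print(transliterate("\u0A2A\u0A70\u0A1C\u0A3E\u0A2C"))  # ਪੰਜਾਬ -> पंजाब
```

Pure code-point shifting handles most letters. That is why a small rule layer is needed on top, for the marks that have no one-to-one counterpart.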
Myanmar named entity corpus and its use in syllable-based neural named entity... IJECEIAES
Myanmar is a low-resource language, and this is one of the main reasons why Myanmar Natural Language Processing lags behind that of other languages. Currently, there is no publicly available named entity corpus for the Myanmar language. As part of this work, the first manually annotated named-entity-tagged corpus for Myanmar was developed to support the evaluation of named entity extraction. At present, the corpus contains approximately 170,000 named entities and 60,000 sentences. This work also contributes the first evaluation of various deep neural network architectures on Myanmar Named Entity Recognition. Experimental results of 10-fold cross-validation revealed that syllable-based neural sequence models without additional feature engineering give better results than the baseline CRF model. This work also aims to discover the effectiveness of neural network approaches to textual processing for the Myanmar language and to promote future research on this understudied language.
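A syllable-based sequence model outputs one tag per syllable, and those tags must be grouped back into entity spans. The abstract does not name the tagging scheme, so the sketch below assumes BIO tags. The romanized syllables in the example are hypothetical placeholders, not real annotated data. Myanmar script does not put spaces between syllables, so the default separator is the empty string.

```python
def bio_to_spans(syllables, tags, sep=""):
    """Group syllable-level BIO tags into (entity_type, text) spans."""
    spans, current_type, current = [], None, []
    for syllable, tag in zip(syllables, tags):
        # An I- tag whose type differs from the open span also starts a new span
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type)
        if current_type is not None and (starts_new or tag == "O"):
            spans.append((current_type, sep.join(current)))
            current_type, current = None, []
        if starts_new:
            current_type, current = tag[2:], [syllable]
        elif tag.startswith("I-"):
            current.append(syllable)
    if current_type is not None:
        spans.append((current_type, sep.join(current)))
    return spans

# Hypothetical romanized placeholders
syllables = ["kyaw", "kyaw", "went", "to", "yan", "gon"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
print(bio_to_spans(syllables, tags, sep=" "))  # [('PER', 'kyaw kyaw'), ('LOC', 'yan gon')]
```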
Survey on Indian CLIR and MT systems in Marathi Language Editor IJCATR
Cross Language Information Retrieval (CLIR) deals with retrieving relevant information stored in a language different from the language of the user's query. This helps users to express their information need in their native languages. The machine-translation-based (MT-based) approach to CLIR uses existing machine translation techniques to translate queries automatically. This paper covers the research work done in CLIR and MT systems for the Marathi language in India.
A New Approach to Parts of Speech Tagging in Malayalam ijcsit
Parts-of-speech tagging is the process of labeling each word in a sentence with a tag that indicates the word's usage in the sentence. Usually, these tags indicate a syntactic class such as noun or verb. Sometimes they include additional information, such as case markers (number, gender, etc.) and tense markers. A large number of current language processing systems use a parts-of-speech tagger for pre-processing.
There are two main approaches to Parts of Speech Tagging: the Rule-based Approach and the Stochastic Approach. The Rule-based Approach uses predefined handwritten rules; it is the oldest approach and uses a lexicon or dictionary for reference. The Stochastic Approach uses probabilistic and statistical information to assign tags to words. It uses a large corpus, so its time and space complexity are high, whereas the rule-based approach has lower time and space complexity. The Stochastic Approach is the most widely used today because of its accuracy.
Malayalam belongs to the Dravidian language family and is inflectional, attaching suffixes to root word forms. The algorithms currently in use are efficient machine learning algorithms, but they were not built for Malayalam, which affects the accuracy of Malayalam POS tagging.
My proposed approach uses dictionary entries along with adjacent tag information, and the algorithm uses multithreading. Tagging is done using the probability of occurrence of the sentence structure along with the dictionary entry.
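The abstract combines dictionary entries, adjacent-tag probabilities and multithreading. The sketch below is a minimal version of that combination under several assumptions. Each word's candidate tags come from a dictionary, and the candidate with the highest P(tag | previous tag) is chosen greedily. Independent sentences are tagged on a thread pool. English glosses stand in for Malayalam words, and all dictionary entries and probabilities are hypothetical. The paper's exact scoring of sentence structure is not given and is not reproduced.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dictionary: English glosses stand in for Malayalam words
DICTIONARY = {
    "he": ["PRON"],
    "book": ["NOUN", "VERB"],  # ambiguous entry
    "read": ["VERB", "NOUN"],  # ambiguous entry
    "good": ["ADJ"],
}
# Hypothetical P(tag | previous tag); unseen pairs fall back to a small constant
TRANSITIONS = {
    ("<s>", "PRON"): 0.4, ("<s>", "ADJ"): 0.2,
    ("PRON", "NOUN"): 0.5, ("PRON", "VERB"): 0.1,
    ("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 0.2,
    ("ADJ", "NOUN"): 0.7,
}
FALLBACK = 0.01

def tag_sentence(words):
    """Greedy left-to-right tagging: dictionary candidates ranked by P(tag | previous tag)."""
    prev, tags = "<s>", []
    for word in words:
        candidates = DICTIONARY.get(word, ["NOUN"])  # assumption: unknown words default to NOUN
        best = max(candidates, key=lambda tag: TRANSITIONS.get((prev, tag), FALLBACK))
        tags.append(best)
        prev = best
    return tags

def tag_corpus(sentences, workers=4):
    """Tag independent sentences on a thread pool; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tag_sentence, sentences))

print(tag_corpus([["he", "book", "read"], ["good", "book"]]))
# [['PRON', 'NOUN', 'VERB'], ['ADJ', 'NOUN']]
```

In CPython, the global interpreter lock means threads do not speed up pure-Python CPU-bound work like this. `ProcessPoolExecutor` offers the same `map` interface if real parallelism is needed.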
Machine Translation Approaches and Design Aspects IOSR Journals
Machine translation (MT) is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one natural language to another. At a basic level, MT performs simple substitution of words in one natural language for words in another. That alone usually cannot produce a good translation of a text, because whole phrases and their closest counterparts in the target language must be recognized. The paper focuses on an Example Based Machine Translation (EBMT) system that translates sentences from English to Hindi. Development of an MT system typically demands a large volume of computational resources. For example, rule-based MT systems require extraction of syntactic and semantic knowledge in the form of rules. Statistics-based MT systems require a huge parallel corpus containing sentences in the source language and their translations in the target language. EBMT requires far fewer such resources. This makes EBMT development feasible for English to Hindi translation, where large-scale computational resources are still scarce. Example-based machine translation relies on a database of examples for its translation, and the frequency of word occurrence is important for translation.
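EBMT relies on a database of translated examples. A minimal sketch of the retrieve-and-adapt idea follows: the stored example most similar to the input is found, and a single differing word is then substituted using a bilingual lexicon. Python's `difflib.SequenceMatcher` is a stand-in for whatever matching the paper uses. The example base and lexicon are hypothetical, with the Hindi romanized, and word-frequency weighting is not modelled.

```python
from difflib import SequenceMatcher

# Hypothetical example base: English sentences with romanized Hindi translations
EXAMPLES = {
    "he is reading a book": "vah kitab padh raha hai",
    "she is writing a letter": "vah patra likh rahi hai",
}
# Hypothetical word-level English-to-Hindi lexicon used to adapt a retrieved example
LEXICON = {"book": "kitab", "letter": "patra", "newspaper": "akhbar"}

def closest_example(words):
    """Retrieve the stored example whose word sequence is most similar to the input."""
    return max(EXAMPLES, key=lambda src: SequenceMatcher(None, src.split(), words).ratio())

def translate(sentence):
    words = sentence.lower().split()
    source = closest_example(words)
    source_words, target = source.split(), EXAMPLES[source].split()
    for op, i1, i2, j1, j2 in SequenceMatcher(None, source_words, words).get_opcodes():
        if op == "equal":
            continue
        # Adapt only single-word substitutions that the lexicon can translate
        if op == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            old, new = source_words[i1], words[j1]
            if old in LEXICON and new in LEXICON and LEXICON[old] in target:
                target[target.index(LEXICON[old])] = LEXICON[new]
                continue
        return None  # the difference is too large for this sketch to adapt
    return " ".join(target)

print(translate("He is reading a newspaper"))  # vah akhbar padh raha hai
```

Returning `None` for unadaptable inputs keeps the sketch honest. A real EBMT system would fall back to combining fragments from several examples.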
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Quality estimation of machine translation outputs through stemming ijcsa
Machine Translation is a challenging problem for Indian languages. New machine translators are developed every day, but high-quality automatic translation is still a distant dream, and correctly translated Hindi sentences are rarely produced. In this paper, we focus on the English-Hindi language pair. To preserve the correct MT output, we present a ranking system that employs machine learning techniques and morphological features. The ranking requires no human intervention. We have also validated our results by comparing them with human ranking.
SURVEY ON MACHINE TRANSLITERATION AND MACHINE LEARNING MODELS ijnlc
Globalization and the growth of Internet users demand that almost all internet-based applications support local languages. Support for local languages can be provided in internet-based applications by means of Machine Transliteration and Machine Translation. This paper provides a thorough survey of machine transliteration models and of the machine learning approaches used for machine transliteration over a period of more than two decades, for internationally used languages as well as Indian languages. The survey shows that the linguistic approach provides better results for closely related languages. Probability-based statistical approaches are good when one of the languages is phonetic and the other is non-phonetic. Better accuracy can be achieved only by using hybrid and combined models.
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis IJERA Editor
We describe in detail a Grapheme-to-Phoneme (G2P) converter required for the development of a good-quality Marathi Text-to-Speech (TTS) system. The Festival and Festvox framework is chosen for developing the Marathi TTS system. Festival does not provide complete language processing support specific to various languages, so it needs to be augmented to facilitate the development of TTS systems in certain new languages. For this reason, a generic G2P converter has been developed. In the customized Marathi G2P converter, we have handled schwa deletion and compound word extraction. In experiments testing the Marathi G2P on a text segment of 2485 words, a word phonetisation accuracy of 91.47% is obtained. This Marathi G2P has been used to phonetise large text corpora, which in turn are used in designing an inventory of phonetically rich sentences. These sentences ensured good coverage of the phonetically valid diphones using only 1.3% of the complete text corpora.
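Schwa deletion matters because Devanagari consonants carry an inherent "a" that is often not pronounced. The sketch below applies only the simplest version of the rule: the inherent schwa is deleted word-finally and kept elsewhere unless a vowel sign or virama follows. Real Marathi G2P also needs medial schwa-deletion rules and compound-word handling, neither of which is covered here. The phone symbols and the small character tables are hypothetical and deliberately partial.

```python
# Partial, hypothetical phone inventory for a few Devanagari characters
CONSONANTS = {"क": "k", "म": "m", "ल": "l", "र": "r", "न": "n", "त": "t", "प": "p", "स": "s"}
MATRAS = {"ा": "aa", "ि": "i", "ी": "ii", "ु": "u", "ू": "uu", "े": "e", "ो": "o"}
VIRAMA = "\u094D"

def g2p(word):
    """Map a Devanagari word to phones, deleting only the word-final schwa."""
    phones = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            # Inherent schwa unless a vowel sign or virama follows,
            # or the consonant is word-final (schwa deletion)
            if nxt is not None and nxt not in MATRAS and nxt != VIRAMA:
                phones.append("a")
        elif ch in MATRAS:
            phones.append(MATRAS[ch])
        elif ch != VIRAMA:
            raise ValueError(f"no rule for {ch!r}")
    return phones

print(g2p("कमल"))  # ['k', 'a', 'm', 'a', 'l']
```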
Natural Language Processing: State of The Art, Current Trends and Challenges antonellarose
Diksha Khurana (1), Aditya Koli (1), Kiran Khatter (1, 2) and Sukhdev Singh (1, 2)
(1) Department of Computer Science and Engineering, Manav Rachna International University, Faridabad-121004, India
(2) Accendere Knowledge Management Services Pvt. Ltd., India
Knowledge Extraction and Linked Data: Playing with Frames Valentina Presutti
To understand somebody who speaks to us, or a text we are reading, we identify the main entities and how they relate to each other within relation schemas called frames. This means that we first recognise the occurrence of some frames and then perform contextualised reasoning over them, where the context is given by the recognised frames. Three main ingredients may enable machines to perform this process: knowledge extraction, knowledge representation and automated reasoning.
The Semantic Web and Linked Data paradigms provide a knowledge representation model enabling sophisticated automated reasoning. Nevertheless, the modelling trend in existing Linked Data ontologies is far from supporting a frame-based reasoning approach. In this talk, I will describe projects that support frame-driven knowledge extraction and representation, both from a design and an empirical perspective.
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ... ijnlc
Neural machine translation is a new approach to machine translation that has shown effective results for high-resource languages. Recently, attention-based neural machine translation with large-scale parallel corpora has played an important role in achieving high translation performance. In this research, a parallel corpus for the Myanmar-English language pair is prepared. Attention-based neural machine translation models are introduced at the word-to-word, character-to-word and syllable-to-word levels. We conduct experiments with the proposed models on translating long sentences and on addressing morphological problems. To mitigate the low-resource problem, source-side monolingual data are also used. This work therefore investigates how to improve Myanmar to English neural machine translation. The experimental results show that the syllable-to-word level neural machine translation model obtains an improvement over the baseline systems.
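At each decoding step, an attention mechanism weights the encoder states, one per source unit. At the syllable-to-word level, each source unit is a syllable. The abstract does not say which scoring function is used, so the sketch below assumes dot-product (Luong-style) scoring with a softmax. The vectors are toy values.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_attention(decoder_state, encoder_states):
    """Return attention weights over source positions and the context vector."""
    # Score each source position by its dot product with the decoder state
    scores = [sum(d * h for d, h in zip(decoder_state, state)) for state in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states
    dim = len(encoder_states[0])
    context = [sum(w * state[k] for w, state in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context

# Toy example: two source syllables with 2-dimensional encoder states
weights, context = dot_attention([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights, context)
```

Here the decoder state aligns with the first encoder state, so nearly all of the attention weight, and therefore the context vector, comes from the first source syllable.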
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH ijnlc
Researchers have been working on machine translation for quite a long time. However, a flawless machine translator is still a dream, and only a small number of researchers have focused on translating Marathi text to English. Perfect machine translation systems have not yet been built, because languages differ syntactically as well as morphologically. The majority of researchers have opted for Statistical Machine Translation, whereas in this paper we address the challenges of Rule-based Machine Translation. The paper describes the major divergences observed between Marathi and English. It also describes the many challenges encountered while attempting to build a rule-based machine translation system from Marathi to English, and rules to handle these challenges. Because the rules have exceptions and there is a limit to how large a knowledge base can feasibly be maintained, practical machine translation from Marathi to English is a complex task.
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ... Syeful Islam
Hundreds of millions of people in different countries, at almost all levels of education and with varied attitudes, communicate with each other for different purposes using various languages. Machine translation is in high demand because of the increasing use of web-based communication. One of the major problems in Bengali translation is identifying a naming word (named entity) in a sentence. This is relatively simple in English, because such entities start with a capital letter. Bangla, however, has no concept of small or capital letters, and there is a huge number of different named entities in Bangla, so it is difficult to decide whether a word is a naming word or not. Here we introduce a new approach to identify naming words in a Bengali sentence for a machine translation system, without storing a huge number of named entities in the word dictionary. The goal is to make Bangla sentence conversion possible while storing a minimal number of words in the dictionary.
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts kevig
Cyberbullying is currently one of the most important research fields. Most researchers have worked on bully-text identification in English texts or comments; because of the scarcity of data, analyzing and stemming Tamil text is frequently a tedious job. Tamil is a morphologically diverse and agglutinative language, and creating a Tamil stemmer is not an easy undertaking. After examining the major difficulties encountered, the authors propose the rule-based iterative preprocessing algorithm (RBIPA). In this attempt, Tamil morphemes and lemmas were extracted using the suffix stripping technique, and a supervised machine learning algorithm was used to classify words as pronouns and proper nouns. The novelty of the proposed system is a preprocessing algorithm for iterative stemming and lemmatization that recovers the exact words in Tamil-language comments. RBIPA shows 84.96% accuracy on the given test dataset, which has a total of 13000 words.
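In an agglutinative language such as Tamil, several suffixes can be stacked on one root, so a single stripping pass is not enough. The sketch below shows iterative suffix stripping that repeats until no rule applies. The romanized suffix list is a hypothetical stand-in, since RBIPA's actual rule set is not given in the abstract. The example uses veedu (house), veedugal (houses) and veedugalil (in the houses). The pronoun and proper-noun classifier is not shown.

```python
# Hypothetical romanized Tamil suffixes, tried longest first
SUFFIXES = sorted(["ukku", "gal", "il", "ai"], key=len, reverse=True)

def iterative_stem(word, min_stem=3):
    """Strip suffixes repeatedly until no rule applies."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: -len(suffix)]
                changed = True
                break  # restart from the longest suffix on the shorter word
    return word

print(iterative_stem("veedugalil"))  # veedu
```

Real Tamil stemming must also undo sandhi changes at morpheme boundaries, which plain string stripping like this cannot do.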
Myanmar news summarization using different word representations IJECEIAES
An enormous amount of information is available in different forms, sources and genres. To extract useful information from such a massive amount of data, an automatic mechanism is required. Text summarization systems reduce content while keeping the important information and filtering out the unimportant parts of the text. A good document representation is essential in text summarization for getting relevant information. Bag-of-words cannot capture syntactic and semantic similarity between words, whereas word embeddings give a good document representation that captures and encodes the semantic relations between words. Therefore, this paper employs a centroid-based summarizer built on word-embedding representations and proposes Myanmar news summarization based on different word embeddings. Myanmar local and international news are summarized with this centroid-based word embedding summarizer. Experiments were done on a Myanmar local and international news dataset using different word embedding models, and the results are compared with the performance of bag-of-words summarization. Centroid summarization using word embeddings performs comprehensively better than centroid summarization using bag-of-words.
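A centroid-based summarizer represents each sentence as a vector, computes a document centroid, and selects the sentences closest to that centroid. The sketch below is a simplified version of this idea. Each sentence vector is the mean of its word embeddings, and the centroid is the mean of the sentence vectors. The abstract does not state how the paper builds its centroid, and some centroid summarizers build it only from the most salient words. The 2-dimensional embeddings are hypothetical toy values; a real system would use trained Myanmar embeddings.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def centroid_summary(sentences, embeddings, k=1):
    """Pick the k sentences closest to the document centroid, in original order."""
    vectors = {}
    for i, tokens in enumerate(sentences):
        known = [embeddings[t] for t in tokens if t in embeddings]
        if known:  # sentences with no embedded tokens are skipped
            vectors[i] = mean_vector(known)
    centroid = mean_vector(list(vectors.values()))
    ranked = sorted(vectors, key=lambda i: cosine(vectors[i], centroid), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

# Hypothetical toy embeddings: dimension 0 ~ weather, dimension 1 ~ politics
EMB = {"rain": [1.0, 0.0], "storm": [1.0, 0.1], "election": [0.0, 1.0]}
SENTS = [["rain", "storm"], ["election"], ["rain"]]
print(centroid_summary(SENTS, EMB, k=1))  # [['rain', 'storm']]
```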
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING kevig
This paper describes the use of Naive Bayes to assign function tags, and of a context-free grammar (CFG) to parse Myanmar sentences. Part of the challenge of statistical function tagging for Myanmar sentences comes from the fact that Myanmar has free phrase order and a complex morphological system. Function tagging is a pre-processing step for parsing. For function tagging, we use a functionally annotated corpus and tag Myanmar sentences that already carry correct segmentation, POS (part-of-speech) tags and chunking information. We propose Myanmar grammar rules and apply a CFG to find the parse tree of function-tagged Myanmar sentences. Experiments show that our analysis achieves good results in parsing simple sentences and three types of complex sentences.
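The function-tagging step can be pictured as a classifier over features of each chunk. The sketch below is a multinomial Naive Bayes over (feature, value) pairs with add-one smoothing. The features (a POS tag and a following particle), the function tags and the training data are hypothetical. The particles "ka" and "ko" are romanizations of the Myanmar subject and object markers. The CFG parsing step is not shown.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesFunctionTagger:
    """Multinomial Naive Bayes over (feature, value) pairs with add-one smoothing."""

    def __init__(self):
        self.tag_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # tag -> Counter of (feature, value)
        self.vocab = set()

    def train(self, examples):
        for features, tag in examples:
            self.tag_counts[tag] += 1
            for pair in features.items():
                self.feature_counts[tag][pair] += 1
                self.vocab.add(pair)

    def predict(self, features):
        total = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, count in self.tag_counts.items():
            denom = sum(self.feature_counts[tag].values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for pair in features.items():
                score += math.log((self.feature_counts[tag][pair] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

# Hypothetical training chunks
TRAIN = [
    ({"pos": "N", "marker": "ka"}, "SUBJ"),
    ({"pos": "N", "marker": "ka"}, "SUBJ"),
    ({"pos": "N", "marker": "ko"}, "OBJ"),
    ({"pos": "PRON", "marker": "ko"}, "OBJ"),
    ({"pos": "V", "marker": "none"}, "PRED"),
]
tagger = NaiveBayesFunctionTagger()
tagger.train(TRAIN)
print(tagger.predict({"pos": "PRON", "marker": "ka"}))  # SUBJ
```

The marker feature does most of the work here, which suits a free-phrase-order language: the particle, not the position, signals the function.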
Artificially Generated of Concatenative Syllable based Text to Speech Synthesi... iosrjce
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR ijcseit
The proposed system is a syllable-based Myanmar speech recognition system with three stages: Feature Extraction, Phone Recognition and Decoding. In feature extraction, the system transforms the input speech waveform into a sequence of acoustic feature vectors, each representing the information in a small time window of the signal. In the phone recognition stage, the likelihood of the observed feature vectors given linguistic units (words, phones, subparts of phones) is computed. Finally, the decoding stage takes the Acoustic Model (AM), which consists of this sequence of acoustic likelihoods, together with a phonetic dictionary of word pronunciations. It combines these with the Language Model (LM) to produce the most likely sequence of words as the output. The system creates the language model for Myanmar using syllable segmentation and a syllable-based n-gram method.
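A syllable-based n-gram language model can be sketched at its smallest useful size: a bigram model over syllables with add-one (Laplace) smoothing. The smoothing method and the romanized training syllables ("min ga la ba") are assumptions for illustration. The abstract does not specify the n-gram order or smoothing, and a real system would train on segmented Myanmar script.

```python
import math
from collections import Counter

class SyllableBigramLM:
    """Syllable bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        for syllables in sentences:
            padded = ["<s>"] + syllables + ["</s>"]
            self.context_counts.update(padded[:-1])  # every position that has a successor
            self.bigram_counts.update(zip(padded, padded[1:]))
        # Possible next symbols: every training syllable plus the end marker
        self.vocab = {s for syllables in sentences for s in syllables} | {"</s>"}

    def prob(self, prev, cur):
        return (self.bigram_counts[(prev, cur)] + 1) / (self.context_counts[prev] + len(self.vocab))

    def log_prob(self, syllables):
        padded = ["<s>"] + syllables + ["</s>"]
        return sum(math.log(self.prob(p, c)) for p, c in zip(padded, padded[1:]))

# Hypothetical romanized training data
lm = SyllableBigramLM([["min", "ga", "la", "ba"], ["min", "ga", "la", "ba"], ["la", "ba"]])
print(lm.log_prob(["min", "ga", "la", "ba"]) > lm.log_prob(["ba", "la", "ga", "min"]))  # True
```

During decoding, scores like `log_prob` are added to the acoustic log-likelihoods to rank candidate syllable sequences.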
A Novel Approach for Rule Based Translation of English to Marathi aciijournal
This paper presents a design for a rule-based machine translation system for the English to Marathi language pair. The system takes an English sentence as input and parses it with the Stanford parser, which is used mainly for source-side processing. An English to Marathi bilingual dictionary is to be created. The system takes the parsed output, separates the source text word by word, and looks up the corresponding target words in the bilingual dictionary. Hand-coded rules are written for Marathi inflections, along with reordering rules. After the reordering rules are applied, the English sentence is syntactically reordered to suit the Marathi language.
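English is SVO while Marathi is SOV, so the core reordering rule moves the verb after the object. The sketch below applies that rule and then looks words up in a bilingual dictionary. It assumes the grammatical roles have already been extracted from the parser output; the parser call itself is not shown. The romanized lexicon is hypothetical. Inflection and agreement rules are not modelled, and English articles are simply dropped.

```python
# Hypothetical romanized English-to-Marathi lexicon; agreement is not modelled
LEXICON = {"i": "mi", "eat": "khato", "mango": "amba",
           "ram": "ram", "reads": "vachto", "book": "pustak"}

def reorder_svo_to_sov(tagged):
    """Reorder (word, role) pairs from English SVO into Marathi SOV order."""
    order = {"SUBJ": 0, "OBJ": 1, "VERB": 2}
    kept = [(w, r) for w, r in tagged if r in order]  # other roles (e.g. DET) are dropped
    # sorted() is stable, so words sharing a role keep their original order
    return [w for w, r in sorted(kept, key=lambda pair: order[pair[1]])]

def translate(tagged):
    """Reorder, then substitute dictionary entries (unknown words pass through)."""
    return " ".join(LEXICON.get(w.lower(), w) for w in reorder_svo_to_sov(tagged))

print(translate([("I", "SUBJ"), ("eat", "VERB"), ("mango", "OBJ")]))  # mi amba khato
```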
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
International Journal of Science and Research (IJSR), India Online ISSN: 2319-7064
Volume 2 Issue 9, September 2013
www.ijsr.net
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification and Translation (Myanmar to English)
Soe, Thae Thae¹, Thida, Aye²
¹University of Computer Studies, Mandalay, Myanmar
²University of Computer Studies, Mandalay, Myanmar
Abstract: Phrase identification is one of the most critical and widely studied tasks in Natural Language Processing (NLP). Identifying verb phrases within a sentence is useful for a variety of NLP applications. One of the core enabling technologies required in NLP applications is morphological analysis. This paper presents a Myanmar verb phrase identification and translation algorithm and develops a Markov model with morphological analysis. The system is based on a rule-based maximum matching approach. In machine translation, a large amount of information is needed to guide the translation process. Myanmar is an inflected language, and very little lexicon development and research exists for Myanmar compared to other languages such as English, French and Czech. Therefore, this system proposes a Myanmar verb phrase identification and translation model based on the syntactic structure and morphology of the Myanmar language, using a Myanmar-English bilingual lexicon. A Markov model is also used to reformulate the translation probability of phrase pairs. Experimental results show that the proposed system can improve translation quality by applying morphological analysis to the Myanmar language.
Keywords: Myanmar verb phrase identification and translation, morphological analysis, Rule-Based Maximum Matching
1. Introduction
Language plays an important role in human communication because it is a channel not only for expressing thoughts but also for exchanging information. In the age of information technology, the Internet has become a primary medium for people to exchange thoughts and information, and it is simple and convenient for people all around the world. However, people find it difficult to communicate with one another because their native languages differ. Some people are familiar with two or more languages, spoken and written, but most are not. Because of these difficulties and the increased use of networks, there is a growing need for language translation to facilitate communication, publication and learning. Attempts at language translation are almost as old as computers themselves. Machine Translation (MT) is the attempt to automate all or part of the process of translation between human languages and is one of the oldest large-scale applications of computer science. The goal of an MT system is to accurately produce good translations between human languages.
Translating between human languages is a difficult task because natural language is ambiguous and languages vary in their features and nature. Myanmar word transformations are similar to those of other Asian languages, including Indian languages, Japanese, Thai and Chinese. The problem of machine translation can be viewed as consisting of three phases: (i) analysis of the source language to choose appropriate target-language lexical items (words or phrases), (ii) reordering, where the chosen target-language lexical items are reordered to produce a meaningful target-language sentence, and (iii) word sense disambiguation, where the correct meaning of each word is chosen for translation.
The Myanmar-English MT system is composed of two main modules: identification and translation. The first module identifies the Myanmar verb phrase in an input Myanmar sentence. The second module then translates the Myanmar verb phrase into an English verb phrase using a Myanmar-English bilingual lexicon. Each step in the machine translation process is a hard technical problem for which the best-known solutions are either inadequate or good enough only in narrow application domains, failing when applied to other domains. The proposed system concentrates on one part of this process, namely verb phrase identification and translation, while bearing in mind that some of the core techniques can be applied to other parts of a machine translation system.
There are many research fields in Natural Language Processing and machine translation. To the authors' knowledge, no complete machine translation system from Myanmar to English has yet been developed. Therefore, this research focuses on developing the identification and translation of Myanmar verb phrases as one part of a Myanmar-English machine translation system.
Paper ID: 06091303 90
The rest of this paper is organized as follows. Section 2 presents previous work on phrase identification for machine translation. Section 3 presents the nature of the Myanmar language and Myanmar sentence structure. Section 4 presents the proposed system. Section 5 presents the types of Myanmar verb phrase, and Section 6 describes morphological analysis for Myanmar verb phrases. Finally, Sections 7, 8, 9 and 10 present the Markov model, the identification algorithm, the experimental results and error analysis of the proposed system, and the conclusion.
2. Related Work
In this section, previous work on verb phrase identification and machine translation for different languages is reviewed. Various researchers have improved the quality of machine translation systems by using different methods for different languages. Wajid Ali et al. described the structure of Urdu verb phrases and detailed a series of experiments to tag them automatically. A 100,000-word Urdu corpus was manually tagged with VP chunk tags and then used to develop a hybrid approach combining HMM-based statistical chunking with correction rules [11]. The technique was enhanced by changing the chunking direction and merging chunk and POS tags. Kim, Changhyun et al. [5] described a Korean-Chinese machine translation system that includes a source-language pattern part for analysis and a target-language pattern part for generation; it is based on pattern-based knowledge and translates Korean verb phrases into Chinese verb phrases. M. Selvam et al. [8] presented an improvement of rule-based morphological analysis and POS tagging for Tamil via projection and induction techniques. The rule-based approach is applicable to languages that have a well-defined set of rules covering most words with inflectional and derivational morphology. Fridah Katushemererwe [1] demonstrated the application of a finite-state approach to the analysis of Runyakitara verb morphology. Language-specific knowledge and insight were applied to classify and describe the morphological structure of the language, and quasi-context-free and rewriting rules were written to account for the grammatical verbs of Runyakitara.
In 2005, Goldwater and McClosky [2] used morphological analysis of Czech to improve a Czech-English statistical machine translation system. The system addresses the data sparseness problem caused by the highly inflected nature of Czech, and their combined model achieved high BLEU scores on the development and test sets. Nguyen and Shimazu [9] proposed morphological transformation rules and a transformational model based on Bayes' formula to translate English to Vietnamese; their system scored better than the baseline. Kamaljeet Kaur Batra and G. S. Lehal [6] presented rule-based machine translation of noun phrases from Punjabi to English. The system uses a transfer approach with analysis, translation and synthesis components. In 2004, Koehn [4] suggested using lexical weighting features. In the same year, Philipp Koehn released the well-known phrase-based decoder Pharaoh as a free SMT toolkit; it was later succeeded by Moses (Koehn et al., 2007). In 2006, Narayan Kumar Choudhary [10] presented a computational framework for the verb morphology of Great Andamanese.
An ideal machine translation system would take advantage of both empirical data and linguistic analysis. Different authors pursue different objectives in their attempts to achieve high translation precision for many languages. Our phrase identification and translation model aims to produce correct phrase translations with a very limited bilingual lexicon for Myanmar-to-English machine translation.
3. Nature of Myanmar Language
The Myanmar language is the official language of Myanmar. It is also the native language of the Myanmar people and related sub-ethnic groups, as well as of some ethnic minorities in Myanmar such as the Mon. Myanmar is spoken by 32 million people as a first language and by 10 million as a second language, particularly by ethnic minorities in Myanmar and those in neighboring countries. Myanmar is a tonal, pitch-register, largely monosyllabic and analytic language with Subject-Object-Verb (SOV) word order. The language uses the Myanmar script, derived from the Old Mon script and ultimately from the Brahmi script.
The language has two registers. One is formal, used in literary works, official publications, radio broadcasts and formal speeches. The other is colloquial, used in daily conversation. This is reflected in the Myanmar words for "language": စာ refers to written, literary language, and စကား refers to spoken language. Therefore, "Myanmar language" can mean either written Myanmar or spoken Myanmar.
စာအုပ္ စားပြဲ ေပၚမွာ ရွိတယ္။ (spoken language)
စာအုပ္သည္ စားပြဲ ေပၚတြင္ ရွိသည္။ (formal language)
Both sentences mean "The book is on the table."
3.1 Myanmar Sentence Structure
There are two kinds of sentences in the syntactic structure of the Myanmar language: simple sentences and complex sentences. Figure 1 shows the syntactic structure of the Myanmar language.
Figure 1: Syntactic Structure
3.1.1 Simple Sentence
Simple sentences are declarative, negative or interrogative, and each contains only one clause. A simple sentence has two basic phrases: a subject phrase and a verb phrase.
For example:
သူ (Subject phrase) အိပ္ေနသည္ (Verb phrase)
However, a simple sentence can also consist of only one phrase, which may be a verb phrase or a noun phrase.
For example:
စားသြားသည္ (Verb phrase)
A simple sentence can also consist of two or three phrases.
For example:
ရန္ကုန္တြင္(Place phrase) ေနသည္ (Verb phrase)
Myanmar phrases can be written in any order as long as the
verb phrase is at the end of the sentence.
For example:
ဦးဘသည္ မနၱေလးမွ ၿပန္လာသည္ (Subject, Place, Verb)
မနၱေလးမွ ဦးဘသည္ ၿပန္လာသည္ (Place, Subject, Verb)
A simple sentence can be extended by placing many other phrases between the subject phrase and the verb phrase. Each of the following is still a simple sentence because it contains only one clause, although it can be quite long.
For example:
ဦးဘသည္ မနၱေလးမွ ရန္ကုန္သို့ မီးရထားၿဖင့္ ၿပန္လာသည္။
(U Ba comes back from Mandalay to Yangon by train.)
A simple sentence can also be extended by adding noun phrases, such as subject, object and time phrases, alongside the verb phrase. These added noun phrases are called emphatic phrases.
For example:
ပါေမာကၡ ဦးဘသည္ သား ေမာင္ေမာင္ႏွင့္အတူ အထက္ မႏၱေလးမွ
ၿမိဳ႕ေတာ္ ရန္ကုန္သို႔ အျမန္ မီးရထားျဖင့္ မေန႔ နံနက္က
ေခ်ာေခ်ာေမာေမာ ျပန္လာသည္။
Professor U Ba and his son Mg Mg came back safely from upper Mandalay to the capital, Yangon, by express train yesterday morning.
3.1.2 Complex Sentence
A complex sentence consists of two or more clauses (or simple sentences) joined by postpositions, particles or conjunctions, so it contains at least two verbs. There are two kinds of clause in a complex sentence: the independent clause (IC) and the dependent clause (DC). The DC comes before the IC. A complex sentence contains one independent clause and at least one dependent clause. A DC has the same form as an IC but must end with a clause marker (CM). A clause marker may be a postposition, a particle or a conjunction. There are three types of dependent clause, depending on the clause marker.
(1) Noun DC (joined by postpositions such as မွာ၊ က၊ ကို)
မမ ေစ်းသို႔ သြားသည္ ကို ကၽြန္မ ျမင္သည္။
I see that Ma Ma goes to the market.
Noun DC: မမ ေစ်းသို႔ သြားသည္ ကို
IC: ကၽြန္မ ျမင္သည္။
(2) Adjective DC (joined by particles such as ေသာ၊ သည့္၊ မည့္)
မမ ေပးေသာ စာအုပ္ ကို ကၽြန္မ ဖတ္သည္။
I read the book that Ma Ma gave.
Adjective DC: မမ ေပးေသာ (စာအုပ္)
IC: စာအုပ္ ကို ကၽြန္မ ဖတ္သည္။
(3) Adverb DC (joined by conjunctions such as ေသာေၾကာင့္၊ လ်က္၊ သျဖင့္)
မိုးရြာေန ေသာေၾကာင့္ ကၽြန္မ ေစ်းသို႔ မသြားပါ။
I do not go to the market because it is raining.
Adverb DC: မိုးရြာေန ေသာေၾကာင့္
IC: ကၽြန္မ ေစ်းသို႔ မသြားပါ။
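Because a dependent clause always ends with its clause marker and comes before the independent clause, a first-pass split can simply cut the sentence after the first marker. The sketch below is a hypothetical helper, not part of the described system: it assumes whitespace-separated, Zawgyi-encoded input like the examples above and uses only the markers listed there.

```python
from typing import Dict, Optional, Tuple

# Clause markers listed above, grouped by the kind of dependent clause they close.
CLAUSE_MARKERS: Dict[str, str] = {
    "မွာ": "noun", "က": "noun", "ကို": "noun",
    "ေသာ": "adjective", "သည့္": "adjective", "မည့္": "adjective",
    "ေသာေၾကာင့္": "adverb", "လ်က္": "adverb", "သျဖင့္": "adverb",
}


def split_clauses(sentence: str, markers: Dict[str, str] = CLAUSE_MARKERS
                  ) -> Optional[Tuple[str, str, str]]:
    """Split 'DC + clause marker + IC' at the first space-separated marker token."""
    tokens = sentence.split()
    for i, token in enumerate(tokens[:-1]):  # a marker must leave an IC after it
        if token in markers:
            dc = " ".join(tokens[: i + 1])   # dependent clause, including its marker
            ic = " ".join(tokens[i + 1:])    # remaining independent clause
            return dc, ic, markers[token]
    return None


print(split_clauses("မမ ေစ်းသို႔ သြားသည္ ကို ကၽြန္မ ျမင္သည္။"))
```

This only finds markers written as separate words, so it handles examples (1) and (3); in example (2) the particle ေသာ is attached to ေပး, which would need an additional suffix check.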
3.1.3 Negative Sentence
Generally, a negative sentence ends with "ပါ", and its root word takes the prefix "မ", giving the pattern "မ … ပါ". The exact form also depends on tense and modality. For example:
(i) သူသည္ (Subject Phrase) ေက်ာင္းသို့ (Noun Phrase) မသြားပါ။ (Verb Phrase)
He doesn't go to school.
(ii) လွလွသည္ (Noun Phrase) ဒီေန႔ လာလိမ့္မည္ မဟုတ္ပါ။ (Verb Phrase)
Hla Hla will not come today.
(iii) စာအုပ္သည္ (Noun Phrase) မထူပါ။ (Verb Phrase)
The book is not thick.
Normally, the negative form of a verb is made by adding the prefix "မ" in front of the root verb. However, some verbs have a non-linear structure, such as "work": its positive form is "အလုပ္လုပ္" and its negative form is "အလုပ္မလုပ္". In this case, "မ" is placed inside the root word.
3.1.4 Interrogative Sentence
There are two types of questions: yes/no questions and wh-questions. In English, yes/no questions are marked by the auxiliary verb. In wh-questions, the WH feature identifies the class of phrase, which is signaled by words such as who, what, when, where, why and how (as in how many, how much, how careful). These words fall into several different categories: who, whom and what can appear as pronouns and can be used to specify simple NPs; what and which appear as determiners in NPs; where and when appear as prepositional phrases; how acts as an adverbial modifier to adjective and adverb phrases; and whose acts as a possessive pronoun. The wh-words can also act in other roles, such as introducing relative clauses. In the Myanmar language, the question ending is fixed: the suffix of a yes/no question is "လား", and the suffix of a wh-question is "လဲ" or "နည္း".
For example:
(i) မင္းဘုရားပြဲကို (Subject phrase) သြားမလား။ (Verb Phrase)
Will you go to the pagoda festival?
(ii) မင္းစာေမးပြဲ (Subject Phrase) ေအာင္သလား။ (Verb Phrase)
Did you pass the exam?
(iii) ဤခရီး (Subject Phrase) နီးသလား။ (Verb Phrase)
Is this trip near?
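Since the question ending is fixed, the question type of a Myanmar sentence can be read off its final suffix. A minimal sketch, assuming Zawgyi-encoded input like the examples above and ignoring the sentence terminator ။; `question_type` is a hypothetical helper, not part of the described system.

```python
from typing import Optional

# Question suffixes from the text (Zawgyi-encoded): yes/no "လား", wh "လဲ" or "နည္း".
YES_NO_SUFFIX = "လား"
WH_SUFFIXES = ("လဲ", "နည္း")


def question_type(sentence: str) -> Optional[str]:
    """Classify a Myanmar question by its fixed sentence-final suffix."""
    core = sentence.strip().rstrip("။").rstrip()  # drop the sentence terminator
    if core.endswith(YES_NO_SUFFIX):
        return "yes/no"
    if core.endswith(WH_SUFFIXES):  # str.endswith accepts a tuple of suffixes
        return "wh"
    return None


print(question_type("ဤခရီး နီးသလား။"))  # yes/no
```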
4. The Proposed System
In Natural Language Processing, some results have already been obtained; however, a number of important research problems remain unsolved. This section explains the details of the Myanmar verb phrase identification and translation process using the rule-based maximum matching approach. The process takes segmented Myanmar words with their parts of speech as input (for example, ေမာင္ေမာင္/NPR/ ေက်ာင္း/NCCS/ သို႔/PODIR/ လ်င္ၿမန္စြာ/ADVM/ ခပ္သုတ္သုတ္/ADVM/ သြား/HV/ ေန/PAVPC/ သည္/POVP). The longest (maximum) matching method scans the input by reading each word sequentially and matching it against the predefined Myanmar grammar rules. To identify the Myanmar verb phrase, the system first extracts the root verb in the given sentence and then performs morphological analysis of the prefixes, suffixes and tense particles attached to the root verb. Finally, the system translates the Myanmar verb phrase into English words using the Myanmar-English bilingual lexicon. For the example input, the system's output is:
လ်င္ၿမန္စြာ ခပ္သုတ္သုတ္သြားေနသည္။
Figure 2: Overview of Proposed System
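The scan described above can be sketched in Python. This is a minimal illustration, not the authors' code: the five tag-sequence rules in `VP_RULES` are a hypothetical stand-in for the paper's roughly 200 grammar rules, and `parse_tagged` assumes the slash-separated word/TAG format of the example input. At each position the longest matching rule wins, which is what makes the scan "maximum" matching.

```python
from typing import List, Tuple

# Hypothetical subset of the grammar rules: each rule is a POS-tag sequence
# that forms a Myanmar verb phrase (the paper uses about 200 such rules).
VP_RULES: List[Tuple[str, ...]] = [
    ("HV", "POVP"),
    ("ADVM", "HV", "POVP"),
    ("HV", "PAVPC", "POVP"),
    ("ADVM", "HV", "PAVPC", "POVP"),
    ("ADVM", "ADVM", "HV", "PAVPC", "POVP"),
]


def parse_tagged(sentence: str) -> List[Tuple[str, str]]:
    """Split 'word/TAG/ word/TAG/ ...' into (word, tag) pairs."""
    parts = [p.strip() for p in sentence.split("/") if p.strip()]
    return list(zip(parts[0::2], parts[1::2]))


def max_match_vps(pairs: List[Tuple[str, str]]) -> List[List[str]]:
    """Scan left to right, taking the longest rule that matches at each position."""
    tags = [tag for _, tag in pairs]
    rules = sorted(VP_RULES, key=len, reverse=True)  # longest rule first
    phrases: List[List[str]] = []
    i = 0
    while i < len(tags):
        for rule in rules:
            if tuple(tags[i:i + len(rule)]) == rule:
                phrases.append([word for word, _ in pairs[i:i + len(rule)]])
                i += len(rule)  # continue after the matched phrase
                break
        else:
            i += 1  # no rule starts here; read the next word
    return phrases


sentence = ("ေမာင္ေမာင္/NPR/ ေက်ာင္း/NCCS/ သို႔/PODIR/ လ်င္ၿမန္စြာ/ADVM/ "
            "ခပ္သုတ္သုတ္/ADVM/ သြား/HV/ ေန/PAVPC/ သည္/POVP/")
print(max_match_vps(parse_tagged(sentence)))
```

On the example sentence the only match starts at လ်င္ၿမန္စြာ (ADVM) and covers the five words through သည္, the same verb phrase as the system output. Trying the longest rule first prevents a shorter rule such as `("HV", "POVP")` from cutting the phrase short.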
5. Verb Phrase in Myanmar Language
A verb phrase consists of some adverbial modifiers followed by the head (root) verb and its complements. In English, every verb must appear in one of five possible forms: base, simple present, simple past, present participle and past participle. Auxiliary and modal verbs usually take a verb phrase as a complement, which produces a sequence of verbs that forms the tense system.
The root forms be and have, the modal auxiliaries such as the present and past forms of do (did), can (could), may (might), shall (should) and will (would), and must, need and dare are auxiliary verbs. Here, "be" and "have" can be either auxiliary or main verbs, and the two uses have separate properties. The auxiliary be requires a following verb phrase in present-participle form (or, in the passive, past-participle form), whereas the main verb be requires a noun phrase, prepositional phrase, adjective phrase or adverb phrase complement. The auxiliary have requires a past-participle verb phrase, whereas the main verb have takes a noun phrase complement. English sentences typically contain a sequence of auxiliary verbs followed by a main verb. Auxiliary verbs can be used in declarative sentences, negative sentences and yes/no questions. The structure of a Myanmar verb phrase is:
ဦးေဆာင္ၾကိယာ (Root Verb) + ၾကိယာဝိဘတ္ (Verb Preposition) [3].
Example: သြား သည္။ (go)
Myanmar verb phrases can be divided into two types:
I. အေၿခခံၾကိယာပုဒ္ (Basic Myanmar Verb Phrase)
II. တိုးခ်ဲ့ၾကိယာပုဒ္ (Extended Myanmar Verb Phrase)
5.1 Basic Myanmar Verb Phrase
The basic verb phrase consists of a root verb and a verb preposition. The root verb may be an action verb, a state verb or a compound verb.
Example: ေမာင္ေမာင္ေက်ာင္းသြားသည္။ (Mg Mg goes to school.)
အေၿခခံၾကိယာပုဒ္ → ဦးေဆာင္ၾကိယာ + ၾကိယာ၀ိဘတ္
BV → HV + POVP
သြား သည္။ (go)
5.2 Extended Myanmar Verb Phrase
The extended verb phrase is based on the basic verb phrase and is extended with verb modifiers. There are four types of extended verb phrase. The general rule is:
တိုးခ်ဲ့ၾကိယာပုဒ္ → ၾကိယာအထူးၿပဳ +/- အေလးအနက္ၿပဳ +/- အၿငင္း၀ိဘတ္ + ဦးေဆာင္ၾကိယာ +/- ၾကိယာေထာက္ တစ္ခု/တစ္ခုထက္ပုိ + ၾကိယာ၀ိဘတ္ တစ္ခု/တစ္ခုထက္ပို
Extended Myanmar Verb Phrase type (1)
In extended verb phrase type (1), one or more adverbs come before the head verb and one or more verb prepositions come after it.
တုိးခ်ဲ့ၾကိယာပုဒ္ → ၾကိယာအထူးၿပဳ + ဦးေဆာင္ၾကိယာ + ၾကိယာ၀ိဘတ္
EVP → ADV + HV + POVP
ေခ်ာေခ်ာေမာေမာ ေရာက္ သည္။ (arrive safely)
Extended Myanmar Verb Phrase type (2)
In extended verb phrase type (2), one or more verb particles and verb prepositions come after the head verb.
တုိးခ်ဲ့ၾကိယာပုဒ္ → ဦးေဆာင္ၾကိယာ + ၾကိယာေထာက္ပစၥည္း + ၾကိယာ၀ိဘတ္
EVP → HV + PAVP + POVP
စားေနသည္။ (is eating)
Extended Myanmar Verb Phrase type (3)
In extended verb phrase type (3), a negative particle can be placed before the head verb. If a verb particle follows the head verb, the negative particle may appear between the head verb and the verb particle. One or more verb prepositions can then follow.
တုိးခ်ဲ့ၾကိယာပုဒ္ → အၿငင္း၀ိဘတ္ + ဦးေဆာင္ၾကိယာ +/- ၾကိယာေထာက္ တစ္ခု/တစ္ခုထက္ပုိ + ၾကိယာ၀ိဘတ္ တစ္ခု/တစ္ခုထက္ပို
EVP → PANEG + HV + PAVPS +/- POVP
မ အိပ္ ခ်င္ေသး ဘူး။ (don't want to sleep yet)
Extended Myanmar Verb Phrase type (4)
In extended verb phrase type (4), one or more verb modifiers come before the head verb and one or more verb prepositions come after it.
တိုးခ်ဲ့ၾကိယာပုဒ္ → အေလးအနက္ၿပဳ + ဦးေဆာင္ၾကိယာ + ၾကိယာ၀ိဘတ္ တစ္ခု/တစ္ခုထက္ပို
EVP → ADVM + HV + POVP
အရမ္း ေကာင္းသည္။ (is very good)
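The basic pattern and the four extended patterns can be written as regular expressions over POS tags, which makes the differences between the types explicit. A sketch under stated assumptions: the tag names (HV, POVP, ADV, ADVM, PAVP, PAVPS, PANEG) come from the rule lines above, but reading "one or more" as `+` and "+/-" as optional is our interpretation of the notation, and `classify_vp` is a hypothetical helper rather than the authors' implementation.

```python
import re
from typing import List, Optional

# One pattern per verb-phrase type, written over space-separated POS tags.
# Reading "one or more" as "+" and "+/-" as optional is an assumption about
# the rule notation, not the authors' implementation.
VP_TYPES = [
    ("BVP", r"HV( POVP)+"),
    ("EVP type (1)", r"ADV( ADV)* HV( POVP)+"),
    ("EVP type (2)", r"HV( PAVP)+( POVP)+"),
    ("EVP type (3)", r"PANEG HV( PAVPS)*( POVP)*"),
    ("EVP type (4)", r"ADVM( ADVM)* HV( POVP)+"),
]


def classify_vp(tags: List[str]) -> Optional[str]:
    """Return the first verb-phrase type whose pattern covers the whole tag sequence."""
    sequence = " ".join(tags)
    for name, pattern in VP_TYPES:
        if re.fullmatch(pattern, sequence):
            return name
    return None


print(classify_vp(["PANEG", "HV", "PAVPS", "POVP"]))  # EVP type (3)
```

`re.fullmatch` is used so that a pattern must cover the whole tag sequence; with `re.match`, the tags of "ADVM HV POVP" would, for example, be wrongly accepted by the prefix "ADV" of type (1).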
6. Translation with Morphological Analysis for Myanmar Verb Phrase
In Myanmar, a verb does not change its form based on the gender of the subject or object; rather, it changes with respect to tense, aspect, modality and number only. Including spelling variants, there are 38 inflected forms of a root verb in Myanmar. Table 4.1 lists the tense suffixes for these different forms. As stated before, Myanmar verb morphology has some non-linear characteristics: the root often changes its form when certain tense suffixes are added to it, and on many occasions it varies non-linearly. For example, the verb စား (eat), when followed by the suffix "ေန သည္" (present continuous), becomes "စားေနသည္"; when followed by the suffix "ခဲ့ သည္" (simple past), it becomes "စားခဲ့သည္"; with the suffix "ၿပီးၿပီ" (completed action, "has eaten"), it becomes "စားၿပီးၿပီ"; with the negative prefix "မ", it becomes "မစားပါ" (does/do not eat); and with the suffix "ၾက" (plural subject), it becomes "စားၾကသည္". Similarly, the verb "အလုပ္လုပ္" (work), when followed by the suffix "ေန" (nay, present continuous), becomes "အလုပ္လုပ္ေနသည္". But with the negative prefix "မ" (ma), it becomes "အလုပ္မလုပ္ပါ" (does/do not work). Thus, adding the prefix "မ" changes the root form "အလုပ္လုပ္" to "အလုပ္မလုပ္", which is an indication of non-linearity.
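The non-linear behaviour described above suggests analysing a verb form in two passes: strip a known suffix, and if the remainder is not a known root, try deleting the negative particle "မ" wherever it occurs. A minimal sketch under stated assumptions: `SUFFIXES` and `ROOTS` are hypothetical miniature tables built only from the examples in this section, the strings are Zawgyi-encoded like the paper's examples, and the feature labels are ours; it is not the paper's 38-form analyzer.

```python
from typing import List, Optional, Tuple

# Hypothetical suffix table built only from the examples above (Zawgyi-encoded).
SUFFIXES = {
    "ေနသည္": "present continuous",
    "ခဲ့သည္": "simple past",
    "ၾကသည္": "plural subject",
    "သည္": "plain ending",
    "ပါ": "sentence-final particle",
}
NEGATIVE = "မ"
ROOTS = {"စား": "eat", "အလုပ္လုပ္": "work"}  # hypothetical mini-lexicon


def analyze(form: str) -> Optional[Tuple[str, List[str]]]:
    """Return (root, features) for an inflected verb form, or None."""
    features: List[str] = []
    for suffix in sorted(SUFFIXES, key=len, reverse=True):  # longest suffix first
        if form.endswith(suffix):
            features.append(SUFFIXES[suffix])
            form = form[: -len(suffix)]
            break
    if form in ROOTS:
        return form, features
    # "မ" may be a prefix (မစား) or sit inside a compound root (အလုပ္မလုပ္),
    # so try deleting it at every position and check the lexicon.
    for i in range(len(form)):
        if form.startswith(NEGATIVE, i):
            candidate = form[:i] + form[i + len(NEGATIVE):]
            if candidate in ROOTS:
                return candidate, features + ["negative"]
    return None


print(analyze("အလုပ္မလုပ္ပါ"))
```

Checking suffixes longest first matters because "ခဲ့သည္" and "ေနသည္" both end in "သည္"; matching the shorter suffix first would leave "ခဲ့" or "ေန" attached to the root.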
Myanmar verbs can be divided into three main categories: individual verbs, compound verbs and adjective verbs. For example: individual verb: စားသည္ 'eat'; compound verb: ေျပးဖက္သည္ 'run and hug'; adjective verb: ေပ်ာ္သည္ (pw-ti) 'is happy'. Some verbs can be used to support other verbs. For example, ေျပာသည္ 'tell' and ေပးသည္ 'give' are individual verbs and can be used as main verbs in sentences. But in the verb ေျပာေပးသည္ 'tell', ေပး 'give' is not the main verb; it behaves as a particle supporting the main verb ေျပာ 'tell'.
[Figure: Verb Phrase, divided into Basic Verb Phrase and Extended Verb Phrase; Extended Verb Phrase is divided into Extended verb Phrase types (1), (2), (3) and (4)]
A Myanmar compound verb can include more than two individual English verbs. For example, the three individual verbs ၾကြေရာက္ 'come', အားေပး 'encourage' and ခ်ီးျမွင့္ 'award' are combined in the compound verb ၾကြေရာက္အားေပးခ်ီးျမွင့္သည္ 'come, encourage and award', a single Myanmar compound verb that corresponds to three individual English verbs. The verb particle ၾက can be omitted in a sentence; for example, ေက်ာင္းသားမ်ား ကစားေန ၾကသည္။ and ေက်ာင္းသားမ်ား ကစားေန သည္။ both mean 'Students are playing.' Compound verbs pose special problems for the robustness of a translation method because the compound itself must be represented in the training data; the occurrence of each of its components is not enough.
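One way to look up a compound verb stem in the bilingual lexicon is greedy maximum matching: at each position, take the longest lexicon entry that matches. The sketch below is illustrative only: `LEXICON` is a hypothetical mini-lexicon whose first three entries come from the example above and whose "ေပး" entry comes from the text; "အား" with the gloss "strength" is an assumed entry added solely to show why the longest match must win.

```python
from typing import Dict, List, Optional

# Hypothetical bilingual mini-lexicon (Zawgyi-encoded). The "အား" entry and its
# gloss are assumptions, included only to show why the longest match must win.
LEXICON: Dict[str, str] = {
    "ၾကြေရာက္": "come",
    "အားေပး": "encourage",
    "ခ်ီးျမွင့္": "award",
    "ေပး": "give",
    "အား": "strength",
}


def split_compound(text: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Greedy longest-match (maximum matching) split of a compound verb stem."""
    longest = max(len(entry) for entry in lexicon)
    pieces: List[str] = []
    i = 0
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):  # longest candidate first
            if text[i:i + n] in lexicon:
                pieces.append(text[i:i + n])
                i += n
                break
        else:
            return None  # some part of the stem is not in the lexicon
    return pieces


parts = split_compound("ၾကြေရာက္အားေပးခ်ီးျမွင့္", LEXICON)
print([LEXICON[p] for p in parts])  # ['come', 'encourage', 'award']
```

A shortest-first scan would instead split အားေပး into အား + ေပး ('strength', 'give'), the same kind of component-by-component mistranslation that the compound-verb problem above describes.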
7. Markov Model
Markov models have been widely used in many Natural Language Processing tasks, such as POS tagging, spell checking, machine translation, automatic text summarization, information retrieval (IR) and automatic text extraction. This system develops a Markov model to identify Myanmar verb phrases based on a predefined set of 200 Myanmar grammar rules under the rule-based maximum matching approach. The model was constructed from nearly 2,000 simple and complex sentences.
Figure 3: Markov Model for MVP
PANG={မ}
V= {သြား, စား, အိပ္, ခ်က္ၿပဳတ္, ကန္, ခြဲ,..}
C= { ၍,နွင့္..}
POREP= {တြင္, မွာ , နဲ႔,…}
ADV= {လ်င္ၿမန္စြာ, ခင္ခင္မင္မင္,အေၿပးအလြား,… }
PAVP= {ခဲ့ ,လိမ့္, ေန,…}
POVP= {သည္ ,မည္ ,ဘူးလား ,ဘူး ,ပါ ,ဧ။္,..}
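The section does not say how the Markov model's probabilities are estimated. A minimal sketch, assuming a first-order Markov chain over the state labels of Figure 3 (V, ADV, PANG, PAVP, POVP, C, POREP) with maximum-likelihood transition estimates; the four training sequences are invented for illustration and are not the paper's data.

```python
from collections import Counter, defaultdict
from typing import Dict, List

START, END = "<s>", "</s>"


def train(sequences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood first-order transition probabilities between states."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        prev = START
        for tag in seq + [END]:
            counts[prev][tag] += 1
            prev = tag
    model = {}
    for prev, following in counts.items():
        total = sum(following.values())
        model[prev] = {tag: n / total for tag, n in following.items()}
    return model


def sequence_probability(model: Dict[str, Dict[str, float]], seq: List[str]) -> float:
    """Product of transition probabilities; unseen transitions score 0."""
    prob, prev = 1.0, START
    for tag in seq + [END]:
        prob *= model.get(prev, {}).get(tag, 0.0)
        prev = tag
    return prob


# Invented training sequences over the state labels of Figure 3.
training = [
    ["V", "POVP"],
    ["ADV", "V", "POVP"],
    ["V", "PAVP", "POVP"],
    ["PANG", "V", "PAVP", "POVP"],
]
model = train(training)
print(sequence_probability(model, ["V", "POVP"]))  # 0.25
```

A sequence that never occurs in training, such as POVP followed by V, scores 0; a real system would add smoothing, which is omitted here for clarity.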
8. Algorithm for Myanmar Verb Phrase Identification
Input: A = {word1/tag1, word2/tag2, …, wordn/tagn} // segmented Myanmar sentence with parts of speech
Output: the Myanmar verb phrase, translated into a proper English verb phrase using the Myanmar-English bilingual lexicon
Begin
Steps:
1. Read input sentence A.
2. Tokenize A on "/" and store the tokens in the array Input[0..m-1], so that words and tags alternate. Set k = -1.
3. For (s = 0; s < m; s++)
3.1 If (Input[s] == "VAC" || Input[s] == "VST" || Input[s] == "VCP") then // VAC: action verb, VST: state verb, VCP: compound verb
k = s; // remember the position of the verb tag
EndIf
EndFor
4. If (k ≥ 0 && (Input[k+2] == "POVP" || Input[k+4] == "POVP")) then
Input[k] = "HV"; // mark the head verb
Identify the Myanmar verb phrase;
EndIf
5. Display the Myanmar verb phrase.
End.
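Steps 1 to 5 can be turned into a short runnable function. This is a sketch, not the authors' implementation: it assumes that splitting the slash-separated input on "/" leaves words and tags alternating, keeps the last verb tag found (as the loop does), and, since the algorithm does not say where the phrase ends, returns the words from the head verb through the word tagged POVP. The tag PRN in the usage line is a hypothetical pronoun tag.

```python
from typing import List, Optional

VERB_TAGS = {"VAC", "VST", "VCP"}  # action, state and compound verbs


def identify_vp(sentence: str) -> Optional[List[str]]:
    """Find the last verb tag and confirm that a POVP tag follows it."""
    tokens = [t.strip() for t in sentence.split("/") if t.strip()]
    k = -1
    for s in range(1, len(tokens), 2):  # tags sit at odd positions
        if tokens[s] in VERB_TAGS:
            k = s  # the last verb found becomes the head verb (HV)
    if k < 0:
        return None
    for offset in (2, 4):  # Input[k+2] or Input[k+4] must be POVP
        if k + offset < len(tokens) and tokens[k + offset] == "POVP":
            # Assumption: the phrase spans the head verb through the POVP word.
            return tokens[k - 1:k + offset:2]
    return None


print(identify_vp("သူ/PRN/ အိပ္/VAC/ ေန/PAVP/ သည္/POVP/"))
```

The slice `tokens[k - 1:k + offset:2]` steps over the tags and keeps only the words, so a verb with one intervening particle yields three words and a verb followed directly by its POVP yields two.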
9. Experimental Results
The proposed system uses nearly 2,000 training sentences and 1,500 test sentences. The Myanmar3 font is used for Myanmar text. The sentences contain 5 to 35 words. We divided the sentences into simple sentences and complex sentences. The simple sentences are declarative, negative and interrogative. The three types of complex sentences are joined with particles, adjectives and adverbs, respectively. The accuracy of verb phrase identification is calculated with the well-known measures precision, recall and F-measure in equations (1), (2) and (3). The system ignores word order. It still has some limitations on certain simple and complex sentences.
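Equations (1), (2) and (3) did not survive text extraction. Assuming the paper uses the standard definitions, counted over verb phrases (the exact counting unit is not stated in this text), they read:

```latex
\begin{align*}
\text{Precision} &= \frac{\#\,\text{correctly identified verb phrases}}{\#\,\text{verb phrases identified by the system}} \tag{1}\\
\text{Recall} &= \frac{\#\,\text{correctly identified verb phrases}}{\#\,\text{verb phrases in the reference}} \tag{2}\\
F &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}
\end{align*}
```

For simple sentences, Table 2's precision of 0.97 and recall of 0.83 give F = 2(0.97)(0.83) / (0.97 + 0.83) = 1.6102 / 1.80 ≈ 0.89, which matches the reported F-measure.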
Table 1: Evaluation Results for Verb Phrase Identification
[Chart: % for simple and complex sentences; series: Simple sentence, Complex sentence]
Table 2: Evaluation results for Verb Phrase identification
Type of sentences | No. of sentences | Precision | Recall | F-measure
Simple sentences | 1000 | 0.97 | 0.83 | 0.89
Complex sentences | 500 | 0.94 | 0.66 | 0.77
9.1 Error Analysis
The errors of the proposed system are as follows. A compound verb can have two meanings: for example, သြားစားခဲ့သည္ combines သြားခဲ့ (went) and စားခဲ့သည္ (ate) and means "went and ate". Although our system can translate the parts as သြားသည္ (go) and စားခဲ့သည္ (ate), it has difficulty producing the correct translation "went and ate" for သြားစားခဲ့သည္. Some verbs support the preceding verb: in ေၿပာေပးသည္, ေပး alone means "give", but the correct translation of the whole is "talk". Besides this, negative inflections of verbs cause more errors, because the Myanmar negative particle "မ" can appear as a prefix or in the middle of the verb stem, as in "မေၿပာဘူး" (not tell) and "နားမေထာင္ဘူး" (not listen). In the latter case, "နားမေထာင္ဘူး" is analyzed as "နား" and "မေထာင္ဘူး", which means "ear" and "not stand".
Adjectives show the same error as negative verb inflection. For example, the negative form of ရိုေသ (respectful) can be "မရိုေသ" or "မရိုမေသ" (not respectful). Although the analyzer handles "မရုိေသ" without problem, an error occurs for "မရိုမေသ".
10. Conclusion
Phrase identification is one of the most critical and widely studied research areas in Natural Language Processing (NLP). Identifying verb phrases within a sentence is useful for a variety of NLP applications. In this paper, a Myanmar verb phrase identification algorithm is proposed, with a Markov model developed to show statistical results. In the experiments, the proposed algorithm achieves a precision of 0.97 and an F-measure of 0.89 on simple sentences, and a precision of 0.94 and an F-measure of 0.77 on complex sentences. As future work, after identifying the Myanmar verb phrase, the system will translate it into an English verb phrase using the Myanmar-English bilingual lexicon. The design and algorithm of the Myanmar verb phrase identification and translation system developed in this research can be extended in further research directions in NLP and IR, such as text categorization, document summarization, question answering, query processing and document ranking in search engine development.
References
[1] Fridah Katushemererwe et al., "Finite State Methods in Morphological Analysis of Runyakitara Verbs", Nordic Journal of African Studies.
[2] Sharon Goldwater and David McClosky, "Improving Statistical MT through Morphological Analysis", Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp. 676-683, Vancouver, October 2005.
[3] J. Okell and A. Allott, "Burmese/Myanmar Dictionary of Grammatical Forms".
[4] P. Koehn, F. J. Och et al., "Statistical Phrase-Based Translation", Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 868-876, Prague.
[5] Changhyun Kim et al., "Verb Pattern Based Korean-Chinese Machine Translation System".
[6] Kamaljeet Kaur Batra and G. S. Lehal, "Rule-Based Machine Translation of Noun Phrase from Punjabi to English", International Journal of Computer Science Issues, 2010.
[7] Md. Musfique Anwaar, Mohammad Aabed Anwar et al., "Syntax Analysis and Machine Translation of Bangla Sentences", Dept. of Computer Science & Engineering, Jahangirnagar University, Bangladesh.
[8] M. Selvam et al., "Improvement of Rule-Based Morphological Analysis and POS Tagging in Tamil Language via Projection and Induction Techniques", International Journal of Computers, Issue 4, Volume 3, 2009.
[9] Nguyen et al., "Improving Phrase-Based SMT with Morpho-Syntactic Analysis and Transformation", Proceedings of the Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, University of Maryland, College Park, MD, pp. 20-28.
[10] N. K. Choudhary, "Developing a Computational Framework for the Verb Morphology of Great Andamanese", Centre for Linguistics in India, JNU, 2006.
[11] Wajid Ali et al., "A Hybrid Approach to Urdu Verb Phrase Chunking", Department of the Myanmar Language Commission, Ministry of Education, Union of Myanmar, 2005.
Author Profile
Thae Thae Soe received the B.C.Sc. and M.C.Sc. degrees from the University of Computer Studies, Mandalay, Myanmar, in 2004 and 2008, respectively. The author is also an assistant lecturer and a Ph.D. candidate at the University of Computer Studies, Mandalay, with a research focus on, and strong interest in, Natural Language Processing (NLP).