This paper describes a context free grammar (CFG) based grammatical relations for Myanmar
sentences which combine corpus-based function tagging system. Part of the challenge of
statistical function tagging for Myanmar sentences comes from the fact that Myanmar has freephrase-order
and a complex morphological system. Function tagging is a pre-processing step to
show grammatical relations of Myanmar sentences. In the task of function tagging, which tags
the function of Myanmar sentences with correct segmentation, POS (part-of-speech) tagging
and chunking information, we use Naive Bayesian theory to disambiguate the possible function
tags of a word. We apply context free grammar (CFG) to find out the grammatical relations of
the function tags. We also create a functional annotated tagged corpus for Myanmar and propose the grammar rules for Myanmar sentences. Experiments show that our analysis achieves a good result with simple sentences and complex sentences.
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGkevig
This paper describes the use of Naive Bayes to address the task of assigning function tags and context free
grammar (CFG) to parse Myanmar sentences. Part of the challenge of statistical function tagging for
Myanmar sentences comes from the fact that Myanmar has free-phrase-order and a complex
morphological system. Function tagging is a pre-processing step for parsing. In the task of function tagging, we use the functional annotated corpus and tag Myanmar sentences with correct segmentation, POS (part-of-speech) tagging and chunking information. We propose Myanmar grammar rules and apply context free grammar (CFG) to find out the parse tree of function tagged Myanmar sentences. Experiments
show that our analysis achieves a good result with parsing of simple sentences and three types of complex sentences.
A survey on phrase structure learning methods for text classificationijnlc
Text classification is a task of automatic classification of text into one of the predefined categories. The
problem of text classification has been widely studied in different communities like natural language
processing, data mining and information retrieval. Text classification is an important constituent in many
information management tasks like topic identification, spam filtering, email routing, language
identification, genre classification, readability assessment etc. The performance of text classification
improves notably when phrase patterns are used. The use of phrase patterns helps in capturing non-local
behaviours and thus helps in the improvement of text classification task. Phrase structure extraction is the
first step to continue with the phrase pattern identification. In this survey, detailed study of phrase structure
learning methods have been carried out. This will enable future work in several NLP tasks, which uses
syntactic information from phrase structure like grammar checkers, question answering, information
extraction, machine translation, text classification. The paper also provides different levels of classification
and detailed comparison of the phrase structure learning methods.
A COMPARATIVE STUDY OF FEATURE SELECTION METHODSkevig
This article focuses on evaluating and comparing the available feature selection methods in general versatility regarding authorship attribution problems and tries to identify which method is the most effective. The discussions on general versatility of feature selection methods and its connection in selecting the appropriate features for varying data were done. In addition, different languages, different types of features, different systems for calculating the accuracy of SVM (support vector machine), and different criteria for determining the rank of feature selection methods were used to measure the general versatility of these methods together. The analysis results indicate the best feature selection method is different for each dataset; however, some methods can always extract useful information to discriminate the classes. The chi-square was proved to be a better method overall.
The important problem of word segmentation in Thai language is sentential noun phrase. The existing
studies try to minimize the problem. But there is no research that solves this problem directly. This study
investigates the approach to resolve this problem using conditional random fields which is a probabilistic
model to segment and label sequence data. The results present that the corrected data of noun phrase was
detected more than 78.61 % based on our technique.
A COMPARATIVE STUDY OF FEATURE SELECTION METHODSijnlc
Text analysis has been attracting increasing attention in this data era. Selecting effective features from datasets is a particular important part in text classification studies. Feature selection excludes irrelevant features from the classification task, reduces the dimensionality of a dataset, and improves the accuracy and performance of identification. So far, so many feature selection methods have been proposed, however,
it remains unclear which method is the most effective in practice. This article focuses on evaluating and comparing the available feature selection methods in general versatility regarding authorship attribution problems and tries to identify which method is the most effective. The discussions on general versatility of feature selection methods and its connection in selecting the appropriate features for varying data were
done. In addition, different languages, different types of features, different systems for calculating the accuracy of SVM (support vector machine), and different criteria for determining the rank of feature selection methods were used to measure the general versatility of these methods together. The analysis
results indicate the best feature selection method is different for each dataset; however, some methods can always extract useful information to discriminate the classes. The chi-square was proved to be a better method overall.
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans.In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...cscpconf
Source and target word segmentation and alignment is a primary step in the statistical learning of a Transliteration. Here, we analyze the benefit of a syllable-like segmentation approach for learning a transliteration from English to an Indic language, which aligns the training set word pairs in terms of sub-syllable-like units instead of individual character units. While this has been found useful in the case of dealing with Out-of-vocabulary words in English-Chinese in the presence of multiple target dialects, we asked if this would be true for Indic languages which are simpler in their phonetic representation and pronunciation. We expected this syllable-like method to perform marginally better, but we found instead that even though our proposed approach improved the Top-1 accuracy, the individual-character-unit alignment model
somewhat outperformed our approach when the Top-10 results of the system were re-ranked using language modeling approaches. Our experiments were conducted for English to Telugu transliteration (our method will apply equally well to most written Indic languages); our training consisted of a syllable-like segmentation and alignment of a large training set, on which we built a statistical model by modifying a previous character-level maximum entropy based Transliteration learning system due to Kumaran and Kellner; our testing consisted of using the same segmentation of a test English word, followed by applying the model, and reranking the resulting top 10 Telugu words. We also report the dataset creation and selection since standard datasets are not available.
A legal text usually long and complicated, it has some characteristic that make it different from other dailyuse
texts.Then, translating a legal text is generally considered to be difficult. This paper introduces an approach to split a legal sentence based on its logical structure and presents selecting appropriate
translation rules to improve phrase reordering of legal translation. We use a method which divides a English legal sentence based on its logical structure to segment the legal sentence. New features with rich linguistic and contextual information of split sentences are proposed to rule selection. We apply maximum entropy to combine rich linguistic and contextual information and integrate the features of split sentences into the legal translation, tree-based SMT system. We obtain improvements in performance for legal translation from English to Japanese over Moses and Moses-chart systems.
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGkevig
This paper describes the use of Naive Bayes to address the task of assigning function tags and context free
grammar (CFG) to parse Myanmar sentences. Part of the challenge of statistical function tagging for
Myanmar sentences comes from the fact that Myanmar has free-phrase-order and a complex
morphological system. Function tagging is a pre-processing step for parsing. In the task of function tagging, we use the functional annotated corpus and tag Myanmar sentences with correct segmentation, POS (part-of-speech) tagging and chunking information. We propose Myanmar grammar rules and apply context free grammar (CFG) to find out the parse tree of function tagged Myanmar sentences. Experiments
show that our analysis achieves a good result with parsing of simple sentences and three types of complex sentences.
A survey on phrase structure learning methods for text classificationijnlc
Text classification is a task of automatic classification of text into one of the predefined categories. The
problem of text classification has been widely studied in different communities like natural language
processing, data mining and information retrieval. Text classification is an important constituent in many
information management tasks like topic identification, spam filtering, email routing, language
identification, genre classification, readability assessment etc. The performance of text classification
improves notably when phrase patterns are used. The use of phrase patterns helps in capturing non-local
behaviours and thus helps in the improvement of text classification task. Phrase structure extraction is the
first step to continue with the phrase pattern identification. In this survey, detailed study of phrase structure
learning methods have been carried out. This will enable future work in several NLP tasks, which uses
syntactic information from phrase structure like grammar checkers, question answering, information
extraction, machine translation, text classification. The paper also provides different levels of classification
and detailed comparison of the phrase structure learning methods.
A COMPARATIVE STUDY OF FEATURE SELECTION METHODSkevig
This article focuses on evaluating and comparing the available feature selection methods in general versatility regarding authorship attribution problems and tries to identify which method is the most effective. The discussions on general versatility of feature selection methods and its connection in selecting the appropriate features for varying data were done. In addition, different languages, different types of features, different systems for calculating the accuracy of SVM (support vector machine), and different criteria for determining the rank of feature selection methods were used to measure the general versatility of these methods together. The analysis results indicate the best feature selection method is different for each dataset; however, some methods can always extract useful information to discriminate the classes. The chi-square was proved to be a better method overall.
The important problem of word segmentation in Thai language is sentential noun phrase. The existing
studies try to minimize the problem. But there is no research that solves this problem directly. This study
investigates the approach to resolve this problem using conditional random fields which is a probabilistic
model to segment and label sequence data. The results present that the corrected data of noun phrase was
detected more than 78.61 % based on our technique.
A COMPARATIVE STUDY OF FEATURE SELECTION METHODSijnlc
Text analysis has been attracting increasing attention in this data era. Selecting effective features from datasets is a particular important part in text classification studies. Feature selection excludes irrelevant features from the classification task, reduces the dimensionality of a dataset, and improves the accuracy and performance of identification. So far, so many feature selection methods have been proposed, however,
it remains unclear which method is the most effective in practice. This article focuses on evaluating and comparing the available feature selection methods in general versatility regarding authorship attribution problems and tries to identify which method is the most effective. The discussions on general versatility of feature selection methods and its connection in selecting the appropriate features for varying data were
done. In addition, different languages, different types of features, different systems for calculating the accuracy of SVM (support vector machine), and different criteria for determining the rank of feature selection methods were used to measure the general versatility of these methods together. The analysis
results indicate the best feature selection method is different for each dataset; however, some methods can always extract useful information to discriminate the classes. The chi-square was proved to be a better method overall.
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans.In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...cscpconf
Source and target word segmentation and alignment is a primary step in the statistical learning of a Transliteration. Here, we analyze the benefit of a syllable-like segmentation approach for learning a transliteration from English to an Indic language, which aligns the training set word pairs in terms of sub-syllable-like units instead of individual character units. While this has been found useful in the case of dealing with Out-of-vocabulary words in English-Chinese in the presence of multiple target dialects, we asked if this would be true for Indic languages which are simpler in their phonetic representation and pronunciation. We expected this syllable-like method to perform marginally better, but we found instead that even though our proposed approach improved the Top-1 accuracy, the individual-character-unit alignment model
somewhat outperformed our approach when the Top-10 results of the system were re-ranked using language modeling approaches. Our experiments were conducted for English to Telugu transliteration (our method will apply equally well to most written Indic languages); our training consisted of a syllable-like segmentation and alignment of a large training set, on which we built a statistical model by modifying a previous character-level maximum entropy based Transliteration learning system due to Kumaran and Kellner; our testing consisted of using the same segmentation of a test English word, followed by applying the model, and reranking the resulting top 10 Telugu words. We also report the dataset creation and selection since standard datasets are not available.
A legal text usually long and complicated, it has some characteristic that make it different from other dailyuse
texts.Then, translating a legal text is generally considered to be difficult. This paper introduces an approach to split a legal sentence based on its logical structure and presents selecting appropriate
translation rules to improve phrase reordering of legal translation. We use a method which divides a English legal sentence based on its logical structure to segment the legal sentence. New features with rich linguistic and contextual information of split sentences are proposed to rule selection. We apply maximum entropy to combine rich linguistic and contextual information and integrate the features of split sentences into the legal translation, tree-based SMT system. We obtain improvements in performance for legal translation from English to Japanese over Moses and Moses-chart systems.
SYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGEijnlc
Named Entity Recognition (NER) for Myanmar Language is essential to Myanmar natural language processing research work. In this work, NER for Myanmar language is treated as a sequence tagging problem and the effectiveness of deep neural networks on NER for Myanmar language has been investigated. Experiments are performed by applying deep neural network architectures on syllable level Myanmar contexts. Very first manually annotated NER corpus for Myanmar language is also constructed and proposed. In developing our in-house NER corpus, sentences from online news website and also sentences supported from ALT-Parallel-Corpus are also used. This ALT corpus is one part of the Asian Language Treebank (ALT) project under ASEAN IVO. This paper contributes the first evaluation of neural network models on NER task for Myanmar language. The experimental results show that those neural sequence models can produce promising results compared to the baseline CRF model. Among those neural architectures, bidirectional LSTM network added CRF layer above gives the highest F-score value. This work also aims to discover the effectiveness of neural network approaches to Myanmar textual processing as well as to promote further researches on this understudied language.
A survey on sentence fusion techniques of abstractive text summarizationIJERA Editor
Sentence fusion is one of the tasks of automatic text summarization that has wide spread applications in the field of computer science. It is currently used in automatic text summarization and question answering systems. It is also used in natural language generation systems in the name of sentence aggregation. The task of sentence fusion poses various challenges like deciding whether two sentences can be combined or not, which part of a sentence should be selected for combination, how these parts can be combined and how to fit the combinations in a grammatically correct sentence structure. In this paper we discuss about the research done on sentence fusion and various challenges that are yet to be met.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A comparative study on term weighting methods for automated telugu text categ...IJDKP
Automatic Text categorization refers to the process of assigning a category or some categories
automatically among predefined ones. Text categorization is challenging in Indian languages has rich in
morphology, a large number of word forms and large feature spaces. This paper investigates the
performance of different classification approaches using different term weighting approaches in order to
decide the most applicable one to Telugu text classification problem. We have investigated on different
term weighting methods for Telugu corpus in combination with Naive Bayes ( NB), Support Vector
Machine (SVM) and k Nearest Neighbor (kNN) classifiers.
Review of Various Text Categorization Methodsiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Suitability of naïve bayesian methods for paragraph level text classification...ijaia
The amount of data present online is growing very rapidly, hence a need for organizing and categorizing
data has become an obvious need. The Information Retrieval (IR) techniques act as an aid in assisting
users in obtaining relevant information. IR in the Indian context is very relevant as there are several blogs,
news publications in Indian languages present online. This work looks at the suitability of Naïve Bayesian
methods for paragraph level text classification in the Kannada language. The Naïve Bayesian methods are
the most primitive algorithms for Text Categorization tasks. We apply dimensionality reduction technique
using Minimum term frequency, stop word identification and elimination methods for achieving the task. It
is evident that Naïve Bayesian Multinomial model outperforms simple Naïve Bayesian approach in
paragraph classification tasks.
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESacijjournal
Named Entity Recognition which is an important subject of Natural Language Processing is a key technology of information extraction, information retrieval, question answering and other text processing applications. In this study, we evaluate previously well-established association measures as an initial
attempt to extract two-worded named entities in a Turkish corpus. Furthermore we propose a new association measure, and compare it with the other methods. The evaluation of these methods is performed by precision and recall measures.
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...Lifeng (Aaron) Han
Authors: Aaron Li-Feng Han, Derek Wong, Lidia S. Chao, Yervant Ho, Yi Lu, Anson Xing, Samuel Zeng
Proceedings of the 14th biennial International Conference of Machine Translation Summit (MT Summit 2013) pp. 215-222. Nice, France. 2 - 6 September 2013. Open tool https://github.com/aaronlifenghan/aaron-project-hlepor (Machine Translation Archive)
The computer is based on natural language on
summarization and machine system. It is very
difficult for human being manually summarize
large amount of text. The greatest challenge
for text summarization to summarize convent
from number of textual and semi structured
sources, including text , HTML page, portable
document file . This summarization can be
determine from internal and external measure.
Our proposed work sentence similarity based
text summarization using clusters help in
finding subjective question and answer on
internet . This work provide short units of text
that belong to similar information. I proposed
my work on sentence similarity-based
computation that helps to experiment for
similar text computation. Extractive
summarization text system choosing a subset of
the similar group from the text. proposal work I
used the part of speech, proper noun, verb,
pronouns such as he, she, and they etc. With
the help of part of speech we find important
sentence using a statistical method like proper
noun and sentence similarity system. It based
on internet information that contain picky
sentence.
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...dannyijwest
Considerable research in the field of ontology matching has been performed where information sharing
and reuse becomes necessary in ontology development. Measurement of lexical similarity in ontology
matching is performed using synset, defined in WordNet. In this paper, we defined a Super Word Set,
which is an aggregate set that includes hypernym, hyponym, holonym, and meronym sets in WordNet.
The Super Word Set Similarity is calculated by the rate of words of concept name and synset’s words
inclusion in the Super Word Set. In order to measure of Super Word Set Similarity, we first extracted
Matched Concepts(MC), Matched Properties(MP) and Property Unmatched Concepts(PUC) from the
result of ontology matching. We compared these against two ontology matching tools – COMA++ and
LOM. The Super Word Set Similarity shows an average improvement of 12% over COMA++ and 19%
over LOM.
Improving accuracy of part-of-speech (POS) tagging using hidden markov model ...IJECEIAES
In Natural Language Processing (NLP), Word segmentation and Part-ofSpeech (POS) tagging are fundamental tasks. The POS information is also necessary in NLP’s preprocessing work applications such as machine translation (MT), information retrieval (IR), etc. Currently, there are many research efforts in word segmentation and POS tagging developed separately with different methods to get high performance and accuracy. For Myanmar Language, there are also separate word segmentors and POS taggers based on statistical approaches such as Neural Network (NN) and Hidden Markov Models (HMMs). But, as the Myanmar language's complex morphological structure, the OOV problem still exists. To keep away from error and improve segmentation by utilizing POS data, segmentation and labeling should be possible at the same time.The main goal of developing POS tagger for any Language is to improve accuracy of tagging and remove ambiguity in sentences due to language structure. This paper focuses on developing word segmentation and Part-of- Speech (POS) Tagger for Myanmar Language. This paper presented the comparison of separate word segmentation and POS tagging with joint word segmentation and POS tagging.
Taxonomy extraction from automotive natural language requirements using unsup...ijnlc
In this paper we present a novel approach to semi-automatically learn concept hierarchies from natural
language requirements of the automotive industry. The approach is based on the distributional hypothesis
and the special characteristics of domain-specific German compounds. We extract taxonomies by using
clustering techniques in combination with general thesauri. Such a taxonomy can be used to support
requirements engineering in early stages by providing a common system understanding and an agreedupon
terminology. This work is part of an ontology-driven requirements engineering process, which builds
on top of the taxonomy. Evaluation shows that this taxonomy extraction approach outperforms common
hierarchical clustering techniques.
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGkevig
This paper describes the use of Naive Bayes to address the task of assigning function tags and context free
grammar (CFG) to parse Myanmar sentences. Part of the challenge of statistical function tagging for
Myanmar sentences comes from the fact that Myanmar has free-phrase-order and a complex
morphological system. Function tagging is a pre-processing step for parsing. In the task of function
tagging, we use the functional annotated corpus and tag Myanmar sentences with correct segmentation,
POS (part-of-speech) tagging and chunking information. We propose Myanmar grammar rules and apply
context free grammar (CFG) to find out the parse tree of function tagged Myanmar sentences. Experiments
show that our analysis achieves a good result with parsing of simple sentences and three types of complex
sentences
Parsing of Myanmar Sentences With Function Taggingkevig
This paper describes the use of Naive Bayes to address the task of assigning function tags and context free
grammar (CFG) to parse Myanmar sentences. Part of the challenge of statistical function tagging for
Myanmar sentences comes from the fact that Myanmar has free-phrase-order and a complex
morphological system. Function tagging is a pre-processing step for parsing. In the task of function
tagging, we use the functional annotated corpus and tag Myanmar sentences with correct segmentation,
POS (part-of-speech) tagging and chunking information. We propose Myanmar grammar rules and apply
context free grammar (CFG) to find out the parse tree of function tagged Myanmar sentences. Experiments
show that our analysis achieves a good result with parsing of simple sentences and three types of complex
sentences.
International Journal of Engineering and Science Invention (IJESI) inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online
International Journal of Engineering and Science Invention (IJESI)inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online
SYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGEijnlc
Named Entity Recognition (NER) for Myanmar Language is essential to Myanmar natural language processing research work. In this work, NER for Myanmar language is treated as a sequence tagging problem and the effectiveness of deep neural networks on NER for Myanmar language has been investigated. Experiments are performed by applying deep neural network architectures on syllable level Myanmar contexts. Very first manually annotated NER corpus for Myanmar language is also constructed and proposed. In developing our in-house NER corpus, sentences from online news website and also sentences supported from ALT-Parallel-Corpus are also used. This ALT corpus is one part of the Asian Language Treebank (ALT) project under ASEAN IVO. This paper contributes the first evaluation of neural network models on NER task for Myanmar language. The experimental results show that those neural sequence models can produce promising results compared to the baseline CRF model. Among those neural architectures, bidirectional LSTM network added CRF layer above gives the highest F-score value. This work also aims to discover the effectiveness of neural network approaches to Myanmar textual processing as well as to promote further researches on this understudied language.
A survey on sentence fusion techniques of abstractive text summarizationIJERA Editor
Sentence fusion is one of the tasks of automatic text summarization that has wide spread applications in the field of computer science. It is currently used in automatic text summarization and question answering systems. It is also used in natural language generation systems in the name of sentence aggregation. The task of sentence fusion poses various challenges like deciding whether two sentences can be combined or not, which part of a sentence should be selected for combination, how these parts can be combined and how to fit the combinations in a grammatically correct sentence structure. In this paper we discuss about the research done on sentence fusion and various challenges that are yet to be met.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A comparative study on term weighting methods for automated telugu text categ...IJDKP
Automatic Text categorization refers to the process of assigning a category or some categories
automatically among predefined ones. Text categorization is challenging in Indian languages has rich in
morphology, a large number of word forms and large feature spaces. This paper investigates the
performance of different classification approaches using different term weighting approaches in order to
decide the most applicable one to Telugu text classification problem. We have investigated on different
term weighting methods for Telugu corpus in combination with Naive Bayes ( NB), Support Vector
Machine (SVM) and k Nearest Neighbor (kNN) classifiers.
Review of Various Text Categorization Methodsiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Suitability of naïve bayesian methods for paragraph level text classification...ijaia
The amount of data present online is growing very rapidly, hence a need for organizing and categorizing
data has become an obvious need. The Information Retrieval (IR) techniques act as an aid in assisting
users in obtaining relevant information. IR in the Indian context is very relevant as there are several blogs,
news publications in Indian languages present online. This work looks at the suitability of Naïve Bayesian
methods for paragraph level text classification in the Kannada language. The Naïve Bayesian methods are
the most primitive algorithms for Text Categorization tasks. We apply dimensionality reduction technique
using Minimum term frequency, stop word identification and elimination methods for achieving the task. It
is evident that Naïve Bayesian Multinomial model outperforms simple Naïve Bayesian approach in
paragraph classification tasks.
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESacijjournal
Named Entity Recognition which is an important subject of Natural Language Processing is a key technology of information extraction, information retrieval, question answering and other text processing applications. In this study, we evaluate previously well-established association measures as an initial
attempt to extract two-worded named entities in a Turkish corpus. Furthermore we propose a new association measure, and compare it with the other methods. The evaluation of these methods is performed by precision and recall measures.
MT SUMMIT13.Language-independent Model for Machine Translation Evaluation wit...Lifeng (Aaron) Han
Authors: Aaron Li-Feng Han, Derek Wong, Lidia S. Chao, Yervant Ho, Yi Lu, Anson Xing, Samuel Zeng
Proceedings of the 14th biennial International Conference of Machine Translation Summit (MT Summit 2013) pp. 215-222. Nice, France. 2 - 6 September 2013. Open tool https://github.com/aaronlifenghan/aaron-project-hlepor (Machine Translation Archive)
The computer is based on natural language on
summarization and machine system. It is very
difficult for human being manually summarize
large amount of text. The greatest challenge
for text summarization to summarize convent
from number of textual and semi structured
sources, including text , HTML page, portable
document file . This summarization can be
determine from internal and external measure.
Our proposed work sentence similarity based
text summarization using clusters help in
finding subjective question and answer on
internet . This work provide short units of text
that belong to similar information. I proposed
my work on sentence similarity-based
computation that helps to experiment for
similar text computation. Extractive
summarization text system choosing a subset of
the similar group from the text. proposal work I
used the part of speech, proper noun, verb,
pronouns such as he, she, and they etc. With
the help of part of speech we find important
sentence using a statistical method like proper
noun and sentence similarity system. It based
on internet information that contain picky
sentence.
Ontology Matching Based on hypernym, hyponym, holonym, and meronym Sets in Wo...dannyijwest
Considerable research in the field of ontology matching has been performed where information sharing
and reuse becomes necessary in ontology development. Measurement of lexical similarity in ontology
matching is performed using synset, defined in WordNet. In this paper, we defined a Super Word Set,
which is an aggregate set that includes hypernym, hyponym, holonym, and meronym sets in WordNet.
The Super Word Set Similarity is calculated by the rate of words of concept name and synset’s words
inclusion in the Super Word Set. In order to measure of Super Word Set Similarity, we first extracted
Matched Concepts(MC), Matched Properties(MP) and Property Unmatched Concepts(PUC) from the
result of ontology matching. We compared these against two ontology matching tools – COMA++ and
LOM. The Super Word Set Similarity shows an average improvement of 12% over COMA++ and 19%
over LOM.
Improving accuracy of part-of-speech (POS) tagging using hidden markov model ...IJECEIAES
In Natural Language Processing (NLP), Word segmentation and Part-ofSpeech (POS) tagging are fundamental tasks. The POS information is also necessary in NLP’s preprocessing work applications such as machine translation (MT), information retrieval (IR), etc. Currently, there are many research efforts in word segmentation and POS tagging developed separately with different methods to get high performance and accuracy. For Myanmar Language, there are also separate word segmentors and POS taggers based on statistical approaches such as Neural Network (NN) and Hidden Markov Models (HMMs). But, as the Myanmar language's complex morphological structure, the OOV problem still exists. To keep away from error and improve segmentation by utilizing POS data, segmentation and labeling should be possible at the same time.The main goal of developing POS tagger for any Language is to improve accuracy of tagging and remove ambiguity in sentences due to language structure. This paper focuses on developing word segmentation and Part-of- Speech (POS) Tagger for Myanmar Language. This paper presented the comparison of separate word segmentation and POS tagging with joint word segmentation and POS tagging.
Taxonomy extraction from automotive natural language requirements using unsup...ijnlc
In this paper we present a novel approach to semi-automatically learn concept hierarchies from natural
language requirements of the automotive industry. The approach is based on the distributional hypothesis
and the special characteristics of domain-specific German compounds. We extract taxonomies by using
clustering techniques in combination with general thesauri. Such a taxonomy can be used to support
requirements engineering in early stages by providing a common system understanding and an agreedupon
terminology. This work is part of an ontology-driven requirements engineering process, which builds
on top of the taxonomy. Evaluation shows that this taxonomy extraction approach outperforms common
hierarchical clustering techniques.
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGkevig
This paper describes the use of Naive Bayes to address the task of assigning function tags and context free
grammar (CFG) to parse Myanmar sentences. Part of the challenge of statistical function tagging for
Myanmar sentences comes from the fact that Myanmar has free-phrase-order and a complex
morphological system. Function tagging is a pre-processing step for parsing. In the task of function
tagging, we use the functional annotated corpus and tag Myanmar sentences with correct segmentation,
POS (part-of-speech) tagging and chunking information. We propose Myanmar grammar rules and apply
context free grammar (CFG) to find out the parse tree of function tagged Myanmar sentences. Experiments
show that our analysis achieves a good result with parsing of simple sentences and three types of complex
sentences
Parsing of Myanmar Sentences With Function Taggingkevig
This paper describes the use of Naive Bayes to address the task of assigning function tags and context free
grammar (CFG) to parse Myanmar sentences. Part of the challenge of statistical function tagging for
Myanmar sentences comes from the fact that Myanmar has free-phrase-order and a complex
morphological system. Function tagging is a pre-processing step for parsing. In the task of function
tagging, we use the functional annotated corpus and tag Myanmar sentences with correct segmentation,
POS (part-of-speech) tagging and chunking information. We propose Myanmar grammar rules and apply
context free grammar (CFG) to find out the parse tree of function tagged Myanmar sentences. Experiments
show that our analysis achieves a good result with parsing of simple sentences and three types of complex
sentences.
International Journal of Engineering and Science Invention (IJESI) inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online
International Journal of Engineering and Science Invention (IJESI)inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...kevig
Neural machine translation is a new approach to machine translation that has shown the effective results
for high-resource languages. Recently, the attention-based neural machine translation with the large scale
parallel corpus plays an important role to achieve high performance for translation results. In this
research, a parallel corpus for Myanmar-English language pair is prepared and attention-based neural
machine translation models are introduced based on word to word level, character to word level, and
syllable to word level. We do the experiments of the proposed model to translate the long sentences and to
address morphological problems. To decrease the low resource problem, source side monolingual data are
also used. So, this work investigates to improve Myanmar to English neural machine translation system.
The experimental results show that syllable to word level neural mahine translation model obtains an
improvement over the baseline systems.
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...ijnlc
Neural machine translation is a new approach to machine translation that has shown the effective results
for high-resource languages. Recently, the attention-based neural machine translation with the large scale
parallel corpus plays an important role to achieve high performance for translation results. In this
research, a parallel corpus for Myanmar-English language pair is prepared and attention-based neural
machine translation models are introduced based on word to word level, character to word level, and
syllable to word level. We do the experiments of the proposed model to translate the long sentences and to
address morphological problems. To decrease the low resource problem, source side monolingual data are
also used. So, this work investigates to improve Myanmar to English neural machine translation system.
The experimental results show that syllable to word level neural mahine translation model obtains an
improvement over the baseline systems.
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.
Source side pre-ordering using recurrent neural networks for English-Myanmar ...IJECEIAES
Word reordering has remained one of the challenging problems for machine translation when translating between language pairs with different word orders e.g. English and Myanmar. Without reordering between these languages, a source sentence may be translated directly with similar word order and translation can not be meaningful. Myanmar is a subject-objectverb (SOV) language and an effective reordering is essential for translation. In this paper, we applied a pre-ordering approach using recurrent neural networks to pre-order words of the source Myanmar sentence into target English’s word order. This neural pre-ordering model is automatically derived from parallel word-aligned data with syntactic and lexical features based on dependency parse trees of the source sentences. This can generate arbitrary permutations that may be non-local on the sentence and can be combined into English-Myanmar machine translation. We exploited the model to reorder English sentences into Myanmar-like word order as a preprocessing stage for machine translation, obtaining improvements quality comparable to baseline rule-based pre-ordering approach on asian language treebank (ALT) corpus.
Myanmar news summarization using different word representations IJECEIAES
There is enormous amount information available in different forms of sources and genres. In order to extract useful information from a massive amount of data, automatic mechanism is required. The text summarization systems assist with content reduction keeping the important information and filtering the non-important parts of the text. Good document representation is really important in text summarization to get relevant information. Bag-ofwords cannot give word similarity on syntactic and semantic relationship. Word embedding can give good document representation to capture and encode the semantic relation between words. Therefore, centroid based on word embedding representation is employed in this paper. Myanmar news summarization based on different word embedding is proposed. In this paper, Myanmar local and international news are summarized using centroid-based word embedding summarizer using the effectiveness of word representation approach, word embedding. Experiments were done on Myanmar local and international news dataset using different word embedding models and the results are compared with performance of bag-of-words summarization. Centroid summarization using word embedding performs comprehensively better than centroid summarization using bag-of-words.
Genetic Approach For Arabic Part Of Speech Taggingkevig
With the growing number of textual resources available, the ability to understand them becomes critical.
An essential first step in understanding these sources is the ability to identify the parts-of-speech in each
sentence. Arabic is a morphologically rich language, which presents a challenge for part of speech
tagging. In this paper, our goal is to propose, improve, and implement a part-of-speech tagger based on a
genetic algorithm. The accuracy obtained with this method is comparable to that of other probabilistic
approaches.
GENETIC APPROACH FOR ARABIC PART OF SPEECH TAGGINGijnlc
With the growing number of textual resources available, the ability to understand them becomes critical.
An essential first step in understanding these sources is the ability to identify the parts-of-speech in each
sentence. Arabic is a morphologically rich language, which presents a challenge for part of speech
tagging. In this paper, our goal is to propose, improve, and implement a part-of-speech tagger based on a
genetic algorithm. The accuracy obtained with this method is comparable to that of other probabilistic
approaches.
Derric A. Alkis C
Abstract:
Delivering the customer to a high degree of confidence and the seller for more information about the products and the desire of customers through the use of modern technology and Machine Learning through comments left on the product to see and evaluate the comments added later and thus evaluate the product, whether good or bad.
Author Credits - Maaz Anwar Nomani
Semantic Role Labeler (SRL) is a semantic parser which can automatically identify and then classify arguments of a verb in a natural language sentence for Hindi and Urdu. For e.g. in the natural language sentence “Sara won the competition because of her hard work.”, ‘won’ is the main verb and there are 3 arguments for this verb; ‘Sara’ (Agent), ‘hard work’ (Reason) and ‘competition’ (Theme). The problem statement of a SRL revolves around the fact that how will you make a machine identify and then classify the arguments of a verb in a natural language sentence.
Since there are 2 sub problem statements here (Identification and Classification), our SRL has a pipeline architecture in which a binary classifier (Logistic Regression) is first trained to identify whether a word is an argument to a verb in a sentence or not (Yes or No) and subsequently a multi-class classifier (SVM with Linear kernel) is trained to classify the identified arguments by above binary classifier into one of the 20 classes. These 20 classes are the various notions present in a natural language sentence (for e.g. Agent, Theme, Location, Time, Purpose, Reason, Cause etc.). These ‘notions’ are called Propbank labels or semantic labels present in a Proposition Bank which is a collection of hand-annotated sentences.
In essence, SRL felicitates Semantic Parsing which essentially is the research investigation of identifying WHO did WHAT to WHOM, WHERE, HOW, WHY and WHEN etc. in a natural language sentence.
SYLLABLE-BASED NEURAL NAMED ENTITY RECOGNITION FOR MYANMAR LANGUAGEkevig
This paper contributes the first evaluation of neural
network models on NER task for Myanmar language. The experimental results show that those neural
sequence models can produce promising results compared to the baseline CRF model. Among those neural
architectures, bidirectional LSTM network added CRF layer above gives the highest F-score value. This
work also aims to discover the effectiveness of neural network approaches to Myanmar textual processing
as well as to promote further researches on this understudied language.
A COMPUTATIONAL APPROACH FOR ANALYZING INTER-SENTENTIAL ANAPHORIC PRONOUNS IN...ijnlc
This paper presents a strategy and a computational model for solving inter-sentential anaphoric pronouns
in Vietnamese paragraphs composing simple sentences. The strategy is proposed based on grammatical
features of nouns and the focus phenomenon when using pronouns in Vietnamese. In this research, we
consider only nouns and pronouns which are human objects in the paragraph, and each anaphoric
pronoun will appear one time in one sentence and can appear in adjacent sentences. The computational
model is implemented in Prolog and based on applying and improving the models of Mark Johnson and
Ewan Klein, had been improved by Covington and Schmitz, with theoretical background of Discourse
Representation Theory.Analysis of test results shows that this approach which based on linguistic theories
helps for well solving inter-sentential anaphoric pronouns in Vietnamese paragraphs.
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGEScsandit
ABSTRACT
Natural Language processing is an interdisciplinary branch of linguistic and computer science studied under the Artificial Intelligence (AI) that gave birth to an allied area called
‘Computational Linguistic’ which focuses on processing of natural languages on computational devices. A natural language consists of a large number of sentences which are linguistic units involving one or more words linked together in accordance with a set of predefined rules called grammar. Grammar checking is the task of validating sentences syntactically and is a prominent tool within language engineering. Our review draws on the recent development of various grammar checkers to look at past, present and the future in a new light. Our review covers grammar checkers of many languages with the aim of seeking their approaches, methodologies for developing new tool and system as a whole. The survey concludes with the discussion of various features included in existing grammar checkers of foreign languages as well as a few Indian Languages.
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR cscpconf
The progressive development of Synthetic Aperture Radar (SAR) systems diversify the exploitation of the generated images by these systems in different applications of geoscience. Detection and monitoring surface deformations, procreated by various phenomena had benefited from this evolution and had been realized by interferometry (InSAR) and differential interferometry (DInSAR) techniques. Nevertheless, spatial and temporal decorrelations of the interferometric couples used, limit strongly the precision of analysis results by these techniques. In this context, we propose, in this work, a methodological approach of surface deformation detection and analysis by differential interferograms to show the limits of this technique according to noise quality and level. The detectability model is generated from the deformation signatures, by simulating a linear fault merged to the images couples of ERS1 / ERS2 sensors acquired in a region of the Algerian south.
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATIONcscpconf
A novel based a trajectory-guided, concatenating approach for synthesizing high-quality image real sample renders video is proposed . The lips reading automated is seeking for modeled the closest real image sample sequence preserve in the library under the data video to the HMM predicted trajectory. The object trajectory is modeled obtained by projecting the face patterns into an KDA feature space is estimated. The approach for speaker's face identification by using synthesise the identity surface of a subject face from a small sample of patterns which sparsely each the view sphere. An KDA algorithm use to the Lip-reading image is discrimination, after that work consisted of in the low dimensional for the fundamental lip features vector is reduced by using the 2D-DCT.The mouth of the set area dimensionality is ordered by a normally reduction base on the PCA to obtain the Eigen lips approach, their proposed approach by[33]. The subjective performance results of the cost function under the automatic lips reading modeled , which wasn’t illustrate the superior performance of the
method.
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...cscpconf
Universities offer software engineering capstone course to simulate a real world-working environment in which students can work in a team for a fixed period to deliver a quality product. The objective of the paper is to report on our experience in moving from Waterfall process to Agile process in conducting the software engineering capstone project. We present the capstone course designs for both Waterfall driven and Agile driven methodologies that highlight the structure, deliverables and assessment plans.To evaluate the improvement, we conducted a survey for two different sections taught by two different instructors to evaluate students’ experience in moving from traditional Waterfall model to Agile like process. Twentyeight students filled the survey. The survey consisted of eight multiple-choice questions and an open-ended question to collect feedback from students. The survey results show that students were able to attain hands one experience, which simulate a real world-working environment. The results also show that the Agile approach helped students to have overall better design and avoid mistakes they have made in the initial design completed in of the first phase of the capstone project. In addition, they were able to decide on their team capabilities, training needs and thus learn the required technologies earlier which is reflected on the final product quality
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIEScscpconf
Using social media in education provides learners with an informal way for communication. Informal communication tends to remove barriers and hence promotes student engagement. This paper presents our experience in using three different social media technologies in teaching software project management course. We conducted different surveys at the end of every semester to evaluate students’ satisfaction and engagement. Results show that using social media enhances students’ engagement and satisfaction. However, familiarity with the tool is an important factor for student satisfaction.
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICcscpconf
In real world computing environment with using a computer to answer questions has been a human dream since the beginning of the digital era, Question-answering systems are referred to as intelligent systems, that can be used to provide responses for the questions being asked by the user based on certain facts or rules stored in the knowledge base it can generate answers of questions asked in natural , and the first main idea of fuzzy logic was to working on the problem of computer understanding of natural language, so this survey paper provides an overview on what Question-Answering is and its system architecture and the possible relationship and
different with fuzzy logic, as well as the previous related research with respect to approaches that were followed. At the end, the survey provides an analytical discussion of the proposed QA models, along or combined with fuzzy logic and their main contributions and limitations.
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS cscpconf
Human beings generate different speech waveforms while speaking the same word at different times. Also, different human beings have different accents and generate significantly varying speech waveforms for the same word. There is a need to measure the distances between various words which facilitate preparation of pronunciation dictionaries. A new algorithm called Dynamic Phone Warping (DPW) is presented in this paper. It uses dynamic programming technique for global alignment and shortest distance measurements. The DPW algorithm can be used to enhance the pronunciation dictionaries of the well-known languages like English or to build pronunciation dictionaries to the less known sparse languages. The precision measurement experiments show 88.9% accuracy.
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
In education, the use of electronic (E) examination systems is not a novel idea, as Eexamination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. This system assesses answers to subjective questions by finding a matching ratio for the keywords in instructor and student answers. The matching ratio is achieved based on semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and case study were used in the research design to validate the proposed system. The examination assessment system will help instructors to save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessments.
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICcscpconf
African Buffalo Optimization (ABO) is one of the most recent swarms intelligence based metaheuristics. ABO algorithm is inspired by the buffalo’s behavior and lifestyle. Unfortunately, the standard ABO algorithm is proposed only for continuous optimization problems. In this paper, the authors propose two discrete binary ABO algorithms to deal with binary optimization problems. In the first version (called SBABO) they use the sigmoid function and probability model to generate binary solutions. In the second version (called LBABO) they use some logical operator to operate the binary solutions. Computational results on two knapsack problems (KP and MKP) instances show the effectiveness of the proposed algorithm and their ability to achieve good and promising solutions.
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINcscpconf
In recent years, many malware writers have relied on Dynamic Domain Name Services (DDNS) to maintain their Command and Control (C&C) network infrastructure to ensure a persistence presence on a compromised host. Amongst the various DDNS techniques, Domain Generation Algorithm (DGA) is often perceived as the most difficult to detect using traditional methods. This paper presents an approach for detecting DGA using frequency analysis of the character distribution and the weighted scores of the domain names. The approach’s feasibility is demonstrated using a range of legitimate domains and a number of malicious algorithmicallygenerated domain names. Findings from this study show that domain names made up of English characters “a-z” achieving a weighted score of < 45 are often associated with DGA. When a weighted score of < 45 is applied to the Alexa one million list of domain names, only 15% of the domain names were treated as non-human generated.
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...cscpconf
The amount of piracy in the streaming digital content in general and the music industry in specific is posing a real challenge to digital content owners. This paper presents a DRM solution to monetizing, tracking and controlling online streaming content cross platforms for IP enabled devices. The paper benefits from the current advances in Blockchain and cryptocurrencies. Specifically, the paper presents a Global Music Asset Assurance (GoMAA) digital currency and presents the iMediaStreams Blockchain to enable the secure dissemination and tracking of the streamed content. The proposed solution provides the data owner the ability to control the flow of information even after it has been released by creating a secure, selfinstalled, cross platform reader located on the digital content file header. The proposed system provides the content owners’ options to manage their digital information (audio, video, speech, etc.), including the tracking of the most consumed segments, once it is release. The system benefits from token distribution between the content owner (Music Bands), the content distributer (Online Radio Stations) and the content consumer(Fans) on the system blockchain.
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMcscpconf
This paper discusses the importance of verb suffix mapping in Discourse translation system. In
discourse translation, the crucial step is Anaphora resolution and generation. In Anaphora
resolution, cohesion links like pronouns are identified between portions of text. These binders
make the text cohesive by referring to nouns appearing in the previous sentences or nouns
appearing in sentences after them. In Machine Translation systems, to convert the source
language sentences into meaningful target language sentences the verb suffixes should be
changed as per the cohesion links identified. This step of translation process is emphasized in
the present paper. Specifically, the discussion is on how the verbs change according to the
subjects and anaphors. To explain the concept, English is used as the source language (SL) and
an Indian language Telugu is used as Target language (TL)
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...cscpconf
In this paper, based on the definition of conformable fractional derivative, the functional
variable method (FVM) is proposed to seek the exact traveling wave solutions of two higherdimensional
space-time fractional KdV-type equations in mathematical physics, namely the
(3+1)-dimensional space–time fractional Zakharov-Kuznetsov (ZK) equation and the (2+1)-
dimensional space–time fractional Generalized Zakharov-Kuznetsov-Benjamin-Bona-Mahony
(GZK-BBM) equation. Some new solutions are procured and depicted. These solutions, which
contain kink-shaped, singular kink, bell-shaped soliton, singular soliton and periodic wave
solutions, have many potential applications in mathematical physics and engineering. The
simplicity and reliability of the proposed method is verified.
AUTOMATED PENETRATION TESTING: AN OVERVIEWcscpconf
The using of information technology resources is rapidly increasing in organizations,
businesses, and even governments, that led to arise various attacks, and vulnerabilities in the
field. All resources make it a must to do frequently a penetration test (PT) for the environment
and see what can the attacker gain and what is the current environment's vulnerabilities. This
paper reviews some of the automated penetration testing techniques and presents its
enhancement over the traditional manual approaches. To the best of our knowledge, it is the
first research that takes into consideration the concept of penetration testing and the standards
in the area.This research tackles the comparison between the manual and automated
penetration testing, the main tools used in penetration testing. Additionally, compares between
some methodologies used to build an automated penetration testing platform.
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKcscpconf
Since the mid of 1990s, functional connectivity study using fMRI (fcMRI) has drawn increasing
attention of neuroscientists and computer scientists, since it opens a new window to explore
functional network of human brain with relatively high resolution. BOLD technique provides
almost accurate state of brain. Past researches prove that neuro diseases damage the brain
network interaction, protein- protein interaction and gene-gene interaction. A number of
neurological research paper also analyse the relationship among damaged part. By
computational method especially machine learning technique we can show such classifications.
In this paper we used OASIS fMRI dataset affected with Alzheimer’s disease and normal
patient’s dataset. After proper processing the fMRI data we use the processed data to form
classifier models using SVM (Support Vector Machine), KNN (K- nearest neighbour) & Naïve
Bayes. We also compare the accuracy of our proposed method with existing methods. In future,
we will other combinations of methods for better accuracy.
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...cscpconf
In order to treat and analyze real datasets, fuzzy association rules have been proposed. Several
algorithms have been introduced to extract these rules. However, these algorithms suffer from
the problems of utility, redundancy and large number of extracted fuzzy association rules. The
expert will then be confronted with this huge amount of fuzzy association rules. The task of
validation becomes fastidious. In order to solve these problems, we propose a new validation
method. Our method is based on three steps. (i) We extract a generic base of non redundant
fuzzy association rules by applying EFAR-PN algorithm based on fuzzy formal concept analysis.
(ii) we categorize extracted rules into groups and (iii) we evaluate the relevance of these rules
using structural equation model.
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
In many applications of data mining, class imbalance is noticed when examples in one class are
overrepresented. Traditional classifiers result in poor accuracy of the minority class due to the
class imbalance. Further, the presence of within class imbalance where classes are composed of
multiple sub-concepts with different number of examples also affect the performance of
classifier. In this paper, we propose an oversampling technique that handles between class and
within class imbalance simultaneously and also takes into consideration the generalization
ability in data space. The proposed method is based on two steps- performing Model Based
Clustering with respect to classes to identify the sub-concepts; and then computing the
separating hyperplane based on equal posterior probability between the classes. The proposed
method is tested on 10 publicly available data sets and the result shows that the proposed
method is statistically superior to other existing oversampling methods.
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHcscpconf
Data collection is an essential, but manpower intensive procedure in ecological research. An
algorithm was developed by the author which incorporated two important computer vision
techniques to automate data cataloging for butterfly measurements. Optical Character
Recognition is used for character recognition and Contour Detection is used for imageprocessing.
Proper pre-processing is first done on the images to improve accuracy. Although
there are limitations to Tesseract’s detection of certain fonts, overall, it can successfully identify
words of basic fonts. Contour detection is an advanced technique that can be utilized to
measure an image. Shapes and mathematical calculations are crucial in determining the precise
location of the points on which to draw the body and forewing lines of the butterfly. Overall,
92% accuracy were achieved by the program for the set of butterflies measured.
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
Smart cities utilize Internet of Things (IoT) devices and sensors to enhance the quality of the city
services including energy, transportation, health, and much more. They generate massive
volumes of structured and unstructured data on a daily basis. Also, social networks, such as
Twitter, Facebook, and Google+, are becoming a new source of real-time information in smart
cities. Social network users are acting as social sensors. These datasets so large and complex
are difficult to manage with conventional data management tools and methods. To become
valuable, this massive amount of data, known as 'big data,' needs to be processed and
comprehended to hold the promise of supporting a broad range of urban and smart cities
functions, including among others transportation, water, and energy consumption, pollution
surveillance, and smart city governance. In this work, we investigate how social media analytics
help to analyze smart city data collected from various social media sources, such as Twitter and
Facebook, to detect various events taking place in a smart city and identify the importance of
events and concerns of citizens regarding some events. A case scenario analyses the opinions of
users concerning the traffic in three largest cities in the UAE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
The anonymity of social networks makes it attractive for hate speech to mask their criminal
activities online posing a challenge to the world and in particular Ethiopia. With this everincreasing
volume of social media data, hate speech identification becomes a challenge in
aggravating conflict between citizens of nations. The high rate of production, has become
difficult to collect, store and analyze such big data using traditional detection methods. This
paper proposed the application of apache spark in hate speech detection to reduce the
challenges. Authors developed an apache spark based model to classify Amharic Facebook
posts and comments into hate and not hate. Authors employed Random forest and Naïve Bayes
for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold crossvalidation,
the model based on word2vec embedding performed best with 79.83%accuracy. The
proposed method achieve a promising result with unique feature of spark for big data.
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
This article presents Part of Speech tagging for Nepali text using General Regression Neural
Network (GRNN). The corpus is divided into two parts viz. training and testing. The network is
trained and validated on both training and testing data. It is observed that 96.13% words are
correctly being tagged on training set whereas 74.38% words are tagged correctly on testing
data set using GRNN. The result is compared with the traditional Viterbi algorithm based on
Hidden Markov Model. Viterbi algorithm yields 97.2% and 40% classification accuracies on
training and testing data sets respectively. GRNN based POS Tagger is more consistent than the
traditional Viterbi decoding technique.
Macroeconomics- Movie Location
This will be used as part of your Personal Professional Portfolio once graded.
Objective:
Prepare a presentation or a paper using research, basic comparative analysis, data organization and application of economic information. You will make an informed assessment of an economic climate outside of the United States to accomplish an entertainment industry objective.
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Acetabularia Information For Class 9 .docxvaibhavrinwa19
Acetabularia acetabulum is a single-celled green alga that in its vegetative state is morphologically differentiated into a basal rhizoid and an axially elongated stalk, which bears whorls of branching hairs. The single diploid nucleus resides in the rhizoid.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Honest Reviews of Tim Han LMA Course Program.pptxtimhan337
Personal development courses are widely available today, with each one promising life-changing outcomes. Tim Han’s Life Mastery Achievers (LMA) Course has drawn a lot of interest. In addition to offering my frank assessment of Success Insider’s LMA Course, this piece examines the course’s effects via a variety of Tim Han LMA course reviews and Success Insider comments.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
2. 212 Computer Science & Information Technology (CS & IT)
location to each word in the text document. In case of function tagging, we use Naive Bayesian
theory and the functional annotated tagged corpus. Grammatical relations are the process of
analyzing an input sequence in order to determine its grammatical structure with respect to a
given grammar. The goal of the second one is to produce the relations of the grammatical
structures of the sentences in Myanmar text as a parse tree.
Myanmar is SOV language. It is also a variable phrase order language. The free phrase order
feature of Myanmar makes statistical function tagging a challenging task. Function tagging is a
part of the Myanmar to English machine translation project. If high quality translation is to be
achieved, language understanding is a necessity. One problem in Myanmar language processing
is the lack of grammatical regularity in the language. This leads to very complex Myanmar
grammar in order to obtain satisfactory results, which in term increases the complexity in the
grammatical relation process, it is desired that simple grammar is to be used.
In our approach, we take the chunk level phrase with the combination of POS tag and its category
which is the output of a fully described morphological analyzer [3][4], which is very important
for agglutinative languages like Myanmar. A small corpus annotated manually serves as training
data because the large scale Myanmar Corpus is unavailable at present. Since the large-scale
annotated corpora, such as Penn Treebank, have been built in English, statistical knowledge
extracted from them has been shown to be more and more crucial for natural language
disambiguation [5]. As a distinctive language, Myanmar has many characteristics different from
English. The use of statistical information efficiently in Myanmar language is still a virgin land
waiting to explore.
Naïve Bayesian is chosen for its simplicity and user-friendliness. Naive-Bayesian classifier make
strong assumptions about how the data is generated, and use a probabilistic model that reflects
these assumptions [6]. They use a collection of labelled training examples to estimate the
parameters of the generative model. Classification of new examples is performed with Bayes’ rule
by selecting the class that is most likely to have generated the example.
2. LITERATURE SURVEY
Blaheta and Johnson [7] addressed the task of function tags assignment. They used a statistical
algorithm based on a set of features grouped in trees, rather than chains. The advantage was that
features can better contribute to overall performance for cases when several features are sparse.
When such features are conditioned in a chain model the sparseness of a feature can have a
dilution effect of an ulterior (conditioned) one.
Mihai Lintean and Vasile Rus[8] described the use of two machine learning techniques, naive
Bayes and decision trees, to address the task of assigning function tags to nodes in a syntactic
parse tree. They used a set of features inspired from Blaheta and Johnson [7]. The set of classes
they used in their model corresponds to the set of functional tags in Penn Treebank. To generate
the training data, they have considered only nodes with functional tags, ignoring nodes unlabeled
with such tags. They trained the classifiers on sections 1-21 from Wall Street Journal (WSJ) part
of Penn Treebank and used section 23 to evaluate the generated classifiers.
Yong-uk Park and Hyuk-chul Kwon [9] tried to disambiguate for syntactic analysis system by
many dependency rules and segmentation. Segmentation is made during parsing. If two adjacent
morphemes have no syntactic relations, their syntactic analyzer makes new segment between
these two morphemes, and find out all possible partial parse trees of that segmentation and
combine them into complete parse trees. Also they used adjacent-rule and adverb
subcategorization to disambiguate of syntactic analysis. Their syntactic analyzer system used
morphemes for the basic unit of parsing. They made all possible partial parse trees on each
segmentation process, and tried to combine them into complete parse trees.
3. Computer Science & Information Technology (CS & IT) 213
Mark-Jan Nederhof and Giorgio Satta[10] considered the problem of parsing non-recursive
context-free grammars, i.e., context-free grammars that generate finite languages and presented
two tabular algorithms for these grammars. They presented their parsing algorithm, based on the
CYK (Cocke–Younger–Kasami) algorithm and Earley’s alogrithm. As parsing CFG (context-
free grammar), they have taken a small hand-written grammar of about 100 rules. They have
ordered the input grammars by size, according to the number of nonterminals (or the number of
nodes in the forest, following the terminology by Langkilde (2000)).
Kyongho Min and William H. Wilson [11] discussed the robustness of four efficient syntactic
error-correcting parsing algorithms that are based on chart parsing with a context-free grammar.
They implemented four versions of a bottom-up error-correcting chart parser: a basic bottom-up
chart parser, and chart parsers employing selectivity, top-down filtering, and a combination of
selectivity and a top-down filtering. They detected and corrected syntactic errors using a system
component called IFSCP (Ill-Formed Sentence Chart Parser) described by Min & Wilson (1994),
together with a spelling correction module. They tested 4 different lengths of sentences (3, 5, 7,
and 11) and 5 different error types, with a grammar of 210 context-free rules designed to parse a
simple declarative sentence with no conjunctions, passivisation, or relative clauses.
3. MYANMAR LANGUAGE
The Myanmar language is the official language and is more than one thousand years old.
3.1. Features of Myanmar Language
Unlike English language Myanmar is syntax of relatively free-phrase-order language. This can be
easily illustrated with the example “သူသည္ စာအုပ္ကို စားပြဲေပၚတြင္ ထားသည္။” (He places the book on
the table) as shown in table 1. All are valid sentences [12].
Table 1. Word order in Myanmar language
CaseCaseCaseCase Myanmar SentencesMyanmar SentencesMyanmar SentencesMyanmar Sentences Word orderWord orderWord orderWord order
Case 1 သူ စာအုပ္ကို စားပြဲေပၚတြင္ ထားသည္။ (Subj-Obj-Pla-Verb)
Case 2 သူ စားပြဲေပၚတြင္ စာအုပ္ကို ထားသည္။ (Subj-Pla-Obj-Verb)
Case 3 စာအုပ္ကို စားပြဲေပၚတြင္ သူ ထားသည္။ (Obj-Pla-Subj-Verb)
Case 4 စာအုပ္ကို သူ စားပြဲေပၚတြင္ ထားသည္။ (Obj-Subj-Pla-Verb)
Case 5 စားပြဲေပၚတြင္ သူ စာအုပ္ကို ထားသည္။ (Pla-Subj-Obj-Verb)
Case 6 စားပြဲေပၚတြင္ စာအုပ္ကို သူ ထားသည္။ (Pla-Obj-Subj-Verb)
In all the cases, subject is သူ (He), object is စာအုပ္ကို (the book), place is စားပြဲေပၚတြင္ (on the table)
and verb is ထားသည္ (places). From the above example, it is clear that phrase order does not
determine the functional structure in Myanmar language and permits scrambling. Myanmar
language follows Subject-Object-Verb orders in contradiction with English language.
3.2. Issues of Myanmar Language
The highly agglutinative language like Myanmar, nouns and verbs get inflected. Many times we
need to depend on syntactic function or context to decide upon whether the particular word is a
noun or adjective or adverb or post position [12]. This leads to the complexity in Myanmar
grammatical relations. A noun may be categorized as common, proper or compound. Similarly,
verb may be finite, infinite, gerund or contingent.
A number of issues are affecting the function tagging for Myanmar language.
4. 214 Computer Science & Information Technology (CS & IT)
• Myanmar phrases can be written in any order as long as the verb phrase is at the end of
sentence.
For example:
ေမာင္လွသည္ - စာအုပ္တစ္အုပ္ကို - ေမာင္ဘအား - ေပးသည္။
Mg Hla - a book - to Mg Ba - gives
(or)
စာအုပ္တစ္အုပ္ကို - ေမာင္ဘအား - ေမာင္လွက - ေပးသည္။
a book - to Mg Ba - Mg Hla - gives
(Ma Hla gives a book to Mg Ba.)
• The phrase order of Myanmar language is free. The sentence can be constructed by placing
emphatic phrases at the beginning of a sentence.
For example:
သူသည္- သတင္းစာကို - ဖတ္သည္။(Subj-Obj-Verb)
He - newspaper – reads
(or)
သတင္းစာကို - သူ - ဖတ္သည္။ (Obj-Subj-Verb)
newspaper - he - reads
(He reads the newspaper.)
• The subject or object of the sentence can be skipped, and still be a valid sentence.
For example:
ရန္ကုန္သို႔သြားသည္။ (Go to Yangon)
• Myanmar language makes prominent usage of particles, which are untranslatable words
that are suffixed or prefixed to words to indicate level of respect, grammatical tense, or
mood.
For example:
ေမာင္ေမာင္ - မ်ားမ်ားမ်ားမ်ား - ပထမ - ဆု - ရ - လွ်င္ - သူ႔မိဘမ်ား - က - အ့ံၾသ - လိမ့္မည္။
Mg Mg - particle - first - prize - wins - if - his parents - PPM - surprise - will
(If Mg Mg wins the first prize, his parents will surprise.)
• In Myanmar language, an adjective can specialize before or after a noun unlike other
languages.
For example:
သူသည္ - ခ်မ္းသာေသာခ်မ္းသာေသာခ်မ္းသာေသာခ်မ္းသာေသာ - လူ -တစ္ေယာက္ -ျဖစ္သည္။
He - rich - man - a - is
(or)
သူသည္ - လူ - ခ်မ္းသာခ်မ္းသာခ်မ္းသာခ်မ္းသာ - တစ္ေယာက္ -ျဖစ္သည္။
He - man - rich - a - is
(He is a rich man.)
• The subject /object can be another sentence, which does not contain subject or object.
For example:
ကေလးမ်ားသစ္ပင္ေအာက္တြင္ကစားေနသည္ ကို ကၽြန္ေတာ္ျမင္သည္။
(I see the children playing under the tree.)
• The postpositions of subject phrases or object phrases can be hidden.
For example:
သူသည္သည္သည္သည္- ဆရာ၀န္ -တစ္ေယာက္ - ျဖစ္သည္။
He - doctor - a - is
(or)
သူ - ဆရာ၀န္ - တစ္ေယာက္ - ျဖစ္သည္။
He - doctor - a - is
(He is a doctor.)
5. Computer Science & Information Technology (CS & IT) 215
• The postpositions of time phrases or place phrases can be omitted.
For example:
သူမ - ေက်ာင္း - သို႔သို႔သို႔သို႔ - သြားသည္။
She - school - to - goes
(or)
သူမ - ေက်ာင္း - သြားသည္။
She - school - goes
(She goes to school.)
• The verb phrase can be hidden in a Myanmar sentence.
For example:
သူ - ေမာင္လွ -ပါ။
He - Mg Hla - particle
(He is Mg Hla.)
These issues will cause a lot of problem during function tagging, and a lot of possible tags will be
resulted.
3.3. Grammar of Myanmar Language
Grammar studies the rules behind languages. The aspect of grammar that does not concern
meaning directly is called syntax. Myanmar (syntax: SOV), because of its use of postposition
(wi.Bat), would probably be defined as a “postpositional language”, whereas English (syntax:
SVO) because of its use of preposition would probably be defined as a “prepositional language”.
There are really only two parts of speech in Myanmar, the noun and the verb, instead of the
usually accepted eight parts (Pe Maung Tin 1956:195). Most Myanmar linguists [13] accepted
there are eight parts of speech in Myanmar. Myanmar nouns and verbs need the help of suffixes
or particles to show grammatical relations.
For example:
ေက်ာင္းသူမ်ားသာသာသာသာ ဂုဏ္ထူးရသည္။
သူတို႔သည္ အတန္းထဲမွာ ႐ွိၾကၾကၾကၾက၏။
Myanmar is a highly verb-prominent language and that suppression of the subject and omission
of personal pronouns in connected text result in a reduced role of nominals. This observation
misses the critical role of postposition particles marking sentential arguments and also of the verb
itself being so marked. The key to the view of Myanmar being structures by nominals is found in
the role of the particles. Some particles modify the word's part of speech. Among the most
prominent of these is the particle အ, which is prefixed to verbs and adjectives to form nouns or
adverbs.There is a wide variety of particles in Myanmar [14].
For example:
သူတို႔သည္ မႏ ၱေလးတြင္ ၈ ရက္ တိတိတိတိတိတိတိတိ လည္ခဲ့သည္။
Stewart remarked that “The Grammar of Burmese is almost entirely a matter of the correct use of
particles”(Stewart 1956: xi). How one understands the role of the particles is probably a matter of
one's purpose.
3.4. Syntacic Structure of Myanmar Language
It is known that many postpositions can be used in a Myanmar sentence. If the words can be
misplaced in a sentence, the sentence can be abnormal. There are two kinds of sentence as a
sentence construction. They are simple sentence (SS) and complex sentence (CS). In simple
sentence, other phrases such as object, time, and place can be added between subject and verb.
There are two kinds of clause in a complex sentence called independent clause(IC) and dependent
clause (DC).There must be at least one independent clause in a sentence. But there can be more
6. 216 Computer Science & Information Technology (CS & IT)
than one dependent clause in it. IC contains sentence’s final particle (sfp) at the end of a sentence
[15].
SS=IC+sfp
CS=DC...+IC+sfp
IC may be noun phrase or verb or combination of both.
IC=N... (မ်က္မွန္ႏွင့္ေက်ာင္းသား)
IC=V (စား)
IC=N...+V (ဘုရားမွာပန္းနဲ႔ဆီမီးလွဴ)
DC is the same as IC but it must contain a clause marker (cm) in the end.
DC=N...+cm (ေက်ာင္းကဆရာ+ပဲ)
DC=V+cm (ေရာက္+ရင္)
DC=N...+V+cm (စိတ္ထား+ျဖဴ+မွ)
4. PROPOSED SYSTEM
The procedure of the proposed system is described in the following.
Step1. Accept input Myanmar sentence with segmentation, POS tagging and chunking
Step2. Extract one POS tag and its category from each chunk
Step3. Choose the possible function tags for each POS tag by using Naive Bayesian theory
Step4. Display the sentence with function tags
Step5. Parse the function tags by using CFG rules with the proposed grammar
Step6. Display the parse tree as an output
5. CORPUS CREATION
We collected several types of Myanmar texts to construct a corpus. Our corpus is to be built
manually. We extended the POS tagged corpus that is proposed in [3]. The chunk and function
tags are manually added to the POS tagged corpus. The number of sentences is about 3000
sentences with average word length 15 and it is not a balanced corpus that is a bit biased on
Myanmar textbooks of middle school. The corpus size is bigger and bigger because the tested
sentences are automatically added to the corpus. In table 2, Myanmar grammar books and
websites are text collections. Example corpus sentence is shown in figure 2.
Table 2. Corpus Statistics
Text types # of sentences
Myanmar textbooks of middle school 1200
Myanmar Grammar books 600
Myanmar websites 900
Others 300
Total 3000
VC@Active[မိုး႐ြာ/verb.common]#CC@CCS[လွ်င္/cc.sent]#NC@Subj[ကေလး/n.person,မ်ား/part.number]#NC@
PPla[လမ္း/n.location]#PPC@PlaP[ေပၚတြင္/ppm.place]#NC@Obj[ေဘာလံုး/n.objects]#VC@Active [ကန္
ၾက/verb.common]#SFC@Null[သည္/sf]။
Figure 2. A sentence in the corpus
7. Computer Science & Information Technology (CS & IT) 217
6. FUNCTION TAGSET
Function tagging is a process of assigning syntactic categories like subject, object, time and
location to each word in the text document. These are conceptually appealing by encoding an
event in the format of “who did what to whom, where, when”, which provides useful semantic
information of the sentences. We use the function tags that is proposed in [16] because it is easier
to maintain and can add new language features. The function tagsets are shown in table 3.
Table 3. Function Tagsets
Tag Description Example
Active
Subj
PSubj
SubjP
Obj
PObj
ObjP
PIobj
IobjP
Pla
PPla
PlaP
Tim
PTim
TimP
PExt
ExtP
PSim
SimP
PCom
ComP
POwn
OwnP
Ada
PcomplS
PcomplP
PPcomplO
PcomplOP
PUse
UseP
PCau
CauP
PAim
AimP
CCS
CCM
Verb
Subject
Subject
Postposition of Subject
Object
Object
Postposition of Object
Indirect Object
Postposition of Indirect Object
Place
Place
Postposition of Place
Time
Time
Postposition of Time
Extract
Postposition of Extract
Similie
Postposition of Similie
Compare
Postposition of Compare
Own
Postposition of Own
Adjective
Subject Complement
Object Complement
Object Complement
Postposition of Object Complement
Use
Postposition of Use
Cause
Postposition of Cause
Aim
Postposition of Aim
Join the sentences
Join the meanings
စားသည္
သူ
သူ
သည္
ေကာ္ဖီ
ေကာ္ဖီ
ကို
မလွ
အား
ရန္ကုန္
ရန္ကုန္
သို႔
မနက္
မနက္
တြင္
ေက်ာင္းသားမ်ား
အနက္
မင္းသမီး
ကဲ့သို႔
သူ႔ဦးေလး
ႏွင့္အတူ
သူ
၏
လွ
သူသည္ဆရာဆရာဆရာဆရာျဖစ္သည္
ေ႐ႊကိုလက္စြပ္လက္စြပ္လက္စြပ္လက္စြပ္လုပ္ သည္
ထြန္းထြန္း
ဟု
တုတ္
ျဖင့္
မိုး
ေၾကာင့္
အေမ႔
အတြက္
လွ်င္
ထို႔ေၾကာင့္
8. 218 Computer Science & Information Technology (CS & IT)
CCC
CCP
CCA
Join the words
Join with particles
Join as an adjective
ႏွင့္
ကို
မည့္
7. PROPOSED GRAMMAR FOR MYANMAR SENTENCES
Since it is impossible to cover all types of sentences in Myanmar language, we have taken some
portion of the sentence and try to make grammar for them. Myanmar is free-phrase-order
language. In Myanmar language, we see that one sentence can be written in different forms for
the same meaning, i.e. the positions of the tags are not fixed. So we cannot restrict the grammar
rule for one sentence. The grammar rule may be very long, but we have to accept it. The grammar
rule we have tried to make, may not work for all the sentences in Myanmar language because we
have not considered all types of sentences. Some of the sentences are shown below, which are
used to make the grammar rules.
သူ-သည္-ေက်ာင္း-သို႔-သြား-သည္။ (Subj-Pla-Verb)
သူ-သည္-ေက်ာင္းသားတစ္ေယာက္-ျဖစ္-သည္။ (Subj-PcomplS-Verb)
ေကာင္စီ၀င္-အျဖစ္-သူ႔-ကို-လူထု-က-ေရြး-သည္။ (PcomplO-Obj-Subj-Verb)
ေမာင္လွ-သည္-ေခြး-ကို-တုတ္-ျဖင့္-ရိုက္-သည္။ (Subj-Obj-Use-Verb)
သူ-သည္-ဆရာ႔-ကို-စာအုပ္-ေပး-သည္။ (Subj-Obj-Iobj-Verb)
သူမ-သည္-လူနာမ်ား-ကို-ေဆြမ်ိဳးမ်ား-ကဲ႔သို႔-ျပဳစု-သည္။ (Subj-Obj-Sim-Verb)
ကေလးမ်ား-သည္-အေဖာ္-ေၾကာင့္-ပ်က္စီး-သည္။ (Subj-Cau-Verb)
သစ္႐ြက္တို႔-သည္-တေပါင္းလ-၌-ေၾကြ-သည္။ (Subj-Tim-Verb)
တရားသူၾကီး-သည္-ခိုးမႈ-ကို-တရား႐ံုး-၌-နံနက္-က-စစ္ေဆး-သည္။ (Subj-Obj-Pla-Tim-Verb)
အေမသည္-သူ႔သားအတြက္-မုန္႔ကို-ေစ်းမွ-မနက္က-ဝယ္ခဲ႔သည္။ (Subj-Aim-Obj-Pla-Tim-Verb)
Our proposed grammar for Myanmar Sentences:
Sentence →I-sent | I-sent CC I-sent | Obj-sent I-sent | Subj-sent I-sent
I-sent →Subj Obj Pla Verb | Subj Verb | Com Pla Verb
CC →CCA | CCS | CCM
Subj -sent →I-sent CCA Subj
Obj -sent →I-sent CCA Obj
Subj →PSubj SubjP
Subj →Subj
Obj →PObj ObjP
Obj →Obj
Pla →PPla PlaP
PcomplO →PPcomplO PcomplOP
Use →PUse UseP
Sim →PSim SimP
8. NAIVE BAYESIAN CLASSSIFIER
Before one can build naive Bayesian based classifier, one needs to collect training data. The
training data is a set of problem instances. Each instance consists of values for each of the defined
features of the underlying model and the corresponding class, i.e. function tag in our case. The
development of a naive Bayesian classifier involves learning how much each function tag should
be trusted for the decisions it makes [17]. In probability estimation for Naive Bayesian
classifiers, namely that the attribute values are conditionally independent when the target
value is given. Naive Bayesian classifiers are well-matched to the function tagging problem.
9. Computer Science & Information Technology (CS & IT) 219
The Naïve Bayesian classifier is a term in Bayesian statistics dealing with a simple probabilistic
classifier based on applying Bayes’ theorem with strong (naïve) independence assumptions. It
assumes independence among input features. Therefore, given an input vector, its target class can
be found by choosing the one with the highest posterior probability.
8.1. Function Tagging by Using Naïve Bayes Theory
The labels such as subject, object, time, etc. are named as function tags. By function, it is meant
that action or state which a sentence describes. The system operates at word-level with the
assumption that input sentences are pre-segmented, pos-tagged and chunked.
Each proposed function tag is regarded as a class and the task is to find what class/tag a given
word in a sentence belongs to a set of predefined classes/tags. A feature is a POS tag word with
category. The category of a word is added to the POS tag to obtain more accurate lexical
information. It can be formed from the features of that word. For example, noun has 16 categories
such as animals, person, objects, food, location, etc. There are 47 categories in our corpus. We
show some features of Myanmar words as shown in table 4.
Table 4. Features
FeatureFeatureFeatureFeature EnglishEnglishEnglishEnglish MyanMyanMyanMyanmarmarmarmar
n.food apple ပန္းသီး
pron.possessive his သူ႕
ppm.time at တြင္
adj.dem happy ေပ်ာ္ရႊင္ေသာ
part.support can ႏိုင္
cc.mean so ထို႔ေၾကာင့္
v.common go သြား
sf.declarative null ၏
In Myanmar language, some words have same meaning but in different features as shown in table
5.
Table 5. Same word with different features
FeatureFeatureFeatureFeature EnglishEnglishEnglishEnglish MyanmarMyanmarMyanmarMyanmar
cc.chunk and ႏွင့္
ppm.compare with ႏွင့္
ppm.use with ႏွင့္
A class is a one of the proposed function tags. Same word may have different function tags as
shown in table 6.
Table 6. Function tags
Function tagsFunction tagsFunction tagsFunction tags EnglishEnglishEnglishEnglish MyanmarMyanmarMyanmarMyanmar
PcomplS He has a househousehousehouse. အိမ္
PPla He lives in a househousehousehouse. အိမ္
PSubj
A househousehousehouse is near the
school.
အိမ္
10. 220 Computer Science & Information Technology (CS & IT)
PObj He buys a househousehousehouse. အိမ္
There are many chunks in a sentence such as NC (noun chunk), PPC (postpositional chunk), AC
(adjectival chunk), RC (adverbial chunk), CC (conjunctional chunk), SFC (sentence’s final
chunk) and VC (verb chunk). The chunk types are shown in table 7.
Table 7. Chunk types
No.No.No.No. Chunk TypeChunk TypeChunk TypeChunk Type ExampleExampleExampleExample
1 Noun Chunk NC[သူတို႔/pron.person]
2 Postpositional Chunk PPC[သည္/ppm.subj]
3 Adjectival Chunk AC[ရဲရင္႔/adj.dem]
4 Adverbial Chunk RC[လ်င္ျမန္စြာ/adv.manner]
5 Conjunctional Chunk CC[သို႔မဟုတ္/cc.chunk]
6 Sentence Final Chunk SFC[၏/sf.declarative]
7 Verb Chunk VC[ကူညီ/v.common]
A chunk contains a Myanmar head word and its modifier. It can contain more than one POS tag
and one of the POS tags is selected with respect to the chunk type. In the following chunk, the
POS tag (n.animals) is selected with respect to the chunk type (NC).
For example:
NC [ေခြး/n.animals,တစ္/part.number,ေကာင္/part.type]
If the noun chunk (NC) contains more than one noun, the last noun (n.food) is selected as a main
word according to the nature of Myanmar language.
For example:
NC [ေဆာင္းရာသီ/n.time,သီးႏွံပင္/n.food,မ်ား/part.number]
There are many possible function tags (t1, t2…tk) for each POS tag with category (pc). These
possible tags are retrieved from the training corpus by using the following equation that is prior
probability as shown in figure 3.
P (tk|pc) = C (tk,pc)/C(pc) (1)
ppm.use#UseP:1.0
n.natural#PSubj:0.209,Subj:0.2985,PPla:0.1343,PObj:0.1642,PcomplS:0.0448,PPcomplO:0.0149,
PCau:0.0448,PSim:0.0149,PAim:0.0299,Obj:0.0299,PCom:0.0149
pron.possessive#PIobj:0.1111,PSubj:0.2222,PObj:0.6667
cc.chunk#CCC:1.0
adj.dem#Ada: 0.9149, PObj: 0.0213,PSubj:0.0426,Active:0.0213
ppm.cause#CauP:1.0
n.verb#PSubj:0.6667,PObj:0.3333
11. Computer Science & Information Technology (CS & IT) 221
v.common#Active:0.9744,VC:0.0128,PcomplS:0.0128
part.eg#PcomplOP:0.5455,SimP:0.4545
Figure 3. Sample data for POS/Function tag pairs with probability
We calculate the probability between next function tags (n1, n2…nj) and previous possible tags by
using the following equation that is log likelihood as shown in figure 4.
P (nj|tk) = C (nj,tk)/C(tk) (2)
CCC,PSubj=0.2
CCC,PAim=0.04
CCC,Tim=0.04
CCC,PcomplS=0.04
PCau,CauP=1.0
PPla,CCC=0.0156
CCS,Ada=0.0196
CCS,PTim=0.0196
CCS,Tim=0.0131
PlaP,Active=0.6111
PlaP,Subj=0.1111
Figure 4. Sample data for Function/Function tag pairs with probability
Possible function tags are disambiguated by using Naïve Bayesian method. We multiply the
probabilities from (1) and (2) and choose the function tag with the largest number as the posterior
probability.
Technically, the task of function tags assignment is to generate a sentence that has correct
function tags attached to certain words.
Our description of the function tagging process refers to the example as shown in figure 5, which
illustrates the sentence (“မမႏွင့္လွလွသည္ ေက်ာင္းသို႔ စက္ဘီးျဖင့္ သြားသည္။” (Ma Ma and Hla Hla go to
school by bicycle). This sentence is represented as a sequence of word-tags as “noun verb
conjunction noun ppm pronoun verb”. It is described as a sequence of chunk as “NC VC CC NC
PPC NC VC SFC”.
(a) NC[မမ/n.person]#CC[ႏွင့္/cc.chunk]#NC[လွလွ/n.person]#PPC[သည္/ppm.subj]#NC[ေက်ာင္း/n.location]
#PPC[သို႔/ppm.place]#NC[စက္ဘီး/n.objects]#PPC[ျဖင့္/ppm.use]#VC[သြား/v.common]#SFC[သည္/sf]။
(b) PSubj[မမ]#CCC[ႏွင့္]#PSubj[လွလွ]#SubjP[သည္]#PPla[ေက်ာင္း]#PlaP[သို႔]#PUse[စက္ဘီး]#UseP[ျဖင့္]
#Active[သြားသည္]။
Figure 5. An overview of function tagging of the sentence
(a)The input POS-tagged and chunk sentence (b) The output sentence with function tags
12. 222 Computer Science & Information Technology (CS & IT)
8.2. Grammatical Relations of Myanmar Sentence
The LANGUAGE defined by a CFG (context-free grammar) is the set of strings derivable from
the start symbol S (for Sentence). The core of a CFG grammar is a set of production rules that
replaces single variables with strings of variables and symbols. The grammar generates all strings
that, starting with a special start variable, can be obtained by applying the production rules until
no variables remain. A CFG is usually thought in two ways: a device for generating sentences, or
a device if assigning a structure to a given sentence. We use CFG for grammatical relations of
function tags.
A CFG is a 4-tuple <N,Σ,P,S> consisting of
• A set of non-terminal symbols N
• A set of terminal symbols Σ
• A set of productions P
– A-> α
– A is a non-terminal
– α is a string of symbols from the infinite set of strings (ΣU N)*
• A designated start symbol S
8.2.1. Simple Sentence
Consider a simple declarative sentence “သူတို႔သည္ ေမာင္ဘကို ေခါင္းေဆာင္ အျဖစ္ ေရြးခ်ယ္ခဲ့ သည္။” (They
selected Mg Ba as a leader).
(a) NC[သူတို႔/pron.possessive]#PPC[သည္/ppm.subj]#NC[ေမာင္ဘ/n.person]#PPC[ကို/ppm.obj]#NC
[ေခါင္းေဆာင္/n.person]#PPC[အျဖစ္/part.eg]#VC[ေရြးခ်ယ္/v.common,ခဲ့/part.support]#SFC[သည္/ sf]။
(b) PSubj[သူတို႔]#SubjP[သည္]#PObj[ေမာင္ဘ]#ObjP[ကို]#PPcomplO[ေခါင္းေဆာင္ ]#PcomplOP[အျဖစ္] #
Active[ေရြးခ်ယ္ခဲ့သည္]။
(c)
Figure 6. An overview of the function tagging and grammatical relations of simple sentence
(a) The tagged and chunk sentence (b) The sentence with function tags
(c) The syntactic tree structure with function tags
8.2.2. Complex Sentence
Our description of the parsing process refers to the example in figure 7, which illustrates the
sentence “အေဖေပးေသာစာအုပ္ကိုကၽြန္ ေတာ္ဖတ္သည္။” (I read the book which is given by my father).
13. Computer Science & Information Technology (CS & IT) 223
This sentence is represented as a sequence of word-tags as “N V CC N PPC PRON V” .It is
described as a sequence of chunk as “NC VC CC NC PPC NC VC SFC” and the sentence
structure (Sentence) contains separate constituents for the object sentence (Obj-sent) and
independent sentence (I-sent), which contains other phrases. Note that this parse tree has had
some constituents conflated to comply with the constraint that there be only one constituent per
word.
(a) NC [အေဖ/n.person] # VC [ေပး/v.common] # CC [ေသာ/cc.adj] # NC [စာအုပ္/n.objects] # PPC
[ကို/ppm.obj] # NC [ကၽြန္ေတာ္/pron.person] # VC [ဖတ္/v.common] # SFC [သည္/sf]။
(b) Subj[အေဖ]#Active[ေပး]#CCA[ေသာ]#PObj[စာအုပ္]#ObjP[ကို]#Subj[ကၽြန္ေတာ္]#Active[ဖတ္သည္]။
(c)
Figure 7. An overview of the function tagging and grammatical relations of complex sentence (a)
The tagged and chunk sentence (b) The sentence with function tags
(c) The syntactic tree structure with function tags
9. EVALUATION AND RESULTS
The corpus contains about 3000 sentences with average word length 15. All sentences can be
further classified as two sets. One is simple sentence set, in which every sentence has no more
than 15 words. The other is complex sentence set, in which every sentence has more than 15
words. There are 1800 simple sentences and 1200 complex sentences in the corpus.
For evaluation purpose, different numbers of sentences collected from Myanmar textbooks of
middle school and Myanmar grammar books are used as a test set. The test set can be divided into
two groups: first group sentences are composed of word patterns in corpus and second group
sentences are composed of word patterns that are not in the corpus. There are 60 sentences in the
first group and 40 in the second one.The sentences are tested in the program and the function
tagged results are manually checked. In table 8, the performance of function tagging
according to the two groups is described.
14. 224 Computer Science & Information Technology (CS & IT)
Table 8. Performance of function tagging for different sentence patterns
Sentence Patterns Accuracy
sentence patterns in the corpus 97.4%
sentence patterns that are not in the
corpus
89.6%
After implementation of the system using the grammar, it has been seen that the system can easily
generates the parse tree for a sentence if the sentence structure satisfies the grammar rules.
For example we take the following Myanmar simple sentence
မလွ သည္ သူ႔အေမ အတြက္ ကိတ္မုန္႔ ဝယ္လာသည္။
(Ma Hla buys a cake for her mother.)
The structure of the above sentence is Subj-Aim-Obj-Pla-Verb. This is a correct sentence
according to the Myanmar literature. According to the grammar a possible top-down derivation
for the above simple sentence is
1. Sentence [start]
2. >>I-sent [Sentence→I-sent ]
3. >> Subj-Aim-Obj- Verb [I-sent→Subj-Aim-Obj-Verb]
4. >> PSubj SubjP -Aim-Obj- Verb [Subj → PSubj SubjP]
5. >> PSubj SubjP –PAim-AimP-Obj- Verb [Aim → PAim AimP]
6. >> PSubj SubjP –PAim-AimP-Obj-Verb [Obj→Obj]
For example we take the following Myanmar complex sentence
မိုး႐ြာလွ်င္ကေလးမ်ားသည္လမ္းေပၚတြင္ေဘာလံုးကန္ၾကသည္။
(If it rains, the children play the football on the road.)
The structure of the above sentence is Verb-CCS-Subj-Pla-Obj-Verb. This is a correct sentence
according to the Myanmar literature. According to the grammar a possible top-down derivation
for the above complex sentence is
1. Sentence [start]
2. >>I-sent CCS I-sent [Sentence→I-sent CCS I-sent]
3. >>Verb CCS I-sent [I-sent→Verb]
4. >>Verb CCS Subj Pla Obj Verb [I-sent→Subj Pla Obj Verb]
5. >>Verb CCS PSubj SubjP Pla Obj Verb [Subj → PSubj SubjP]
6. >>Verb CCS PSubj SubjP PPla PlaP Obj Verb [Pla → PPla PlaP]
7. >>Verb CCS PSubj SubjP PPla PlaP Obj Verb [Obj→Obj]
From the above derivation it has been seen that the Myanmar sentence is correct according to the
grammar. So our system generates a parse tree successfully.
Our program tests only the sentence structure according to the grammar rules. So if the sentence
structure satisfies the grammar rule, program recognizes the sentence as a correct sentence and
generates a parse tree. Otherwise it gives output as an error.
10. CONCLUSION AND FUTURE WORK
In this paper, we investigate the function tag of the word depending on the sentence structure of
Myanmar language. We used Naïve Bayesian technique for the task of assigning function tags.
For grammatical relations of the function tags, we use context free grammar. The parse tree can
be built by using function tags.
15. Computer Science & Information Technology (CS & IT) 225
As function tagging is a pre-processing step for grammatical relations, the errors occurred in the
task of function tagging affect the relations of the words. The corpus may be balanced because
Naïve Bayesian framework probability simply describes uncertainty. The corpus creation is time
consuming. The corpus is the resource for the development of Myanmar to English translation
system and we expect the corpus to be continually expanded in the future because the tested
sentence can be added into the corpus.
In this work we have considered limited number of Myanmar sentences to construct the grammar
rules. In future work we have to consider as many sentences as we can and some more tags for
constructing the grammar rules because Myanmar language is a free-phrase-order language.
Word position for one sentence may not be same in the other sentences. So we can not restrict
the grammar rules for some limited number of sentences.
REFERENCES
[1] John C. Henderson and Eric Brill. “Exploiting Diversity in Natural Language Processing:
Combining Parsers”.
[2] Don Blaheta (2003) “Function tagging”. Ph.D. Dissertation, Brown University. Advisor-Eugene
Charniak.
[3] Phyu Hnin Myint (2010) “Assigning automatically Part-of-Speech tags to build tagged corpus for
Myanmar language”, The Fifth Conference on Parallel Soft Computing, Yangon, Myanmar.
[4] Phyu Hnin Myint (2011) “Chunk Tagged Corpus Creation for Myanmar Language”. In Proceedings
of the ninth International Conference on Computer Applications, Yangon, Myanmar.
[5] Eugene Charniak (1997) “Statistical parsing with a context-free grammar and word statistics”. In
Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 598-603,
Menlo Park.
[6] Michael Collins (1997) “Three generative, lexicalized models for statistical parsing”. In Proceedings
of the 35th Annual Meeting of the Association for Computational Linguistics, pages 16-23.
[7] Don Blaheta and Mark Johnson (2000) “Assigning function tags to parsed text”. In Proceedings of the
1st Annual Meeting of the North American Chapter of the Association for Computational Linguistics,
234–240.
[8] Mihai Lintean and Vasile Rus (2007) “Naive Bayes and Decision Trees for Function Tagging”. In
Proceedings of the International Conference of the Florida Artificial Intelligence Research Society
(FLAIRS) 2007, Key West, FL, May 2007 (in press).
[9] Yong-uk Park and Hyuk-chul Kwon (2008) “Korean Syntactic Analysis using Dependency Rules and
Segmentation “, Proceedings of the Seventh International Conference on Advanced Language
Processing and Web Information Technology(ALPIT2008), Vol.7, pp.59-63, China, July 23-25, 2008
[10] Mark-Jan Nederhof and Giorgio Satta (2002) “Parsing Non-Recursive Context-Free Grammars”. In
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL
ANNUAL'02), July 7-12, Pages 112-119, Philadelphia, Pennsylvania, USA.
[11] Kyongho Min and William H. Wilson (1995) “Are Efficient Natural Language Parsers Robust?”
Proceedings of the Eighth Australian Joint Conference on Artifical Intelligence (AI '95), Canberra 13-
17 November 1995, 283-290. Edited Xin Yao. World Scientific, Singapore. ISBN 981-02-2484-2
[12] Myanmar Thudda, vol. 1 to 5 in Bur-Myan, Text-book Committee, Basic Edu., Min. of Edu.,
Myanmar, ca. 1986
[13] Shwe Pyi Soe (2010) ျမန္မာဘာသာစကား Aspects of Myanmar Language
[14] Thaung Lwin (1978) နည္းသစ္ျမန္မာသဒၵါ
[15] Ko Lay (2003) ျမန္မာသဒၵါဖြဲ႔စည္းပံု Ph.D. Dissertation, Myanmar Department, University of Educaion.
16. 226 Computer Science & Information Technology (CS & IT)
[16] Win Win Thant (2010) Naive Bayes for Function Tagging in Myanmar Language”, The Fifth
Conference on Parallel Soft Computing, Yangon, Myanmar.
[17] Leon Versteegen (1999) “The Simple Bayesian Classifier as a Classification Algorithm”.
[18] Yoshimasa Tsuruoka and Jun’ichi Tsujii (2005) “Chunk parsing revisited”. In Proceedings of the
Ninth International Workshop on Parsing Technologies. Vancouver, Canada.
[19] Michael Collins (1996) “A New Statistical Parser Based on Bigram Lexical Dependencies”. In
Proceedings of ACL-96, pp. 184–191.
Authors
Win Win Thant is a Ph.D research student. She received B.C.Sc (Bachelor of
Computer Science) degree in 2004, B.C.Sc (Hons.) degree in 2005 and M.C.Sc
(Master of Computer Science) degree in 2007. She is now Assistant Lecturer of
U.C.S.Y (University of Computer Studies, Yangon). She has written one local
paper for Parallel and Soft Computing (PSC) conference in 2010 and one
international paper for ICCA conference in 2011. Her research interests include
Natural Language Processing and Machine Translation.
Tin Myat Htwe is an Associate Professor of U.C.S.Y. She obtained Ph.D degree of Information
Technology from University of Computer Studies, Yangon. Her research interests include Natural Language
Processing, Data Mining and Artificial Intelligence. She has published few papers in International
conferences and International Journals.
Ni Lar Thein is a Rector of U.C.S.Y. She obtained B.Sc.(Chem.),B.Sc.(Hons) and M.Sc. (Computer
Science) from Yangon University and Ph.D.(Computer Engg.) from Nanyang Technological University,
Singapore in 2003. Her research interests include Software Engineering, Artificial Intelligence and Natural
Language Processing. She has published few papers in International conferences and International Journals.