SlideShare a Scribd company logo
1 of 11
Download to read offline
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015
DOI: 10.5121/ijnlc.2015.4502 15
MYANMAR WORDS SORTING
Su Mon Khine, Yadana Thein
University of Computer Studies, Yangon
ABSTRACT
Myanmar word sorting is very important in indexing of search engine to optimize in the searching process
of keywords. This paper proposed an efficient sorting algorithm for Myanmar words based on the weights
of consonants, vowels, devowelizers, and consonant combination of each syllable of the words since
Myanmar words are composed of one syllable or more than one syllable and finally the words are sorted
based on Quick sort. The proposed algorithm is intended to design for Zawgyi_One font, which is mainly
dominant in Myanmar Web pages.
KEYWORDS
Myanmar Script, Myanmar Word sorting.
1. INTRODUCTION
Word sorting plays a vital role in natural language processing applications for lexical analysis,
information retrieval or language specific search engine in order to find user specified keywords
in indexing database more quickly. In search engine indexing databases, keywords or terms
should be stored in sorted order in order to find the relevance documents in the least possible
time.
The sorting of Myanmar words is a difficult task in natural language processing since the nature
of Myanmar script is complex rather than the English language. Myanmar sentences do not have
white space to specify words boundaries and hence word segmentation is essential as a first step
for word sorting.
Myanmar words cannot be sorted directly through using existing sorting algorithms since these
sorting algorithm cannot be directly determined which words is greater or lesser than the other
words. In this paper, Quick sort algorithm, the fastest sorting algorithm, is used to sort the input
Myanmar words and the determination of which word is greater or lesser is computed by the
proposed algorithm that sort as a dictionary order [5]. Although there are various fonts in
Myanmar text, the proposed algorithm is implemented for Zawgyi_One fonts of Myanmar words,
which is mainly dominant in web pages.
The objective of this paper is to know how to sort Myanmar words in Myanmar language and to
make more efficient in searching time of indexing database of search engine that are indexed
based on sorted Myanmar words in future.
This paper is organized into six sections. Literature reviews are discussed in the next section.
Section 3 describes about Myanmar scripts and Section 4 describes about Myanmar words and
segmentation .The proposed algorithm will be explained in Section 5. Experimental results will
be discussed in Section 6 and the proposed system will be concluded in Section 7.
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015
16
2. LITERATURE REVIEWS
The Myanmar script, also known as Burmese script and it is used to write Burmese language. The
Myanmar writing style derives from a Brahmi-related script borrowed from South India in about
the eighth century [4].
To sort Myanmar words, syllable segmentation and word segmentation are first important steps.
Zin Maung Maung , Yoshiki Mikami [1], proposed a rule based syllable segmentation algorithm
for Myanmar text and segmentation rules were created based on the syllable structure of
Myanmar script. They tested a training corpus of 32,283 Myanmar syllables and they achieved
99.96% accuracy for syllable segmentation.
Hla Hla Htay, Kavi Narayana Murthy [2] described words segmentation process for Myanmar
words. The training corpus contained more than 75000 sentences from various web sites and
normalized various fonts to Win Innwa font before segmentation. They removed stopwords which
are frequently occurred in sentences and segmented Myanmar words by using N-gram statistics of
Text::Ngrams which is developed by Vlado Keselj [3]. Manual checking was performed to build
lists of valid words and they have collected about 100,000 Myanmar words.
Hla Hla Htay, Kavi Narayana [6] also proposed Myanmar word segmentation using the syllable
level longest Matching. Firstly, they have collected 4550 syllables from available sources of
2,728 sentences and build the words lists from available sources including dictionaries and by
generating syllable n-grams as possible words [2], a total of 800000 words. Secondly, word
segmentation is carried out with the longest syllable word matching using their 800,000 strong
stored word list. Finally, they tested the algorithm over 5000 sentences and the algorithm
correctly recognized 34633 words of 34943 words with achieving 99.11% precision and 98.81%
recall respectively.
Due to the practice of Buddhism and study of Buddhist literature in Myanmar, Pali words are
influenced and adopted on Myanmar language .Zin Maung Maung [7] presented an algorithm to
identify Myanmar –adopted Pali words in Myanmar language. The algorithm combined rule-
based syllable segmentation and a dictionary based longest matching method to identify
Myanmar Pali words. Firstly, the algorithm converted to Unicode font from Zawgyi_One font and
then, syllable segmentation is carried out by [1]. Pali words were identified by using a dictionary-
based syllable-level longest matching method, which scans the input text by sequentially reading
each syllable from the input text and matching the syllable against stored Pali words in dictionary
[8]containing 3477 Myanmar adopted Pali words. The newly found Pali words are added to the
dictionary in the Pali word inventory process. Incorrectly identified Pali words are flittered in Pali
word disambiguation process by matching them with Myanmar words. The algorithm tested on a
total of 12536 sentences and achieved precision of 97.59 % and recall to 99.04 %.
Rsesearcher, Di Jiang [10] discussed the problem of sorting orders of Tibetan dictionaries and
Tibetan electronic dictionaries. Firstly, this paper stated that three reasons for the disagreement
among different dictionaries: there is no criteria on sorting orders of Tibetan-transliterated letters
from Sanskrit, the nature of some character forms is not clear and whether a dictionary ought to
contain those forms of Tibetan sounds newly-emerged in its phonology or not. Therefore, they
proposed three standardizing principles for compiling Tibetan dictionaries: agreement,
compatibility and rationality to overcome these problems. For the aspects of standardization, this
paper carried out a referential scheme to Tibetan dictionaries or electronic dictionaries on its
sorting order, by setting up reasonable sorting values. Finally, they designed a set of digital codes
for each Tibetan letter or character and assign distinctive sorting values to all existing words in
the electronic dictionary with algorithm that revise errors of the [11] , according to standardized
principle of compiling dictionaries.
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015
17
Syllable segmentation in this paper is used by rule based syllable segmentation described in [9],
in which syllables were segmented by using rule based syllable segmentation for Zawgyi_One
fonts. In this paper, words are segmented by finding the longest word in the dictionary that
matches the ending point of the phases, which is widely used for segmentation in Myanmar
language and words are sorted according to the Myanmar to English dictionary[5] sorting order
such as " " (dance / first letter of Myanmar Alphabet), " " (period), " " (car), " "
(tool),.......," " (remain)", " " (muscle) .
3. MYANMAR SCRIPT
Myanmar language is the official language of Myanmar, spoken as first language by two thirds of
the population of 60 million and 10 million as a second language, particularly ethnic minorities in
Myanmar. Myanmar script draws its source of Brahmi script which flourished in India from about
500 B.C.to over 300 AD and a system of writing constructed from consonants, vowels symbols
related to the relevant consonants, consonant combination symbols, devowelizer and digits.
Myanmar script is composed of 33 consonants, 12 basic vowels, 8 independent vowels, 11
consonant combination symbols and 38 devowelizer [5] and is written from left to right in
horizontal line.
The combination of one or more characters but not more than eight characters will become one
syllable; combination of one or more syllables becomes one word and combination of one or
more than one words becomes one phase and phases are combined into sentences. Finally, a
paragraph is formed by one sentence or more than one sentences.
Figure 1 shows the structure of Myanmar sentence and Figure 2 shows structure of Myanmar
syllable ( ) that is equivalent to ‘school’ in English and that contains 7 characters
- .
Figure1. Structure of Myanmar Sentence
Figure2. Structure of Myanmar Syllable
Consonant
Devowelizer
Consonant
Combination
Vowel
Sentence level
Phrase level
Word level
Syllable level
Character
level-
-
- --
-
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015
18
3.1 Consonant (C)
Myanmar word consists of one consonant or more than one consonant. There are basically 33
consonants, also known as "byee" [5] and these consonants are described in Table 1. An example
of Myanmar word " " (bear) consists of two syllables " " and " ". The first syllable
consists of two consonants " " and " ", and the second syllable consists of one consonant " ".
Table1. Myanmar Consonant
3.2 Vowel (V)
Basically, there are 12 vowels in Myanmar writing [5]. They are " " ," ", " ", " ", " " , " ",
" ” , " " ," ", " ", " " ," " . The variation " " ," ", " ", " ", " " , " ", " ,
" " ," ", " ", " " ," ", can be written . These variations on 12 basic vowels can be
extended with the employment of tone markers (- ) and (- ) and therefore there are totally 23
vowels in Myanmar scripts shown in Table 2. The vowels " ", " ", " " , " "," ", " ",
" ” and the other vowels " "," "," ", "၄ ", can also called independent vowels .The variation of
12 basic vowels which are mostly occurred in Myanmar words rather than independent vowels is
used to sort Myanmar words in this proposed algorithm.
Table2. Myanmar Vowels
- - - - - - - -
- - - - - - - -
- - - - - - -
3.3 Devowelizer (D)
There are 38 devowelizers in Myanmar scripts [5] which are formed by combining the character
" " (asat) to some consonants and that are shown in Table 3.
Table3. Myanmar Devowelizers
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015
19
3.4 Consonant Combination (CC)
There are four basic consonant combinations in Myanmar scripts [5]such as " " , " ", " " ,
" " , and one or more consonant combination can be combined with consonant to become the
sound of consonant. Totally, there are 11 consonant combination symbols in Myanmar scripts and
that are described in Table 4.
Table 4. Myanmar Consonant combination
3.5 Myanmar Fonts, encoding system and normalization of Zawgyi_One font
The first generation of Myanmar encoding systems were ASCII code in which Latin English
glyphs were replaced by the Myanmar script glyphs to render the Myanmar script which was no
standardization of encoding characters. Firstly, Myanmar script was added to Unicode
Consortium in 1999 as version 3.0 and improved Unicode5.1 in 2008 and Myanmar3, Padauk and
Parabaik fonts are in the range of U+1000 to U+109F. And then, various fonts such as Myazedi,
Zawgyi_One have been created. Although Zawgyi_One is not Unicode standard, over 90% of
Web sites use Zawgyi_One font. Unicode stores text in only one order and render correctly.
Zawgyi–One can store text in several ways but superficially appears correct. Therefore, the
proposed sorting algorithm is developed for Zawgyi_One fonts and normalizes various writing
style to one standard style. For example, the user can write the vowels sing "- " to ' ' ,' ' or '
', ' ' after writing consonant ' ' for syllable ' ' that is equivalent to 'Ko' in English. Table 5
shows different encoding sequences of Unicode and Zawgyi_One and Table 6 shows some
examples of normalization of Zawgy_One font.
Table 5. Sequence style of using Unicode and Zawgyi_One for Myanmar Syllable “ ”
Fonts Sequence Style
Unicode ++++ ++++ =
++++ ++++ ====
Zawgyi-One + + =
+ + =
Table 6. Normalization of Zawgyi_One font
Various forms of writing sequence Normalize sequence
- , - -
- , - -
- , - , - , - -
- , - -
- , - -
- , - , - , - ႔ -
......... ....
- , - , - , - -
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015
20
4. MYANMAR WORDS AND SEGMENTATION
Although Myanmar sentences are defined by a sentence bounder marker such as " ", words are
not separated by special character such as white space or punctuation marks. Spaces may
sometimes use in sentences, but it may not be meaningfully separated as a word. In Myanmar
text, meaningful words are composed of syllables and syllables are formed by combining the
consonants, vowels, devowelizer and consonant combination characters. In this paper, syllable
segmentation is carried out by rule based syllable segmentation [9] for Zawgyi_One font. After
the syllable segmentation is carried out, words are segmented by finding the longest word in the
dictionary that matches the ending point of the phases, which is widely used for segmentation in
Myanmar language. Finally, these words are sorted by the proposed algorithm.
Myanmar words can be divided into simple words, compound words, pali words (Patsint), and
loan words and these words are shown in Table 7. Patsints are written by combining two syllables
into one syllable by omitting the last character of devowelizer of first syllable " "and these can
be repeated to form a pali or patsints word. The following words are some examples of Myanmar
Pali words.
" " (paper) , " " (eye) , " " (species of land lily) , " " (copy), " "
(box), " "(clever or skillful ) .
The proposed algorithm can be sorted any types Myanmar words written by users.
Table7. Types of Myanmar Word
Types of Myanmar word Examples
Simple words (moon) (bear)
Compound words (kettle) => (hot water)
+ (pot)
Pali words (Patsint) " " (paper) " " (box)
Loan words (computer ) (radio)
5. THE PROPOSED SORTING ALGORITHM
After word segmentation is carried out, Myanmar words are sorted by the proposed algorithm
according to the following step.
Firstly, weights are assigned to consonants, vowels, devowelizer and consonant combination
symbols in ascending order. These weights are assigned according to Myanmar dictionary sorting
order.s
a. Weights 1 to 33 are assigned to 33 consonants (C)
b. Weights 34 to 56 are assigned to 23 vowels (V)
c. Weights 57 to 94 are assigned to 38 devowelizers (D) and
d. Weights 95 to 105 are assigned to consonant combination symbols (CC).
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015
21
Secondly, the input words are separated into syllable
( => _ ) and then compute and sort according to their weight
of each syllable.
The following steps are used to sort Myanmar words.
1. Firstly, separate each syllable of the words into consonant, vowel, devowelizer and consonant
combination.
2. Then, the algorithm compares the weight of the consonants of first syllable of each compared
words.
3. If they are the same consonants according to their weight values, then compare the weight of
consonant combinations of syllable.
4. If the weights of consonant combinations are equal, then compare to devowelizer and if the
devowelizers of syllables are equal, then compare to vowels of syllables.
5. Finally, if the weights of vowel of syllables are equal, then next syllables of words are
compared as shown in above until end of the length of the shorted syllable is reached.
These detail steps of the algorithm are shown in Figure 3.
Algorithm MW_Sorting( W1, W2)
1. Assign weight to C, V, D, CC in ascending order .
2. MaxW Ф
3. S1 (s1w1,s2w1,….,snw1) syll-segment(W1) // input words are segmented
into syllable
S2 (s1w2,s2w2,….,snw2) syll-segment(W2)
4. for (k=0;k<min(W1,W2).length-1;k++) do
5. Result Compare_Syll (skw1, skw2)
6. if result equal to 3 then MaxW W1 break;
7. if result equal to 2 then MaxW W2 break;
if result equal to 1 , then
8. if k is last index ,then
9. if W1 length is equal to W2, then W1 is equal to W2
10. else if W1.lenght>W2.length , then MaxW W1
11. else MaxW W2
12. end for
13. return MaxW
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015
22
Figure3. Proposed Myanmar word sorting algorithm
Examples of Myanmar word sorting
Example (1) W1= ( word 1) , W2= (word 2 )
s1w1= (first syllable of word1), s1w2= (first syllable of word2)
Table8. Comparison of syllables " " and " "
C Weight of C V Weight of V D Weight of D CC Weight of CC
1 - 56 - - 95
1 - 56 - - 96
Table8 shows example of comparison of syllable " " and " ". In this example, the
weights of two consonants " " and " " of syllables " " ,1 and " " ,1 are equal.
So, the algorithm needs to compare the weights of the consonant combinations " " and " " of
two syllables. The weights of two consonant combinations are " " to 95 and " " to 96 .
Therefore, word 2 is greater than word1 and the algorithm does not need to compare the next
syllables of words.
Example (2) W1= , W2=
s1w1= ,s1w2=
Table 9.Comparison of syllables " " and " "
C Weight of C V Weight of V D Weight of D CC Weight of CC
1 - 34 - - 95
1 - 34 - - 95
Function Compare_Syll (s1w1, s1w2)
1. [C, V, D, CC] s1w1 // separate each syllable into 4 parts
[C, V, D, CC] s1w2
2. result compare_Val ( s1w1 [C], s1w2 [C])
3. if result equal 1 , then
4. for (k= s1w1.lenght-1; k>0;k--) do
5. result = compare_Val (s1w1.[k], s1w2.[k])
6. if result ≠ 1 then
7. go to 10.
8. end for
9. end if
10. return result.
Function compare_Val (v1,v2)
1. if (v1>v2) then result 3; // compare their weight
2. if (v1<v2) then result 2; //
3. else result=1.
5. return result
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015
23
Table 9 shows the comparison of syllable " " and " " .In the second
example, the weights of consonant, vowel and consonant combination of first syllable of word1
and the weights of consonant ,vowel and consonant combination of first syllable of word2 are
equal. So, the algorithm compares the next syllables of word1 and word2 ," " and " " are
shown in Table 10.The weight of vowel " " ,40 is greater than the vowel "- ",35 and
therefore, word1 is greater than word2 and shown in Table 10.
Table10. Comparison of syllables " " and " "
C Weight of C V Weight of V D Weight of D CC Weight of CC
28 - 40 - - - -
28 - 35 - - - -
The proposed algorithm can also sort Myanmar pali words (Patsint) such as " ",
" ". The pali word " " is formed by omitting the character " " of first
syllable and then directly concat to second syllable of word " " and formed as a
superscription. For these pali words, the algorithm will be sorted after converting " " to
" " and then compare the weights of syllables.
6. EXPERIMENTAL RESULTS
The proposed algorithm tested on different words of various sources such as newspapers,
websites and, including simple words, loan words, compound words, and Pali words, as a total of
15200 words written in Zawgyi_One font. Firstly, we tested for simple words, loan words,
compound words and the sorting result is shown in Figure 4. Secondly, we tested for the
Myanmar Pali words and the result of sorting algorithm is shown in Figure 5. We achieved
99.73% average accuracy of both tests and we found 41 errors from these 15200 tested words.
These errors are caused by sorting of words that include the independent vowels ,which is
explained in section 3.2 ,such as “ ", " ", " " , " "," ", " ", " ” and the other vowels
" "," "," ", "၄ ", since the proposed sorting algorithm is developed for variation of 12 basic
vowels. For example, the words such as “ ” (desirable object of perception), “
” (lead), “ ” (guest.) , which include “ ", " "," ” independent vowels and the
algorithm will not be calculated the weight of these independent vowels and caused to sorting
errors. The algorithm will be enhanced for these errors by mapping the independent vowels such
as “ ", " ", to " ", " ", vowels in order to correctly sort for the words that include these
independent vowels in the future.
Figure 4. Result of sorting of Myanmar words
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015
24
Figure 5. Result of sorting of Myanmar Pali words
7. CONCLUSION
The proposed sorting algorithm allows to sort different types of Myanmar words written by the
user. Myanmar word sorting is an important step for search engine to search the sorted Myanmar
keywords in indexing databases in order to speed up the searching process. This proposed
algorithm is intended to develop the Zawgyi_One font for Myanmar words, since Zawgyi_One is
mainly dominant on Myanmar Web documents and the sorting result of this algorithm will be
used for indexing database of search engine in later in order to improve the performance of search
engine. In future, the algorithm will be enhanced for independent vowels of Myanmar Language.
ACKNOWLEDGEMENTS
First of all, I would like to thank to U Myint Kyi “Takkatho Myat Soes”, member of
MyanmarLanguage Commission, for his supports and encouragement to develop system.
Moreover, I wish to express special thanks to Daw Myint Myint Kyi, a teacher of Myanmar, for
her valuable guidance and suggestions to complete this paper.
REFERENCES
[1] Zin Maung Maung, Yoshiki Mikami , "A Rule-based Syllable Segmentation of Myanmar Text"
,Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 51–58.
[2] Hla Hla Htay, Kavi Narayana Murthy, "Myanmar Word Segmentation" Department of Computer and
Information Sciences University of Hyderabad.
[3] Vlado Keselj, “Text :Ngrams” software, http://search.cpan.org/~vlado/ Text-Ngrams-1.8/
[4] "Myanmar Script", Southeast Asian Scripts.
[5] Myanmar-English Dictionary, Department of the Myanmar Language Commission, Ministry of
Education, Union of Myanmar Language Commission, January 2013.
[6] Hla Hla Htay, Kavi Narayana Murthy, "Myanmar Word Segmentation using Syllable level Longest
Matching", The 6th Workshop on Asian Language Resources, 2008.
[7] Zin Maung Maung "Identification of Adopted Pali Words in Myanmar Text", IJCSI International
Journal of Computer Science Issues, Vol. 9, Issue 6, No 1, November 2012.
[8] U. H. Myint, Dictionary of Pali-derived Words, First Edition, Universities' Press, Yangon, Myanmar,
1986.
[9] Su Mon Khine and Yadana Thein, " Myanmar Web pages crawler " , Fourth International conference
on Natural Language Processing , February 21~22 , 2015, Sydney ,Australia
[10] Di Jiang “The Current Status of Sorting Order of Tibetan Dictionaries and Standardization” ,
Academy of humanities, Shanghai Normal University, Shanghai, 200234 , Institute of Ethnology &
Anthropology, CASS, Beijing, 100081
[11] Jiang, Di, Kang, Caijun: The Sorting Mathematical Model and Algorithm of Written Tibetan
Language. Journal of Computer Science and Technology. (2004) Vol. 27: 4. 524-529
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015
25
Authors
Su Mon Khine received M.C.Sc and B.C. Sc, in Computer Science, from University of
Computer Studies, Yangon. She is now PhD candidate in Information and Technology
and currently doing research at University of Computer Studies, Yangon. Her research
interest includes web crawling, information retrieval, natural language processing and
data mining.
Dr. Yadana Thein is now working as an Associate Professor in University of Computer
Studies, Yangon (UCSY) under Ministry of Science and Technology, Myanmar. She is
particularly interested in Optical Character Recognition, Speech Processing and
Networking. She published over 40 papers in workshops, conferences and journals.
Currently, she is working as a principal of University of Computer Studies, LoiCaw. She
supervises Master thesis and PhD research candidates in the areas of Image Processing.

More Related Content

What's hot

ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...ijnlc
 
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHOD
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHODPART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHOD
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHODijait
 
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...kevig
 
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...ijnlc
 
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...kevig
 
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerImplementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerIOSR Journals
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEijnlc
 
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...kevig
 
Difficulties in processing malayalam verbs
Difficulties in processing malayalam verbsDifficulties in processing malayalam verbs
Difficulties in processing malayalam verbsijaia
 
A Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to PunjabiA Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to PunjabiIJERA Editor
 
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On SilenceSegmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On Silencepaperpublications3
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHijnlc
 
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...kevig
 
Phonetic Dictionary for Natural Language Processing: Kannada
Phonetic Dictionary for Natural Language Processing: KannadaPhonetic Dictionary for Natural Language Processing: Kannada
Phonetic Dictionary for Natural Language Processing: KannadaIJERA Editor
 
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech SystemsPhonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systemspaperpublications3
 

What's hot (16)

ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
 
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHOD
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHODPART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHOD
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHOD
 
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
 
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
IMPROVING THE QUALITY OF GUJARATI-HINDI MACHINE TRANSLATION THROUGH PART-OF-S...
 
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
 
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerImplementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
 
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
 
Difficulties in processing malayalam verbs
Difficulties in processing malayalam verbsDifficulties in processing malayalam verbs
Difficulties in processing malayalam verbs
 
A Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to PunjabiA Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
 
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On SilenceSegmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
 
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
 
Phonetic Dictionary for Natural Language Processing: Kannada
Phonetic Dictionary for Natural Language Processing: KannadaPhonetic Dictionary for Natural Language Processing: Kannada
Phonetic Dictionary for Natural Language Processing: Kannada
 
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech SystemsPhonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systems
 

Similar to MYANMAR WORDS SORTING

OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMEROPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMERkevig
 
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...ijnlc
 
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...kevig
 
Parsing of Myanmar Sentences With Function Tagging
Parsing of Myanmar Sentences With Function TaggingParsing of Myanmar Sentences With Function Tagging
Parsing of Myanmar Sentences With Function Taggingkevig
 
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGPARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGkevig
 
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGPARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGkevig
 
A Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for MarathiA Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for Marathiiosrjce
 
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...IJERA Editor
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) inventionjournals
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Kannada Phonemes to Speech Dictionary: Statistical Approach
Kannada Phonemes to Speech Dictionary: Statistical ApproachKannada Phonemes to Speech Dictionary: Statistical Approach
Kannada Phonemes to Speech Dictionary: Statistical ApproachIJERA Editor
 
STATISTICAL FUNCTION TAGGING AND GRAMMATICAL RELATIONS OF MYANMAR SENTENCES
STATISTICAL FUNCTION TAGGING AND GRAMMATICAL RELATIONS OF MYANMAR SENTENCESSTATISTICAL FUNCTION TAGGING AND GRAMMATICAL RELATIONS OF MYANMAR SENTENCES
STATISTICAL FUNCTION TAGGING AND GRAMMATICAL RELATIONS OF MYANMAR SENTENCEScscpconf
 
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...iosrjce
 
Implementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large DictionaryImplementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large Dictionaryiosrjce
 
IMPROVEMENT OF CRF BASED MANIPURI POS TAGGER BY USING REDUPLICATED MWE (RMWE)
IMPROVEMENT OF CRF BASED MANIPURI POS TAGGER BY USING REDUPLICATED MWE (RMWE)IMPROVEMENT OF CRF BASED MANIPURI POS TAGGER BY USING REDUPLICATED MWE (RMWE)
IMPROVEMENT OF CRF BASED MANIPURI POS TAGGER BY USING REDUPLICATED MWE (RMWE)cscpconf
 
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...IJECEIAES
 
ANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTKANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTKijnlc
 
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES ijnlc
 

Similar to MYANMAR WORDS SORTING (20)

OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMEROPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
 
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
 
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
 
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
ISOLATING WORD LEVEL RULES IN TAMIL LANGUAGE FOR EFFICIENT DEVELOPMENT OF LAN...
 
Parsing of Myanmar Sentences With Function Tagging
Parsing of Myanmar Sentences With Function TaggingParsing of Myanmar Sentences With Function Tagging
Parsing of Myanmar Sentences With Function Tagging
 
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGPARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
 
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGPARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
 
A Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for MarathiA Corpus-Based Concatenative Speech Synthesis System for Marathi
A Corpus-Based Concatenative Speech Synthesis System for Marathi
 
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
Duration for Classification and Regression Treefor Marathi Textto- Speech Syn...
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Kannada Phonemes to Speech Dictionary: Statistical Approach
Kannada Phonemes to Speech Dictionary: Statistical ApproachKannada Phonemes to Speech Dictionary: Statistical Approach
Kannada Phonemes to Speech Dictionary: Statistical Approach
 
F017163443
F017163443F017163443
F017163443
 
STATISTICAL FUNCTION TAGGING AND GRAMMATICAL RELATIONS OF MYANMAR SENTENCES
STATISTICAL FUNCTION TAGGING AND GRAMMATICAL RELATIONS OF MYANMAR SENTENCESSTATISTICAL FUNCTION TAGGING AND GRAMMATICAL RELATIONS OF MYANMAR SENTENCES
STATISTICAL FUNCTION TAGGING AND GRAMMATICAL RELATIONS OF MYANMAR SENTENCES
 
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
 
Implementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large DictionaryImplementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large Dictionary
 
IMPROVEMENT OF CRF BASED MANIPURI POS TAGGER BY USING REDUPLICATED MWE (RMWE)
IMPROVEMENT OF CRF BASED MANIPURI POS TAGGER BY USING REDUPLICATED MWE (RMWE)IMPROVEMENT OF CRF BASED MANIPURI POS TAGGER BY USING REDUPLICATED MWE (RMWE)
IMPROVEMENT OF CRF BASED MANIPURI POS TAGGER BY USING REDUPLICATED MWE (RMWE)
 
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...
 
ANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTKANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTK
 
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES
 

More from kevig

Genetic Approach For Arabic Part Of Speech Tagging
Genetic Approach For Arabic Part Of Speech TaggingGenetic Approach For Arabic Part Of Speech Tagging
Genetic Approach For Arabic Part Of Speech Taggingkevig
 
Rule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to PunjabiRule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to Punjabikevig
 
Improving Dialogue Management Through Data Optimization
Improving Dialogue Management Through Data OptimizationImproving Dialogue Management Through Data Optimization
Improving Dialogue Management Through Data Optimizationkevig
 
Document Author Classification using Parsed Language Structure
Document Author Classification using Parsed Language StructureDocument Author Classification using Parsed Language Structure
Document Author Classification using Parsed Language Structurekevig
 
Rag-Fusion: A New Take on Retrieval Augmented Generation
Rag-Fusion: A New Take on Retrieval Augmented GenerationRag-Fusion: A New Take on Retrieval Augmented Generation
Rag-Fusion: A New Take on Retrieval Augmented Generationkevig
 
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...kevig
 
Evaluation of Medium-Sized Language Models in German and English Language
Evaluation of Medium-Sized Language Models in German and English LanguageEvaluation of Medium-Sized Language Models in German and English Language
Evaluation of Medium-Sized Language Models in German and English Languagekevig
 
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATION
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATIONIMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATION
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATIONkevig
 
Document Author Classification Using Parsed Language Structure
Document Author Classification Using Parsed Language StructureDocument Author Classification Using Parsed Language Structure
Document Author Classification Using Parsed Language Structurekevig
 
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATION
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATIONRAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATION
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATIONkevig
 
Performance, energy consumption and costs: a comparative analysis of automati...
Performance, energy consumption and costs: a comparative analysis of automati...Performance, energy consumption and costs: a comparative analysis of automati...
Performance, energy consumption and costs: a comparative analysis of automati...kevig
 
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGE
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGEEVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGE
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGEkevig
 
February 2024 - Top 10 cited articles.pdf
February 2024 - Top 10 cited articles.pdfFebruary 2024 - Top 10 cited articles.pdf
February 2024 - Top 10 cited articles.pdfkevig
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithmkevig
 
Effect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal AlignmentsEffect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal Alignmentskevig
 
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov ModelNERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Modelkevig
 
NLization of Nouns, Pronouns and Prepositions in Punjabi With EUGENE
NLization of Nouns, Pronouns and Prepositions in Punjabi With EUGENENLization of Nouns, Pronouns and Prepositions in Punjabi With EUGENE
NLization of Nouns, Pronouns and Prepositions in Punjabi With EUGENEkevig
 
January 2024: Top 10 Downloaded Articles in Natural Language Computing
January 2024: Top 10 Downloaded Articles in Natural Language ComputingJanuary 2024: Top 10 Downloaded Articles in Natural Language Computing
January 2024: Top 10 Downloaded Articles in Natural Language Computingkevig
 
Clustering Web Search Results for Effective Arabic Language Browsing
Clustering Web Search Results for Effective Arabic Language BrowsingClustering Web Search Results for Effective Arabic Language Browsing
Clustering Web Search Results for Effective Arabic Language Browsingkevig
 
Semantic Processing Mechanism for Listening and Comprehension in VNScalendar ...
Semantic Processing Mechanism for Listening and Comprehension in VNScalendar ...Semantic Processing Mechanism for Listening and Comprehension in VNScalendar ...
Semantic Processing Mechanism for Listening and Comprehension in VNScalendar ...kevig
 

More from kevig (20)

Genetic Approach For Arabic Part Of Speech Tagging
Genetic Approach For Arabic Part Of Speech TaggingGenetic Approach For Arabic Part Of Speech Tagging
Genetic Approach For Arabic Part Of Speech Tagging
 
Rule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to PunjabiRule Based Transliteration Scheme for English to Punjabi
Rule Based Transliteration Scheme for English to Punjabi
 
Improving Dialogue Management Through Data Optimization
Improving Dialogue Management Through Data OptimizationImproving Dialogue Management Through Data Optimization
Improving Dialogue Management Through Data Optimization
 
Document Author Classification using Parsed Language Structure
Document Author Classification using Parsed Language StructureDocument Author Classification using Parsed Language Structure
Document Author Classification using Parsed Language Structure
 
Rag-Fusion: A New Take on Retrieval Augmented Generation
Rag-Fusion: A New Take on Retrieval Augmented GenerationRag-Fusion: A New Take on Retrieval Augmented Generation
Rag-Fusion: A New Take on Retrieval Augmented Generation
 
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati...
 
Evaluation of Medium-Sized Language Models in German and English Language
Evaluation of Medium-Sized Language Models in German and English LanguageEvaluation of Medium-Sized Language Models in German and English Language
Evaluation of Medium-Sized Language Models in German and English Language
 
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATION
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATIONIMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATION
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATION
 
Document Author Classification Using Parsed Language Structure
Document Author Classification Using Parsed Language StructureDocument Author Classification Using Parsed Language Structure
Document Author Classification Using Parsed Language Structure
 
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATION
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATIONRAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATION
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATION
 
Performance, energy consumption and costs: a comparative analysis of automati...
Performance, energy consumption and costs: a comparative analysis of automati...Performance, energy consumption and costs: a comparative analysis of automati...
Performance, energy consumption and costs: a comparative analysis of automati...
 
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGE
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGEEVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGE
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGE
 
February 2024 - Top 10 cited articles.pdf
February 2024 - Top 10 cited articles.pdfFebruary 2024 - Top 10 cited articles.pdf
February 2024 - Top 10 cited articles.pdf
 
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank AlgorithmEnhanced Retrieval of Web Pages using Improved Page Rank Algorithm
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithm
 
Effect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal AlignmentsEffect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal Alignments
 
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov ModelNERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
NERHMM: A Tool for Named Entity Recognition Based on Hidden Markov Model
 
NLization of Nouns, Pronouns and Prepositions in Punjabi With EUGENE
NLization of Nouns, Pronouns and Prepositions in Punjabi With EUGENENLization of Nouns, Pronouns and Prepositions in Punjabi With EUGENE
NLization of Nouns, Pronouns and Prepositions in Punjabi With EUGENE
 
January 2024: Top 10 Downloaded Articles in Natural Language Computing
January 2024: Top 10 Downloaded Articles in Natural Language ComputingJanuary 2024: Top 10 Downloaded Articles in Natural Language Computing
January 2024: Top 10 Downloaded Articles in Natural Language Computing
 
Clustering Web Search Results for Effective Arabic Language Browsing
Clustering Web Search Results for Effective Arabic Language BrowsingClustering Web Search Results for Effective Arabic Language Browsing
Clustering Web Search Results for Effective Arabic Language Browsing
 
Semantic Processing Mechanism for Listening and Comprehension in VNScalendar ...
Semantic Processing Mechanism for Listening and Comprehension in VNScalendar ...Semantic Processing Mechanism for Listening and Comprehension in VNScalendar ...
Semantic Processing Mechanism for Listening and Comprehension in VNScalendar ...
 

Recently uploaded

VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIkoyaldeepu123
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture designssuser87fa0c1
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 

Recently uploaded (20)

Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
pipeline in computer architecture design
pipeline in computer architecture  designpipeline in computer architecture  design
pipeline in computer architecture design
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 

MYANMAR WORDS SORTING

  • 1. International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015 DOI: 10.5121/ijnlc.2015.4502 15 MYANMAR WORDS SORTING Su Mon Khine, Yadana Thein University of Computer Studies, Yangon ABSTRACT Myanmar word sorting is very important in indexing of search engine to optimize in the searching process of keywords. This paper proposed an efficient sorting algorithm for Myanmar words based on the weights of consonants, vowels, devowelizers, and consonant combination of each syllable of the words since Myanmar words are composed of one syllable or more than one syllable and finally the words are sorted based on Quick sort. The proposed algorithm is intended to design for Zawgyi_One font, which is mainly dominant in Myanmar Web pages. KEYWORDS Myanmar Script, Myanmar Word sorting. 1. INTRODUCTION Word sorting plays a vital role in natural language processing applications for lexical analysis, information retrieval or language specific search engine in order to find user specified keywords in indexing database more quickly. In search engine indexing databases, keywords or terms should be stored in sorted order in order to find the relevance documents in the least possible time. The sorting of Myanmar words is a difficult task in natural language processing since the nature of Myanmar script is complex rather than the English language. Myanmar sentences do not have white space to specify words boundaries and hence word segmentation is essential as a first step for word sorting. Myanmar words cannot be sorted directly through using existing sorting algorithms since these sorting algorithm cannot be directly determined which words is greater or lesser than the other words. In this paper, Quick sort algorithm, the fastest sorting algorithm, is used to sort the input Myanmar words and the determination of which word is greater or lesser is computed by the proposed algorithm that sort as a dictionary order [5]. Although there are various fonts in Myanmar text, the proposed algorithm is implemented for Zawgyi_One fonts of Myanmar words, which is mainly dominant in web pages. The objective of this paper is to know how to sort Myanmar words in Myanmar language and to make more efficient in searching time of indexing database of search engine that are indexed based on sorted Myanmar words in future. This paper is organized into six sections. Literature reviews are discussed in the next section. Section 3 describes about Myanmar scripts and Section 4 describes about Myanmar words and segmentation .The proposed algorithm will be explained in Section 5. Experimental results will be discussed in Section 6 and the proposed system will be concluded in Section 7.
  • 2. International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015 16 2. LITERATURE REVIEWS The Myanmar script, also known as Burmese script and it is used to write Burmese language. The Myanmar writing style derives from a Brahmi-related script borrowed from South India in about the eighth century [4]. To sort Myanmar words, syllable segmentation and word segmentation are first important steps. Zin Maung Maung , Yoshiki Mikami [1], proposed a rule based syllable segmentation algorithm for Myanmar text and segmentation rules were created based on the syllable structure of Myanmar script. They tested a training corpus of 32,283 Myanmar syllables and they achieved 99.96% accuracy for syllable segmentation. Hla Hla Htay, Kavi Narayana Murthy [2] described words segmentation process for Myanmar words. The training corpus contained more than 75000 sentences from various web sites and normalized various fonts to Win Innwa font before segmentation. They removed stopwords which are frequently occurred in sentences and segmented Myanmar words by using N-gram statistics of Text::Ngrams which is developed by Vlado Keselj [3]. Manual checking was performed to build lists of valid words and they have collected about 100,000 Myanmar words. Hla Hla Htay, Kavi Narayana [6] also proposed Myanmar word segmentation using the syllable level longest Matching. Firstly, they have collected 4550 syllables from available sources of 2,728 sentences and build the words lists from available sources including dictionaries and by generating syllable n-grams as possible words [2], a total of 800000 words. Secondly, word segmentation is carried out with the longest syllable word matching using their 800,000 strong stored word list. Finally, they tested the algorithm over 5000 sentences and the algorithm correctly recognized 34633 words of 34943 words with achieving 99.11% precision and 98.81% recall respectively. Due to the practice of Buddhism and study of Buddhist literature in Myanmar, Pali words are influenced and adopted on Myanmar language .Zin Maung Maung [7] presented an algorithm to identify Myanmar –adopted Pali words in Myanmar language. The algorithm combined rule- based syllable segmentation and a dictionary based longest matching method to identify Myanmar Pali words. Firstly, the algorithm converted to Unicode font from Zawgyi_One font and then, syllable segmentation is carried out by [1]. Pali words were identified by using a dictionary- based syllable-level longest matching method, which scans the input text by sequentially reading each syllable from the input text and matching the syllable against stored Pali words in dictionary [8]containing 3477 Myanmar adopted Pali words. The newly found Pali words are added to the dictionary in the Pali word inventory process. Incorrectly identified Pali words are flittered in Pali word disambiguation process by matching them with Myanmar words. The algorithm tested on a total of 12536 sentences and achieved precision of 97.59 % and recall to 99.04 %. Rsesearcher, Di Jiang [10] discussed the problem of sorting orders of Tibetan dictionaries and Tibetan electronic dictionaries. Firstly, this paper stated that three reasons for the disagreement among different dictionaries: there is no criteria on sorting orders of Tibetan-transliterated letters from Sanskrit, the nature of some character forms is not clear and whether a dictionary ought to contain those forms of Tibetan sounds newly-emerged in its phonology or not. Therefore, they proposed three standardizing principles for compiling Tibetan dictionaries: agreement, compatibility and rationality to overcome these problems. For the aspects of standardization, this paper carried out a referential scheme to Tibetan dictionaries or electronic dictionaries on its sorting order, by setting up reasonable sorting values. Finally, they designed a set of digital codes for each Tibetan letter or character and assign distinctive sorting values to all existing words in the electronic dictionary with algorithm that revise errors of the [11] , according to standardized principle of compiling dictionaries.
  • 3. International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015 17 Syllable segmentation in this paper is used by rule based syllable segmentation described in [9], in which syllables were segmented by using rule based syllable segmentation for Zawgyi_One fonts. In this paper, words are segmented by finding the longest word in the dictionary that matches the ending point of the phases, which is widely used for segmentation in Myanmar language and words are sorted according to the Myanmar to English dictionary[5] sorting order such as " " (dance / first letter of Myanmar Alphabet), " " (period), " " (car), " " (tool),.......," " (remain)", " " (muscle) . 3. MYANMAR SCRIPT Myanmar language is the official language of Myanmar, spoken as first language by two thirds of the population of 60 million and 10 million as a second language, particularly ethnic minorities in Myanmar. Myanmar script draws its source of Brahmi script which flourished in India from about 500 B.C.to over 300 AD and a system of writing constructed from consonants, vowels symbols related to the relevant consonants, consonant combination symbols, devowelizer and digits. Myanmar script is composed of 33 consonants, 12 basic vowels, 8 independent vowels, 11 consonant combination symbols and 38 devowelizer [5] and is written from left to right in horizontal line. The combination of one or more characters but not more than eight characters will become one syllable; combination of one or more syllables becomes one word and combination of one or more than one words becomes one phase and phases are combined into sentences. Finally, a paragraph is formed by one sentence or more than one sentences. Figure 1 shows the structure of Myanmar sentence and Figure 2 shows structure of Myanmar syllable ( ) that is equivalent to ‘school’ in English and that contains 7 characters - . Figure1. Structure of Myanmar Sentence Figure2. Structure of Myanmar Syllable Consonant Devowelizer Consonant Combination Vowel Sentence level Phrase level Word level Syllable level Character level- - - -- -
  • 4. International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015 18 3.1 Consonant (C) Myanmar word consists of one consonant or more than one consonant. There are basically 33 consonants, also known as "byee" [5] and these consonants are described in Table 1. An example of Myanmar word " " (bear) consists of two syllables " " and " ". The first syllable consists of two consonants " " and " ", and the second syllable consists of one consonant " ". Table1. Myanmar Consonant 3.2 Vowel (V) Basically, there are 12 vowels in Myanmar writing [5]. They are " " ," ", " ", " ", " " , " ", " ” , " " ," ", " ", " " ," " . The variation " " ," ", " ", " ", " " , " ", " , " " ," ", " ", " " ," ", can be written . These variations on 12 basic vowels can be extended with the employment of tone markers (- ) and (- ) and therefore there are totally 23 vowels in Myanmar scripts shown in Table 2. The vowels " ", " ", " " , " "," ", " ", " ” and the other vowels " "," "," ", "၄ ", can also called independent vowels .The variation of 12 basic vowels which are mostly occurred in Myanmar words rather than independent vowels is used to sort Myanmar words in this proposed algorithm. Table2. Myanmar Vowels - - - - - - - - - - - - - - - - - - - - - - - 3.3 Devowelizer (D) There are 38 devowelizers in Myanmar scripts [5] which are formed by combining the character " " (asat) to some consonants and that are shown in Table 3. Table3. Myanmar Devowelizers
  • 5. International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015 19 3.4 Consonant Combination (CC) There are four basic consonant combinations in Myanmar scripts [5]such as " " , " ", " " , " " , and one or more consonant combination can be combined with consonant to become the sound of consonant. Totally, there are 11 consonant combination symbols in Myanmar scripts and that are described in Table 4. Table 4. Myanmar Consonant combination 3.5 Myanmar Fonts, encoding system and normalization of Zawgyi_One font The first generation of Myanmar encoding systems were ASCII code in which Latin English glyphs were replaced by the Myanmar script glyphs to render the Myanmar script which was no standardization of encoding characters. Firstly, Myanmar script was added to Unicode Consortium in 1999 as version 3.0 and improved Unicode5.1 in 2008 and Myanmar3, Padauk and Parabaik fonts are in the range of U+1000 to U+109F. And then, various fonts such as Myazedi, Zawgyi_One have been created. Although Zawgyi_One is not Unicode standard, over 90% of Web sites use Zawgyi_One font. Unicode stores text in only one order and render correctly. Zawgyi–One can store text in several ways but superficially appears correct. Therefore, the proposed sorting algorithm is developed for Zawgyi_One fonts and normalizes various writing style to one standard style. For example, the user can write the vowels sing "- " to ' ' ,' ' or ' ', ' ' after writing consonant ' ' for syllable ' ' that is equivalent to 'Ko' in English. Table 5 shows different encoding sequences of Unicode and Zawgyi_One and Table 6 shows some examples of normalization of Zawgy_One font. Table 5. Sequence style of using Unicode and Zawgyi_One for Myanmar Syllable “ ” Fonts Sequence Style Unicode ++++ ++++ = ++++ ++++ ==== Zawgyi-One + + = + + = Table 6. Normalization of Zawgyi_One font Various forms of writing sequence Normalize sequence - , - - - , - - - , - , - , - - - , - - - , - - - , - , - , - ႔ - ......... .... - , - , - , - -
  • 6. International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015 20 4. MYANMAR WORDS AND SEGMENTATION Although Myanmar sentences are defined by a sentence bounder marker such as " ", words are not separated by special character such as white space or punctuation marks. Spaces may sometimes use in sentences, but it may not be meaningfully separated as a word. In Myanmar text, meaningful words are composed of syllables and syllables are formed by combining the consonants, vowels, devowelizer and consonant combination characters. In this paper, syllable segmentation is carried out by rule based syllable segmentation [9] for Zawgyi_One font. After the syllable segmentation is carried out, words are segmented by finding the longest word in the dictionary that matches the ending point of the phases, which is widely used for segmentation in Myanmar language. Finally, these words are sorted by the proposed algorithm. Myanmar words can be divided into simple words, compound words, pali words (Patsint), and loan words and these words are shown in Table 7. Patsints are written by combining two syllables into one syllable by omitting the last character of devowelizer of first syllable " "and these can be repeated to form a pali or patsints word. The following words are some examples of Myanmar Pali words. " " (paper) , " " (eye) , " " (species of land lily) , " " (copy), " " (box), " "(clever or skillful ) . The proposed algorithm can be sorted any types Myanmar words written by users. Table7. Types of Myanmar Word Types of Myanmar word Examples Simple words (moon) (bear) Compound words (kettle) => (hot water) + (pot) Pali words (Patsint) " " (paper) " " (box) Loan words (computer ) (radio) 5. THE PROPOSED SORTING ALGORITHM After word segmentation is carried out, Myanmar words are sorted by the proposed algorithm according to the following step. Firstly, weights are assigned to consonants, vowels, devowelizer and consonant combination symbols in ascending order. These weights are assigned according to Myanmar dictionary sorting order.s a. Weights 1 to 33 are assigned to 33 consonants (C) b. Weights 34 to 56 are assigned to 23 vowels (V) c. Weights 57 to 94 are assigned to 38 devowelizers (D) and d. Weights 95 to 105 are assigned to consonant combination symbols (CC).
  • 7. International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015 21 Secondly, the input words are separated into syllable ( => _ ) and then compute and sort according to their weight of each syllable. The following steps are used to sort Myanmar words. 1. Firstly, separate each syllable of the words into consonant, vowel, devowelizer and consonant combination. 2. Then, the algorithm compares the weight of the consonants of first syllable of each compared words. 3. If they are the same consonants according to their weight values, then compare the weight of consonant combinations of syllable. 4. If the weights of consonant combinations are equal, then compare to devowelizer and if the devowelizers of syllables are equal, then compare to vowels of syllables. 5. Finally, if the weights of vowel of syllables are equal, then next syllables of words are compared as shown in above until end of the length of the shorted syllable is reached. These detail steps of the algorithm are shown in Figure 3. Algorithm MW_Sorting( W1, W2) 1. Assign weight to C, V, D, CC in ascending order . 2. MaxW Ф 3. S1 (s1w1,s2w1,….,snw1) syll-segment(W1) // input words are segmented into syllable S2 (s1w2,s2w2,….,snw2) syll-segment(W2) 4. for (k=0;k<min(W1,W2).length-1;k++) do 5. Result Compare_Syll (skw1, skw2) 6. if result equal to 3 then MaxW W1 break; 7. if result equal to 2 then MaxW W2 break; if result equal to 1 , then 8. if k is last index ,then 9. if W1 length is equal to W2, then W1 is equal to W2 10. else if W1.lenght>W2.length , then MaxW W1 11. else MaxW W2 12. end for 13. return MaxW
  • 8. International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015 22 Figure3. Proposed Myanmar word sorting algorithm Examples of Myanmar word sorting Example (1) W1= ( word 1) , W2= (word 2 ) s1w1= (first syllable of word1), s1w2= (first syllable of word2) Table8. Comparison of syllables " " and " " C Weight of C V Weight of V D Weight of D CC Weight of CC 1 - 56 - - 95 1 - 56 - - 96 Table8 shows example of comparison of syllable " " and " ". In this example, the weights of two consonants " " and " " of syllables " " ,1 and " " ,1 are equal. So, the algorithm needs to compare the weights of the consonant combinations " " and " " of two syllables. The weights of two consonant combinations are " " to 95 and " " to 96 . Therefore, word 2 is greater than word1 and the algorithm does not need to compare the next syllables of words. Example (2) W1= , W2= s1w1= ,s1w2= Table 9.Comparison of syllables " " and " " C Weight of C V Weight of V D Weight of D CC Weight of CC 1 - 34 - - 95 1 - 34 - - 95 Function Compare_Syll (s1w1, s1w2) 1. [C, V, D, CC] s1w1 // separate each syllable into 4 parts [C, V, D, CC] s1w2 2. result compare_Val ( s1w1 [C], s1w2 [C]) 3. if result equal 1 , then 4. for (k= s1w1.lenght-1; k>0;k--) do 5. result = compare_Val (s1w1.[k], s1w2.[k]) 6. if result ≠ 1 then 7. go to 10. 8. end for 9. end if 10. return result. Function compare_Val (v1,v2) 1. if (v1>v2) then result 3; // compare their weight 2. if (v1<v2) then result 2; // 3. else result=1. 5. return result
  • 9. International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015 23 Table 9 shows the comparison of syllable " " and " " .In the second example, the weights of consonant, vowel and consonant combination of first syllable of word1 and the weights of consonant ,vowel and consonant combination of first syllable of word2 are equal. So, the algorithm compares the next syllables of word1 and word2 ," " and " " are shown in Table 10.The weight of vowel " " ,40 is greater than the vowel "- ",35 and therefore, word1 is greater than word2 and shown in Table 10. Table10. Comparison of syllables " " and " " C Weight of C V Weight of V D Weight of D CC Weight of CC 28 - 40 - - - - 28 - 35 - - - - The proposed algorithm can also sort Myanmar pali words (Patsint) such as " ", " ". The pali word " " is formed by omitting the character " " of first syllable and then directly concat to second syllable of word " " and formed as a superscription. For these pali words, the algorithm will be sorted after converting " " to " " and then compare the weights of syllables. 6. EXPERIMENTAL RESULTS The proposed algorithm tested on different words of various sources such as newspapers, websites and, including simple words, loan words, compound words, and Pali words, as a total of 15200 words written in Zawgyi_One font. Firstly, we tested for simple words, loan words, compound words and the sorting result is shown in Figure 4. Secondly, we tested for the Myanmar Pali words and the result of sorting algorithm is shown in Figure 5. We achieved 99.73% average accuracy of both tests and we found 41 errors from these 15200 tested words. These errors are caused by sorting of words that include the independent vowels ,which is explained in section 3.2 ,such as “ ", " ", " " , " "," ", " ", " ” and the other vowels " "," "," ", "၄ ", since the proposed sorting algorithm is developed for variation of 12 basic vowels. For example, the words such as “ ” (desirable object of perception), “ ” (lead), “ ” (guest.) , which include “ ", " "," ” independent vowels and the algorithm will not be calculated the weight of these independent vowels and caused to sorting errors. The algorithm will be enhanced for these errors by mapping the independent vowels such as “ ", " ", to " ", " ", vowels in order to correctly sort for the words that include these independent vowels in the future. Figure 4. Result of sorting of Myanmar words
  • 10. International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015 24 Figure 5. Result of sorting of Myanmar Pali words 7. CONCLUSION The proposed sorting algorithm allows to sort different types of Myanmar words written by the user. Myanmar word sorting is an important step for search engine to search the sorted Myanmar keywords in indexing databases in order to speed up the searching process. This proposed algorithm is intended to develop the Zawgyi_One font for Myanmar words, since Zawgyi_One is mainly dominant on Myanmar Web documents and the sorting result of this algorithm will be used for indexing database of search engine in later in order to improve the performance of search engine. In future, the algorithm will be enhanced for independent vowels of Myanmar Language. ACKNOWLEDGEMENTS First of all, I would like to thank to U Myint Kyi “Takkatho Myat Soes”, member of MyanmarLanguage Commission, for his supports and encouragement to develop system. Moreover, I wish to express special thanks to Daw Myint Myint Kyi, a teacher of Myanmar, for her valuable guidance and suggestions to complete this paper. REFERENCES [1] Zin Maung Maung, Yoshiki Mikami , "A Rule-based Syllable Segmentation of Myanmar Text" ,Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 51–58. [2] Hla Hla Htay, Kavi Narayana Murthy, "Myanmar Word Segmentation" Department of Computer and Information Sciences University of Hyderabad. [3] Vlado Keselj, “Text :Ngrams” software, http://search.cpan.org/~vlado/ Text-Ngrams-1.8/ [4] "Myanmar Script", Southeast Asian Scripts. [5] Myanmar-English Dictionary, Department of the Myanmar Language Commission, Ministry of Education, Union of Myanmar Language Commission, January 2013. [6] Hla Hla Htay, Kavi Narayana Murthy, "Myanmar Word Segmentation using Syllable level Longest Matching", The 6th Workshop on Asian Language Resources, 2008. [7] Zin Maung Maung "Identification of Adopted Pali Words in Myanmar Text", IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 6, No 1, November 2012. [8] U. H. Myint, Dictionary of Pali-derived Words, First Edition, Universities' Press, Yangon, Myanmar, 1986. [9] Su Mon Khine and Yadana Thein, " Myanmar Web pages crawler " , Fourth International conference on Natural Language Processing , February 21~22 , 2015, Sydney ,Australia [10] Di Jiang “The Current Status of Sorting Order of Tibetan Dictionaries and Standardization” , Academy of humanities, Shanghai Normal University, Shanghai, 200234 , Institute of Ethnology & Anthropology, CASS, Beijing, 100081 [11] Jiang, Di, Kang, Caijun: The Sorting Mathematical Model and Algorithm of Written Tibetan Language. Journal of Computer Science and Technology. (2004) Vol. 27: 4. 524-529
  • 11. International Journal on Natural Language Computing (IJNLC) Vol. 4, No.5, October 2015 25 Authors Su Mon Khine received M.C.Sc and B.C. Sc, in Computer Science, from University of Computer Studies, Yangon. She is now PhD candidate in Information and Technology and currently doing research at University of Computer Studies, Yangon. Her research interest includes web crawling, information retrieval, natural language processing and data mining. Dr. Yadana Thein is now working as an Associate Professor in University of Computer Studies, Yangon (UCSY) under Ministry of Science and Technology, Myanmar. She is particularly interested in Optical Character Recognition, Speech Processing and Networking. She published over 40 papers in workshops, conferences and journals. Currently, she is working as a principal of University of Computer Studies, LoiCaw. She supervises Master thesis and PhD research candidates in the areas of Image Processing.