SlideShare a Scribd company logo
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013
DOI : 10.5121/ijnlc.2013.2207 67
RULE BASED TRANSLITERATION SCHEME FOR
ENGLISH TO PUNJABI
Deepti Bhalla1
, Nisheeth Joshi2
and Iti Mathur3
1,2,3
Apaji Institute, Banasthali University, Rajasthan, India
1
deeptibhalla0600@gmail.com
2
nisheeth.joshi@rediffmail.com
3
mathur_iti@rediffmail.com
ABSTRACT
Machine Transliteration has come out to be an emerging and a very important research area in the field of
machine translation. Transliteration basically aims to preserve the phonological structure of words. Proper
transliteration of name entities plays a very significant role in improving the quality of machine translation.
In this paper we are doing machine transliteration for English-Punjabi language pair using rule based
approach. We have constructed some rules for syllabification. Syllabification is the process to extract or
separate the syllable from the words. In this we are calculating the probabilities for name entities (Proper
names and location). For those words which do not come under the category of name entities, separate
probabilities are being calculated by using relative frequency through a statistical machine translation
toolkit known as MOSES. Using these probabilities we are transliterating our input text from English to
Punjabi.
KEYWORDS
Machine Translation, Machine Transliteration, Name entity recognition, Syllabification.
1. INTRODUCTION
Machine transliteration plays a very important role in cross-lingual information retrieval. The
term, name entity is widely used in natural language processing. Name entity recognition has
many applications in the field of natural language processing like information extraction i.e.
extracting the structured text from unstructured text, data classification and question answering.
In this paper we are only emphasizing on transliteration of proper names & locations using
statistical approach. The process of transliterating a word from its native language to foreign
language is called forward transliteration and the reverse process is called backward
transliteration. This paper addresses the problem of forward transliteration i.e. from English to
Punjabi. For Example, ‘Silver’ in English is transliterated to ‘ਿਸਲਵਰ’ into Punjabi instead of
‘ਚਦੀ ’. Here our main aim is to develop a transliteration system which can be used to properly
transliterate the name entities as well as the words which are not name entities from source
language to target language. We have used English-Punjabi parallel corpus of proper names and
locations to perform our experiment.
The remainder of this paper is organized as follows: In section 2 we described related work
Section 3 gives a brief introduction about analysis of English and Punjabi script. Section 4
explains our methodology and experimental set up. Section 5 consists of evaluation and results.
Finally in section 6 we conclude the paper.
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013
68
2. RELATED WORK
A lot of work has been carried out in the field of machine transliteration. Kamal Deep and Goyal
[1] have addressed the problem of transliterating Punjabi to English language using rule based
approach. The proposed transliteration scheme uses grapheme based method to model the
transliteration problem. This technique has demonstrated transliteration from Punjabi to English
for common names and achieved accuracy of 93.22%. Sharma et al. [2] presented English-Hindi
Transliteration using Statistical Machine Translation in different Notations. In this paper they
have shown that transliteration in WX-notation gives improved results over UTF-notation by
training the English-Hindi corpus (of Indian names) using Phrase based statistical machine
translation. Kamal deep et al. [3] proposed Hybrid Approach for Transliteration from Punjabi to
English. In this paper they addressed the problem of transliterating Punjabi to English language
using statistical rule based approach. The Authors used letter to letter mapping as baseline and
tried to find out the improvements by statistical methods. kaur and Josan [4] proposed a
framework which addresses the issue of Statistical Approach to Transliteration from English to
Punjabi. In this Statistical Approach to transliteration is used from English to Punjabi using
MOSES, a statistical machine translation tool. They applied transliteration rules and after that
showed average percentage, accuracy of the system and BLEU score which comes out to be
63.31% and 0.4502 respectively. Padda et al. [5] presented conversion system on Punjabi Text to
IPA. In this paper they described steps for converting Punjabi text to IPA. In this paper, they have
shown how a letter of Punjabi maps directly to its corresponding IPA symbol which represents
the sounds of spoken language. Josan et al. [6] presented a novel approach to improve Punjabi to
Hindi transliteration by combining a basic character to character mapping approach with rule
based and Soundex based enhancements. Experimental results show that this approach effectively
improves the word accuracy rate and average Levenshtein distance. Dhore et al. [7] have
addressed the problem of machine transliteration where given a named entity in Hindi using
Devanagari script need to be transliterated in English using CRF as a statistical probability tool
and n-gram as a features set. In this approach, they presented machine transliteration of named
entities for Hindi-English language pair using CRF as a statistical probability tool and n-gram as
feature set. They achieved accuracy of 85.79% . Musa et al. [8] proposed Syllabification
Algorithm based on Syllable Rules Matching for Malay Language. In this paper, authors
presented a new syllabification algorithm for Malay language.
3. ANALYSIS OF ENGLISH AND PUNJABI SCRIPT
Punjabi is an ancient and Indo-Aryan language which is mainly used in regions of Punjab. It is
one of the world’s 14th
widely spoken languages. Its script is Gurumukhi which is based on
Devanagri. Punjabi is one of the Indian language and official language of Punjab. Along with
Punjab it is also spoken in neighbouring areas such as Haryana and Delhi. There are 19 Vowels
(10 Independent Vowels and 9 dependent Vowels) and 41 Consonants in Punjabi language.
English on the other hand is based on Roman Script. It is one of the six international languages of
United Nations. There are 26 letters in English. Out of which 21 are Consonants and 5 are
Vowels.
4. METHODOLOGY AND EXPERIMENTAL SETUP
In our work, we have English Punjabi parallel corpus of name entities. We are basically
emphasizing on transliteration of our input text with the help of rule matching. We are using
statistical approach for transliteration that is we train and test our model using MOSES toolkit.
With the help of GIZA++ we have calculated translation probabilities which are based on relative
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013
69
frequency. For constructing our rules we are using syllabification approach. The syllable is a
combination of vowel and consonant pairs such as V, VC, CV, VCC, CVC, CCVC and CVCC.
Almost all the languages have VC, CV or CVC structures so we are using vowel and consonants
as the basic phonification unit.
Some possible rules that we have constructed for syllabification are shown in table 1:
Table1: Possible combinations for syllables
Syllable
structure
Example Syllabified
form(English)
Syllabified
form(Punjabi)
V eka [e] [ka] [ਏ] [ਕਾ]
CV tarun [ta] [run] [ਤ] [ਰੁਣ]
VC angela [an] [ge] [la] [ਅਨ] [ਗੇ] [ਲਾ]
CVC sidhima [si] [dhi] [ma] [ਿਸ] [ਧੀ] [ਮਾ]
CCVC Odisha [o] [di] [sha] [ਓ] [ਡੀ] [ਸ਼ਾ]
VCC obhika [o] [bhi] [ka] [ਓ] [ਭੀ] [ਕਾ]
V=Vowel, C=Consonant
If we have a word P then it will be syllabified in the form of {p1,p2,p3….pn} where p1,p2..pn are
the individual syllables. We have syllabified our English input using rule based approach for
which we have used the following algorithm.
Algorithm for syllable extraction
1. Enter input string in English.
2. Identify Vowels and Consonants.
3. Identify Vowel-Consonant combination and consider it as one syllable.
4. Identify Consonants followed by Vowels and consider them as separate syllables.
5. Identify Vowels followed by two continuous Consonants as separate syllable.
6. Consider Vowel surrounded by two Consonants as separate syllable.
7. Transliterate each syllable into Punjabi.
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013
70
For converting our English input to equivalent Punjabi output we first recognize the name entities
from our input sentence. For this we have used Stanford NER tool [10] which gives us name
entities in English text. The text entered by the user is first analyzed and then pre-processed. After
pre-processing if the selected input is a proper name or location then it is passed to the
syllabification module through which the syllables are extracted. After selection of equivalent
probability, syllables are combined to form the Punjabi word. If the selected input is not a name
entity it will be passed to the syllabification module and will be transliterated with the help of
probability matching. Separate probabilities are calculated for those words which are not name
entities.
Table 2: Corpus
English
Words
Punjabi
Words
English
Sentences
Training 25,500 25,500
Testing 6080 1500
Table 2 demonstrates the English-Punjabi parallel corpus that we have used for performing our
experiment. In this we have trained our language model using IRSTLM toolkit [9]. Our
transliteration system follows the steps which are represented in figure 1.
Example:
Source sentence: Teena is going to Haryana
Target sentence: ਟੀਨਾ ਇਜ ਗੋਇੰਗ ਟੂ ਹਿਰਆਣਾ
In this initially the name entities will be extracted which are Teena (proper name) and Haryana
(location). After pre-processing, these words will be checked with the help of name entity
recognizer for identifying whether they are name entities or not. If the word is a name entity of
type proper name and location, they will be passed to the syllabification module. In this case the
name Teena will be syllabified in the form of Tee na and location name Haryana will be
syllabified in the form of Har ya na. After syllabification, these words will be checked for their
corresponding English-Punjabi probabilities and equivalent Punjabi syllables are selected. The
words other than the name entities are syllabified and matched against the separate probabilities.
For example going are transliterated to go ing. Finally these syllables are joined to generate final
output sentence.
Through language model training we have calculated N-gram probabilities which are based on
relative frequency. For calculating the N-gram probability we are using the following formula
[11].
(1)
Where probability of a word wn given a word wn-1 is count (wn-1, wn)/count (wn-1)
For example:
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013
71
For a name Kunal probabilities for every syllable can be as follows:
= 0.8797654
= 0.7333333
Final Score=Prob(ਕੁ|ku)×Prob(ਣਾਲ|nal)
=0.64516127
Figure 1: System architecture
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013
72
After selecting the equivalent Punjabi syllable on the basis of probabilities these syllables are
joined together and final output is obtained.
5. EVALUATION AND RESULTS
We have used a corpus of 1500 sentences which had 15982 words. Among them 45% were name
entities i.e. 7192 were proper names, locations, organization and Date & Time. Among these 88%
were name entities of type person & location i.e. 6328 were person names or location names.
From these our system was able to correctly transliterate 6109 name entities. Remaining 55 % i.e.
8790 words were those that were not name entities and from these our system was able to
transliterate 7987 words correctly. This provided us with accuracy of 88.19%. The accuracy was
calculated using the formula:
*100 (2)
Some of the names which are correctly transliterated by our system are as follows:.
Table 3: Correct output
spardha ਸਪਰਧਾ
sanbosh ਸੰਬੋਸ਼
kamaldeep ਕਮਲਦੀਪ
haryana ਹਿਰਆਣਾ
shelly ਸ਼ੇਲੀ
Some of the names which are not correctly transliterated by our system are as follows:
Table 4: Incorrect output
falkanvai ਫਲਕੰਵਆ
gowryharan ਗੋਰੀਹਰਣ
seerajuddeen ਸੀਰਾਜੁਿਦਨ
kolkandkar ਕੋਲਕੰ ਡਕਰ
6. CONCLUSION
In this paper we have performed our experiment using statistical machine translation tool. We
have shown how we can extract syllables from our input text and perform transliteration of name
International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013
73
entities (proper names and locations) and rest of the words other than name entities in source
language to target language through probability calculation. This transliteration system will prove
to be very beneficial. Through this approach we attained accuracy of 88.19%. In future we will
work on remaining name entities that were not included in this experiment.
7. REFERENCES
[1] Kamal Deep and Vishal Goyal, (2011) ”Development of a Punjabi to English transliteration system”. In
International Journal of Computer Science and Communication Vol. 2, No. 2, pp. 521-526.
[2] Shubhangi Sharma, Neha Bora and Mitali Halder, (2012) “English-Hindi Transliteration using
Statistical Machine Translation in different Notation” International Conference on Computing and
Control Engineering (ICCCE 2012).
[3] Kamal Deep, Dr.Vishal Goyal, (2011) “Hybrid Approach for Punjabi to English Transliteration
System” International Journal of Computer Applications (0975 – 8887) Volume 28– No.1.
[4] Jasleen kaur Gurpreet Singh josan , (2011) “Statistical Approach to Transliteration from English to
Punjabi”, In Proceeding of International Journal on Computer Science and Engineering (IJCSE), Vol. 3
Issue 4, p1518.
[5] Er. Sheilly Padda, Rupinderdeep Kaur, Er. Nidhi, (2012) “Punjabi Phonetic: Punjabi Text to IPA
Conversion” International Journal of Emerging Technology and Advanced Engineering Website:
www.ijetae.com ISSN 2250-2459, Volume 2, Issue 10.
[6] Gurpreet Singh Josan, Gurpreet Singh Lehal, (2010) “A Punjabi to Hindi Machine Transliteration
System” Computational Linguistics and Chinese Language Processing Vol. 15, No. 2, pp. 77-102.
[7] Manikrao L Dhore, Shantanu K Dixit, Tushar D Sonwalkar, (2012) “Hindi to English Machine
Transliteration of Named Entities using Conditional Random Fields.” International Journal of Computer
Applications;6/15/2012, Vol. 48, p31.
[8] Musa, Hafiz, Rabith A.kadir, Azreen Azman, M.taufik Abadullah, (2011) "Syllabification algorithm
based on syllable rules matching for Malay language." Proceedings of the 10th WSEAS international
conference on Applied computer and applied computational science. World Scientific and Engineering
Academy and Society (WSEAS).
[9] To download IRSTLM toolkit http://www.statmt.org
[10] Jenny Rose Finkel, Trond Grenager, and Christopher Manning, (2005) Incorporating Non-local
Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43nd Annual
Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-370.
[11] Daniel Jurafsky, James H. Martin Speech and Language processing An Introduction to speech
Recognition, natural language processing, and computational linguistics.
Authors
Deepti Bhalla is pursuing her M.Tech in Computer Science from Banasthali University, Rajasthan and is
working as a Research Assistant in English-Indian Languages Machine Translation System Project
sponsored by TDIL Programme, DEITY. She has her interest in Machine Translation specifically in
English-Punjabi Language Pair. She has developed various tools on Punjabi Language Processing. Her
current research interest includes Natural Language Processing and Machine Translation.
Nisheeth Joshi is a researcher working in the area of Machine Translation. He has been primarily working
in design and development of evaluation Matrices in Indian languages. Besides this he is also actively
involved in the development of MT engines for English to Indian Languages. He is one of the expert
empanelled with TDIL programme, Department of electronics Information Technology (DEITY), Govt.
of India, a premier organization which foresees Language Technology Funding and Research in India. He
has several publications in various journals and conferences and also serves on the Programme
Committees and Editorial Boards of several conferences and journals.
Iti Mathur is an assistant professor at Banasthali University. Her primary area of research is
computational semantics and ontological engineering. Besides this she is also involved in the
development of MT engines for English to Indian Languages. She is one of the experts empanelled with
TDIL Programme, Department of Electronics Information Technology (DEITY), Govt. of India, a
premier organization which foresees Language Technology Funding and Research in India. She has
several publications in various journals and conferences and also serves on the Programme Committees
and Editorial Boards of several conferences and journals.

More Related Content

What's hot

Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Quality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemmingQuality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemming
ijcsa
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShashank Shisodia
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
cscpconf
 
Difficulties in processing malayalam verbs
Difficulties in processing malayalam verbsDifficulties in processing malayalam verbs
Difficulties in processing malayalam verbs
ijaia
 
ANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTKANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTK
ijnlc
 
Anaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer methodAnaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer method
ijcsa
 
A Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to PunjabiA Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to PunjabiIJERA Editor
 
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
kevig
 
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
csandit
 
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachPunjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
IJERA Editor
 
Pronominal anaphora resolution in
Pronominal anaphora resolution inPronominal anaphora resolution in
Pronominal anaphora resolution in
ijfcstjournal
 
CBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMERCBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMER
ijnlc
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ijnlc
 
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGPARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
kevig
 

What's hot (16)

FIRE2014_IIT-P
FIRE2014_IIT-PFIRE2014_IIT-P
FIRE2014_IIT-P
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Quality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemmingQuality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemming
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
 
Difficulties in processing malayalam verbs
Difficulties in processing malayalam verbsDifficulties in processing malayalam verbs
Difficulties in processing malayalam verbs
 
ANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTKANALYSIS OF MWES IN HINDI TEXT USING NLTK
ANALYSIS OF MWES IN HINDI TEXT USING NLTK
 
Anaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer methodAnaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer method
 
A Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to PunjabiA Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
 
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
 
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
 
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachPunjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
 
Pronominal anaphora resolution in
Pronominal anaphora resolution inPronominal anaphora resolution in
Pronominal anaphora resolution in
 
CBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMERCBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMER
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
 
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGINGPARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
PARSING OF MYANMAR SENTENCES WITH FUNCTION TAGGING
 

Viewers also liked

Stream rugby cheetahs vs queensland reds live
Stream rugby cheetahs vs queensland reds liveStream rugby cheetahs vs queensland reds live
Stream rugby cheetahs vs queensland reds live
alanalaric
 
Lecture: class 1: Nov 4
Lecture: class 1: Nov 4Lecture: class 1: Nov 4
Lecture: class 1: Nov 4
Gina Asaro-Collura
 
StartAct.me - We create the new culture of success!
StartAct.me - We create the new culture of success!StartAct.me - We create the new culture of success!
StartAct.me - We create the new culture of success!
Aleksei Shabarshin
 
оек
оек оек
оек
web007
 
Kundeseminar April 2014, universell utforming og cookie loven
Kundeseminar April 2014, universell utforming og cookie lovenKundeseminar April 2014, universell utforming og cookie loven
Kundeseminar April 2014, universell utforming og cookie loven
CoreTrek
 
Tukkievaria 2013
Tukkievaria 2013Tukkievaria 2013
Tukkievaria 2013
University of Pretoria
 
Herramientas estadisticas basicas 2
Herramientas estadisticas basicas 2Herramientas estadisticas basicas 2
Herramientas estadisticas basicas 2
EDAMSS Consultoria y Capacitacion
 
FUEL SUBSIDY: THE DAY OF RAGE By Olusegun Adeniyi
FUEL SUBSIDY: THE DAY OF RAGE By Olusegun AdeniyiFUEL SUBSIDY: THE DAY OF RAGE By Olusegun Adeniyi
FUEL SUBSIDY: THE DAY OF RAGE By Olusegun Adeniyi
Olusegun Adeniyi
 
LEÇON 257 – Que je me souvienne de ce qu’est mon but.
LEÇON 257 – Que je me souvienne de ce qu’est mon but.LEÇON 257 – Que je me souvienne de ce qu’est mon but.
LEÇON 257 – Que je me souvienne de ce qu’est mon but.
Pierrot Caron
 
jacobs mgmt leadership
jacobs mgmt leadershipjacobs mgmt leadership
jacobs mgmt leadershipDilip Deka
 
Artikel Avlöningsdag Lån (12)
Artikel Avlöningsdag Lån (12)Artikel Avlöningsdag Lån (12)
Artikel Avlöningsdag Lån (12)
tackybedding7257
 
Портфоліо викладача Сорока
Портфоліо викладача СорокаПортфоліо викладача Сорока
Портфоліо викладача Сорока
zudatua
 
Servomotor Application
Servomotor ApplicationServomotor Application
Servomotor ApplicationAzlan Ismail
 

Viewers also liked (16)

Stream rugby cheetahs vs queensland reds live
Stream rugby cheetahs vs queensland reds liveStream rugby cheetahs vs queensland reds live
Stream rugby cheetahs vs queensland reds live
 
Lecture: class 1: Nov 4
Lecture: class 1: Nov 4Lecture: class 1: Nov 4
Lecture: class 1: Nov 4
 
StartAct.me - We create the new culture of success!
StartAct.me - We create the new culture of success!StartAct.me - We create the new culture of success!
StartAct.me - We create the new culture of success!
 
оек
оек оек
оек
 
Hoofdstuk 5 eco
Hoofdstuk 5 ecoHoofdstuk 5 eco
Hoofdstuk 5 eco
 
Kundeseminar April 2014, universell utforming og cookie loven
Kundeseminar April 2014, universell utforming og cookie lovenKundeseminar April 2014, universell utforming og cookie loven
Kundeseminar April 2014, universell utforming og cookie loven
 
Tukkievaria 2013
Tukkievaria 2013Tukkievaria 2013
Tukkievaria 2013
 
Herramientas estadisticas basicas 2
Herramientas estadisticas basicas 2Herramientas estadisticas basicas 2
Herramientas estadisticas basicas 2
 
FUEL SUBSIDY: THE DAY OF RAGE By Olusegun Adeniyi
FUEL SUBSIDY: THE DAY OF RAGE By Olusegun AdeniyiFUEL SUBSIDY: THE DAY OF RAGE By Olusegun Adeniyi
FUEL SUBSIDY: THE DAY OF RAGE By Olusegun Adeniyi
 
LEÇON 257 – Que je me souvienne de ce qu’est mon but.
LEÇON 257 – Que je me souvienne de ce qu’est mon but.LEÇON 257 – Que je me souvienne de ce qu’est mon but.
LEÇON 257 – Que je me souvienne de ce qu’est mon but.
 
jacobs mgmt leadership
jacobs mgmt leadershipjacobs mgmt leadership
jacobs mgmt leadership
 
Artikel Avlöningsdag Lån (12)
Artikel Avlöningsdag Lån (12)Artikel Avlöningsdag Lån (12)
Artikel Avlöningsdag Lån (12)
 
Портфоліо викладача Сорока
Портфоліо викладача СорокаПортфоліо викладача Сорока
Портфоліо викладача Сорока
 
Servomotor Application
Servomotor ApplicationServomotor Application
Servomotor Application
 
Susan Overpeck Resume
Susan Overpeck ResumeSusan Overpeck Resume
Susan Overpeck Resume
 
Russia e commerce update june 2013
Russia e commerce update june 2013Russia e commerce update june 2013
Russia e commerce update june 2013
 

Similar to RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI

A Review on a web based Punjabi to English Machine Transliteration System
A Review on a web based Punjabi to English Machine Transliteration SystemA Review on a web based Punjabi to English Machine Transliteration System
A Review on a web based Punjabi to English Machine Transliteration System
Editor IJCATR
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
inventionjournals
 
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ijnlc
 
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language ModelsIRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET Journal
 
Parafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdfParafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdf
Universidad Nacional de San Martin
 
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
cscpconf
 
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMHINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
ijnlc
 
DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)
DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)
DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)
ijnlc
 
SETSWANA PART OF SPEECH TAGGING
SETSWANA PART OF SPEECH TAGGINGSETSWANA PART OF SPEECH TAGGING
SETSWANA PART OF SPEECH TAGGING
kevig
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Waqas Tariq
 
Ey4301913917
Ey4301913917Ey4301913917
Ey4301913917
IJERA Editor
 
ANNOTATED GUIDELINES AND BUILDING REFERENCE CORPUS FOR MYANMAR-ENGLISH WORD A...
ANNOTATED GUIDELINES AND BUILDING REFERENCE CORPUS FOR MYANMAR-ENGLISH WORD A...ANNOTATED GUIDELINES AND BUILDING REFERENCE CORPUS FOR MYANMAR-ENGLISH WORD A...
ANNOTATED GUIDELINES AND BUILDING REFERENCE CORPUS FOR MYANMAR-ENGLISH WORD A...
ijnlc
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
kevig
 
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
ijnlc
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
kevig
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
kevig
 
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHODA SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
IJwest
 
A Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
A Self-Supervised Tibetan-Chinese Vocabulary Alignment MethodA Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
A Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
dannyijwest
 
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
ijnlc
 

Similar to RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI (20)

A Review on a web based Punjabi to English Machine Transliteration System
A Review on a web based Punjabi to English Machine Transliteration SystemA Review on a web based Punjabi to English Machine Transliteration System
A Review on a web based Punjabi to English Machine Transliteration System
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Ijetcas14 444
Ijetcas14 444Ijetcas14 444
Ijetcas14 444
 
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
ATTENTION-BASED SYLLABLE LEVEL NEURAL MACHINE TRANSLATION SYSTEM FOR MYANMAR ...
 
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language ModelsIRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
 
Parafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdfParafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdf
 
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
 
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMHINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
 
DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)
DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)
DEVELOPMENT OF ARABIC NOUN PHRASE EXTRACTOR (ANPE)
 
SETSWANA PART OF SPEECH TAGGING
SETSWANA PART OF SPEECH TAGGINGSETSWANA PART OF SPEECH TAGGING
SETSWANA PART OF SPEECH TAGGING
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
 
Ey4301913917
Ey4301913917Ey4301913917
Ey4301913917
 
ANNOTATED GUIDELINES AND BUILDING REFERENCE CORPUS FOR MYANMAR-ENGLISH WORD A...
ANNOTATED GUIDELINES AND BUILDING REFERENCE CORPUS FOR MYANMAR-ENGLISH WORD A...ANNOTATED GUIDELINES AND BUILDING REFERENCE CORPUS FOR MYANMAR-ENGLISH WORD A...
ANNOTATED GUIDELINES AND BUILDING REFERENCE CORPUS FOR MYANMAR-ENGLISH WORD A...
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
 
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
IMPLEMENTATION OF NLIZATION FRAMEWORK FOR VERBS, PRONOUNS AND DETERMINERS WIT...
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
 
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHODA SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
 
A Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
A Self-Supervised Tibetan-Chinese Vocabulary Alignment MethodA Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
A Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
 
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
 

Recently uploaded

Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 

Recently uploaded (20)

Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 

RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI

  • 1. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013 DOI : 10.5121/ijnlc.2013.2207 67 RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI Deepti Bhalla1 , Nisheeth Joshi2 and Iti Mathur3 1,2,3 Apaji Institute, Banasthali University, Rajasthan, India 1 deeptibhalla0600@gmail.com 2 nisheeth.joshi@rediffmail.com 3 mathur_iti@rediffmail.com ABSTRACT Machine Transliteration has come out to be an emerging and a very important research area in the field of machine translation. Transliteration basically aims to preserve the phonological structure of words. Proper transliteration of name entities plays a very significant role in improving the quality of machine translation. In this paper we are doing machine transliteration for English-Punjabi language pair using rule based approach. We have constructed some rules for syllabification. Syllabification is the process to extract or separate the syllable from the words. In this we are calculating the probabilities for name entities (Proper names and location). For those words which do not come under the category of name entities, separate probabilities are being calculated by using relative frequency through a statistical machine translation toolkit known as MOSES. Using these probabilities we are transliterating our input text from English to Punjabi. KEYWORDS Machine Translation, Machine Transliteration, Name entity recognition, Syllabification. 1. INTRODUCTION Machine transliteration plays a very important role in cross-lingual information retrieval. The term, name entity is widely used in natural language processing. Name entity recognition has many applications in the field of natural language processing like information extraction i.e. extracting the structured text from unstructured text, data classification and question answering. In this paper we are only emphasizing on transliteration of proper names & locations using statistical approach. The process of transliterating a word from its native language to foreign language is called forward transliteration and the reverse process is called backward transliteration. This paper addresses the problem of forward transliteration i.e. from English to Punjabi. For Example, ‘Silver’ in English is transliterated to ‘ਿਸਲਵਰ’ into Punjabi instead of ‘ਚਦੀ ’. Here our main aim is to develop a transliteration system which can be used to properly transliterate the name entities as well as the words which are not name entities from source language to target language. We have used English-Punjabi parallel corpus of proper names and locations to perform our experiment. The remainder of this paper is organized as follows: In section 2 we described related work Section 3 gives a brief introduction about analysis of English and Punjabi script. Section 4 explains our methodology and experimental set up. Section 5 consists of evaluation and results. Finally in section 6 we conclude the paper.
  • 2. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013 68 2. RELATED WORK A lot of work has been carried out in the field of machine transliteration. Kamal Deep and Goyal [1] have addressed the problem of transliterating Punjabi to English language using rule based approach. The proposed transliteration scheme uses grapheme based method to model the transliteration problem. This technique has demonstrated transliteration from Punjabi to English for common names and achieved accuracy of 93.22%. Sharma et al. [2] presented English-Hindi Transliteration using Statistical Machine Translation in different Notations. In this paper they have shown that transliteration in WX-notation gives improved results over UTF-notation by training the English-Hindi corpus (of Indian names) using Phrase based statistical machine translation. Kamal deep et al. [3] proposed Hybrid Approach for Transliteration from Punjabi to English. In this paper they addressed the problem of transliterating Punjabi to English language using statistical rule based approach. The Authors used letter to letter mapping as baseline and tried to find out the improvements by statistical methods. kaur and Josan [4] proposed a framework which addresses the issue of Statistical Approach to Transliteration from English to Punjabi. In this Statistical Approach to transliteration is used from English to Punjabi using MOSES, a statistical machine translation tool. They applied transliteration rules and after that showed average percentage, accuracy of the system and BLEU score which comes out to be 63.31% and 0.4502 respectively. Padda et al. [5] presented conversion system on Punjabi Text to IPA. In this paper they described steps for converting Punjabi text to IPA. In this paper, they have shown how a letter of Punjabi maps directly to its corresponding IPA symbol which represents the sounds of spoken language. Josan et al. [6] presented a novel approach to improve Punjabi to Hindi transliteration by combining a basic character to character mapping approach with rule based and Soundex based enhancements. Experimental results show that this approach effectively improves the word accuracy rate and average Levenshtein distance. Dhore et al. [7] have addressed the problem of machine transliteration where given a named entity in Hindi using Devanagari script need to be transliterated in English using CRF as a statistical probability tool and n-gram as a features set. In this approach, they presented machine transliteration of named entities for Hindi-English language pair using CRF as a statistical probability tool and n-gram as feature set. They achieved accuracy of 85.79% . Musa et al. [8] proposed Syllabification Algorithm based on Syllable Rules Matching for Malay Language. In this paper, authors presented a new syllabification algorithm for Malay language. 3. ANALYSIS OF ENGLISH AND PUNJABI SCRIPT Punjabi is an ancient and Indo-Aryan language which is mainly used in regions of Punjab. It is one of the world’s 14th widely spoken languages. Its script is Gurumukhi which is based on Devanagri. Punjabi is one of the Indian language and official language of Punjab. Along with Punjab it is also spoken in neighbouring areas such as Haryana and Delhi. There are 19 Vowels (10 Independent Vowels and 9 dependent Vowels) and 41 Consonants in Punjabi language. English on the other hand is based on Roman Script. It is one of the six international languages of United Nations. There are 26 letters in English. Out of which 21 are Consonants and 5 are Vowels. 4. METHODOLOGY AND EXPERIMENTAL SETUP In our work, we have English Punjabi parallel corpus of name entities. We are basically emphasizing on transliteration of our input text with the help of rule matching. We are using statistical approach for transliteration that is we train and test our model using MOSES toolkit. With the help of GIZA++ we have calculated translation probabilities which are based on relative
  • 3. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013 69 frequency. For constructing our rules we are using syllabification approach. The syllable is a combination of vowel and consonant pairs such as V, VC, CV, VCC, CVC, CCVC and CVCC. Almost all the languages have VC, CV or CVC structures so we are using vowel and consonants as the basic phonification unit. Some possible rules that we have constructed for syllabification are shown in table 1: Table1: Possible combinations for syllables Syllable structure Example Syllabified form(English) Syllabified form(Punjabi) V eka [e] [ka] [ਏ] [ਕਾ] CV tarun [ta] [run] [ਤ] [ਰੁਣ] VC angela [an] [ge] [la] [ਅਨ] [ਗੇ] [ਲਾ] CVC sidhima [si] [dhi] [ma] [ਿਸ] [ਧੀ] [ਮਾ] CCVC Odisha [o] [di] [sha] [ਓ] [ਡੀ] [ਸ਼ਾ] VCC obhika [o] [bhi] [ka] [ਓ] [ਭੀ] [ਕਾ] V=Vowel, C=Consonant If we have a word P then it will be syllabified in the form of {p1,p2,p3….pn} where p1,p2..pn are the individual syllables. We have syllabified our English input using rule based approach for which we have used the following algorithm. Algorithm for syllable extraction 1. Enter input string in English. 2. Identify Vowels and Consonants. 3. Identify Vowel-Consonant combination and consider it as one syllable. 4. Identify Consonants followed by Vowels and consider them as separate syllables. 5. Identify Vowels followed by two continuous Consonants as separate syllable. 6. Consider Vowel surrounded by two Consonants as separate syllable. 7. Transliterate each syllable into Punjabi.
  • 4. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013 70 For converting our English input to equivalent Punjabi output we first recognize the name entities from our input sentence. For this we have used Stanford NER tool [10] which gives us name entities in English text. The text entered by the user is first analyzed and then pre-processed. After pre-processing if the selected input is a proper name or location then it is passed to the syllabification module through which the syllables are extracted. After selection of equivalent probability, syllables are combined to form the Punjabi word. If the selected input is not a name entity it will be passed to the syllabification module and will be transliterated with the help of probability matching. Separate probabilities are calculated for those words which are not name entities. Table 2: Corpus English Words Punjabi Words English Sentences Training 25,500 25,500 Testing 6080 1500 Table 2 demonstrates the English-Punjabi parallel corpus that we have used for performing our experiment. In this we have trained our language model using IRSTLM toolkit [9]. Our transliteration system follows the steps which are represented in figure 1. Example: Source sentence: Teena is going to Haryana Target sentence: ਟੀਨਾ ਇਜ ਗੋਇੰਗ ਟੂ ਹਿਰਆਣਾ In this initially the name entities will be extracted which are Teena (proper name) and Haryana (location). After pre-processing, these words will be checked with the help of name entity recognizer for identifying whether they are name entities or not. If the word is a name entity of type proper name and location, they will be passed to the syllabification module. In this case the name Teena will be syllabified in the form of Tee na and location name Haryana will be syllabified in the form of Har ya na. After syllabification, these words will be checked for their corresponding English-Punjabi probabilities and equivalent Punjabi syllables are selected. The words other than the name entities are syllabified and matched against the separate probabilities. For example going are transliterated to go ing. Finally these syllables are joined to generate final output sentence. Through language model training we have calculated N-gram probabilities which are based on relative frequency. For calculating the N-gram probability we are using the following formula [11]. (1) Where probability of a word wn given a word wn-1 is count (wn-1, wn)/count (wn-1) For example:
  • 5. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013 71 For a name Kunal probabilities for every syllable can be as follows: = 0.8797654 = 0.7333333 Final Score=Prob(ਕੁ|ku)×Prob(ਣਾਲ|nal) =0.64516127 Figure 1: System architecture
  • 6. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013 72 After selecting the equivalent Punjabi syllable on the basis of probabilities these syllables are joined together and final output is obtained. 5. EVALUATION AND RESULTS We have used a corpus of 1500 sentences which had 15982 words. Among them 45% were name entities i.e. 7192 were proper names, locations, organization and Date & Time. Among these 88% were name entities of type person & location i.e. 6328 were person names or location names. From these our system was able to correctly transliterate 6109 name entities. Remaining 55 % i.e. 8790 words were those that were not name entities and from these our system was able to transliterate 7987 words correctly. This provided us with accuracy of 88.19%. The accuracy was calculated using the formula: *100 (2) Some of the names which are correctly transliterated by our system are as follows:. Table 3: Correct output spardha ਸਪਰਧਾ sanbosh ਸੰਬੋਸ਼ kamaldeep ਕਮਲਦੀਪ haryana ਹਿਰਆਣਾ shelly ਸ਼ੇਲੀ Some of the names which are not correctly transliterated by our system are as follows: Table 4: Incorrect output falkanvai ਫਲਕੰਵਆ gowryharan ਗੋਰੀਹਰਣ seerajuddeen ਸੀਰਾਜੁਿਦਨ kolkandkar ਕੋਲਕੰ ਡਕਰ 6. CONCLUSION In this paper we have performed our experiment using statistical machine translation tool. We have shown how we can extract syllables from our input text and perform transliteration of name
  • 7. International Journal on Natural Language Computing (IJNLC) Vol. 2, No.2, April 2013 73 entities (proper names and locations) and rest of the words other than name entities in source language to target language through probability calculation. This transliteration system will prove to be very beneficial. Through this approach we attained accuracy of 88.19%. In future we will work on remaining name entities that were not included in this experiment. 7. REFERENCES [1] Kamal Deep and Vishal Goyal, (2011) ”Development of a Punjabi to English transliteration system”. In International Journal of Computer Science and Communication Vol. 2, No. 2, pp. 521-526. [2] Shubhangi Sharma, Neha Bora and Mitali Halder, (2012) “English-Hindi Transliteration using Statistical Machine Translation in different Notation” International Conference on Computing and Control Engineering (ICCCE 2012). [3] Kamal Deep, Dr.Vishal Goyal, (2011) “Hybrid Approach for Punjabi to English Transliteration System” International Journal of Computer Applications (0975 – 8887) Volume 28– No.1. [4] Jasleen kaur Gurpreet Singh josan , (2011) “Statistical Approach to Transliteration from English to Punjabi”, In Proceeding of International Journal on Computer Science and Engineering (IJCSE), Vol. 3 Issue 4, p1518. [5] Er. Sheilly Padda, Rupinderdeep Kaur, Er. Nidhi, (2012) “Punjabi Phonetic: Punjabi Text to IPA Conversion” International Journal of Emerging Technology and Advanced Engineering Website: www.ijetae.com ISSN 2250-2459, Volume 2, Issue 10. [6] Gurpreet Singh Josan, Gurpreet Singh Lehal, (2010) “A Punjabi to Hindi Machine Transliteration System” Computational Linguistics and Chinese Language Processing Vol. 15, No. 2, pp. 77-102. [7] Manikrao L Dhore, Shantanu K Dixit, Tushar D Sonwalkar, (2012) “Hindi to English Machine Transliteration of Named Entities using Conditional Random Fields.” International Journal of Computer Applications;6/15/2012, Vol. 48, p31. [8] Musa, Hafiz, Rabith A.kadir, Azreen Azman, M.taufik Abadullah, (2011) "Syllabification algorithm based on syllable rules matching for Malay language." Proceedings of the 10th WSEAS international conference on Applied computer and applied computational science. World Scientific and Engineering Academy and Society (WSEAS). [9] To download IRSTLM toolkit http://www.statmt.org [10] Jenny Rose Finkel, Trond Grenager, and Christopher Manning, (2005) Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43nd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-370. [11] Daniel Jurafsky, James H. Martin Speech and Language processing An Introduction to speech Recognition, natural language processing, and computational linguistics. Authors Deepti Bhalla is pursuing her M.Tech in Computer Science from Banasthali University, Rajasthan and is working as a Research Assistant in English-Indian Languages Machine Translation System Project sponsored by TDIL Programme, DEITY. She has her interest in Machine Translation specifically in English-Punjabi Language Pair. She has developed various tools on Punjabi Language Processing. Her current research interest includes Natural Language Processing and Machine Translation. Nisheeth Joshi is a researcher working in the area of Machine Translation. He has been primarily working in design and development of evaluation Matrices in Indian languages. Besides this he is also actively involved in the development of MT engines for English to Indian Languages. He is one of the expert empanelled with TDIL programme, Department of electronics Information Technology (DEITY), Govt. of India, a premier organization which foresees Language Technology Funding and Research in India. He has several publications in various journals and conferences and also serves on the Programme Committees and Editorial Boards of several conferences and journals. Iti Mathur is an assistant professor at Banasthali University. Her primary area of research is computational semantics and ontological engineering. Besides this she is also involved in the development of MT engines for English to Indian Languages. She is one of the experts empanelled with TDIL Programme, Department of Electronics Information Technology (DEITY), Govt. of India, a premier organization which foresees Language Technology Funding and Research in India. She has several publications in various journals and conferences and also serves on the Programme Committees and Editorial Boards of several conferences and journals.