International Association of Scientific Innovation and Research (IASIR) 
(An Association Unifying the Sciences, Engineering, and Applied Research) 
International Journal of Emerging Technologies in Computational 
and Applied Sciences (IJETCAS) 
www.iasir.net 
IJETCAS 14-575; © 2014, IJETCAS All Rights Reserved Page 191 
ISSN (Print): 2279-0047 
ISSN (Online): 2279-0055 
A Generic Transliteration tool for CLIA & MT Applications 
Nishant Sinha, Atul Kumar and Vishal Mandpe 
Computer Science Department 
IMS Engineering College 
Ghaziabad, India 
Abstract: Transliteration has been a challenging problem in natural language processing, especially in the cross-lingual information access sub-domain. Transliteration from English as a source language is harder still because English spelling is not phonetic. Different techniques have been used to perform efficient and correct transliteration, but so far hardly any achieves that goal. These techniques include the four models based on source grapheme, source phoneme, target grapheme and target phoneme knowledge. The machine learning approaches used include the maximum entropy model, decision-tree learning, memory-based learning and genetic algorithm learning. The aim of this paper is to develop a tool that can be used as a sub-system of a cross-lingual information access system. It takes a word or group of words as input and generates the transliterated form of each word so that the output can be used for information access. The tool is a platform-independent, multi-pair transliteration system. It differs from editor tools, which also transliterate words but rarely give the correct transliterated word as their first candidate output. Editor tools only help in typing a document in a desired script; this editing is generally done through transliteration because the only input medium is English.
Keywords: Transliteration; GIST; Xlit; Google tool; Syllable Structure; Maximum Entropy Model 
I. Introduction 
Translation of proper names is generally recognized as a significant problem in many multi-lingual text and speech processing applications. Even when hand-crafted translation lexicons used for machine translation (MT) and cross-lingual information retrieval (CLIR) provide significant coverage of the words encountered in the text, a significant portion of the tokens not covered by the lexicon are proper names and domain-specific terminology. This lack of translations adversely affects performance. For CLIR applications in particular, proper names and technical terms are especially important, as they carry the most distinctive information in a query as corroborated by their relatively low document frequency. Finally, in interactive IR systems where users provide very short queries (e.g. 2-5 words), their importance grows even further. 
Unlike specialized terminology, however, proper names are amenable to a speech-inspired translation approach. When writing foreign names in one's own language, one tries to preserve the way they sound, i.e. one uses an orthographic representation which, when "read aloud" by a speaker of one's language, sounds as much as possible like it would when spoken by a speaker of the foreign language. This process is referred to as transliteration. Therefore, if a mechanism were available to render, say, an English name in its phonemic form, and another mechanism were available to convert this phonemic string into the orthography of, say, Hindi, then one would have a mechanism for transliterating English names using Hindi characters. The first step has been addressed extensively, for other obvious reasons, in the automatic speech synthesis literature.
Transliteration of a word aims to preserve the sound of the source language in the target language and to write the word in the target-language script. This process involves four steps [1]:
1. Conversion of an English name into a phonemic representation
2. Translation of the English phoneme sequence into a sequence of generalized initials and finals (GIFs)
3. Transformation of the GIF sequence into a sequence of Hindi sound units
4. Translation of the Hindi sound sequence into a character sequence
Nishant Sinha et al., International Journal of Emerging Technologies in Computational and Applied Sciences, 9(2), June-August, 2014, pp. 191-196
II. Problems Faced in Transliteration from English to Hindi
Since Hindi uses a phonetic script (Devanagari), i.e., every character has one and only one pronunciation, transliterating from English to Hindi is essentially a process of determining the pronunciation of English words and approximating them using the Devanagari script. Transliteration would be an easy task if there were simple one-to-one rules for mapping sequences of characters in one language to another. But this is rarely the case. Many common sequences in English have different mappings depending on the surrounding characters or the part of speech in which the word is used. In some cases, the differences are not even attributable to factors in the text, but are due to the etymology of the word or the arbitrary nature of pronunciation in English.
For example, c is pronounced differently in calendar, circus, and cheese; ea is pronounced differently in eat, threat, and heart; and many English multi-letter combinations are realized as a single sound in pronunciation (f, ff, ph, gh can be mapped to f). 
The context usually plays a vital role in determining the pronunciation. The context can be the position, or the neighbourhood of the character sequence in terms of syllables, other sound units or characters. 
With respect to position, if a word ends with c, it is mostly pronounced as k (mosaic, disc, cardiac, etc.). Similarly, if a word ends with e, in most cases the e remains silent (rise, promise, Jane, rose, etc.). But context information is not adequate for disambiguating g in goat and ginger.
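The positional cues above can be written as a small rule function. This is a minimal sketch, assuming plain Roman spelling as input; the function name `final_letter_sound` and the sound labels are illustrative choices, not part of any of the tools discussed in this paper.

```python
def final_letter_sound(word):
    """Guess the sound of a word-final letter from its position alone.

    Returns "k" for a final c, "" (silent) for a final e, and None when
    position gives no cue. These are tendencies, not guarantees.
    """
    word = word.lower()
    if not word:
        return None
    last = word[-1]
    if last == "c":
        return "k"   # mosaic, disc, cardiac
    if last == "e":
        return ""    # rise, promise, Jane, rose
    return None      # position alone does not decide


for w in ["mosaic", "disc", "rose", "goat"]:
    print(w, repr(final_letter_sound(w)))
```

Position gives no help with a word-initial g, so goat and ginger still need some other source of knowledge.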
If we take a fixed number of characters as context, then the letter c followed by l, r or t is always pronounced as k. Also, the letter sequence ough is typically pronounced as af if the preceding character is one of {c, d, n, r, t}. But th is pronounced differently in though and thought, even though it is followed by ough in both cases [2]. Given these complexities, a simple way of transliterating may be a lookup in a pronunciation dictionary. But this will not work for new words and proper nouns. Also, any pronunciation dictionary would be correct only for the dialect it was created for. For example, an American English pronunciation dictionary would give awkward results for people who are used to the British accent. Thus, what we need is a system that can use a pronunciation dictionary or a corpus of transliterated examples and induce a set of rules from it. Such a system can then be adapted easily to different dialects and accents, provided a sufficient number of training examples can be found.
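One simple way to induce such rules from examples is to count, for each letter and the letter that follows it, which sound occurs most often in aligned training pairs. The sketch below assumes letters and sounds are already aligned one-to-one, which real data rarely is. The three training words, the sound labels and the name `induce_rules` are toy assumptions for illustration, and this frequency count is a stand-in, not the genetic algorithm approach of [2].

```python
from collections import Counter, defaultdict


def induce_rules(aligned_pairs):
    """Map each (letter, next_letter) context to its most frequent sound.

    aligned_pairs: iterable of (spelling, sounds) where sounds[i] is the
    sound assigned to spelling[i]. "#" marks the end of the word.
    """
    counts = defaultdict(Counter)
    for letters, sounds in aligned_pairs:
        for i, (letter, sound) in enumerate(zip(letters, sounds)):
            nxt = letters[i + 1] if i + 1 < len(letters) else "#"
            counts[(letter, nxt)][sound] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


# Toy, hand-aligned examples (hypothetical training data).
toy_corpus = [
    ("clip", ["k", "l", "i", "p"]),
    ("crab", ["k", "r", "a", "b"]),
    ("city", ["s", "i", "t", "i"]),
]
rules = induce_rules(toy_corpus)
print(rules[("c", "l")], rules[("c", "r")], rules[("c", "i")])
```

Retraining on examples from another dialect would change the counts and therefore the induced rules, which is the adaptability the paragraph above asks for.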
III. Test Analysis of Different Transliteration Editors
We tested four tools available for editing in Hindi, all of which use transliteration for that purpose. They are available as web-based services. The four tools are:
1. Xlit of CDAC-Mumbai
2. GIST of CDAC-Pune
3. Google Transliteration
4. Microsoft tool for Hindi writing
Our test data were selected from different domains: human names, building names, English names, city names, English words, etc.
Test results: English words (tool outputs in Devanagari script)
Roman script | Xlit | GIST | Google | Microsoft
Abandon | अबंदों | अबॅन्डन | अबंदों | आबंदों
Abbreviate | अब्बरेवीते | अब्रीववएट | अब्ब्रेववअते | अब्ब्रेववयाते
Shrink | श्रंक | श्ींगक | श्श्ंक | शरंक
Volcano | वोलकनो | वॉलकेनो | वोल्कानो | वोलकनों
Zooparasite | ज़ूपरासाइट | ज़ूपॅरसाइट | ज़ूपराससते | ज़ूपरसीटे
Initial | इनीतील | इनीसशएल | इनननतअल | इननश्यल
Injustice | इंज़ूसटटस | ईन्जस्टटस | इन्जुस्टतस | इञ्जुस्टतके
Quotient | कोटेंट | क्वोशन्ट | उओनतएन्त | कुओटीएंट
Honour | होनौर | ओनर | होनौर | होनौर
Knight | कनाइट | कस्क्नघ्ट | स्क्नघ्त | नाइट
Hindi human names (tool outputs in Devanagari script)
Roman script | Xlit | GIST | Google | Microsoft
John | जॉन | जॉन | जॉन | जॉन
Atul kumar | अतुल कुमार | अतुल कुमार | अतुल कुमार | अतुल कुमार
Raman | रमन | रामन | रमण | रमन
Rajan | राजन | राजन | राजन | राजन
Rajat | रजट | रजत | रजत | रजत
manu saxena | मनु सैक्सेना | मनु सक्सेना | मनु सक्सेना | मनु सक्सेना
prashant khanduri | प्रशांत खंद़ूरी | प्रशांत खंड़ूरी | प्रशांत खंड़ूरी | प्रशांत खनदुरी
megha jain | मेघा जैन | मेघा जैन | मेघा जैन | मेघा जैन
anil agrawal | अननल अग्रवाल | अननल अग्रवाल | अननल अग्रवाल | अननल अग्रवाल
amitabh bachchan | असमताभ बच्चान | असमताभ बच्चन | असमताभ बच्चन | असमताभ बच्चन
jaya bhaduri | जय भादुरी | जया भादुरी | जाया बहादुरी | जया भादुरी
kamal shrivastva | कमल श्श्वसत्व | कमल श्ीवाटतव | कमल श्ीवाटतव | कमाल श्ीवासत्व
ravi rituraj | रवव ऋतुराज | रवी ऋतुराज | रवव ऋतुराज | रवव ऋतुराज
Hans bhardwaj | हँस भाडडवाज | हंस भारद्वाज | हंस भरद्वाज | हंस भारद्वाज
Shashi goswami | शसश गोटवामी | शसश गोटवामी | शसश गोटवामी | शसश गोटवामी
Roshan seth | रोशन सेठ | रोशन सेठ | रोशन सेठ | रोशन सेठ
Lata mangeshakar | लट मंगेशकर | लता मांगेशाकर | लता मंगेशकर | लता मंगेशकर
Mohammad rafi | मोहम्मद राफ़ी | मोहम्मद रफ़ी | मोहम्मद रफ़ी | मोहम्मद रफ़ी
Sachin tendulakar | सश्चन तेंदुलाकर | सश्चन तेंदुलकर | सश्चन तेंदुलकर | सश्चन तेंदुलकर
Mahela Jayawardene | महेला जयवाडडने | महला जयवदेने | महेला जयवादेने | महेला जयवरदेने
Ram iqbal | राम इकबाल | राम इकबाल | राम इकबाल | राम इकबाल
Azizul haque | अजीज़ूल हकुए | अजीजुल हक | अजीजुल हकुए | अजीजुल हॉक
Vasistha narayan | वसीटथा नारायण | वाससटथा नारायण | वससटथा नारायण | वसीटथा नारायण
Tenzin priyadarshi | तेंझिन वप्रयदशी | तेनस्जन वप्रयादशी | तेनस्जन वप्रयदशी | टेंजीन वप्रयदशी
amartya sen | अमरत्य सेन | अमत्यड सेन | अमत्यड सेन | अमत्यड सेन
Gautam Buddha | गौतम बुड्ढा | गौतम बुद्ध | गौतम बुद्ध | गौतम बुद्ध
Aryabhata | अयडभता | आयडभट | अयाडभाता | आयडभाता
Asvaghosa | अटवाघोसा | अटवघोस | अटवघोसा | अटवाघोसा
Devakinandan khatri | देवककनांडें खतरी | देवक़ीनंदन खत्री | देवक़ीनंदन खत्री | देवककनन्दन खात्री
Bhartendu harischandra | भरतेंदु हररसचन्र | भारतेन्दु हररश्चंर | भारतेंदु हररश्चंर | भारतेन्दु हररश्चंर
English names (tool outputs in Devanagari script)
Roman script | Xlit | GIST | Google | Microsoft
Christopher manning | किसोफर मैनीग | किसटोफर मॅननंग | श्िटतोफेर मंननंग | किटटोफर मनननंग
William macCartney | ववसलयम मचतडने | ववसलयम मच्चत्नेय | ववस्ल्लं मच्कात्नी | ववसलयम मककरत्ने
Barack obama | बरैक औबामा | बरच्क ओबामा | बरैक ओबामा | बराक ओबामा
John MacCain | जॉन मचैन | जॉन मच्चैन | जॉन मकैन | जॉन मककईन
Adam smith | अदाम स्टमथ | आदम स्टमथ | अदम स्टमथ | एडम स्टमथ
John rawl | जॉन रावल | जॉन रावल | जॉन रावल | जॉन रावल
Karl Marx | करल मारक्स | कालड मरॄ | कालड माक्सड | कालड माक्सड
Immanuel Kant | इमानुएल कंट | इम्मनुएल कांत | इम्मानुएल कान्त | इम्मानुएल कान्त
Antonio Gramsci | अंतोऊननय ग्रामटक़ी | एंटोननयो ग्रस्म्टच | अंतोननयो ग्राम्टक़ी | अंटोननओ ग्रंसक़ी
William Shakespeare | ववसलयम सहाकेटपेरे | ववसलयम शेक्सवपयर | ववस्ल्लं शकेटपारे | ववसलयम शेक्सवपयर
Isaac newton | इसाक न्य़ूतों | आइजक न्य़ूटन | इसाक नेव्तों | इसाक न्य़ूटन
Stephen hawking | टटेप्हें हवककंग | टटीफन वी | टटेफेन हस्व्कंग | टटीफन हवककंग
Building names (tool outputs in Devanagari script)
Roman script | Xlit | GIST | Google | Microsoft
Redfort | रेडफोटड | रेद्फोतड | रेडफोटड | रेडफोटड
Qutub minar | कुतुब समनार | कुतुब समनार | कुतुब मीनार | कुतुब मीनार
taz mahal | ताज महाल | तेज महल | ताज महल | ताज महल
rashtrapati bhavan | राष्ट्रपनत भवन | राष्ट्रपनत भवन | राष्ट्रपनत भवन | राष्ट्रपनत भवन
Parliament house | परसलयामेंट हाउ | पासलडयामैंट हाऊस | पसलडअमेंट हाउस | पासलडयामेंट हाउस
Madhur bhavan | मधुर भवन | मधुर भवन | मधुर भवन | मधुर भवन
Buland Darwaza | बुलैंड दरवजा | बुलंद दवाडजा | बुलंद दरवाजा | बुलंद दरवाजा
Golghar | गोलघर | गोलघर | गोलघर | गोलघर
Sanchi stupa | संश्च सत़ूप | सांची टतुप | साँची टत़ूप | सांची टत़ूप
Lotus temple | लॉटुस तेम्पल | लोटस टेंपल | लोतुस टेम्पले | लोटस टैम्पल
City names (tool outputs in Devanagari script)
Roman script | Xlit | GIST | Google | Microsoft
Varanasi | वरणासी | वरानसी | वाराणसी | वाराणसी
Ara | अर | अरा | अर | अरा
Gwalior | गवालीओर | ग्वासलयर | ग्वासलयर | ग्वासलयर
New delhi | न्य़ू डडलही | न्यु टदल्ली | न्य़ू डेल्ही | न्य़ू टदल्ली
Banglore | बंगलौर | बंगलोर | बंगलोरे | बंगलोरे
Chandigarh | चन्दीगरह | चंडीगढ़ | चंडीगढ़ | चंडीगढ़
Allahabad | अल्लहबद् | इलाहाबाद | अल्लाहाबाद | इलाहाबाद
Lucknow | लकनोव | लकनाउ | लखनऊ | लखनऊ
Tiruchirappalli | नतरूश्चरपपल्ली | नतरुनिरप्पस्ल्ल | नतरुश्चराप्पल्ली | नतरुश्चरपपल्ली
thiruvananthapuram | टठरुवानेंटहप़ूरम | श्थरुवनन्थपुरम | श्थरुवानान्थापुरम | श्थरुवनंथपुरम
IV. Test Analysis 
Transliterator | English word | Hindi human name | English name | Building name | City name
Inputs (I) | 10 | 30 | 10 | 10 | 10
Xlit system, correct (C) | 1 | 10 | 2 | 5 | 0
GIST system, correct (C) | 8 | 23 | 3 | 6 | 6
Google system, correct (C) | 0 | 23 | 4 | 8 | 5
Microsoft system, correct (C) | 1 | 21 | 6 | 10 | 7
C: number of correct outputs; I: number of inputs
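The counts in the table above can be turned into accuracy figures with a few lines of arithmetic. This is a small sketch that assumes the input counts given for the Xlit row apply to every tool, since the other rows list only the correct counts. The names `INPUTS`, `CORRECT` and `overall_accuracy` are introduced here for illustration.

```python
# Number of inputs per category, in table order.
INPUTS = {
    "English word": 10,
    "Hindi human name": 30,
    "English name": 10,
    "Building name": 10,
    "City name": 10,
}

# Correct outputs per tool, in the same category order as INPUTS.
CORRECT = {
    "Xlit": [1, 10, 2, 5, 0],
    "GIST": [8, 23, 3, 6, 6],
    "Google": [0, 23, 4, 8, 5],
    "Microsoft": [1, 21, 6, 10, 7],
}


def overall_accuracy(correct_counts, input_counts):
    """Pooled accuracy: total correct outputs over total inputs."""
    return sum(correct_counts) / sum(input_counts)


totals = list(INPUTS.values())
for tool, counts in CORRECT.items():
    acc = overall_accuracy(counts, totals)
    print(f"{tool}: {sum(counts)}/{sum(totals)} = {acc:.1%}")
```

Pooling across categories weights the Hindi human names more heavily, because that category has three times as many inputs as any other, so the per-category ratios remain the more informative comparison.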
V. Comments on the Different Systems
Xlit: This system is based on learning by a genetic algorithm. Being purely learning-based, it produces results from the rules derived by its system. It does not use the corpus directly to transliterate known words, which means it does not correctly transliterate even the words it learned from; its result for dictionary words is poor (1/10).
GIST: This system gives good results for dictionary words. Perhaps it uses the learning corpus directly for transliterating dictionary words. It is also good for city names, which suggests that it also has a good parallel corpus of well-known named entities.
Google tool: This system is based on pattern recognition over a large corpus of documents available worldwide on the internet, so it is well trained for all categories of words except English human names. This suggests that less information is available on the WWW from which it can learn to transliterate this type of word.
Microsoft tool: Very little is known about this system, but it gives good results. Its results for dictionary words are poor. It is good on Hindi names, but its results for English names are also poor. So it is possible that it directly uses the fixed rules by which any Hindi word is represented in English.
VI. Techniques Which Can Be Explored
Syllabification Approach
Syllable Structure
1. The number of syllables in a word is roughly (intuitively) the number of vocalic segments in the word.
2. A vowel is an obligatory element in the structure of a syllable. This vowel is called the "nucleus".
3. Basic configuration: (C)V(C).
4. The part of the syllable preceding the nucleus is called the onset.
5. The elements coming after the nucleus are called the coda.
6. The nucleus and coda together are referred to as the rhyme. [3]
Ex- word sprint may 
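The syllable structure listed above can be illustrated on a one-syllable word such as sprint. This is a minimal sketch, assuming the vowel letters a, e, i, o, u stand in for vocalic segments (a spelling-based approximation that ignores y and silent letters). The names `count_vocalic_segments` and `split_syllable` are illustrative.

```python
import re

VOWELS = "aeiou"


def count_vocalic_segments(word):
    """Rough syllable count: number of maximal runs of vowel letters."""
    return len(re.findall(f"[{VOWELS}]+", word.lower()))


def split_syllable(syllable):
    """Split one (C)V(C) syllable into onset, nucleus, coda and rhyme."""
    pattern = f"([^{VOWELS}]*)([{VOWELS}]+)([^{VOWELS}]*)"
    match = re.fullmatch(pattern, syllable.lower())
    if match is None:
        raise ValueError("expected exactly one vowel run (the nucleus)")
    onset, nucleus, coda = match.groups()
    return {"onset": onset, "nucleus": nucleus, "coda": coda,
            "rhyme": nucleus + coda}


print(count_vocalic_segments("sprint"))
print(split_syllable("sprint"))
```

A word with more than one vowel run, such as volcano, would first have to be divided into syllables before `split_syllable` applies; that division is the hard part and is not attempted here.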
Maximum Entropy Model 
In Bayesian probability, the principle of maximum entropy is a postulate which states that, subject to known constraints (called testable information), the probability distribution which best represents the current state of knowledge is the one with largest entropy.[4] 
The principle was first expounded by E. T. Jaynes in two papers in 1957, where he emphasized a natural correspondence between statistical mechanics and information theory. In particular, Jaynes offered a new and very general rationale for why the Gibbsian method of statistical mechanics works. He argued that the entropy of statistical mechanics and the information entropy of information theory are principally the same thing. Consequently, statistical mechanics should be seen just as a particular application of a general tool of logical inference and information theory.
In most practical cases, the testable information is given by a set of conserved quantities (average values of some moment functions) associated with the probability distribution in question. This is the way the maximum entropy principle is most often used in statistical thermodynamics. Another possibility is to prescribe some symmetries of the probability distribution. An equivalence between the conserved quantities and corresponding symmetry groups implies the same level of equivalence for both these two ways of specifying the testable information in the maximum entropy method.
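For the case where the testable information is a set of average values, the principle can be written as a constrained optimization. The notation below is a standard textbook form and not taken from this paper: the feature functions f_k, their prescribed averages F_k and the multipliers \lambda_k are symbols introduced here for illustration.

```latex
\[
\max_{p}\; H(p) = -\sum_{x} p(x)\,\ln p(x)
\quad\text{subject to}\quad
\sum_{x} p(x) = 1,
\qquad
\sum_{x} p(x)\, f_k(x) = F_k \quad (k = 1,\dots,m)
\]
\[
p(x) = \frac{1}{Z(\lambda)} \exp\!\Big(\sum_{k=1}^{m} \lambda_k f_k(x)\Big),
\qquad
Z(\lambda) = \sum_{x} \exp\!\Big(\sum_{k=1}^{m} \lambda_k f_k(x)\Big)
\]
```

The second line is the form of the maximizing distribution obtained with Lagrange multipliers; the values of \lambda_k are fixed by requiring that the averages equal F_k.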
The maximum entropy principle is also needed to guarantee the uniqueness and consistency of probability assignments obtained by different methods, statistical mechanics and logical inference in particular. Strictly speaking, the trial distributions, which do not maximize the entropy, are actually not probability distributions. 
The maximum entropy principle makes explicit our freedom in using different forms of prior information. As a special case, a uniform prior probability density (Laplace's principle of indifference) may be adopted. Thus, the maximum entropy principle is not just an alternative to the methods of inference of classical statistics, but it is an important conceptual generalization of those methods. 
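The special case of a uniform prior can be checked numerically: with no constraint other than normalization, the uniform distribution has the largest entropy. The sketch below compares it with one skewed distribution over four outcomes; the particular probabilities and the name `entropy` are chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p) for p in probs if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.10, 0.10, 0.10]

print(entropy(uniform), entropy(skewed))
```

For four outcomes the uniform distribution reaches ln 4, the upper bound, and any departure from uniformity lowers the value.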
VII. Conclusion 
Overall, transliteration is a hard task that requires a learning process, followed by applying the learned concepts to determine the transliteration of a new word. It can give better results when a good learning technique is available. All the available tools are basically editors, not transliterators. They only help a user write content in the script of his or her own language, and only if the user knows how to spell that content in English (Roman) letters. No one can use these tools without knowing English.
The two approaches discussed here, syllabification and the maximum entropy model, address two different aspects of the transliteration process. The former reduces the context window of a letter: in general, a context window of size 3 means that 7 letters (the letter itself plus three on either side) are considered as context, whether or not they all lie in the same syllable. If a letter in this context belongs to another syllable of the word, it has no impact on the phoneme of the letter under consideration. The latter is a learning method based on Bayesian probability.
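The context window described above can be sketched as follows. The function takes a window of size 3 on either side (7 letters in total) and can optionally mask letters that fall outside the current syllable. The name `context_window`, the padding character and the syllable span used for volcano are assumptions made for illustration, not the output of a real syllabifier.

```python
def context_window(word, index, size=3, span=None, pad="_"):
    """Return the 2*size+1 letters centred on word[index].

    Positions outside the word, or outside the optional syllable span
    (start, end), are replaced by the pad character.
    """
    if not 0 <= index < len(word):
        raise IndexError("index outside word")
    start, end = span if span is not None else (0, len(word))
    start, end = max(start, 0), min(end, len(word))
    return "".join(
        word[i] if start <= i < end else pad
        for i in range(index - size, index + size + 1)
    )


# Full 7-letter context of the "o" in "volcano".
print(context_window("volcano", 1))
# Same letter, restricted to a hypothetical first syllable "vol".
print(context_window("volcano", 1, span=(0, 3)))
```

Masking the letters outside the syllable leaves fewer distinct contexts to learn from, which is the reduction that the syllabification approach is meant to provide.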
References 
[1] Paola Virga and Sanjeev Khudanpur. Transliteration of Proper Names in Cross-Lingual Information Retrieval. In Proceedings of the Association for Computational Linguistics, 2003.
[2] Alekha Mishra, Ananthakrishnan R and Sasikumar M. Automatic Derivation of Rules for Transliteration from English to Hindi: a Genetic Algorithm Approach. CDAC-Mumbai.
[3] Shankar Ananthakrishnan. Statistical Syllabification of English Phoneme Sequences using Supervised and Unsupervised Algorithms.
[4] K. Knight and J. Graehl. Machine Transliteration. Computational Linguistics, 24(4):599–612, Dec. 1998.
[5] Nasreen Abdul Jaleel and Leah S. Larkey. Statistical Transliteration for English-Arabic Cross Language Information Retrieval. In Conference on Information and Knowledge Management, pages 139–146, 2003.
[6] Long Jiang, Ming Zhou, Lee-Feng Chien and Chen Niu. Named Entity Translation with Web Mining and Transliteration. In International Joint Conference on Artificial Intelligence (IJCAI-07), pages 1629–1634, 2007.
[7] Slaven Bilac and Hozumi Tanaka. Direct Combination of Spelling and Pronunciation Information for Robust Back-transliteration. In Conference on Computational Linguistics and Intelligent Text Processing, pages 413–424, 2005.

More Related Content

What's hot

Script to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latestScript to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latestJaganadh Gopinadhan
 
Hps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodHps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodijnlc
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translationHrishikesh Nair
 
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesParallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesAntonio Toral
 
Machine Translation System: Chhattisgarhi to Hindi
Machine Translation System: Chhattisgarhi to HindiMachine Translation System: Chhattisgarhi to Hindi
Machine Translation System: Chhattisgarhi to HindiPadma Metta
 
Lec 15,16,17 NLP.machine translation
Lec 15,16,17  NLP.machine translationLec 15,16,17  NLP.machine translation
Lec 15,16,17 NLP.machine translationguest873a50
 
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On SilenceSegmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On Silencepaperpublications3
 
Corpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of PersianCorpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of PersianIDES Editor
 
Ijartes v1-i1-002
Ijartes v1-i1-002Ijartes v1-i1-002
Ijartes v1-i1-002IJARTES
 
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABIRULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABIijnlc
 
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech SystemsPhonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systemspaperpublications3
 
Word level language identification in code-switched texts
Word level language identification in code-switched textsWord level language identification in code-switched texts
Word level language identification in code-switched textsHarsh Jhamtani
 

What's hot (19)

Script to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latestScript to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latest
 
8 issues in pos tagging
8 issues in pos tagging8 issues in pos tagging
8 issues in pos tagging
 
C8 akumaran
C8 akumaranC8 akumaran
C8 akumaran
 
Hps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodHps a hierarchical persian stemming method
Hps a hierarchical persian stemming method
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Nlp
NlpNlp
Nlp
 
NLP_KASHK:Text Normalization
NLP_KASHK:Text NormalizationNLP_KASHK:Text Normalization
NLP_KASHK:Text Normalization
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translation
 
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesParallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
 
Machine Translation System: Chhattisgarhi to Hindi
Machine Translation System: Chhattisgarhi to HindiMachine Translation System: Chhattisgarhi to Hindi
Machine Translation System: Chhattisgarhi to Hindi
 
FIRE2014_IIT-P
FIRE2014_IIT-PFIRE2014_IIT-P
FIRE2014_IIT-P
 
Lec 15,16,17 NLP.machine translation
Lec 15,16,17  NLP.machine translationLec 15,16,17  NLP.machine translation
Lec 15,16,17 NLP.machine translation
 
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On SilenceSegmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
 
Moses
MosesMoses
Moses
 
Corpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of PersianCorpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of Persian
 
Ijartes v1-i1-002
Ijartes v1-i1-002Ijartes v1-i1-002
Ijartes v1-i1-002
 
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABIRULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
 
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech SystemsPhonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systems
 
Word level language identification in code-switched texts
Word level language identification in code-switched textsWord level language identification in code-switched texts
Word level language identification in code-switched texts
 

Viewers also liked

C0324011015
C0324011015C0324011015
C0324011015theijes
 
Ug506 m pipe-guide
Ug506 m pipe-guideUg506 m pipe-guide
Ug506 m pipe-guidewenroulei
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting PersonalKirsty Hulse
 
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika AldabaLightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldabaux singapore
 

Viewers also liked (9)

Bpra1751
Bpra1751Bpra1751
Bpra1751
 
C0324011015
C0324011015C0324011015
C0324011015
 
2012-slk-class.pdf
2012-slk-class.pdf2012-slk-class.pdf
2012-slk-class.pdf
 
Rff dp-10-61
Rff dp-10-61Rff dp-10-61
Rff dp-10-61
 
Ug506 m pipe-guide
Ug506 m pipe-guideUg506 m pipe-guide
Ug506 m pipe-guide
 
12-R-Class_110817.pdf
12-R-Class_110817.pdf12-R-Class_110817.pdf
12-R-Class_110817.pdf
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting Personal
 
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika AldabaLightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
 
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job? Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
 

Similar to Ijetcas14 575

Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
 
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMHINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVM
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMijnlc
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
 
A Context-based Numeral Reading Technique for Text to Speech Systems
A Context-based Numeral Reading Technique for Text to Speech Systems A Context-based Numeral Reading Technique for Text to Speech Systems
A Context-based Numeral Reading Technique for Text to Speech Systems IJECEIAES
 
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...
CONSTRUCTION OF ENGLISH-BODO PARALLEL TEXT CORPUS FOR STATISTICAL MACHINE TRA...kevig
 
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...IOSR Journals
 
IRJET- Tamil Speech to Indian Sign Language using CMUSphinx Language Models
Nishant Sinha et al., International Journal of Emerging Technologies in Computational and Applied Sciences, 9(2), June-August, 2014, pp. 191-196

I. Introduction

Translation of proper names is generally recognized as a significant problem in many multi-lingual text and speech processing applications. Even when hand-crafted translation lexicons used for machine translation (MT) and cross-lingual information retrieval (CLIR) provide significant coverage of the words encountered in the text, a significant portion of the tokens not covered by the lexicon are proper names and domain-specific terminology. This lack of translations adversely affects performance. For CLIR applications in particular, proper names and technical terms are especially important, as they carry the most distinctive information in a query, as corroborated by their relatively low document frequency. Finally, in interactive IR systems where users provide very short queries (e.g. 2-5 words), their importance grows even further.

Unlike specialized terminology, however, proper names are amenable to a speech-inspired translation approach. When writing foreign names in one's own language, one tries to preserve the way they sound, i.e. one uses an orthographic representation which, when “read aloud” by a speaker of one's language, sounds as much as possible like it would when spoken by a speaker of the foreign language. This process is referred to as transliteration. Therefore, if a mechanism were available to render, say, an English name in its phonemic form, and another mechanism were available to convert this phonemic string into the orthography of, say, Hindi, then one would have a mechanism for transliterating English names using Hindi characters. The first step has been addressed extensively, for other obvious reasons, in the automatic speech synthesis literature.

Transliteration aims to preserve the source-language sound of a word while writing it in the script of the target language. This process involves four steps [1]:

1. Conversion of an English name into a phonemic representation.
2. Translation of the English phoneme sequence into a sequence of generalized initials and finals (GIFs).
3. Transformation of the GIF sequence into a sequence of Hindi sound units.
4. Translation of the Hindi sound sequence into a character sequence.

II. Problem Faced in Transliteration from English to Hindi

Since Hindi uses a phonetic script (Devanagari), i.e. every character has one and only one pronunciation, transliterating from English to Hindi is essentially a process of determining the pronunciation of English words and approximating them using the Devanagari script. Transliterating would have been an easy task if there were simple one-to-one rules for mapping sequences of characters in one language to another, but this is rarely the case. Many common sequences in English have different mappings depending on the surrounding characters or the part of speech in which the word is used. In some cases, the differences are not even attributable to factors in the text, but are due to the etymology of the word or the arbitrary nature of pronunciation in English. For example, c is pronounced differently in calendar, circus and cheese; ea is pronounced differently in eat, threat and heart; and many English multi-letter combinations are realized as a single sound in pronunciation (f, ff, ph and gh can all be mapped to f).

The context usually plays a vital role in determining the pronunciation. The context can be the position, or the neighbourhood of the character sequence in terms of syllables, other sound units or characters. With respect to position, if a word ends with c, it is mostly pronounced as k (mosaic, disc, cardiac etc.). Similarly, if a word ends with e, in most cases the e remains silent (rise, promise, Jane, rose etc.). But context information is not adequate for disambiguating g in goat and ginger. If we take a fixed number of characters as context, then the letter c followed by l, r or t is always pronounced as k. Also, the sequence ough is pronounced as af if its previous character is one of {c, d, n, r, t}. But th is pronounced differently in though and thought, even though it is followed by ough in both cases [2].

Given these complexities, a simple way of transliterating may be lookup in a pronunciation dictionary.
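The positional rules and the dictionary lookup described above can be sketched in a few lines of Python. This is only an illustrative sketch: the toy dictionary `PRON_DICT`, its romanized pronunciation strings, and the function names are assumptions made for this example and are not part of any of the tools discussed in the paper. The rules cover only the cases named in the text.

```python
# Hypothetical toy pronunciation dictionary (entries assumed for illustration).
PRON_DICT = {"knight": "nait", "honour": "onar"}


def c_sound(word, i):
    """Guess the sound of the letter 'c' at index i from its right-hand context.

    Returns "k" when one of the rules from the text applies, otherwise None,
    meaning the context alone does not decide (calendar, circus, cheese).
    """
    if word[i] != "c":
        raise ValueError("index i must point at the letter 'c'")
    nxt = word[i + 1] if i + 1 < len(word) else ""
    if nxt == "":                  # word-final c: mosaic, disc, cardiac
        return "k"
    if nxt in {"l", "r", "t"}:     # c followed by l, r or t
        return "k"
    return None


def final_e_is_silent(word):
    """Rule of thumb from the text: a word-final e is usually silent."""
    return len(word) > 1 and word.endswith("e")


def lookup(word):
    """Dictionary lookup; returns None for words the dictionary has never seen."""
    return PRON_DICT.get(word.lower())
```

A call such as `lookup("Gramsci")` returns `None`, which is exactly the gap for unseen words and proper names that rule induction from examples is meant to close.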
However, dictionary lookup will not work for new words and proper nouns. Also, any pronunciation dictionary is correct only for the dialect it was created for. For example, an American English pronunciation dictionary would give awkward results for people who are used to the British accent. Thus, what we need is a system that can use a pronunciation dictionary or a corpus of transliterated examples and induce a set of rules from it. Such a system can then be easily adapted to different dialects and accents, provided we are able to find a sufficient number of training examples.

III. Test Analysis of Different Transliteration Editors

We tested four tools for editing in Hindi that use transliteration as the input method. All are available as web-based services. The four tools are:

1. Xlit of CDAC-Mumbai
2. GIST of CDAC-Pune
3. Google Transliteration
4. Microsoft tool for Hindi writing

Our test data were selected from different domains: English words, human names, English names, building names and city names.

Test results (input in Roman script; output of each tool in Devanagari script, देवनागरी लिपि):

English words

Roman script | Xlit | GIST | Google | Microsoft
Abandon | अबंदों | अबॅन्डन | अबंदों | आबंदों
Abbreviate | अब्बरेवीते | अब्रीववएट | अब्ब्रेववअते | अब्ब्रेववयाते
Shrink | श्रंक | श्ींगक | श्श्ंक | शरंक
Volcano | वोलकनो | वॉलकेनो | वोल्कानो | वोलकनों
Zooparasite | ज़ूपरासाइट | ज़ूपॅरसाइट | ज़ूपराससते | ज़ूपरसीटे
Initial | इनीतील | इनीसशएल | इनननतअल | इननश्यल
Injustice | इंज़ूसटटस | ईन्जस्टटस | इन्जुस्टतस | इञ्जुस्टतके
Quotient | कोटेंट | क्वोशन्ट | उओनतएन्त | कुओटीएंट
Honour | होनौर | ओनर | होनौर | होनौर
Knight | कनाइट | कस्क्नघ्ट | स्क्नघ्त | नाइट

Hindi human names

Roman script | Xlit | GIST | Google | Microsoft
John | जॉन | जॉन | जॉन | जॉन
Atul kumar | अतुल कुमार | अतुल कुमार | अतुल कुमार | अतुल कुमार
Raman | रमन | रामन | रमण | रमन
Rajan | राजन | राजन | राजन | राजन
Rajat | रजट | रजत | रजत | रजत
manu saxena | मनु सैक्सेना | मनु सक्सेना | मनु सक्सेना | मनु सक्सेना
prashant khanduri | प्रशांत खंद़ूरी | प्रशांत खंड़ूरी | प्रशांत खंड़ूरी | प्रशांत खनदुरी
megha jain | मेघा जैन | मेघा जैन | मेघा जैन | मेघा जैन
anil agrawal | अननल अग्रवाल | अननल अग्रवाल | अननल अग्रवाल | अननल अग्रवाल
amitabh bachchan | असमताभ बच्चान | असमताभ बच्चन | असमताभ बच्चन | असमताभ बच्चन
jaya bhaduri | जय भादुरी | जया भादुरी | जाया बहादुरी | जया भादुरी
kamal shrivastva | कमल श्श्वसत्व | कमल श्ीवाटतव | कमल श्ीवाटतव | कमाल श्ीवासत्व
ravi rituraj | रवव ऋतुराज | रवी ऋतुराज | रवव ऋतुराज | रवव ऋतुराज
Hans bhardwaj | हँस भाडडवाज | हंस भारद्वाज | हंस भरद्वाज | हंस भारद्वाज
Shashi goswami | शसश गोटवामी | शसश गोटवामी | शसश गोटवामी | शसश गोटवामी
Roshan seth | रोशन सेठ | रोशन सेठ | रोशन सेठ | रोशन सेठ
Lata mangeshakar | लट मंगेशकर | लता मांगेशाकर | लता मंगेशकर | लता मंगेशकर
Mohammad rafi | मोहम्मद राफ़ी | मोहम्मद रफ़ी | मोहम्मद रफ़ी | मोहम्मद रफ़ी
Sachin tendulakar | सश्चन तेंदुलाकर | सश्चन तेंदुलकर | सश्चन तेंदुलकर | सश्चन तेंदुलकर
Mahela Jayawardene | महेला जयवाडडने | महला जयवदेने | महेला जयवादेने | महेला जयवरदेने
Ram iqbal | राम इकबाल | राम इकबाल | राम इकबाल | राम इकबाल
Azizul haque | अजीज़ूल हकुए | अजीजुल हक | अजीजुल हकुए | अजीजुल हॉक
Vasistha narayan | वसीटथा नारायण | वाससटथा नारायण | वससटथा नारायण | वसीटथा नारायण
Tenzin priyadarshi | तेंझिन वप्रयदशी | तेनस्जन वप्रयादशी | तेनस्जन वप्रयदशी | टेंजीन वप्रयदशी
amartya sen | अमरत्य सेन | अमत्यड सेन | अमत्यड सेन | अमत्यड सेन
Gautam Buddha | गौतम बुड्ढा | गौतम बुद्ध | गौतम बुद्ध | गौतम बुद्ध
Aryabhata | अयडभता | आयडभट | अयाडभाता | आयडभाता
Asvaghosa | अटवाघोसा | अटवघोस | अटवघोसा | अटवाघोसा
Devakinandan khatri | देवककनांडें खतरी | देवक़ीनंदन खत्री | देवक़ीनंदन खत्री | देवककनन्दन खात्री
Bhartendu harischandra | भरतेंदु हररसचन्र | भारतेन्दु हररश्चंर | भारतेंदु हररश्चंर | भारतेन्दु हररश्चंर

English names

Roman script | Xlit | GIST | Google | Microsoft
Christopher manning | किसोफर मैनीग | किसटोफर मॅननंग | श्िटतोफेर मंननंग | किटटोफर मनननंग
William macCartney | ववसलयम मचतडने | ववसलयम मच्चत्नेय | ववस्ल्लं मच्कात्नी | ववसलयम मककरत्ने
Barack obama | बरैक औबामा | बरच्क ओबामा | बरैक ओबामा | बराक ओबामा
John MacCain | जॉन मचैन | जॉन मच्चैन | जॉन मकैन | जॉन मककईन
Adam smith | अदाम स्टमथ | आदम स्टमथ | अदम स्टमथ | एडम स्टमथ
John rawl | जॉन रावल | जॉन रावल | जॉन रावल | जॉन रावल
Karl Marx | करल मारक्स | कालड मरॄ | कालड माक्सड | कालड माक्सड
Immanuel Kant | इमानुएल कंट | इम्मनुएल कांत | इम्मानुएल कान्त | इम्मानुएल कान्त
Antonio Gramsci | अंतोऊननय ग्रामटक़ी | एंटोननयो ग्रस्म्टच | अंतोननयो ग्राम्टक़ी | अंटोननओ ग्रंसक़ी
William Shakespeare | ववसलयम सहाकेटपेरे | ववसलयम शेक्सवपयर | ववस्ल्लं शकेटपारे | ववसलयम शेक्सवपयर
Isaac newton | इसाक न्य़ूतों | आइजक न्य़ूटन | इसाक नेव्तों | इसाक न्य़ूटन
Stephen hawking | टटेप्हें हवककंग | टटीफन वी | टटेफेन हस्व्कंग | टटीफन हवककंग

Building names

Roman script | Xlit | GIST | Google | Microsoft
Redfort | रेडफोटड | रेद्फोतड | रेडफोटड | रेडफोटड
Qutub minar | कुतुब समनार | कुतुब समनार | कुतुब मीनार | कुतुब मीनार
taz mahal | ताज महाल | तेज महल | ताज महल | ताज महल
rashtrapati bhavan | राष्ट्रपनत भवन | राष्ट्रपनत भवन | राष्ट्रपनत भवन | राष्ट्रपनत भवन
Parliament house | परसलयामेंट हाउ | पासलडयामैंट हाऊस | पसलडअमेंट हाउस | पासलडयामेंट हाउस
Madhur bhavan | मधुर भवन | मधुर भवन | मधुर भवन | मधुर भवन
Buland Darwaza | बुलैंड दरवजा | बुलंद दवाडजा | बुलंद दरवाजा | बुलंद दरवाजा
Golghar | गोलघर | गोलघर | गोलघर | गोलघर
Sanchi stupa | संश्च सत़ूप | सांची टतुप | साँची टत़ूप | सांची टत़ूप
Lotus temple | लॉटुस तेम्पल | लोटस टेंपल | लोतुस टेम्पले | लोटस टैम्पल

City names

Roman script | Xlit | GIST | Google | Microsoft
Varanasi | वरणासी | वरानसी | वाराणसी | वाराणसी
Ara | अर | अरा | अर | अरा
Gwalior | गवालीओर | ग्वासलयर | ग्वासलयर | ग्वासलयर
New delhi | न्य़ू डडलही | न्यु टदल्ली | न्य़ू डेल्ही | न्य़ू टदल्ली
Banglore | बंगलौर | बंगलोर | बंगलोरे | बंगलोरे
Chandigarh | चन्दीगरह | चंडीगढ़ | चंडीगढ़ | चंडीगढ़
Allahabad | अल्लहबद् | इलाहाबाद | अल्लाहाबाद | इलाहाबाद
Lucknow | लकनोव | लकनाउ | लखनऊ | लखनऊ
Tiruchirappalli | नतरूश्चरपपल्ली | नतरुनिरप्पस्ल्ल | नतरुश्चराप्पल्ली | नतरुश्चरपपल्ली
thiruvananthapuram | टठरुवानेंटहप़ूरम | श्थरुवनन्थपुरम | श्थरुवानान्थापुरम | श्थरुवनंथपुरम

IV. Test Analysis

Number of correct outputs (C) per system, against the number of inputs (I) in each category of word:

Transliterator | English word | Hindi human name | English name | Building name | City name
Inputs (I) | 10 | 30 | 10 | 10 | 10
Xlit system (C) | 1 | 10 | 2 | 5 | 0
GIST system (C) | 8 | 23 | 3 | 6 | 6
Google system (C) | 0 | 23 | 4 | 8 | 5
Microsoft system (C) | 1 | 21 | 6 | 10 | 7

C - number of correct outputs; I - number of inputs.

V. Comment on Different Systems

Xlit: This system learns by a genetic algorithm, so its output comes purely from the rules it derives. It does not use the corpus directly to transliterate known words; that is, it does not reliably transliterate even the words it learned from, as its result for dictionary words is poor (about 1/10).

GIST: This system gives good results for dictionary words. Perhaps it uses the learning corpus directly for transliterating dictionary words. It is also good for city names, which suggests that it also has a good parallel corpus of well-known named entities.
Google tool: This system is based on pattern recognition over a large corpus of documents available worldwide on the internet, so it is well trained for all categories of words except English human names. This suggests that less information is available on the WWW from which it can learn to transliterate this type of word.

Microsoft tool: Little is known about this system, but it gives good results overall. Its results for dictionary words are poor. It is good on Hindi names, but its results for English names are also poor. It may therefore directly use the fixed rules by which Hindi words are represented in English.

VI. Techniques Which Can Be Explored

Syllabification Approach

Syllable structure:

1. The number of syllables in a word is roughly (intuitively) the number of vocalic segments in the word.
2. The presence of a vowel is an obligatory element in the structure of a syllable. This vowel is called the “nucleus”.
3. Basic configuration: (C)V(C).
4. The part of the syllable preceding the nucleus is called the onset.
5. The elements coming after the nucleus are called the coda.
6. The nucleus and coda together are referred to as the rhyme. [3]

Example: the word sprint may be split into the onset spr, the nucleus i and the coda nt, with int as the rhyme.

Maximum Entropy Model

In Bayesian probability, the principle of maximum entropy is a postulate which states that, subject to known constraints (called testable information), the probability distribution which best represents the current state of knowledge is the one with the largest entropy [4]. The principle was first expounded by E.T. Jaynes in two papers in 1957, where he emphasized a natural correspondence between statistical mechanics and information theory. In particular, Jaynes offered a new and very general rationale for why the Gibbsian method of statistical mechanics works.
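As a small numerical illustration of the principle stated above, the sketch below finds the maximum entropy distribution over a finite set of outcomes subject to a single constraint on the mean. The setting (the six outcomes 1 to 6, the target means, and the helper names) is assumed purely for illustration and is not taken from the paper. Under a mean constraint the solution has the Gibbs-like exponential form p_i proportional to exp(lam * x_i), and the sketch finds lam by bisection.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def maxent_with_mean(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum entropy distribution over `values` whose mean is `target_mean`.

    The solution is p_i proportional to exp(lam * x_i); the mean is increasing
    in lam, so lam can be located by bisection on [lo, hi].
    """
    values = list(values)

    def dist(lam):
        shift = max(lam * v for v in values)          # avoids overflow in exp
        weights = [math.exp(lam * v - shift) for v in values]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(p):
        return sum(v * q for v, q in zip(values, p))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(dist(mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)


faces = [1, 2, 3, 4, 5, 6]
uniform = maxent_with_mean(faces, 3.5)   # constraint carries no extra information
skewed = maxent_with_mean(faces, 4.5)    # constraint pushes mass to high values
```

With a target mean of 3.5 the constraint adds nothing beyond normalization and the result is the uniform distribution; with 4.5 the probabilities rise with the outcome, and any other distribution with the same mean has lower entropy.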
Jaynes argued that the entropy of statistical mechanics and the information entropy of information theory are principally the same thing. Consequently, statistical mechanics should be seen just as a particular application of a general tool of logical inference and information theory.

In most practical cases, the testable information is given by a set of conserved quantities (average values of some moment functions) associated with the probability distribution in question. This is the way the maximum entropy principle is most often used in statistical thermodynamics. Another possibility is to prescribe some symmetries of the probability distribution. An equivalence between the conserved quantities and the corresponding symmetry groups implies the same level of equivalence for these two ways of specifying the testable information in the maximum entropy method.

The maximum entropy principle is also needed to guarantee the uniqueness and consistency of probability assignments obtained by different methods, statistical mechanics and logical inference in particular. Strictly speaking, the trial distributions, which do not maximize the entropy, are actually not probability distributions. The maximum entropy principle makes explicit our freedom in using different forms of prior information. As a special case, a uniform prior probability density (Laplace's principle of indifference) may be adopted. Thus, the maximum entropy principle is not just an alternative to the methods of inference of classical statistics, but an important conceptual generalization of those methods.

VII. Conclusion

Overall, transliteration is a difficult task that requires a learning process followed by the application of the learned concepts to determine the transliteration of a new word. It gives better results when a good learning technique is available. All the available tools are basically editors, not transliterators. They help a user write content in their own script only if the user knows the manual transliteration of that content in English. No one can use these tools without knowing English. The two approaches discussed here, syllabification and the maximum entropy model, deal with two aspects of the transliteration process.
The former reduces the context window of a letter. In general, a context window of size 3 means that 7 letters are considered as the context of a letter (the letter itself and three on each side), whether or not they all lie in one syllable. If a letter in this context belongs to another syllable of the word, it has no impact on the phoneme of the letter under consideration. The latter is a learning method based on Bayesian probability.

References

[1] Paola Virga and Sanjeev Khudanpur. Transliteration of Proper Names in Cross-Lingual Information Retrieval. In Proceedings of the Association for Computational Linguistics, 2003.
[2] Alekha Mishra, Ananthakrishnan R and Sasikumar M. Automatic Derivation of Rules for Transliteration from English to Hindi: a Genetic Algorithm Approach. CDAC-Mumbai.
[3] Shankar Ananthakrishnan. Statistical Syllabification of English Phoneme Sequences using Supervised and Unsupervised Algorithms.
[4] K. Knight and J. Graehl. Machine Transliteration. Computational Linguistics, 24(4):599–612, Dec. 1998.
[5] Nasreen Abdul Jaleel and Leah S. Larkey. Statistical Transliteration for English-Arabic Cross Language Information Retrieval. In Conference on Information and Knowledge Management, pages 139–146, 2003.
[6] Long Jiang, Ming Zhou, Lee-Feng Chien and Cheng Niu. Named Entity Translation with Web Mining and Transliteration. In International Joint Conference on Artificial Intelligence (IJCAI-07), pages 1629–1634, 2007.
[7] Slaven Bilac and Hozumi Tanaka. Direct Combination of Spelling and Pronunciation Information for Robust Back-transliteration. In Conferences on Computational Linguistics and Intelligent Text Processing, pages 413–424, 2005.