
International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 4, July-August (2013), pp. 350-357, © IAEME: www.iaeme.com/ijcet.asp. Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

ENGLISH TO PUNJABI MACHINE TRANSLATION SYSTEM USING HYBRID APPROACH OF WORD SENSE DISAMBIGUATION AND MACHINE TRANSLATION

Gurleen Kaur Sidhu (1), Navjot Kaur (2)
(1) Department of Computer Science and Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib, Punjab 140406, India
(2) Department of Computer Science and Engineering, Punjabi University, Patiala, Punjab 140406, India

ABSTRACT

Machine Translation and Word Sense Disambiguation are among the most popular applications of Natural Language Processing. Machine Translation is popular because it is a cheap and easy way to understand another language during conversation, while Word Sense Disambiguation helps to select the correct meaning of a word for the context in which it is used. Our system uses a hybrid approach that disambiguates words in order to improve the result of machine translation. A Conditional Random Field algorithm with a decision list using direct mapping is a simple method that gave the best results for the disambiguation problem. In our system, the Conditional Random Field step divides the data into categories and counts the frequency of the words in a sentence with respect to each category. The meaning of an ambiguous word is taken from the category with the maximum frequency in the sentence. The accuracy of our system on correctly formed sentences is 81.2%, based on the tested sentences only.

Keywords: Conditional Random Field, Machine Translation, Natural Language, Word Sense Disambiguation, Hybrid Approach.

I. INTRODUCTION

During automatic translation of sentences, words can be given an incorrect sense in the target text. The process of assigning the correct sense according to context is known as Word Sense Disambiguation. Many applications and online sites give the meaning of an input text, but they are not able to disambiguate the meanings. We try to solve this problem using a hybrid approach of word sense disambiguation and machine translation.

Machine translation and word sense disambiguation are among the most popular applications of Natural Language Processing. Natural Language Processing is the processing of data presented in natural language, such as the content available on the Internet in blogs, websites, social sites, and business sites. The history and an overview of applications are summarized in Fig. 1.

Fig. 1: Introduction

The Literature Survey reviews techniques previously applied to different languages. The Methodology section explains the proposed technique, which combines several sub-techniques or algorithms of Word Sense Disambiguation and Machine Translation. Results and Discussion covers the advantages and disadvantages of the system. The Conclusion explains how beneficial the proposed system is and reports its accuracy. Future Work gives directions for further research in this field.

II. LITERATURE SURVEY

A review of English grammar is given in Fig. 2, which briefly introduces six parts of speech and their subtypes. The remaining two parts are the preposition and the article. Articles distinguish between words beginning with vowels and consonants; 'a' is used to mark the singular.
Fig. 2: Review of parts of speech in English

Review of research papers for techniques:

[1] presents a hybrid (statistical + rules) transliteration system for person names: from a person name written in Punjabi (Gurmukhi script), the system produces its English (Roman script) transliteration. Experiments showed that the performance is sufficiently high, with an overall accuracy of 95.23%. The reasons behind wrongly transliterated named entities are multiple transliterations, wrong input of words, character gap, and the one-to-multi mapping problem.

[2] describes natural language processing as a multidisciplinary field at the intersection of linguistics, psycholinguistics, computer science and engineering, machine learning, and statistics. It also gives reasons for the growing popularity of natural language processing: as the business world expands, more people move from one country to another and help counters are established everywhere, so natural language must be processed to convey the proper message.

[6] Machine translation translates a source text into a target text with or without human assistance. It has several approaches:
  o Direct translation: words are translated directly, word for word.
  o Transfer-based translation: translation is done with proper knowledge of the rules of the language into which we want to translate.
  o Interlingua-based translation: an intermediate representation is used to convert into the target language.
  o Corpus-based translation: a parallel corpus of source and target text is used.
  o Hybrid translation: a combination of the approaches above.

Nancy Ide (1998) [7] defines the various applications in which word sense disambiguation can be used.

[11] describes a supervised learning method for word sense disambiguation based on cosine similarity. The researchers extract two sets of features, one of which is the set of words that occur frequently in the text. The cosine similarity algorithm uses the inner product of two vectors: after each context is converted to a vector of words, cosine similarity measures the similarity between a new context and each existing context in the training corpus.

[12] The researchers work on Shahmukhi to Gurmukhi transliteration and try to remove the ambiguity problem. Two different approaches are used for word sense disambiguation: a state sequence representation as a Hidden Markov Model, and an N-gram model in which a small window of size ±5 is used. The accuracy of word sense disambiguation using both approaches is reported to be more than 92%.
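The cosine-similarity idea summarized for [11] can be illustrated with a small sketch. It is not the implementation from [11]: the bag-of-words vectors, the function names, and the two training contexts are assumptions made up for illustration.

```python
import math
from collections import Counter


def context_vector(text):
    """Convert a context into a bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Inner product of two count vectors divided by the product of their norms."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty context is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar_sense(new_context, training_contexts):
    """Return the sense label of the training context closest to new_context."""
    new_vec = context_vector(new_context)
    best = max(
        training_contexts,
        key=lambda item: cosine_similarity(new_vec, context_vector(item[0])),
    )
    return best[1]


# Hypothetical training corpus: (context, sense label) pairs, not data from [11].
TRAINING = [
    ("he deposited money in the bank account", "finance"),
    ("a boat reached the river bank", "geography"),
]

print(most_similar_sense("she opened an account at the bank", TRAINING))
```

The new context shares "the", "bank", and "account" with the first training context but only "the" and "bank" with the second, so the first label is chosen.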
III. METHODOLOGY

Fig. 3: Flow chart for the proposed system

Algorithm for the proposed English to Punjabi Machine Translation System:

Step 1: START; input the text.
Step 2: Check whether the text is present.
  o If present, move to Step 3.
  o Else display the message "please enter the text first".
Step 3: ANALYSIS of the sentence
  o TOKENIZATION (split the sentence on white space and count the words). Repeat the next two steps for every token.
  o PREPROCESSING (divided into two sub-parts)
      o Text normalization (optional): apply the proposed algorithm for American to British English.
      o Sentence differentiation: apply rules to check whether the sentence is simple or compound.
  o PART OF SPEECH TAGGING (implemented by direct mapping)
  After analysis of the sentence, move on to Step 4.
Step 4: SYNTHESIS of the sentence
  o DIRECT MAPPING (WORD + POS)
      o If present, FETCH the MEANING and move on to REORDER.
      o Otherwise apply the HYBRID APPROACH FOR WSD to the sentence: if (WORD + POS) has multiple CATEGORIES, increase the counter of each of those categories (repeat this for all tokens). The category with the maximum frequency for the ambiguous word supplies the meaning assigned to that word. Fetch the meaning and move on to REORDER.
  o REORDER according to the target text.
Step 5: TRANSLATION ENGINE
  o OUTPUT (after reordering, combine the words into a sentence and display it).
Step 6: END.

IV. RESULTS AND DISCUSSION

• First case: the general case is explained with two main examples, given in the figures below with a discussion of their results. When a simple, correctly formed sentence is entered as input, our system shows better output than the previous one.

Fig. 4: Correct and incorrect sentence with discussion

• Random words used in a sentence: the system gives their meanings if they are present in the database but does not generate a sentence. Fig. 5 shows the error given by our system when the input sentence is incorrectly formed: the system checks the sentence formation and, if it is incorrect, gives the message "try again".

Fig. 5: System gives error
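The flow of Steps 1 to 5 in Section III, together with the two messages mentioned above, can be sketched roughly as follows. This is only an illustration under stated assumptions: the lookup tables, the "PA:" placeholder strings (not real Punjabi entries), the verb-final reordering rule, and the choice to return "try again" for an unknown word are all hypothetical and are not taken from the system itself.

```python
# Hypothetical POS table standing in for the direct-mapping tagger.
POS_TABLE = {"we": "PRON", "read": "VERB", "books": "NOUN"}

# Hypothetical direct-mapping table: (word, POS) -> target meaning.
# The "PA:" strings are placeholders, not real Punjabi dictionary entries.
DIRECT_MAP = {
    ("we", "PRON"): "PA:we",
    ("read", "VERB"): "PA:read",
    ("books", "NOUN"): "PA:books",
}


def tokenize(text):
    """Step 3: split the sentence on white space."""
    return text.lower().split()


def translate(text):
    # Step 2: check that some text was entered.
    if not text or not text.strip():
        return "please enter the text first"

    # Steps 3 and 4: tag each token and fetch its meaning by direct mapping.
    tagged = []
    for token in tokenize(text):
        pos = POS_TABLE.get(token)
        meaning = DIRECT_MAP.get((token, pos))
        if meaning is None:
            # Assumption: an unknown word ends the translation attempt.
            return "try again"
        tagged.append((meaning, pos))

    # Reorder: a simplified rule that moves verbs to the end, since
    # Punjabi is verb-final. sorted() is stable, so other words keep
    # their relative order.
    reordered = sorted(tagged, key=lambda item: item[1] == "VERB")

    # Step 5: combine the words into the output sentence.
    return " ".join(meaning for meaning, _ in reordered)


print(translate("We read books"))
print(translate(""))
```

A real system would need punctuation handling, the optional normalization step, and the WSD branch of Step 4, all of which this sketch leaves out.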
Fig. 7: Lack of word sense disambiguation

Fig. 8: Removing the ambiguity of words

Our system uses the Conditional Random Field to remove the ambiguity of words. In the figure above, the input sentence is 'we visited the bank and that was situated at the bank'. The word 'bank' is ambiguous here. First we look for the conjunction, so that the meanings of words are fetched separately for each sub-part of the sentence. The first sub-part contains no word from a specific category, so it relates to the general category and we fetch the most commonly used meaning, which is the financial bank. We then handle the second sub-part, which contains the word 'situated', belonging to the geography category. Both meanings of 'bank' are fetched, but the condition is applied that the meaning from the category with the maximum frequency in the sentence is used. So for the second sub-part we use the meaning of 'bank' related to the geography category. We then reorder the sentence with respect to the POS of its words and generate the target sentence as displayed in Fig. 8.

Causes of inaccurate results: character gap, wrong input, and words not present in the database.
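The handling of the 'bank' sentence described above can be sketched as follows. The sketch implements only the category-frequency counting with a default sense, not a trained Conditional Random Field; the word lists, the sense labels, and the set of conjunctions are assumptions chosen to reproduce the example.

```python
from collections import Counter

# Hypothetical category lexicon: context word -> category.
CATEGORY_WORDS = {
    "situated": "geography",
    "river": "geography",
    "loan": "finance",
    "money": "finance",
}

# Hypothetical sense inventory: ambiguous word -> {category: sense label}.
SENSES = {"bank": {"finance": "financial bank", "geography": "river bank"}}

# Assumption: the most commonly used sense is the fallback category.
DEFAULT_CATEGORY = {"bank": "finance"}

CONJUNCTIONS = {"and", "but", "or"}


def split_on_conjunctions(tokens):
    """Split a token list into sub-parts at each conjunction."""
    parts, current = [], []
    for token in tokens:
        if token in CONJUNCTIONS:
            if current:
                parts.append(current)
            current = []
        else:
            current.append(token)
    if current:
        parts.append(current)
    return parts


def disambiguate(sentence):
    """Return (word, sense) pairs for every ambiguous word, per sub-part."""
    results = []
    for part in split_on_conjunctions(sentence.lower().split()):
        # Count how often each category occurs in this sub-part.
        counts = Counter(CATEGORY_WORDS[t] for t in part if t in CATEGORY_WORDS)
        for token in part:
            if token not in SENSES:
                continue
            candidates = [c for c in SENSES[token] if counts[c] > 0]
            if candidates:
                # Category with the maximum frequency wins.
                category = max(candidates, key=lambda c: counts[c])
            else:
                # No category evidence: fall back to the general sense.
                category = DEFAULT_CATEGORY[token]
            results.append((token, SENSES[token][category]))
    return results


print(disambiguate("we visited the bank and that was situated at the bank"))
```

Splitting on the conjunction first is what lets the two occurrences of 'bank' receive different senses; counting categories over the whole sentence would assign the geography sense to both.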
V. CONCLUSION

We conclude that Machine Translation and Word Sense Disambiguation are among the most popular applications of Natural Language Processing. Machine Translation is popular because it is a cheap and easy way to understand another language during conversation, while Word Sense Disambiguation helps to select the correct meaning of a word for the context in which it is used. From the Literature Survey, we learned the basic structure and the various sub-parts of the parts of speech of both languages, English and Punjabi, as well as the techniques previously implemented by different researchers. Our system uses a hybrid approach that disambiguates words in order to improve the result of machine translation. A Conditional Random Field algorithm with a decision list using direct mapping is a simple method that gave the best results for the disambiguation problem. The accuracy of our system is given below:

Fig. 9: Accuracy table for testing the system

VI. FUTURE WORK

• More techniques can be combined with this system for higher accuracy.
• More data can be used.
• Categories can be further classified into sub-parts.
• Parts of speech can be explored further with sub-categories.

VII. ACKNOWLEDGEMENTS

As a part of my course I have taken the problem "English to Punjabi Machine Translation System using Hybrid Approach of Word Sense Disambiguation and Machine Translation" as my thesis topic. I am very thankful to Mrs. Navjot Kaur, Assistant Professor, Punjabi University, Patiala, for giving me such valuable support in doing my work. She provided all the relevant material that was sufficient for me to complete my thesis work, and she provided help and time whenever asked.

Last but not least, a word of thanks to the authors of all those books and papers which I consulted during my thesis work as well as while preparing the report. Finally, thanks to the Almighty for not letting me down at the time of crisis and for showing me the silver lining in the dark clouds.
VIII. REFERENCES

JOURNAL
[1]. Kamal Deep, Dr. Vishal Goyal, Hybrid Approach for Punjabi to English Transliteration System, International Journal of Computer Applications (0975 – 8887), Volume 28, No. 1, August 2011.
[2]. Fabio Ciravegna, Recent Advances in Natural Language Processing, IEEE Computer Society, 2003.
[4]. J. Hutchins, An Introduction to Machine Translation, Academic Press, 1992.
[7]. Nancy Ide, Jean Veronis, Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art, 1998.
[8]. Pushpak Bhattacharyya, CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 25: Knowledge Based and Supervised WSD), IIT Bombay, 6th March, 2012, p. 24.
[9]. Pushpak Bhattacharyya, CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 25: Knowledge Based and Supervised WSD), IIT Bombay, 6th March, 2012, p. 35.
[10]. Durgesh D Rao, Machine Translation, pp. 61-70, July 1998.
[13]. Kamaljeet Kaur Batra, G S Lehal, Rule Based Machine Translation of Noun Phrases from Punjabi to English, IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 5, September 2010.
[14]. P. Tamilselvi, S. K. Srivatsa, Case Based Word Sense Disambiguation Using Optimal Features, 2011 International Conference on Information Communication and Management, IPCSIT vol. 16, Singapore, 2011.

BOOKS
[15]. Wren & Martin, English Grammar and Composition, S. Chand Publication.

THESIS
[6]. R. Harshawardhan, Rule Based Machine Translation System for English to Malayalam Language, Centre for Excellence in Computational Engineering and Networking, December 2011.
[28]. Kamal Deep, Dr. Vishal Goyal, Hybrid Approach for Punjabi to English Transliteration System, Punjabi University, Patiala, September 2011.

PROCEEDING PAPER
[3]. Available: http://en.wikipedia.org/wiki/Natural_language_processing
[11]. M. Nameh, S. M. Fakhrahmad, M. Zolghadri Jahromi, A New Approach to Word Sense Disambiguation Based on Context Similarity, Proceedings of the World Congress on Engineering 2011, Vol. I, pp. 456-459.
[12]. Tejinder Singh Saini, Gurpreet Singh Lehal, Word Disambiguation in Shahmukhi to Gurmukhi Transliteration, Proceedings of the 9th Workshop on Asian Language Resources, Chiang Mai, Thailand, November 12 and 13, 2011, pages 79–87.
[26]. Available at: http://en.wikipedia.org/wiki/Machine_translation
[27]. Available at: http://en.wikipedia.org/wiki/Word-sense_disambiguation
