Machine translation from English to Hindi


Machine translation is a part of natural language processing. The suggested algorithm is word-based. We translate from English to Hindi.
Submitted by
Garvita Sharma,10103467,B3
Rajat Jain,10103571,B6

Published in: Education, Technology

  1. 1. MAJOR PRESENTATION
       • Project Title: “ENGLISH TO HINDI MACHINE TRANSLATION”
       • Jaypee Institute of Information Technology, CSE Department, May 2014
       • Project Supervisor: Mr. K. Vimal Kumar
       • Submitted by: Garvita Sharma (10103467), Rajat Jain (10103571)
       • Paper communicated to International on artificial intelligence 2014 (“word order based machine translation”)
  2. 2. NATURAL LANGUAGE PROCESSING
       • Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation.
  3. 3. INTRODUCTION TO NLP
       • Analyze, understand, and generate human languages just as humans do.
       • Apply computational techniques to the language domain.
       • Explain linguistic theories and use them to build systems that can be of social use.
       • Started off as a branch of Artificial Intelligence.
       • Borrows from Linguistics, Psycholinguistics, Cognitive Science, and Statistics.
       • Make computers learn our language rather than us learning theirs.
  4. 4. NLP APPLICATIONS
       • Question answering: “Who is the first Taiwanese president?”
       • Text categorization/routing: e.g., customer e-mails
       • Text mining: “Find everything that interacts with BRCA1.”
       • Machine (assisted) translation
       • Language teaching/learning: usage checking
       • Spelling correction: is that just dictionary lookup?
  5. 5. APPLICATIONS OF NLP
       • Machine translation
       • Database access
       • Information retrieval: selecting from a set of documents the ones that are relevant to a query
       • Text categorization: sorting text into fixed topic categories
       • Extracting data from text: converting unstructured text into structured data
       • Spoken language control systems
       • Spelling and grammar checkers
  6. 6. LEXICAL TRANSLATION PROBLEM
       • Even assuming monolingual disambiguation …
       • Style/register differences (e.g., domicile, merde, medical~anatomical~familiar)
       • Proper names (e.g., Addition Barrières)
       • Conceptual differences
       • Lexical gaps
  7. 7. MACHINE TRANSLATION APPROACHES
       • Grammar-based
       • Interlingua-based
       • Transfer-based
       • Direct
       • Example-based
       • Statistical
  8. 8. STATISTICAL MACHINE TRANSLATION
       • Statistical machine translation is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora.
  9. 9. Rule-Based vs. Statistical MT
       • Rule-based MT:
            • Hand-written transfer rules
            • Rules can be based on lexical or structural transfer
            • Pro: firm grip on complex translation phenomena
            • Con: often very labor-intensive -> lack of robustness
       • Statistical MT:
            • Mainly word-based or phrase-based translations
            • Translations are learned from actual data
            • Pro: translations are learned automatically
            • Con: difficult to model complex translation phenomena
  10. 10. DOCUMENT VS SENTENCE
       • MT problem: generate high-quality translations of documents.
       • However, all current MT systems work only at sentence level!
       • Translation of independent sentences is a difficult problem that is worth solving.
       • But remember that important discourse phenomena are ignored!
       • Example: how should English “it” be translated into French (feminine vs. masculine) or German (feminine, masculine, or neuter) if the object referred to is in another sentence?
  11. 11. COMPUTING TRANSLATION PROBABILITIES
       • Given a parallel corpus, we can estimate P(e | f). The maximum likelihood estimate of P(e | f) is freq(e, f) / freq(f).
       • For whole sentences this is far too specific to yield reasonable frequencies: the vast majority of unseen data will have zero counts.
       • P(e | f) could be redefined over individual words e_i and f_j as: $P(e \mid f) = \prod_{f_j} \max_{e_i} P(e_i \mid f_j)$
       • Problem: the English words maximizing P(e | f) might not result in a readable sentence.
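The maximum likelihood estimate and the word-level product of maxima can be illustrated with a minimal Python sketch. The word-aligned toy pairs (English source, romanized Hindi target), the function names, and the handling of unseen words are all assumptions made for illustration; they are not the project's corpus or code. In the slide's notation, `src` plays the role of f and `tgt` the role of e.

```python
from collections import Counter, defaultdict

# Hypothetical word-aligned pairs (source word, target word); illustrative only.
aligned_pairs = [
    ("water", "paani"), ("water", "paani"), ("water", "jal"),
    ("house", "ghar"), ("house", "ghar"), ("house", "makaan"),
    ("good", "achchha"),
]

def estimate_translation_probs(pairs):
    """Maximum likelihood estimate: P(tgt | src) = freq(src, tgt) / freq(src)."""
    pair_counts = Counter(pairs)
    src_counts = Counter(src for src, _ in pairs)
    probs = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        probs[src][tgt] = count / src_counts[src]
    return probs

def translate_word_by_word(sentence, probs):
    """Pick the most probable target word for each source word.

    Returns the chosen words and the product of the per-word maxima.
    An unseen source word has zero counts, so it is left untranslated
    and the product collapses to 0.0.
    """
    words, score = [], 1.0
    for src in sentence:
        candidates = probs.get(src)
        if not candidates:
            words.append(src)
            score = 0.0
            continue
        best = max(candidates, key=candidates.get)
        words.append(best)
        score *= candidates[best]
    return words, score

probs = estimate_translation_probs(aligned_pairs)
print(translate_word_by_word(["good", "water"], probs))
```

Because each source word is translated independently, the output keeps the source word order, which is one reason the most probable words may not form a readable sentence.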
  12. 12. PROBLEMS IN STATISTICAL TRANSLATION
       • Sentence alignment
       • Statistical anomalies
       • Data dilution
       • Idioms
       • Different word orders
       • Out-of-vocabulary (OOV) words
  13. 13. PROPOSED ALGORITHM
       • The algorithm we follow is:
            • Calculate the individual word probabilities.
            • Calculate probabilities according to the tagged words and their preceding words.
            • Combine the two probabilities.
            • Derive the final probabilities.
            • Look up any word missing from the corpus in the dictionary.
            • Add the word and its meaning if it is not available in the dictionary either.
            • Restructure the sentence: Subject Verb Object (English) -> Subject Object Verb (Hindi).
       • Output
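The slide does not say how the two probabilities are combined. One common choice is linear interpolation followed by renormalization, sketched below in Python. The interpolation itself, the `weight` parameter, the function names, and the toy numbers are assumptions for illustration and should not be read as the authors' actual formula.

```python
def combine_probabilities(individual, contextual, weight=0.5):
    """Linearly interpolate two distributions over candidate translations.

    individual: candidate -> P(candidate | word)
    contextual: candidate -> P(candidate | word, preceding word, preceding tag)
    weight:     share given to the individual probability (assumed value)
    """
    candidates = set(individual) | set(contextual)
    combined = {
        c: weight * individual.get(c, 0.0) + (1 - weight) * contextual.get(c, 0.0)
        for c in candidates
    }
    total = sum(combined.values())
    if total == 0:
        return {}
    # Renormalize so the final probabilities sum to 1.
    return {c: p / total for c, p in combined.items()}

def best_candidate(individual, contextual, weight=0.5):
    """Return the candidate with the highest final probability, or None."""
    final = combine_probabilities(individual, contextual, weight)
    return max(final, key=final.get) if final else None

# Hypothetical numbers for the English word "bank" (romanized Hindi candidates).
individual = {"kinara": 0.6, "bank": 0.4}   # word on its own
contextual = {"kinara": 0.1, "bank": 0.9}   # given the preceding word and its tag
print(best_candidate(individual, contextual))
```

In this toy case the context overrides the word's most frequent translation, which is the effect the tagged preceding word is meant to provide.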
  16. 16. CONCLUSIONS
       • The project fulfils the following functionalities:
            • Parallel translation according to the probabilities from the tagged corpus.
            • Calculation of probability according to the preceding word and its tag.
            • Retrieval of the word meaning from the attached dictionary when the input word is absent from the corpus.
            • Facility to add a new word and its meaning when the word is absent from the dictionary as well.
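The corpus-then-dictionary fallback, with the option to add a missing word, might be organized as in the following Python sketch. The data structures, the function names, and the romanized Hindi entries are hypothetical; the slides do not describe how the project stores its corpus probabilities or dictionary.

```python
def lookup_translation(word, corpus_probs, dictionary):
    """Return (translation, origin), where origin is 'corpus', 'dictionary', or 'missing'."""
    candidates = corpus_probs.get(word)
    if candidates:
        # Most probable translation learned from the tagged corpus.
        return max(candidates, key=candidates.get), "corpus"
    if word in dictionary:
        # Fall back to the attached dictionary.
        return dictionary[word], "dictionary"
    return None, "missing"

def add_word(word, meaning, dictionary):
    """Store a new word and its meaning so later lookups succeed."""
    dictionary[word] = meaning

# Hypothetical data for illustration.
corpus_probs = {"water": {"paani": 0.7, "jal": 0.3}}
dictionary = {"tree": "ped"}

print(lookup_translation("water", corpus_probs, dictionary))
print(lookup_translation("tree", corpus_probs, dictionary))
print(lookup_translation("river", corpus_probs, dictionary))

add_word("river", "nadi", dictionary)
print(lookup_translation("river", corpus_probs, dictionary))
```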
  17. 17. FUTURE WORK
       • The future work can include the following functionalities:
            • Sentence rearrangement according to the output language grammar.
            • Introducing tagging in the target language as well.
            • Calculation of the preceding word and tag in the target language in order to enhance accuracy and efficiency.
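For a simple declarative clause, rearranging Subject Verb Object into Subject Object Verb could look like the Python sketch below. It assumes the translated chunks already carry role labels ("S", "V", "O"), which is a simplification; the labels, the function names, and the romanized Hindi example are hypothetical, and real sentences would need a parser or tagger to supply the roles.

```python
def svo_to_sov(chunks):
    """Move verb chunks to the end, keeping the order of everything else."""
    non_verbs = [chunk for chunk in chunks if chunk[1] != "V"]
    verbs = [chunk for chunk in chunks if chunk[1] == "V"]
    return non_verbs + verbs

def render(chunks):
    """Join the chunk texts into a sentence string."""
    return " ".join(text for text, _ in chunks)

# Word-by-word output for "Ram eats an apple", still in English word order.
translated = [("raam", "S"), ("khaata hai", "V"), ("seb", "O")]
print(render(svo_to_sov(translated)))  # prints "raam seb khaata hai"
```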