Types of machine translation

Published in: Technology

  1. 1. Drop me a mail: rushdecoder@yahoo.com Visit me at: http://rushdishams.googlepages.com Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh
  2. 2. Translation Approach
     • The translation process may be stated as:
       1. Decoding the meaning of the source text
       2. Re-encoding this meaning in the target language.
     • Machine translation can use a method based on linguistic rules:
       ○ words are translated in a linguistic way
       ○ the most suitable words of the target language replace the ones in the source language.
  3. 3. Translation Approach
     • The success of machine translation requires the problem of natural language understanding to be solved first.
     • Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated.
  4. 4. Translation Approach
     • According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation.
     • These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.
  5. 5. Translation Approach
     • Machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what a native speaker of another language has written.
  6. 6. Translation Approach
     • The large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods.
     • In exchange, the grammar-based methods need a skilled linguist to carefully design the grammar that they use.
  7. 7. Types of Machine Translation [Figure: diagram relating Source (Arabic) to Target (English); labels: Syntactic Parsing, Semantic Analysis, Sentence Planning, Text Generation, Direct: SMT, EBMT, Transfer Rules, Interlingua]
  8. 8. Rule based MT
     • The rule-based machine translation paradigm includes
       1. transfer-based machine translation,
       2. interlingual machine translation and
       3. dictionary-based machine translation
  10. 10. Transfer based MT
      • It is necessary to have an intermediate representation that captures the "meaning" of the original sentence in order to generate the correct translation.
      • In interlingua-based MT this intermediate representation must be independent of the languages in question, whereas in transfer-based MT it has some dependence on the language pair involved.
  11. 11. Transfer based MT
      • The original text is first analyzed morphologically and syntactically in order to obtain a syntactic representation.
      • This representation can then be refined to a more abstract level, putting emphasis on the parts relevant for translation and ignoring other types of information.
  12. 12. Transfer based MT
      • The transfer process then converts this final representation (still in the original language) to a representation of the same level of abstraction in the target language.
      • These two representations are referred to as "intermediate" representations.
      • From the target language representation, the stages are then applied in reverse.
  13. 13. Transfer based MT
  14. 14. Transformation process
      • Morphological analysis: surface forms of the input text are classified by
        ○ part of speech (e.g. noun, verb, etc.) and
        ○ sub-category (number, gender, tense, etc.)
  15. 15. Transformation process
      • Lexical categorization: in any given text some of the words may have more than one meaning, causing ambiguity in analysis. Lexical categorization looks at the context of a word to try to determine the correct meaning in the context of the input.
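
The idea of choosing a meaning from context can be sketched roughly as follows. This is only a minimal illustration, assuming a hand-written sense inventory: the `SENSES` table, its cue words and the `pick_sense` helper are hypothetical and do not come from the slides. Real systems use far richer context models.

```python
# Hypothetical sense inventory: each sense of an ambiguous word lists cue
# words that tend to appear near it. Invented for illustration only.
SENSES = {
    "bank": {
        "bank/finance": {"money", "loan", "account", "deposit"},
        "bank/river": {"river", "water", "shore", "fishing"},
    }
}


def pick_sense(word, context_words):
    """Return the sense whose cue words overlap most with the context."""
    senses = SENSES.get(word)
    if senses is None:
        return word  # word not in the inventory: nothing to choose
    context = {w.lower() for w in context_words}
    # max() keeps the first sense on ties, so table order acts as the default.
    return max(senses, key=lambda s: len(senses[s] & context))


sentence = "she opened an account at the bank".split()
print(pick_sense("bank", sentence))
```

Counting overlapping cue words is the simplest possible scoring rule; it is used here only because it keeps the sketch short.
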
  16. 16. Transformation process
      • Lexical transfer: this is basically dictionary translation. The source language lemma (perhaps with sense information) is looked up in a bilingual dictionary and the translation is chosen.
  17. 17. Transformation process
      • Structural transfer: while the previous stages deal with words, this stage deals with larger constituents.
  18. 18. Transformation process
      • Morphological generation: from the output of the structural transfer stage, the target language surface forms are generated.
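
The stages above can be strung together in a toy Spanish-to-English pipeline. This is a sketch under strong assumptions: every table is a tiny, hand-made, hypothetical fragment, the function names are invented, the lexical categorization stage is skipped because no word here is ambiguous, and the single structural rule (noun-adjective reordering) stands in for a full rule set.

```python
# Toy Spanish -> English transfer pipeline. Every table below is a
# hypothetical fragment; a real system needs large lexicons and rule sets.

# Morphological analysis: surface form -> (lemma, part of speech, number)
ANALYSIS = {
    "la": ("el", "DET", "sg"),
    "las": ("el", "DET", "pl"),
    "casa": ("casa", "NOUN", "sg"),
    "casas": ("casa", "NOUN", "pl"),
    "roja": ("rojo", "ADJ", "sg"),
    "rojas": ("rojo", "ADJ", "pl"),
}

# Lexical transfer: source lemma -> target lemma
BILINGUAL = {"el": "the", "casa": "house", "rojo": "red"}


def analyse(sentence):
    """Look up each surface form; unknown words raise KeyError in this toy."""
    return [ANALYSIS[w] for w in sentence.lower().split()]


def lexical_transfer(tokens):
    """Replace each source lemma with its target lemma, keeping the features."""
    return [(BILINGUAL[lemma], pos, num) for lemma, pos, num in tokens]


def structural_transfer(tokens):
    """Swap NOUN ADJ (Spanish order) into ADJ NOUN (English order)."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def generate(tokens):
    """Morphological generation: only nouns inflect for number in this toy."""
    words = []
    for lemma, pos, num in tokens:
        words.append(lemma + "s" if pos == "NOUN" and num == "pl" else lemma)
    return " ".join(words)


def translate(sentence):
    return generate(structural_transfer(lexical_transfer(analyse(sentence))))


print(translate("las casas rojas"))
```

Keeping each stage as a separate function mirrors the slide sequence, so one stage can be replaced without touching the others.
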
  19. 19. Transfer Types
      • Superficial transfer (or syntactic): this level is characterized by transferring "syntactic structures" between the source and target languages. It is suitable for languages in the same family or of the same type, for example between Romance languages such as Spanish, Catalan, French and Italian.
  20. 20. Transfer Types
      • Deep transfer (or semantic): this level constructs a semantic representation that is dependent on the source language. This representation can consist of a series of structures which represent the meaning. In these transfer systems predicates are typically produced. The translation also typically requires structural transfer. This level is used to translate between more distantly related languages (e.g. Spanish-English or Spanish-Basque).
  21. 21. Dependency Grammar
  22. 22. Case Grammar
  24. 24. Interlingual MT
      • The source text, i.e. the text to be translated, is transformed into an interlingua, i.e. an abstract language-independent representation.
      • The target language is then generated from the interlingua.
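
A rough sketch of the interlingual idea, assuming a tiny invented frame format: the concept labels (`CAT`, `EAT`, `FISH`), the `analyse_english` and `generate` helpers, and the two target lexicons are all hypothetical. The point is only that one analysis result feeds several generators.

```python
# Hypothetical interlingua: a language-neutral frame of concept labels.
# All lexicons are tiny invented fragments for a single sentence pattern.

ENGLISH_CONCEPTS = {"cats": "CAT", "eat": "EAT", "fish": "FISH"}

TARGET_LEXICONS = {
    "es": {"CAT": "los gatos", "EAT": "comen", "FISH": "pescado"},
    "fr": {"CAT": "les chats", "EAT": "mangent", "FISH": "du poisson"},
}


def analyse_english(sentence):
    """Map a three-word 'subject verb object' sentence to a frame."""
    subj, verb, obj = sentence.lower().rstrip(".").split()
    return {
        "event": ENGLISH_CONCEPTS[verb],
        "agent": ENGLISH_CONCEPTS[subj],
        "patient": ENGLISH_CONCEPTS[obj],
    }


def generate(frame, lang):
    """Realise the frame in the target language (agent, event, patient order)."""
    lex = TARGET_LEXICONS[lang]
    return f"{lex[frame['agent']]} {lex[frame['event']]} {lex[frame['patient']]}"


frame = analyse_english("Cats eat fish.")
print(frame)
print(generate(frame, "es"))
print(generate(frame, "fr"))
```

Adding another target language in this sketch means adding one lexicon entry; the analysis step is untouched.
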
  25. 25. Interlingual MT
      • In the direct approach, words are translated directly without passing through an additional representation.
      • In the transfer approach the source language is transformed into an abstract, less language-specific representation.
  26. 26. Interlingual MT
  27. 27. Advantage and disadvantage
      • The advantage in multilingual machine translation is that no transfer component has to be created for each language pair.
      • The obvious disadvantage is that the definition of an interlingua is difficult and maybe even impossible for a wider domain.
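
The size of that advantage can be illustrated with simple counting. The sketch assumes one transfer component per ordered (source, target) pair for a transfer system, and one analyser plus one generator per language for an interlingual system; the function names are invented for this example.

```python
def transfer_components(n):
    """Transfer components for n languages: one per ordered language pair."""
    return n * (n - 1)


def analysis_generation_modules(n):
    """Interlingual system: one analyser and one generator per language."""
    return 2 * n


for n in (2, 3, 5, 10):
    print(n, transfer_components(n), analysis_generation_modules(n))
```

A transfer system also needs analysers and generators, so the first function counts only the components that the interlingual design avoids.
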
  28. 28. Components
      • Dictionaries for analysis and generation.
      • A conceptual lexicon, which is the knowledge base about events and entities known in the domain.
      • A set of projection rules (specific to the domain and the languages).
      • Grammars for the analysis and generation of the languages involved.
  30. 30. Dictionary-based MT
      • The words are translated as a dictionary does, word by word, usually without much correlation of meaning between them.
      • Dictionary lookups may be done with or without morphological analysis or lemmatisation.
      • It can be used to expedite manual translation, if the person carrying it out is fluent in both languages and therefore capable of correcting syntax and grammar.
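
Word-by-word lookup can be sketched in a few lines. The Spanish-to-English word list and the `translate_word_by_word` helper are hypothetical, and no morphological analysis or lemmatisation is attempted.

```python
# Hypothetical Spanish -> English word list, invented for illustration.
DICTIONARY = {"el": "the", "gato": "cat", "negro": "black", "duerme": "sleeps"}


def translate_word_by_word(sentence):
    """Replace each word independently; unknown words pass through unchanged."""
    return " ".join(DICTIONARY.get(w, w) for w in sentence.lower().split())


print(translate_word_by_word("el gato negro duerme"))
```

The result keeps the Spanish word order ("the cat black sleeps"), which is the kind of output a bilingual person would still have to correct by hand.
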
  31. 31. Dictionary-based MT
  32. 32. Dictionary-based MT
  33. 33. Example-based MT
  34. 34. Example-based MT
      • It is characterized by its use of a bilingual corpus with parallel texts as its main knowledge base.
      • It is essentially translation by analogy and can be viewed as an implementation of the case-based reasoning approach to machine learning.
  36. 36. Example-based MT
  37. 37. Example-based MT
      • Bilingual parallel corpora contain sentence pairs like the example shown in the table.
      • "How much is that X ?" corresponds to "Ano X wa ikura desu ka."
      • "red umbrella" corresponds to "akai kasa"
      • "small camera" corresponds to "chiisai kamera"
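
The correspondences above can be combined mechanically. The sketch below uses only the template and phrase pairs listed on the slide; the `translate` helper and the exact-match strategy are assumptions made for illustration, and a real example-based system would retrieve and adapt examples far more flexibly.

```python
# Pairs taken from the slide: a sentence template plus phrase correspondences.
TEMPLATE_EN = "How much is that X ?"
TEMPLATE_JA = "Ano X wa ikura desu ka."
PHRASES = {"red umbrella": "akai kasa", "small camera": "chiisai kamera"}


def translate(sentence):
    """Translate by analogy: match the template, then swap in a known phrase."""
    prefix, suffix = TEMPLATE_EN.split("X")
    if not (sentence.startswith(prefix) and sentence.endswith(suffix)):
        return None  # no stored example matches this sentence
    phrase = sentence[len(prefix):len(sentence) - len(suffix)]
    target_phrase = PHRASES.get(phrase)
    if target_phrase is None:
        return None  # the variable part is not in the phrase table
    return TEMPLATE_JA.replace("X", target_phrase)


print(translate("How much is that red umbrella ?"))
print(translate("How much is that small camera ?"))
```

Returning `None` for unmatched input makes the coverage limit of the example base explicit.
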
  38. 38. Example-based MT
      • Given known translations of the sentences "President Kennedy was shot dead during the parade." and "The convict escaped on July 15th.", we could translate the sentence "The convict was shot dead during the parade." by substituting the appropriate parts of the sentences.
  39. 39. Statistical MT
  40. 40. Statistical MT
      • The idea behind statistical machine translation comes from information theory.
      • A document is translated according to the probability distribution p(e | f) that a string e in the target language (for example, English) is the translation of a string f in the source language (for example, French).
  41. 41. Statistical MT
      • The problem of modeling the probability distribution p(e | f) has been approached in a number of ways. One intuitive approach is to apply Bayes' theorem:
        p(e | f) ∝ p(f | e) p(e)
  42. 42. where the translation model p(f | e) is the probability that the source string is the translation of the target string, and the language model p(e) is the probability of seeing that target language string.
  43. 43. Statistical MT
      • Finding the best translation ê is done by picking the candidate e that gives the highest probability:
        ê = argmax over e of p(e | f) = argmax over e of p(f | e) p(e)
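
Picking the highest-probability candidate can be sketched as a small search over a fixed candidate list. All probabilities below are invented for illustration, and the `TRANSLATION_MODEL`, `LANGUAGE_MODEL` and `decode` names are hypothetical; real systems estimate these models from large corpora and search a vastly larger space.

```python
# Toy noisy-channel decoder. Source f is French, candidates e are English.
# Every number below is made up for the example.
TRANSLATION_MODEL = {  # p(f | e)
    ("la maison bleue", "the blue house"): 0.6,
    ("la maison bleue", "the house blue"): 0.7,
    ("la maison bleue", "a blue home"): 0.2,
}
LANGUAGE_MODEL = {  # p(e)
    "the blue house": 0.05,
    "the house blue": 0.001,
    "a blue home": 0.01,
}


def score(f, e):
    """Unnormalised posterior: p(f | e) * p(e)."""
    return TRANSLATION_MODEL[(f, e)] * LANGUAGE_MODEL[e]


def decode(f):
    """Return the candidate e that maximises p(f | e) * p(e), or None."""
    candidates = [e for (src, e) in TRANSLATION_MODEL if src == f]
    if not candidates:
        return None
    return max(candidates, key=lambda e: score(f, e))


print(decode("la maison bleue"))
```

In this toy setup the word-for-word candidate "the house blue" has the highest translation-model value, but the language model penalises it, so the product favours the fluent candidate.
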
