4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation
Upcoming SlideShare
Loading in...5

4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation






Total Views
Views on SlideShare
Embed Views



3 Embeds 123

http://expert-itn.eu 121
http://www.expert-itn.eu 1
http://www.linkedin.com 1



Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation 4. Josef Van Genabith (DCU) & Khalil Sima'an (UVA) Example Based Machine Translation Presentation Transcript

  • Example-Based Machine Translation Josef van Genabith, CNGL, Dublin City University Khalil Sima’an, University of Amsterdam
  • Acknowledgements Notes are based on Carl, M. and Way, A., editors (2003). Recent Advances in Example-Based Machine Translation. Kluwer Academic Publishers, Dordrecht, The Netherlands Sandipan Dandapat “Mitigating  the Problems of SMT  using  EBMT”  PhD  Thesis,  DCU,  2012 Dandapat, S., Morrissey, S., Way, A., and van Genabith, J. (2012). Combining EBMT, SMT, TM and IR Technologies for Quality and Scale. In Proceedings of the EACL 2012 Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to MachineTranslation (HyTra), pages 48--58, Avignon, France. Gough, N. and Way, A. (2004). Robust Large-Scale EBMT with Marker-Based Segmentation. In Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation, (TMI 2004), page 95– 104,Baltimore, MD. Green, T. (1979). The Necessity of Syntax Markers: Two Experiments with Artificial Languages. Journal of Verbal Learning and Behavior, 18:481–496. Groves, D. and Way, A. (2006). Hybridity in MT: Experiments on the Europarl Corpus. In Proceedings of the 11th Annual Conference of the European Association for Machine Translation, (EAMT 2006), page 115–124, Oslo, Norway. Hutchins, J. (2005). Example-Based Machine Translation: a Review and Commentary. Machine Translation, 19(3–4):197–211. Lepage, Y. and Denoual,  E.  (2005c).  The  ‘purest’  EBMT  System  Ever  Built:  No Variables, No Templates, No Training, Examples, Just examples, Only Examples. In Proceedings of the 2nd Workshop on Example-based Machine Translation, a Workshop at the MT Summit X, page 81–90, Phuket, Thailand. Nagao, M. (1984). A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. In Elithorn, A. and Banerji, R., editors, Artificial and Human Intelligence, page 173–180. North-Holland, Amsterdam. Somers, H., Dandapat, S., and Naskar, S. K. (2009). A review of EBMT using proportional analogy. In Proceedings of the 3rd Workshop on Example-Based Machine Translation (EBMT 2009), pages 53--60, Dublin, Ireland. Dekai Wu (2006) MT Model Space: Statistical versus Compositional versus Example-Based Machine Translation, Machine Translation (2005) 19:213-227
  • 3/23 Example (Sato & Nagao 1990) Input He buys a book on international politics Matches + Alignment From: Sandipan Dandapt, PhD, 2012 He buys a notebook. Kare wa nōto o kau. I read a book on international politics. Watashi wa kokusai seiji nitsuite kakareta hon o yomu. Recombination Result Kare wa kokusai seiji nitsuite kakareta hon o kau.
  • EBMT “.  .  .  translation  is  a  fine  and  exciting  art,  but  there  is  much  about  it   that is mechanical and routine.” Martin Kay (1997) SMT and EBMT systems are corpus-based approaches to MT SMT: phrase translation probabilities, word reordering probabilities, lexical weighting EBMT: usually lacks well defined probability model
  • EBMT and TM EBMT generally uses a sentence-aligned parallel text (TM) as the primary source of data. EBMT systems search the source side of the example-base for close matches to the input sentences and obtain corresponding target segments at runtime target segments are reused during recombination EBMT  is  often  linked  with  the  related  concept  of  “Translation   Memory”  (TM). TM is an interactive tool for human translators EBMT is a fully automatic translation
  • EBMT EBMT: supposed to be good on limited amounts of data and homogeneous data (lots of repetition) EBMT systems produce a good translation while SMT systems fail and vice versa
  • EBMT and SMT phrase-based SMT approach has proven to be the most successful MT approach in MT competitions e.g. NIST, WMT, IWSLT etc. SMT systems discard the actual training data once the translation model and language model have been estimated => cannot always guarantee good quality translations for sentences which closely match those in the training corpora
  • EBMT Two main approaches to EBMT: Runtime using proportional analogy Compile time using generalized translation template-based EBMT model
  • Background rule-based or data-driven MT Data driven MT: EBMT and SMT Corpus-based data driven approaches derive knowledge from parallel corpora to translate new input Mostly SMT today A few EBMT (hybrid) systems include CMU-EBMT (Brown, 2011) and Cunei (Phillips, 2011) No commercial EBMT?
  • Where did it start? Nagao (1984) “MT  by  analogy principle” “Man  does  not  translate  a  simple  sentence  by  doing  deep  linguistic   analysis, rather, man does translation, first, by properly decomposing an input sentence into certain fragmental phrases, ... then by translating these phrases into other language phrases, and finally by properly composing these fragmental translations into one long sentence. The translation of each fragmental phrase will be done by the analogy translation principle with proper examples as its reference.”  (Nagao, 1984, p.178)
  • EBMT core steps Translation in three steps: matching, alignment and recombination (Somers, 2003): Matching: finds the example or set of examples from the bitext which most closely match the source-language string to be translated. Alignment: extracts the source–target translation equivalents from the retrieved examples of the matching step. Recombination: produces the final translation by combining the target translations of the relevant subsentential fragments.
  • EBMT core steps Sandipan Dandapt, PhD, 2012
  • Informal Example Where can I find tourist information Where can I find ladies dresses⇔ payan kıyafetlerini nereden bulabilirim just in front of the tourist information⇔ turist bilgilerini hemen önünde Where can I find ⇔ nereden bulabilirim tourist information ⇔ turist bilgilerini Where can I find tourist information ⇔ turist bilgilerini nereden bulabilirim Update  example  base  …  
  • Varieties of EBMT EBMT systems differ widely in their matching stages involve a distance or similarity measure of some kind (e.g. edit distance) Character-Based Matching dynamic programming technique, e.g. Levenshtein distance a. The President agrees with the decision. b. The President disagrees with the decision. c. The President concurs with the decision. Problem: system will chose (b) given (a)
  • Varieties of EBMT Word-Based Matching: Nagao (1984) uses dictionaries and thesauri to determine the relative word distance in terms of meaning (Sumita et al., 1990) a. The President agrees with the decision. b. The President disagrees with the decision. c. The President concurs with the decision. System will chose (c) given (a). Linguistics/knowledge heavy: WordNet,  thesaurus,  ontology,  …  
  • Varieties of EBMT Pattern-Based Matching similar examples can be used to abstract  “generalised”  translation   templates Brown (1999): NE equivalence classes, such as person, date and city some linguistic information, such as gender and number a. John Miller flew to Frankfurt on December 3rd. b. ⟨FIRSTNAME-M⟩ ⟨LASTNAME⟩ flew to ⟨CITY⟩ on ⟨MONTH⟩ ⟨ORDINAL⟩. c. ⟨PERSON-M⟩ flew to ⟨CITY⟩ on ⟨DATE⟩.
  • Varieties of EBMT Syntax-Based Matching Kaji et al. (1992): Source and target side parsers Alignment using bilingual dictionaries Generate translation templates
  • Varieties of EBMT X1[NP] no nagasa wa saidai 512 baito de aru ⇔ The maximum length of X1[NP] is 512 bytes X1[NP] no nagasa wa saidai X2[N] baito de aru ⇔ The maximum length of X1[NP] is X2[N] bytes Looks almost like HPB-SMT ….
  • Varieties of EBMT Marker-Based Matching Green (1979): Closed call marker words/morphs Use to chunk Veale and Way (1997), Gough and Way (2004)
  • Varieties of EBMT that is almost a personal record for me this autumn ⇔ c’ est pratiquement un record personnel pour moi cet automne that is almost a personal record for me this autumn ⇔ c’  est pratiquement un record personnel pour moi cet automne [<DET>that is almost] [<DET>a personal record] [<PREP>for <PRON> me <DET> this autumn] ⇔ [<DET>c’  est pratiquement] [<DET>un record personnel] [<PREP>pour <PRON> moi <DET> cet automne]
  • Varieties of EBMT [<DET>that is almost] [<DET>a personal record] [<PREP>for <PRON> me <DET> this autumn] ⇔ [<DET>c’  est pratiquement] [<DET>un record personnel] [<PREP>pour <PRON> moi <DET> cet automne] a. <DET>that is almost ⇔ <DET>c’  est pratiquement b. <DET>a personal record ⇔ <DET>un record personnel c. <PREP>for me this autumn ⇔ <PREP>pour moi cet automne a. <DET> is almost ⇔ <DET> est pratiquement b. <DET> personal record ⇔ <DET> record personnel c. <PREP> me this autumn ⇔ <PREP> moi cet automne
  • Varieties of EBMT a. <PREP> for ⇔ <PREP> pour b. autumn ⇔ automne
  • 23/23 Varieties of EBMT The monkey ate a peach. The man ate a peach. saru wa momo o tabeta. hito wa momo o tabeta monkey man The  …  ate  a  peach.   saru hito …  wa momo o tabeta The dog ate a rabbit. inu wa usagi o tabeta dog rabbit The  …  ate  a  …  .   inu usagi …  wa … o tabeta
  • Approaches to EBMT first introduced as an analogy-based approach to MT “case-based”,  “memory-based”  and  “experience-guided”  MT Many,  many  varieties  ….. two main approaches With or without preprocessing/training stage Pure/runtime EBMT vs. compiled EBMT
  • Approaches to EBMT Pure/runtime EBMT: (e.g. Lepage and Denoual, 2005b) No time consumed for training/preprocessing But:  runtime/translation  complexity  very  considerable  … Compiled approaches: (e.g. Al-Adhaileh and Tang,1999; Cicekli and G¨uvenir, 2001) Pre-compute units below sentence level before prediction/translation time
  • Pure/runtime EBMT Everything happens at the translation stage Lepage and Denoual (2005c) Based on proportional analogy (PA) type of analogical learning A : B :: C : D “A is to B as C is to D”
  • Analogical Reasoning A : B :: C : D “A is to B as C is to D” A global relationship between 4 objects “::”      ~      “=“ “Analogical  equation” A : B :: C : D? Can have one or more solutions Plato,  Aristotle,  … Artificial Intelligence
  • Analogies a. lungs are to humans as gills are to fish b. cat : kitten :: dog : puppy c. speak : spoken :: break : broken
  • Analogies a. lungs are to humans as gills are to X? X = fish b. cat : kitten :: dog : X? X = puppy c. speak : spoken :: break : X? X = broken
  • Analogies a. lungs are to humans as gills are to fish b. cat : kitten :: dog : puppy c. speak : spoken :: break : broken Note: only (c) is a formal analogy! Computable using string operations …  That’s  the  guys  we’ll  be  concerned  with  most  (but  not  exclusively  ….)  
  • Solving Analogical Equations Lepage (1998) algorithm that solves analogical equations over strings or characters based on longest common subsequences, and edit distance can handle insertion/deletion of prefixes and suffixes (22a), exchange of prefixes/suffixes (22b), infixing (22c) parallel infixing (22d).
  • Solving Analogical Equations a. (French) répression : répressionnaire :: réaction : x ⇒ x=réactionnaire b. wolf : wolves :: leaf : x ⇒ x=leaves c. (German) fliehen : floh :: schließen : x ⇒ x=schloß d. (Proto-Semitic) yasriqu : sariq :: yanqimu : x ⇒ x=naqim
  • Solving Analogical Equations
  • Pure EBMT Analogy-Based EBMT “The  ‘purest’  EBMT  system  ever  built:  no  variables,  no  templates,  no   training, examples, just examples, only examples”   (Lepage and Denoual 2005c)
  • Pure EBMT Nadaron en el mar. Atravesaron el río nadando. Flotó en el mar. ???? Atravesó el río flotando
  • Pure EBMT 1. Find a pair ⟨A,B⟩ of sentences in the example set that satisfies the PA in Equation: A : B :: C(?) : It floated across the river Solving this results in C = It floated in the sea. 2. Take the translations corresponding to A, B and C : A′,B′  and C′. 3. Solve Equation: A′  : B′  :: C′  : D′  (3.3) D′  is the desired translation.
  • Pure EBMT This is complex O(n²) possibilities for ⟨A,B⟩ Quadratic time for longest common subsequences and edit distances Time bounded solutions Heuristics  …….  (!)
  • Pure EBMT
  • Pure EBMT analogical equation Given the three entities (A, B, and C) of a PA Lepage (1998): algorithm to solve an analogical equation to construct the fourth entity (D). Simple and some good results ALEPH EBMT system did very well on data from the IWSLT 2004 competition, coming a close second to the competition winner on all measures(Lepage and Denoual, 2005b, p.273). GREYC system
  • Pure EBMT: Challenges But: Low recall Processing time when example base gets large .. Yea : Yep :: At five a.m. : At five p.m. Yea : Yep :: At five a.m. : At five p.m.
  • Pure EBMT: Challenges
  • Pure EBMT: Challenges
  • Compiled/off-line EBMT learns translation templates from parallel sentences (Cicekli and G¨uvenir, 2001)
  • Compiled/off-line EBMT a. I will drink orange juice : portakal suyu içeceğim b. I will drink coffee : kahve içeceğim => a. I will drink : içeceğim b. coffee : kahve c. orange juice : portakal suyu => a. I will drink XS : XT içeceğim b. XS coffee : kahve XT c. XS orange juice : portakal suyu XT
  • Compiled/off-line EBMT Translation templates essentially reduce the data-sparsity problem by generalizing some of the word sequences Another example: the work on the marker based approach we saw earlier on
  • Compiled/off-line EBMT Boundary friction Input: The handsome boy entered the room Matches: The handsome boy ate his breakfast. Der schöne Junge aß sein Frühstück I saw the handsome boy. Ich sah den schönen Jungen. A woman entered the room. Eine Frau betrat den Raum. Output: den schönen Jungen betrat den Raum Solutions? Labelled fragments (remember where you got the fragment from – use its context) Target-language grammar Target language model (as in SMT)
  • EBMT, SMT, RBMT Dekai Wu MT model space: statistical versus compositional versus examplebased machine translation a perspective on EBMT from a statistical MT standpoint What is the definition of EBMT? Do we even know what EBMT is? Is there a strict definition of EBMT, or are there simply a large number of different models all using corpora, rules, and statistics to varying degrees? Is X a kind of EBMT model? Does X’s  model  qualify  as   EBMT but not SMT? Are all SMT models (perhaps excluding the IBM models) actually EBMT models as well? Are EBMT models actually SMT models? Can a rule-based model be example-based? Statistical? a three-dimensional  “MT  model  space
  • EBMT, SMT, RBMT Dekai Wu
  • EBMT, SMT, RBMT Nagao (1984)  first  proposed  “translation  by  analogy”  (cf.  Lepage and Denoual 2005) analogical models arising in the mid-1980s under various similar names  including  “case-based  reasoning”  (CBR)  as  in Kolodner (1983a,b),  “exemplar-based  reasoning”  as  in  Porter  and  Bareiss (1986) or Kibler and  Aha  (1997),  “instance-based  reasoning”  as  in   Aha  et  al.  (1991),  “memory based reasoning”  as  in  Stanfill and Waltz (1988),  or  “analogy-based  reasoning”  as  in Hall (1989) or Veloso and Carbonell (1993). Collins and Somers (2003) CBR: specific implication about how and when learning and adaptation take place
  • EBMT, SMT, RBMT nontrivial use of a large library of examples/cases/exemplars/instances at runtime that is, during the task performance/testing phase rather than the learning/training phase New problems are solved at runtime via analogy to similar examples retrieved from the library, which are broken down, adapted, and recombined as needed to form a solution This stands in contrast to most other machine learning approaches which focus on heavy offline learning/training phases, so as to compile or generalize large example sets into abstracted performance models consisting of various forms of abstracted schemata (which are normally much smaller than the entire set of training examples).
  • EBMT, SMT, RBMT Leaning toward memorization rather than abstraction of the training set makes some significant tradeoffs. On one hand, given sufficiently large example libraries, memorization avoids loss of coverage often caused by incorrect generalization or overgeneralization. In the extreme case, memorization approaches are guaranteed to reproduce exactly all unique sentence translations from the training corpus, something abstracted schematic approaches may not necessarily do. On the other hand, memorization approaches tend to undergeneralize, and runtime space and time complexity are vastly increased. EBMT: SMT with fairly ad hoc  numerical  measures  …  
  • EBMT, SMT, RBMT EBMT - SMTS Modern EBMT systems incorporate both; for example, Aramaki et al. (2005), Langlais and Gotti (2006), Liu et al. (2006), and Quirk and Menezes (2006) aim for probabilistic formulations of EBMT in terms of statistical inference Lots of references in Dekai’s paper
  • EBMT, SMT, RBMT Dekai Wu
  • EBMT, SMT, RBMT Dekai Wu
  • A few other approaches Using EBMT chunks (marker hypothesis) in SMT Using SMT chunks in EBMT Making EBMT more efficient: using IR technology for retrieval Combining EBMT with TM
  • Conclusion A taste of EBMT Hard to define what exactly EBMT is and how it relates to SMT, RBMT and TM Some core EBMT approaches Runtime, pure Compile time/preprocessing Many  EBMT  approaches  seem  to  be  essentially  hybrid  “in  nature/in   practice” EBMT: quo vadis – where are you going?
  • Compiled/off-line EBMT
  • Compiled/off-line EBMT
  • Compiled/off-line EBMT
  • Compiled/off-line EBMT