Itzulpen automatikoaren aukerak eta dimentsioak [The options and dimensions of machine translation] (2005)


  1. Itzulpen automatikoaren aukerak eta dimentsioak (The options and dimensions of machine translation). Joseba Abaitua and Asier Alcázar, DELi group, www.deli.deusto.es. Udako Ikastaroak (Summer Courses) - Miramar 2005
  2. Outline
     - Background
       - What the gurus have said (Y. Wilks, 1976; M. Kay, 1980; A. Melby, 1995; K. Knight, 2005)
       - Status quaestionis in machine translation: the options
     - When does an MT system achieve its goals?
       - Recent trends in MT evaluation (ISLE, BLEU, NIST)
     - Dimensions
     - Examples (Consumer Corpus)
     - Bibliography
  3. Before we start…
     - How many of you are…?
       - Writers / translators
       - Linguists
       - Computer scientists
       - Engineers
       - Mathematicians
       - (Others: physicists, philosophers, psychologists, theologians…)
  4. Background
     - What the gurus have said:
       - Yorick Wilks, 1976
       - Martin Kay, 1980
       - Alan K. Melby, 1995
       - Kevin Knight, 2005
  5. Background (1/5): Yorick Wilks (1976)
     - Richard Montague (1930-1971) and Ludwig Wittgenstein (1889-1951)
       - represent views which are diametrically opposed on the key issue of formalization: of whether, and in what ways, formalisms to express the content of natural language (NL) should be constructed.
       - "I shall argue, too, that the influence of Wittgenstein has been largely beneficial while that of Montague has been largely malign."
  6. Background (2/5): Yorick Wilks (1976)
     - Is there a hidden structure to NL?
     - Montague: Yes; he clearly believes there is a simple logic as the hidden structure of NL.
     - Wittgenstein: Yes, but not one to be revealed by simple techniques such as logic. (He refers to "deep grammatical structures", but it is not clear what exactly he meant by them.)
  7. Background (3/5): Martin Kay (1980)
     - "...the computer is improperly used [...] when the attempt is made to mechanize the non-mechanical, or something whose mechanistic substructure science has not yet revealed. In other words, it happens when we attempt to use computers to do something we do not really understand. History provides no better example of improper use of computers than machine translation."
  8. Background (3/5): Martin Kay (1980) [the same quote, given in Basque on the original slide]
     - "...the computer is used improperly when it is used to mechanize what is not mechanical, or something whose mechanical substructure is still unknown to science. Put another way, we misuse the computer when we use it to do something that we ourselves do not properly understand. Among improper uses of the computer, history will give us no better example than machine translation."
  9. Background (3/5): Martin Kay (1980) [the same quote, given in Spanish on the original slide]
     - "It is not appropriate to ask the computer to mechanize what is not mechanical, or something whose mechanical substructure has not been revealed to science. In other words, the computer is used improperly when we try to make it do something that we ourselves do not understand. History cannot offer a better example of inappropriate use of the computer than machine translation."
  10. Background (4/5): Alan K. Melby (1995)
     - "[By 1978] I had become convinced that what we had been looking for, the universal, language-independent set of sememes, did not exist."
     - "Words of general vocabulary are more like wild horses. They do not want to remain stable in their meaning."
  11. Background (5/5): Kevin Knight (2005)
     - "Deep secrets of translation lie buried in these large data sets [parallel corpora], waiting to be uncovered by automatic analysis."
  12. Background (5/5): Kevin Knight (2005) [the same quote, given in Basque on the original slide]
     - "The great secrets of translation lie hidden inside these huge data sets [parallel corpora], waiting to be discovered through automatic analysis."
  13. Outline
     - Background
       - What the gurus have said (Y. Wilks, 1976; M. Kay, 1980; A. Melby, 1995; K. Knight, 2005)
       - Status quaestionis in machine translation: the options
     - When does an MT system achieve its goals?
       - Recent trends in MT evaluation (ISLE, BLEU, NIST)
     - Dimensions
     - Examples
     - Bibliography
  14. Status quaestionis: the options
     - Direct Machine Translation
     - Indirect Machine Translation
       - Transfer
       - Interlingua
     - Example-based MT (EBMT)
     - Statistical MT (SMT)
       - Syntax-based statistical MT (SBMT)
     - Rule-based MT (RBMT)
     - Knowledge-based MT (KBMT)
     - Analogy-based MT
  15. Status quaestionis: the options. The Vauquois triangle (figure)
  16. Status quaestionis: the options. KBMT (interlingua), Shun Ha Sylvia Konecna Wong (2001)
  17. Status quaestionis: the options (diagram showing SBMT, SMT, RBMT, KBMT, EBMT)
  18. Status quaestionis: the options (timeline diagram: 1970, 1990, 2000; SBMT, SMT, RBMT, KBMT, EBMT, hybrid systems)
  19. Status quaestionis: the options (timeline diagram: 1970, 1990, 2000; SBMT, SMT, RBMT, KBMT, EBMT, hybrid; Montague, 'the malign one')
  20. Status quaestionis: the options. KBMT, 1970-2000. http://www.isi.edu/natural-language/projects/IL-Annot/
  21. Status quaestionis: the options. KBMT (interlingua). http://www.isi.edu/natural-language/projects/IL-Annot/
     - IL-Annot, Interlingua Annotation: "Designing an Interlingua -- a neutral representation of text meaning that sits between languages and can be used to facilitate machine translation and other multilingual applications -- has been a dream of computational linguists for several decades. Despite progress on several fronts, a number of key phenomena stubbornly resist standardization. They include: case roles (can one list the most basic roles associated with each action, and each object?), aspect (for example, what does the continuous form in English mean?), discourse connectives (what is the general principle underlying "but" and "however", for example?), etc. This project is a collaboration with people from six universities around the country."
  22. Status quaestionis: Kevin Knight (2005). SMT, 1989-2005
     - "Automatic statistical training has recently made a major impact on MT accuracy"
     - "the transition from gibberish to understandable output can be attributed to the increased availability of parallel data and progress in modeling, decoding and automatic evaluation"
     - From gibberish to understandable output
     - More and more parallel text available for use
     - Automatic evaluation
     - Statistical models and new decoders
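For reference, a minimal sketch of the noisy-channel formulation that underlies the statistical "modeling" and "decoding" mentioned above (Brown et al., 1993); the notation here is illustrative and not taken from the slides.

```latex
% Noisy-channel formulation of statistical MT (Brown et al., 1993):
% f = source-language sentence, e = candidate target-language sentence,
% P(f|e) = translation model, P(e) = language model.
% The decoder searches for the most probable target sentence given the source:
\[
  \hat{e} \;=\; \arg\max_{e} \, P(e \mid f)
          \;=\; \arg\max_{e} \, P(f \mid e)\, P(e)
\]
```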
  23. Status quaestionis: Kevin Knight (2005). SMT, 1989-2005
     - Basic elements of (S)MT: training data
       - Millions of words of bilingual data exist for dozens of language pairs
         - Parliament proceedings (Hansard)
         - UN transcripts
         - Multilingual newswires
         - etc.
       - Google caches many
  24. Status quaestionis: Kevin Knight (2005). SMT, 1989-2005
     - Basic elements of (S)MT: evaluation methods until 2002
       - Lack of a gold standard for a test set
       - High degree of variation in human translation (HT)
       - Unclear how to measure the distance between MT and HT (because of large-scale word/phrase movement)
       - No fast develop/test/evaluate cycle
       - Subjective evaluation processes
       - Complex protocols for counting errors (lexical, syntactic, semantic…)
  25. Status quaestionis: Kevin Knight (2005). SMT, 1989-2005
     - Basic elements of (S)MT: evaluation methods after 2002
       - Papineni et al. 2002: BLEU
       - Translation performance (adequacy and fluency) correlates with the number of n-grams that co-occur between translated and reference documents
       - The higher the overlap, the higher the performance
       - 2004: BLEU is the metric of choice in DARPA's evaluation at NIST
  26. Status quaestionis: the options
     - Machine translation?
     - Machine-aided translation?
     - Software localisation?
     - Multilingual content management?
  27. SARE-Bi: functions
     - Retrieving documents
       - Filtering
         - based on metadata
       - Searching
         - free text
         - any language
  28. SARE-Bi: filter results
     - A row for each document
       - visualisation link, modification link
  29. SARE-Bi: visualisation
     - Export tool
       - TEI & TMX
     - Complete document
       - to retrieve full contents
     - Segmented document
       - to see language correspondence
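A minimal sketch of what exporting aligned segments to TMX might look like. The tool name, language codes and example segments are illustrative placeholders, not SARE-Bi's actual export code; only the TMX 1.4 tu/tuv/seg structure is taken from the standard.

```python
# Illustrative sketch (not the actual SARE-Bi exporter): writing aligned
# segment pairs as a minimal TMX 1.4 translation-memory file.
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialises as xml:lang

def export_tmx(pairs, path, srclang="es", tgtlang="eu"):
    """pairs: list of (source_segment, target_segment) strings."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-exporter",   # placeholder tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en-us",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per segment pair
        for lang, text in ((srclang, src), (tgtlang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    ET.ElementTree(tmx).write(path, encoding="utf-8", xml_declaration=True)

export_tmx([("Hola", "Kaixo")], "example.tmx")  # placeholder segments
```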
  30. SARE-Bi: search results
     - Found segments
       - in all document languages
       - equivalent to translation-memory browsing
     - Includes a visualisation link
  31. SARE-Bi: adding a document (first step)
     - User provides:
       - values for metadata
       - languages of the document (may be just one)
  32. SARE-Bi: adding a document (second step)
     - User input, metadata management
     - Segmentation and alignment
       - the user can verify that these tasks are OK
     - The same page is used for document modification
  33. SARE-Bi: metadata (categorisation of documents)
     - Hierarchical taxonomy of several levels
       - 3 functions, 25 genres, and 256 topics (UD)
       - e.g. a certificate of attendance at a short course has:
         - 1 - function: informative
         - 2 - genre: certificate
         - 3 - topic: attendance
     - Sample of the topic taxonomy (Spanish in the original): 30000/inquire; 31100/record form; 31101/acceptance or refusal of a grant; 31102/enrolment form; 31103/travel details; 31104/payment form; 31105/list of departmental coordinators; 31106/planning of lecturers' activity; 31107/internships; 31108/statistical data; 31109/journal subscription form; 31200/printed form; 31201/grant application; 31202/transcript request; 31203/admission application; 31204/accommodation application; 31205/Socrates programme form; 31206/registration form; 31207/invoice; 31208/receipt; 31209/photocopy request
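A minimal sketch of how the three-level function/genre/topic categorisation could be represented in code; the class and field names are illustrative, not SARE-Bi's actual schema.

```python
# Illustrative sketch, not SARE-Bi's data model: the three-level metadata
# categorisation (function / genre / topic) attached to each document.
from dataclasses import dataclass

@dataclass(frozen=True)
class Category:
    function: str   # e.g. "informative"  (3 functions in the UD taxonomy)
    genre: str      # e.g. "certificate"  (25 genres)
    topic: str      # e.g. "attendance"   (256 topics)

@dataclass
class Document:
    doc_id: str
    languages: list[str]   # one or more languages per document
    category: Category

# The example from the slide: a certificate of attendance at a short course.
cert = Document(
    doc_id="example-001",  # placeholder identifier
    languages=["es", "eu"],
    category=Category(function="informative", genre="certificate", topic="attendance"),
)
```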
  34. Outline
     - Background
       - What the gurus have said (Y. Wilks, 1976; M. Kay, 1980; A. Melby, 1995; K. Knight, 2005)
       - Status quaestionis in machine translation: the options
     - When does an MT system achieve its goals?
       - Recent trends in MT evaluation (ISLE, BLEU, NIST)
     - Dimensions
     - Examples
     - Bibliography
  35. The goals of machine translation
     - Evaluation systems:
       - ISLE taxonomy for MT Evaluation: http://www.isi.edu/natural-language/mteval/
       - IBM's BLEU system
       - NIST 2002 Machine Translation Evaluation Plan (MT-02)
       - MTSE2005: Evaluation Measures for MT and/or Summarization, ACL 2005 Workshop (W9): http://www.isi.edu/~cyl/MTSE2005/
  36. Evaluation systems: ISLE taxonomy for MT Evaluation
     - 1 Evaluation requirements
       - 1.1 The purpose of evaluation
       - 1.2 The object of evaluation
       - 1.3 Characteristics of the translation task
         - 1.3.1 Assimilation
         - 1.3.2 Dissemination
         - 1.3.3 Communication
       - 1.4 User characteristics
         - 1.4.1 Machine translation user
         - 1.4.2 Translation consumer
         - 1.4.3 Organisational user
       - 1.5 Input characteristics (author and text)
  37. Evaluation systems: ISLE taxonomy for MT Evaluation
     - 2 System characteristics to be evaluated
       - 2.1 System internal characteristics
         - 2.1.1 MT system-specific characteristics
         - 2.1.2 Translation process models
         - 2.1.3 Linguistic resources and utilities
         - 2.1.4 Characteristics of process flow
       - 2.2 System external characteristics
         - 2.2.1 Functionality
         - 2.2.2 Reliability
         - 2.2.3 Usability
         - 2.2.4 Efficiency
         - 2.2.5 Maintainability
         - 2.2.6 Portability
         - 2.2.7 Cost
  38. Evaluation systems: ISLE taxonomy for MT Evaluation
     - 2.2 System external characteristics
       - 2.2.1 Functionality
         - 2.2.1.1 Suitability
         - 2.2.1.2 Accuracy
           - 2.2.1.2.1 Fidelity
           - 2.2.1.2.2 Consistency
           - 2.2.1.2.3 Terminology
         - 2.2.1.3 Wellformedness
         - 2.2.1.4 Interoperability
         - 2.2.1.5 Compliance
         - 2.2.1.6 Security
  39. ISLE taxonomy for MT Evaluation: Fidelity / correctness / precision
     - Definition
       - Subjective evaluation of the degree to which the information contained in the original text has been reproduced without distortion in the translation (Van Slype).
       - Measurement of the correctness of the information transferred from the source language to the target language (Halliday in Van Slype's Critical Report).
     - Metrics
       - Carroll (in Van Slype's Critical Report): rating of sentences read out of context on a 9-point scale.
       - Crook and Bishop (in Van Slype's Critical Report): rating on a 25-point scale.
       - Halliday (in Van Slype's Critical Report): assessment of the correctness of the information transferred.
       - Leavitt (in Van Slype's Critical Report): rating of text units read on a 9-point scale.
       - Miller and Beebe-Center (in Van Slype's Critical Report): rating of a text on a 100-point scale.
  40. ISLE taxonomy for MT Evaluation: Fidelity / correctness / precision
     - Metrics
       - White and O'Connell (in DARPA 94): rating of 'Adequacy' on a 5-point scale.
       - BLEU evaluation toolkit (in Papineni et al. 2001): automatic n-gram comparison of translated sentences with one or more human reference translations.
       - Rank-order evaluation of MT systems: correlation of automatically computed semantic and syntactic attributes of the MT output with human scores for adequacy and informativeness, and also fluency (Hartley and Rajman 2001 and 2002).
       - Automated word-error-rate evaluation (in Och, Tillmann and Ney, 1999).
       - Automated metric using head transducers (Alshawi et al., 2000).
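A minimal sketch of the word-error-rate idea cited above: word-level edit distance between a hypothesis translation and a single reference, normalised by reference length. This is a generic textbook implementation, not the procedure of Och, Tillmann and Ney (1999).

```python
# Illustrative word error rate (WER): Levenshtein distance over words between
# a hypothesis translation and one reference, divided by the reference length.
def wer(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on mat", "the cat sat on the mat"))  # 1 deletion / 6 words
```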
  41. ISLE taxonomy for MT Evaluation: Fidelity / correctness / precision
     - Notes
       - The fidelity rating has been found to be equal to or lower than the comprehensibility rating, since the unintelligible part of the message is not found in the translation. Any variation between the comprehensibility rating and the fidelity rating is due to additional distortion of the information, which can arise from:
         - loss of information (silence); example: word not translated
         - interference (noise); example: word added by the system
         - distortion from a combination of loss and interference; example: word badly translated
       - Detailed analysis of the fidelity of a translation is very difficult to carry out, since each sentence conveys not a single item of information or a series of elementary items of information, but rather a portion of message or a series of complex messages whose relative importance in the sentence is not easy to appreciate.
       - Some automated metrics assume a fidelity evaluation as a human ground truth, or are relevant to fidelity evaluation.
  42. Evaluation systems: IBM's BLEU
     - Papineni et al. 2002: BLEU
     - Translation performance (adequacy and fluency) correlates with the number of n-grams that co-occur between translated and reference documents
     - The higher the overlap, the higher the performance
     - 2004: BLEU is the metric of choice in DARPA's evaluation at NIST
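A minimal sketch of the n-gram overlap idea behind BLEU, scored at sentence level against a single reference. Real BLEU is computed over a whole corpus and allows multiple references, but uses the same ingredients shown here: clipped n-gram counts, a geometric mean over n = 1..4, and a brevity penalty (Papineni et al., 2002).

```python
# Illustrative sentence-level BLEU sketch (single reference, no smoothing).
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        # clipped counts: an n-gram is credited at most as often as it appears in the reference
        overlap = sum(min(count, ref_ngrams[g]) for g, count in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # unsmoothed sketch: any empty n-gram match zeroes the score
        log_precisions.append(math.log(overlap / total))
    # brevity penalty: 1 if the hypothesis is at least as long as the reference
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return brevity * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the red mat", "the cat sat on the mat"))
```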
  43. Evaluation systems: IBM's BLEU (figure)
  44. Evaluation systems: IBM's BLEU (figure)
  45. Evaluation systems: NIST evaluation since 2003
     - And the winner of the June 2005 competition was:
     - Google's SMT system, developed by Franz Och (formerly at ISI)!
     - (as reported to me by Asier Alcázar)
  46. Outline
     - Background
       - What the gurus have said (Y. Wilks, 1976; M. Kay, 1980; A. Melby, 1995; K. Knight, 2005)
       - Status quaestionis in machine translation: the options
     - When does an MT system achieve its goals?
       - Recent trends in MT evaluation (ISLE, BLEU, NIST)
     - Dimensions
     - Examples
     - Bibliography
  47. The dimensions of MT: quality vs. coverage (diagram plotting coverage against quality, with CAT and MT placed on the trade-off)
  48. The dimensions of MT: adequacy vs. fluency (Robert C. Berwick, 2003)
  49. The dimensions of machine translation
     - Text type / genre (ISLE)
     - Lexical density (ISLE)
     - Linguistic and cultural distance
     - Development and resources (ISLE)
     - Media (ISLE)
     - Purposes (ISLE)
  50. The dimensions of machine translation
     - Linguistic and cultural distance
     - Development and resources (ISLE)
     - Text type / genre (ISLE)
     - Lexical density (ISLE)
     - Media (ISLE)
     - Purposes (ISLE)
  51. The dimensions of MT: linguistic and cultural distance
     - How many language families are involved?
       - es - pt / ca
       - es - en / de
       - es - eu
       - es - ja
  52. Status quaestionis: the options
     - Direct Machine Translation
     - Indirect Machine Translation
       - Transfer
       - Interlingua
     - Example-based MT (EBMT)
     - Statistical MT (SMT)
       - Syntax-based statistical MT (SBMT)
     - Rule-based MT (RBMT)
     - Knowledge-based MT (KBMT)
     - Analogy-based MT
  53. Status quaestionis: Kevin Knight (2005). SMT, 1989-2005
     - "Automatic statistical training has recently made a major impact on MT accuracy"
     - "the transition from gibberish to understandable output can be attributed to the increased availability of parallel data and progress in modeling, decoding and automatic evaluation"
  54. Outline
     - Background
       - What the gurus have said (Y. Wilks, 1976; M. Kay, 1980; A. Melby, 1995; K. Knight, 2005)
       - Status quaestionis in machine translation: the options
     - When does an MT system achieve its goals?
       - Recent trends in MT evaluation (ISLE, BLEU, NIST)
     - Dimensions
     - Examples
     - Bibliography
  55. Examples
     - Untranslatability (a deed, a Renaissance text, poem 1, poem 2, poem 3, linguistic jokes)
     - Literary translation (Harry Potter, Linguae Vasconum Primitiae, Othello)
     - Translation of news (an event, an opinion article)
     - Self-translation (Bernardo Atxaga, Javi Cillero, Unai Etxebarria)
     - Machine translation (conference abstracts, report summary, sport)
     - Comparable news (economy, sport)
     - Consumer corpus (Asier Alcázar)
  56. Outline
     - Background
       - What the gurus have said (Y. Wilks, 1976; M. Kay, 1980; A. Melby, 1995; K. Knight, 2005)
       - Status quaestionis in machine translation: the options
     - When does an MT system achieve its goals?
       - Recent trends in MT evaluation (ISLE, BLEU, NIST)
     - Dimensions
     - Examples
     - Bibliography
  57. Bibliography (1/2)
     - E. Charniak, K. Knight, K. Yamada. 2003. Syntax-based language models for machine translation. Proceedings of MT Summit IX. http://www.isi.edu/natural-language/projects/rewrite/mtsummit03.pdf
     - J. Díaz-Labrador, J. Abaitua, G. Araolaza, I. Jacob, F. Quintana. 2003. Metadata for multilingual content management: a practical experience with the SARE-Bi system. ASLIB Translating & the Computer 25, pp. 151-170. http://www.deli.deusto.es/AboutUs/Publications/TC25_deli_final.pdf
     - M. Galley, M. Hopkins, K. Knight, D. Marcu. 2004. What's in a translation rule? Proceedings of HLT-NAACL 2004. http://www.isi.edu/natural-language/projects/rewrite/whatsin.pdf
     - A. Hartley, M. Rajman. 2001. Automatically Predicting MT Systems Rankings Compatible with Fluency, Adequacy or Informativeness Scores. Proceedings of the Workshop on MT Evaluation "Who did what to whom?" at MT Summit VIII, pp. 29-34. http://www.issco.unige.ch/projects/isle/MT-Summit-wsp.html
     - K. Knight, D. Marcu. 2005. Machine Translation in the Year 2004. Proceedings of ICASSP. http://www.isi.edu/natural-language/mt/icassp05.pdf
     - K. Papineni, S. Roukos, T. Ward, W.J. Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311-318. http://acl.ldc.upenn.edu/P/P02/P02-1040.pdf
     - J. White. 2003. How to evaluate machine translation. In H. Somers (ed.), Computers and Translation: a translator's guide. J. Benjamins, pp. 211-244.
  58. Bibliography (2/2)
     - Yorick Wilks. 1976. Philosophy of Language. In Eugene Charniak and Yorick Wilks (eds), Computational Semantics: An Introduction to Artificial Intelligence and Natural Language Comprehension, pp. 205-233. North-Holland.
     - Martin Kay. 1997. The Proper Place of Men and Machines in Language Translation. Machine Translation 12: 3-23.
     - Alan K. Melby. 1995. The Possibility of Language: A discussion of the nature of language with implications for human and machine translation. John Benjamins.
     - K. Knight, D. Marcu. 2005. Machine Translation in the Year 2004. Proceedings of ICASSP.