New Tools and Resources to Support Machine Translation
  • Good afternoon! My name is AB and I am a PhD student working on MT. I am affiliated with Universidade do Porto-Linguateca and New York University. My interests have centered on MT after working on a commercial MT system for over 7 years. In this presentation, I will introduce ParaMT, a paraphraser applied to machine translation, which was developed during my research work.
  • Outline: first, an introduction distinguishing human translation (HT) from machine translation (MT); then the resources and tools developed within the scope of my PhD research.
  • Human translation cannot be replaced by machine translation, at least not until there are breakthroughs in artificial intelligence and in overcoming machine translation's restriction to sentence-level translation.
  • Some facts about machine translation: for most of human history, translation was an exclusively human activity. Machine translation has become available to the general public only in the last few years; before that, it was accessible only to a very restricted niche of the market, and computer-aided translation was used only by professional translators.
  • Despite the availability of funding and many talented researchers worldwide, most efforts since the first attempts in the 1950s to build cost-effective, industrial-strength, high-quality machine translation have fallen short of their goals. Successful machine translation has been difficult to achieve because of two major hurdles: the complexity and the ambiguity of language.
  • Human translators use ingenuity and skill to artfully reproduce feeling and sound. Human beings can easily and intuitively retrieve information from even distant parts of the text, or from extra-textual knowledge. This is the big advantage human translators have over machines and one of the reasons why human and machine translation do not compete. Human translation can easily solve most ambiguity problems as well as problems of culture and context that machine translation cannot. Human use of language assumes knowledge of the world or of user-centric particular worlds, something that is difficult or impossible to program into a machine. Machine translation functions best in a situation where the writer cannot assume shared knowledge of the world. Domain-specific contexts and controlled language make machine translation reliable, proving the advantages of a scientific approach to language.
  • More challenging grey areas in machine translation are anaphora resolution, common-noun nuance resolution, and the handling of ellipsis, which constitute a class of more advanced ambiguity problems. To solve them, analysis must go beyond the sentence level to the extra-sentential (discourse) level. They relate to referential associations of utterances in neighboring clauses or elsewhere in the text. These are typical problems in machine translation, and they often produce errors.
  • "bom partido" can also be considered a compound, and "tirar partido de" a fixed or semi-fixed expression.
  • A support verb construction is a predicate noun construction whose main verb has a weak semantic value. Support verb constructions are an area where statistics tend to “trap” systems: if a statistical system is not sensitive to these constructions, the result may be misleading translations. Linguistic knowledge about support verb constructions gives a statistical system special training data that could correct this problem.
  • So, driven by this desire to see better results, my main objectives were the three listed on the slide.
  • The outcomes of this work were new resources (the Port4NooJ system) and two new automated tools that recognize multiword expressions and generate paraphrases of them: ReWriter and ParaMT. Both paraphrasers are based on Port4NooJ resources.
  • In any language processing application, the linguistic resources are the foundation. In machine translation especially, the linguistic resources are the driving force that boosts the translation process. Port4NooJ is built on two original sources: the NooJ linguistic environment and the OpenLogos lexical resources. Linguateca’s resources were also used.
  • The system includes several dictionaries. The structure of the dictionary is XXX
  • I will skip this slide on the inflectional and derivational descriptions.
  • This slide presents a local grammar for the analysis and recognition of elementary support verb constructions (SVCs), together with the monolingual paraphrasing that can be seen in the concordance. Side by side, the SVC appears on the left and an equivalent lexical verb on the right.
  • This slide shows another concordance, this time for the recognition and paraphrasing of elementary support verb constructions that co-occur with predicate nouns from the biomedical field. The SVC is on the left and its paraphrases are on the right: an equivalent lexical verb or a stylistic variant of the construction, which can be built from a non-elementary support verb such as efectuar or realizar, or from a construction of the type “sujeitar-se a” or “submeter-se a” when the subject of the SVC is necessarily a patient.
  • One of the possible applications of ReWriter is its interactive use in word processing. Paraphrasing can work in a similar way to synonym suggestions in text editing.
  • The concordance shown on this slide illustrates the bilingual PT-EN recognition and paraphrasing of SVCs. The SVC in Portuguese is on the left and an equivalent lexical verb in English is on the right.
  • Two main conclusions derived from this work are:

New Tools and Resources to Support Machine Translation: Presentation Transcript

  • New Tools and Resources to Support Machine Translation. Anabela Barreiro (barreiro_anabela@hotmail.com), FLUP & CLUP-Linguateca, New York University. Mestrado em Tradução Jurídica e Empresarial, Lisboa, 8 January 2008
  • Outline
  • Human Translation vs Machine Translation
    A distinction of objective and purpose must be established between human translation and machine translation!
    • They use different methods
    • They apply to different types of texts
    • They serve different purposes
    • They face different barriers
    • They are NOT in competition!
  • Human Translation
    Professional translation requires:
    • a profound knowledge of the source language and native proficiency in the target language
    • above-average writing skills
    • an insightful knowledge of the social-cultural aspects of the source and target languages
    • knowledge of the grammar of the two languages, their writing conventions, and the situational and cultural context
    • in the case of scientific and technical translation, subject matter knowledge, including the terminologies of the field or knowledge domain
  • Human Translation
    Theory of translation has been dealing with controversial issues:
    • problems related to privileging meaning over form
    • visibility or invisibility of the translator
    • being faithful to the author or trying to make the text accessible to the reader (and which kind of reader)
    • giving value to the source language culture (foreignise) or making the text suitable for the target language culture (domesticate)
    • allowing languages/cultures with more impact to predominate over languages/cultures with less impact, or being creative, etc.
  • Human Translation
    The most relevant aspect in translation is to define the purpose of each translation, which is related to the characteristics of each text.
    … And to define paraphrasing capabilities.
  • Human Translation: Types of Texts
    A certain subjectivity and distance from the source language text is allowed in the translation of literary text for the sake of maintaining the artistic and aesthetic aspects of the target language text [Hermans, 1985] [Landers, 2001].
    Literary translation may be considered an ART [Leighton, 1990] [Weaver, 2002], where the translator has more freedom of expression.
  • Human Translation: Types of Texts
    Technical, commercial, and legal translators, like the authors of the original texts, are more restrained in their use of language, and they need to be precise and convey the exact meaning of the original text.
    Technical texts are not meant to be beautiful but rather to be informative, instructive and explanatory. Their main function is to be clear, so the easier they are to read, the better they are understood.
    Technical translation may be regarded as a CRAFT [Newmark, 1988] [Biguenet & Schulte, 1989] for which both technical and linguistic competence are essential, but creativity and vagueness prohibited.
  • Machine Translation
    With more translation being performed by machines, new challenges are imposed on the field, theoretical traditions are shaken, and the need to rethink the status of translation becomes more evident. Of all automated applications, machine translation compels us to reconsider the nature of translation.
    ART and CRAFT are NOT appropriate concepts for machine translation, because it necessarily has to rely on linguistics and computer science.
  • Machine Translation
    1. Automated translation of text or speech from one natural language into another
    2. An important tool that assists human translators
    3. It has become available to the general public in the last few years due to:
       • sophisticated computers
       • continuous development of computer software capabilities
       • internet boom
  • Machine Translation (cont.)
  • Machine Translation Bottlenecks
    1. Complexity of language
    2. Ambiguity of language
    3. Wordiness (related to text quality)
  • Machine Translation: Limitations
    • The task of delivering high-quality machine translation of certain types of texts and complex linguistic phenomena is difficult
    • It is difficult to grasp humour, sarcasm, and other human feelings expressed in or by means of sophisticated linguistic expression
    • Difficulties in handling extra-sentential, extra-textual and extra-linguistic information (problems of culture or context), because knowledge of the world cannot be assumed
    • Difficult to deal with anaphora resolution
  • Machine Translation Linguistic Challenges
    1. Homography
    2. Cross-language phenomena (lexical divergences and idioms, and cross-language syntactic transformations, such as passives)
    3. Identification of named entities
    4. Capacity to deal with long sentences and wordiness
    5. Unusual alterations to the order of words in the target language
    6. Enhanced dictionaries and grammars to recognize and translate multiword expressions
  • Machine Translation Linguistic Challenges: Examples
    • Handling of ellipsis: advanced ambiguity problems, related to anaphora
      O João visitou muitos países do mundo. A Maria não visitou nenhum.
      => João has visited many countries in the world. Maria hasn’t visited any.
  • Machine Translation Linguistic Challenges: Examples
    • Common-noun nuance resolution / homography
      (1) ele não quis tomar partido de ninguém
      (2) ele é um bom partido
      (3) ele tirou partido da situação
      (4) ele pertence a esse partido (político)
      (5) o copo está partido
      (6) já esteve em melhor partido
  • Machine Translation Linguistic Challenges: Examples
    Source [CdP]: Francisco Vieira adianta ainda que está a fazer um esforço no sentido de tomar uma decisão ainda esta semana, definindo se avança ou não para uma candidatura à RTLRS.
    Translation Engine | Translation Results
    FreeTranslation | Francisco Scallop advances even if is it do an effort in the sense of take a decision still this week, defined advances or not for a candidacy to the RTLRS.
    WorldLingo | advances despite he is to make an effort in the direction to still take a decision this week, defining if he advances or he does not stop a candidacy to the RTLRS.
    Source [Compara]: I cant make a decision about anything these days.
    Translation Engine | Translation Results
    Google | Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.
    Amikai | que eu não posso fazer para uma decisão sobre qualquer coisa estes dias.
    FreeTranslation | Eu não posso tomar uma decisão sobre algo estes dias.
    Babelfish | Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.
    WorldLingo | Eu no posso fazer a uma deciso sobre qualquer coisa estes dias.
    E-Translation Server | Não posso tomar uma decisão sobre qualquer coisa estes dias.
  • Multiword Expressions: Support Verb Constructions
    A support verb construction (= predicate noun construction) is a multiword expression containing a verb with weak semantic value and a noun which is the predicate of the sentence.
    Predicate nouns can be:
    • morphologically related to a verb
      fazer uma apresentação de = apresentar
      pay a visit to = to visit
    • autonomous
      fazer um mestrado - *mestrar
      have fun - *to fun
  • Main Objectives
    1. Build a body of lexical, syntactic and semantic knowledge around support verb constructions
    2. Apply this linguistic knowledge to paraphrasing
    3. Improve machine translation
  • Outcome: ResourcesPort4NooJ•an open source, ontology driven Portuguese linguisticsystem, which integrates a bilingual extension forPortuguese-English machine translationDicTUM•Dicionário de Termos e Unidades Multipalavra•a Dictionary of Multiword ExpressionsMestrado em Tradução Jurídica e EmpresarialAnabela Barreiro Lisboa, 8 January 2008
  • Outcome: ToolsReWriter•a monolingual paraphraser to pre-edit texts, usingparaphrasing capabilities•Portuguese version ReEscreveParaMT•a bilingual/multilingual paraphraser to be integrated inmachine translation systemsMestrado em Tradução Jurídica e EmpresarialAnabela Barreiro Lisboa, 8 January 2008
  • ResourcesPort4NooJ - Publicly available at:http://www.nooj4nlp.nethttp://www.linguateca.pt/Repositorio/Port4Nooj/Based on:•NooJ linguistic environment (http://www.nooj4nlp.net/)•OpenLogos English-Portuguese dictionary (http://logos-os.dfki.de/)OpenLogos is an open-source derivative of the Logos Machine Translation SystemData Used•COMPARA (http://www.linguateca.pt/COMPARA)•METRA (http://www.linguateca.pt/metra)•Other corporaMestrado em Tradução Jurídica e EmpresarialAnabela Barreiro Lisboa, 8 January 2008
  • Port4NooJ Dictionaries
    Entry structure: Lemma, Part of Speech, Inflectional Paradigm, Syntactic-Semantic Attributes, English Transfer
    General dictionary sample representing all PoS, variable and invariable forms:
      mesa,N+FLX=CASA+CO+surf+EN=table
      cair,V+FLX=ATRAIR+INMO+IntoType+EN=fall
      holandês,A+FLX=INGLÊS+AN+lang+EN=Dutch
      actualmente,ADV+FLX=FACILMENTE+TEMP+punc+pres+EN=nowadays
      alguém,PRO+IMPERS+INDEF+EN=somebody
      porque,RELINT+why+EN=why
      e,CONJ+JOIN+EN=and
      durante,PREP+TEMP+EN=during
      cada,DET+IMPERS+INDEF+SG+EN=each
      terceiro+NUM+ord+EN=one third
    Sample of invariable compounds in the general dictionary:
      a curto prazo,ADV+TEMP+EN=in the short run
      a favor de,PREP+CAUS+EN=in favor of
      cada um,PRO+INDEF+SG+EN=each one
      de quem,INT+ThatType+EN=whose
      quem quer que seja,REL+WhateverType+EN=whoever
      além disso,CONJ+COOR+EN=besides
      um quarto,NUM+frac+EN=one fourth
    Sample of the dictionary of Terms and Multiword Expressions (DicTUM):
      adro da igreja,N+FLX=MENINO+PL+encl+EN=churchyard
      cabo de vassoura,N+FLX=MENINO+COtool+EN=broomstick
      bebida alcoólica,N+FLX=CASA+MA+liqu+EN=alcoholic drink+UNAMB
      bebida alcoólica,N+FLX=CASA+MA+liqu+EN=booze+slang
      cor de laranja,A+NAV+Apred+EN=orange
      sul-americano,A+FLX=ALTO+AN+des+EN=South American
      a curto prazo,ADV+LocTime+TEMP+EN=in the short run
      fora de serviço,ADV+STAT+phr+EN=out of order
      há muito tempo,ADV+LocTime+TEMP+puncpast+EN=a long time ago
      isto é,CONJ+COOR+EN=i.e.
      já não,CONJ+COOR+EN=no longer
      mesmo assim,CONJ+SUB+EN=even so
      juntamente com,PREP+ASSOC+EN=along with
      à direita de,PREP+Loc+AT+EN=at the right of
      em conformidade com,PREP+ALOG+EN=in congruence with
    Sample of the dictionary of Biomedical Terms:
      HIV,N+FLX=PORTUGAL+AB+state+IMMUN+EN=HIV
      doença maníaco-depressiva,N+FLX=CASA+AB+state+MH+EN=manic-depressive disorder
      doença bipolar,N+FLX=CASA+AB+state+MH+EN=bipolar disorder
      asma,N+FLX=CASA+AB+state+PULM+EN=asthma
    Sample of the dictionary of Proper Names:
      Amesterdão,N+PL+city+EN=Amsterdam
      Estados Unidos da América,N+PL+coun+EN=United States of America
      África,N+PL+cont+EN=Africa
      Extremo Oriente,N+PL+othprop+EN=Far East
      Mediterrâneo,N+FLX=ANO+PL+water+EN=Mediterranean
      Alpes Peninos,N+FLX=ALPES+PL+othprop+EN=Pennine Alps
      ONU,N+AN+org+EN=UN
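The entry layout on this slide (lemma, part of speech, then "+"-separated features, some of them key=value pairs such as FLX and EN) can be read mechanically. The following is a minimal Python sketch of such a reader, not NooJ's own parser: the names Entry and parse_entry are hypothetical, and it assumes that the first comma ends the lemma and that no feature value itself contains a "+".

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    lemma: str
    pos: str
    attrs: dict = field(default_factory=dict)  # key=value features, e.g. FLX, EN
    flags: list = field(default_factory=list)  # bare features, e.g. CO, surf


def parse_entry(line: str) -> Entry:
    """Split 'lemma,POS+feat+key=value+...' into its parts (hypothetical helper)."""
    lemma, _, rest = line.partition(",")       # assumption: first comma ends the lemma
    pos, *features = rest.split("+")           # assumption: '+' never occurs inside a value
    entry = Entry(lemma=lemma.strip(), pos=pos.strip())
    for feat in features:
        key, sep, value = feat.partition("=")
        if sep:
            entry.attrs[key] = value           # inflectional paradigm, English transfer, ...
        else:
            entry.flags.append(feat)           # syntactic-semantic attributes
    return entry


mesa = parse_entry("mesa,N+FLX=CASA+CO+surf+EN=table")
bebida = parse_entry("bebida alcoólica,N+FLX=CASA+MA+liqu+EN=alcoholic drink+UNAMB")
```

Here mesa.attrs holds the inflectional paradigm (FLX) and the English transfer (EN), while mesa.flags holds the syntactic-semantic attributes. Multiword lemmas such as "bebida alcoólica" pass through unchanged because only the first comma is treated as a separator.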
  • Port4NooJ Dictionaries
    Sample of terms classified as Information + Instructional/legal
  • Syntactic-Semantic Ontology
    • Abstract representation language
    • Hierarchical taxonomy (sets, supersets and (sometimes) subsets)
    • Based on the Logos SAL ontology
    • Integrated in the dictionary
    • It represents both meaning (semantics) and structure (syntax)
    • Over 1,000 categories
  • Syntactic-Semantic Ontology
    Noun Supersets: concrete; mass; animate; place; information; abstract; process (intr); process (tr); measure; time; aspective
    Sets and Subsets of the CONCRETE Noun Superset
    • functionals: receptacles; bearing surfaces; links/bridges; thresholds, focal points, barriers; conduits; fasteners; devices, tools; cloth things; structural elements; concretizations of verbals; concretizations of mass nouns; undifferentiated functionals; product/brand names
    • agentives: software; vehicles; meters; machines/systems; communication agents; concrete chemical agents; undifferentiated agentives
    • natural things: minute flora; plants; trees; trees/wood; miscellaneous natural things
    • other concrete sets*: impulses/lights; blemishes/marks; edibles (non-mass); edibles/color; classifiers; amorphous; atomistic; undifferentiated concrete things
    *With one exception, these sets have no subsets
  • Syntactic-Semantic Ontology: Categories of CONCRETE nouns
    Category | Mnemonic | Examples in English | Examples in Portuguese
    agentives | CO+undagt | See subsets | See subsets
    software | CO+soft | routine | rotina, ficheiro
    concrete chemical agents | CO+chem | catalyst, warhead | ácido sulfúrico
    machines/systems | CO+mach | battery, camera | máquina fotográfica
    vehicles | CO+vehic | truck, ship | automóvel
    meters | CO+meter | clock, gauge | manómetro
    communication agents | CO+comm | radio, radar | rádio
    functionals | CO+undfunc | trinket, ornament | ornamento
    devices/tools | CO+tool | pliers | alicate
    fasteners | CO+fast | nail, tendon | prego
    bearing surfaces | CO+surf | table, shelf | mesa
    receptacles | CO+recp | bottle, barrel | garrafa
    conduits | CO+cond | chute, artery | artéria
    thresholds/focal points/barriers | CO+barr | wall, door | porta
    links/bridges | CO+link | circuit, nerve | circuito
    cloth things | CO+cloth | shirt, blanket | camisola
    structural elements | CO+struc | spar, bone | osso
    concretizations of verbals | CO+verb | threading |
    concretizations of mass nouns | CO+mass | acid lining |
    product/brand names | CO+brand | Windows NT | Windows NT
    natural things | CO+nat | See subsets | See subsets
    minute flora | CO+flora | algae, spore | alga
    plants | CO+plant | rose, weed | erva
    trees | CO+tree | apple, willow | macieira
    trees/wood | CO+trwd | oak, maple | carvalho
    misc. natural things | CO+mnat | pebble, iceberg | iceberg
    edibles (non-mass) | CO+ednm | pork chop | costoleta
    edibles/color | CO+edcol | orange, cherry | laranja
    impulses/lights | CO+light | lamp, beam | lâmpada
    blemishes/marks | CO+blem | scratch, freckle | sarda
    classifiers | CO+class | element | elemento
    amorphous | CO+amor | breeze, tide | brisa
    atomistic | CO+atom | electron, atom | átomo
    undifferentiated | CO+obj | trifle, curio |
  • Syntactic-Semantic Ontology: Categories of MEASURE nouns (ME - MEASURE Noun Sets and Subsets)
    Sets and Subsets | Mnemonics (=SynSem) | Examples
    abstract concepts measured by unit | ME+abs | humidity, length
    discrete measurable concepts | ME+dis | sum, increment
    units of measure | ME+unit | See subsets
    units of weight | ME+unit+wt | ounce, pound
    units of velocity | ME+unit+vel | mph, megahertz
    units of volume measure | ME+unit+vol | gallon, liter
    units of temperature | ME+unit+temp | degrees celsius
    units of energy/force | ME+unit+ener | watt, horsepower
    measurement systems | ME+unit+sys | fahrenheit, kelvin
    units of duration | ME+unit+dur | hour, minute, year
    specialized units of measure | ME+unit+spec | oersted, ohm, phon
    units of money/value | ME+unit+value | dollar, euro, forint
    units of linear/area measure | ME+unit+lin | inch, yard, mile
    general undifferentiated measure | ME+undif | degree, gross, share
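In the MEASURE mnemonics, the superset, set and subset appear to be spelled out as a "+"-separated path (ME, then ME+unit, then ME+unit+wt). A minimal Python sketch of a membership test under that assumption follows. The function names are hypothetical, and the assumption does not hold everywhere: among the CONCRETE nouns, for example, the set a subset belongs to is not visible in a mnemonic such as CO+soft.

```python
from __future__ import annotations


def ancestors(mnemonic: str) -> list[str]:
    """Return every prefix path of a mnemonic, from superset down to the mnemonic itself."""
    parts = mnemonic.split("+")
    return ["+".join(parts[:i]) for i in range(1, len(parts) + 1)]


def is_a(mnemonic: str, category: str) -> bool:
    """True if `category` lies on the path of `mnemonic` (assumes the path is encoded in it)."""
    return category in ancestors(mnemonic)


weight_path = ancestors("ME+unit+wt")
```

A grammar rule that should apply to any unit of measure could then test is_a(tag, "ME+unit") instead of enumerating every subset.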
  • Inflectional and Derivational Description
    Noun Inflectional Paradigm; Adjective Inflectional Paradigm; Pronoun Inflectional Paradigm; Verb Inflectional Paradigm; Adverb Inflectional Paradigm; Determiner Inflectional Paradigm; Interrogative Pronoun Inflectional Paradigm; Nominalization Derivational Paradigm
  • Paraphrasing and Translation Grammars
    Translation and bilingual paraphrasing of simple sentences
    Graph to translate simple sentences
  • Explicit Marking of Derivation and Support Verb
    Verb entries:
    • Identification of derivational paradigms for nominalizations (annotation NDRV) and predicate adjectives (annotation ADRV)
    • Link to the derived noun’s support verbs and to the adjective’s copula verbs (annotation VSUP and annotation VCOP)
      adaptar,V+FLX=FALAR+Aux=1+INOP57+Subset=132+EN=adapt+VSUP=fazer+DRV=NDRV00:CANÇÃO
      azedar,V+FLX=LIMPAR+Aux=1+OBJTRundif98+Subset=740+EN=sour+VCOP=estar+DRV=ADRV00:ALTO
  • Explicit Marking of Derivation and Semantic Verb Association
    Adjective entries:
    • Identification of derivational paradigms for adverbializations (annotation AVDRV)
      literal,A+FLX=PRINCIPAL+IN+symb+EN=literal+DRV=AVDRV00:LITERALMENTE
    Autonomous predicate nouns:
    • Identification of autonomous predicate nouns (annotation Npred)
    • Identification of a semantically related verb
      curso,N+FLX=ANO+Npred+IN+inst+EN=course+VSUP=tirar+VRB=estudar+NPrep=de+Det=um
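The annotations on the autonomous predicate noun entry (VSUP, VRB, NPrep, Det) contain what is needed to pair a support verb construction with its related verb. The Python sketch below illustrates that idea only and is not the ReWriter or ParaMT implementation: the function names are hypothetical, it works at lemma level with no inflection or agreement, and it assumes the word order support verb + determiner + noun + preposition.

```python
def parse_attrs(line: str):
    """Return the lemma and the key=value features of a dictionary entry (hypothetical helper)."""
    lemma, _, rest = line.partition(",")
    attrs = dict(f.split("=", 1) for f in rest.split("+") if "=" in f)
    return lemma, attrs


def svc_paraphrase(line: str):
    """Pair the lemma-level SVC with its related verb, or return None if not annotated."""
    lemma, attrs = parse_attrs(line)
    if "VSUP" not in attrs or "VRB" not in attrs:
        return None
    # Assumed order: support verb, determiner, predicate noun, preposition.
    parts = (attrs["VSUP"], attrs.get("Det"), lemma, attrs.get("NPrep"))
    return " ".join(p for p in parts if p), attrs["VRB"]


curso = svc_paraphrase(
    "curso,N+FLX=ANO+Npred+IN+inst+EN=course+VSUP=tirar+VRB=estudar+NPrep=de+Det=um"
)
```

For the curso entry this pairs "tirar um curso de" with "estudar". Entries without both VSUP and VRB, such as ordinary nouns, yield None.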
  • ReWriter: a Monolingual Standalone Paraphraser
    Recognition and monolingual paraphrasing of support verb constructions (support verb construction / morphologically related lexical verb)
  • ReWriter: Examples
    Recognition and paraphrasing of elementary support verb constructions co-occurring with predicate nouns of the biomedical field (support verb construction / lexical verb or stylistic variant / non-elementary support verb construction)
    • Elementary SVC > Lexical Verb
    • Elementary SVC > non-elementary SVC (realizar/efectuar)
    • Elementary SVC > sujeitar-se a / submeter-se a, ONLY if the SUBJECT is a patient
  • ReWriter: Application - Interface
    Interactive ReWriter for word processing applications such as text editing (interface screenshots)
  • ReWriter: Extensibility
    1. Applications to General Language
    2. Applications to Technical Language
  • ReWriter: Extensibility - Examples
    [Paraphrasing adverbials]
      à volta da órbita ≡ periorbital (popular versus technical)
      EN: around the orbit of the eye ≡ periorbital
    [Paraphrasing relative clauses - into adjectival past participles]
      N0 que têm sido escritos ≡ N0 que foram descritos ≡ N0 escritos
      EN: N0 that have been written ≡ N0 that were described ≡ N0 written
    [Paraphrasing if clauses]
      se for necessário ≡ se necessário
      EN: if it is necessary ≡ if necessary
  • ReWriter: Extensibility - Examples
    [Paraphrasing coordinated noun phrases - conjoining or disjoining]
      recursos linguísticos para o ensino e para a investigação
      EN: ?linguistic resources for teaching and for research
      ≡ recursos linguísticos para o ensino e a investigação
      EN: linguistic resources for teaching and research
    [Paraphrasing subjunctive clauses - into infinitives]
      pedimos o favor que confirme a sua participação
      EN: *we ask the favor that you confirm your attendance
      ≡ pedimos o favor de confirmar a sua participação
      EN: *we ask the favor of confirming your attendance
  • ReWriter: Extensibility - Examples
    [Paraphrasing marked-up constructions]
      se a necessidade do utilizador é criar um texto em linguagem controlada
      EN: ?if the end-user need is to create controlled language text
      ≡ se o utilizador necessita de criar um texto em linguagem controlada
      EN: if the end-user needs to create controlled language text
    [Paraphrasing of vague and undefined or null subject sentences] (whenever the real subject/actor is known)
      [-] houve um grito na rua [N-PRON]/≡ alguém gritou na rua
      EN: there was shouting in the street [N-PRON]/≡ someone shouted in the street
  • ReWriter: Extensibility - Examples
    [Paraphrasing passives - whenever suitable]
      Esse livro foi escrito por Saramago em 2008 ≡ Saramago escreveu esse livro em 2008
      EN: That book was written by Saramago in 2008 ≡ Saramago wrote that book in 2008
      Florida foi atingida por um tornado ≡ Um tornado atingiu a Florida
      EN: Florida was hit by a tornado ≡ A tornado hit Florida
      O carro foi roubado ≡ Alguém roubou o carro
      EN: The car was stolen ≡ Someone stole the car
  • ParaMT: a Bilingual/Multilingual Paraphraser for MT
    Recognition and bilingual paraphrasing of support verb constructions (Portuguese support verb construction / corresponding English verb)
  • Preliminary Quantitative Results
    Evaluation of recognition and paraphrasing of support verb constructions: 500 sentences, 100 for each elementary support verb
    Support verb | SVC Recognition Precision | SVC Recognition Recall | SVC Paraphrasing Precision
    Pôr | 73/73 - 100% | 73/100 - 73% | 72/73 - 98.6%
    Tomar | 75/75 - 100% | 75/100 - 75% | 68/73 - 93.1%
    Ter | 65/65 - 100% | 65/100 - 65% | 59/65 - 90.7%
    Dar | 57/60 - 95% | 57/100 - 57% | 46/51 - 90.1%
    Fazer | 43/45 - 95.5% | 43/100 - 43% | 40/45 - 88.8%
    Average | 62.6/63.6 - 98.4% | 62.6/100 - 62.6% | 57/61 - 93.4%
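The recognition figures on this slide follow the usual definitions: precision is the number of correctly recognized SVCs divided by the number recognized, and recall is the number correctly recognized divided by the number of SVCs in the sample. Below is a minimal Python sketch of that arithmetic using the recognition counts from the slide. It assumes, as the "/100" denominators suggest, that each set of 100 sentences contains 100 SVCs. The variable and function names are illustrative only.

```python
def precision(correct: int, retrieved: int) -> float:
    """Share of recognized SVCs that were correct."""
    return correct / retrieved


def recall(correct: int, relevant: int) -> float:
    """Share of SVCs in the sample that were correctly recognized."""
    return correct / relevant


# (correctly recognized, total recognized) per elementary support verb, from the slide
recognition = {
    "Pôr": (73, 73),
    "Tomar": (75, 75),
    "Ter": (65, 65),
    "Dar": (57, 60),
    "Fazer": (43, 45),
}
RELEVANT_PER_VERB = 100  # assumption: one SVC per evaluated sentence

for verb, (correct, retrieved) in recognition.items():
    p = precision(correct, retrieved)
    r = recall(correct, RELEVANT_PER_VERB)
    print(f"{verb}: precision {p:.1%}, recall {r:.1%}")

# Pooled precision over all five verbs (sum of correct / sum of recognized)
total_correct = sum(c for c, _ in recognition.values())
total_retrieved = sum(n for _, n in recognition.values())
pooled_precision = precision(total_correct, total_retrieved)
```

The pooled value uses the same totals as the Average row of the table, where 62.6 and 63.6 are the per-verb means of those sums.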
  • ConclusionsLinguistic knowledge applied to a machinetranslation system improves its output quality.Effective results from linguistically based researchon paraphrases can save substantial effort andresources employed by machine translation systemsMestrado em Tradução Jurídica e EmpresarialAnabela Barreiro Lisboa, 8 January 2008
  • Thank you for your attention!AcknowledgementsThis work was partly supported by grant SFRH/BD/14076/2003from Fundação para a Ciência e a Tecnologia, co-financed byPOSI and partly by Fundação para a Computação CientíficaNacional.Mestrado em Tradução Jurídica e EmpresarialAnabela Barreiro Lisboa, 8 January 2008