SPIDER: a System for Paraphrasing - Applicability in Machine Translation Pre-Editing - Anabela Barreiro

SPIDER is a system for paraphrasing in document editing and revision. It was designed to help with writing optimization, but its applicability extends to MT pre-editing.

Published in: Technology

Transcript

  • 1. SPIDER: A SYSTEM FOR PARAPHRASING IN DOCUMENT EDITING AND REVISION. APPLICABILITY IN MACHINE TRANSLATION PRE-EDITING. Anabela Barreiro, ab@metatrad.com. CICLing 2011, February 20-26, 2011, Tokyo, Japan
  • 2. OUTLINE
      INTRODUCTION
        • PARAPHRASES IN NLP
        • PARAPHRASES IN PEDAGOGICAL AND PROFESSIONAL CONTEXTS
      SPIDER
        • FIRST STEPS
        • IMPORTANT FEATURES
        • PARAPHRASES COVERED BY SPIDER
        • INTERFACE
        • LINGUISTIC RESOURCES
        • EVALUATION RESULTS
      THE FUTURE
        • FUTURE APPLICATIONS?
        • FUTURE RESEARCH
  • 3. IMPORTANCE OF PARAPHRASES IN NLP TASKS
      • Question Answering: [Ibrahim et al., 2003], [Paşca, 2003], [Duboué & Chu-Carroll, 2006]
      • Information Extraction and Text Mining: [Ibrahim et al., 2003], [Shinyama et al., 2002], [Shinyama & Sekine, 2003], [Sekine, 2005], [Paşca, 2005], [Paşca & Dienes, 2005]
      • Summarization: [McKeown et al., 2002], [Barzilay, 2001, 2003], [Hirao et al., 2004], [Zhou et al., 2006b]
      • Natural Language Generation: [Iordanskaja et al., 1991]
      • Plagiarism Detection: [Potthast et al., 2010], [Vila et al., 2010]
      • Machine Translation: [Zhou et al., 2006], [Callison-Burch et al., 2006a, 2006b, 2007 and 2008], [Barreiro, 2008, 2009, 2011]
  • 4. THE PRACTICAL NEED FOR PARAPHRASES IN PEDAGOGICAL CONTEXTS
      • Text Processing and Authoring Aids: writing and revision of original/creative/customized texts
      • Learning Tools: native and second language learning; creation of clear and understandable text content (e.g. students learning language and writing skills)
      • Style Editors: uniformization/consistency of style
  • 5. THE PRACTICAL NEED FOR PARAPHRASES IN PROFESSIONAL CONTEXTS
      • Technical Writing: professional high-quality documentation and domain-specific texts; controlled language
      • Linguistic Quality Assurance: linguistic quality of generic texts and specialized documentation; verification/validation of meaningful content
      • Text Optimization: readable/publishable texts (business-oriented or purpose-oriented content)
      • Terminology: search for the “exact” term or relevant keywords
      • Translation: indispensable for human and machine translation (pre-editing and post-editing)
  • 7. SPIDER PARAPHRASING SYSTEM: FIRST STEPS
      • Initially developed for Portuguese
      • 1st version – ReEscreve: publicly available service at http://www.linguateca.pt/ReEscreve/
      • 2nd version – eSPERTo (Portuguese: the smart/clever one; expert): currently being integrated in a cyber school project within the scope of an educational program
      • Writing exercises: students learning how to improve their writing skills in the Portuguese language
      • English SPIDER prototype to assist writing of domain-specific texts
  • 8. SPIDER: IMPORTANT FEATURES
      • Applies linguistic knowledge to recognize and generate paraphrases automatically; the suggestions provided preserve the semantics and grammaticality (inflectional features) of the source text, including in transformations of multi-word units
      • Uses text-editing mechanisms which provide a variety of alternatives for each expression and the possibility to choose among them (according to personal preferences, style, idiomaticity, etc.)
      • Allows users to suggest new expressions that can be immediately applied to their text, making the text editing process easier, more flexible, and upgradable
      • Designed to help with writing optimization, understandability and translatability (improving the quality of the source text so that it has a positive impact on translation)
  • 9. PARAPHRASES COVERED BY SPIDER
      • Synonyms in context (e.g. phrasal verbs into equivalent expressions): to clear up (weather) = (weather) to become better/brighter
      • Support verb constructions into single verbs and stylistic variants: to make a decision = to decide; to make an audit = to perform an audit
      • Aspectual constructions into single verbs: to launch an attack = to attack
      • Adverbials (compounds into single adverbs): in a constructive way = constructively
      • Relatives into participial adjectives: the president that was elected = the president elect
      • Relatives into possessives: the role that Europe plays/has = the role of Europe
      • Relatives into compound nouns (and vice-versa): a container for the milk = a milk container; a bottle made of plastic = a plastic bottle
      • Agentive passives into actives: the man was released by the police officer = the police officer released the man
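The paraphrase pairs on slide 9 can be pictured as a lookup from a multi-word expression to shorter equivalents, from which the user picks one. The following is a minimal sketch under that assumption, not SPIDER's implementation (the slides describe NooJ dictionaries and grammars, which also handle inflection). The names `PARAPHRASES`, `suggest`, and `apply_choice` are hypothetical, and plain string matching stands in for the linguistic analysis.

```python
import re

# Hypothetical paraphrase table built only from examples on slide 9.
PARAPHRASES = {
    "make a decision": ["decide"],
    "make an audit": ["perform an audit"],
    "launch an attack": ["attack"],
    "in a constructive way": ["constructively"],
}

def suggest(text):
    """Return (expression, alternatives) pairs found in text, in order of appearance."""
    found = []
    for expr, alternatives in PARAPHRASES.items():
        match = re.search(r"\b" + re.escape(expr) + r"\b", text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), expr, alternatives))
    return [(expr, alts) for _, expr, alts in sorted(found)]

def apply_choice(text, expr, choice):
    """Replace expr with the alternative the user picked."""
    return re.sub(r"\b" + re.escape(expr) + r"\b", choice, text, flags=re.IGNORECASE)

sentence = "They had to make a decision in a constructive way."
for expr, alternatives in suggest(sentence):
    sentence = apply_choice(sentence, expr, alternatives[0])
print(sentence)
```

Because the table matches surface strings only, an inflected form such as "made a decision" is not recognized here; covering such forms is what the dictionary-based approach on the later slides is for.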
  • 10. INTERFACE: SUGGESTIONS FOR EXAMPLE SENTENCES
      Suggestions for general language linguistic phenomena:
      • Compound adverbs > single adverbs
      • Relatives > participial adjectives
      • Support verb constructions > single verbs
  • 11. INTERFACE: SELECTION OF PARAPHRASING GRAMMARS FOR SPECIFIC LINGUISTIC PHENOMENA
      Users can select among general and technical dictionaries (more than one selection allowed) and grammars for specific linguistic transformations (one, several or all grammars can be selected). The interface provides sample texts for testing.
      • Informative details about the linguistic resources selected
      • Sample LEGAL text
  • 12. INTERFACE: SELECTION OF A DOMAIN DICTIONARY
      • Identification of legal terms in the text
      • Suggestions for the term “breach of law”
      • Users can select one term from the list of suggestions or provide a new suggestion
  • 13. INTERFACE: SUGGESTIONS PROVIDED AND USER’S CAPABILITY TO ADD NEW REWRITING OPTIONS
      • The user can suggest new words or expressions (synonyms or paraphrases)
      • It is possible to go back and change the user option as many times as necessary
      • Text rewritten: in red, the expressions in the source text; in green, suggestions provided by SPIDER and selected by the user
  • 14. LINGUISTIC RESOURCES
      Eng4NooJ – linguistic knowledge system
      • OpenLogos dictionary (http://logos-os.dfki.de/), converted into NooJ format and enhanced with new properties, including derivational, morpho-syntactic and semantic relations
      • Morphological system
      • Contextual rules and grammars
      • Domain-specific dictionary (sample “legal terms”)
  • 15. LINGUISTIC RESOURCES
      General language dictionary entries (morpho-syntactic and semantic relations):
        impress,V+FLX=POLISH+SAL=PVPCpleasetype+PT=impressionar+DRV=NDRV01:BOOK+VSUP=make+VSUP=cause+NPREP=on
        aesthetic,A+FLX=NATURAL+SAL=AVstate+PT=aesthetically+DRV=AVDRV03
        skepticism,N+FLX=BOOK+SAL=ABcause+PT=cepticismo+DRV=NAVDRV02
      Rules to transform morpho-syntactically and semantically related words of different parts of speech:
        NDRV04 = <B>ion/Npred+Nom
        ADRV02 = <B>icable
        AVDRV01 = <E>ly/ADV
        AVDRV04 = <B>tically/ADV
      Grammar to recognize adverbial compounds and transform them into equivalent single adverbs
      Contextual rules (rules to improve precision in specific contexts):
        [bring(vt) N(charge; action) > present(vt) N(idem)]
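To make the entry and rule notation on slide 15 concrete, here is a minimal sketch that splits one dictionary entry into its features and applies a suffix rule. It is not NooJ code. It assumes that `<B>` deletes the last character of the word and that `<E>` stands for the empty string, which is how NooJ's morphological operators are usually described. The function names are hypothetical, and the words "decorate" and "chaos" are illustrative inputs that do not come from the slides.

```python
def parse_entry(line):
    """Split 'lemma,CAT+KEY=VALUE+...' into lemma, category, and a feature dict."""
    lemma, _, rest = line.partition(",")
    category, *features = rest.split("+")
    parsed = {}
    for feature in features:
        key, _, value = feature.partition("=")
        # A key such as VSUP can occur more than once, so values are kept in lists.
        parsed.setdefault(key, []).append(value)
    return lemma, category, parsed

def apply_rule(word, rule):
    """Apply a suffix rule such as '<B>ion/Npred+Nom' or '<E>ly/ADV' to word."""
    command = rule.split("/")[0]      # drop the output tag after the slash
    while command.startswith("<"):
        operator, command = command[:3], command[3:]
        if operator == "<B>":
            word = word[:-1]          # assumption: delete the last character
        elif operator == "<E>":
            pass                      # assumption: empty string, nothing is deleted
        else:
            raise ValueError(f"unsupported operator {operator}")
    return word + command

entry = ("impress,V+FLX=POLISH+SAL=PVPCpleasetype+PT=impressionar"
         "+DRV=NDRV01:BOOK+VSUP=make+VSUP=cause+NPREP=on")
lemma, category, features = parse_entry(entry)
print(lemma, category, features["VSUP"], features["NPREP"])
print(apply_rule("constructive", "<E>ly/ADV"))
print(apply_rule("decorate", "<B>ion/Npred+Nom"))
```

Read this way, the `VSUP` and `NPREP` features of "impress" list the support verbs and the preposition with which the derived noun can form a support verb construction, which is the information a paraphrasing grammar needs to relate that construction to the single verb.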
  • 16. LINGUISTIC RESOURCES
      Sample of terms classified as Information + Instructional/legal
  • 17. EVALUATION RESULTS: PARAPHRASING PRECISION
      Corpus: 500 sentences (100 sentences for each of 5 elementary support verbs)
      Support verb | SVC recognition precision | SVC recognition recall | SVC paraphrasing precision
      Pôr          | 73/73 - 100%              | 73/100 - 73%           | 72/73 - 98.6%
      Tomar        | 75/75 - 100%              | 75/100 - 75%           | 68/73 - 93.1%
      Ter          | 65/65 - 100%              | 65/100 - 65%           | 59/65 - 90.7%
      Dar          | 57/60 - 95%               | 57/100 - 57%           | 46/51 - 90.1%
      Fazer        | 43/45 - 95.5%             | 43/100 - 43%           | 40/45 - 88.8%
      Average      | 62.6/63.6 - 98.4%         | 62.6/100 - 62.6%       | 57/61 - 93.4%
      Evaluation of recognition and paraphrasing of support verb constructions
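The recognition figures on slide 17 follow from simple ratios: precision is the number of correct recognitions over all recognitions, and recall is the number of correct recognitions over the 100 sentences sampled per verb. A short sketch of that arithmetic, using only the counts from the slide (the names `RECOGNITION` and `recognition_scores` are hypothetical):

```python
# Counts from slide 17: (correct recognitions, all recognitions, sentences sampled).
RECOGNITION = {
    "Pôr": (73, 73, 100),
    "Tomar": (75, 75, 100),
    "Ter": (65, 65, 100),
    "Dar": (57, 60, 100),
    "Fazer": (43, 45, 100),
}

def percent(numerator, denominator):
    return 100.0 * numerator / denominator

def recognition_scores(correct, recognized, total):
    """Return (precision, recall) in percent for one support verb."""
    precision = percent(correct, recognized)
    recall = percent(correct, total)
    return precision, recall

for verb, counts in RECOGNITION.items():
    precision, recall = recognition_scores(*counts)
    print(f"{verb}: precision {precision:.1f}%, recall {recall:.1f}%")
```

The printed values are rounded to one decimal place, so they can differ in the last digit from the percentages on the slide.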
  • 18. EVALUATION RESULTS: IMPACT ON TRANSLATABILITY (MT)
      Same corpus, 50 sentences selected randomly
      (i) Automated pre-processing of support verb constructions with SPIDER and conversion into equivalent single verbs
      (ii) Pre-processed sentences (automatically generated paraphrases) and the original text were submitted to MT, and the output translations for both were compared
      • 29 (58%) of the best translations were of automatically generated paraphrases
      • 9 (18%) were of support verb constructions
      • 12 (24%) were equally bad or equally good
      CONCLUSION
      The experiment indicates that paraphrases such as those generated by SPIDER help improve translation scores
      • The automated paraphrasing of support verb constructions through SPIDER allowed a significant improvement of the quality of the MT results in that context
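The percentages on slide 18 are shares of the 50 sampled sentence pairs. A small sketch of the tally, using the counts from the slide (the labels and variable names are hypothetical):

```python
from collections import Counter

# Judgment counts reported on slide 18 for the 50 sampled sentence pairs.
judgments = Counter({"paraphrase better": 29, "original better": 9, "tie": 12})

total = sum(judgments.values())
shares = {label: 100 * count / total for label, count in judgments.items()}

# Share of paraphrase wins among the pairs where one translation was preferred.
decided = judgments["paraphrase better"] + judgments["original better"]
paraphrase_share_of_decided = 100 * judgments["paraphrase better"] / decided

for label, share in shares.items():
    print(f"{label}: {share:.0f}%")
print(f"paraphrase preferred in {paraphrase_share_of_decided:.1f}% of decided pairs")
```

The last figure is derived here and is not reported on the slide; it only restates the 29 versus 9 split with ties excluded.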
  • 20. FUTURE APPLICATIONS?
      • Writing/authoring aid (word processing applications)
      • Language composition tool: general and technical language (e.g. student texts or legal texts)
      • Text production and style editor
      • Terminology verification tool: professional use of terminology in technical domains (elimination of informal, idiomatic, slang use of language)
      • Empirical testbed for linguistic quality assurance (source and target texts)
      • Text editing (machine translation pre-editing and post-editing) and translation aid
      • Controlled language tool
          • Consistent, direct, and simple language
          • Restricted grammar (avoid certain types of construction)
          • Avoid complex reasoning, figures of speech, metaphors, etc.
          • Elimination of wordiness
      • “Revision memory” tool (≈ “translation memory”): recycling of validated reviewed sentences, structures or phrases
  • 21. FUTURE RESEARCH: FROM SPIDER TO MACHINE TRANSLATION
      a fazer um estágio para dar aulas de / tutor Religião
      a fazer um estágio para dar aulas de / lecture Religião
      a fazer um estágio para dar aulas de / teach Religião
      começa a dar exemplos / exemplify :
      sentia-se capaz de dar um murro em / punch quem quisesse detê-lo
      gostávamos de lhe dar uma palavrinha / speak . $EN
