SPIDER: A SYSTEM FOR PARAPHRASING
       IN DOCUMENT EDITING AND REVISION
          APPLICABILITY IN MACHINE TRANSLATION PRE-EDITING




                            Anabela Barreiro



                              ab@metatrad.com




CICLing 2011                                        February 20-26, 2011
Anabela Barreiro                                    Tokyo, Japan
OUTLINE
          INTRODUCTION
                  PARAPHRASES IN NLP
                  PARAPHRASES IN PEDAGOGICAL AND PROFESSIONAL CONTEXTS

          SPIDER
                  FIRST STEPS
                  IMPORTANT FEATURES
                  PARAPHRASES COVERED BY SPIDER
                  INTERFACE
                  LINGUISTIC RESOURCES
                  EVALUATION RESULTS

          THE FUTURE
                  FUTURE APPLICATIONS?
                  FUTURE RESEARCH


IMPORTANCE OF PARAPHRASES IN NLP TASKS
         Question Answering
          [Ibrahim et al., 2003], [Paşca, 2003], [Duboué & Chu-Carroll, 2006]
         Information Extraction and Text Mining
          [Ibrahim et al., 2003], [Shinyama et al., 2002], [Shinyama & Sekine, 2003],
          [Sekine, 2005], [Paşca, 2005], [Paşca & Dienes, 2005]
         Summarization
          [McKeown et al., 2002], [Barzilay, 2001, 2003], [Hirao et al., 2004], [Zhou et al., 2006b]
         Natural Language Generation
          [Iordanskaja et al., 1991]
         Plagiarism Detection
          [Potthast et al., 2010], [Vila et al., 2010]
         Machine Translation
          [Zhou et al., 2006], [Callison-Burch et al., 2006a, 2006b, 2007, 2008],
          [Barreiro, 2008, 2009, 2011]



THE PRACTICAL NEED FOR PARAPHRASES
                    IN PEDAGOGICAL CONTEXTS

          Text Processing and Authoring Aids
           Writing and revision of original/creative/customized texts
          Learning Tools
           Native and second language learning
           Creation of clear and understandable text content
           e.g. students learning language and writing skills
          Style Editors
           Standardization/consistency of style




THE PRACTICAL NEED FOR PARAPHRASES
                    IN PROFESSIONAL CONTEXTS
          Technical Writing
           Professional high-quality documentation and domain-specific texts
           Controlled language
          Linguistic Quality Assurance
           Linguistic quality of generic texts and specialized documentation
           Verification/validation of meaningful content
          Text Optimization
           Readable / publishable texts (business-oriented or purpose-oriented content)
          Terminology
           Search for the “exact” term or relevant keywords
          Translation
           Indispensable for human and machine translation (pre-editing and post-editing)


SPIDER PARAPHRASING SYSTEM
                                      FIRST STEPS

            Initially developed for Portuguese
            1st version – ReEscreve
            publicly available service at http://www.linguateca.pt/ReEscreve/

            2nd version – eSPERTo (Portuguese: “the smart/clever one”; expert)
            currently being integrated into a cyber school project within the scope of an
            educational program

            Writing exercises – students learning how to improve their writing skills in
            the Portuguese language

            English SPIDER
            prototype to assist in writing domain-specific texts



SPIDER
                             IMPORTANT FEATURES
       Applies linguistic knowledge to recognize and generate paraphrases
      automatically, preserving the semantics and grammaticality (including
      inflectional features) of the source text in the suggestions provided;
      transformations of multi-word units are also covered
       Uses text-editing mechanisms that provide a variety of alternatives for
      each expression and let users choose among them (according to personal
      preferences, style, idiomaticity, etc.)
       Allows users to suggest new expressions that can be immediately applied
      to their text, making the text editing process easier, more flexible, and
      upgradable
       Designed to help with writing optimization, understandability, and
      translatability (improving the quality of the source text so that it has
      a positive impact on translation)


PARAPHRASES COVERED BY SPIDER
       Synonyms in context (e.g. phrasal verbs into equivalent expressions)
             to clear up (weather) = (weather) to become better/brighter
       Support verb constructions into single verbs and stylistic variants
             to make a decision = to decide; to make an audit = to perform an audit
       Aspectual constructions into single verbs
             to launch an attack = to attack
       Adverbials (compounds into single adverbs)
             in a constructive way = constructively
       Relatives into participial adjectives
             the president that was elected = the president elect
       Relatives into possessives
             the role that Europe plays/has = the role of Europe
       Relatives into compound nouns (and vice-versa)
             a container for the milk = a milk container; a bottle made of plastic = a plastic bottle
       Agentive passives into actives
             the man was released by the police officer = the police officer released the man
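The support verb rewrite listed above (to make a decision = to decide) can be illustrated with a minimal Python sketch. It is not SPIDER's implementation, which relies on NooJ dictionaries and grammars: the tables `SUPPORT_FORMS`, `VERB_FORMS` and `SVC_LEXICON` and the function `paraphrase_svc` are hypothetical, cover only two entries, and only show how the tense of the support verb can be carried over to the single verb.

```python
import re

# Hypothetical inflection tables (SPIDER uses NooJ inflectional paradigms instead).
SUPPORT_FORMS = {
    "make": {"make": "base", "makes": "3sg", "made": "past"},
    "launch": {"launch": "base", "launches": "3sg", "launched": "past"},
}
VERB_FORMS = {
    "decide": {"base": "decide", "3sg": "decides", "past": "decided"},
    "attack": {"base": "attack", "3sg": "attacks", "past": "attacked"},
}
# Hypothetical lexicon entries: (support verb, predicate noun, single verb).
SVC_LEXICON = [("make", "decision", "decide"), ("launch", "attack", "attack")]


def paraphrase_svc(sentence: str) -> str:
    """Rewrite 'support verb + optional article + predicate noun' as a single
    verb that carries the tense/person of the original support verb."""
    for support, noun, verb in SVC_LEXICON:
        forms = SUPPORT_FORMS[support]
        pattern = re.compile(
            r"\b(" + "|".join(forms) + r") (?:a |an |the )?" + noun + r"\b"
        )
        # Look up the inflection of the matched support verb and reuse it.
        sentence = pattern.sub(
            lambda m: VERB_FORMS[verb][forms[m.group(1)]], sentence
        )
    return sentence


print(paraphrase_svc("The committee made a decision."))         # The committee decided.
print(paraphrase_svc("The army launches an attack at dawn."))   # The army attacks at dawn.
```

Plural predicate nouns ("make decisions") are left untouched here, because the pattern requires a word boundary right after the singular noun.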


INTERFACE
                       SUGGESTIONS FOR EXAMPLE SENTENCES
 Suggestions for general-language linguistic phenomena

 [Screenshot callouts: Compound adverbs > single adverbs; Relatives > participial adjectives; Support verb constructions > single verbs]




INTERFACE
       SELECTION OF PARAPHRASING GRAMMARS FOR SPECIFIC
                                        LINGUISTIC PHENOMENA
    Users can select general and technical dictionaries (more than one selection
is allowed) and grammars for specific linguistic transformations (one, several,
or all grammars can be selected). The interface provides sample texts for testing.

 [Screenshot callouts: Informative details about the linguistic resources selected; Sample LEGAL text]




INTERFACE
                          SELECTION OF A DOMAIN DICTIONARY

 [Screenshot callouts: Identification of legal terms in the text; Suggestions for the term “breach of law”]

 Users can select one term from the list of suggestions or provide a new suggestion

INTERFACE
  SUGGESTIONS PROVIDED AND USER’S CAPABILITY TO ADD NEW REWRITING OPTIONS

 [Screenshot callouts: The user can suggest new words or expressions (synonyms or paraphrases); It is possible to go back and change the selected option as many times as necessary]

                                Text rewritten
                 • In red, the expressions in the source text
                 • In green, the suggestions provided by SPIDER and selected by the user
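The editing model described on these interface slides (a list of suggestions per expression, user-supplied alternatives, and choices that can be changed at any time) could be sketched as follows. The `Slot` and `Session` classes are hypothetical names, not part of SPIDER, and the red/green highlighting is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    """A source expression flagged by the system and its rewriting options."""
    source: str
    suggestions: List[str]
    choice: Optional[str] = None  # None = keep the source wording

    def choose(self, option: str) -> None:
        # A user-typed option is appended to the list, so it can be picked again later.
        if option not in self.suggestions:
            self.suggestions.append(option)
        self.choice = option

    def revert(self) -> None:
        self.choice = None


@dataclass
class Session:
    text: str
    slots: List[Slot] = field(default_factory=list)

    def render(self) -> str:
        # Replace the first occurrence of each source expression that has a choice.
        result = self.text
        for slot in self.slots:
            if slot.choice is not None:
                result = result.replace(slot.source, slot.choice, 1)
        return result


session = Session("The board made a decision in a constructive way.")
decision = Slot("made a decision", ["decided"])
manner = Slot("in a constructive way", ["constructively"])
session.slots += [decision, manner]

decision.choose("decided")
manner.choose("helpfully")        # a new option typed in by the user
manner.choose("constructively")   # the user goes back and changes the choice
print(session.render())           # The board decided constructively.
```

Keeping the user's own options in `suggestions` is one way to make the editor "upgradable" in the sense of the features slide: a typed-in alternative stays available for later choices.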




LINGUISTIC RESOURCES
        Eng4NooJ – linguistic knowledge system
       • OpenLogos dictionary (http://logos-os.dfki.de/), converted into NooJ
         format and enhanced with new properties, including derivational,
         morpho-syntactic, and semantic relations
       • Morphological system
       • Contextual rules and grammars
       • Domain-specific dictionary (sample of “legal terms”)




LINGUISTIC RESOURCES
                          General language dictionary entries, with morpho-syntactic and semantic relations
      impress,V+FLX=POLISH+SAL=PVPCpleasetype+PT=impressionar+DRV=NDRV01:BOOK+VSUP=make+VSUP=cause+NPREP=on
      aesthetic,A+FLX=NATURAL+SAL=AVstate+PT=aesthetically+DRV=AVDRV03
      skepticism,N+FLX=BOOK+SAL=ABcause+PT=cepticismo+DRV=NAVDRV02
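Assuming the entries follow the NooJ shape `lemma,CATEGORY+KEY=VALUE+...` shown above, a small parser can expose their properties. This is a sketch: `parse_entry` is a hypothetical helper, it only handles the single-lemma form used on this slide, and it treats keys such as SAL or PT as opaque labels because the slide does not define them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_entry(line: str) -> Tuple[str, str, Dict[str, List[str]]]:
    """Split a 'lemma,CATEGORY+KEY=VALUE+...' entry into its parts.
    Keys can repeat (e.g. VSUP=make+VSUP=cause), so each key maps to a list.
    A bare feature without '=' is stored with an empty string as its value."""
    lemma, rest = line.split(",", 1)
    category, *features = rest.split("+")
    props: Dict[str, List[str]] = defaultdict(list)
    for feature in features:
        key, _, value = feature.partition("=")
        props[key].append(value)
    return lemma, category, dict(props)


lemma, category, props = parse_entry(
    "impress,V+FLX=POLISH+SAL=PVPCpleasetype+PT=impressionar"
    "+DRV=NDRV01:BOOK+VSUP=make+VSUP=cause+NPREP=on"
)
print(lemma, category, props["VSUP"], props["PT"])
# impress V ['make', 'cause'] ['impressionar']

# Assumed reading of DRV: derivation rule, then inflection paradigm of the derived word.
rule, _, paradigm = props["DRV"][0].partition(":")
print(rule, paradigm)  # NDRV01 BOOK
```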

      Rules to transform morpho-syntactically and semantically related words of different parts of speech
       NDRV04 = <B>ion/Npred+Nom
       ADRV02 = <B>icable
       AVDRV01 = <E>ly/ADV
       AVDRV04 = <B>tically/ADV

      [Figure: grammar to recognize adverbial compounds and transform them into equivalent single adverbs]
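The derivation rules can be read as string operations on the lemma. The sketch below assumes NooJ's convention that `<B>` deletes the last character and `<E>` is the empty string, with the part after `/` giving the category of the derived word. `apply_derivation` is a hypothetical helper, other NooJ operators are not handled, and the example lemmas are illustrative choices, not entries taken from the dictionary.

```python
from typing import Tuple


def apply_derivation(lemma: str, rule: str) -> Tuple[str, str]:
    """Apply a derivation rule such as '<B>ion/Npred+Nom' to a lemma.
    Assumed operator semantics: <B> deletes the last character, <E> stands
    for the empty string, any other character is appended. Text after '/'
    is the category of the derived word (empty if the rule gives none)."""
    form, _, category = rule.partition("/")
    word, i = lemma, 0
    while i < len(form):
        if form.startswith("<B>", i):
            word = word[:-1]  # backspace
            i += 3
        elif form.startswith("<E>", i):
            i += 3            # empty string: nothing to add
        else:
            word += form[i]   # literal suffix character
            i += 1
    return word, category


# Illustrative lemmas; the slide does not show which lemmas each rule is attached to.
for lemma, rule in [("create", "<B>ion/Npred+Nom"), ("apply", "<B>icable"),
                    ("constructive", "<E>ly/ADV"), ("enthusiasm", "<B>tically/ADV")]:
    print(lemma, "->", apply_derivation(lemma, rule))
# create -> ('creation', 'Npred+Nom')
# apply -> ('applicable', '')
# constructive -> ('constructively', 'ADV')
# enthusiasm -> ('enthusiastically', 'ADV')
```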


      Contextual rules: rules to improve precision in specific contexts
       [bring(vt) N(charge; action) > present(vt) N(idem)]
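The contextual rule rewrites bring as present only when its object is charge or action, leaving the noun unchanged (idem). A regex approximation in Python, assuming that the object immediately follows the verb (optionally after an article); the real rule relies on NooJ's transitivity (vt) and noun annotations, and `BRING_TO_PRESENT`, `OBJECT`, `RULE` and `apply_contextual_rule` are hypothetical names.

```python
import re

# Hypothetical surface forms of "bring" mapped to the same form of "present".
BRING_TO_PRESENT = {"bring": "present", "brings": "presents",
                    "brought": "presented", "bringing": "presenting"}
# Object nouns licensing the rule, N(charge; action), singular or plural.
OBJECT = r"(?:charges?|actions?)"
RULE = re.compile(
    r"\b(bring|brings|brought|bringing)(\s+(?:an?\s+|the\s+)?" + OBJECT + r")\b"
)


def apply_contextual_rule(sentence: str) -> str:
    # Rewrite the verb only when it is directly followed by a licensed object;
    # the object itself is copied unchanged (N(idem)).
    return RULE.sub(lambda m: BRING_TO_PRESENT[m.group(1)] + m.group(2), sentence)


print(apply_contextual_rule("The prosecutor brought charges against him."))
# The prosecutor presented charges against him.
print(apply_contextual_rule("She brought the book back."))
# She brought the book back.
```

The second call shows the point of the context: without the licensed object, "bring" is left alone, which is what keeps precision high.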



LINGUISTIC RESOURCES

 [Figure: sample of terms classified as Information + Instructional/legal]



EVALUATION RESULTS: PARAPHRASING
                                     PRECISION AND RECALL
                   Corpus: 500 Portuguese sentences
                   100 sentences for each of 5 elementary support verbs

                     SVC Recognition            SVC Recognition            SVC Paraphrasing
                        Precision                    Recall                    Precision
       Pôr              73/73 - 100%              73/100 – 73%                72/73 - 98.6%
       Tomar            75/75 - 100%              75/100 – 75%                68/73 - 93.1%
       Ter              65/65 - 100%              65/100 – 65%                59/65 - 90.7%
       Dar               57/60 - 95%              57/100 – 57%                46/51 - 90.1%
       Fazer           43/45 – 95.5%              43/100 – 43%                40/45 - 88.8%
       Average        62.6/63.6 - 98.4%          62.6/100 - 62.6%             57/61.4 - 92.8%

                              Evaluation of recognition and paraphrasing
                                    of support verb constructions
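The percentages in the table follow directly from the counts. This sketch recomputes them in Python, assuming that recall is measured against the 100 sentences per verb (as in the table's recall column) and that the averages pool the counts of the five verbs; `COUNTS`, `pct` and `summarize` are hypothetical names.

```python
# Per-verb counts transcribed from the table above:
# (correct SVC recognitions, SVCs recognized,
#  correct paraphrases, paraphrases evaluated)
COUNTS = {
    "Pôr": (73, 73, 72, 73),
    "Tomar": (75, 75, 68, 73),
    "Ter": (65, 65, 59, 65),
    "Dar": (57, 60, 46, 51),
    "Fazer": (43, 45, 40, 45),
}
SENTENCES_PER_VERB = 100  # recall denominator, as in the table


def pct(num: float, den: float) -> float:
    return round(100 * num / den, 1)


def summarize(counts):
    """Print per-verb figures and return the pooled recognition precision,
    recognition recall and paraphrasing precision."""
    for verb, (ok, found, para_ok, para_n) in counts.items():
        print(f"{verb:6} P={pct(ok, found):5}  R={pct(ok, SENTENCES_PER_VERB):5}  "
              f"paraphrasing P={pct(para_ok, para_n):5}")
    # Sum each column over the five verbs before dividing.
    ok, found, para_ok, para_n = (sum(col) for col in zip(*counts.values()))
    total = SENTENCES_PER_VERB * len(counts)
    return pct(ok, found), pct(ok, total), pct(para_ok, para_n)


print(summarize(COUNTS))  # (98.4, 62.6, 92.8)
```

The per-verb values are rounded here, whereas the slide appears to truncate them (46/51 = 90.196... is shown as 90.1%), so a few last digits differ. Dividing summed counts is equivalent to dividing the averaged counts of the Average row; for the paraphrasing column the sums are 285/307, an average of 57/61.4.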



EVALUATION RESULTS: IMPACT ON
                   TRANSLATABILITY (MT)
     Same corpus, 50 sentences selected randomly

     (i) automated pre-processing of support verb constructions with SPIDER and
          conversion into equivalent single verbs
     (ii) the pre-processed sentences (automatically generated paraphrases) and the
          original sentences were submitted to MT, and the output translations of both
          versions were compared

     • In 29 cases (58%), the better translation was that of the automatically generated paraphrase
     • In 9 cases (18%), it was that of the original support verb construction
     • In 12 cases (24%), both translations were equally good or equally bad

     CONCLUSION
     The experiment indicates that paraphrases such as those generated by SPIDER help
     improve translation quality
     • In this sample, the automated paraphrasing of support verb constructions through SPIDER
       improved the quality of the MT output: the paraphrased version translated better in
       58% of the sentences, against 18% for the original
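The two-step protocol above can be expressed as a small evaluation harness. This is a sketch only: the slide does not say which MT system was used or how the translations were judged, so `paraphrase`, `translate` and `judge` are hypothetical callables supplied by the caller, and `shares` converts the judgments into the percentages reported.

```python
from collections import Counter
from typing import Callable, Dict, Iterable

LABELS = ("paraphrase", "original", "tie")


def shares(labels: Counter, n: int) -> Dict[str, float]:
    # Percentage of the sample that received each judgment label.
    return {label: 100 * labels[label] / n for label in LABELS}


def compare_pre_editing(
    sentences: Iterable[str],
    paraphrase: Callable[[str], str],
    translate: Callable[[str], str],
    judge: Callable[[str, str], str],
) -> Dict[str, float]:
    """(i) pre-edit each sentence, (ii) translate the original and the
    pre-edited version, (iii) call judge(original_mt, paraphrase_mt), which
    must return 'paraphrase', 'original' or 'tie'."""
    items = list(sentences)
    labels = Counter(judge(translate(s), translate(paraphrase(s))) for s in items)
    return shares(labels, len(items))


# Applied to the counts reported for the 50-sentence sample:
print(shares(Counter(paraphrase=29, original=9, tie=12), 50))
# {'paraphrase': 58.0, 'original': 18.0, 'tie': 24.0}
```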

FUTURE APPLICATIONS?
     •     Writing/authoring aid (word processing applications)
     •     Language composition tool for general and technical language (e.g. student texts or legal texts)
     •     Text production and style editor
     •     Terminology verification tool for professional use of terminology in technical domains
                (elimination of informal, idiomatic, and slang usage)
     •     Empirical testbed for linguistic quality assurance (source and target texts)
     •     Text editing (machine translation pre-editing and post-editing) and translation aid
     •     Controlled language tool
                   •   Consistent, direct, and simple language
                   •   Restricted grammar (avoid certain types of construction)
                   •   Avoid complex reasoning, figures of speech, metaphors, etc.
                   •   Elimination of wordiness
     •     “Revision memory” tool (≈ “translation memory”): recycling of validated, reviewed
                sentences, structures, or phrases
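A “revision memory” could work like a translation memory: validated revisions are stored and proposed again for identical or similar sentences. The following is a hypothetical sketch of that idea, not an existing SPIDER component; `RevisionMemory` is an invented class, and it uses Python's `difflib` similarity ratio as the fuzzy-match measure with an assumed 0.8 threshold.

```python
import difflib
from typing import Dict, Optional, Tuple


class RevisionMemory:
    """Hypothetical revision memory: stores validated (original, revised)
    sentence pairs and proposes a stored revision for similar new input."""

    def __init__(self, cutoff: float = 0.8):
        self.pairs: Dict[str, str] = {}
        self.cutoff = cutoff  # minimum similarity ratio for a fuzzy match

    @staticmethod
    def _key(sentence: str) -> str:
        # Ignore case and extra whitespace when comparing sentences.
        return " ".join(sentence.lower().split())

    def add(self, original: str, revised: str) -> None:
        self.pairs[self._key(original)] = revised

    def lookup(self, sentence: str) -> Optional[Tuple[str, float]]:
        """Return (stored revision, similarity), or None if nothing is close enough."""
        key = self._key(sentence)
        if key in self.pairs:
            return self.pairs[key], 1.0
        close = difflib.get_close_matches(key, list(self.pairs), n=1, cutoff=self.cutoff)
        if not close:
            return None
        ratio = difflib.SequenceMatcher(None, key, close[0]).ratio()
        return self.pairs[close[0]], ratio


memory = RevisionMemory()
memory.add("The board made a decision.", "The board decided.")
print(memory.lookup("the board  made a decision."))       # exact match after normalization
print(memory.lookup("The board made a decision today."))  # fuzzy match
print(memory.lookup("Prices rose sharply in March."))     # no match: None
```

A fuzzy match is only a proposal: in the second lookup the stored revision drops the word “today”, so a reviewer still has to adapt it, as with fuzzy matches in translation memories.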



FUTURE RESEARCH
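                    FROM SPIDER TO MACHINE TRANSLATION

         Portuguese support verb constructions with dar (“give”) aligned with English single verbs:

         a fazer um estágio para   dar aulas de / tutor         Religião
         a fazer um estágio para   dar aulas de / lecture       Religião
         a fazer um estágio para   dar aulas de / teach         Religião
         começa a                  dar exemplos / exemplify     :
         sentia-se capaz de        dar um murro em / punch      quem quisesse detê-lo
         gostávamos de lhe         dar uma palavrinha / speak   .

         English glosses: “doing an internship to teach Religion”; “starts giving examples:”;
         “felt able to punch anyone who wanted to stop him”; “we would like to have a word with you”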
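The alignments above pair one Portuguese support verb construction with one or more English single verbs. A minimal sketch of how such pairs could be stored and located in a sentence before translation; `SVC_EN` and `find_svcs` are hypothetical, and only contiguous constructions are found, so discontinuous cases (e.g. a clitic pronoun attached inside the construction, as in dar-lhe uma palavrinha) would need a NooJ-style grammar instead.

```python
import re
from typing import List, Tuple

# Hypothetical bilingual lexicon built from the alignments on the slide:
# Portuguese support verb construction -> candidate English single verbs.
SVC_EN = {
    "dar aulas de": ["tutor", "lecture", "teach"],
    "dar exemplos": ["exemplify"],
    "dar um murro em": ["punch"],
    "dar uma palavrinha": ["speak"],
}


def find_svcs(sentence: str) -> List[Tuple[int, int, str, List[str]]]:
    """Return (start, end, construction, English candidates) for each lexicon
    entry found as whole words; longer entries win when two overlap."""
    hits: List[Tuple[int, int, str, List[str]]] = []
    for svc in sorted(SVC_EN, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(svc) + r"\b", sentence):
            if all(m.end() <= s or e <= m.start() for s, e, _, _ in hits):
                hits.append((m.start(), m.end(), svc, SVC_EN[svc]))
    return sorted(hits)


for start, end, svc, candidates in find_svcs("estava a fazer um estágio para dar aulas de Religião"):
    print(svc, "->", candidates)  # dar aulas de -> ['tutor', 'lecture', 'teach']
```

Returning all candidates rather than one mirrors the slide, where the same construction aligns with tutor, lecture and teach; choosing among them is left to the user or to the MT system.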





 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

SPIDER: a System for Paraphrasing - Applicability in Machine Translation Pre-Editing - Anabela Barreiro

  • 1. SPIDER: A SYSTEM FOR PARAPHRASING IN DOCUMENT EDITING AND REVISION APPLICABILITY IN MACHINE TRANSLATION PRE-EDITING Anabela Barreiro ab@metatrad.com CICLing 2011 February 20-26, 2011 Anabela Barreiro Tokyo, Japan
  • 2. OUTLINE
      - INTRODUCTION
          - PARAPHRASES IN NLP
          - PARAPHRASES IN PEDAGOGICAL AND PROFESSIONAL CONTEXTS
      - SPIDER
          - FIRST STEPS
          - IMPORTANT FEATURES
          - PARAPHRASES COVERED BY SPIDER
          - INTERFACE
          - LINGUISTIC RESOURCES
          - EVALUATION RESULTS
      - THE FUTURE
          - FUTURE APPLICATIONS?
          - FUTURE RESEARCH
  • 3. IMPORTANCE OF PARAPHRASES IN NLP TASKS
      - Question Answering: [Ibrahim et al., 2003], [Paşca, 2003], [Duboué & Chu-Carroll, 2006]
      - Information Extraction and Text Mining: [Ibrahim et al., 2003], [Shinyama et al., 2002], [Shinyama & Sekine, 2003], [Sekine, 2005], [Paşca, 2005], [Paşca & Dienes, 2005]
      - Summarization: [McKeown et al., 2002], [Barzilay, 2001, 2003], [Hirao et al., 2004], [Zhou et al., 2006b]
      - Natural Language Generation: [Iordanskaja et al., 1991]
      - Plagiarism Detection: [Potthast et al., 2010], [Vila et al., 2010]
      - Machine Translation: [Zhou et al., 2006], [Callison-Burch et al., 2006a, 2006b, 2007, 2008], [Barreiro, 2008, 2009, 2011]
  • 4. THE PRACTICAL NEED FOR PARAPHRASES IN PEDAGOGICAL CONTEXTS
      - Text Processing and Authoring Aids: writing and revision of original, creative or customized texts
      - Learning Tools: native and second language learning; creation of clear and understandable text content (e.g., students learning language and writing skills)
      - Style Editors: uniformization and consistency of style
  • 5. THE PRACTICAL NEED FOR PARAPHRASES IN PROFESSIONAL CONTEXTS
      - Technical Writing: professional, high-quality documentation and domain-specific texts; controlled language
      - Linguistic Quality Assurance: linguistic quality of generic texts and specialized documentation; verification/validation of meaningful content
      - Text Optimization: readable, publishable texts (business-oriented or purpose-oriented content)
      - Terminology: search for the “exact” term or relevant keywords
      - Translation: indispensable for human and machine translation (pre-editing and post-editing)
  • 6. OUTLINE (repeat of slide 2)
  • 7. SPIDER PARAPHRASING SYSTEM: FIRST STEPS
      - Initially developed for Portuguese
          - 1st version: ReEscreve, a publicly available service at http://www.linguateca.pt/ReEscreve/
          - 2nd version: eSPERTo (Portuguese: the smart/clever one; expert), currently being integrated into a cyber school project within the scope of an educational program
          - Writing exercises: students learning how to improve their writing skills in Portuguese
      - English SPIDER prototype to assist the writing of domain-specific texts
  • 8. SPIDER IMPORTANT FEATURES
      - Applies linguistic knowledge to recognize and generate paraphrases automatically; the suggestions preserve the semantics and grammaticality of the source text, including its inflectional features, and cover transformations of multiword units
      - Uses text-editing mechanisms that offer a variety of alternatives for each expression and let the user choose among them (according to personal preference, style, idiomaticity, etc.)
      - Allows users to suggest new expressions that can be applied to their text immediately, making the editing process easier, more flexible and upgradable
      - Designed to help with writing optimization, understandability and translatability (improving the quality of the source text so that it has a positive impact on translation)
  • 9. PARAPHRASES COVERED BY SPIDER
      - Synonyms in context (e.g., phrasal verbs into equivalent expressions): to clear up (weather) = (weather) to become better/brighter
      - Support verb constructions into single verbs, and stylistic variants: to make a decision = to decide; to make an audit = to perform an audit
      - Aspectual constructions into single verbs: to launch an attack = to attack
      - Adverbials (compounds into single adverbs): in a constructive way = constructively
      - Relatives into participial adjectives: the president that was elected = the president elect
      - Relatives into possessives: the role that Europe plays/has = the role of Europe
      - Relatives into compound nouns (and vice versa): a container for the milk = a milk container; a bottle made of plastic = a plastic bottle
      - Agentive passives into actives: the man was released by the police officer = the police officer released the man
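To make the support-verb case concrete, here is a minimal Python sketch of rewriting a support verb construction as a single verb while carrying over the support verb's inflection (made a decision → decided). It is not SPIDER's implementation, which relies on NooJ dictionaries and grammars; the lexicon, the inflection tags (`INF`, `PR3S`, `PAST`, `GER`) and the function names are hypothetical, and the whitespace tokenizer is deliberately naive.

```python
# Hypothetical mini-lexicon: (support verb lemma, predicate noun) -> single-verb lemma.
SVC_TABLE = {
    ("make", "decision"): "decide",
    ("launch", "attack"): "attack",
}

# Hypothetical inflection tables: lemma -> {inflection tag: surface form}.
INFLECTIONS = {
    "make": {"INF": "make", "PR3S": "makes", "PAST": "made", "GER": "making"},
    "launch": {"INF": "launch", "PR3S": "launches", "PAST": "launched", "GER": "launching"},
    "decide": {"INF": "decide", "PR3S": "decides", "PAST": "decided", "GER": "deciding"},
    "attack": {"INF": "attack", "PR3S": "attacks", "PAST": "attacked", "GER": "attacking"},
}

DETERMINERS = {"a", "an", "the"}


def analyze(form):
    """Return (lemma, tag) if `form` is a known verb form, else None."""
    for lemma, forms in INFLECTIONS.items():
        for tag, surface in forms.items():
            if surface == form:
                return lemma, tag
    return None


def paraphrase_svc(sentence):
    """Rewrite 'support verb + determiner + noun' as one verb, keeping the support verb's inflection."""
    tokens = sentence.split()
    out = []
    i = 0
    while i < len(tokens):
        analysis = analyze(tokens[i].lower())
        if analysis and i + 2 < len(tokens) and tokens[i + 1].lower() in DETERMINERS:
            lemma, tag = analysis
            noun = tokens[i + 2].rstrip(".,;:!?")
            trailing = tokens[i + 2][len(noun):]  # punctuation glued to the noun
            target = SVC_TABLE.get((lemma, noun.lower()))
            if target:
                out.append(INFLECTIONS[target][tag] + trailing)
                i += 3  # consume verb, determiner and noun
                continue
        out.append(tokens[i])
        i += 1
    return " ".join(out)


print(paraphrase_svc("The committee made a decision yesterday."))  # The committee decided yesterday.
print(paraphrase_svc("They are launching an attack."))             # They are attacking.
```

Constructions missing from the table (for example "made a cake") are left untouched, which mirrors the precision-first behaviour one would want from an editing aid.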
  • 10. INTERFACE: SUGGESTIONS FOR EXAMPLE SENTENCES
      - Suggestions for general-language linguistic phenomena:
          - compound adverbs > single adverbs
          - relatives > participial adjectives
          - support verb constructions > single verbs
  • 11. INTERFACE: SELECTION OF PARAPHRASING GRAMMARS FOR SPECIFIC LINGUISTIC PHENOMENA
      - Users can select among general and technical dictionaries (more than one selection allowed) and among grammars for specific linguistic transformations (one, several or all grammars can be selected).
      - The interface provides sample texts for testing.
      - Informative details about the selected linguistic resources
      - Sample LEGAL text
  • 12. INTERFACE: SELECTION OF A DOMAIN DICTIONARY
      - Identification of legal terms in the text
      - Suggestions for the term “breach of law”
      - Users can select one term from the list of suggestions or provide a new suggestion
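Identifying domain terms such as “breach of law” amounts to dictionary lookup over word sequences. One common approach, assumed here rather than taken from SPIDER, is greedy longest match, so that a multiword term wins over its single-word prefix. The dictionary contents and the alternatives listed for each term are hypothetical placeholders, not SPIDER's actual suggestions.

```python
# Hypothetical legal-domain dictionary: term (as a tuple of words) -> alternative formulations.
LEGAL_TERMS = {
    ("breach", "of", "law"): ["violation of the law", "infringement of the law"],
    ("breach",): ["violation", "infringement"],
}
MAX_TERM_LEN = max(len(term) for term in LEGAL_TERMS)


def find_terms(text):
    """Greedy longest-match lookup of dictionary terms in `text`."""
    words = [w.strip(".,;:!?") for w in text.lower().split()]
    hits = []
    i = 0
    while i < len(words):
        # Try the longest possible term first, then shorter ones.
        for n in range(min(MAX_TERM_LEN, len(words) - i), 0, -1):
            term = tuple(words[i:i + n])
            if term in LEGAL_TERMS:
                hits.append((" ".join(term), LEGAL_TERMS[term]))
                i += n
                break
        else:  # no term starts at position i
            i += 1
    return hits


print(find_terms("The company was accused of a breach of law."))
```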
  • 13. INTERFACE: SUGGESTIONS PROVIDED AND USER’S CAPABILITY TO ADD NEW REWRITING OPTIONS
      - The user can suggest new words or expressions (synonyms or paraphrases).
      - The user can go back and change a choice as many times as necessary.
      - Rewritten text:
          - in red, the expressions in the source text
          - in green, the suggestions provided by SPIDER and selected by the user
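The interaction described on this slide (system suggestions, user-added options, and choices that can be changed or undone at any time) can be modelled with a small data structure. The following Python sketch is an assumption about how such state might be kept, not a description of SPIDER's code; `Segment` and `render_text` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Segment:
    source: str                                        # expression from the source text (red)
    options: List[str] = field(default_factory=list)   # system and user suggestions
    choice: Optional[int] = None                       # None keeps the source expression

    def choose(self, index: int) -> None:
        if not 0 <= index < len(self.options):
            raise IndexError("no such suggestion")
        self.choice = index                            # may be called again to change the choice

    def add_option(self, text: str) -> None:
        self.options.append(text)                      # user-supplied alternative, selected at once
        self.choice = len(self.options) - 1

    def revert(self) -> None:
        self.choice = None

    def render(self) -> str:
        return self.source if self.choice is None else self.options[self.choice]


def render_text(parts: List[Union[str, Segment]]) -> str:
    """Join plain text and segments into the current version of the text."""
    return "".join(p.render() if isinstance(p, Segment) else p for p in parts)


seg = Segment("made a decision", ["decided"])
parts = ["The board ", seg, " on Monday."]
history = [render_text(parts)]
seg.choose(0)
history.append(render_text(parts))
seg.add_option("resolved")
history.append(render_text(parts))
seg.revert()
history.append(render_text(parts))
for line in history:
    print(line)
```

Keeping the source expression alongside the options is what makes unlimited back-and-forth cheap: no choice ever overwrites the original text.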
  • 14. LINGUISTIC RESOURCES
      - Eng4NooJ: linguistic knowledge system
          - OpenLogos dictionary (http://logos-os.dfki.de/), converted into NooJ format and enhanced with new properties, including derivational, morpho-syntactic and semantic relations
          - Morphological system
          - Contextual rules and grammars
          - Domain-specific dictionary (sample “legal terms”)
  • 15. LINGUISTIC RESOURCES
      - General language dictionary entries, with morpho-syntactic and semantic relations:
          impress,V+FLX=POLISH+SAL=PVPCpleasetype+PT=impressionar+DRV=NDRV01:BOOK+VSUP=make+VSUP=cause+NPREP=on
          aesthetic,A+FLX=NATURAL+SAL=AVstate+PT=aesthetically+DRV=AVDRV03
          skepticism,N+FLX=BOOK+SAL=ABcause+PT=cepticismo+DRV=NAVDRV02
      - Rules to transform morpho-syntactically and semantically related words of different parts of speech:
          NDRV04 = <B>ion/Npred+Nom
          ADRV02 = <B>icable
          AVDRV01 = <E>ly/ADV
          AVDRV04 = <B>tically/ADV
      - Grammar to recognize adverbial compounds and transform them into equivalent single adverbs
      - Contextual rules: rules to improve precision in specific contexts, e.g. [bring(vt) N(charge; action) > present(vt) N(idem)]
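The derivation paradigms above use NooJ's morphological operators. Assuming the usual NooJ reading (`<E>` is the empty string, `<B>` deletes the last character and `<B2>` the last two), the following Python sketch applies the paradigms listed on the slide and also hard-codes the contextual `bring` rule. The example lemmas (create, apply, constructive) are chosen for illustration, since the slide does not say which entries use which paradigm; NooJ defines more operators than the two handled here, and the helper names are hypothetical.

```python
import re

# Derivation paradigms as written on the slide.
DERIVATIONS = {
    "NDRV04": "<B>ion/Npred+Nom",
    "ADRV02": "<B>icable",
    "AVDRV01": "<E>ly/ADV",
    "AVDRV04": "<B>tically/ADV",
}


def derive(lemma, paradigm):
    """Apply a paradigm; return (derived form, target category or None)."""
    form_spec, _, category = DERIVATIONS[paradigm].partition("/")
    word = lemma
    for op, count, letters in re.findall(r"<([A-Z])(\d*)>|([^<]+)", form_spec):
        if letters:
            word += letters                    # literal suffix
        elif op == "E":
            pass                               # empty string: nothing to do
        elif op == "B":
            word = word[: -int(count or 1)]    # backspace: delete last character(s)
        else:
            raise ValueError(f"operator <{op}> is not covered by this sketch")
    return word, category or None


# Contextual rule: bring(vt) N(charge; action) > present(vt) N(idem)
BRING_FORMS = {"bring": "present", "brings": "presents",
               "brought": "presented", "bringing": "presenting"}
CONTEXT_NOUNS = {"charge", "charges", "action", "actions"}
DETERMINERS = {"a", "an", "the"}


def apply_bring_rule(sentence):
    """Replace a form of 'bring' by 'present' only when its object is charge(s) or action(s)."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        target = BRING_FORMS.get(tok.lower())
        if target is None:
            continue
        j = i + 1
        if j < len(tokens) and tokens[j].lower() in DETERMINERS:
            j += 1                             # skip an optional determiner
        if j < len(tokens) and tokens[j].lower().rstrip(".,;:") in CONTEXT_NOUNS:
            tokens[i] = target
    return " ".join(tokens)


print(derive("create", "NDRV04"))       # ('creation', 'Npred+Nom')
print(derive("constructive", "AVDRV01"))  # ('constructively', 'ADV')
print(apply_bring_rule("The prosecutor brought charges."))
```

The contextual check is what keeps precision up: "She brought the book." is left alone because "book" is outside the noun class named in the rule.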
  • 16. LINGUISTIC RESOURCES
      - Sample of terms classified as Information + Instructional/legal
  • 17. EVALUATION RESULTS: PARAPHRASING PRECISION
      Corpus: 500 sentences, 100 sentences for each of 5 elementary Portuguese support verbs (pôr 'put', tomar 'take', ter 'have', dar 'give', fazer 'make/do')
      Evaluation of recognition and paraphrasing of support verb constructions (SVC):
      Verb      Recognition precision   Recognition recall   Paraphrasing precision
      Pôr       73/73 (100%)            73/100 (73%)         72/73 (98.6%)
      Tomar     75/75 (100%)            75/100 (75%)         68/73 (93.1%)
      Ter       65/65 (100%)            65/100 (65%)         59/65 (90.7%)
      Dar       57/60 (95%)             57/100 (57%)         46/51 (90.1%)
      Fazer     43/45 (95.5%)           43/100 (43%)         40/45 (88.8%)
      Average   62.6/63.6 (98.4%)       62.6/100 (62.6%)     57/61 (93.4%)
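The recognition columns of the table can be recomputed from the raw counts. In this Python sketch, precision is correct recognitions over all recognitions and recall is correct recognitions over the 100 sentences per verb, as on the slide; the data structure and function names are illustrative.

```python
# Counts from the table: verb -> (correct recognitions, all recognitions,
#                                 correct paraphrases, all paraphrases)
RESULTS = {
    "pôr": (73, 73, 72, 73),
    "tomar": (75, 75, 68, 73),
    "ter": (65, 65, 59, 65),
    "dar": (57, 60, 46, 51),
    "fazer": (43, 45, 40, 45),
}
SENTENCES_PER_VERB = 100


def per_verb(verb):
    """Return (recognition precision, recognition recall, paraphrasing precision)."""
    rec_ok, rec_all, par_ok, par_all = RESULTS[verb]
    return rec_ok / rec_all, rec_ok / SENTENCES_PER_VERB, par_ok / par_all


def pooled_recognition():
    """Recognition precision and recall with counts summed over all verbs."""
    rec_ok = sum(r[0] for r in RESULTS.values())
    rec_all = sum(r[1] for r in RESULTS.values())
    return rec_ok / rec_all, rec_ok / (SENTENCES_PER_VERB * len(RESULTS))


for verb in RESULTS:
    p, r, pp = per_verb(verb)
    print(f"{verb:<8}{p:>8.1%}{r:>8.1%}{pp:>8.1%}")
p, r = pooled_recognition()
print(f"{'pooled':<8}{p:>8.1%}{r:>8.1%}")
```

Summing the counts across verbs gives the same recognition averages as 62.6/63.6 and 62.6/100, because every verb has the same number of sentences. Formatting with one decimal rounds, whereas several slide values (for example 43/45, shown as 95.5%) appear to be truncated, so a few printed figures differ by 0.1.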
  • 18. EVALUATION RESULTS: IMPACT ON TRANSLATABILITY (MT)
      - Same corpus; 50 sentences selected randomly
          (i) support verb constructions were automatically pre-processed with SPIDER and converted into equivalent single verbs
          (ii) the pre-processed sentences (automatically generated paraphrases) and the original sentences were submitted to MT, and the output translations of both were compared
      - Results:
          - in 29 sentences (58%), the better translation was that of the automatically generated paraphrase
          - in 9 sentences (18%), the better translation was that of the original support verb construction
          - in 12 sentences (24%), both translations were equally good or equally bad
      - CONCLUSION: the experiment indicates that paraphrases such as those generated by SPIDER help improve MT output. In this test set, the automated paraphrasing of support verb constructions with SPIDER produced a marked improvement in the quality of the MT results (29 better translations vs. 9).
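The 50-sentence comparison reduces to a three-way tally. The slide does not name a statistical test; an exact two-sided sign test on the 29 vs. 9 non-tied sentences is one standard way to check whether the preference for paraphrased input could be due to chance, and it is shown here only as an illustration, not as the author's analysis. Labels and function names are hypothetical.

```python
from math import comb


def tally(judgments):
    """Count per-sentence judgments: 'paraphrase', 'original' or 'tie'."""
    counts = {"paraphrase": 0, "original": 0, "tie": 0}
    for judgment in judgments:
        counts[judgment] += 1
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    return counts, shares


def sign_test(wins, losses):
    """Exact two-sided sign test under p = 0.5; ties must be excluded beforehand."""
    n = wins + losses
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


judgments = ["paraphrase"] * 29 + ["original"] * 9 + ["tie"] * 12
counts, shares = tally(judgments)
p_value = sign_test(counts["paraphrase"], counts["original"])
print(counts)
print(shares)
print(f"two-sided sign test p = {p_value:.4f}")
```

Dropping ties before the test is the conventional choice for a sign test; it means the 12 equally good or equally bad sentences do not count for or against either input.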
  • 19. OUTLINE (repeat of slide 2)
  • 20. FUTURE APPLICATIONS?
      - Writing/authoring aid (word processing applications)
      - Language composition tool for general and technical language (e.g., student texts or legal texts)
      - Text production and style editor
      - Terminology verification tool: professional use of terminology in technical domains (elimination of informal, idiomatic and slang usage)
      - Empirical testbed for linguistic quality assurance (source and target texts)
      - Text editing (machine translation pre-editing and post-editing) and translation aid
      - Controlled language tool:
          - consistent, direct and simple language
          - restricted grammar (avoiding certain types of construction)
          - avoidance of complex reasoning, figures of speech, metaphors, etc.
          - elimination of wordiness
      - “Revision memory” tool (≈ “translation memory”): recycling of validated, reviewed sentences, structures or phrases
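A “revision memory” would work like a translation memory: store validated revisions and retrieve the closest stored sentence for new input. The Python sketch below uses character-level fuzzy matching from the standard library's `difflib`; the class name, the 0.75 threshold and the matching strategy are assumptions, since the slide only names the idea.

```python
from difflib import SequenceMatcher


class RevisionMemory:
    """Stores (original, validated revision) pairs; retrieves the closest original."""

    def __init__(self, threshold=0.75):
        self.threshold = threshold  # minimum similarity to count as a match (arbitrary)
        self.entries = []

    def add(self, original, revision):
        self.entries.append((original, revision))

    def lookup(self, sentence):
        """Return (score, original, revision) for the best match above threshold, else None."""
        best = None
        for original, revision in self.entries:
            score = SequenceMatcher(None, sentence.lower(), original.lower()).ratio()
            if score >= self.threshold and (best is None or score > best[0]):
                best = (score, original, revision)
        return best


memory = RevisionMemory()
memory.add("The committee made a decision on the budget.",
           "The committee decided on the budget.")
match = memory.lookup("The committee made a decision on the new budget.")
print(match)
```

A production tool would more likely match on tokens or lemmas than on characters, but character similarity is enough to show the retrieve-and-reuse loop.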
  • 21. FUTURE RESEARCH: FROM SPIDER TO MACHINE TRANSLATION
      Examples of Portuguese support verb constructions with "dar" and their English single-verb equivalents:
      - a fazer um estágio para dar aulas de Religião ('doing an internship to teach Religion'): dar aulas de = tutor / lecture / teach
      - começa a dar exemplos ('starts giving examples'): dar exemplos = exemplify
      - sentia-se capaz de dar um murro em quem quisesse detê-lo ('felt capable of punching anyone who tried to stop him'): dar um murro em = punch
      - gostávamos de lhe dar uma palavrinha ('we would like to have a quick word with you'): dar uma palavrinha = speak
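The alignments on this slide can be read as a small bilingual table from Portuguese support verb constructions to English single-verb candidates. The Python sketch below looks such constructions up with word-boundary matching; the table holds only the slide's examples, and the function name is hypothetical. A real system would need lemmatized, inflection-aware matching (for example, `deu exemplos`), which this sketch does not attempt.

```python
import re

# Bilingual table built only from the examples on the slide (hypothetical data structure).
DAR_SVC = {
    "dar aulas de": ["tutor", "lecture", "teach"],
    "dar exemplos": ["exemplify"],
    "dar um murro em": ["punch"],
    "dar uma palavrinha": ["speak"],
}


def suggest_english_verbs(sentence):
    """Return (construction, English candidates) for each 'dar' construction found."""
    text = sentence.lower()
    found = []
    for svc, candidates in DAR_SVC.items():
        # \b stops matches inside longer words, e.g. "guardar exemplos".
        if re.search(r"\b" + re.escape(svc) + r"\b", text):
            found.append((svc, candidates))
    return found


print(suggest_english_verbs("A fazer um estágio para dar aulas de Religião"))
print(suggest_english_verbs("gostávamos de lhe dar uma palavrinha"))
```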
  • 22. SPIDER: A SYSTEM FOR PARAPHRASING IN DOCUMENT EDITING AND REVISION: APPLICABILITY IN MACHINE TRANSLATION PRE-EDITING. Anabela Barreiro, ab@metatrad.com