CLUE-Aligner
An Alignment Tool to Annotate Pairs of
Paraphrastic and Translation Units
LREC - Portorož, May2th 2016
ANABELA BARREIRO
INESC-ID
FRANCISCO RAPOSO
INESC-ID / UTL
TIAGO LUÍS
VOICEINTERACTION
Introduction
Alignment
• Set of correspondences or relationships between linguistic units that are semantico-syntactically related
– Paraphrases (found within the same language = monolingual)
• EN: to make a distinction between | EN: to distinguish between
– Translations (found in different languages = bilingual)
• EN: to keep it simple | PT: simplificar
Alignment task
• NLP task that consists of identifying translation or paraphrastic relationships among those linguistic units (words, multiword units (MWU), or expressions) in sentence pairs that have been identified as paraphrases or translations of each other
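To make the notion of alignment as a "set of correspondences" concrete, here is a minimal sketch, assuming a simple representation that is not taken from CLUE-Aligner itself: an alignment is a set of (source index, target index) links between tokens, and a block alignment links every token of one unit to every token of its counterpart. The segmentation of the paraphrase pair below is hypothetical, and the helper names `block_links` and `aligned_text` are illustrative.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source token index, target token index)


def block_links(src_span: List[int], tgt_span: List[int]) -> Set[Link]:
    """Block alignment: link every source token of a unit to every target token of its counterpart."""
    return {(s, t) for s in src_span for t in tgt_span}


def aligned_text(links: Set[Link], src: List[str], tgt: List[str]) -> Tuple[str, str]:
    """Recover the aligned pair of units, in sentence order, from a set of links."""
    src_idx = sorted({s for s, _ in links})
    tgt_idx = sorted({t for _, t in links})
    return " ".join(src[i] for i in src_idx), " ".join(tgt[j] for j in tgt_idx)


# Monolingual paraphrase pair from the slide (EN | EN)
src = "to make a distinction between".split()
tgt = "to distinguish between".split()

# Hypothetical segmentation: to | to, make a distinction | distinguish, between | between
alignment = block_links([0], [0]) | block_links([1, 2, 3], [1]) | block_links([4], [2])

print(len(alignment))                                        # → 5
print(aligned_text(block_links([1, 2, 3], [1]), src, tgt))   # → ('make a distinction', 'distinguish')
```

Storing links as a set keeps the representation agnostic to word order, so the same structure can hold both monolingual (paraphrase) and bilingual (translation) alignments.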
Sure and Possible Alignments
• Sure alignments correspond to expressions/translations that satisfy the criteria for optimum (full) equivalence
– They are reciprocal: it is possible to translate the expression from the source to the target language and vice versa
– Optimum equivalence refers to the highest level of translation equivalence on both linguistic and extra-linguistic levels (Bayar, 2007)
– venture capital markets | mercados de capital de risco (S)
• Possible alignments correspond to expressions/translations that satisfy the criteria for approximate equivalence
– They do not meet all of the requirements for absolute equivalence and are not reciprocal with respect to the source/target language
– began | a vu le jour (P), literally "has seen the day"
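The practical consequence of reciprocity can be sketched as follows, assuming a hypothetical store of labeled unit pairs (the class and function names are illustrative, not part of CLUE-Aligner). Source-to-target lookups may use every pair, while target-to-source lookups use only sure (S) pairs, since possible (P) pairs are not reciprocal. For brevity the list mixes the slide's EN–PT and EN–FR examples.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Label(Enum):
    SURE = "S"      # optimum/full equivalence: reciprocal
    POSSIBLE = "P"  # approximate equivalence: not reciprocal


@dataclass(frozen=True)
class UnitPair:
    source: str
    target: str
    label: Label


def translations(pairs: List[UnitPair], text: str, reverse: bool = False) -> List[str]:
    """Look up the counterparts of `text`.

    Forward lookups use every pair; reverse (target-to-source) lookups
    use only sure pairs, because possible pairs are not reciprocal.
    """
    results = []
    for p in pairs:
        if not reverse and p.source == text:
            results.append(p.target)
        elif reverse and p.label is Label.SURE and p.target == text:
            results.append(p.source)
    return results


pairs = [
    UnitPair("venture capital markets", "mercados de capital de risco", Label.SURE),
    UnitPair("began", "a vu le jour", Label.POSSIBLE),
]

print(translations(pairs, "began"))                                       # → ['a vu le jour']
print(translations(pairs, "a vu le jour", reverse=True))                  # → []
print(translations(pairs, "mercados de capital de risco", reverse=True))  # → ['venture capital markets']
```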
Background
• Supervised learning uses high-quality alignments handmade by linguists (Blunsom & Cohn, 2006; Ambati et al., 2010)
– Supervised methods take into consideration context, syntax, and other grammatical and semantic information
• Guidelines for manual alignment:
– English–French: Blinker project (Melamed, 1998)
– Czech–English (Kruijff-Korbayová et al., 2006; Bojar & Prokopová, 2006)
– Spanish–English (Lambert et al., 2005)
– Paraphrase alignment guidelines (Callison-Burch et al., 2008)
Current Shortcomings
1. Lack of multilingual datasets
– Publicly available alignments are mostly bilingual, with the exception of 6 multilingual sets (Graça et al., 2008)
2. Lack of linguistically motivated alignment guidelines
– Previously proposed guidelines cover cross-linguistic phenomena superficially, leaving out important alignment challenges posed by discontiguous MWU (DMWU) and other non-adjacent linguistic phenomena or syntactic discontinuity (e.g., extraposition, topicalization)
3. Lack of tools
– Existing tools handle DMWU and phrasal expressions poorly; these are complex to align and must be represented as non-contiguous block alignments
Related Alignment Tools
– Alpaco (Blinker project) (Rassier & Pedersen, 2003)
– ICA, Interactive Clue Aligner (Tiedemann, 2003; 2004; 2011)*
– Yawat (Germann, 2008)
– SWIFT (Gilmanov et al., 2014)
– among others
*The "clue alignment approach" is based mainly on word-level alignment clues. Our approach is based on manual alignment of cross-language MWU and phrasal expressions, which allows representing semantically equivalent non-adjacent structures, such as DMWU in translation and paraphrasing.
CLUE-Aligner
• Interactive web alignment tool inspired by Linear-B (Callison-Burch & Bannard, 2004; Callison-Burch, 2007)
• Allows the block alignment of contiguous MWU and DMWU
• Uses a matrix visualization and a coloring scheme that help distinguish between sure and possible alignments
• Allows storage of pairs of paraphrastic units, with the place of insertions indicated by "[ ]"
– I urge [ ] to | Exorto [ ] a
– This feature is valuable in the construction of translation rules or grammars, and of syntactic parsers that use those paraphrastic pairs, for which precision is important
– It is also important in ML, to help learn constituents
CLUE* = Cross-Language Unit Elicitation
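The "[ ]" insertion notation can be derived mechanically from a discontiguous unit. Below is a minimal sketch, assuming each unit is given as a list of token indices. The helper `with_insertions` and both sentences are hypothetical; only the resulting pair matches the slide.

```python
from typing import List


def with_insertions(tokens: List[str], unit: List[int], marker: str = "[ ]") -> str:
    """Render a (possibly discontiguous) unit, replacing each gap between its tokens with `marker`."""
    idx = sorted(unit)
    if not idx:
        return ""
    parts = [tokens[idx[0]]]
    for prev, cur in zip(idx, idx[1:]):
        if cur > prev + 1:  # tokens skipped here form an insertion
            parts.append(marker)
        parts.append(tokens[cur])
    return " ".join(parts)


# Hypothetical sentence pair; only the unit indices matter here.
en = "I urge the Member States to act".split()
pt = "Exorto os Estados-Membros a agir".split()

en_unit = [0, 1, 5]  # I, urge, to
pt_unit = [0, 3]     # Exorto, a

print(with_insertions(en, en_unit), "|", with_insertions(pt, pt_unit))
# → I urge [ ] to | Exorto [ ] a
```

Collapsing each gap into a single marker, rather than recording its length, is what lets one stored pair generalize over insertions of any size.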
CLUE-Aligner Interface
[Figure: CLUE-Aligner interface screenshots. Panels: "Single Word Alignments and Block Alignments"; "Discontiguous Multiwords and Insertions". Annotations: insertion; pre-processing of contracted forms; still | ainda]
• Black cells represent full/optimal semantic correspondence
• Grey cells represent approximate semantic correspondence
• Light orange cell groups represent unaligned P-insertions
• Dark orange cell groups represent unaligned S-insertions
• Light green cells / cell groups represent aligned P-insertions
• Dark green cells / cell groups represent aligned S-insertions
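The matrix view can be approximated in plain text. This is a minimal sketch in which symbols stand in for the interface colors, a choice made only for this example: "#" for black (sure), "+" for grey (possible), and "." for unaligned. Rows are source tokens and columns are target token indices. The links are hypothetical, chosen only to exercise both cell types, and are not an annotation from the gold collection.

```python
from typing import Dict, List, Tuple

# Plain-text stand-ins for the interface colors (an assumption of this sketch)
SYMBOL = {"S": "#", "P": "+"}


def render(src: List[str], tgt: List[str], links: Dict[Tuple[int, int], str]) -> str:
    """Render an alignment matrix: source tokens as rows, target token indices as columns."""
    width = max(len(w) for w in src)
    rows = [" " * width + " " + " ".join(str(j) for j in range(len(tgt)))]
    for i, word in enumerate(src):
        cells = " ".join(SYMBOL.get(links.get((i, j), ""), ".") for j in range(len(tgt)))
        rows.append(word.rjust(width) + " " + cells)
    return "\n".join(rows)


src = ["venture", "capital", "markets"]
tgt = ["mercados", "de", "capital", "de", "risco"]
links = {(0, 3): "P", (0, 4): "P", (1, 2): "S", (2, 0): "S"}  # hypothetical links

print(render(src, tgt, links))
```

Keying links by (row, column) makes each cell lookup constant-time, and unaligned cells need no storage at all.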
Alignment Guidelines
• Inspired by the Logos Model (Scott, 2003; Barreiro et al., 2011), which relies on deep semantico-syntactic analysis to translate contiguous MWU and DMWU, which MT systems often mistranslate; this approach has proven successful in commercial MT systems
– to draw a distinction between
– to bring [INSERTION] to a conclusion
– I would urge the European Commission to bring the process of adopting the directive on additional pensions to a conclusion
• Supported by the Lexicon-Grammar theoretical framework and transformational grammar (Gross, 1968; 1975)
• Aligning the pairs of translation units resulted in a gold collection, which CLUE-Aligner made achievable
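To illustrate how a unit with an insertion slot relates to a real sentence, here is a minimal sketch that compiles the pattern into a regular expression and recovers the inserted material. It is only a surface-string approximation: the Logos-inspired approach relies on deep semantico-syntactic analysis, which this does not attempt. The helper names are hypothetical. The lazy group stops at the first following occurrence of the fixed text.

```python
import re
from typing import Optional, Tuple

SLOT = "[INSERTION]"


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a unit with insertion slots; each slot captures one or more words."""
    fixed = [re.escape(part.strip()) for part in pattern.split(SLOT)]
    return re.compile(r"\b" + r"\s+(.+?)\s+".join(fixed) + r"\b")


def find_insertion(pattern: str, sentence: str) -> Optional[Tuple[str, ...]]:
    """Return the inserted material (one string per slot), or None if the unit is absent."""
    m = pattern_to_regex(pattern).search(sentence)
    return m.groups() if m else None


sentence = ("I would urge the European Commission to bring the process of adopting "
            "the directive on additional pensions to a conclusion")

print(find_insertion("to bring [INSERTION] to a conclusion", sentence))
# → ('the process of adopting the directive on additional pensions',)
```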
CLUE-Aligner
• Allows visualization of automatic phrase alignments and can be used to correct inaccurate alignments
– It can load previously (and possibly automatically) generated alignments (segments) for the parallel sentences
• Allows alignment of smaller individual words or MWU inside DMWU
• Useful in human and machine translation evaluation
• Future development plans include automatic alignment
– Alignments containing pairs of paraphrastic or translation units can be used to train ML systems
• Developed within the scope of the eSPERTo project
https://esperto.l2f.inesc-id.pt/esperto/aligner/index.pl?
Use of Paraphrastic Units in eSPERTo
The American man
– the man who is American
– the man from America
– the man with American nationality
– …
https://esperto.l2f.inesc-id.pt/esperto/esperto/demo.pl
Conclusions and Future Work
• Linguistically based alignments extracted from quality corpora:
– Contribute to increased precision and recall in SMT systems, with subsequent improvement of translation quality
– Are a valuable asset for applications that require monolingual paraphrases
• We moved forward by creating a tool that handles non-adjacent structures, allowing the alignment of DMWU and phrasal expressions to improve translation applications
• Planned improvements to CLUE-Aligner include:
– Feeding it with existing translation or paraphrastic knowledge, previously aligned or generated with a linguistic processing tool
– Enhancing it to automatically align and extract large amounts of alignment pairs, to be applied to paraphrasing and MT case studies
Thank you!
Acknowledgements
This research work was supported by Fundação para a Ciência e Tecnologia (FCT), under project eSPERTo
EXPL/MHC-LIN/2260/2013, UID/CEC/50021/2013, and post-doctoral grant SFRH/BPD/91446/2012
AMB-Review
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Jay Das
 

Recently uploaded (20)

May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
 

CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units

  • 1. technology from seed
    CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units
    LREC, Portorož, May2th 2016
    Anabela Barreiro (INESC-ID), Francisco Raposo (INESC-ID / UTL), Tiago Luís (VoiceInteraction)
  • 2. Introduction
    Alignment
    • A set of correspondences or relationships between linguistic units that are semantico-syntactically related
      – Paraphrases (found within the same language = monolingual)
        • EN: to make a distinction between | EN: to distinguish between
      – Translations (found in different languages = bilingual)
        • EN: to keep it simple | PT: simplificar
    Alignment task
    • An NLP task that consists of identifying the translation or paraphrastic relationships among these linguistic units (words, multiword units (MWU), or expressions) in sentence pairs that have been identified as paraphrases or translations of each other
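A minimal sketch of how such an alignment can be represented as a set of links between token positions, assuming plain whitespace tokenization. The slides do not describe CLUE-Aligner's internal data model; `block_links` is a hypothetical helper that expands one block alignment into the word-level links it covers.

```python
from itertools import product


def block_links(src_positions, tgt_positions):
    """Expand a block alignment into the word-level (source, target) links it covers."""
    return set(product(src_positions, tgt_positions))


# Bilingual block: every English token is linked to the single Portuguese token.
en = "to keep it simple".split()
pt = ["simplificar"]
translation = block_links(range(len(en)), range(len(pt)))

# Monolingual (paraphrase) alignment built from three blocks.
p1 = "to make a distinction between".split()
p2 = "to distinguish between".split()
paraphrase = (
    block_links([0], [0])            # to | to
    | block_links([1, 2, 3], [1])    # make a distinction | distinguish
    | block_links([4], [2])          # between | between
)

for s, t in sorted(paraphrase):
    print(f"{p1[s]} -> {p2[t]}")
```

Storing blocks as position sets rather than strings keeps repeated words (e.g., two occurrences of "to") distinguishable.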
  • 3. Sure and Possible Alignments
    • Sure alignments correspond to expressions/translations that satisfy the criteria for optimum/full equivalence
      – They are reciprocal: it is possible to translate the expression from the source to the target language and vice versa
      – Optimum equivalence refers to the highest level of translation equivalence on both linguistic and extra-linguistic levels (Bayar, 2007)
      – venture capital markets | mercados de capital de risco (S)
    • Possible alignments correspond to expressions/translations that satisfy the criteria for approximate equivalence
      – They do not meet all of the requirements for absolute equivalence, and they are not reciprocal with respect to the source/target language
      – began | a vu le jour (P) (literally "has seen the day")
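To make the S/P distinction concrete, here is a small sketch that attaches a sure or possible label to each block and prints it in the "source | target (label)" notation used above. The `BlockAlignment` class and `render` function are hypothetical illustrations, not CLUE-Aligner's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockAlignment:
    src: tuple   # source token positions
    tgt: tuple   # target token positions
    label: str   # "S" (sure) or "P" (possible)


def render(block, src_tokens, tgt_tokens):
    """Format a block alignment as 'source | target (label)'."""
    src = " ".join(src_tokens[i] for i in block.src)
    tgt = " ".join(tgt_tokens[j] for j in block.tgt)
    return f"{src} | {tgt} ({block.label})"


en = "venture capital markets".split()
pt = "mercados de capital de risco".split()
sure = BlockAlignment((0, 1, 2), (0, 1, 2, 3, 4), "S")
print(render(sure, en, pt))  # venture capital markets | mercados de capital de risco (S)

began = BlockAlignment((0,), (0, 1, 2, 3), "P")
print(render(began, ["began"], "a vu le jour".split()))  # began | a vu le jour (P)
```

The label belongs to the whole block rather than to individual word links, matching the expression-level judgements in the slide's examples.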
  • 4. Background
    • Supervised learning uses high-quality alignments hand-made by linguists (Blunsom & Cohn, 2006; Ambati et al., 2010)
      – Supervised methods take into consideration context, syntax, and other grammatical and semantic information
    • Guidelines for manual alignment:
      – English–French: Blinker project (Melamed, 1998)
      – Czech–English (Kruijff-Korbayová et al., 2006; Bojar & Prokopová, 2006)
      – Spanish–English (Lambert et al., 2005)
      – Paraphrase alignment guidelines (Callison-Burch et al., 2008)
  • 5. Current Shortcomings
    1. Lack of multilingual datasets
       – Publicly available alignments are mostly bilingual, with the exception of 6 multilingual sets (Graça et al., 2008)
    2. Lack of linguistically motivated alignment guidelines
       – Previously proposed guidelines cover cross-linguistic phenomena only superficially, excluding important alignment challenges posed by discontiguous MWU (DMWU) and other non-adjacent linguistic phenomena or syntactic discontinuity (e.g., extraposition, topicalization)
    3. Lack of tools
       – Existing tools are inefficient with DMWU and phrasal expressions, which are complex to align and require representation as non-contiguous block alignments
  • 6. Related Alignment Tools
    – Alpaco: Blinker project (Rassier & Pedersen, 2003)
    – ICA: Interactive Clue Aligner (Tiedemann, 2003; 2004; 2011)*
    – Yawat (Germann, 2008)
    – SWIFT (Gilmanov et al., 2014)
    – among others
    * The "clue alignment approach" is based mainly on word-level alignment clues. Our approach is based on manual alignments of cross-language MWU and phrasal expressions, which allows the representation of semantically equivalent non-adjacent structures, such as DMWU, in translation and paraphrasing.
  • 7. CLUE-Aligner*
    • Web-based interactive alignment tool inspired by Linear-B (Callison-Burch & Bannard, 2004; Callison-Burch, 2007)
    • Allows the block alignment of contiguous MWU and DMWU
    • Uses a matrix visualization and a coloring scheme that help distinguish between sure and possible alignments
    • Allows storage of pairs of paraphrastic units, with the place of insertions indicated by "[ ]"
      – I urge [ ] to | Exorto [ ] a
      – This feature is valuable in building translation rules, grammars, and syntactic parsers that use those paraphrastic pairs, for which precision is important
      – It is also important in ML, to help learn constituents
    * CLUE = Cross-Language Unit Elicitation
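A minimal sketch of how a stored pair with "[ ]" insertion markers could be produced from a discontiguous block: the aligned tokens are joined in order, and a marker is written wherever the unit skips over tokens. The sentence pair and the `render_unit` helper are hypothetical, chosen only to reproduce the slide's "I urge [ ] to | Exorto [ ] a" example.

```python
def render_unit(tokens, positions, gap="[ ]"):
    """Join the tokens at `positions`, writing `gap` wherever the unit skips tokens.

    Consecutive positions are joined directly; any jump of more than one
    position marks an insertion point inside a discontiguous unit.
    """
    ps = sorted(positions)
    if not ps:
        raise ValueError("a unit needs at least one token")
    out = [tokens[ps[0]]]
    for prev, cur in zip(ps, ps[1:]):
        if cur - prev > 1:
            out.append(gap)
        out.append(tokens[cur])
    return " ".join(out)


# Hypothetical sentence pair (not from the slides).
en = "I urge the Commission to act".split()
pt = "Exorto a Comissão a agir".split()
pair = f"{render_unit(en, {0, 1, 4})} | {render_unit(pt, {0, 3})}"
print(pair)  # I urge [ ] to | Exorto [ ] a
```

One marker stands for any gap length, so the stored pair generalizes over whatever material fills the insertion.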
  • 8. (Screenshot: alignment matrix with two regions labeled "insertion")
    • Black cells represent full/optimal semantic correspondence
    • Grey cells represent approximate semantic correspondence
    • Light orange cell groups represent unaligned P-insertions
    • Dark orange cell groups represent unaligned S-insertions
  • 9. CLUE-Aligner Interface
    (Screenshot panels: "Single Word Alignments and Block Alignments" and "Discontiguous Multiwords and Insertions"; callouts: "pre-processing of contracted forms", "still | ainda")
    • Light green cells / cell groups represent aligned P-insertions
    • Dark green cells / cell groups represent aligned S-insertions
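Colors cannot be reproduced in plain text, but the same matrix layout can be approximated with characters. The sketch below is a hypothetical text rendering, not CLUE-Aligner's own output: rows are source tokens, columns are target tokens, "#" stands for a sure (black) cell, "+" for a possible (grey) cell, and "." for no link. Insertion colors are not modeled.

```python
def render_matrix(src, tgt, sure, possible):
    """Text version of an alignment matrix (rows: source tokens, columns: target tokens).

    '#' marks a sure link, '+' a possible link, '.' no link; sure wins if a
    cell appears in both sets. Column headers are target indices modulo 10.
    """
    width = max(len(w) for w in src)
    lines = [" " * (width + 1) + " ".join(str(j % 10) for j in range(len(tgt)))]
    for i, word in enumerate(src):
        cells = []
        for j in range(len(tgt)):
            if (i, j) in sure:
                cells.append("#")
            elif (i, j) in possible:
                cells.append("+")
            else:
                cells.append(".")
        lines.append(f"{word:<{width}} " + " ".join(cells))
    return "\n".join(lines)


tgt = "a vu le jour".split()
print(render_matrix(["began"], tgt, sure=set(), possible={(0, j) for j in range(len(tgt))}))
```

A block alignment appears as a rectangle of marked cells, which is what makes discontiguous units visible as separated groups in the matrix.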
  • 10. Alignment Guidelines
    • Inspired by the Logos Model (Scott, 2003; Barreiro et al., 2011), which relies on deep semantico-syntactic analysis to translate contiguous MWU and DMWU, which are often mistranslated by MT systems
      – This approach has proven successful in commercial MT systems
      – to draw a distinction between
      – to bring [INSERTION] to a conclusion
        • I would urge the European Commission to bring the process of adopting the directive on additional pensions to a conclusion
    • Supported by the Lexicon-Grammar theoretical framework and transformational grammar (Gross, 1968; 1975)
    • Aligning the pairs of translation units produced a gold collection, which was achievable thanks to CLUE-Aligner
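As a rough illustration of why slot notation such as "to bring [INSERTION] to a conclusion" is useful, the sketch below turns such a unit into a regular expression and recovers the inserted material from the slide's example sentence. This is a surface-level stand-in only: it is not the Logos Model's semantico-syntactic analysis, `compile_dmwu` is a hypothetical helper, and it assumes slots occur inside the unit rather than at its edges.

```python
import re


def compile_dmwu(pattern, slot="[INSERTION]"):
    """Turn a unit such as 'to bring [INSERTION] to a conclusion' into a regex.

    Each slot becomes a lazy capture group, so the match stops at the first
    occurrence of the words that follow the slot.
    """
    parts = [re.escape(p.strip()) for p in pattern.split(slot)]
    return re.compile(r"\b" + r" (.+?) ".join(parts) + r"\b")


sentence = ("I would urge the European Commission to bring the process of "
            "adopting the directive on additional pensions to a conclusion")
match = compile_dmwu("to bring [INSERTION] to a conclusion").search(sentence)
print(match.group(1))
# the process of adopting the directive on additional pensions
```

The lazy group is a deliberate simplification; nested or repeated units would need real parsing to resolve correctly.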
  • 11. CLUE-Aligner
    • Allows visualization of automatic phrase alignments and can be used to correct inaccurate alignments
      – It can load previously (and possibly automatically) generated alignments (segments) for the parallel sentences
    • Allows alignment of smaller individual units or MWU inside DMWU
    • Useful in human and machine translation evaluation
    • Future development plans include automatic alignment
      – Alignments containing pairs of paraphrastic or translation units can be used to train ML systems
    • Developed under the scope of the eSPERTo project: https://esperto.l2f.inesc-id.pt/esperto/aligner/index.pl?
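The slides do not state which file format CLUE-Aligner imports. As an assumption for illustration, the sketch below uses the widely used Pharaoh-style "i-j" format (one source-target index pair per link, as produced by aligners such as fast_align) to show the load, correct, and save cycle on a hypothetical automatic alignment. `parse_links` and `format_links` are hypothetical helpers.

```python
def parse_links(line):
    """Parse a Pharaoh-style alignment line such as '0-0 1-2' into (src, tgt) pairs."""
    links = set()
    for item in line.split():
        s, t = item.split("-")
        links.add((int(s), int(t)))
    return links


def format_links(links):
    """Serialize links back to 'i-j' format, sorted by source then target index."""
    return " ".join(f"{s}-{t}" for s, t in sorted(links))


# Hypothetical automatic alignment for a four-token sentence pair.
automatic = parse_links("0-0 1-1 2-1 3-2")

# Manual correction: drop a wrong link and add the missing one.
corrected = (automatic - {(2, 1)}) | {(2, 2)}
print(format_links(corrected))  # 0-0 1-1 2-2 3-2
```

This flat link format has no place for S/P labels or block boundaries, which is one reason a richer representation is needed for DMWU.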
  • 12. Use of Paraphrastic Units in eSPERTo
    The American man
      – the man who is American
      – the man from America
      – the man with American nationality
      – …
    Demo: https://esperto.l2f.inesc-id.pt/esperto/esperto/demo.pl
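A minimal sketch of how stored paraphrastic units could be applied to generate sentence variants, using the slide's example. The table and the `paraphrase` function are hypothetical; the slides do not describe how eSPERTo stores or matches units. This version uses case-sensitive plain substitution, so "The American man" at the start of a sentence would not match the lowercase entry.

```python
PARAPHRASES = {
    # Paraphrastic units taken from the slide example.
    "the American man": [
        "the man who is American",
        "the man from America",
        "the man with American nationality",
    ],
}


def paraphrase(sentence, table):
    """Return one variant of `sentence` per alternative of each unit it contains."""
    variants = []
    for unit, alternatives in table.items():
        if unit in sentence:
            variants.extend(sentence.replace(unit, alt, 1) for alt in alternatives)
    return variants


for v in paraphrase("I met the American man at the conference.", PARAPHRASES):
    print(v)
```

String substitution ignores agreement and context; a linguistically informed system would check these before proposing a variant.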
  • 13. Conclusions and Future Work
    • Linguistically based alignments extracted from quality corpora:
      – contribute to increased precision and recall in SMT systems, with a subsequent improvement in translation quality
      – are a valuable asset for applications that require monolingual paraphrases
    • We moved forward by creating a tool that handles non-adjacent structures, allowing the alignment of DMWU and phrasal expressions to improve translation applications
    • Planned improvements to CLUE-Aligner include:
      – feeding it with existing translation or paraphrastic knowledge, previously aligned or generated with a linguistic processing tool
      – enhancing it to automatically align and extract large numbers of alignment pairs for use in paraphrasing and MT case studies
  • 14. Thank you!
    Acknowledgements
    This research work was supported by Fundação para a Ciência e Tecnologia (FCT), under project eSPERTo EXPL/MHC-LIN/2260/2013, UID/CEC/50021/2013, and post-doctoral grant SFRH/BPD/91446/2012