This paper reports our first attempt at integrating eSPERTo’s paraphrastic engine, which is based on the NooJ platform, into two application scenarios: a conversational agent and a summarization system. We briefly describe eSPERTo’s base resources and the modifications to these resources that enabled the production of the paraphrases required to feed both systems. Although the improvement observed in both scenarios is not significant, we present a detailed error analysis that should help improve the results in future experiments.
eSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and Summarization
Cristina Mota
Luísa Coheur
Ricardo Ribeiro
Francisco Raposo
Anabela Barreiro
NooJ International Conference, České Budějovice, June 10th, 2016
3. eSPERTo – System for Paraphrasing in Editing and Revision of Texts
• Main objective
– Design and development of a linguistically enhanced paraphrase generator
• Semantico-syntactic and multiword units
• Sensitive to context
• Method
– Hybrid system, combining statistics and linguistic knowledge to identify and generate new and
more complex paraphrases
– Exploitation of existing paraphrasing resources
• Web platform
– Interactive application to help Portuguese language learners in producing and revising their
texts
– Text-editing mechanisms which provide a variety of alternatives for each expression
– Users can choose or suggest expressions that can be immediately applied to their text
– Support for writing optimization, understandability and translatability
Introduction to the eSPERTo Project
5. eSPERTo Paraphrase Processing
noojapply pt result.ind lr.no(d|m)* sr.nog* REESCREVE.nog text.txt
eSPERTo Web Interface: user configuration
eSPERTo Web Interface: result presentation
teste.txt:0,17,O homem que é americano
teste.txt:0,17,O homem de América
teste.txt:0,17,O homem de nacionalidade americana
teste.txt:0,17,O homem de naturalidade americana
teste.txt:0,17,O homem de origem americana
teste.txt:0,39,o trabalho foi apresentado por O homem americano
teste.txt:18,10,efectuar apresentação
teste.txt:18,10,fazer apresentação
teste.txt:18,10,realizar apresentação
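The result lines above appear to follow a `file:offset,length,paraphrase` layout. The sketch below shows one way such lines could be grouped by matched span. Reading the two numbers as a character offset and a span length is an assumption inferred from the sample, not documented noojapply behaviour, and the names `Paraphrase`, `parse_line` and `group_by_span` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Paraphrase(NamedTuple):
    source: str   # file name before the first colon
    offset: int   # assumed: character offset of the matched span
    length: int   # assumed: length of the matched span
    text: str     # suggested paraphrase


def parse_line(line: str) -> Paraphrase:
    """Split one 'file:offset,length,paraphrase' line into its fields."""
    # Assumes the file name itself contains no colon.
    source, rest = line.split(":", 1)
    # Split at most twice so commas inside the paraphrase are preserved.
    offset, length, text = rest.split(",", 2)
    return Paraphrase(source, int(offset), int(length), text)


def group_by_span(lines):
    """Map each (file, offset, length) span to its list of paraphrases."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line:
            p = parse_line(line)
            groups[(p.source, p.offset, p.length)].append(p.text)
    return dict(groups)


sample = """\
teste.txt:0,17,O homem que é americano
teste.txt:0,17,O homem de América
teste.txt:18,10,efectuar apresentação
teste.txt:18,10,fazer apresentação
""".splitlines()

spans = group_by_span(sample)
for (source, offset, length), options in sorted(spans.items()):
    print(f"{source} [{offset}:{offset + length}] -> {options}")
```

Grouping by span mirrors what the web interface has to do when it offers several alternatives for the same expression.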
7. • Port4NooJ is the Portuguese module for NooJ (Silberztein
2005, 2016)
• Derived from OpenLogos EN-PT bilingual resources
(http://logos-os.dfki.de/)
• Enhanced with new properties, including derivational and
morpho-syntactic properties, semantic relations, and paraphrastic
knowledge
eSPERTo Resources: Port4NooJ 2.0
8. • Semantico-Syntactic Abstraction Language (SAL) properties
• Multiword Units
• Support Verb Constructions
• Inflectional and Derivational Descriptions
• Grammars
– Morphological: handle contractions
– Syntactic:
• identify and annotate dates and temporal expressions,
• disambiguate words or sequences of words, i.e., filter out
lexical or syntactic annotations in the text
• paraphrase several types of constructions
• translate simple sentences
eSPERTo Resources: Port4NooJ 2.0
9. eSPERTo Paraphrases (subset)
• Support verb constructions into single verbs
– to make a decision = to decide
– to give support to N(AN) = to support N(AN)
– to get into contact with = to contact
• Support verb constructions into their stylistic variants
– to make an audit = to perform an audit
– to make an impression = to create an impression
• Adverbs (compounds into single adverbs)
– in a constructive way = constructively
• Agentive passives into actives (and vice-versa)
– the young man is released by the police officer
= the police officer releases the young man
10. • Adjective constructions supported by different copulative verbs
– estar perdido (to be lost) = andar perdido (walk around lost)
• Constructions involving patronymic adjectives
– de origem portuguesa (of Portuguese origin/roots) = portugueses (Portuguese) = de Portugal
(from Portugal)
• Generic noun phrases
– é um indivíduo estúpido (he is a stupid individual) = é um estúpido (he is a fool) = é estúpido (he is stupid)
• Cross-constructions
– o idiota do rapaz (the idiot of the boy) = o rapaz é um idiota (the boy is an idiot)
• Appropriate noun constructions
– foi moderado nos seus comentários (he was moderate in his comments) = os seus comentários
foram moderados (his comments were moderate) = foi moderado (he was moderate)
eSPERTo Paraphrases (subset)
12. Application 1 – Question-answering
• EDGAR has a knowledge base built on question/answer pairs
• Explore eSPERTo paraphrases to enrich EDGAR’s knowledge base:
provide all possible ways of rewriting the same question
• EDGAR calculates the lexical distance between a user utterance
and each question in the knowledge base. The question with the
shortest distance to the user utterance triggers the answer
• The paraphrase generator allows the same answer to be given to
semantically equivalent questions
Scenario: EDGAR is a conversational agent that answers
visitors’ questions in a museum (Fialho et al., 2013).
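The lookup described on this slide can be sketched as a nearest-question search over a paraphrase-enriched knowledge base. This is a minimal illustration, not EDGAR's implementation: the slides do not say which lexical distance is used, so Jaccard distance over word sets is an assumption, and the second question/answer pair and all function names are hypothetical.

```python
def tokens(text: str) -> set:
    """Lowercase, split on whitespace and drop surrounding punctuation."""
    return {t.strip("?!.,;:") for t in text.lower().split()} - {""}


def lexical_distance(a: str, b: str) -> float:
    """Jaccard distance between word sets (assumed metric): 0.0 = same words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def enrich(kb: dict, paraphrases: dict) -> dict:
    """Copy kb, attaching each paraphrase to the answer of its source question."""
    enriched = dict(kb)
    for question, variants in paraphrases.items():
        if question in kb:
            for variant in variants:
                enriched.setdefault(variant, kb[question])
    return enriched


def answer(utterance: str, kb: dict) -> str:
    """Return the answer of the knowledge-base question closest to the utterance."""
    best = min(kb, key=lambda q: lexical_distance(utterance, q))
    return kb[best]


kb = {
    "Onde é que nasceste?": "Nasci em Portugal, mas sou Inglês, …",
    "Como é que te chamas?": "Chamo-me Edgar.",  # hypothetical pair
}
# Variants of the kind a paraphrase generator could supply.
paraphrases = {"Onde é que nasceste?": ["És de onde?", "És de Portugal?"]}
kb_plus = enrich(kb, paraphrases)

print(answer("és de onde", kb_plus))
```

With the enriched knowledge base, the utterance matches the paraphrase "És de onde?" exactly instead of relying on a single shared word with the original question, which is the effect the enrichment aims for.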
13. Application 1 – Question-answering
Onde é que nasceste?
Nasceste onde?
Qual é que é o seu local de nascimento?
O seu local de nascimento é qual?
Qual é que é a tua nacionalidade?
A tua nacionalidade é qual?
És de onde?
És daqui?
És português?
De onde é que és?
És de Portugal?
És de origem portuguesa?
És de nacionalidade portuguesa?
Nasci em Portugal, mas sou Inglês, …
14. A1 – Question-answering Evaluation
• EDGAR’s KB originally had 848 sentences
• eSPERTo matched sequences from these sentences 2028 times,
359 of which were unique matches
• To avoid looping during the expansion of the knowledge base,
some paraphrases such as ingleses / que são ingleses (English /
that are English) were discarded
                   Recall   Precision   F-Measure
Baseline           0.7972   0.7889      0.7930
Baseline+eSPERTo   0.8149   0.7763      0.7951
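The F-Measure column is consistent with the balanced F-measure, the harmonic mean of precision and recall. The short check below recomputes it from the recall and precision values in the table; treating the reported figure as balanced F1 is an assumption, although the numbers agree to four decimals.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs copied from the evaluation table.
results = {
    "Baseline": (0.7972, 0.7889),
    "Baseline+eSPERTo": (0.8149, 0.7763),
}

for name, (recall, precision) in results.items():
    print(f"{name}: F = {f_measure(precision, recall):.4f}")
```

The recomputed values make the trade-off explicit: recall rises by about 1.8 points while precision drops by about 1.3, so the F-measure moves only from 0.7930 to 0.7951.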
15. • Qual é que é o seu nome [completo ficar completo] ?
– multiword: nome completo
– disambiguation: completo, V should be eliminated, leaving
just completo, A
• [Como Tomar comida] é que te chamas?
– Priority dictionary: como, ADV; como, CONJ
• [Vives Fazer vida] onde?
– Vsup should inflect as the original verb
A1 – Question-answering Evaluation
17. Application 2 – Summarization
• Explore eSPERTo paraphrases to identify redundant information:
rewrite different but equivalent phrases with the same paraphrase
• Main challenge: identify the best candidate among the equivalent
expressions to be used in rewriting the text
• eSPERTo was used in the summarization pre-processing phase
• Evaluation was done with TeMário, a corpus of 100 newspaper articles in
Brazilian Portuguese (Pardo and Rino 2003): different versions of TeMário
were generated by using different paraphrasing grammars
Scenario: Summarization component (Ribeiro, 2011) of
SSNT, a system for selective dissemination of multimedia
content (Neto et al., 2003; Trancoso et al., 2003; Amaral et al., 2007)
18. • Evaluation of three different groups of paraphrases:
– (i) active/passive
– (ii) constructions involving patronymic adjectives: is it better to use
the shortest construction (o prefeito carioca) or the one with the
equivalent toponym (o prefeito do Rio de Janeiro)?
– (iii) simple adverb (rapidamente) / equivalent adjectival (de
modo|forma|jeito rápido/a) or nominal (com rapidez)
construction
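One possible reading of the "shortest construction" option is sketched below: every member of a set of equivalent expressions is rewritten to the shortest member, so that equivalent phrases become string-identical before summarization. The equivalence sets are hand-written stand-ins for what the eSPERTo grammars would produce, "shortest" is taken to mean fewest words and then fewest characters, and the example sentences and function names are hypothetical.

```python
import re

# Hypothetical equivalence classes; the real ones would come from eSPERTo grammars.
EQUIVALENTS = [
    {"rapidamente", "de modo rápido", "de forma rápida", "com rapidez"},
    {"o prefeito carioca", "o prefeito do Rio de Janeiro"},
]


def shortest(variants: set) -> str:
    """Pick the variant with the fewest words, then fewest characters (assumed criterion)."""
    return min(variants, key=lambda v: (len(v.split()), len(v), v))


def normalize(text: str, classes=EQUIVALENTS) -> str:
    """Rewrite every known variant in text to the shortest member of its class."""
    for variants in classes:
        target = shortest(variants)
        # Longer variants first, so a long phrase is never partially rewritten.
        for variant in sorted(variants - {target}, key=len, reverse=True):
            # Lookarounds keep the match on whole words only.
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            text = re.sub(pattern, lambda m: target, text)
    return text


a = normalize("o prefeito do Rio de Janeiro respondeu de modo rápido")
b = normalize("o prefeito carioca respondeu com rapidez")
print(a)
print(a == b)
```

After normalization the two sentences are identical, which is what lets a redundancy check treat them as the same information. Choosing the toponym form instead would only require a different selection function in place of `shortest`.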
Application 2 – Summarization
19. Quais seriam as reações desejáveis no campo macroeconômico por parte das autoridades
da Europa e do Japão .
…
Com frequência cada vez maior as ações e os bônus parecem mover-se juntos .
Quando os bancos centrais derrubam os preços dos títulos as ações tendem a
acompanhá-los mesmo quando o aperto do crédito foi desencadeado pela perspectiva
de lucros e produção em alta .
…
E se vários países atuassem em conjunto Espanha Itália França e Reino Unido uma
modesta apreciação do dólar melhoraria a competitividade européia
de maneira muito oportuna .
Application 2 – Summarization
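Example of paraphrasing with shortest constructions:
– da Europa e do Japão → europeias e nipónico
– Com frequência → frequentemente
– o aperto do crédito foi desencadeado pela perspectiva → a perspectiva desencadeou o aperto do crédito
– de maneira muito oportuna → oportunamente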
20. Application 2 – Summarization
ID      Paraphrase type                                      Documents rewritten   Sequences rewritten
1       ADVmente → (de modo | maneira | jeito A) | (com N)   80                    215
2       (de modo | maneira | jeito A) → ADVmente             73                    322
3       SAN                                                  70                    305
4       Passive → Active                                     7                     7
5       Active → Passive                                     33                    58
2,3,4   Shortest                                             90                    682
22. • First impression
– Minor improvements in the performance of both the
conversational agent and the summarization task
• Next steps
– Analyze thoroughly:
• the results of paraphrasing, to correct problems at the source (eSPERTo)
and to identify domain-specific problems and solutions
• differences in performance with and without paraphrasing, to
identify the best parameterization of the paraphrasing resources
– Adapt eSPERTo resources to each case scenario
Conclusions and Future Work
23. Thank you!
Acknowledgements
This research work was supported by Fundação para a Ciência e a Tecnologia (FCT), under project eSPERTo
EXPL/MHC-LIN/2260/2013, UID/CEC/50021/2013, and post-doctoral grant SFRH/BPD/91446/2012.