ESPERTO’S PARAPHRASTIC KNOWLEDGE
APPLIED TO QUESTION-ANSWERING
AND SUMMARIZATION
Cristina Mota
Luísa Coheur
Ricardo Ribeiro
Francisco Raposo
Anabela Barreiro
NooJ International Conference, České Budějovice, June 10th, 2016
Overview
eSPERTo Paraphrasing
Edgar, Virtual QA Agent
(Fialho et al. 2013)
SSNT Summarization
(Ribeiro 2011)
eSPERTo – System for Paraphrasing in Editing and Revision of Texts
• Main objective
– Design and development of a linguistically enhanced paraphrase generator
• Semantico-syntactic and multiword units
• Sensitive to context
• Method
– Hybrid system, combining statistics and linguistic knowledge to identify and generate new and more complex paraphrases
– Exploitation of existing paraphrasing resources
• Web platform
– Interactive application to help Portuguese language learners produce and revise their texts
– Text-editing mechanisms which provide a variety of alternatives for each expression
– Users can choose or suggest expressions that can be immediately applied to their text
– Support for writing optimization, understandability and translatability
Introduction to the eSPERTo Project
eSPERTo Paraphrase Processing
https://esperto.l2f.inesc-id.pt/esperto/esperto/demo.pl
noojapply pt result.ind lr.no(d|m)* sr.nog* REESCREVE.nog text.txt
eSPERTo Web Interface
User configuration
eSPERTo Web Interface
Result presentation
teste.txt:0,17,O homem que é americano
teste.txt:0,17,O homem de América
teste.txt:0,17,O homem de nacionalidade americana
teste.txt:0,17,O homem de naturalidade americana
teste.txt:0,17,O homem de origem americana
teste.txt:0,39,o trabalho foi apresentado por O homem americano
teste.txt:18,10,efectuar apresentação
teste.txt:18,10,fazer apresentação
teste.txt:18,10,realizar apresentação
the man who is American
the man from America
the man with American nationality
…
The American man
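The result lines shown above (for example `teste.txt:0,17,O homem de origem americana`) appear to follow a `file:offset,length,paraphrase` layout. That reading is an assumption inferred from the examples, not a documented format. Under that assumption, a minimal Python sketch for grouping the proposed paraphrases by the span they replace could look like this (the names `Match`, `parse_match` and `group_by_span` are illustrative only):

```python
from typing import NamedTuple


class Match(NamedTuple):
    source: str      # file name the match was found in
    offset: int      # assumed: character offset of the matched sequence
    length: int      # assumed: length in characters of the matched sequence
    paraphrase: str  # paraphrase proposed for that sequence


def parse_match(line: str) -> Match:
    """Split one 'file:offset,length,paraphrase' line into its fields."""
    source, rest = line.split(":", 1)
    offset, length, paraphrase = rest.split(",", 2)
    return Match(source, int(offset), int(length), paraphrase)


def group_by_span(lines):
    """Collect the paraphrases proposed for each (offset, length) span."""
    groups = {}
    for line in lines:
        m = parse_match(line)
        groups.setdefault((m.offset, m.length), []).append(m.paraphrase)
    return groups


lines = [
    "teste.txt:0,17,O homem de origem americana",
    "teste.txt:18,10,fazer apresentação",
    "teste.txt:18,10,realizar apresentação",
]
for span, options in group_by_span(lines).items():
    print(span, options)
```

Grouping by span mirrors how the web interface offers several alternatives for the same expression.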
• Port4NooJ is the Portuguese module for NooJ (Silberztein 2005, 2016)
• Derived from OpenLogos EN-PT bilingual resources (http://logos-os.dfki.de/)
• Enhanced with new properties, including derivational and morpho-syntactic properties, semantic relations, and paraphrastic knowledge
eSPERTo Resources: Port4NooJ 2.0
• Semantico-Syntactic Abstraction Language (SAL) properties
• Multiword Units
• Support Verb Constructions
• Inflectional and Derivational Descriptions
• Grammars
– Morphological: to handle contractions
– Syntactic:
• identify and annotate dates and temporal expressions,
• disambiguate words or sequences of words, i.e., filter out lexical or syntactic annotations in the text
• paraphrase several types of constructions
• translate simple sentences
eSPERTo Paraphrases (subset)
• Support verb constructions into single verbs
– to make a decision = to decide
– to give support to N(AN) = to support N(AN)
– to get into contact with = to contact
• Support verb constructions into their stylistic variants
– to make an audit = to perform an audit
– to make an impression = to create an impression
• Adverbs (compounds into single adverbs)
– in a constructive way = constructively
• Agentive passives into actives (and vice-versa)
– the young man is released by the police officer = the police officer releases the young man
• Adjective constructions supported by different copulative verbs
– estar perdido (to be lost) = andar perdido (walk around lost)
• Constructions involving patronymic adjectives
– de origem portuguesa (of Portuguese origin/roots) = portugueses (Portuguese) = de Portugal (from Portugal)
• Generic noun phrases
– é um indivíduo estúpido (he is a fool) = é um estúpido (he is a fool) = é estúpido (he is a fool)
• Cross-constructions
– o idiota do rapaz (the idiot of the boy) = o rapaz é um idiota (the boy is an idiot)
• Appropriate noun constructions
– foi moderado nos seus comentários (he was moderate in his comments) = os seus comentários foram moderados (his comments were moderate) = foi moderado (he was moderate)
Application 1
Question-answering
Application 1 – Question-answering
• EDGAR has a knowledge base built on question/answer pairs
• Explore eSPERTo paraphrases to enrich EDGAR's knowledge base
→ provide all possible ways of rewriting the same question
• EDGAR calculates the lexical distance between a user utterance and each question in the knowledge base. The question with the shortest distance to the user utterance triggers the answer
• The paraphrase generator allows the same answer to be given to semantically equivalent questions
Scenario: EDGAR is a conversational agent that answers visitors' questions in a museum (Fialho et al., 2013).
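The matching step described above can be sketched as follows. The slides do not say which lexical distance EDGAR uses, so the Jaccard distance over word sets below is an assumed stand-in. The second knowledge-base answer is a placeholder, and the function names are illustrative.

```python
import re


def tokens(text: str) -> set:
    """Lowercase word tokens; punctuation is dropped."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over word sets (assumed stand-in metric)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def answer(utterance: str, knowledge_base: list) -> str:
    """Return the answer paired with the closest stored question."""
    _question, reply = min(
        knowledge_base, key=lambda qa: jaccard_distance(utterance, qa[0])
    )
    return reply


kb = [
    ("Onde é que nasceste?", "Nasci em Portugal, mas sou Inglês, …"),
    ("Como é que te chamas?", "<answer about the agent's name>"),  # placeholder
]
print(answer("Nasceste onde?", kb))
```

With a purely lexical distance, a question worded very differently from every stored question can be matched to the wrong answer. Adding paraphrased questions to the knowledge base is meant to reduce that risk.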
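Onde é que nasceste? (Where were you born?)
Nasceste onde? (You were born where?)
Qual é que é o seu local de nascimento? (What is your place of birth?)
O seu local de nascimento é qual? (Your place of birth is which?)
Qual é que é a tua nacionalidade? (What is your nationality?)
A tua nacionalidade é qual? (Your nationality is which?)
És de onde? (Where are you from?)
És daqui? (Are you from here?)
És português? (Are you Portuguese?)
De onde é que és? (Where is it that you are from?)
És de Portugal? (Are you from Portugal?)
És de origem portuguesa? (Are you of Portuguese origin?)
És de nacionalidade portuguesa? (Are you of Portuguese nationality?)
Nasci em Portugal, mas sou Inglês, … (I was born in Portugal, but I am English, …)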
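The enrichment illustrated by these question variants can be sketched as a simple expansion of the question/answer pairs. This is a minimal illustration, not EDGAR's or eSPERTo's actual code: the `paraphrases` dictionary is a hypothetical stand-in for eSPERTo's output, and the `seen` set is a generic safeguard against circular paraphrases (the deck itself reports discarding some paraphrases to avoid looping).

```python
def expand_knowledge_base(kb, paraphrases):
    """Attach every known paraphrase of a question to that question's answer.

    kb:          list of (question, answer) pairs
    paraphrases: dict mapping a question to alternative wordings
                 (hypothetical stand-in for eSPERTo's output)
    """
    expanded = []
    seen = set()
    for question, reply in kb:
        pending = [question]
        while pending:
            current = pending.pop()
            if current in seen:  # already added: do not expand it again
                continue
            seen.add(current)
            expanded.append((current, reply))
            pending.extend(paraphrases.get(current, []))
    return expanded


kb = [("Onde é que nasceste?", "Nasci em Portugal, mas sou Inglês, …")]
paraphrases = {
    "Onde é que nasceste?": ["Nasceste onde?"],
    "Nasceste onde?": ["Onde é que nasceste?"],  # circular on purpose
}
for question, reply in expand_knowledge_base(kb, paraphrases):
    print(question, "->", reply)
```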
A1 – Question-answering Evaluation
• EDGAR's KB originally had 848 sentences
• eSPERTo matched sequences from these sentences 2028 times, 359 of which were unique matches
• To avoid looping during the expansion of the knowledge base, some paraphrases such as ingleses / que são ingleses (English / that are English) were discarded
                   Recall   Precision   F-Measure
Baseline           0.7972   0.7889      0.7930
Baseline+eSPERTo   0.8149   0.7763      0.7951
• Qual é que é o seu nome [completo → ficar completo] ?
– multiword: nome completo
– disambiguation: completo, V should be eliminated, leaving just completo, A
• [Como → Tomar comida] é que te chamas?
– Priority dictionary: como, ADV; como, CONJ
• [Vives → Fazer vida] onde?
– Vsup should inflect as the original verb
Application 2
Summarization
Application 2 – Summarization
• Explore eSPERTo paraphrases to identify redundant information
→ rewrite different phrases that are equivalent with the same paraphrase
• Main challenge: identify the best candidate among the equivalent expressions to be used in rewriting the text
• eSPERTo was used in the summarization pre-processing phase
• Evaluation was done with TeMário, a corpus of 100 newspaper articles in Brazilian Portuguese (Pardo and Rino 2003)
→ generate different versions of TeMário by using different paraphrasing grammars
Scenario: Summarization component (Ribeiro, 2011) of SSNT, a system for selective dissemination of multimedia content (Neto et al., 2003; Trancoso et al., 2003; Amaral et al., 2007)
• Evaluation of three different groups of paraphrases:
– (i) active/passive
– (ii) constructions involving patronymic adjectives
→ is it better to use the shortest construction (o prefeito carioca) or the one with the equivalent toponym (o prefeito do Rio de Janeiro)?
– (iii) simple adverb (rapidamente) / equivalent adjectival (de modo|forma|jeito rápido/a) or nominal (com rapidez) construction
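One way to picture the "same paraphrase for equivalent phrases" idea is to map every variant in an equivalence group to a single canonical form, here the shortest one. This is a toy sketch under stated assumptions. The groups are hand-written from the examples above rather than produced by eSPERTo grammars, the two input sentences are made up for illustration, and plain string replacement ignores inflection and context.

```python
import re


def shortest(group):
    """Pick the variant with the fewest characters as the canonical form."""
    return min(group, key=len)


def normalise(text, groups):
    """Rewrite every variant in each equivalence group as its shortest member."""
    for group in groups:
        target = shortest(group)
        # Try longer variants first so a short one never cuts into a longer match.
        for variant in sorted(group, key=len, reverse=True):
            if variant != target:
                text = re.sub(rf"\b{re.escape(variant)}\b", target, text)
    return text


groups = [
    ["rapidamente", "de modo rápido", "de forma rápida", "com rapidez"],
    ["o prefeito carioca", "o prefeito do Rio de Janeiro"],
]
# Invented example sentences, used only to show the effect.
a = normalise("o prefeito do Rio de Janeiro respondeu com rapidez", groups)
b = normalise("o prefeito carioca respondeu de modo rápido", groups)
print(a)
print(a == b)
```

Once both sentences are rewritten to the same surface form, a summarizer that compares word overlap can treat them as redundant.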
Quais seriam as reações desejáveis no campo macroeconômico por parte das autoridades
da Europa e do Japão .
…
Com frequência cada vez maior as ações e os bônus parecem mover-se juntos .
Quando os bancos centrais derrubam os preços dos títulos as ações tendem a
acompanhá-los mesmo quando o aperto do crédito foi desencadeado pela perspectiva
de lucros e produção em alta .
…
E se vários países atuassem em conjunto Espanha Itália França e Reino Unido uma
modesta apreciação do dólar melhoraria a competitividade européia
de maneira muito oportuna .
a perspectiva desencadeou o aperto do crédito
frequentemente
europeias e nipónico
oportunamente
Example of paraphrasing with shortest constructions
ID      Paraphrase Type                                      Documents rewritten   Sequences rewritten
1       ADVmente → (de modo | maneira | jeito A) | (com N)   80                    215
2       (de modo | maneira | jeito A) → ADVmente             73                    322
3       SAN                                                  70                    305
4       Passive → Active                                     7                     7
5       Active → Passive                                     33                    58
2,3,4   Shortest                                             90                    682
System                              Paraphrase type                                      ROUGE-1
Manhattan (SSC = 2)                 Active → Passive                                     0.444
Fractional (N = 1.(3), SSC = 2)     Active → Passive                                     0.443
Fractional (N = 1.(3), SSC = 2)     ADVmente → (de modo | maneira | jeito A) | (com N)   0.443
Fractional (N = 1.(3), idf, H1.3)   Passive → Active                                     0.443
Fractional (N = 1.(3), SSC = 2)     -                                                    0.442
Fractional (N = 1.(3), idf, H1.3)   -                                                    0.442
Manhattan (SSC = 2)                 -                                                    0.442
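For readers unfamiliar with the metric, ROUGE-1 is based on unigram overlap between a generated summary and a reference summary. The sketch below computes a simplified recall-oriented variant with whitespace tokenization. It is not the official ROUGE toolkit, and the slides do not say which ROUGE-1 variant or settings produced the scores above. The example strings are fragments of the sample text from the deck.

```python
from collections import Counter


def rouge_1_recall(candidate: str, reference: str) -> float:
    """Share of reference unigrams also found in the candidate (clipped counts)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


reference = "as ações e os bônus parecem mover-se juntos"
candidate = "frequentemente as ações e os bônus parecem mover-se juntos"
print(rouge_1_recall(candidate, reference))
```

Because the score counts surface words, rewriting a phrase changes which unigrams can match the reference. That is one plausible reason the paraphrase type moves ROUGE-1 only in the third decimal place.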
• First impression
– Minor improvements in the performance of both the conversational agent and the summarization task
• Next steps
– Analyze thoroughly:
• the results of paraphrasing
→ correct problems at the source (eSPERTo)
→ identify domain-specific problems and solutions
• differences in performance with and without paraphrasing
→ identify the best parameterization of resources for paraphrasing
– Adapt eSPERTo resources to each case scenario
Conclusions and Future Work
Thank you!
Acknowledgements
This research work was supported by Fundação para a Ciência e Tecnologia (FCT), under project eSPERTo
EXPL/MHC-LIN/2260/2013, UID/CEC/50021/2013, and post-doctoral grant SFRH/BPD/91446/2012.

More Related Content

What's hot

Lec 15,16,17 NLP.machine translation
Lec 15,16,17  NLP.machine translationLec 15,16,17  NLP.machine translation
Lec 15,16,17 NLP.machine translationguest873a50
 
(Final) cidoc 2009 chinese lang translation of the aat
(Final) cidoc 2009 chinese lang translation of the aat(Final) cidoc 2009 chinese lang translation of the aat
(Final) cidoc 2009 chinese lang translation of the aatAAT Taiwan
 
Lean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction RevisitedLean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction RevisitedValeria de Paiva
 
Theory of Computation Lecture Notes
Theory of Computation Lecture NotesTheory of Computation Lecture Notes
Theory of Computation Lecture NotesFellowBuddy.com
 
G2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian languageG2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian languageijnlc
 
Lean Logic for Lean Times: Varieties of Natural Logic
Lean Logic for Lean Times: Varieties of Natural LogicLean Logic for Lean Times: Varieties of Natural Logic
Lean Logic for Lean Times: Varieties of Natural LogicValeria de Paiva
 
Speech To Sign Language Interpreter System
Speech To Sign Language Interpreter SystemSpeech To Sign Language Interpreter System
Speech To Sign Language Interpreter Systemkkkseld
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translationHrishikesh Nair
 
Adding morphological information to a connectionist Part-Of-Speech tagger
Adding morphological information  to a connectionist Part-Of-Speech taggerAdding morphological information  to a connectionist Part-Of-Speech tagger
Adding morphological information to a connectionist Part-Of-Speech taggerFrancisco Zamora-Martinez
 
Map constraint for abstraction
Map constraint for abstractionMap constraint for abstraction
Map constraint for abstractionLawrie Hunter
 
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...Guy De Pauw
 
speech recognition and removal of disfluencies
speech recognition and removal of disfluenciesspeech recognition and removal of disfluencies
speech recognition and removal of disfluenciesAnkit Sharma
 
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...Grammarly
 

What's hot (19)

Nlp
NlpNlp
Nlp
 
Ceis 8
Ceis 8Ceis 8
Ceis 8
 
Lec 15,16,17 NLP.machine translation
Lec 15,16,17  NLP.machine translationLec 15,16,17  NLP.machine translation
Lec 15,16,17 NLP.machine translation
 
NLP and Deep Learning
NLP and Deep LearningNLP and Deep Learning
NLP and Deep Learning
 
(Final) cidoc 2009 chinese lang translation of the aat
(Final) cidoc 2009 chinese lang translation of the aat(Final) cidoc 2009 chinese lang translation of the aat
(Final) cidoc 2009 chinese lang translation of the aat
 
Lean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction RevisitedLean Logic for Lean Times: Entailment and Contradiction Revisited
Lean Logic for Lean Times: Entailment and Contradiction Revisited
 
Theory of Computation Lecture Notes
Theory of Computation Lecture NotesTheory of Computation Lecture Notes
Theory of Computation Lecture Notes
 
G2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian languageG2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian language
 
Lean Logic for Lean Times: Varieties of Natural Logic
Lean Logic for Lean Times: Varieties of Natural LogicLean Logic for Lean Times: Varieties of Natural Logic
Lean Logic for Lean Times: Varieties of Natural Logic
 
Speech To Sign Language Interpreter System
Speech To Sign Language Interpreter SystemSpeech To Sign Language Interpreter System
Speech To Sign Language Interpreter System
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translation
 
Fafl notes [2010] (sjbit)
Fafl notes [2010] (sjbit)Fafl notes [2010] (sjbit)
Fafl notes [2010] (sjbit)
 
Adding morphological information to a connectionist Part-Of-Speech tagger
Adding morphological information  to a connectionist Part-Of-Speech taggerAdding morphological information  to a connectionist Part-Of-Speech tagger
Adding morphological information to a connectionist Part-Of-Speech tagger
 
Map constraint for abstraction
Map constraint for abstractionMap constraint for abstraction
Map constraint for abstraction
 
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
 
speech recognition and removal of disfluencies
speech recognition and removal of disfluenciesspeech recognition and removal of disfluencies
speech recognition and removal of disfluencies
 
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
Grammarly AI-NLP Club #6 - Sequence Tagging using Neural Networks - Artem Che...
 
NLP
NLPNLP
NLP
 
Nltk
NltkNltk
Nltk
 

Similar to eSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and Summarization

Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4DigiGurukul
 
An exploratory corpus study of the AP Spanish
An exploratory corpus study of the AP SpanishAn exploratory corpus study of the AP Spanish
An exploratory corpus study of the AP SpanishSteven Saffels
 
Morphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageMorphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageLushanthan Sivaneasharajah
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingToine Bogers
 
Declare Your Language: Syntax Definition
Declare Your Language: Syntax DefinitionDeclare Your Language: Syntax Definition
Declare Your Language: Syntax DefinitionEelco Visser
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processingMinh Pham
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...Lifeng (Aaron) Han
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Lifeng (Aaron) Han
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Abdullah al Mamun
 
OpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for allOpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for allAlexandre Rademaker
 

Similar to eSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and Summarization (20)

OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual DictionariesOpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
 
Poster @ enetCollect CA MC meeting in Iasi, Romania
Poster @ enetCollect CA MC meeting in Iasi, Romania Poster @ enetCollect CA MC meeting in Iasi, Romania
Poster @ enetCollect CA MC meeting in Iasi, Romania
 
Cross language alignments - challenges guidelines and gold sets
Cross language alignments - challenges guidelines and gold setsCross language alignments - challenges guidelines and gold sets
Cross language alignments - challenges guidelines and gold sets
 
Anabela Barreiro - Alinhamentos
Anabela Barreiro - AlinhamentosAnabela Barreiro - Alinhamentos
Anabela Barreiro - Alinhamentos
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
 
Automatic Paraphrasing of Human Intransitive Adjectives in Portuguese
Automatic Paraphrasing of Human Intransitive Adjectives in PortugueseAutomatic Paraphrasing of Human Intransitive Adjectives in Portuguese
Automatic Paraphrasing of Human Intransitive Adjectives in Portuguese
 
An exploratory corpus study of the AP Spanish
An exploratory corpus study of the AP SpanishAn exploratory corpus study of the AP Spanish
An exploratory corpus study of the AP Spanish
 
Morphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil LanguageMorphological Analyzer and Generator for Tamil Language
Morphological Analyzer and Generator for Tamil Language
 
NLP
NLPNLP
NLP
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Transla...
CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Transla...CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Transla...
CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Transla...
 
Declare Your Language: Syntax Definition
Declare Your Language: Syntax DefinitionDeclare Your Language: Syntax Definition
Declare Your Language: Syntax Definition
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
Barreiro-Mota-VarDial@Coling2018-poster
Barreiro-Mota-VarDial@Coling2018-posterBarreiro-Mota-VarDial@Coling2018-poster
Barreiro-Mota-VarDial@Coling2018-poster
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...
 
4.3.pdf
4.3.pdf4.3.pdf
4.3.pdf
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
OpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for allOpenWN-PT: a Brazilian Wordnet for all
OpenWN-PT: a Brazilian Wordnet for all
 

More from INESC-ID (Spoken Language Systems Laboratory - L2F)

More from INESC-ID (Spoken Language Systems Laboratory - L2F) (20)

Multi3Generation@INGL2020
Multi3Generation@INGL2020Multi3Generation@INGL2020
Multi3Generation@INGL2020
 
NooJ 2020 presentation
NooJ 2020 presentationNooJ 2020 presentation
NooJ 2020 presentation
 
PROPOR2020_Barreiroetal
PROPOR2020_BarreiroetalPROPOR2020_Barreiroetal
PROPOR2020_Barreiroetal
 
Análise comparativa das edições portuguesa e brasileira de Os livros que dev...
Análise comparativa das edições portuguesa e brasileira de  Os livros que dev...Análise comparativa das edições portuguesa e brasileira de  Os livros que dev...
Análise comparativa das edições portuguesa e brasileira de Os livros que dev...
 
Welcome session 3rd Annual MC Meeting - enetCollect COST Action
Welcome session 3rd Annual MC Meeting - enetCollect COST ActionWelcome session 3rd Annual MC Meeting - enetCollect COST Action
Welcome session 3rd Annual MC Meeting - enetCollect COST Action
 
Syntactic-semantic analysis for information extraction in biomedicine
Syntactic-semantic analysis for information extraction in biomedicineSyntactic-semantic analysis for information extraction in biomedicine
Syntactic-semantic analysis for information extraction in biomedicine
 
Cross language semantic relations between English and Portuguese
Cross language semantic relations between English and PortugueseCross language semantic relations between English and Portuguese
Cross language semantic relations between English and Portuguese
 
Paraphrasing biomedical support verb constructions for machine translation
Paraphrasing biomedical support verb constructions for machine translationParaphrasing biomedical support verb constructions for machine translation
Paraphrasing biomedical support verb constructions for machine translation
 
ReWriter for legal text
ReWriter for legal textReWriter for legal text
ReWriter for legal text
 
Chatbots for Language Learning
Chatbots for Language LearningChatbots for Language Learning
Chatbots for Language Learning
 
Barreiro et al POP@PROPOR2018-informal2formal-language
Barreiro et al POP@PROPOR2018-informal2formal-languageBarreiro et al POP@PROPOR2018-informal2formal-language
Barreiro et al POP@PROPOR2018-informal2formal-language
 
Rebelo-Arnold et al POP@PROPOR2018-EP-BP-alignments
Rebelo-Arnold et al POP@PROPOR2018-EP-BP-alignmentsRebelo-Arnold et al POP@PROPOR2018-EP-BP-alignments
Rebelo-Arnold et al POP@PROPOR2018-EP-BP-alignments
 
Barreiro-Batista-LR4NLP@Coling2018-presentation
Barreiro-Batista-LR4NLP@Coling2018-presentationBarreiro-Batista-LR4NLP@Coling2018-presentation
Barreiro-Batista-LR4NLP@Coling2018-presentation
 
NooJ-2018-Palermo
NooJ-2018-PalermoNooJ-2018-Palermo
NooJ-2018-Palermo
 
projeto-eSPERTo
projeto-eSPERToprojeto-eSPERTo
projeto-eSPERTo
 
ReEscreve: A Translator-Friendly Multi-Purpose Paraphrasing Software Tool
ReEscreve: A Translator-Friendly Multi-Purpose Paraphrasing Software ToolReEscreve: A Translator-Friendly Multi-Purpose Paraphrasing Software Tool
ReEscreve: A Translator-Friendly Multi-Purpose Paraphrasing Software Tool
 
Poster l2f 2017
Poster l2f 2017Poster l2f 2017
Poster l2f 2017
 
Nooj2017 cmota-etal
Nooj2017 cmota-etalNooj2017 cmota-etal
Nooj2017 cmota-etal
 
Machine Translation of Discontinuous Multiword Units
Machine Translation of Discontinuous Multiword UnitsMachine Translation of Discontinuous Multiword Units
Machine Translation of Discontinuous Multiword Units
 
Linguistic Evaluation of Support Verb Construction Translations by OpenLogos ...
Linguistic Evaluation of Support Verb Construction Translations by OpenLogos ...Linguistic Evaluation of Support Verb Construction Translations by OpenLogos ...
Linguistic Evaluation of Support Verb Construction Translations by OpenLogos ...
 

Recently uploaded

JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseWSO2
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 

eSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and Summarization

  • 1. eSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and Summarization
      Cristina Mota, Luísa Coheur, Ricardo Ribeiro, Francisco Raposo, Anabela Barreiro
      NooJ International Conference, České Budějovice, June 10th 2016
  • 2. Overview
      • eSPERTo Paraphrasing
      • Edgar, Virtual QA Agent (Fialho et al. 2013)
      • SSNT Summarization (Ribeiro 2011)
  • 3. Introduction to the eSPERTo Project
      eSPERTo – System for Paraphrasing in Editing and Revision of Texts
      • Main objective
        – Design and development of a linguistically enhanced paraphrase generator
          • Semantico-syntactic and multiword units
          • Sensitive to context
      • Method
        – Hybrid system, combining statistics and linguistic knowledge to identify and generate new and more complex paraphrases
        – Exploitation of existing paraphrasing resources
      • Web platform
        – Interactive application to help Portuguese language learners produce and revise their texts
        – Text-editing mechanisms that provide a variety of alternatives for each expression
        – Users can choose or suggest expressions that can be immediately applied to their text
        – Support for writing optimization, understandability and translatability
  • 5. eSPERTo Paraphrase Processing
      Command:
        noojapply pt result.ind lr.no(d|m)* sr.nog* REESCREVE.nog text.txt
      eSPERTo Web Interface: user configuration
      eSPERTo Web Interface: result presentation
      Output:
        teste.txt:0,17,O homem que é americano
        teste.txt:0,17,O homem de América
        teste.txt:0,17,O homem de nacionalidade americana
        teste.txt:0,17,O homem de naturalidade americana
        teste.txt:0,17,O homem de origem americana
        teste.txt:0,39,o trabalho foi apresentado por O homem americano
        teste.txt:18,10,efectuar apresentação
        teste.txt:18,10,fazer apresentação
        teste.txt:18,10,realizar apresentação
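The result lines on slide 5 look like `file:offset,length,paraphrase` records. That reading is an assumption, not something the slides state, although it is consistent with the samples ("O homem americano" is 17 characters long, and "apresentou" would start at offset 18 with length 10). Under that assumption, a minimal Python sketch for grouping the suggestions by text span could look like this; `ParaphraseMatch`, `parse_result_line` and `group_by_span` are hypothetical names, not part of NooJ or eSPERTo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParaphraseMatch:
    source: str      # file the match was found in
    offset: int      # assumed: character offset of the matched span
    length: int      # assumed: length of the matched span in characters
    paraphrase: str  # suggested rewriting for that span


def parse_result_line(line: str) -> ParaphraseMatch:
    # Split on the first colon, then on the first two commas only,
    # so that commas inside the paraphrase text are preserved.
    # (A file name containing ":" would need different handling.)
    source, rest = line.split(":", 1)
    offset, length, paraphrase = rest.split(",", 2)
    return ParaphraseMatch(source, int(offset), int(length), paraphrase)


def group_by_span(lines):
    # Map each (offset, length) span to the list of paraphrases proposed for it.
    groups = {}
    for line in lines:
        match = parse_result_line(line)
        groups.setdefault((match.offset, match.length), []).append(match.paraphrase)
    return groups


if __name__ == "__main__":
    sample = [
        "teste.txt:0,17,O homem que é americano",
        "teste.txt:0,17,O homem de América",
        "teste.txt:18,10,fazer apresentação",
    ]
    for span, options in group_by_span(sample).items():
        print(span, options)
```

Grouping by span mirrors how an editing interface would offer several alternatives for the same expression.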
  • 6. eSPERTo Paraphrase Processing
      https://esperto.l2f.inesc-id.pt/esperto/esperto/demo.pl
      English glosses of the paraphrases for "The American man":
        the man who is American
        the man from America
        the man with American nationality
        …
  • 7. eSPERTo Resources: Port4NooJ 2.0
      • Port4NooJ is the Portuguese module for NooJ (Silberztein 2005, 2016)
      • Derived from OpenLogos EN-PT bilingual resources (http://logos-os.dfki.de/)
      • Enhanced with new properties, including derivational and morpho-syntactic properties, semantic relations, and paraphrastic knowledge
  • 8. eSPERTo Resources: Port4NooJ 2.0
      • Semantico-Syntactic Abstraction Language (SAL) properties
      • Multiword Units
      • Support Verb Constructions
      • Inflectional and Derivational Descriptions
      • Grammars
        – Morphological: to handle contractions
        – Syntactic:
          • identify and annotate dates and temporal expressions
          • disambiguate words or sequences of words, i.e., filter out lexical or syntactic annotations in the text
          • paraphrase several types of constructions
          • translate simple sentences
  • 9. eSPERTo Paraphrases (subset)
      • Support verb constructions into single verbs
        – to make a decision = to decide
        – to give support to N(AN) = to support N(AN)
        – to get into contact with = to contact
      • Support verb constructions into their stylistic variants
        – to make an audit = to perform an audit
        – to make an impression = to create an impression
      • Adverbs (compounds into single adverbs)
        – in a constructive way = constructively
      • Agentive passives into actives (and vice-versa)
        – the young man is released by the police officer = the police officer releases the young man
  • 10. eSPERTo Paraphrases (subset)
      • Adjective constructions supported by different copulative verbs
        – estar perdido (to be lost) = andar perdido (to walk around lost)
      • Constructions involving patronymic adjectives
        – de origem portuguesa (of Portuguese origin/roots) = portugueses (Portuguese) = de Portugal (from Portugal)
      • Generic noun phrases
        – é um indivíduo estúpido (he is a fool) = é um estúpido (he is a fool) = é estúpido (he is a fool)
      • Cross-constructions
        – o idiota do rapaz (the idiot of the boy) = o rapaz é um idiota (the boy is an idiot)
      • Appropriate noun constructions
        – foi moderado nos seus comentários (he was moderate in his comments) = os seus comentários foram moderados (his comments were moderate) = foi moderado (he was moderate)
  • 12. Application 1 – Question-answering
      Scenario: EDGAR is a conversational agent that answers visitors' questions in a museum (Fialho et al., 2013).
      • EDGAR has a knowledge base built on question/answer pairs
      • Explore eSPERTo paraphrases to enrich the EDGAR knowledge base → provide all possible ways of rewriting the same question
      • EDGAR calculates the lexical distance between a user utterance and each question in the knowledge base. The question with the shortest distance to the user utterance triggers the answer
      • The paraphrase generator allows the same answer to be given to semantically equivalent questions
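The retrieval step on slide 12 (pick the knowledge-base question closest to the user utterance and return its answer) can be sketched in Python as follows. The slides do not say which lexical distance EDGAR uses, so the Jaccard distance over word sets below is a stand-in chosen for illustration; the function names and the placeholder answers are hypothetical.

```python
def tokens(text):
    # Lowercase and strip basic punctuation; a real system would use a proper tokenizer.
    return {word.strip("?!.,;:").lower() for word in text.split()} - {""}


def jaccard_distance(a, b):
    # Assumed stand-in for EDGAR's lexical distance: 1 - |A ∩ B| / |A ∪ B|.
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def answer(utterance, knowledge_base):
    # knowledge_base is a list of (question, answer) pairs.
    # The question with the shortest distance to the utterance triggers its answer.
    _, best_answer = min(
        knowledge_base, key=lambda pair: jaccard_distance(utterance, pair[0])
    )
    return best_answer


if __name__ == "__main__":
    kb = [
        ("Onde é que nasceste?", "answer about birthplace"),  # placeholder answers
        ("Como é que te chamas?", "answer about name"),
    ]
    print(answer("Nasceste onde?", kb))
```

Adding paraphrased questions to the knowledge base gives the distance function more surface forms to match, which is one plausible reading of why paraphrasing can raise recall.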
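  • 13. Application 1 – Question-answering
      Questions:
        – Onde é que nasceste? (Where were you born?)
        – Nasceste onde? (You were born where?)
        – Qual é que é o seu local de nascimento? (What is your place of birth?)
        – O seu local de nascimento é qual? (Your place of birth is which?)
        – Qual é que é a tua nacionalidade? (What is your nationality?)
        – A tua nacionalidade é qual? (Your nationality is which?)
        – És de onde? (You are from where?)
        – És daqui? (Are you from here?)
        – És português? (Are you Portuguese?)
        – De onde é que és? (Where are you from?)
        – És de Portugal? (Are you from Portugal?)
        – És de origem portuguesa? (Are you of Portuguese origin?)
        – És de nacionalidade portuguesa? (Are you of Portuguese nationality?)
      Answer:
        – Nasci em Portugal, mas sou Inglês, … (I was born in Portugal, but I am English, …)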
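Slide 13 shows many question variants sharing one answer. A minimal Python sketch of such a knowledge-base expansion is given below, assuming a hypothetical `paraphrase` callable that returns rewritings of a sentence (in the real system this role is played by eSPERTo; the toy table here is only for illustration). The `seen` set and the `max_variants` cap are design choices of this sketch to keep the expansion finite, not documented features of EDGAR or eSPERTo.

```python
from collections import deque


def expand_question(question, paraphrase, max_variants=1000):
    """Return the question plus every variant reachable through `paraphrase`."""
    seen = {question}
    queue = deque([question])
    while queue and len(seen) < max_variants:
        current = queue.popleft()
        for variant in paraphrase(current):
            if variant not in seen:  # skip variants already generated
                seen.add(variant)
                queue.append(variant)
    return seen


def expand_kb(kb, paraphrase):
    # Every variant of a question is mapped to the answer of the original question.
    expanded = {}
    for question, ans in kb:
        for variant in expand_question(question, paraphrase):
            expanded.setdefault(variant, ans)
    return expanded


if __name__ == "__main__":
    # Toy paraphrase table standing in for the paraphrase generator.
    table = {
        "Onde é que nasceste?": ["Nasceste onde?"],
        "Nasceste onde?": ["Onde é que nasceste?"],
    }
    kb = [("Onde é que nasceste?", "placeholder answer")]
    print(expand_kb(kb, lambda q: table.get(q, [])))
```

A visited set only stops cycles between a finite number of variants. A rule that keeps lengthening its output on every pass would still grow without bound, which is why the cap is included here.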
  • 14. A1 – Question-answering Evaluation
      • EDGAR’s KB originally had 848 sentences
      • eSPERTo matched sequences in these sentences 2028 times, of which 359 matches were unique
      • To avoid looping during the expansion of the knowledge base, some paraphrases, such as ingleses / que são ingleses (English / that are English), were discarded

                         Recall   Precision   F-Measure
      Baseline           0.7972   0.7889      0.7930
      Baseline+eSPERTo   0.8149   0.7763      0.7951
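The F-Measure column on slide 14 is consistent with the balanced F-measure, i.e. the harmonic mean of precision and recall. The slides do not name the formula, so this is an assumption, but the short Python check below reproduces both reported values from the precision and recall figures.

```python
def f_measure(precision, recall):
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported on slide 14.
baseline = f_measure(precision=0.7889, recall=0.7972)
with_esperto = f_measure(precision=0.7763, recall=0.8149)

print(f"Baseline:         {baseline:.4f}")      # 0.7930
print(f"Baseline+eSPERTo: {with_esperto:.4f}")  # 0.7951
```

The numbers show the trade-off behind the small net gain: recall rises by about 1.8 points while precision drops by about 1.3 points.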
  • 15. A1 – Question-answering Evaluation
      • Qual é que é o seu nome [completo → ficar completo]?
        – multiword: nome completo
        – disambiguation: completo, V should be eliminated, leaving just completo, A
      • [Como → Tomar comida] é que te chamas?
        – Priority dictionary: como, ADV; como, CONJ
      • [Vives → Fazer vida] onde?
        – Vsup should inflect as the original verb
  • 17. Application 2 – Summarization
      Scenario: Summarization component (Ribeiro, 2011) of SSNT, a system for selective dissemination of multimedia content (Neto et al., 2003; Trancoso et al., 2003; Amaral et al., 2007)
      • Explore eSPERTo paraphrases to identify redundant information → rewrite different phrases that are equivalent with the same paraphrase
      • Main challenge: identify the best candidate among the equivalent expressions to be used in rewriting the text
      • eSPERTo was used in the summarization pre-processing phase
      • Evaluation was done with TeMário, a corpus of 100 newspaper articles in Brazilian Portuguese (Pardo and Rino 2003) → generate different versions of TeMário by using different paraphrasing grammars
  • 18. Application 2 – Summarization
      • Evaluation of three different groups of paraphrases:
        – (i) active/passive
        – (ii) constructions involving patronymic adjectives → is it better to use the shortest construction (o prefeito carioca) or the one with the equivalent toponym (o prefeito do Rio de Janeiro)?
        – (iii) simple adverb (rapidamente) / equivalent adjectival (de modo|forma/jeito rápid(a/o)) or nominal (com rapidez) construction
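The "shortest construction" option discussed on slides 17 and 18 amounts to mapping every member of a set of equivalent expressions to one canonical form before summarization. A rough Python sketch under strong simplifying assumptions follows: the equivalence sets are given explicitly and rewriting is plain string replacement, whereas eSPERTo itself works with NooJ grammars. The names `canonical_form` and `normalize` are hypothetical.

```python
def canonical_form(variants):
    # Pick the shortest variant by character count;
    # ties are broken alphabetically so the choice is deterministic.
    return min(variants, key=lambda v: (len(v), v))


def normalize(text, equivalence_classes):
    # Rewrite every variant in each equivalence class to the class's shortest member.
    for variants in equivalence_classes:
        target = canonical_form(variants)
        # Replace longer variants first, so a variant that contains
        # another variant is rewritten before its substring is touched.
        for variant in sorted(variants, key=len, reverse=True):
            if variant != target:
                text = text.replace(variant, target)
    return text


if __name__ == "__main__":
    classes = [
        {"o prefeito carioca", "o prefeito do Rio de Janeiro"},
        {"oportunamente", "de maneira muito oportuna"},
    ]
    print(normalize("o prefeito do Rio de Janeiro falou de maneira muito oportuna", classes))
```

Once equivalent phrases share one surface form, a summarizer that relies on term overlap can recognise them as the same content.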
  • 19. Application 2 – Summarization
      Example of paraphrasing with shortest constructions
      Text:
        Quais seriam as reações desejáveis no campo macroeconômico por parte das autoridades da Europa e do Japão . …
        Com frequência cada vez maior as ações e os bônus parecem mover-se juntos . Quando os bancos centrais derrubam os preços dos títulos as ações tendem a acompanhá-los mesmo quando o aperto do crédito foi desencadeado pela perspectiva de lucros e produção em alta . …
        E se vários países atuassem em conjunto Espanha Itália França e Reino Unido uma modesta apreciação do dólar melhoraria a competitividade européia de maneira muito oportuna .
      Paraphrases shown on the slide:
        – a perspectiva desencadeou o aperto do crédito
        – frequentemente
        – europeias e nipónico
        – oportunamente
  • 20. Application 2 – Summarization

      ID      Paraphrase Type                                     Documents rewritten   Sequences rewritten
      1       ADVmente → (de modo | maneira | jeito A) | (com N)  80                    215
      2       (de modo | maneira | jeito A) → ADVmente            73                    322
      3       SAN                                                 70                    305
      4       Passive → Active                                    7                     7
      5       Active → Passive                                    33                    58
      2,3,4   Shortest                                            90                    682
  • 21. Application 2 – Summarization

      System                              Paraphrase type                                     ROUGE-1
      Manhattan (SSC = 2)                 Active → Passive                                    0.444
      Fractional (N = 1.(3), SSC = 2)     Active → Passive                                    0.443
      Fractional (N = 1.(3), SSC = 2)     ADVmente → (de modo | maneira | jeito A) | (com N)  0.443
      Fractional (N = 1.(3), idf, H1.3)   Passive → Active                                    0.443
      Fractional (N = 1.(3), SSC = 2)     -                                                   0.442
      Fractional (N = 1.(3), idf, H1.3)   -                                                   0.442
      Manhattan (SSC = 2)                 -                                                   0.442
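For readers unfamiliar with the metric on slide 21, ROUGE-1 measures unigram overlap between a system summary and a reference summary. The Python sketch below computes the recall-oriented form with clipped counts and whitespace tokenization. The slides do not say which ROUGE variant or tokenization was used, so both are assumptions of this sketch, and published evaluations normally rely on the standard ROUGE toolkit rather than a hand-written function.

```python
from collections import Counter


def rouge_1_recall(candidate, reference):
    # Unigram counts after lowercasing and whitespace tokenization (assumed).
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    # Each reference word is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


if __name__ == "__main__":
    reference = "o homem de origem americana"
    candidate = "o homem americano"
    print(rouge_1_recall(candidate, reference))  # 2 of 5 reference words matched
```

Because the score counts exact word matches, rewriting a document with a different paraphrase changes which words a summary shares with its reference, which is one way the paraphrase type can shift ROUGE-1 by small amounts.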
  • 22. Conclusions and Future Work
      • First impression
        – Minor improvements in the performance of both the conversational agent and the summarization task
      • Next steps
        – Analyze thoroughly:
          • the results of paraphrasing: correct problems at the source (eSPERTo); identify domain-specific problems and solutions
          • differences in performance with and without paraphrasing: identify the best parameterization of resources to paraphrase
        – Adapt eSPERTo resources to each case scenario
  • 23. Thank you!
      Acknowledgements: This research work was supported by Fundação para a Ciência e Tecnologia (FCT), under project eSPERTo EXPL/MHC-LIN/2260/2013, UID/CEC/50021/2013, and post-doctoral grant SFRH/BPD/91446/2012.