CROSS-LANGUAGE SEMANTIC RELATIONS BETWEEN ENGLISH AND PORTUGUESE
Anabela Barreiro
Hugo Gonçalo Oliveira (hroliv@dei.uc.pt)
ICL: Workshop on Iberian Cross-Language NLP Tasks, Huelva, Spain, 07 September 2011
OUTLINE
INTRODUCTION
MOST STUDIED SEMANTIC RELATIONS & AUTOMATIC ACQUISITION OF MULTILINGUAL LEXICO-SEMANTIC RELATIONS
LINGUISTIC-BASED METHOD TO GENERATE SEMANTIC RELATIONS
LINGUISTIC RESOURCES EMPLOYED
ESTABLISHMENT OF MORPHO-SEMANTICO-SYNTACTIC RELATIONS IN THE DICTIONARY AND CREATION OF GRAMMARS TO READ DICTIONARY INFORMATION AND GENERATE NEW SEMANTIC PAIRS
CROSS-LANGUAGE RELATIONS REUSING GRAMMARS
PRELIMINARY RESULTS
CONCLUSIONS AND FUTURE RESEARCH
SEMANTIC RELATIONS
• Synonymy - different lexical items have the same meaning
  e.g. car synonym-of automobile
• Homonymy - lexical items have the same orthographic form but different meanings
  e.g. bank (financial institution) vs. bank (slope)
• Hyponymy - a lexical item is a subclass or a specific kind of another
  e.g. dog hyponym-of mammal
• Meronymy - a lexical item is a part, piece or member of another
  e.g. wheel part-of car
• Other general and domain-oriented relations: process-of, result-of, etc.
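The directed relations above can be stored as (source, relation, target) triples. The following is a minimal Python sketch, not part of the authors' resources; the inverse labels `hypernym-of` and `has-part` are our own assumption for the reverse direction, and synonymy is treated as symmetric.

```python
# Inverse of each relation; "hypernym-of" and "has-part" are assumed labels.
INVERSE = {
    "synonym-of": "synonym-of",  # symmetric
    "hyponym-of": "hypernym-of",
    "part-of": "has-part",
}

class RelationStore:
    """Tiny in-memory store of (source, relation, target) triples."""

    def __init__(self):
        self._edges = {}  # (word, relation) -> set of related words

    def add(self, source, relation, target):
        # Store the triple and its inverse so lookups work in both directions.
        self._edges.setdefault((source, relation), set()).add(target)
        self._edges.setdefault((target, INVERSE[relation]), set()).add(source)

    def related(self, word, relation):
        return sorted(self._edges.get((word, relation), ()))

store = RelationStore()
store.add("car", "synonym-of", "automobile")
store.add("dog", "hyponym-of", "mammal")
store.add("wheel", "part-of", "car")

print(store.related("automobile", "synonym-of"))  # ['car']
print(store.related("mammal", "hypernym-of"))     # ['dog']
print(store.related("car", "has-part"))           # ['wheel']
```

Storing the inverse at insertion time keeps lookups to a single dictionary access in either direction.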
MULTILINGUAL LEXICAL-SEMANTIC KNOWLEDGE DATABASES CREATED FROM THE WEB
• UWN/MENTA (de Melo & Weikum 2009, 2010): multilingual knowledge base with WordNet meaning connections for 1,500,000 words in over 200 languages (http://www.mpi-inf.mpg.de/yago-naga/uwn/query.html)
• PanDictionary (Mausam et al. 2010): sense-distinguished, massively multilingual dictionary with translations in more than 1,000 languages, built from freely available dictionaries, with additional translations inferred with an associated probability
• WikiNet (Nastase et al. 2010): multilingual network of concepts extracted from Wikipedia, with lexicalizations in several languages and links representing a variety of relations
• Bergsma & Van Durme 2011: learning word-to-word translations by exploiting the visual similarity of labeled images on the Web
LINGUISTIC RESOURCES
• Eng4NooJ and Port4NooJ: linguistic knowledge
  • OpenLogos dictionary (http://logos-os.dfki.de/), converted into NooJ format and enhanced with new properties, including derivational, morpho-syntactic and semantic relations
  • Morphological systems
  • Contextual rules and grammars
  • Domain-specific dictionaries
LINGUISTIC-BASED METHOD TO GENERATE (CROSS-LANGUAGE) SEMANTIC RELATIONS
Mapping morpho-syntactically and semantically related words
CONCEPTUAL SEMANTIC RELATIONS
Semantico-syntactic Abstract Language (SAL)
• hierarchical taxonomic scheme
• over 1,000 elements or words (expandable)
• organized in Supersets, Sets, and Subsets
• distributed over all parts of speech
Noun Supersets
• Concrete (CO)
• Mass (MA)
• Animate (AN)
• Place (PL)
• Information (IN)
• Abstract (AB)
• Process (intransitive) (PI)
• Process (transitive) (PT)
• Measure (ME)
• Time (TI)
• Aspective (AS)
• Unknown (UN)
Abstract (AB) superset: sets, subsets and example words
• Non-verbal abstracts [about persons, things] (ABnonvb 40)
  • ABnonvb 40: undifferentiated non-verbal abstracts
  • ABprop 609: properties/qualities/nature (e.g. baroque, color, design, feature, form, likeableness, profile, trait, shape)
  • ABstate 736: states/conditions/relationships (e.g. cancer, circumstance, coma, loneliness, poverty, status, fatherhood, inequality)
  • ABclass 731: classifications (e.g. category, class, kind, make, nature, rank, type)
  • ABorig 723: sources/origins (e.g. reserve, resource, well-spring)
• General concepts (ABgen 42) (e.g. justice, analogy, idea, truth)
• Verbal abstracts [about agents, processes] (ABverb 41)
  • ABpur 748: purpose [of V-ing; to V] (e.g. task, problem, objective, function)
  • ABmeth 733: method/procedure [of/in/for V-ing] (e.g. technique, means, mode, pattern)
  • ABqual 655: quality of action/agent (e.g. efficiency, alacrity, suitableness, ease)
  • ABtime 732: time event (e.g. instance, end, holiday, parade, birthday)
  • ABcause 602: cause/potential/disposition [to V; of/for V-ing] (e.g. basis, motivation, sense, readiness, talent, ability, cost)
  • ABverb 41: undifferentiated verbal abstracts
  • ABnegc 764: negative cause [of V-ing] (e.g. danger, threat, risk)
  • ABcont 765: contrary event (e.g. catastrophe, disaster, accident, blizzard)
  • ABstrvb 749: strong verbals, process/result
• Compound examples also shown in the figure: acting technique, computing task, reading readiness, navigation skill [for/of V-ing]; navigation error, noise abatement, heat absorption
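To illustrate how the SAL hierarchy can be used, for instance to check whether two words fall under the same set, here is a minimal Python sketch over part of the Abstract branch. The parent links are an assumption: they take the subsets listed together with ABnonvb 40 as belonging to the non-verbal set and those listed with ABverb 41 as belonging to the verbal set. The word-to-code lexicon and the helper names `sal_path` and `same_set` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SalNode:
    code: str
    number: Optional[int]  # SAL number shown in the figure (None for the superset)
    parent: Optional[str]  # code of the parent node (None for a superset)

# Part of the Abstract (AB) branch; parent links are an assumed reading of the figure.
NODES = {node.code: node for node in [
    SalNode("AB", None, None),
    SalNode("ABnonvb", 40, "AB"),
    SalNode("ABverb", 41, "AB"),
    SalNode("ABprop", 609, "ABnonvb"),
    SalNode("ABstate", 736, "ABnonvb"),
    SalNode("ABclass", 731, "ABnonvb"),
    SalNode("ABorig", 723, "ABnonvb"),
    SalNode("ABpur", 748, "ABverb"),
    SalNode("ABmeth", 733, "ABverb"),
    SalNode("ABqual", 655, "ABverb"),
    SalNode("ABtime", 732, "ABverb"),
    SalNode("ABcause", 602, "ABverb"),
]}

# Hypothetical mini-lexicon: word -> most specific SAL code.
LEXICON = {"poverty": "ABstate", "category": "ABclass",
           "technique": "ABmeth", "efficiency": "ABqual"}

def sal_path(word):
    """Return SAL codes from the word's most specific node up to its superset."""
    path, code = [], LEXICON[word]
    while code is not None:
        path.append(code)
        code = NODES[code].parent
    return path

def same_set(word1, word2):
    """True if both words fall under the same set (the level right below the superset)."""
    return sal_path(word1)[-2] == sal_path(word2)[-2]

print(sal_path("poverty"))               # ['ABstate', 'ABnonvb', 'AB']
print(same_set("poverty", "category"))   # True
print(same_set("poverty", "technique"))  # False
```

Walking parent links upward keeps each node's definition local, so new subsets can be added without touching the rest of the taxonomy.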
DICTIONARY WITH IMPLICIT SEMANTIC RELATIONS
impressionar,V+FLX=FALAR+SAL=PVPCpleasetype+EN=impress+VSUP=fazer+VSUP=causar+DRV=NDRV01:CANÇÃO
adaptar,V+FLX=FALAR+Aux=1+SAL=INOP57+Subset=132+EN=adapt+VSUP=fazer+DRV=NDRV00:CANÇÃO
azedar,V+FLX=LIMPAR+Aux=1+SAL=OBJTRundif98+Subset=740+EN=sour+VSUP=ficar+DRV=ADRV00:ALTO
aesthetic,A+FLX=NATURAL+SAL=AVstate+PT=estético+DRV=AVDRV03
skepticism,N+FLX=BOOK+SAL=ABcause+PT=cepticismo+DRV=NAVDRV02
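The entries above follow a `lemma,CATEGORY+property=value` layout. Below is a minimal Python sketch for reading them. It assumes that properties are separated by `+` and that a key may repeat (as `VSUP` does). The function name `parse_entry` is hypothetical, and this is not NooJ's own parser. We also assume that a `DRV` value such as `NDRV01:CANÇÃO` names the derivational rule and, after the colon, the inflectional paradigm of the derived word.

```python
from collections import defaultdict

def parse_entry(line):
    """Split an entry 'lemma,CAT+key=value+...' into (lemma, category, properties).

    Keys may repeat (e.g. two VSUP values), so every key maps to a list.
    Features without '=' are stored with the value True.
    """
    lemma, _, rest = line.partition(",")
    category, *features = rest.split("+")
    props = defaultdict(list)
    for feature in features:
        key, sep, value = feature.partition("=")
        props[key].append(value if sep else True)
    return lemma, category, dict(props)

entry = ("impressionar,V+FLX=FALAR+SAL=PVPCpleasetype+EN=impress"
         "+VSUP=fazer+VSUP=causar+DRV=NDRV01:CANÇÃO")
lemma, cat, props = parse_entry(entry)
print(lemma, cat, props["EN"], props["VSUP"])
# impressionar V ['impress'] ['fazer', 'causar']

# Assumed reading: derivational rule, then inflection paradigm of the derived word.
paradigm, _, inflection = props["DRV"][0].partition(":")
print(paradigm, inflection)  # NDRV01 CANÇÃO
```

Keeping every value in a list means the properties that carry implicit relations (translations, support verbs, derivations) stay available even when they occur several times in one entry.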
TRANSFORMATIONAL RULES
Rules to transform morpho-syntactically and semantically related words of different parts of speech (PoS)

Eng4NooJ | Port4NooJ
NDRV04 = <B>ion/Npred (e.g. accelerate > acceleration) | NDRV02 = <B>nça/N+Npred (e.g. mudar > mudança)
ADRV02 = <B>icable/ADJ (e.g. apply > applicable) | ADRV02 = <B2>o/A+Apred (e.g. azedar > azedo)
AVDRV01 = <E>ly/ADV (e.g. frequent > frequently) | AVDRV00 = <B>zmente/ADV (e.g. veloz > velozmente)
AVDRV04 = <B>tically/ADV (e.g. realism > realistically) | AVDRV05 = <A> <B>amente/ADV (e.g. rápido > rapidamente)
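The rule bodies combine operators with suffix text. Going by the examples in the table, `<B>` deletes the last character, `<B2>` the last two, `<E>` adds nothing, and `<A>` removes the accent (rápido > rapid-). The Python sketch below reimplements only that reading, always operating at the end of the word. It is not NooJ's morphological engine, which supports more operators and cursor movement, and the function name `derive` is hypothetical.

```python
import re
import unicodedata

# An operator such as <B>, <B2>, <E> or <A>, or a run of literal suffix text.
TOKEN = re.compile(r"<([A-Z])(\d*)>|([^<\s]+)")

def strip_accents(text):
    """Remove combining diacritics, e.g. 'rápido' -> 'rapido' (assumed meaning of <A>)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

def derive(lemma, rule):
    """Apply a rule such as '<B>ion/Npred' to a lemma; return (derived form, category)."""
    body, _, category = rule.partition("/")
    form = lemma
    for op, count, literal in TOKEN.findall(body):
        if literal:
            form += literal                 # append suffix text
        elif op == "B":
            form = form[:-int(count or 1)]  # delete the last n characters (default 1)
        elif op == "A":
            form = strip_accents(form)      # remove accents
        elif op != "E":                     # <E> is the empty string: no change
            raise ValueError(f"unsupported operator <{op}{count}>")
    return form, category

EXAMPLES = [
    ("accelerate", "<B>ion/Npred"), ("mudar", "<B>nça/N+Npred"),
    ("apply", "<B>icable/ADJ"), ("azedar", "<B2>o/A+Apred"),
    ("frequent", "<E>ly/ADV"), ("veloz", "<B>zmente/ADV"),
    ("realism", "<B>tically/ADV"), ("rápido", "<A> <B>amente/ADV"),
]

for lemma, rule in EXAMPLES:
    form, category = derive(lemma, rule)
    print(f"{lemma} > {form} ({category})")
```

Every example in the table reproduces under this reading, which suggests the same small operator set serves both Eng4NooJ and Port4NooJ.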
ACTION-OF AND RESULT-OF SEMANTIC RELATIONS
ACTION-OF
abolishment IS ACTION OF abolish
abolição É AÇÃO DE abolir
abuse IS ACTION OF abuse
abuso É AÇÃO DE abusar
happening IS ACTION OF happen
acontecimento É AÇÃO DE acontecer
agreement IS ACTION OF agree
acordo É AÇÃO DE acordar
RESULT-OF
lit IS RESULT OF light
aceso É RESULTADO DE acender
stuffed IS RESULT OF stuff
embalsamado É RESULTADO DE embalsamar
rotten IS RESULT OF rot
podre É RESULTADO DE apodrecer
interdicted IS RESULT OF interdict
interditado É RESULTADO DE interditar
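Once the derivations are known in each language, bilingual relation pairs can be produced by matching derivations whose base verbs translate each other. In the Portuguese dictionary, this link is what the EN= property supplies. Below is a minimal Python sketch; the `Derivation` type, the templates and `cross_language_pairs` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple

class Derivation(NamedTuple):
    relation: str  # "ACTION-OF" or "RESULT-OF"
    derived: str   # derived word, e.g. a nominalization or participle
    base: str      # verb it is derived from

TEMPLATES = {
    "en": {"ACTION-OF": "{derived} IS ACTION OF {base}",
           "RESULT-OF": "{derived} IS RESULT OF {base}"},
    "pt": {"ACTION-OF": "{derived} É AÇÃO DE {base}",
           "RESULT-OF": "{derived} É RESULTADO DE {base}"},
}

def verbalize(d, lang):
    """Render a derivation as a relation statement in the given language."""
    return TEMPLATES[lang][d.relation].format(derived=d.derived, base=d.base)

def cross_language_pairs(pt_derivs, en_derivs, pt_to_en):
    """Pair Portuguese and English derivations with the same relation whose
    base verbs translate each other (pt_to_en stands in for the EN= property)."""
    en_index = {(d.base, d.relation): d for d in en_derivs}
    pairs = []
    for pt_d in pt_derivs:
        en_d = en_index.get((pt_to_en.get(pt_d.base), pt_d.relation))
        if en_d is not None:
            pairs.append((verbalize(en_d, "en"), verbalize(pt_d, "pt")))
    return pairs

pt = [Derivation("ACTION-OF", "abolição", "abolir"),
      Derivation("RESULT-OF", "podre", "apodrecer"),
      Derivation("ACTION-OF", "acontecimento", "acontecer")]
en = [Derivation("ACTION-OF", "abolishment", "abolish"),
      Derivation("RESULT-OF", "rotten", "rot")]
pt_to_en = {"abolir": "abolish", "apodrecer": "rot", "acontecer": "happen"}

for en_rel, pt_rel in cross_language_pairs(pt, en, pt_to_en):
    print(en_rel, "|", pt_rel)
```

Here acontecer is skipped because the English list has no derivation of happen. A pair is produced only when both sides exist, so gaps in either dictionary show up as missing pairs rather than wrong ones.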
LOCAL GRAMMARS
Mechanism created to read the linguistic information in the dictionaries and generate the semantic relations:
• Grammar to recognize adverbial compounds and transform them into equivalent single adverbs
• Grammar to generate cross-language relations between Portuguese support verb constructions and equivalent English single verbs
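Below is a minimal Python sketch of what the adverbial-compound grammar does, written with regular expressions rather than as a NooJ grammar. It matches "in a(n) ADJ way/manner" and "de forma/maneira ADJ", then derives the single adverb with a rule from the derivational table: `<E>ly` for English and `<A> <B>amente` for Portuguese. The adjective table and the feminine-to-lemma map are hypothetical stand-ins for dictionary lookups.

```python
import re
import unicodedata

def strip_accents(text):
    """Remove combining diacritics (our reading of the <A> operator)."""
    return "".join(c for c in unicodedata.normalize("NFD", text)
                   if unicodedata.category(c) != "Mn")

def derive_adverb(adjective, rule):
    """Apply a simplified rule body: <E> adds nothing, <B> drops the last
    character, <A> strips accents, and plain text is appended as a suffix."""
    form = adjective
    for op, literal in re.findall(r"<([A-Z])>|([^<\s]+)", rule):
        if literal:
            form += literal
        elif op == "B":
            form = form[:-1]
        elif op == "A":
            form = strip_accents(form)
    return form

# Hypothetical dictionary stand-ins: adjective lemma -> adverb rule, inflected form -> lemma.
ADJ_RULES = {"constructive": "<E>ly", "frequent": "<E>ly", "rápido": "<A> <B>amente"}
LEMMAS = {"rápida": "rápido"}

PATTERNS = [
    re.compile(r"\bin an? (\w+) (?:way|manner)\b"),  # English: in a constructive way
    re.compile(r"\bde (?:forma|maneira) (\w+)\b"),   # Portuguese: de forma rápida
]

def simplify_adverbials(text):
    """Replace recognized adverbial compounds with single adverbs."""
    def replace(match):
        lemma = LEMMAS.get(match.group(1), match.group(1))
        rule = ADJ_RULES.get(lemma)
        return derive_adverb(lemma, rule) if rule else match.group(0)
    for pattern in PATTERNS:
        text = pattern.sub(replace, text)
    return text

print(simplify_adverbials("They argued in a constructive way."))  # They argued constructively.
print(simplify_adverbials("Ele respondeu de forma rápida."))      # Ele respondeu rapidamente.
```

Adjectives without a known rule are left untouched, so the sketch only proposes a paraphrase when the dictionary licenses it.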
SPIDER (SYSTEM OF PARAPHRASING IN DOCUMENT EDITING AND REVISION)
Suggestions for general-language linguistic phenomena:
• Compound adverbs > single adverbs
• Support verb constructions > single verbs
• Relatives > participial adjectives
Barreiro, 2011
SEMANTIC EQUIVALENCE IN SPIDER
Synonyms in context (e.g. phrasal verbs ↔ equivalent expressions)
to clear up (weather) = (weather) to become better/brighter
Support verb constructions ↔ single verbs ↔ stylistic variants
to make a decision = to decide; to make an audit = to perform an audit
Aspectual constructions ↔ single verbs
to launch an attack = to attack
Adverbials (compounds ↔ single adverbs)
in a constructive way = constructively; on purpose > purposely = deliberately
Relatives ↔ participial adjectives
the president that was elected = the president elect
Relatives ↔ possessives
the role that Europe plays/has = the role of Europe
Relatives ↔ compound nouns (and vice versa)
a container for the milk = a milk container; a bottle made of plastic = a plastic bottle
Agentive passives ↔ actives
the man was released by the police officer = the police officer released the man
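The relatives ↔ compound nouns equivalence can be approximated with surface patterns. This Python sketch handles only the two patterns in the example ("N1 for the N2" and "N1 made of N2"). It uses a hypothetical set of allowed head nouns in place of the dictionary and SAL constraints a real grammar would need; without that constraint, "a reason for the delay" would wrongly become "a delay reason".

```python
import re

# Hypothetical stand-in for dictionary/SAL constraints: head nouns allowed to compound.
COMPOUNDABLE_HEADS = {"container", "bottle", "box", "jar"}

PATTERNS = [
    re.compile(r"\b(an?|the) (\w+) for the (\w+)\b"),  # a container for the milk
    re.compile(r"\b(an?|the) (\w+) made of (\w+)\b"),  # a bottle made of plastic
]

def article_for(article, next_word):
    """Keep 'the'; otherwise pick 'a'/'an' from the next word's first letter (rough heuristic)."""
    if article == "the":
        return article
    return "an" if next_word[0].lower() in "aeiou" else "a"

def relative_to_compound(text):
    """Rewrite 'N1 for the N2' / 'N1 made of N2' as the compound 'N2 N1'."""
    def rewrite(match):
        article, head, modifier = match.groups()
        if head not in COMPOUNDABLE_HEADS:
            return match.group(0)  # unknown head noun: leave the phrase untouched
        return f"{article_for(article, modifier)} {modifier} {head}"
    for pattern in PATTERNS:
        text = pattern.sub(rewrite, text)
    return text

print(relative_to_compound("Pour it into a container for the milk."))
print(relative_to_compound("She bought a bottle made of plastic."))
```

The article is recomputed because the modifier now follows it directly ("a jar for the oil" becomes "an oil jar").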
CICLing 2011, Tokyo, Japan, February 20-26, 2011 (Anabela Barreiro)
CROSS-LANGUAGE RELATIONS REUSING GRAMMARS
Portuguese concordance lines: each support verb construction is followed by "/" and an equivalent English single verb.
a fazer um estágio para dar aulas de / tutor Religião
a fazer um estágio para dar aulas de / lecture Religião
a fazer um estágio para dar aulas de / teach Religião
começa a dar exemplos / exemplify :
sentia-se capaz de dar um murro em / punch quem quisesse detê-lo
gostávamos de lhe dar uma palavrinha / speak .
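The concordance format above (construction, "/", English verb) can be reproduced with a small lookup. In this Python sketch the SVC table is hypothetical and lists only the constructions shown. A real grammar would build such pairs from dictionary properties (e.g. VSUP and EN) and would also handle inflected forms of dar, which this sketch does not.

```python
import re

# Hypothetical SVC table listing only the constructions shown above.
SVC_LEXICON = {
    "dar aulas de": ["tutor", "lecture", "teach"],
    "dar exemplos": ["exemplify"],
    "dar um murro em": ["punch"],
    "dar uma palavrinha": ["speak"],
}

# Try longer constructions first so they win over shorter overlapping ones.
SVC_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(svc) for svc in
                      sorted(SVC_LEXICON, key=len, reverse=True)) + r")\b"
)

def concordance(sentence):
    """Return one line per (match, English verb): the SVC followed by ' / verb'."""
    lines = []
    for match in SVC_PATTERN.finditer(sentence):
        for verb in SVC_LEXICON[match.group(1)]:
            lines.append(f"{sentence[:match.end()]} / {verb}{sentence[match.end():]}")
    return lines

for line in concordance("a fazer um estágio para dar aulas de Religião"):
    print(line)
```

Emitting one line per English alternative mirrors the concordance above, where "dar aulas de" yields tutor, lecture and teach.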
PRELIMINARY RESULTS
Relation | Quantity
Hyponymy | 14963
Synonymy (verbs, nouns, adjectives, adverbs) | 10395 (5367, 20, 34, 5014)
Action of | 3773
Result of | 283

Port4NooJ (publicly available at http://www.linguateca.pt/Repositorio/Port4Nooj/):
• 600 derivational rules, most of them transforming verbs into predicate nouns (587)
• 119 productive (nominalizations)
• 486 relations between verbs and autonomous predicate nouns
CONCLUSIONS AND FUTURE RESEARCH
• The linguistic-based method is systematic and expandable, with unlimited room to grow and improve.
• The dictionaries and grammars that generate monolingual semantic relations are easily adaptable and can be reused to generate cross-language relations; rules can be standardized and often reused across closely related languages.
• Although the methodology was applied to the OpenLogos resources, it can also be applied to other lexical resources with semantic relations, for any language besides English and Portuguese.
• Semantic relations are highly suitable for paraphrasing and cross-language tasks, including machine translation.
• There is a need for robust, high-quality semantic-relation resources and for semantics-driven NLP applications.
• Future work: gather and combine available open-source semantic resources, enhance the properties of the existing resources, and extend the coverage of linguistic phenomena.
Acknowledgements
Anabela Barreiro was partially supported by the UPV, award 1931, under the program Research Visits for
Renowned Scientists (PAID-02-11).
Hugo Gonçalo Oliveira is supported by the FCT scholarship grant SFRH/BD/44955/2008, co-funded by FSE.

 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 

Cross-language semantic relations between English and Portuguese

  • 1. CROSS-LANGUAGE SEMANTIC RELATIONS BETWEEN ENGLISH AND PORTUGUESE
      Anabela Barreiro; Hugo Gonçalo Oliveira (hroliv@dei.uc.pt)
      ICL: Workshop on Iberian Cross-Language NLP Tasks, 07 September, 2011, Huelva, Spain
  • 2. OUTLINE
      - INTRODUCTION
      - MOST STUDIED SEMANTIC RELATIONS & AUTOMATIC ACQUISITION OF MULTILINGUAL LEXICO-SEMANTIC RELATIONS
      - LINGUISTIC-BASED METHOD TO GENERATE SEMANTIC RELATIONS
      - LINGUISTIC RESOURCES EMPLOYED
      - ESTABLISHMENT OF MORPHO-SEMANTICO-SYNTACTIC RELATIONS IN THE DICTIONARY AND CREATION OF GRAMMARS TO READ DICTIONARY INFORMATION AND GENERATE NEW SEMANTIC PAIRS
      - CROSS-LANGUAGE RELATIONS RE-USING GRAMMARS
      - PRELIMINARY RESULTS
      - CONCLUSIONS AND FUTURE RESEARCH
  • 3. SEMANTIC RELATIONS  Synonymy - different lexical items have the same meaning e.g. car synonym-of automobile  Homonymy - lexical items have the same orthographic form but different meanings e.g. bank, financial institution vs. slope  Hyponymy - a lexical item is a subclass or a specific kind of another e.g. dog hyponym-of mammal  Meronymy - a lexical item is a part, piece or member of another e.g. wheel part-of car.  General and domain-oriented: process-of, result-of, etc. ICL: Workshop on Iberian Cross-Language NLP Tasks 07 September, 2011 Anabela Barreiro Huelva, Spain
  • 4. MULTILINGUAL LEXICAL-SEMANTIC KNOWLEDGE DATABASES CREATED FROM THE WEB
      - UWN/MENTA (de Melo & Weikum 2009, 2010): multilingual knowledge base with WordNet meaning connections for 1,500,000 words in over 200 languages (http://www.mpi-inf.mpg.de/yago-naga/uwn/query.html)
      - PanDictionary (Mausam et al. 2010): sense-distinguished, massively multilingual dictionary with translations in more than 1,000 languages, created by exploiting free dictionaries and inferring additional translations with a certain probability
      - WikiNet (Nastase et al. 2010): multilingual network of concepts obtained by exploiting Wikipedia, with lexicalizations in several languages as well as connections representing a variety of relations
      - Bergsma & Van Durme 2011: learning word-to-word translations by exploiting the visual similarity of labeled images on the Web
  • 5. LINGUISTIC RESOURCES  Eng4NooJ and Port4NooJ – linguistic knowledgeEng4NooJ and Port4NooJ – linguistic knowledge • OpenLogos dictionary (http://logos-os.dfki.de/) converted into NooJ format, and enhanced with new properties, including derivational and morpho-syntactic and semantic relations • Morphological systems • Contextual rules and grammars • Domain specific dictionaries ICL: Workshop on Iberian Cross-Language NLP Tasks 07 September, 2011 Anabela Barreiro Huelva, Spain LINGUISTIC-BASED METHOD TO GENERATE (CROSS-LANGUAGE) SEMANTIC RELATIONS Mapping morpho-syntactic and semantically related words
  • 6. CONCEPTUAL SEMANTIC RELATIONS: Semantico-syntactic Abstraction Language (SAL)
      - hierarchical taxonomic scheme
      - over 1,000 elements or words (expandable)
      - organized in Supersets, Sets, and Subsets
      - distributed over all parts of speech
      Noun Supersets: Concrete (CO), Mass (MA), Animate (AN), Place (PL), Information (IN), Abstract (AB), Process (intransitive) (PI), Process (transitive) (PT), Measure (ME), Time (TI), Aspective (AS), Unknown (UN)
  • 7. [Diagram: the Abstract (AB) noun Superset]
      - Non-verbal abstracts [about persons, things] (ABnonvb 40): general concepts (ABgen 42), properties/qualities/nature (ABprop 609), states/conditions/relationships (ABstate 736), classifications (ABclass 731), sources/origins (ABorig 723), undifferentiated non-verbals (ABnonvb 40)
      - Verbal abstracts [about agents, processes] (ABverb 41): purpose (ABpur 748), method/procedure (ABmeth 733), quality of action/agent (ABqual 655), time event (ABtime 732), cause/potential/disposition (ABcause 602), undifferentiated verbal abstracts (ABverb 41), negative cause (ABnegc 764), contrary event (ABcont 765), strong verbals: process/result (ABstrvb 749)
      - Example nouns include category, class, kind, make, nature, rank, type (classifications) and technique, means, mode, pattern (method/procedure)
  • 8. DICTIONARY WITH IMPLICIT SEMANTIC RELATIONS
      impressionar,V+FLX=FALAR+SAL=PVPCpleasetype+EN=impress+VSUP=fazer+VSUP=causar+DRV=NDRV01:CANÇÃO
      adaptar,V+FLX=FALAR+Aux=1+SAL=INOP57+Subset=132+EN=adapt+VSUP=fazer+DRV=NDRV00:CANÇÃO
      azedar,V+FLX=LIMPAR+Aux=1+SAL=OBJTRundif98+Subset=740+EN=sour+VSUP=ficar+DRV=ADRV00:ALTO
      aesthetic,AFLX=NATURAL+SAL=AVstate+PT=estético+DRV=AVDRV03
      skepticism,N+FLX=BOOK+SAL=ABcause+PT=cepticismo+DRV=NAVDRV02
  • 9. TRANSFORMATIONAL RULES: RULES TO TRANSFORM MORPHO-SYNTACTIC AND SEMANTICALLY RELATED WORDS OF DIFFERENT PoS
      Eng4NooJ:
        NDRV04 = <B>ion/Npred, e.g. accelerate > acceleration
        ADRV02 = <B>icable/ADJ, e.g. apply > applicable
        AVDRV01 = <E>ly/ADV, e.g. frequent > frequently
        AVDRV04 = <B>tically/ADV, e.g. realism > realistically
      Port4NooJ:
        NDRV02 = <B>nça/N+Npred, e.g. mudar > mudança
        ADRV02 = <B2>o/A+Apred, e.g. azedar > azedo
        AVDRV00 = <B>zmente/ADV, e.g. veloz > velozmente
        AVDRV05 = <A> <B>amente/ADV, e.g. rápido > rapidamente
  • 10. ACTION-OF AND RESULT-OF SEMANTIC RELATIONS
      ACTION-OF:
        abolishment IS ACTION OF abolish | abolição É AÇÃO DE abolir
        abuse IS ACTION OF abuse | abuso É AÇÃO DE abusar
        happening IS ACTION OF happen | acontecimento É AÇÃO DE acontecer
        agreement IS ACTION OF agree | acordo É AÇÃO DE acordar
      RESULT-OF:
        lit IS RESULT OF light | aceso É RESULTADO DE acender
        stuffed IS RESULT OF stuff | embalsamado É RESULTADO DE embalsamar
        rotten IS RESULT OF rot | podre É RESULTADO DE apodrecer
        interdicted IS RESULT OF interdict | interditado É RESULTADO DE interditar
  • 11. LOCAL GRAMMARS: mechanism created to read the linguistic information in the dictionaries and generate the semantic relations
      - [Figure] Grammar to recognize adverbial compounds and transform them into equivalent single adverbs
      - [Figure] Grammar to generate cross-language relations between Portuguese support verb constructions and equivalent English single verbs
  • 12. SPIDER (SYSTEM OF PARAPHRASING IN DOCUMENT EDITING AND REVISION): suggestions for general language linguistic phenomena (Barreiro, 2011)
      - Compound adverbs > single adverbs
      - Support verb constructions > single verbs
      - Relatives > participial adjectives
  • 13. SEMANTIC EQUIVALENCE IN SPIDER (slide from CICLing 2011, February 20-26, 2011, Tokyo, Japan)
      - Synonyms in context (e.g. phrasal verbs ↔ equivalent expressions): to clear up (weather) = (weather) to become better/brighter
      - Support verb constructions ↔ single verbs ↔ stylistic variants: to make a decision = to decide; to make an audit = to perform an audit
      - Aspectual constructions ↔ single verbs: to launch an attack = to attack
      - Adverbials (compounds ↔ single adverbs): in a constructive way = constructively; on purpose > purposely = deliberately
      - Relatives ↔ participial adjectives: the president that was elected = the president elect
      - Relatives ↔ possessives: the role that Europe plays/has = the role of Europe
      - Relatives ↔ compound nouns (and vice versa): a container for the milk = a milk container; a bottle made of plastic = a plastic bottle
      - Agentive passives ↔ actives: the man was released by the police officer = the police officer released the man
  • 14. CROSS-LANGUAGE RELATIONS RE-USING GRAMMARS (output language $EN: Portuguese support verb construction / English verb)
      a fazer um estágio para [dar aulas de / tutor] Religião
      a fazer um estágio para [dar aulas de / lecture] Religião
      a fazer um estágio para [dar aulas de / teach] Religião
      começa a [dar exemplos / exemplify] :
      sentia-se capaz de [dar um murro em / punch] quem quisesse detê-lo
      gostávamos de lhe [dar uma palavrinha / speak] .
  • 15. PRELIMINARY RESULTS
      Relation | Quantity
      Hyponymy | 14963
      Synonymy (between verbs, nouns, adj and adv) | 10395 (5367, 20, 34, 5014)
      Action of | 3773
      Result of | 283
      Port4NooJ (publicly available at http://www.linguateca.pt/Repositorio/Port4Nooj/):
      - 600 derivational rules, most transforming verbs into predicate nouns (587)
      - 119 productive (nominalizations)
      - 486 relations between verbs and autonomous predicate nouns
  • 16. CONCLUSIONS AND FUTURE RESEARCH
      - The linguistic-based method is systematic and expandable (it has unlimited potential to grow and improve)
      - Dictionaries and grammars that generate monolingual semantic relations are easily adaptable and can be reused to generate cross-language relations; rules can be standardized and often reused across closely related languages
      - Even though the methodology was applied to the OpenLogos resources, it is compatible with the exploitation of other lexical resources with semantic relations, for languages other than English and Portuguese
  • 17. CONCLUSIONS AND FUTURE RESEARCH
      - Semantic relations are highly suitable for paraphrasing and cross-language tasks, including machine translation
      - There is a need for robust, high-quality semantic relation resources and for semantic-driven NLP applications
      - Future work: gather and combine available open-source semantic resources, enhance properties in the existing resources, and enlarge the coverage of linguistic phenomena
  • 18. ACKNOWLEDGEMENTS
      Anabela Barreiro was partially supported by the UPV, award 1931, under the program Research Visits for Renowned Scientists (PAID-02-11). Hugo Gonçalo Oliveira is supported by the FCT scholarship grant SFRH/BD/44955/2008, co-funded by FSE.
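
The derivational rules on slide 9 pair NooJ-style morphological operators with a literal suffix and an output category. The Python sketch below applies rules written in that notation to a lemma. It is not NooJ's engine: it assumes that <B> deletes the last character (<B2> the last two), <E> leaves the form unchanged, and <A> strips diacritics from the whole form. These readings reproduce every example on the slide, but the function name apply_rule and the operator semantics are assumptions made for illustration.

```python
import re
import unicodedata
from typing import Tuple

# Operators assumed in this sketch: <B> / <Bn> delete the last 1 / n characters,
# <E> is the empty operation, <A> removes diacritics from the whole form.
OPERATOR = re.compile(r"<([A-Z])(\d*)>")


def strip_accents(form: str) -> str:
    """Remove combining marks, e.g. 'rápido' -> 'rapido'."""
    decomposed = unicodedata.normalize("NFD", form)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def apply_rule(lemma: str, rule: str) -> Tuple[str, str]:
    """Apply a rule such as '<B>ion/Npred' and return (derived form, category)."""
    body, _, category = rule.partition("/")
    body = body.replace(" ", "")  # '<A> <B>amente' -> '<A><B>amente'
    form, pos = lemma, 0
    while (match := OPERATOR.match(body, pos)) is not None:
        op, count = match.group(1), int(match.group(2) or 1)
        if op == "B":
            form = form[:-count]
        elif op == "A":
            form = strip_accents(form)
        elif op != "E":
            raise ValueError(f"operator <{op}> is not handled in this sketch")
        pos = match.end()
    # Whatever follows the operators is the literal suffix to append.
    return form + body[pos:], category


if __name__ == "__main__":
    examples = [
        ("accelerate", "<B>ion/Npred"),
        ("apply", "<B>icable/ADJ"),
        ("frequent", "<E>ly/ADV"),
        ("realism", "<B>tically/ADV"),
        ("mudar", "<B>nça/N+Npred"),
        ("azedar", "<B2>o/A+Apred"),
        ("veloz", "<B>zmente/ADV"),
        ("rápido", "<A> <B>amente/ADV"),
    ]
    for lemma, rule in examples:
        derived, category = apply_rule(lemma, rule)
        print(f"{lemma} > {derived} ({category})")
```

Running the script prints one line per slide example, such as `rápido > rapidamente (ADV)`. NooJ itself offers more operators than the four handled here, so a real pipeline would delegate this step to NooJ.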

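Slides 6 to 8 suggest how hyponymy relations can be read off the dictionary: each entry carries a SAL code, and SAL codes sit in a Subset, Set, Superset hierarchy. The sketch below parses entries in the `lemma,CATEGORY+KEY=VALUE+...` layout shown on slide 8 and walks a small fragment of the Abstract hierarchy copied by hand from slide 7. The names Entry, parse_entry, SAL_PARENT and hyponymy_relations are hypothetical, and linking a word directly to its SAL code is a simplification, not necessarily how the authors emit relations. Because the entry also stores its transfer (PT=cepticismo), the same relation is produced for Portuguese.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    """One dictionary line: lemma, part of speech and KEY=VALUE properties."""
    lemma: str
    pos: str
    props: Dict[str, List[str]] = field(default_factory=dict)

    def get(self, key: str) -> Optional[str]:
        """First value of a property (keys such as VSUP may repeat)."""
        values = self.props.get(key)
        return values[0] if values else None


def parse_entry(line: str) -> Entry:
    """Parse 'lemma,CAT+KEY=VALUE+...' as laid out on slide 8."""
    lemma, _, rest = line.partition(",")
    pos, *features = rest.split("+")
    props: Dict[str, List[str]] = {}
    for feature in features:
        key, _, value = feature.partition("=")
        props.setdefault(key, []).append(value)
    return Entry(lemma.strip(), pos.strip(), props)


# Hypothetical fragment of the SAL Abstract hierarchy (slide 7): child -> parent.
SAL_PARENT = {
    "ABcause": "ABverb",   # cause/potential/disposition
    "ABmeth": "ABverb",    # method/procedure
    "ABclass": "ABnonvb",  # classifications
    "ABstate": "ABnonvb",  # states/conditions/relationships
    "ABverb": "AB",        # verbal abstracts
    "ABnonvb": "AB",       # non-verbal abstracts
}


def sal_chain(code: str) -> List[str]:
    """Follow parent links from a Subset code up to its Superset."""
    chain = [code]
    while chain[-1] in SAL_PARENT:
        chain.append(SAL_PARENT[chain[-1]])
    return chain


def hyponymy_relations(entry: Entry, transfer_key: str = "PT") -> List[Tuple[str, str, str]]:
    """Emit (hyponym, 'hyponym-of', hypernym) triples for the lemma and its transfer."""
    code = entry.get("SAL")
    if code is None:
        return []
    chain = sal_chain(code)
    words = [entry.lemma]
    transfer = entry.get(transfer_key)
    if transfer:
        words.append(transfer)
    relations = [(word, "hyponym-of", chain[0]) for word in words]
    relations += [(child, "hyponym-of", parent) for child, parent in zip(chain, chain[1:])]
    return relations


if __name__ == "__main__":
    entry = parse_entry("skepticism,N+FLX=BOOK+SAL=ABcause+PT=cepticismo+DRV=NAVDRV02")
    for relation in hyponymy_relations(entry):
        print(*relation)
```

Repeated keys are kept as lists because entries such as impressionar list more than one support verb (VSUP=fazer, VSUP=causar).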
Editor's Notes

  1. Good afternoon! My name is Anabela Barreiro from L2F INESC-Lisbon. Today, I will present work done in collaboration with my colleague Hugo Gonçalo Oliveira from the Centre for Informatics and Systems of the University of Coimbra. The presentation describes a first attempt to extract semantic relations (SR), namely cross-language SR between English and Portuguese.
  2. This short presentation is divided into 3 main parts: a short introduction, a content part and the conclusions. Section 2 describes the state of the art in automatic acquisition of distinct types of lexico-semantic relations. Section 3 presents the base linguistic resources used to obtain semantic relations. Section 4 describes the relations of synonymy, hyponymy, action-of, and result-of. Section 5 presents the method for the extraction of the semantic relations. It describes, in particular, the morpho-syntactic and semantic relations established in the dictionary, how the grammars read this linguistic information, and how they use it to generate semantic pairs. This latter section also shows how to expand from monolingual to cross-language relations with minimal change to the local grammars. Section 6 presents some preliminary results. Finally, section 7 presents the conclusions and guidelines for future research work. ------------------------------------------------------------------------- In the introduction, I will describe paraphrases in general and emphasize their importance in language and communication. In the second part of this presentation, I will discuss paraphrasing functions and make a few comments on the importance of paraphrases in NLP tasks and on the practical need for paraphrases in industrial and professional contexts, including pedagogical ones. I will continue with some considerations on the importance of paraphrases for translation and on tools that use paraphrases. The core of the presentation is a new tool called SPIDER, a system of paraphrasing in document editing and revision. Finally, I will present the conclusions and close with a few notes on future applications and research.
  3. Some of the most studied lexico-semantic relations are synonymy, homonymy, hyponymy and meronymy. But in certain domains, relations such as process-of and result-of are very common.
  4. Interesting work is being developed on cross-language/multilingual lexical-semantic knowledge databases created from the web.
  5. Some of these SR were extracted from the lexical resources of the OpenLogos machine translation system. In combination with these resources, new resources were created, namely derivational rules and grammars to recognize and generate morpho-syntactic and semantically related words and multiword units. Semantic relations obtained by means of local grammars developed within the NooJ linguistic environment cover a large number of items and can be extracted simply and easily. This work aims to show how these combined resources can be used in cross-language tasks. This linguistic-based method of automatic generation of (cross-language) semantic relations consists of mapping morpho-syntactic and semantically related words. The semantic relations were generated from 2 new linguistic knowledge resources, Eng4NooJ and Port4NooJ, which were built with the NooJ linguistic environment. Part of the linguistic knowledge in these resources came from the OpenLogos machine translation system. The adapted and enhanced linguistic knowledge consists of dictionaries and local grammars, which include new properties and transformations allowing for the establishment of morpho-syntactic and semantic relations. Local grammars apply the new dictionary properties, permitting the transformations required in paraphrasing and MT.
  6. The conceptual semantic relations were symbolically represented in the OpenLogos lexicon as a hierarchical taxonomic scheme called Semantico-syntactic Abstraction Language (SAL), with over 1,000 elements or words (expandable), organized in Supersets, Sets, and Subsets, distributed over all parts of speech. For example, SAL has 12 Supersets for nouns: Concrete, Mass, Animate, Place, Information, Abstract, Process (intransitive), Process (transitive), Measure, Time, Aspective and Unknown. Semantic relations were generated automatically, based on the linguistic information associated with each lexical entry. This information was used to generate hierarchical hyponymy and hypernymy relations, among others.
  7. This slide shows the Abstract noun Superset. In the Abstract noun Superset, there are two principal Sets: the non-verbal Abstract nouns and the verbal Abstract nouns, both with their own Subsets. The Subset Classifications is a member of the non-verbal Abstract noun Set. It includes nouns such as category, class, kind, make, nature, rank, type, among others. The Subset Methods/Procedures is a member of the verbal Abstract noun Set. It includes nouns such as technique, means, mode, pattern, among others. The complete taxonomy can be viewed at the Logos Archives website http://logossystemarchives.homestead.com/.
  8. EXTRACTION OF CONCEPTUAL SEMANTIC RELATIONS FROM THE RESOURCES - Action-of, result-of, and synonymy relations between multiword units and single words, where there is a morpho-syntactic and semantic relation between words of distinct PoS.
  9. Local grammars - mechanism created to read the linguistic information in the dictionaries and generate the semantic relations. These SR have been used in paraphrasing and machine translation.
  10. SPIDER uses linguistically based automated paraphrasing and text-editing mechanisms to help users with their writing needs by providing suggestions for customized text authoring, so it can be used in word processing applications. It also generates word and phrasal usage data to help guide decision-making. This slide illustrates SPIDER’s suggestions for several general language linguistic phenomena: compound adverbs > single adverbs; relatives > participial adjectives; SVCs > single verbs.
  11. SPIDER’s linguistic knowledge allows extensive transformation and rewriting, including recognition and rewriting of: words or multiword units into their synonyms or paraphrases, in the appropriate contexts (to clear up (weather) > to become better/brighter); support verb constructions into single verbs (to make a decision > to decide; to give support to N(AN) > to support N(AN); to go on V-ing > to continue V-ing; to get into contact with > to contact; to turn off N(light) > to extinguish N; to become acid > to acidify); support verb constructions into their stylistic variants (to make an audit > to perform an audit; to make an impression > to cause an impression); aspectual constructions into verbs (to launch an attack > to attack); multiword adverbs and adverbial phrases into single adverbs (in a constructive way > constructively; on purpose > purposely / deliberately); relatives into possessives (the position that the Church defends > the position of the Church; the role that the politicians play > the role of the politicians); relatives into participial adjectives (the president that was elected > the president elected); relatives into compound nouns (a container for the milk > a milk container); phrases with “made of” (a bottle made of plastic > a plastic bottle); agentive passives into actives (the young man is released by the police officer > the police officer releases the young man).
  12. ReWriter was inspired by ParaMT, a prototype of a multilingual paraphraser (or translation system). ParaMT uses a similar methodology, except that it provides an equivalent in a language different from that of the source text (paraphrases across languages). ParaMT can be used directly in machine translation. At the current stage of development, ParaMT aims at translating multiword units efficiently, and it handles the translation of Portuguese support verb constructions into English verbs quite well, as illustrated in the table on this slide. Because Eng4NooJ resources contain Portuguese transfers for each entry, any grammar used to obtain monolingual transformations can be reused to obtain bilingual (or multilingual) transformations. Adapting a grammar requires only minimal changes, since the only parameter that needs to be added is the specification of the output language, such as $EN for English or $PT for Portuguese (meaning “retrieve the output in English”, etc.). For monolingual transformations, no output language is specified.
  13. In theory, the exploitation of the lexicon in combination with SAL allows the establishment of numerous relations between words and expressions. For the current paper, we focused only on a few of those relations, which cover a large number of items and could be extracted simply and easily. The result of extraction for Portuguese (not yet reviewed) is publicly available at http://www.linguateca.pt/Repositorio/Port4NooJ/relacoes_semanticas_explicitas/. Currently, Port4NooJ contains more than 30,000 morpho-syntactic relations between semantically related elements. Table 6 presents some preliminary results, which do not refer to paraphrasing capabilities but simply to relations between lexical items. The total results for paraphrasing are significantly higher. Local grammars, applied to information (properties) described in the dictionary, enable the recognition and analysis of expressions such as de (um) modo rápido and de (uma) forma/maneira rápida (in a fast/quick way), which could be considered relations between an adjective and an adverb but were not considered here, and also of inflected forms such as dar uns passeios (go for some walks). Port4NooJ contains approximately 600 derivational rules, most of them transforming verbs into predicate nouns (587). 119 of these rules are productive, covering nominalizations. 486 rules correspond to relations between verbs and autonomous predicate nouns. Rules were only superficially evaluated.
  14. This paper presented semantic relations, namely domain-independent semantico-syntactic and ontological relations, suitable for paraphrasing and cross-language tasks, including machine translation. We have demonstrated that, given the appropriate linguistic resources, the generation of semantic relations can become very systematic. Any grammar that generates monolingual semantic relations can be reused to generate cross-language relations, and rules can be standardized and often reused across closely related languages. Even though the methodology was applied to the OpenLogos resources, it is compatible with the exploitation of other lexical resources with semantic relations, for languages other than English and Portuguese, the two studied in this research. Future work will gather and combine available open-source semantic resources, enhance properties in the existing resources, and enlarge the coverage of linguistic phenomena.
  15. Semantic relations, namely domain-independent semantico-syntactic and ontological relations, are highly suitable for paraphrasing and cross-language tasks, including machine translation. We have demonstrated that, given the appropriate linguistic resources, the generation of semantic relations can become very systematic. Any grammar that generates monolingual semantic relations can be reused to generate cross-language relations, and rules can be standardized and often reused across closely related languages. Even though the methodology was applied to the OpenLogos resources, it is compatible with the exploitation of other lexical resources with semantic relations, for languages other than English and Portuguese, the two studied in this research. Future work will gather and combine available open-source semantic resources, enhance properties in the existing resources, and enlarge the coverage of linguistic phenomena.
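
Note 12 explains that a grammar written for monolingual transformations is reused for cross-language ones by adding only an output-language parameter ($EN, $PT). The Python sketch below imitates that idea, with a regular expression standing in for a NooJ local grammar. One pattern recognizes support verb constructions with dar, and the output_lang argument selects either a Portuguese single verb (no language given, i.e. monolingual) or the English transfer. The lexicon is a hypothetical fragment: the English verbs come from the concordance on slide 14, while the Portuguese equivalents (lecionar, exemplificar, esmurrar) are illustrative assumptions. Only the infinitive dar is matched, whereas NooJ grammars also handle inflected forms.

```python
import re
from typing import Dict, Optional

# Hypothetical lexicon fragment: predicate noun -> support verb and equivalents.
# EN values come from the slide 14 concordance; PT verbs are illustrative.
LEXICON: Dict[str, Dict[str, str]] = {
    "aulas": {"VSUP": "dar", "PT": "lecionar", "EN": "teach"},
    "exemplos": {"VSUP": "dar", "PT": "exemplificar", "EN": "exemplify"},
    "murro": {"VSUP": "dar", "PT": "esmurrar", "EN": "punch"},
}

# One "grammar" for every output language: support verb, optional article,
# predicate noun, optional trailing preposition (de/em) absorbed by the verb.
SVC = re.compile(
    r"\b(?P<vsup>dar)\s+(?:(?:um|uma|uns|umas)\s+)?(?P<noun>\w+)(?:\s+(?:de|em)\b)?",
    re.IGNORECASE,
)


def rewrite(text: str, output_lang: Optional[str] = None) -> str:
    """Replace known support verb constructions by a single verb.

    output_lang=None keeps the output in Portuguese (monolingual paraphrase);
    output_lang="EN" plays the role of the $EN parameter.
    """
    key = output_lang or "PT"

    def replace(match: "re.Match[str]") -> str:
        entry = LEXICON.get(match.group("noun").lower())
        if entry is None or entry["VSUP"] != match.group("vsup").lower() or key not in entry:
            return match.group(0)  # unknown construction: leave the text untouched
        return entry[key]

    return SVC.sub(replace, text)


if __name__ == "__main__":
    sentence = "a fazer um estágio para dar aulas de Religião"
    print(rewrite(sentence))        # monolingual output
    print(rewrite(sentence, "EN"))  # cross-language output
```

For example, `rewrite("para dar aulas de Religião", "EN")` returns `"para teach Religião"`, mirroring the mixed-language concordance lines on slide 14, while the same call without an output language yields `"para lecionar Religião"`. Constructions missing from the lexicon, such as dar uma palavrinha, pass through unchanged.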