PROPOR2014 - Intl. Conference on Computational Processing of Portuguese 
October 6-8, 2014, ICMC, São Carlos, SP, Brazil 
Body part nouns and Whole-Part Relations 
in Portuguese 
Ilia Markov (1,2,3), Nuno Mamede (2,3), Jorge Baptista (1,2,3)
(1) U. Algarve/CECL  (2) U. Lisboa/IST  (3) INESC-ID Lisboa/L2F
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa 
technology 
from seed 
L2 F - Spoken Language Systems Laboratory 1
Objectives 
• Improve the automatic extraction of semantic relations
between textual elements in an existing NLP system,
STRING
• Part-whole relations (meronymy)
• Human body-part nouns (Nbp)
O Pedro partiu o braço 
‘Pedro broke the arm’ 
WHOLE-PART(Pedro,braço) 
Objectives (cont.) 
• Development of a rule-based meronymy detection
module for Human-Nbp relations
• Implementation in STRING (Mamede et al., 2012)
STRING: a hybrid, statistical and rule-based, Natural 
Language Processing (NLP) system for Portuguese 
string.l2f.inesc-id.pt 
Motivation 
Semantic relations are a device for structuring texts:
they contribute to the cohesion and coherence of a text.
Automatic extraction of semantic relations is useful for
several NLP tasks:
• Anaphora Resolution 
O Pedro lavou a cara 
‘Pedro washed the face’ 
WHOLE-PART(Pedro,cara) 
O Pedro lavou a sua cara 
‘Pedro washed his face’ 
WHOLE-PART(sua,cara) & ANTECEDENT(?,sua) 
Motivation (cont.) 
• Semantic Role Labeling 
O Pedro partiu um braço 
‘Pedro broke an arm’ 
WHOLE-PART(Pedro,braço) 
➢ Pedro is an experiencer. 
O Pedro partiu o braço do João 
‘Pedro broke João’s arm’ 
WHOLE-PART(João,braço) 
➢ Pedro is an agent. 
Motivation (cont.) 
• Opinion mining 
É um bom hotel: o quarto era limpo, as camas eram feitas 
de lavado todos os dias, e os pequenos-almoços eram 
opíparos 
‘It is a nice hotel: the room was clean, the beds (bed
sheets) were changed every day, and the breakfasts were
sumptuous’
Related Work 
In NLP, various information extraction techniques have 
been developed in order to capture part-whole relations 
from texts: 
• Hearst, 1992 
Lexico-syntactic patterns to capture hyponymic (type-of) relations 
• Girju et al., 2003, 2006 
The method semi-automatically identifies patterns that encode part-whole
relations and automatically learns the classification rules
needed to extract part-whole relations from these
patterns. The authors report an overall average precision of 80.95%
and a recall of 75.91%.
Related Work (cont.) 
• Van Hage et al., 2006 
A method for learning part-whole relations from vocabularies and 
text sources; the authors were able to acquire 503 part-whole pairs 
from the AGROVOC Thesaurus to learn 91 reliable part-whole 
patterns. 
• Pantel and Pennacchiotti, 2006 
The Espresso algorithm: takes as input a few seed instances of a 
particular relation and learns surface patterns to extract more 
instances. The algorithm obtains a precision of 80%. 
Related Work (cont.) 
• Lexical ontologies for Portuguese: 
- WordNet.PT 
- PAPEL 
- Onto.PT 
• Parsers of Portuguese: 
- The PALAVRAS parser (Bick, 2000), using 
the Visual Interactive Syntax Learning (VISL) environment; 
- LX Semantic Role Labeler (Branco & Costa, 2010). 
Dependency Rule in STRING 
O Pedro partiu o braço do João 
‘Pedro broke João’s arm’ 
IF( MOD[POST](#2[UMB-Anatomical-human],#1[human]) &
    PREPD(#1,?[lemma:de]) &
    CDIR[POST](#3,#2) &
    ~WHOLE-PART(#1,#2)
  )
  WHOLE-PART(#1,#2)
WHOLE-PART(João,braço)
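The conditions of this rule can be mimicked in a few lines of Python. This is only a sketch of the logic, not the XIP/STRING implementation: the Dep tuple, the toy HUMAN and BODY_PART sets, and the prep_of mapping (standing in for PREPD) are assumptions made for illustration.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    name: str  # dependency label, e.g. "MOD_POST"
    arg1: str
    arg2: str


# Toy feature lexicon (assumption): stands in for the [human] and
# [UMB-Anatomical-human] features used in the rule.
HUMAN = {"Pedro", "João", "Ana", "filho"}
BODY_PART = {"braço", "cara", "cabelo", "dedos"}


def whole_part_from_de_modifier(deps, prep_of):
    """Emit WHOLE-PART(whole, part) when a body-part noun that is a direct
    object is post-modified by a human noun introduced by 'de'."""
    existing = {(d.arg1, d.arg2) for d in deps if d.name == "WHOLE-PART"}
    direct_objects = {d.arg2 for d in deps if d.name == "CDIR_POST"}
    found = []
    for d in deps:
        if d.name != "MOD_POST":
            continue
        part, whole = d.arg1, d.arg2          # MOD[POST](#2, #1)
        if (part in BODY_PART and whole in HUMAN
                and prep_of.get(whole) == "de"  # PREPD(#1, de)
                and part in direct_objects      # CDIR[POST](#3, #2)
                and (whole, part) not in existing):  # ~WHOLE-PART(#1, #2)
            found.append(Dep("WHOLE-PART", whole, part))
            existing.add((whole, part))
    return found


# O Pedro partiu o braço do João
deps = [
    Dep("SUBJ_PRE", "partiu", "Pedro"),
    Dep("CDIR_POST", "partiu", "braço"),
    Dep("MOD_POST", "braço", "João"),
]
prep_of = {"João": "de"}
print(whole_part_from_de_modifier(deps, prep_of))
```

The negated condition keeps the rule from firing twice on the same pair, which is why running the function again on the enriched dependency list yields nothing new.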
Fixed Phrases and Frozen Sentences 
involving Nbp 
‣ 400 semi-automatically crafted rules,
based on an available lexicon-grammar of European Portuguese idioms
Other phenomena 
• DET=um and bilateral symmetry 
O Pedro partiu um braço 
‘Pedro broke an arm’ 
• relations between 2 Nbp 
A Ana pinta as unhas dos pés 
‘Ana paints the nails of the feet’ 
• part-of Nbp 
O Pedro tocou com a ponta da língua no gelado 
‘Pedro touched with the tip of the tongue on the ice cream’ 
• “hidden” Nbp with disease nouns 
O Pedro tem uma gastrite (estômago) 
‘Pedro has gastritis (stomach)’ 
Evaluation 
• First fragment of the CETEMPúblico corpus (Rocha & Santos, 
2000): 14.7 M tokens; 6.3 M simple words; and 300 K sentences. 
• Using an Nbp lexicon (151 lemmas), 16,746 sentences with Nbp
were extracted.
• A random stratified sample of 1,000 sentences with Nbp, 
keeping the proportion of their total frequency in the source 
corpus. 
• Divided between 4 annotators – 4 subsets of 225 sentences 
each, with a common set of 100 sentences to assess inter-annotator 
agreement. 
‣WHOLE-PART, FIXED, nothing 
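The sampling and splitting steps above can be sketched as follows. This is an illustrative reconstruction, not the authors' script: it assumes the strata are the Nbp lemmas, uses largest-remainder rounding for the quotas, and the function names are hypothetical.

```python
import random


def proportional_quotas(counts, sample_size):
    """Quotas proportional to stratum size, summing exactly to sample_size
    (largest-remainder rounding)."""
    total = sum(counts.values())
    exact = {k: sample_size * c / total for k, c in counts.items()}
    quotas = {k: int(v) for k, v in exact.items()}
    leftover = sample_size - sum(quotas.values())
    # Remaining slots go to the strata with the largest fractional parts.
    by_remainder = sorted(counts, key=lambda k: exact[k] - quotas[k], reverse=True)
    for k in by_remainder[:leftover]:
        quotas[k] += 1
    return quotas


def stratified_sample(sentences_by_lemma, sample_size, seed=0):
    """Draw a random sample that keeps each lemma's share of the corpus."""
    rng = random.Random(seed)
    counts = {lemma: len(s) for lemma, s in sentences_by_lemma.items()}
    sample = []
    for lemma, quota in proportional_quotas(counts, sample_size).items():
        sample.extend(rng.sample(sentences_by_lemma[lemma], quota))
    return sample


def split_for_annotators(sample, n_annotators=4, common_size=100, seed=0):
    """One common set for agreement plus equal individual subsets."""
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    common, rest = shuffled[:common_size], shuffled[common_size:]
    share = len(rest) // n_annotators
    subsets = [rest[i * share:(i + 1) * share] for i in range(n_annotators)]
    return common, subsets


# Toy corpus (invented counts), only to show the call pattern.
toy = {
    "braço": [f"braço-{i}" for i in range(50)],
    "cara": [f"cara-{i}" for i in range(30)],
    "língua": [f"língua-{i}" for i in range(25)],
}
print(proportional_quotas({k: len(v) for k, v in toy.items()}, 10))
print(len(stratified_sample(toy, 10)))
```

With a 1,000-sentence sample, split_for_annotators returns a common set of 100 sentences and four subsets of 225, matching the figures on the slide.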
Inter-annotator Agreement 
Average Pairwise Percent Agreement
Fleiss’ Kappa 
Average Pairwise Cohen’s Kappa 
http://dfreelon.org/utils/recalfront/recal3/ 
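For reference, the three agreement measures listed above can be computed with the standard formulas, as in this minimal sketch. It is not the ReCal3 code; the function names and the input layout (one list of labels per annotator, all of the same length) are assumptions, and the label values are toy data.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a, b):
    """Share of items on which two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators (undefined if chance agreement is 1)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def average_pairwise(metric, ratings):
    """Mean of a two-annotator metric over all annotator pairs."""
    pairs = list(combinations(ratings, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


def fleiss_kappa(ratings):
    """Fleiss' kappa; every item must be labelled by every annotator."""
    n_raters = len(ratings)
    items = list(zip(*ratings))
    label_totals = Counter()
    p_bar = 0.0
    for item in items:
        counts = Counter(item)
        label_totals.update(counts)
        agreeing_pairs = sum(c * c for c in counts.values()) - n_raters
        p_bar += agreeing_pairs / (n_raters * (n_raters - 1))
    p_bar /= len(items)
    n_assignments = len(items) * n_raters
    p_e = sum((t / n_assignments) ** 2 for t in label_totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Toy annotations using the three labels of the task.
ann1 = ["WHOLE-PART", "WHOLE-PART", "FIXED", "nothing"]
ann2 = ["WHOLE-PART", "FIXED", "FIXED", "nothing"]
ann3 = ["WHOLE-PART", "WHOLE-PART", "FIXED", "nothing"]
ratings = [ann1, ann2, ann3]
print(average_pairwise(percent_agreement, ratings))
print(average_pairwise(cohen_kappa, ratings))
print(fleiss_kappa(ratings))
```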
Results 
(1st evaluation) 
System’s performance for Nbp
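The figures of the results table are not preserved in this transcript. Assuming that performance is reported with the usual precision, recall and F-measure over extracted relations (an assumption, since the slide text does not name the measures), they could be computed as in this sketch. The gold and predicted sets are invented toy data, and the (sentence id, whole, part) triple format is hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and F1 of predicted relations against gold ones."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: (sentence id, whole, part) triples, not taken from the evaluation.
gold = {(0, "Pedro", "braço"), (1, "João", "braço"), (2, "vítima", "corpo")}
predicted = {(0, "Pedro", "braço"), (1, "João", "braço"), (3, "Igreja", "face")}
print(precision_recall_f1(gold, predicted))
```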
Error Analysis 
false-positives 
• Disambiguation of Nbp in context 
- língua ‘tongue/language’ 
- língua portuguesa ‘Portuguese language’ 
- língua de Camões ‘language of Camões’ 
• New idioms have been encoded in the lexicon 
- abrir o coração a ‘to open one’s heart to sb.’ 
- fazer face a ‘to face sth./to deal with’ 
• Nbp used figuratively 
Além disso, a nova face desta Igreja chilena… 
‘Moreover, the new face of this Chilean Church…’ 
Error Analysis 
false-negatives 
• The whole and the part are not syntactically related and may 
be quite far away from each other: 
O facto do corpo ter sido encontrado na cozinha, leva os bombeiros a 
suspeitar que a vítima, com graves problemas de saúde, tenha 
desmaiado e caído à lareira, o que poderá ter estado na origem do 
incêndio. 
‘The fact that the body was found in the kitchen leads the firefighters to suspect
that the victim, who had serious health problems, fainted and fell into the fireplace,
which may have been the origin of the fire.’
WHOLE-PART(vítima,corpo) 
Error Analysis 
false-negatives (cont.) 
• Some human nouns and all pronouns (including personal,
relative and demonstrative) are not marked with the human
feature (even if anaphora resolution performs well);
Segundo o responsável do hospital, o doente – que também sofreu 
graves ferimentos na cabeça – poderia ser ainda sujeito a uma segunda 
intervenção cirúrgica 
‘According to the head of the hospital, the patient - who also suffered 
serious head injuries – could still be subjected to a second surgical 
intervention’ 
ANTECEDENT(doente,que)
WHOLE-PART(que,cabeça)
‣ inheritance of features and relative placement of the anaphora resolution (AR)
and whole-part (WP) modules within the STRING architecture
Error Analysis 
false-negatives (cont.) 
• A modifier of a noun or an adjective (and not a verb): 
Um mágico com um barrete (enfiado) na cabeça
‘A magician with a hat (stuck) on the head’
WHOLE-PART(mágico,cabeça) 
Results 
(2nd evaluation) 
System’s performance for Nbp 
+0.13 +0.11 +0.12 
Thank you! 
echo "O Pedro penteou o cabelo do filho com os dedos" | xip/string.sh 
TOP 
+------------+----------+----------------+-------------------+ 
| | | | | 
NP VF NP PP PP 
+-------+ + +-------+ +----+-------+ +----+-------+ 
| | | | | | | | | | | 
ART NOUN VERB ART NOUN PREP ART NOUN PREP ART NOUN 
+ +- +- +- + + + +- +- + +- 
| | | | | | | | | | | 
O Pedro penteou o cabelo de o filho com os dedos 
MAIN(penteou) 
MOD_POST(cabelo,filho) 
MOD_POST(penteou,dedos) 
SUBJ_PRE(penteou,Pedro) 
CDIR_POST(penteou,cabelo) 
WHOLE-PART(filho,cabelo) 
WHOLE-PART(Pedro,dedos) 
0>TOP{NP{O Pedro} VF{penteou} NP{o cabelo} PP{de o filho} PP{com os dedos}}
string.l2f.inesc-id.pt
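Dependency lines such as those in the output above are easy to post-process. The following sketch parses them into tuples and keeps the WHOLE-PART pairs. It assumes one NAME(arg,...) dependency per line, as shown on the slide; the regular expression and function name are illustrative and not part of STRING.

```python
import re

# One dependency per line: an upper-case label (may contain "_" or "-"),
# followed by a parenthesised, comma-separated argument list.
DEP_LINE = re.compile(r"^([A-Z][A-Z_-]*)\(([^()]*)\)$")


def parse_dependencies(output):
    """Return (label, args) tuples for every dependency line in the output."""
    deps = []
    for line in output.splitlines():
        match = DEP_LINE.match(line.strip())
        if match:
            name, args = match.groups()
            deps.append((name, tuple(a.strip() for a in args.split(","))))
    return deps


output = """TOP
MAIN(penteou)
MOD_POST(cabelo,filho)
MOD_POST(penteou,dedos)
SUBJ_PRE(penteou,Pedro)
CDIR_POST(penteou,cabelo)
WHOLE-PART(filho,cabelo)
WHOLE-PART(Pedro,dedos)
0>TOP{NP{O Pedro} VF{penteou} NP{o cabelo} PP{de o filho} PP{com os dedos}}
"""

deps = parse_dependencies(output)
whole_parts = [args for name, args in deps if name == "WHOLE-PART"]
print(whole_parts)
```

Lines that are not dependencies (the tree drawing, the TOP node, the bracketed chunk line) simply fail to match and are skipped.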
Questions please! 
References 
Berland, M. and Charniak, E. 1999. Finding parts in very large corpora. In Proceedings 
of the 37th annual meeting of the Association for Computational Linguistics on 
Computational Linguistics, pages 57–64. Morristown, NJ, USA. Association for 
Computational Linguistics. 
Bick, E. 2000. The Parsing System "Palavras": Automatic Grammatical Analysis of 
Portuguese in a Constraint Grammar Framework. Dr.phil. thesis. Aarhus University. 
Aarhus, Denmark: Aarhus University Press. November 2000. 
Branco, A. and Costa, F. 2010. A Deep Linguistic Processing Grammar for Portuguese. 
In Pardo et al. (eds.), Computational Processing of Portuguese, LNAI 6001, Springer, 
pp. 86–89. 
Girju, R., Badulescu, A., and Moldovan, D. 2006. Automatic discovery of part-whole 
relations. Computational Linguistics, 32(1):83–135. 
Nascimento, M., Veloso, R., Marrafa, P., Pereira, L., Ribeiro, R., and Wittmann, L. 1998. 
LE-PAROLE: do Corpus à Modelização da Informação Lexical num Sistema-multifunção. 
Actas do XIII Encontro Nacional da Associação Portuguesa de 
Linguística, 2:115–134. 
Mamede, N., Baptista, J., Diniz, C. and Cabarrão, V. 2012. STRING: An hybrid statistical 
and rule-based natural language processing chain for Portuguese. 
http://www.propor2012.org/demos/DemoSTRING.pdf 
References (cont.) 
Pantel, P. and Pennacchiotti, M. 2006. Espresso: Leveraging generic patterns for 
automatically harvesting semantic relations. In Proceedings of Conference on 
Computational Linguistics / Association for Computational Linguistics (COLING/ 
ACL-06), pages 113–120. Sydney, Australia. 
Rocha, P. and Santos, D. 2000. "CETEMPúblico: Um corpus de grandes dimensões de 
linguagem jornalística portuguesa". In Maria das Graças Volpe Nunes (ed.), V 
Encontro para o processamento computacional da língua portuguesa escrita e falada 
(PROPOR 2000) (São Paulo, Brasil, 19-22 de Novembro de 2000), São Paulo: 
ICMC/USP, pp. 131-140. 
Widlöcher, A. and Mathet, Y. 2012. The Glozz Platform: a Corpus Annotation and Mining 
Tool. In Proceedings of the 2012 Association for Computational Linguistics Symposium 
on Document Engineering, DocEng ’12, pages 171–180, Paris, France. Telecom 
ParisTech, Association for Computational Linguistics. 
Winston, M., Chaffin, R. and Herrmann, D. 1987. A Taxonomy of Part-Whole Relations. 
Cognitive Science, 11:417–444. 

More Related Content

Similar to Body-Part Nouns and Whole-Part Relations in Portuguese

Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with PythonBenjamin Bengfort
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsDimitris Kontokostas
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingSeonghyun Kim
 
Putting the science in computer science
Putting the science in computer sciencePutting the science in computer science
Putting the science in computer scienceFelienne Hermans
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...alessio_ferrari
 
Tweeting beyond Facts – The Need for a Linguistic Perspective
Tweeting beyond Facts – The Need for a Linguistic PerspectiveTweeting beyond Facts – The Need for a Linguistic Perspective
Tweeting beyond Facts – The Need for a Linguistic PerspectiveData Science Society
 
Merghani-SACNAS Poster
Merghani-SACNAS PosterMerghani-SACNAS Poster
Merghani-SACNAS PosterTaha Merghani
 
Language Lab Resources
Language Lab ResourcesLanguage Lab Resources
Language Lab Resourcesorrosado
 
NLP in Practice - Part I
NLP in Practice - Part INLP in Practice - Part I
NLP in Practice - Part IDelip Rao
 

Similar to Body-Part Nouns and Whole-Part Relations in Portuguese (20)

eSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and Summarization
eSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and SummarizationeSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and Summarization
eSPERTo’s Paraphrastic Knowledge Applied to Question-Answering and Summarization
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
Parameter setting
Parameter settingParameter setting
Parameter setting
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology Constraints
 
Automatic Paraphrasing of Human Intransitive Adjectives in Portuguese
Automatic Paraphrasing of Human Intransitive Adjectives in PortugueseAutomatic Paraphrasing of Human Intransitive Adjectives in Portuguese
Automatic Paraphrasing of Human Intransitive Adjectives in Portuguese
 
NLP
NLPNLP
NLP
 
NLP
NLPNLP
NLP
 
Thesis biobix
Thesis biobixThesis biobix
Thesis biobix
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Poster @ enetCollect CA MC meeting in Iasi, Romania
Poster @ enetCollect CA MC meeting in Iasi, Romania Poster @ enetCollect CA MC meeting in Iasi, Romania
Poster @ enetCollect CA MC meeting in Iasi, Romania
 
Putting the science in computer science
Putting the science in computer sciencePutting the science in computer science
Putting the science in computer science
 
Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...Natural language processing for requirements engineering: ICSE 2021 Technical...
Natural language processing for requirements engineering: ICSE 2021 Technical...
 
Tweeting beyond Facts – The Need for a Linguistic Perspective
Tweeting beyond Facts – The Need for a Linguistic PerspectiveTweeting beyond Facts – The Need for a Linguistic Perspective
Tweeting beyond Facts – The Need for a Linguistic Perspective
 
Merghani-SACNAS Poster
Merghani-SACNAS PosterMerghani-SACNAS Poster
Merghani-SACNAS Poster
 
Bird05 nltk-intro
Bird05 nltk-introBird05 nltk-intro
Bird05 nltk-intro
 
Language Lab Resources
Language Lab ResourcesLanguage Lab Resources
Language Lab Resources
 
NLP in Practice - Part I
NLP in Practice - Part INLP in Practice - Part I
NLP in Practice - Part I
 
Pargram2011
Pargram2011Pargram2011
Pargram2011
 

Recently uploaded

Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfWildaNurAmalia2
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)itwameryclare
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 

Recently uploaded (20)

Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 

Body-Part Nouns and Whole-Part Relations in Portuguese

  • 1. PROPOR2014 - Intl. Conference on Computational Processing of Portuguese October 6-8, 2014, ICMC, São Carlos, SP, Brazil Body part nouns and Whole-Part Relations in Portuguese Ilia Markov123, Nuno Mamede23, Jorge Baptista123 1 U. Algarve/CECL 2 U. Lisboa/IST 3 INESC-ID Lisboa/L2F Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed L2 F - Spoken Language Systems Laboratory 1
  • 2. Objectives • Improve the automatic extraction of semantic relations between textual elements in a existing NLP system, STRING ! • Part-whole relations (meronymy) ! • Human body-part nouns (Nbp) ! O Pedro partiu o braço ‘Pedro broke the arm’ WHOLE-PART(Pedro,braço) Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed L2 F - Spoken Language Systems Laboratory 2
  • 3. Objectives (cont.) ! • Development of a rule-base meronymy detection module for Human-Nbp relations • Implementation in STRING (Mamede et al., 2012) ! ! STRING: a hybrid, statistical and rule-based, Natural Language Processing (NLP) system for Portuguese string.l2f.inesc-id.pt Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed L2 F - Spoken Language Systems Laboratory 3
  • 4. Motivation Semantic relations are a device for structuring texts: contribute to cohesion and coherence of a text. Automatic extraction of semantic relations is useful for some NLP tasks: • Anaphora Resolution O Pedro lavou a cara ‘Pedro washed the face’ WHOLE-PART(Pedro,cara) O Pedro lavou a sua cara ‘Pedro washed his face’ WHOLE-PART(sua,cara) & ANTECEDENT(?,sua) Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed L2 F - Spoken Language Systems Laboratory 4
  • 5. Motivation (cont.) • Semantic Role Labeling O Pedro partiu um braço ‘Pedro broke an arm’ WHOLE-PART(Pedro,braço) ➢ Pedro is an experiencer. O Pedro partiu o braço do João ‘Pedro broke João’s arm’ WHOLE-PART(João,braço) ➢ Pedro is an agent. Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed L2 F - Spoken Language Systems Laboratory 5
  • 6. Motivation (cont.) • Opinion mining ! É um bom hotel: o quarto era limpo, as camas eram feitas de lavado todos os dias, e os pequenos-almoços eram opíparos ‘It is a nice hotel: the room was clean, the beds (bed sheets) were changed everyday, and the breakfast was sumptuous’ Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed L2 F - Spoken Language Systems Laboratory 6
  • 7. Related Work In NLP, various information extraction techniques have been developed in order to capture part-whole relations from texts: • Hearst, 1992 Lexico-syntactic patterns to capture hyponymic (type-of) relations • Girju et al., 2003, 2006 The method semi-automatically identifies patterns that encode part-whole relations and learns automatically the classification rules needed for the extraction of part-whole relations from these patterns. The authors report an overall average precision of 80.95% and recall of 75.91%. Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa technology from seed L2 F - Spoken Language Systems Laboratory 7
  • 8. Related Work (cont.)
      • Van Hage et al., 2006
        A method for learning part-whole relations from vocabularies and text sources; the authors acquired 503 part-whole pairs from the AGROVOC Thesaurus and used them to learn 91 reliable part-whole patterns.
      • Pantel and Pennacchiotti, 2006
        The Espresso algorithm takes as input a few seed instances of a particular relation and learns surface patterns to extract more instances. The algorithm obtains a precision of 80%.
  • 9. Related Work (cont.)
      • Lexical ontologies for Portuguese:
        - WordNet.PT
        - PAPEL
        - Onto.PT
      • Parsers of Portuguese:
        - The PALAVRAS parser (Bick, 2000), using the Visual Interactive Syntax Learning (VISL) environment;
        - LX Semantic Role Labeler (Branco & Costa, 2010).
  • 10. Dependency Rule in STRING
      O Pedro partiu o braço do João ‘Pedro broke João’s arm’
      IF( MOD[POST](#2[UMB-Anatomical-human],#1[human]) &
          PREPD(#1,?[lemma:de]) &
          CDIR[POST](#3,#2) &
          ~WHOLE-PART(#1,#2) )
        WHOLE-PART(#1,#2)
      Result: WHOLE-PART(João,braço)
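
The rule above can be read as a pattern match over dependency tuples. The following is a minimal Python sketch of that reading, not STRING's actual rule engine: the triple representation `(relation, arg1, arg2)`, the `features` dictionary, the function name, and the dependencies listed for the example sentence are all assumptions made for illustration.

```python
def whole_part_from_de_complement(deps, features):
    """Emulate the slide's rule on a toy dependency set.

    deps     -- set of (relation, arg1, arg2) triples (hypothetical format)
    features -- dict mapping a word to its set of feature names (hypothetical)
    """
    found = set()
    for rel, part, whole in deps:
        # MOD[POST](#2[UMB-Anatomical-human], #1[human])
        if rel != "MOD_POST":
            continue
        if "UMB-Anatomical-human" not in features.get(part, set()):
            continue
        if "human" not in features.get(whole, set()):
            continue
        # PREPD(#1, ?[lemma:de]): the human noun is introduced by "de"
        if ("PREPD", whole, "de") not in deps:
            continue
        # CDIR[POST](#3, #2): the body-part noun is the direct object of some verb
        if not any(r == "CDIR_POST" and d == part for r, _, d in deps):
            continue
        # ~WHOLE-PART(#1, #2): do not emit a relation that already exists
        if ("WHOLE-PART", whole, part) in deps:
            continue
        found.add(("WHOLE-PART", whole, part))
    return found


# Assumed dependencies for "O Pedro partiu o braço do João"
deps = {
    ("SUBJ_PRE", "partiu", "Pedro"),
    ("CDIR_POST", "partiu", "braço"),
    ("MOD_POST", "braço", "João"),
    ("PREPD", "João", "de"),
}
features = {
    "Pedro": {"human"},
    "João": {"human"},
    "braço": {"UMB-Anatomical-human"},
}
print(whole_part_from_de_complement(deps, features))
```

Under these assumptions the only relation produced links João (whole) to braço (part); Pedro is not selected because no MOD_POST dependency connects him to the body-part noun.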
  • 11. Fixed Phrases and Frozen Sentences involving Nbp
      ‣ 400 semi-automatically crafted rules, based on an available lexicon-grammar of European Portuguese idioms
  • 12. Other phenomena
      • DET=um and bilateral symmetry
        O Pedro partiu um braço ‘Pedro broke an arm’
      • relations between 2 Nbp
        A Ana pinta as unhas dos pés ‘Ana paints the nails of the feet’
      • part-of Nbp
        O Pedro tocou com a ponta da língua no gelado ‘Pedro touched with the tip of the tongue on the ice cream’
      • “hidden” Nbp with disease nouns
        O Pedro tem uma gastrite (estômago) ‘Pedro has gastritis (stomach)’
  • 13. Evaluation
      • First fragment of the CETEMPúblico corpus (Rocha & Santos, 2000): 14.7 M tokens; 6.3 M simple words; and 300 K sentences.
      • Using an Nbp lexicon (151 lemmas), 16,746 sentences with Nbp were extracted.
      • A random stratified sample of 1,000 sentences with Nbp was drawn, keeping the proportion of each Nbp's total frequency in the source corpus.
      • The sample was divided between 4 annotators: 4 subsets of 225 sentences each, plus a common set of 100 sentences to assess inter-annotator agreement.
      ‣ Annotation labels: WHOLE-PART, FIXED, nothing
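
The sampling and splitting steps above can be sketched as follows. This is an illustrative reconstruction only: the slides do not say how the proportional sample was computed, so the largest-remainder rounding, the function names, and the toy lemma counts are assumptions.

```python
import random


def proportional_allocation(counts, sample_size):
    """Share sample_size among strata in proportion to their counts.

    Uses largest-remainder rounding (an assumption) so the shares sum
    exactly to sample_size.
    """
    total = sum(counts.values())
    exact = {k: v * sample_size / total for k, v in counts.items()}
    alloc = {k: int(x) for k, x in exact.items()}
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def stratified_sample(sentences_by_lemma, sample_size, seed=0):
    """Draw a random sample that keeps each lemma's share of the corpus."""
    rng = random.Random(seed)
    sizes = {k: len(v) for k, v in sentences_by_lemma.items()}
    alloc = proportional_allocation(sizes, sample_size)
    return {k: rng.sample(sentences_by_lemma[k], alloc[k]) for k in sentences_by_lemma}


def split_for_annotators(sentences, n_annotators=4, common=100):
    """Reserve a common set, then split the rest evenly among annotators."""
    shared, rest = sentences[:common], sentences[common:]
    size = len(rest) // n_annotators
    subsets = [rest[i * size:(i + 1) * size] for i in range(n_annotators)]
    return shared, subsets


# Hypothetical lemma frequencies, for illustration only
toy_counts = {"mão": 600, "cabeça": 300, "pé": 100}
print(proportional_allocation(toy_counts, 10))

shared, subsets = split_for_annotators(list(range(1000)))
print(len(shared), [len(s) for s in subsets])
```

With 1,000 sampled sentences, the split reproduces the figures on the slide: a common set of 100 sentences and 4 subsets of 225.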
  • 14. Inter-annotator Agreement
      Metrics: Average Pairwise Percent Agreement; Fleiss’ Kappa; Average Pairwise Cohen’s Kappa
      http://dfreelon.org/utils/recalfront/recal3/
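
For reference, the three agreement measures named on this slide can be computed as in the sketch below. It follows the standard textbook definitions and is not the code behind the online tool linked above; the function names and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of items on which two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label distribution
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings is one label list per annotator, equal lengths."""
    n_raters, n_items = len(ratings), len(ratings[0])
    per_item = [Counter(r[i] for r in ratings) for i in range(n_items)]
    # Observed agreement, averaged over items
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in per_item
    ) / n_items
    # Chance agreement from the pooled label distribution
    totals = Counter()
    for cnt in per_item:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)


def average_pairwise(metric, ratings):
    """Mean of a two-annotator metric over all annotator pairs."""
    pairs = list(combinations(ratings, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


# Toy labels from three hypothetical annotators (not the study's data)
toy = [
    ["WHOLE-PART", "FIXED", "nothing", "WHOLE-PART", "nothing"],
    ["WHOLE-PART", "FIXED", "nothing", "nothing", "nothing"],
    ["WHOLE-PART", "nothing", "nothing", "WHOLE-PART", "nothing"],
]
print(average_pairwise(percent_agreement, toy))
print(average_pairwise(cohen_kappa, toy))
print(fleiss_kappa(toy))
```

Both kappa variants correct the raw percent agreement for agreement expected by chance, which matters here because the label set is small (WHOLE-PART, FIXED, nothing).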
  • 15. Results (1st evaluation)
      System’s performance for Nbp
  • 16. Error Analysis: false-positives
      • Disambiguation of Nbp in context
        - língua ‘tongue/language’
        - língua portuguesa ‘Portuguese language’
        - língua de Camões ‘language of Camões’
      • New idioms have been encoded in the lexicon
        - abrir o coração a ‘to open one’s heart to sb.’
        - fazer face a ‘to face sth./to deal with’
      • Nbp used figuratively
        Além disso, a nova face desta Igreja chilena… ‘Moreover, the new face of this Chilean Church…’
  • 17. Error Analysis: false-negatives
      • The whole and the part are not syntactically related and may be quite far from each other:
        O facto do corpo ter sido encontrado na cozinha, leva os bombeiros a suspeitar que a vítima, com graves problemas de saúde, tenha desmaiado e caído à lareira, o que poderá ter estado na origem do incêndio.
        ‘The fact that the body was found in the kitchen leads the firefighters to suspect that the victim, who had serious health problems, fainted and fell into the fireplace, which may have been the origin of the fire.’
        WHOLE-PART(vítima,corpo)
  • 18. Error Analysis: false-negatives (cont.)
      • Some human nouns and all pronouns (including personal, relative and demonstrative) are not marked with the human feature, even when anaphora resolution succeeds:
        Segundo o responsável do hospital, o doente – que também sofreu graves ferimentos na cabeça – poderia ser ainda sujeito a uma segunda intervenção cirúrgica
        ‘According to the head of the hospital, the patient, who also suffered serious head injuries, could still be subjected to a second surgical intervention’
        ANTECEDENT(doente,que)
        WHOLE-PART(que,cabeça)
      ‣ inheritance of features and relative placement of the AR and WP modules within the STRING architecture
  • 19. Error Analysis: false-negatives (cont.)
      • The Nbp modifies a noun or an adjective (and not a verb):
        Um mágico com um barrete (enfiado) na cabeça ‘A magician with a hat (stuck) on the head’
        WHOLE-PART(mágico,cabeça)
  • 20. Results (2nd evaluation)
      System’s performance for Nbp: +0.13, +0.11, +0.12
  • 21. Thank you! Questions please!
      string.l2f.inesc-id.pt
      echo "O Pedro penteou o cabelo do filho com os dedos" | xip/string.sh
      Chunks: 0>TOP{NP{O Pedro} VF{penteou} NP{o cabelo} PP{de o filho} PP{com os dedos}}
      POS: O/ART Pedro/NOUN penteou/VERB o/ART cabelo/NOUN de/PREP o/ART filho/NOUN com/PREP os/ART dedos/NOUN
      Dependencies:
        MAIN(penteou)
        MOD_POST(cabelo,filho)
        MOD_POST(penteou,dedos)
        SUBJ_PRE(penteou,Pedro)
        CDIR_POST(penteou,cabelo)
        WHOLE-PART(filho,cabelo)
        WHOLE-PART(Pedro,dedos)
  • 22. References
      Berland, M. and Charniak, E. 1999. Finding parts in very large corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, pages 57–64. Morristown, NJ, USA. Association for Computational Linguistics.
      Bick, E. 2000. The Parsing System "Palavras": Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Dr.phil. thesis. Aarhus University. Aarhus, Denmark: Aarhus University Press. November 2000.
      Branco, A. and Costa, F. 2010. A Deep Linguistic Processing Grammar for Portuguese. In Pardo et al. (eds.), Computational Processing of Portuguese, LNAI 6001, Springer, pp. 86–89.
      Girju, R., Badulescu, A., and Moldovan, D. 2006. Automatic discovery of part-whole relations. Computational Linguistics, 32(1):83–135.
      Mamede, N., Baptista, J., Diniz, C. and Cabarrão, V. 2012. STRING: An hybrid statistical and rule-based natural language processing chain for Portuguese. http://www.propor2012.org/demos/DemoSTRING.pdf
      Nascimento, M., Veloso, R., Marrafa, P., Pereira, L., Ribeiro, R., and Wittmann, L. 1998. LE-PAROLE: do Corpus à Modelização da Informação Lexical num Sistema-multifunção. Actas do XIII Encontro Nacional da Associação Portuguesa de Linguística, 2:115–134.
  • 23. References (cont.)
      Pantel, P. and Pennacchiotti, M. 2006. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of Conference on Computational Linguistics / Association for Computational Linguistics (COLING/ACL-06), pages 113–120. Sydney, Australia.
      Rocha, P. and Santos, D. 2000. "CETEMPúblico: Um corpus de grandes dimensões de linguagem jornalística portuguesa". In Maria das Graças Volpe Nunes (ed.), V Encontro para o processamento computacional da língua portuguesa escrita e falada (PROPOR 2000) (São Paulo, Brasil, 19-22 de Novembro de 2000), São Paulo: ICMC/USP, pp. 131–140.
      Widlöcher, A. and Mathet, Y. 2012. The Glozz Platform: a Corpus Annotation and Mining Tool. In Proceedings of the 2012 ACM Symposium on Document Engineering, DocEng ’12, pages 171–180, Paris, France. ACM.
      Winston, M., Chaffin, R. and Herrmann, D. 1987. A Taxonomy of Part-Whole Relations. Cognitive Science, 11:417–444.