Embedding NomLex-BR nominalizations into
OpenWordnet-PT
Livy Maria Real Coelho (1), Alexandre Rademaker (2,5),
Valeria de Paiva (3), Gerard de Melo (4)
(1) UFP
(2) IBM Research
(3) Nuance Comms.
(4) Tsinghua University
(5) FGV/EMAp

February 1, 2014
The English NomLex
NomLex (cont.)

A dictionary of English
nominalizations, developed under
Catherine Macleod.
Relates the nominal complements
to the arguments of the
corresponding verb.
1,025 entries covering several types of
lexical nominalizations.
Example: "Alexander’s destruction of the
city happened in 330 BC."

First version released on January 15,
1999; latest version (October
2001) downloadable from
http://bit.ly/1aZWQmh
Nomlex (cont.)

(nom :orth "promotion"
     :verb "promote"
     :nom-type ((verb-nom))
     :verb-subj ((n-n-mod) (det-poss))
     :verb-subc ((nom-np :object ((det-poss) (n-n-mod) (pp-of)))
                 (nom-np-as-np :object ((det-poss) (pp-of)))
                 (nom-possing :nom-subc ((p-possing :pval ("of"))))
                 (nom-np-pp :object ((det-poss) (n-n-mod) (pp-of))
                            :pval ("into" "from" "for" "to"))
                 (nom-np-pp-pp :object ((det-poss) (n-n-mod) (pp-of))
                               :pval ("for" "into" "to") :pval2 ("from"))))
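To make the structure of this entry easier to query, the sketch below renders it as a plain Python dictionary and looks up where the verb's object may surface in the noun phrase. The dictionary layout and the helper names (`PROMOTION`, `object_positions`, `frames_allowing`) are illustrative assumptions, not part of NomLex; only the feature names and values come from the entry above.

```python
# Hypothetical in-memory rendering of the "promotion" entry shown above.
# Keys mirror the NomLex feature names; the dict layout itself is an assumption.
_OBJ_FULL = ["det-poss", "n-n-mod", "pp-of"]

PROMOTION = {
    "orth": "promotion",
    "verb": "promote",
    "nom-type": ["verb-nom"],
    "verb-subj": ["n-n-mod", "det-poss"],
    "verb-subc": {
        "nom-np": {"object": list(_OBJ_FULL)},
        "nom-np-as-np": {"object": ["det-poss", "pp-of"]},
        "nom-possing": {"nom-subc": {"p-possing": {"pval": ["of"]}}},
        "nom-np-pp": {"object": list(_OBJ_FULL),
                      "pval": ["into", "from", "for", "to"]},
        "nom-np-pp-pp": {"object": list(_OBJ_FULL),
                         "pval": ["for", "into", "to"],
                         "pval2": ["from"]},
    },
}


def object_positions(entry, frame):
    """Nominal positions that may realize the verb's object in `frame`."""
    return entry["verb-subc"].get(frame, {}).get("object", [])


def frames_allowing(entry, position):
    """Subcategorization frames whose object may appear in `position`."""
    return [frame for frame, spec in entry["verb-subc"].items()
            if position in spec.get("object", [])]


if __name__ == "__main__":
    print(object_positions(PROMOTION, "nom-np-as-np"))
    print(frames_allowing(PROMOTION, "n-n-mod"))
```

Read this way, the entry says that in the `nom-np-as-np` frame the object of "promote" can appear as a possessive determiner or in an "of" phrase, but not as a noun modifier.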
Related Work
Nominalizations have been studied for more than four decades
(Chomsky, 1970).
NomLex-Plus (Meyers et al., 2004): an extension of NomLex with 7,050
nominalizations.
The NomBank Project (Meyers, 2007), http://bit.ly/1d5G7L9:
“mark the sets of arguments that co-occur with nouns in the
PropBank Corpus, just as PropBank records such information for
verbs... firmly on the shoulders of NOMLEX...”
Berkeley FrameNet (https://framenet.icsi.berkeley.edu/):
11,600 lexical units based on frame semantics, supported by corpus
evidence. Deverbal nominalizations are annotated either as events (in the
frame of the verb) or as entities/results (in a different semantic frame).
FrameNet-Brazil, http://www.ufjf.br/framenetbr/.
Main use for NLP (IE)

To write mappings from IE patterns for active clauses to IE patterns
for nominalizations.
Active clause: “IBM appointed Alice Smith as vice president”.
Nominalizations: “IBM’s appointment of Alice Smith as vice president”
and “Alice Smith’s appointment as vice president”.
Main use for NLP (IE) (cont.)
The Proteus Extraction System starts with an active-clause pattern:
np(C-company) vg(appoint) np(C-person) "as" np(C-position)
Meta-rules then produce the passive-clause pattern:
np(C-person) vg-pass(appoint) "as" np(C-position) "by"
np(C-company)
When a pattern matches the input, the pieces corresponding to its
constituents are used to build a semantic representation of the pattern (e.g. a
logical form).
vg = verb group (verb plus auxiliaries). vg-pass = passive verb group.
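The passive meta-rule above can be sketched as a small list transformation. This is a minimal illustration under our own assumptions, not the Proteus implementation: patterns are modelled as lists of `(kind, value)` pairs, and the names `passivize` and `render` are hypothetical.

```python
from typing import List, Tuple

# A pattern element is (kind, value); kind is "np", "vg", "vg-pass" or "lit".
# This encoding is an assumption made for the sketch.
Element = Tuple[str, str]


def passivize(active: List[Element]) -> List[Element]:
    """Reorder [subject-np, vg, object-np, *rest] into the passive pattern."""
    subject, verb, obj, *rest = active
    if (subject[0], verb[0], obj[0]) != ("np", "vg", "np"):
        raise ValueError("expected a pattern of the form: np vg np ...")
    # Object moves to the front, the verb group becomes passive,
    # and the original subject is reintroduced after "by".
    return [obj, ("vg-pass", verb[1]), *rest, ("lit", "by"), subject]


def render(pattern: List[Element]) -> str:
    """Print a pattern in the notation used on the slide."""
    return " ".join(f'"{value}"' if kind == "lit" else f"{kind}({value})"
                    for kind, value in pattern)


ACTIVE = [("np", "C-company"), ("vg", "appoint"), ("np", "C-person"),
          ("lit", "as"), ("np", "C-position")]

if __name__ == "__main__":
    print(render(passivize(ACTIVE)))
    # np(C-person) vg-pass(appoint) "as" np(C-position) "by" np(C-company)
```

A nominalization meta-rule would follow the same idea, using a NomLex entry to decide which positions (possessive, "of" phrase, and so on) each argument may occupy.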
Project Motivation: DHBB

Brazilian Historical Biographic
Dictionary (DHBB): 7.5K entries.
Enrich the structure (semantics).
Uniform data treatment (standards and
interlinks between collections).
NLP of DHBB entries: (1) word sense
disambiguation with openWordnet-PT;
and (2) named entity recognition to
make links (133K proper names).
We need grammars, lexical resources, ontologies, KBs, automated theorem
provers, etc. to reason about knowledge extracted from text. This will
empower QA, KE, MT, personal assistants and other systems.
Nominalizations in Portuguese

Nominalizations are difficult to deal with in KR systems: it is harder to
obtain the arguments of a nominal predicate;
the NOMLEX project (Macleod et al., 1998) provides a well-established,
open-access baseline;
nominalizations with the suffixes -ção/-ion, -mento/-ment and
-or/-er, which work well in Portuguese;
e.g. construção (construction), adiamento (adjournment) and
escritor (writer);
90% of the original resource was easily translated manually.
How we expanded it

We translate both the noun and the verb of each entry by looking them up in
extractions from the EN and PT Wiktionary dumps, generating all combinations of
noun/verb translations. A filter then compares the noun and verb translations
to check whether they are similar enough to be morphologically related.
Other experiments used DHBB and openWordnet-PT.
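The expansion step described above can be sketched as follows. The slides do not say which similarity measure the filter uses, so the shared-prefix test here is a stand-in assumption. The toy translation tables and the names `candidate_pairs` and `shared_prefix_len` are likewise hypothetical.

```python
from itertools import product

# Toy stand-ins for the Wiktionary extractions (hypothetical data).
NOUN_TRANSLATIONS = {"construction": ["construção", "edificação"]}
VERB_TRANSLATIONS = {"construct": ["construir", "edificar"]}


def shared_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def candidate_pairs(noun, verb, min_prefix=4):
    """All noun/verb translation combinations that pass the similarity filter.

    The filter (a shared prefix of at least `min_prefix` characters) is an
    assumed proxy for "similar enough to be morphologically related".
    """
    nouns = NOUN_TRANSLATIONS.get(noun, [])
    verbs = VERB_TRANSLATIONS.get(verb, [])
    return [(n, v) for n, v in product(nouns, verbs)
            if shared_prefix_len(n, v) >= min_prefix]


if __name__ == "__main__":
    print(candidate_pairs("construction", "construct"))
```

With these toy tables, four combinations are generated and the filter keeps the two morphologically related pairs, discarding cross pairs such as "construção"/"edificar".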
NomLex-BR
A dictionary of Portuguese nominalizations.
Relates nominals to their corresponding verbs.
Over 2,539 entries covering several types of lexical nominalizations.
First version of NOMLEX-BR in 2011, much expanded in 2013.
Freely available for download and embedded in openWordnet-PT.
An RDF vocabulary to describe nominalizations. Future extensions will
cover more information from COMLEX and COMNOM (extension
from NomBank).
URI for the schema: http://arademaker.github.com/nomlex/schema/
(a better, stable URI is needed).
Example: “Construção da rodovia Transamazônica, na década de 70, pelo governo
Medici, uma das obras faraônicas da ditadura militar.”
(“Construction of the Trans-Amazonian highway, in the 1970s, by the Medici
government, one of the pharaonic works of the military dictatorship.”)
Embedding in openWordnet-PT

However, nomlex:noun and nomlex:verb should point to wn30:WordSense
rather than wn30:Word. This is future work.
By Provenance
See http://bit.ly/Mohmni
select ?prov (count(?x) as ?total) {
  ?x a nomlex:Nominalization ;
     dc:provenance ?prov .
}
group by ?prov
provenance        total
nomlex             1032
wiktionary-pt        61
wiktionary-en        91
framenet            142
nomage              262
dhbb                159
openWordnet-PT       82
linguateca          484
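For readers less familiar with SPARQL aggregates, the grouping performed by the query above can be mirrored over in-memory triples. This is only a sketch of what the query computes: the triples and identifiers such as `nm:1` are made-up sample data, and `count_by_provenance` is a hypothetical helper, not part of the project's tooling.

```python
from collections import Counter

# Hypothetical (subject, predicate, object) triples; not real NomLex-BR data.
TRIPLES = [
    ("nm:1", "rdf:type", "nomlex:Nominalization"),
    ("nm:1", "dc:provenance", "nomlex"),
    ("nm:2", "rdf:type", "nomlex:Nominalization"),
    ("nm:2", "dc:provenance", "nomage"),
    ("nm:3", "rdf:type", "nomlex:Nominalization"),
    ("nm:3", "dc:provenance", "nomlex"),
    ("w:1", "rdf:type", "wn30:Word"),       # not a nominalization: ignored
    ("w:1", "dc:provenance", "nomlex"),
]


def count_by_provenance(triples):
    """Count nominalizations per provenance value, like the GROUP BY query."""
    nominalizations = {s for s, p, o in triples
                       if p == "rdf:type" and o == "nomlex:Nominalization"}
    return Counter(o for s, p, o in triples
                   if p == "dc:provenance" and s in nominalizations)


if __name__ == "__main__":
    for prov, total in count_by_provenance(TRIPLES).most_common():
        print(prov, total)
```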
By suffix

See:
http://bit.ly/LmAXn4; and
http://bit.ly/1fKEnKr.
Result:
suffix    total
mento       329
ção         660
or          891

Some other cases http://bit.ly/1fyia3a.
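A breakdown of this kind can be approximated with a simple string test on the noun. The sketch below is a crude stand-in for the linked queries, which we have not reproduced: it checks only the three suffixes in the table, and plain suffix matching will also catch nouns that merely happen to end in "or". The function names are hypothetical.

```python
from collections import Counter

# The three suffixes from the table above.
SUFFIXES = ("mento", "ção", "or")


def suffix_of(noun):
    """Return the first listed suffix the noun ends with, or None."""
    for suffix in SUFFIXES:
        if noun.endswith(suffix):
            return suffix
    return None


def count_by_suffix(nouns):
    """Count nouns per nominalizing suffix, skipping unmatched nouns."""
    return Counter(s for s in map(suffix_of, nouns) if s is not None)


# Example nouns taken from the slides.
NOUNS = ["construção", "adiamento", "escritor", "aviltamento"]

if __name__ == "__main__":
    print(count_by_suffix(NOUNS))
```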
Results
The extension of OpenWN-PT aims at incorporating links that connect
deverbal nouns with their corresponding verbs.
The integration into OpenWN-PT will facilitate their use for linguistic
research as well as information extraction.
Incorporating NOMLEX-BR data into OpenWN-PT has proved
useful in pinpointing some issues with the coherence and richness of
OpenWN-PT.
For example, the word abasement corresponds in NOMLEX to the verb abase,
and thus we would like a similar correspondence between the
Portuguese noun “aviltamento” and the verb “aviltar” (suggested
translations). OpenWN-PT simply has two synsets, “humilhar,
abaixar” and “humilhar, rebaixar”. The more common verb humilhar
is repeated, while the uncommon aviltar was left out.
Next Steps

Finish embedding NomLex-BR into OpenWN-PT (anchor floating
words, http://bit.ly/1aQdpkr).
Work with Claudia Freitas and Hugo Gonçalvez on leveraging
Linguateca’s PAPEL, Cartão, ACDC and Floresta Sintá(c)tica.
Lists from Linguateca’s resources complement NomLex-BR using
corpora and make sure our resource is not simply a translation.
Add the Portuguese terms that satisfy different relations?
OpenVerbNet-PT? Glosses? Classification of nominalizations?
We are developing our own web interface for browsing and
collaborative editing. Most important pending issue!
Use and test the accuracy of the resource! More applications!
Conclusion
We presented NomLex-BR, a lexicon
of nominalizations in Brazilian
Portuguese.
NomLex-BR is embedded into
OpenWordNet-PT and shares its RDF
representation.
Recent improvements include better
coverage: newer suffixes and Nomage
incorporation.
The work with NomLex-BR helped us to
improve openWordnet-PT (new words,
senses).
The data is freely available from
http://github.com/arademaker/wordnet-br/ and through a SPARQL
endpoint at http://logics.emap.fgv.br:10035.
Thank you!
Multilingual Wordnet 1.0

Synset 01146493-a
Danish       taknemmelig
English      thankful, grateful
Finnish      kiitollinen
French       reconnaissant
Galician     grato, agradecido
Indonesian   bersyukur, berterima kasih, tanda terima kasih, terhutang budi
Italian      grato, riconoscente
Japanese     忝い, 有り難い, 感謝を感じた, 幸甚, ありがたい, 有難い, 感謝を表した
Bokmål       takknemlig
Portuguese   reconhecido, grato, agradecido
Thai         ซึ่งสำนึกในบุญคุณ
Malaysian    bersyukur, berterima kasih, tanda terima kasih, menampakkan tanda kesyukuran,
             memperlihatkan tanda kesyukuran, terhutang budi
Eng: feeling or showing gratitude; "a grateful heart"; "grateful for the tree's shade"; "a thankful
smile";
Similar to: appreciative, glad
