+

Lexical Resources
for Portuguese
Valeria de Paiva
(joint work with Alexandre
Rademaker, Gerard de Melo
and Livy Real)
+

WordNet?

http://wordnetweb.princeton.edu/
+

Why this talk?...
+

WordNet…

• WordNet was created at Princeton University under George A. Miller, starting in 1985. It is a lexical database for English: it groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synsets.

• This produces a combination of dictionary and thesaurus that is intuitive, usable, and supports automatic text analysis and artificial intelligence applications. It is released under a BSD-style license and can be downloaded and used freely.

• WordNet is the most commonly used computational lexicon of English.

• Some complain that WordNet encodes sense distinctions that are too fine-grained even for humans. This granularity issue has been tackled by proposing clustering methods that automatically group together similar senses of the same word.
+

Global WordNet Association
• Christiane Fellbaum and Piek Vossen (EuroWordNet 1996-1999, GWA since)

• The Global WordNet Association (GWA) is a free, public and non-commercial organization that provides a platform for discussing, sharing and connecting wordnets for all languages in the world.

• Global WordNet Grid since 2006. Open Multilingual Wordnet (Francis Bond): http://casta-net.jp/~kuribayashi/multi/

• A simple user interface: Welcome to the Open Multi-lingual Wordnet (1.0) http://casta-net.jp/~kuribayashi/cgi-bin/wn-multi.cgi
+

Multilingual Wordnet 1.0
+ OpenWordnet-PT?
(aren’t all wordnets open?)
Previous work: WordNet.PT and WordNet.PT
global (Lisboa), MultiWordNet.PT, and the Brazilian
WordNet by Bento Dias da Silva.

We need a Portuguese
Wordnet for our work, but
none of the previous projects
is openly available.
+

Previous Portuguese WordNets…
• WordNet.PT (P. Marrafa), since 1999, part of EuroWordNet: 19K expressions, manually curated, online consultation only. Some domains.

• MWN.PT, the MultiWordnet of Portuguese (A. Branco), since 2008, part of MWN: over 17,200 manually validated concepts/synsets, not free.

• WN.Br (B. Dias da Silva), since 2000: not open, not available online; REBECA (2010) covers only ‘wheeled vehicles’.
+ OpenWN-PT: What?
• Leverage EuroWordNet, MultiWordNet, Global WordNet experience

• Recruited Gerard de Melo for the project

• Leverage YAGO, UWN/MENTA experience…

• UWN/MENTA (de Melo/Weikum): a large-scale multilingual lexical knowledge base built using statistical methods, transforming WordNet into a massively multilingual resource (over 1 million words and several million named entities in a single large multilingual taxonomy)

• The Portuguese ‘projection’ of UWN/MENTA is the basis of the automated version of OpenWordNet-PT, which is publicly available.
https://github.com/arademaker/wordnet-br
+ OpenWN-PT: the basis…
Combined the following data:

Princeton WordNet 3.0, used to obtain English glosses and English terms for synset IDs.
The unreleased 2010-12 version of UWN and MENTA, which provided candidate terms in Portuguese, candidate glosses in Portuguese (from Wikipedia), and candidate terms in Spanish.
The EuroWordNet base concept list (5000_bc.xml), which provides the base concept numbers. The original file was mapped from WordNet 2.0 to 3.0 using the mappings from WN-Map. When multiple mappings for a WordNet 2.0 synset existed, all possible WordNet 3.0 synsets were kept.
http://nlp.lsi.upc.edu/web/index.php?option=com_content&task=view&id=21&Itemid=57

https://github.com/arademaker/wordnet-br
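
The mapping step described above (keep every WordNet 3.0 target when a WordNet 2.0 synset maps to several) can be sketched roughly as follows. The synset IDs and the `map_base_concepts` helper are made up for illustration; the actual WN-Map file format is not reproduced here.

```python
from collections import defaultdict


def map_base_concepts(base_concepts, wn20_to_wn30):
    """Map WordNet 2.0 synset IDs to 3.0 IDs, keeping every possible target."""
    mapped = defaultdict(set)
    for old_id in base_concepts:
        for new_id in wn20_to_wn30.get(old_id, []):
            mapped[old_id].add(new_id)
    return dict(mapped)


# Hypothetical IDs, not real WordNet offsets.
wn20_to_wn30 = {
    "20-0001-n": ["30-0101-n"],
    "20-0002-n": ["30-0102-n", "30-0103-n"],  # ambiguous: both targets are kept
}
base_concepts = ["20-0001-n", "20-0002-n", "20-0003-n"]  # the last has no mapping

mapped = map_base_concepts(base_concepts, wn20_to_wn30)
all_targets = sorted(t for targets in mapped.values() for t in targets)
```

Keeping all targets trades some precision for coverage: an ambiguous base concept yields more than one WordNet 3.0 synset to check by hand.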
+

OpenWN-PT: the method
• A two-tiered methodology: high precision for the more frequent words of the language, but also high recall, to cover a wide range of words in the long tail.

• Translation dictionaries are used to map the English members of a synset to possible Portuguese translation candidates. To disambiguate and choose the correct translations, feature vectors for possible translations are created by computing graph-based statistics in the graph of words, translations, and synsets. Monolingual wordnets and parallel corpora are used to enrich this graph. Statistical learning techniques are used to iteratively refine this information and build an output graph connecting Portuguese words to synsets.

• Wikipedia pages are then linked to relevant WordNet synsets by learning from similar graph-based features as well as gloss similarity scores.
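
To make the candidate-generation idea concrete, here is a toy sketch. It is not the UWN/MENTA method: it replaces the feature vectors and statistical learning with a single hand-picked statistic (the share of a synset's English members that translate to a given candidate), and the dictionary entries and the 0.5 threshold are assumptions for illustration.

```python
from collections import Counter


def candidate_scores(synset_members, translations):
    """Score Portuguese candidates for one synset.

    The score is the fraction of the synset's English members whose
    dictionary entry lists the candidate: one simple graph statistic.
    """
    counts = Counter()
    for english_word in synset_members:
        for candidate in set(translations.get(english_word, [])):
            counts[candidate] += 1
    total = len(synset_members)
    return {cand: n / total for cand, n in counts.items()}


# Toy translation dictionary (illustrative entries only).
translations = {
    "car": ["carro", "vagão"],
    "automobile": ["carro", "automóvel"],
    "auto": ["carro", "automóvel"],
}
scores = candidate_scores(["car", "automobile", "auto"], translations)
# Keep candidates supported by at least half of the synset members.
accepted = sorted(c for c, s in scores.items() if s >= 0.5)
```

A candidate reached from several members of the same synset is more likely to express the shared sense, which is why "vagão" (reached only from "car") scores lowest here.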
+

More method…
• To have high precision for the most important concepts of a language, rely on human annotators.

• Set of 4689 “Common Base Concepts” (GWA)

• 2,498 manually entered sense-word pairs, as well as an additional 1,299 manually written Portuguese synset glosses.

• Does it work?
+

OpenWN-PT: some numbers…

Column (3): synsets with Portuguese words
+

OpenWN-PT: some numbers…

But how good are these entries?
How to measure? How to improve?
+ OpenWN-PT: what does it look like?
• Typical good entry with minor manual improvements.

• The automatic procedure produces candidate Portuguese words for each of a subset of the WN 3.0 synsets.

• Check the suggested words and add a Portuguese gloss and examples.
+ OpenWN-PT: what does it look like?
Not very useful

Good automatic suggestion
+

OpenWN-PT: some issues…

Capitalized items, plurals, duplicates,
a few gender issues, missing items…
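
Two of the issues listed above, capitalized items and duplicates, are easy to flag mechanically. The following is a minimal sketch, assuming a synset is available as a plain list of word strings (the `find_issues` helper is hypothetical). Plurals and gender issues would need morphological analysis and are not covered.

```python
def find_issues(synset_words):
    """Flag capitalized entries and case-insensitive duplicates in a synset."""
    issues = []
    seen = set()
    for word in synset_words:
        if word[:1].isupper():
            issues.append(("capitalized", word))
        key = word.lower()
        if key in seen:
            issues.append(("duplicate", word))
        seen.add(key)
    return issues


issues = find_issues(["carro", "Carro", "automóvel", "carro"])
```

Capitalized entries may be legitimate proper nouns, so the flags are meant as input for manual review rather than for automatic deletion.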
+ OpenWN-PT: true lexical gaps?...
+ OpenWN-PT: manual revisions

Native speakers, but
not linguists…
Plenty of errors…
+

OpenWN-PT: RDF Representation

• Why? To address the issue of interoperability between wordnets, and to be able to rely on Linked Data and Semantic Web standards such as RDF and OWL.

• Given the emergence of Linked Data projects for lexical and reasoning resources, OpenWN-PT is encoded and distributed in RDF/OWL.

• These standards allow both the data model and the actual data to be expressed in the same format. They also come with a range of existing data processing tools, including databases (“triple stores”) with SQL-like query interfaces (SPARQL).

• Standard W3C encoding of WordNet in RDF since 2006. OpenWN-PT is modelled after and fully interoperable with Princeton WordNet.

• This means that one can easily find Portuguese equivalents for specific English word senses and vice versa. It also means OpenWN-PT is part of a large ecosystem of compatible resources, including domain identifiers and mappings to Wikipedia.
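
The lookup of Portuguese equivalents for an English word can be illustrated with a tiny in-memory triple list. This is only a sketch of the triple-pattern idea behind RDF and SPARQL: the identifiers and the `containsWord` and `sameAs` predicates are invented for the example and are not the vocabulary used by OpenWN-PT, and a real deployment would send a SPARQL query to a triple store instead.

```python
# Triples as (subject, predicate, object); all identifiers are hypothetical.
TRIPLES = [
    ("synset-en-1", "containsWord", "dog"),
    ("synset-pt-1", "containsWord", "cão"),
    ("synset-pt-1", "containsWord", "cachorro"),
    ("synset-pt-1", "sameAs", "synset-en-1"),
    ("synset-en-2", "containsWord", "cat"),
    ("synset-pt-2", "containsWord", "gato"),
    ("synset-pt-2", "sameAs", "synset-en-2"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def portuguese_equivalents(triples, english_word):
    """Follow word -> English synset -> linked Portuguese synset -> words."""
    words = []
    for en_synset, _, _ in match(triples, p="containsWord", o=english_word):
        for pt_synset, _, _ in match(triples, p="sameAs", o=en_synset):
            words += [w for _, _, w in match(triples, s=pt_synset, p="containsWord")]
    return words


equivalents = portuguese_equivalents(TRIPLES, "dog")
```

Because data and links share one format, the same pattern-matching function serves both the word lookup and the cross-language link.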
+

Progress Report
• Checking is much easier than starting from scratch.

• But it is long and tedious work to check even the initial 5k synsets suggested by GWA (not done yet!), let alone all synsets in OpenWN-PT.

• Necessary? YES! Lexical gaps of all sorts.

• But the resource is being used, warts and all…

• Improving the resource: new data from Bond/Foster and some manual additions.
+

Use Cases: FreeLing
• Word Sense Disambiguation via FreeLing 3.0, an open source suite of language analyzers

• OpenWN-PT has been incorporated into FreeLing (Padró and Stanilovsky, 2012)

• Using FreeLing’s word sense disambiguation framework, a given Portuguese text can automatically be annotated with word senses.

• UPC, Barcelona
+

Use Cases: Sentiment Analysis
• Sentiment analysis, using tweets about soccer games

• OpenWN-PT and SentiWordNet used as a point of comparison for the machine-learning-based sentiment analysis integrated into the IBM InfoSphere Streams (ISS) platform.

• 1 million tweets, 4 friendly matches of the Brazilian team in 2013, 7 classes of positivity

• IBM Research, BR
+

Use Cases: NomLex-Br (Livy Real)
• An extension of OpenWN-PT that aims to incorporate links connecting deverbal nouns with their corresponding verbs.

• For English, NOMLEX (Macleod et al., 1998) has provided extensive descriptions of nominalizations via extensions of an initial core.

• NOMLEX was constructed starting out with nominalizations with the suffixes -ion, -ment and -er, taking samples of the most frequent words first in a list of nouns from a combination of the Brown Corpus and the Wall Street Journal (about 1 million words of each).

• NOMLEX-BR: translation of the initial core, plus French Nomage

• Overall, we have created over 2,000 entries. These have been integrated into OpenWN-PT, which will facilitate their use for linguistic research as well as information extraction.

The destruction of the city by Alexander in 330BC…
+

Use Cases: NomLex-Br (Livy Real)

• Incorporating NOMLEX-BR data into OpenWN-PT has proved useful in pinpointing some issues with the coherence and richness of OpenWN-PT.

• The word abasement corresponds in NOMLEX to the verb abase, and thus we would like a similar correspondence between the Portuguese noun aviltamento and the verb aviltar (our suggested translations). OpenWN-PT simply has two synsets, {humilhar, abaixar} and {humilhar, rebaixar}. The more common verb humilhar is repeated, while the uncommon aviltar was left out.

• Other useful kinds of relationships between parts of speech (say, the connections between adjectives and adverbs) are likely to also help to improve the accuracy and richness of the resource.
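
The kind of check that surfaced the aviltar gap can be sketched as below. The two verb synsets are the ones quoted above; the helper names and the representation of synsets as word lists are assumptions made for the example.

```python
def missing_links(nomlex_pairs, synsets):
    """Return (noun, verb) pairs where either word is absent from every synset."""
    vocabulary = {word for members in synsets for word in members}
    return [
        (noun, verb)
        for noun, verb in nomlex_pairs
        if noun not in vocabulary or verb not in vocabulary
    ]


def repeated_words(synsets):
    """Return the words that occur in more than one synset, sorted."""
    seen, repeated = set(), set()
    for members in synsets:
        for word in set(members):
            (repeated if word in seen else seen).add(word)
    return sorted(repeated)


# The two verb synsets quoted on the slide; the pair list is illustrative.
synsets = [["humilhar", "abaixar"], ["humilhar", "rebaixar"]]
pairs = [("aviltamento", "aviltar")]

gaps = missing_links(pairs, synsets)
repeats = repeated_words(synsets)
```

A word appearing in several synsets is ordinary polysemy and not an error in itself; the repetition is only worth a second look when, as here, it coincides with a missing rarer synonym.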
+

Miscellaneous Experiments
• Accuracy: we chose six relations: hypernymOf, memberHolonymOf, instanceOf, substanceHolonymOf, entails and causes.

• For each of these relations, we randomly chose 30 pairs of synsets and then random words from each synset. We ended up with 180 random sentences for manual verification.

• The linguist marked each sentence as “correct”, “wrong” or “dubious”: 150 sentences were marked correct (83% of the sentences), 17 wrong, and 13 dubious.

• A more systematic effort is needed, but the results were encouraging.

• Coverage: using DHBB to complete NOMLEX-BR.

• Other paper…
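
The arithmetic of the accuracy experiment above (6 relations, 30 pairs each, 150 of 180 judged correct) can be reproduced with a short sketch. The sampling helper and the dummy synset pairs are placeholders; only the relation names and the three counts come from the slide.

```python
import random

RELATIONS = ["hypernymOf", "memberHolonymOf", "instanceOf",
             "substanceHolonymOf", "entails", "causes"]


def sample_pairs(pairs_by_relation, per_relation=30, seed=0):
    """Draw a fixed-size random sample of synset pairs for each relation."""
    rng = random.Random(seed)
    return {rel: rng.sample(pairs, per_relation)
            for rel, pairs in pairs_by_relation.items()}


def summarize(judgements):
    """Turn counts of labels into shares of the total."""
    total = sum(judgements.values())
    return {label: count / total for label, count in judgements.items()}


# Placeholder data: 100 dummy (source, target) IDs per relation.
dummy = {rel: [(f"{rel}-s{i}", f"{rel}-t{i}") for i in range(100)]
         for rel in RELATIONS}
sample = sample_pairs(dummy)
sample_size = sum(len(v) for v in sample.values())

# Counts reported on the slide.
shares = summarize({"correct": 150, "wrong": 17, "dubious": 13})
```

With 30 items per relation, per-relation accuracy figures would rest on very small samples, which fits the slide's own remark that a more systematic effort is needed.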
+ Conclusions
• We discussed the implementation and some applications of OpenWordNet-PT, an open WordNet for Brazilian Portuguese.

• Recent improvements include better coverage and nominalization links connecting nouns and verbs.

• The resource has been used in developing a high-throughput commercial system as well as in a cultural heritage project, and we anticipate that numerous further applications will follow.

• The data is freely available from http://github.com/arademaker/wordnet-br/ and through a SPARQL endpoint at logics.emap.fgv.br:10035.

• Browsing via the Open Multilingual Wordnet (http://www.casta-net.jp/~kuribayashi/cgi-bin/wn-multi.cgi) is fun.
+ OpenWN-PT: next steps?..
• First finish translating the “core” synsets in the Princeton WordNet to Portuguese.

• Increase the number of relations in OpenWN-PT as a way of improving adequacy and coherence.

• Add the Portuguese terms that satisfy different relations?

• OpenVerbNet-PT?

• Since we have a first target corpus, the Brazilian Historical Biographic Dictionary, we can also calculate word frequency to prioritize expansion of OpenWN-PT and go back to the ontology building...
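
Frequency-based prioritization, as mentioned in the last point, could look roughly like this. The sentence is invented and stands in for corpus text, the `known` set stands in for the words already present in the wordnet, and the sketch counts surface forms without lemmatization, which a real pipeline would need.

```python
import re
from collections import Counter


def uncovered_by_frequency(text, known_words):
    """List corpus tokens missing from the wordnet, most frequent first."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    missing = [(w, n) for w, n in counts.items() if w not in known_words]
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(missing, key=lambda item: (-item[1], item[0]))


# Made-up sentence standing in for corpus text.
text = "O deputado foi eleito deputado federal e depois foi eleito senador"
known = {"o", "e", "foi", "depois", "federal"}
todo = uncovered_by_frequency(text, known)
```

The resulting list gives annotators a work order: the most frequent uncovered words in the target corpus come first.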
+
Thanks!
+

References
Revisiting a Brazilian Wordnet. Valeria de Paiva and Alexandre Rademaker (2012). In Proceedings of the Global Wordnet Conference, Global Wordnet Association, Matsue.
OpenWordNet-PT: An Open Brazilian WordNet for Reasoning. Valeria de Paiva, Alexandre Rademaker, and Gerard de Melo. In Proceedings of the 24th International Conference on Computational Linguistics. http://hdl.handle.net/10438/10274.
OpenWordNet-PT: A Project Report. Alexandre Rademaker, Valeria de Paiva, Gerard de Melo, Livy Real and Maira Gatti. In Proceedings of the 7th Global Wordnet Conference, Tartu, Estonia. Global Wordnet Association, 2014.
Embedding NomLex-BR Nominalizations Into OpenWordnet-PT. Livy Maria Real Coelho, Alexandre Rademaker, Valeria de Paiva, and Gerard de Melo. In Proceedings of the 7th Global WordNet Conference, Tartu, Estonia, 2014.
+

Other stuff to add in?…
• Onto.PT, ES wordnet?

• Editing interfaces?

• BabelNet?

• NER issues?

• Temporal issues?

• Work with Claudia Freitas?…Leonel?

• Work on implicatives/factives in Portuguese?

• FOIS workshop
+

References
Towards a Universal Wordnet by Learning from Combined Evidence. Gerard de Melo and Gerhard Weikum (2009). 18th ACM Conference on Information and Knowledge Management (CIKM 2009), Hong Kong, China.
Bridges from Language to Logic: Concepts, Contexts and Ontologies. Valeria de Paiva (2010). Logical and Semantic Frameworks with Applications, LSFA'10, Natal, Brazil.
A Basic Logic for Textual Inference. AAAI Workshop on Inference for Textual Question Answering, 2005.
Textual Inference Logic: Take Two. CONTEXT 2007.
Precision-focused Textual Inference. Workshop on Textual Entailment and Paraphrasing, 2007.
PARC's Bridge and Question Answering System. Proceedings of Grammar Engineering Across Frameworks, 2007.
+ Simplifying PARC’s Bridge Architecture
[Architecture diagram. Labels: Text, Parsing, F-structure, semantics, KR Mapping, KR, Inference Engines, Sources, Assertions, Query, Question. Components: Grammar, Stanford Parser, Term rewriting, OpenWN-PT, SUMO-PT, KR mapping rules, Textual Inference logics.]

Idea: Simplify and reproduce components in PORTUGUESE

Introduction to Research ,Need for research, Need for design of Experiments, ...
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
Mattingly "AI & Prompt Design" - Introduction to Machine Learning"
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
 
Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptx
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
physiotherapy in Acne condition.....pptx
physiotherapy in Acne condition.....pptxphysiotherapy in Acne condition.....pptx
physiotherapy in Acne condition.....pptx
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
The Emergence of Legislative Behavior in the Colombian Congress
The Emergence of Legislative Behavior in the Colombian CongressThe Emergence of Legislative Behavior in the Colombian Congress
The Emergence of Legislative Behavior in the Colombian Congress
 
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
Healthy Minds, Flourishing Lives: A Philosophical Approach to Mental Health a...
 

Lexical Resources for Portuguese

  • 1. Lexical Resources for Portuguese. Valeria de Paiva (joint work with Alexandre Rademaker, Gerard de Melo and Livy Real)
  • 4. WordNet…
      • WordNet has been developed at Princeton University under George A. Miller since 1985. It is a lexical database for English that groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synsets.
      • The result is a combination of dictionary and thesaurus that is intuitive, usable, and supports automatic text analysis and artificial intelligence applications. It is released under a BSD-style license and can be downloaded and used freely.
      • WordNet is the most commonly used computational lexicon of English.
      • A common complaint is that WordNet encodes sense distinctions that are too fine-grained even for humans. The granularity issue has been tackled with clustering methods that automatically group together similar senses of the same word.
  • 5. Global WordNet Association
      • Christiane Fellbaum and Piek Vossen (EuroWordNet 1996-1999, GWA since)
      • The Global WordNet Association (GWA) is a free, public and non-commercial organization that provides a platform for discussing, sharing and connecting wordnets for all languages in the world.
      • Global WordNet Grid since 2006. Open Multilingual Wordnet (Francis Bond): http://casta-net.jp/~kuribayashi/multi/
      • A simple user interface: Welcome to the Open Multi-lingual Wordnet (1.0) http://casta-net.jp/~kuribayashi/cgi-bin/wn-multi.cgi
  • 7. OpenWordnet-PT? (aren’t all wordnets open?) Previous work: WordNet.PT and WordNet.PT global (Lisboa), MultiWordNet.PT and Brazilian WordNet by Bento Dias. We need a Portuguese wordnet for our work, but none of the previous projects is openly available.
  • 8. Previous Portuguese WordNets…
      • WordNet.PT (P. Marrafa): since 1999, part of EuroWordNet, 19K expressions, manually curated, online consultation only. Some domains.
      • MWN.PT, the MultiWordnet of Portuguese (A. Branco): since 2008, part of MWN, over 17,200 manually validated concepts/synsets, not free.
      • WN.Br (B. Dias da Silva): since 2000, not open, not available online; REBECA (2010) covers only ‘wheeled vehicles’…
  • 9. OpenWN-PT: What?
      • Leverage EuroWordNet, MultiWordNet and Global WordNet experience.
      • Recruited Gerard de Melo for the project.
      • Leverage YAGO and UWN/MENTA experience…
      • UWN/MENTA (de Melo/Weikum): a large-scale multilingual lexical knowledge base built using statistical methods, transforming WordNet into a massively multilingual resource (over 1 million words and several million named entities in a single large multilingual taxonomy).
      • The Portuguese ‘projection’ of UWN/MENTA is the basis of the automated version of OpenWordNet-PT, publicly available at https://github.com/arademaker/wordnet-br
  • 10. OpenWN-PT: the basis… Combined the following data:
      • Princeton WordNet 3.0, used to obtain English glosses and English terms for synset IDs.
      • The unreleased 2010-12 version of UWN and MENTA, which provided candidate terms in Portuguese, candidate glosses in Portuguese (from Wikipedia), and candidate terms in Spanish.
      • The EuroWordNet base concept list (5000_bc.xml), which provides the base concept numbers. The original file was mapped from WordNet 2.0 to 3.0 using the mappings from WN-Map. When multiple mappings for a WordNet 2.0 synset existed, all possible WordNet 3.0 synsets were kept.
      • http://nlp.lsi.upc.edu/web/index.php?option=com_content&task=view&id=21&Itemid=57 https://github.com/arademaker/wordnet-br
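The one-to-many mapping step on slide 10 can be sketched as follows. This is only an illustration under stated assumptions: the synset IDs are made up, and `map_base_concepts` is a hypothetical helper, not part of WN-Map or of the OpenWN-PT code base.

```python
from collections import defaultdict

def map_base_concepts(base_concepts_20, mapping_pairs):
    """Map WordNet 2.0 synset IDs to 3.0 IDs, keeping every candidate target."""
    targets = defaultdict(list)
    for old_id, new_id in mapping_pairs:
        targets[old_id].append(new_id)

    mapped = set()
    unmapped = []
    for old_id in base_concepts_20:
        if old_id in targets:
            # Keep all possible 3.0 synsets when the mapping is ambiguous.
            mapped.update(targets[old_id])
        else:
            unmapped.append(old_id)
    return sorted(mapped), unmapped

# Made-up IDs for illustration only; "n-00002" maps to two 3.0 synsets.
pairs = [("n-00001", "n-10001"), ("n-00002", "n-10002"), ("n-00002", "n-10003")]
mapped, unmapped = map_base_concepts(["n-00001", "n-00002", "n-00009"], pairs)
print(mapped)
print(unmapped)
```

Keeping every target favours coverage of the base concepts over a one-to-one mapping, at the cost of a slightly larger list to review by hand.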
  • 11. OpenWN-PT: the method
      • A two-tiered methodology: high precision for the more frequent words of the language, but also high recall to cover a wide range of words in the long tail.
      • Translation dictionaries map the English members of a synset to possible Portuguese translation candidates. To disambiguate and choose the correct translations, feature vectors for possible translations are created by computing graph-based statistics in the graph of words, translations, and synsets. Monolingual wordnets and parallel corpora are used to enrich this graph. Statistical learning techniques are used to iteratively refine this information and build an output graph connecting Portuguese words to synsets.
      • Wikipedia pages are then linked to relevant WordNet synsets by learning from similar graph-based features as well as gloss similarity scores.
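To make the candidate-generation step on slide 11 concrete, here is a deliberately simplified sketch. It swaps the graph-based statistical learning of UWN/MENTA for a plain voting score (the share of English synset members whose dictionary entry contains the candidate). The toy dictionary, the 0.5 threshold and the function name `score_candidates` are assumptions made for illustration.

```python
from collections import Counter

def score_candidates(synset_words, en_pt_dictionary):
    """Score each Portuguese candidate by the share of synset members that translate to it."""
    votes = Counter()
    for word in synset_words:
        # set() guards against a dictionary entry listing the same translation twice.
        for translation in set(en_pt_dictionary.get(word, [])):
            votes[translation] += 1
    total = len(synset_words)
    return {pt: count / total for pt, count in votes.items()}

# Toy bilingual dictionary (illustrative only).
toy_dictionary = {
    "car": ["carro", "automóvel", "vagão"],
    "auto": ["carro", "automóvel"],
    "automobile": ["automóvel", "carro"],
}

scores = score_candidates(["car", "auto", "automobile"], toy_dictionary)
# Keep candidates supported by at least half of the synset members.
accepted = sorted(pt for pt, score in scores.items() if score >= 0.5)
print(accepted)
```

A candidate such as "vagão" (a railway car) is supported by only one of the three synset members and falls below the threshold, which is the kind of ambiguity the real feature vectors are meant to resolve.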
  • 12. More method…
      • To have high precision for the most important concepts of a language, rely on human annotators.
      • Set of 4689 “Common Base Concepts” from the GWA.
      • 2,498 manually entered sense-word pairs, as well as an additional 1,299 manually written Portuguese synset glosses.
      • Does it work?
  • 13. OpenWN-PT: some numbers… Column (3): synsets with Portuguese words
  • 14. OpenWN-PT: some numbers… But how good are these entries? How to measure? How to improve?
  • 15. OpenWN-PT: what does it look like?
      • Typical good entry with minor manual improvements.
      • The automatic process produces candidate Portuguese words for some of the WN 3.0 synsets.
      • Check the suggested words and add a Portuguese gloss and examples.
  • 16. OpenWN-PT: what does it look like? Not very useful. Good automatic suggestion.
  • 17. OpenWN-PT: some issues… Capitalized items, plurals, duplicates, a few gender issues, missing items…
  • 18. OpenWN-PT: true lexical gaps?...
  • 19. OpenWN-PT: manual revisions. Native speakers, but not linguists… Plenty of errors…
  • 20. OpenWN-PT: RDF Representation
      • Why? To address the issue of interoperability between wordnets, and to be able to rely on Linked Data and Semantic Web standards such as RDF and OWL.
      • Given the emergence of Linked Data projects for lexical and reasoning resources, OpenWN-PT is encoded and distributed in RDF/OWL.
      • The standards allow both the data model and the actual data to be expressed in the same format. They also bring a range of existing data processing tools, including databases (“triple stores”) with SQL-like query interfaces (SPARQL).
      • A standard W3C encoding of WordNet in RDF has existed since 2006. OpenWN-PT is modelled after and fully interoperable with Princeton WordNet.
      • This means that one can easily find Portuguese equivalents for specific English word senses and vice versa. It also means OpenWN-PT is part of a large ecosystem of compatible resources, including domain identifiers and mappings to Wikipedia.
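The cross-lingual lookup described on slide 20 can be illustrated with a tiny in-memory triple list. This is a sketch only: the identifiers and the predicate names `containsWord` and `sameAs` are hypothetical stand-ins, not the actual OpenWN-PT or W3C WordNet vocabulary, and a real deployment would run a SPARQL query against a triple store instead.

```python
# Hypothetical triples: (subject, predicate, object). All names are illustrative.
TRIPLES = [
    ("synset-en-1", "containsWord", "dog"),
    ("synset-pt-1", "containsWord", "cão"),
    ("synset-pt-1", "containsWord", "cachorro"),
    ("synset-pt-1", "sameAs", "synset-en-1"),
    ("synset-en-2", "containsWord", "cat"),
]

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the store."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]

def portuguese_equivalents(english_word):
    """Follow word -> English synset -> aligned Portuguese synset -> words."""
    words = set()
    for en_synset in subjects("containsWord", english_word):
        for pt_synset in subjects("sameAs", en_synset):
            words.update(objects(pt_synset, "containsWord"))
    return sorted(words)

print(portuguese_equivalents("dog"))
print(portuguese_equivalents("cat"))  # no aligned Portuguese synset in this toy store
```

Because the Portuguese synsets are aligned with the Princeton ones, the lookup is a two-step join on synset identity; the same join works in the opposite direction.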
  • 21. Progress Report
      • Checking is much easier than starting from scratch.
      • But it is long and tedious work to check even the initial 5k synsets suggested by the GWA (not done yet!), let alone all synsets in OpenWN-PT.
      • Necessary? YES! Lexical gaps of all sorts.
      • But the resource is being used, warts and all…
      • Improving the resource: new data from Bond/Foster and some manual additions.
  • 22. Use Cases: FreeLing
      • Word sense disambiguation via FreeLing 3.0, an open-source suite of language analyzers.
      • OpenWN-PT has been incorporated into FreeLing (Padró and Stanilovsky, 2012).
      • Using FreeLing’s word sense disambiguation framework, a given Portuguese text can automatically be annotated with word senses.
      • UPC, Barcelona
  • 23. Use Cases: Sentiment Analysis
      • Sentiment analysis, using tweets about soccer games.
      • OpenWN-PT and SentiWordNet used as a point of comparison for the machine-learning-based sentiment analysis integrated into the IBM InfoSphere Streams (ISS) platform.
      • 1 million tweets, 4 friendly matches of the Brazilian team in 2013, 7 classes of positivity.
      • IBM Research, BR
  • 24. Use Cases: NomLex-Br (Livy Real)
      • This extension of OpenWN-PT aims at incorporating links that connect deverbal nouns with their corresponding verbs.
      • For English, NOMLEX (Macleod et al., 1998) has provided extensive descriptions of nominalizations via extensions of an initial core.
      • NOMLEX was constructed starting out with nominalizations with the suffixes -ion, -ment and -er, taking samples of the most frequent words first in a list of nouns from a combination of the Brown Corpus and the Wall Street Journal (about 1 million words of each).
      • NOMLEX-BR: translation of the initial core, plus the French Nomage.
      • Overall, we have created over 2,000 entries. These have been integrated into OpenWN-PT, which will facilitate their use for linguistic research as well as information extraction.
      • Example: The destruction of the city by Alexander in 330BC…
  • 25. Use Cases: NomLex-Br (Livy Real)
      • Incorporating NOMLEX-BR data into OpenWN-PT has proved useful in pinpointing some issues with the coherence and richness of OpenWN-PT.
      • The word abasement corresponds in NOMLEX to the verb abase, and thus we would like a similar correspondence between the Portuguese noun aviltamento and the verb aviltar (our suggested translations). OpenWN-PT simply has two synsets: humilhar, abaixar and humilhar, rebaixar. The more common verb humilhar is repeated, while the uncommon aviltar was left out.
      • Other useful kinds of relationships between parts of speech (say, the connections between adjectives and adverbs) are also likely to help improve the accuracy and richness of the resource.
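The kind of coherence check described on slide 25 can be sketched with the slide's own example. The two helper functions are hypothetical and only illustrate the idea of flagging nominalization pairs whose verb is absent from the wordnet, and words that recur across synsets.

```python
from collections import Counter

def missing_verbs(nominalization_pairs, verb_synsets):
    """Return (noun, verb) pairs whose verb occurs in none of the verb synsets."""
    known = set().union(*verb_synsets)
    return [(noun, verb) for noun, verb in nominalization_pairs if verb not in known]

def repeated_words(synsets):
    """Return words that occur in more than one synset."""
    counts = Counter(word for synset in synsets for word in synset)
    return sorted(word for word, count in counts.items() if count > 1)

# Data taken from the slide's example.
verb_synsets = [{"humilhar", "abaixar"}, {"humilhar", "rebaixar"}]
pairs = [("aviltamento", "aviltar")]

print(missing_verbs(pairs, verb_synsets))   # the uncommon verb is not covered
print(repeated_words(verb_synsets))         # the common verb appears twice
```

Flagged items still need human review: a repeated word is often a legitimate case of polysemy, while a missing verb points at a gap to fill.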
  • 26. Miscellaneous Experiments
      • Accuracy: we chose six relations: hypernymOf, memberHolonymOf, instanceOf, substanceHolonymOf, entails and causes.
      • For each of these relations, we randomly chose 30 pairs of synsets and then random words from each synset. We ended up with 180 random sentences for manual verification.
      • The linguist marked each sentence as “correct”, “wrong” or “dubious”: 150 sentences were marked correct (83% of the sentences), 17 wrong and 13 dubious.
      • A more systematic effort is needed, but the results were encouraging.
      • Coverage: using DHBB to complete NOMLEX-BR.
      • Other paper…
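The arithmetic behind the accuracy figures on slide 26 can be checked in a few lines. The counts are taken from the slide; the variable names are only illustrative.

```python
relations = [
    "hypernymOf", "memberHolonymOf", "instanceOf",
    "substanceHolonymOf", "entails", "causes",
]
pairs_per_relation = 30
judgements = {"correct": 150, "wrong": 17, "dubious": 13}

# 6 relations x 30 pairs each gives the sample size.
total = len(relations) * pairs_per_relation
# The three judgement counts should account for every sampled sentence.
assert sum(judgements.values()) == total

# Percentage share of each judgement, rounded to one decimal place.
shares = {label: round(100 * count / total, 1) for label, count in judgements.items()}
print(total)
print(shares)
```

With 30 sentences per relation, per-relation percentages would move in steps of about 3.3 points, so only the pooled figure is reasonably stable.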
  • 27. Conclusions
      • We discussed the implementation and some applications of OpenWordNet-PT, an open WordNet for Brazilian Portuguese.
      • Recent improvements include better coverage and nominalization links connecting nouns and verbs.
      • The resource has been used in developing a high-throughput commercial system as well as in a cultural heritage project, and we anticipate that numerous further applications will follow.
      • The data is freely available from http://github.com/arademaker/wordnet-br/ and through a SPARQL endpoint at logics.emap.fgv.br:10035.
      • Browsing via Open Multilingual Wordnet (//www.casta-net.jp/~kuribayashi/cgi-bin/wnmulti.cgi) is fun.
  • 28. OpenWN-PT: next steps?..
      • First, finish translating the “core” synsets in the Princeton WordNet to Portuguese.
      • Increase the number of relations in OpenWN-PT as a way of improving adequacy and coherence.
      • Adding the Portuguese terms that satisfy different relations?
      • OpenVerbNet-PT?
      • Since we have a first target corpus, the Brazilian Historical Biographic Dictionary, we can also calculate word frequency to prioritize expansion of OpenWN-PT and go back to the ontology building...
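The frequency-based prioritization mentioned on slide 28 could look roughly like this. It is a minimal sketch: the sample text, the lexicon and the function name `expansion_priorities` are invented for illustration, and a real run over the target corpus would need proper tokenization and lemmatization.

```python
import re
from collections import Counter

def expansion_priorities(corpus_text, lexicon, top_n=10):
    """Rank corpus words missing from the lexicon by how often they occur."""
    tokens = re.findall(r"\w+", corpus_text.lower())
    missing = Counter(token for token in tokens if token not in lexicon)
    return missing.most_common(top_n)

# Invented sample text and lexicon, for illustration only.
sample_text = "senador eleito senador deputado senador eleito"
known_words = {"deputado"}

priorities = expansion_priorities(sample_text, known_words)
print(priorities)
```

Words at the top of the list are the ones whose absence from the wordnet affects the most corpus occurrences, so adding them first gives the largest gain in coverage.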
  • 30. References
      • Revisiting a Brazilian Wordnet. Valeria de Paiva, Alexandre Rademaker (2012). Proceedings of Global Wordnet Conference, Global Wordnet Association, Matsue.
      • OpenWordNet-PT: An Open Brazilian WordNet For Reasoning. Valeria de Paiva, Alexandre Rademaker, and Gerard de Melo. In Proceedings of the 24th International Conference On Computational Linguistics. http://hdl.handle.net/10438/10274.
      • OpenWordNet-PT: A Project Report. Alexandre Rademaker, Valeria de Paiva, Gerard de Melo, Livy Real and Maira Gatti (2014). Proceedings of the 7th Global Wordnet Conference, Tartu, Estonia. Global Wordnet Association.
      • Embedding NomLex-BR Nominalizations Into OpenWordnet-PT. Livy Maria Real Coelho, Alexandre Rademaker, Valeria de Paiva, and Gerard de Melo (2014). In Proceedings of the 7th Global WordNet Conference, Tartu, Estonia.
  • 31. Other stuff to add in?…
      • Onto.PT, ES wordnet?
      • Editing interfaces?
      • BabelNet?
      • NER issues?
      • Temporal issues?
      • Work with Claudia Freitas?… Leonel?
      • Work on implicatives/factives in Portuguese?
      • FOIS workshop
  • 32. References
      • Towards a Universal Wordnet by Learning from Combined Evidence. Gerard de Melo, Gerhard Weikum (2009). 18th ACM Conference on Information and Knowledge Management (CIKM 2009), Hong Kong, China.
      • Bridges from Language to Logic: Concepts, Contexts and Ontologies. Valeria de Paiva (2010). Logical and Semantic Frameworks with Applications, LSFA'10, Natal, Brazil.
      • “A Basic Logic for Textual Inference”, AAAI Workshop on Inference for Textual Question Answering, 2005.
      • “Textual Inference Logic: Take Two”, CONTEXT 2007.
      • “Precision-focused Textual Inference”, Workshop on Textual Entailment and Paraphrasing, 2007.
      • PARC's Bridge and Question Answering System. Proceedings of Grammar Engineering Across Frameworks, 2007.
  • 33. Simplifying PARC’s Bridge Architecture
      • [Architecture diagram; labels: Text, Parsing, Inference Engines, KR Mapping, F-structure, semantics, KR, Sources, Assertions, Query, Question, Grammar, Stanford Parser, Term rewriting, OpenWN-PT, SUMO-PT, KR mapping rules, Textual Inference logics]
      • Idea: simplify and reproduce the components in Portuguese.