+
Ontologies and semantics for
Portuguese
Valeria de Paiva
Visiting Prof DI PUC-RJ &
Topos Institute, Berkeley
Thanks,
Giancarlo!
+
Predicting the future…
http://openwordnet-pt.org
• Alan Kay: The best way to predict the future is to invent
it!
• What do I want: Build knowledge resources, e.g. a
Knowledge Graph (KG) for Brazilian culture
• Why? Portuguese is the 6th most spoken language on the
planet, BUT
• We do not have enough lexical/semantic/KR resources to deal
with understanding Portuguese!
The next decade of research in ontologies: challenges and
opportunities…
+
A grand plan (2018 Lux AI Summit)
keynote
+
Which Resources?
• OpenWordnet-PT (from 2012)
  • Used by Google Translate, BabelNet, OpenWordNet
  • Code at https://github.com/own-pt/openWordnet-PT, user interface at
    http://openwordnet-pt.org
  • Sweet spot between coverage and accuracy. Needs to be improved.
    NomLex-BR has only just started.
• UD (Universal Dependencies) for Portuguese: Bosque (2017)
  https://github.com/UniversalDependencies/UD_Portuguese-Bosque
• SICK-BR corpus to test inference
• New projects with PUC-Rio
Needed a Portuguese wordnet, but none was
openly available. We built it.
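
For readers unfamiliar with wordnets, here is a minimal sketch of the kind of data structure involved: synsets (sets of synonymous lemmas sharing one meaning) linked by hypernym ("is a kind of") relations. The entries, the identifiers such as `cao.n`, and the helper names `synsets_for`, `hypernym_closure` and `entails_lexically` are all hypothetical and written by hand for illustration. They are not taken from OpenWordnet-PT and do not reflect its actual data model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    """One meaning: its synonymous lemmas, a gloss, and links to more general synsets."""
    sid: str
    lemmas: List[str]
    gloss: str
    hypernyms: List[str] = field(default_factory=list)


# Toy, hand-written entries (assumption: NOT real OpenWordnet-PT data).
TOY_WORDNET: Dict[str, Synset] = {
    "animal.n": Synset("animal.n", ["animal", "bicho"], "a living creature"),
    "mamifero.n": Synset("mamifero.n", ["mamífero"], "warm-blooded vertebrate",
                         ["animal.n"]),
    "cao.n": Synset("cao.n", ["cão", "cachorro"], "domestic canine",
                    ["mamifero.n"]),
}


def synsets_for(lemma: str, wordnet: Dict[str, Synset] = TOY_WORDNET) -> List[Synset]:
    """Return every synset that lists `lemma` among its lemmas."""
    return [s for s in wordnet.values() if lemma in s.lemmas]


def hypernym_closure(sid: str, wordnet: Dict[str, Synset] = TOY_WORDNET) -> List[str]:
    """Collect all synsets reachable from `sid` by following hypernym links."""
    seen: List[str] = []
    stack = list(wordnet[sid].hypernyms)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(wordnet[current].hypernyms)
    return seen


def entails_lexically(specific: str, general: str,
                      wordnet: Dict[str, Synset] = TOY_WORDNET) -> bool:
    """True if some sense of `specific` equals, or is a kind of, some sense of `general`."""
    general_ids = {s.sid for s in synsets_for(general, wordnet)}
    for s in synsets_for(specific, wordnet):
        if s.sid in general_ids or general_ids & set(hypernym_closure(s.sid, wordnet)):
            return True
    return False


if __name__ == "__main__":
    print(hypernym_closure("cao.n"))
    print(entails_lexically("cachorro", "animal"))
    print(entails_lexically("animal", "cachorro"))
```

Even this toy version shows why coverage matters for inference: the lexical check can only succeed when both words and the chain of hypernym links between them are present in the resource.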
+
Challenges & Opportunities
• ML: Biggest opportunity, if we can incorporate it and make it
explainable and generalizable
• ML: Biggest challenge, as many believe ‘fairy dust’ will do
the work
• Traditional trade-off: expressivity of logic vs difficulty of
computation
• Representation and implementation are different. Different
use cases support different computational time limits. (Pease)
+
Thanks!
+
+
The Future of Ontology
In the future people will use one or a small number of large ontologies defined in
expressive logical languages. A dictionary of one page isn't very useful. To define shared
terms for software integration or discourse, a large corpus is needed. A dictionary that
just allows for stating relationships among terms, but not their definitions also isn't very
useful. We'll use languages that are formal and computable and also don't limit us from
the things we'd want to say in a human language definition. Much current work in
ontology uses languages that are limited in the logical expressiveness of what they can
say, in order to conform to a particular use case in computation, with technology available
at the time the language was created. But the power of logical inference systems has
advanced with every passing decade, and the choices made 20 years ago are not so
relevant today. Definition (or representation) and implementation are different. Different
use cases support different computational time limits. Choosing very limited languages
ensures that use cases that provide more time for execution will not have available the
definitions and constraints needed to capture what concepts mean in the world. In the
future of ontology, logical inference will advance considerably from where it is today, and
ontologies written to take advantage of the logical inference systems of the future will be
the ones that get used.
Adam Pease, Nov 2020

Editor's Notes

  • #7 Anyone working in the field will know that we are still far away from having models that can perform NLI in a robust, generalizable, and dataset-independent way.