Introduction
Silicon Valley
PARC, XLE, Bridge
Applications
The Importance of Being Earnest:
Open Datasets in Portuguese
Valeria de Paiva
OPENCOR 2021
Dec 2021
Valeria de Paiva OpenCor2021
Thanks, Livy!
Personal Stories
I’m an AI scientist, a mathematician, a computational semanticist
and a category theorist.
I have worked in Silicon Valley for the last 22 years, applying
pure mathematics to computing in surprising ways.
In the Valley
Personal stories: Now
How?
Samsung Research America (2019-2020)
Dialogue and Knowledge Representation Lab, SRA, Mountain View
project: systems to make Bixby (voice personal assistant)
communicate well with home appliances via SmartHome
Samsung acquired Viv in Oct 2016 and had acquired SmartThings
in 2014; it needed to integrate the stacks and grow Bixby
SmartThings: leading open platform for the smart home and
the consumer Internet of Things in 2014.
open-source project: develop an ontology of smart devices, based on
WikiData and customer-facing functionality.
Nuance Communications (2012-2018)
AI and NLP Lab in Sunnyvale
Nuance had the best voice recognition software system in
2012 and needed to add AI to turn sounds into knowledge: a big
effort across several labs (Montreal, Boston, Sunnyvale).
application areas: health systems, automotive, law, CRM,
banks, insurance, etc.
our lab projects: a personal assistant (PA) for the living room (TV second
screen) and a PA for automotive companies
Building a little smartness into conventional search, e.g. ‘find
allergy medication near me’
possible open-source projects: voice interfaces to WikiData? student
internships?
Rearden Commerce (2011-2012)
AI/KR Lab in Foster City
a white-labelling shop for travel and expenses/procurement systems
application areas: air travel tickets, hotels, shows & sports,
restaurants, ground transportation, parking, etc.
our lab project: a Groupon-like app, as Rearden Commerce had acquired HomeRun
Using ontologies to discover what hotel reviewers really valued
possible open-source projects: a projection of WikiData adapted to
Brazil, e.g. for skill sets or for Brazilian culture
Cuil (2008-2010)
Start-up search company in Menlo Park
a Google-competitor created by ex-googlers
all sorts of tasks, from babysitting servers to dealing with
customers
Learning to rank algorithms
PARC Forum talk: Adventures in Searchland
possible open-source project: timelines in Portuguese
PARC, XLE and Bridge
How to think about Conversational Assistants?
How to think about Conversational Assistants?
Several Conversational Assistants: applications
(AI Summit 2018, Luxembourg)
Several Conversational Assistants
(AI Summit 2018, Luxembourg)
Natural Language Inference (NLI)
Shock: almost nine years of work at PARC were out of reach
when I left in 2008
I gave a talk at SRI proposing to redo it all as open source
(Bridges, ENTCS 2011)
Pleased to report that almost all of it is now available as
open source, redone from scratch using new techniques
Katerina Kalouli, ex-PhD student at Konstanz, now assistant
professor in Munich. GKR demo:
https://cis.lmu.de/~kalouli/resources.html
What about Portuguese?
Alexandre Rademaker and I started OpenWordNet-PT in 2012:
There was no open-source WordNet for Portuguese then
Sources on GitHub
OWN-PT was originally obtained from Universal WordNet (Weikum
and de Melo)
RDF distribution from the beginning
openwordnet-pt.org
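Because OWN-PT has been distributed as RDF from the start, Portuguese word forms can be pulled out of a dump with very little code. The sketch below is a minimal illustration, assuming an N-Triples serialization; the predicate IRI under http://example.org/ is a hypothetical placeholder, since the real OWN-PT schema has its own vocabulary (with word senses sitting between synsets and words). The pattern only handles simple triples (IRIs and language-tagged literals), not blank nodes, typed literals or escape decoding.

```python
import re
from collections import defaultdict

# Simplified N-Triples pattern: <subject> <predicate> object .
# The object is either an IRI or a literal with an optional language tag.
# Blank nodes, typed literals and unescaping are NOT handled in this sketch.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:@([A-Za-z0-9-]+))?)\s*\.$'
)

# Hypothetical predicate IRI standing in for whatever the real schema uses.
LEXICAL_FORM = "http://example.org/own-pt/lexicalForm"


def lemmas_by_synset(ntriples: str, lang: str = "pt") -> dict:
    """Group literal word forms tagged with `lang` by their subject IRI."""
    groups = defaultdict(list)
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # blank lines, comments or syntax this sketch ignores
        subject, predicate, _iri_object, literal, tag = match.groups()
        if predicate == LEXICAL_FORM and literal is not None and tag == lang:
            groups[subject].append(literal)
    return dict(groups)


# Made-up triples for illustration; not taken from the OWN-PT release.
sample = """
<http://example.org/own-pt/synset-dog-n> <http://example.org/own-pt/lexicalForm> "cão"@pt .
<http://example.org/own-pt/synset-dog-n> <http://example.org/own-pt/lexicalForm> "cachorro"@pt .
<http://example.org/own-pt/synset-dog-n> <http://example.org/own-pt/lexicalForm> "dog"@en .
"""

print(lemmas_by_synset(sample))
# {'http://example.org/own-pt/synset-dog-n': ['cão', 'cachorro']}
```

For real work a proper RDF library is the safer choice; the regex only keeps the idea visible.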
OpenWordNet-PT Data
OpenWordNet-PT Examples
OpenWordNet-PT Basic Stats
1. OWN-PT is big, around 50K synsets.
2. PWN is much bigger, 117K synsets.
3. We have more than 7K verb synsets: definitely not enough,
but enough to start experimenting
4. More than twice as big as Russian WordNet, bigger than
Spanish, only slightly smaller than French
5. and issues, many issues
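Size comparisons like these are ratios between synset inventories: with the figures above, 50K / 117K ≈ 0.43, so OWN-PT covers a bit over 40% of PWN. A minimal sketch of computing such coverage per part of speech, assuming synsets are keyed by PWN-3.0-style "offset-pos" identifiers; the identifiers below are made-up placeholders, not real PWN or OWN-PT entries.

```python
from collections import Counter

# Made-up placeholder identifiers in a PWN-3.0-style "offset-pos" form;
# they are NOT real Princeton WordNet or OWN-PT synsets.
reference = {"00000001-n", "00000002-n", "00000003-v", "00000004-v", "00000005-a"}
portuguese = {"00000001-n", "00000003-v", "00000005-a", "99999999-n"}


def pos_of(synset_id: str) -> str:
    """Return the part-of-speech suffix of an 'offset-pos' identifier."""
    return synset_id.rsplit("-", 1)[1]


def coverage_by_pos(target: set, ref: set) -> dict:
    """Fraction of reference synsets, per POS, that also occur in target."""
    totals = Counter(pos_of(s) for s in ref)
    covered = Counter(pos_of(s) for s in target & ref)  # IDs absent from ref are ignored
    return {pos: covered[pos] / total for pos, total in sorted(totals.items())}


print(coverage_by_pos(portuguese, reference))
# {'a': 1.0, 'n': 0.5, 'v': 0.5}
print(len(portuguese & reference) / len(reference))
# 0.6
```

Intersecting with the reference first keeps identifiers that do not exist in PWN (one of the "many issues") from inflating the numbers.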
OpenWordNet-PT Papers
1. Papers trying to clean up the database
2. Nominalizations and their issues (Livy Real)
3. Using corpora to extend our vocabulary (Claudia Freitas)
4. Interfaces for progress (Fabricio Chalub)
5. Two papers on verb lexicon
6. Two papers on Historical archives (DHBB)
7. WordNets themselves (Hugo & Alberto)
8. Gentilics, Adverbs
9. Temporal expressions
10. Morpholinks
etc.
OpenWordNet-PT
We were doing so well...
Google Translate, Open Multilingual WordNet, BabelNet and FreeLing
used our OWN-PT.
https://translate.google.com/intl/en/about/license/ still says:
But then Transformers arrived! And with them, a series of new
challenges.
OpenWordNet-PT Questions
1. We could carry on cleaning the data using the
lexicographer files (to the left). Is it worth doing?
2. Should we instead grow the resource without worrying much about precision?
3. I wish we had glosses in Portuguese. Alberto Simões produced
them for us, but we never added them to the
database, as the quality of the Portuguese text wasn’t great.
4. This data is open source: anyone can take it, make it
better and let us have it back. Or not: our license is very broad.
5. In a long-term project, goals eventually diverge and people want
to try other things. The beauty of GitHub is that you can keep the
version you want.
In any case, a high-quality Portuguese WordNet is only one
lexical resource we need. We need others, and we have been
working on those in parallel.
Beginning UD-PT
1. Corpus Bosque: traditional news corpus (EU-PT and BR-PT),
mangled by several versions and conversions.
2. PALAVRAS (Bick): a rule-based Constraint Grammar (CG)
system designed for Portuguese. It produces deep linguistic
analyses, with tags at the morphological, syntactic (dependency)
and semantic levels (not open source).
3. First version of our data, UD 1.4 compliant, included in UD
release 1.4 as UD Portuguese-Bosque. Not too bad!
4. Then we "accepted the challenge" of updating UD-PT-Bosque
to the UD 2.0 guidelines and replacing the previous UD Portuguese
corpus. Phew!
Issues of UD-PT
1. Gender: underspecified gender, e.g. grande (big) or feliz (happy)
2. MWEs: changing from Palavras to UD1.x to UD2.x was
complicated. MWEs still are.
3. Participles: verbs or adjectives?
4. Ellipsis: changes from UD1 to UD2; plus, ellipsis is difficult
5. Clitics, and all the things that "se" and "que" can be.
6. Non-explicit subjects (sujeito oculto, "hidden subject", and others): see the excellent
new work by Freitas and de Souza.
7. Negation (UD changed its mind), and negation is hard
8. Appositives vs. nmod: PT had a different opinion
9. Auxiliary verbs
10. xcomp-that, ccomp-to
See http://medialab.di.unipi.it/depling/assets/docs/day2/02_demo2.pdf
for the status in 2017. Now a meeting group.
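Several of these issues show up directly in the CoNLL-U files that UD treebanks such as UD Portuguese-Bosque are distributed in: a contraction like "do" is a multiword-token line with an ID range (here "5-6") followed by its syntactic words "de" and "o"; UD 2 empty nodes with decimal IDs (e.g. "8.1") are used in enhanced graphs for some elliptical constructions; and underspecified gender simply means no Gender= feature. The sketch below reads one sentence with the standard library only. The sentence and its annotation are hand-written for illustration, not copied from Bosque, and may not match the treebank's actual conventions.

```python
# Column names of the CoNLL-U format, in their fixed order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(block: str):
    """Split one CoNLL-U sentence into syntactic words and multiword tokens."""
    words, multiword_tokens = [], []
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments such as "# text = ..."
        row = dict(zip(FIELDS, line.split("\t")))
        if "-" in row["id"]:
            # Surface token spanning several words, e.g. "do" = "de" + "o".
            multiword_tokens.append((row["id"], row["form"]))
        elif "." in row["id"]:
            continue  # empty node (used for some ellipsis); ignored here
        else:
            words.append(row)
    return words, multiword_tokens


def parse_feats(feats: str) -> dict:
    """Turn 'Gender=Masc|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Hand-written illustration, not taken from UD Portuguese-Bosque.
DET = "Definite=Def|Gender=Masc|Number=Sing|PronType=Art"
rows = [
    ["1", "O", "o", "DET", "_", DET, "2", "det", "_", "_"],
    ["2", "menino", "menino", "NOUN", "_", "Gender=Masc|Number=Sing", "4", "nsubj", "_", "_"],
    ["3", "feliz", "feliz", "ADJ", "_", "Number=Sing", "2", "amod", "_", "_"],
    ["4", "saiu", "sair", "VERB", "_", "Number=Sing|Person=3|VerbForm=Fin", "0", "root", "_", "_"],
    ["5-6", "do", "_", "_", "_", "_", "_", "_", "_", "_"],
    ["5", "de", "de", "ADP", "_", "_", "7", "case", "_", "_"],
    ["6", "o", "o", "DET", "_", DET, "7", "det", "_", "_"],
    ["7", "prédio", "prédio", "NOUN", "_", "Gender=Masc|Number=Sing", "4", "obl", "_", "SpaceAfter=No"],
    ["8", ".", ".", "PUNCT", "_", "_", "4", "punct", "_", "_"],
]
sentence = "# text = O menino feliz saiu do prédio.\n" + "\n".join("\t".join(r) for r in rows)

words, mwts = parse_conllu(sentence)
print(len(words), mwts)                # 8 [('5-6', 'do')]
print(parse_feats(words[2]["feats"]))  # {'Number': 'Sing'}  (no Gender for "feliz")
```

Keeping multiword tokens separate from syntactic words is what lets the surface text ("do") and the tree ("de" + "o") coexist, which is exactly where the Palavras-to-UD conversions of MWEs got complicated.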
SICK-BR
e.g. https://www.ime.usp.br/~bruna/SICK_PT.pdf
1. A big group from GLIC, USP.
2. An easy-to-obtain, state-of-the-art automated translation (Milos
Stanojevic)
3. Lots of human work correcting the automated translation, to get
4. SICK-BR, a Brazilian Portuguese corpus annotated with
inference relations and semantic relatedness between pairs of
sentences
5. SICK-BR is a translation and adaptation of the original SICK, a
corpus of English sentences used in several semantic evaluations
6. SICK-BR has around 10K sentence pairs annotated for
neutral/contradiction/entailment relations and for semantic
relatedness
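To make the annotation concrete: each item pairs two sentences with an inference label and a relatedness score. The sketch below assumes a tab-separated file with a header row and SICK-style column names (pair_ID, sentence_A, sentence_B, entailment_label, relatedness_score) and the 1 to 5 relatedness scale of the original SICK; the actual SICK-BR release may use a different layout, so check its files. The three pairs are invented examples, not SICK-BR data.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass

# The three NLI labels used in SICK-style annotation.
LABELS = {"ENTAILMENT", "CONTRADICTION", "NEUTRAL"}


@dataclass
class Pair:
    sentence_a: str
    sentence_b: str
    label: str          # one of LABELS
    relatedness: float  # assumed 1 (unrelated) to 5 (highly related), as in SICK


def load_pairs(tsv_text: str) -> list:
    """Read pairs from tab-separated text with a header row (assumed column names)."""
    pairs = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        label = row["entailment_label"].strip().upper()
        if label not in LABELS:
            raise ValueError(f"unexpected label {label!r} in pair {row['pair_ID']}")
        score = float(row["relatedness_score"])
        if not 1.0 <= score <= 5.0:
            raise ValueError(f"relatedness {score} out of range in pair {row['pair_ID']}")
        pairs.append(Pair(row["sentence_A"], row["sentence_B"], label, score))
    return pairs


# Invented example pairs, not taken from SICK-BR.
header = ["pair_ID", "sentence_A", "sentence_B", "entailment_label", "relatedness_score"]
rows = [
    ["1", "Um homem está tocando violão", "Um homem está tocando um instrumento", "ENTAILMENT", "4.5"],
    ["2", "Um menino está correndo", "Nenhum menino está correndo", "CONTRADICTION", "3.8"],
    ["3", "Uma mulher está cortando uma cebola", "Um cachorro está pulando", "NEUTRAL", "1.2"],
]
tsv = "\n".join("\t".join(r) for r in [header] + rows)

pairs = load_pairs(tsv)
print(Counter(p.label for p in pairs))
```

Validating labels and score ranges at load time is a cheap way to catch problems introduced while correcting a machine-translated corpus.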
SICK-BR
https://www.ime.usp.br/~bruna/SICK_PT.pdf
1. Basic idea: logic is more or less universal; it works the same way in different
languages
2. NLI is very important in the new style of Natural Language
Understanding. Hence ASSIN, ASSIN2, SICK-BR.
3. But there are many translation difficulties, even for sentences as simple as
those in SICK
4. It is hard to decide whether the translation difficulties are simply that, problems of translation
5. The phenomena described in SICK seem universal enough, but
languages are structured differently
6. Much more work to do...
Conclusions
Open source datasets are important
Not only for English!
BenderRule: Say the language you’re dealing with, always.
Also document your datasets properly!
A video worth watching (only 19 min): "Data Statements for Natural Language
Processing: Toward Mitigating System Bias and Enabling Better
Science" https://vimeo.com/359686057
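One lightweight way to act on the Bender Rule and on data statements is to keep a small metadata record next to each dataset and check it mechanically. The sketch below uses a hypothetical subset of data-statement fields (the proposal described in the video has more, e.g. speech situation and provenance) and a deliberately simplified language-tag check that only accepts forms like "pt" or "pt-BR"; it is not a full BCP 47 validator. The Bosque record fills in only what the slides state (EU-PT and BR-PT news text) and leaves everything else empty.

```python
import re
from dataclasses import dataclass, fields

# Deliberately simplified tag check: a 2-3 letter language subtag with an
# optional 2-letter region, e.g. "pt" or "pt-BR". Real BCP 47 allows far more.
SIMPLE_TAG = re.compile(r"[a-z]{2,3}(-[A-Z]{2})?")


@dataclass
class DataStatement:
    """Hypothetical subset of data-statement fields, for illustration only."""
    curation_rationale: str = ""
    language_tags: tuple = ()
    language_description: str = ""
    speaker_demographic: str = ""
    annotator_demographic: str = ""
    text_characteristics: str = ""


def problems(statement: DataStatement) -> list:
    """Report empty fields and language tags that fail the simplified check."""
    issues = [f.name for f in fields(statement) if not getattr(statement, f.name)]
    issues += [f"bad tag: {tag}" for tag in statement.language_tags
               if not SIMPLE_TAG.fullmatch(tag)]
    return issues


bosque = DataStatement(
    language_tags=("pt-PT", "pt-BR"),
    language_description="European and Brazilian Portuguese",
    text_characteristics="news text",
)
print(problems(bosque))
# ['curation_rationale', 'speaker_demographic', 'annotator_demographic']
```

Listing both "pt-PT" and "pt-BR" makes the language variety explicit instead of a bare "Portuguese", which is the point of saying which language you are dealing with.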
But indeed, we have our work cut out for us! Thanks!
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 

The Importance of Being Earnest: Open Datasets in Portuguese

  • 1. 1/31 Introduction Silicon Valley PARC, XLE, Bridge Applications The Importance of Being Earnest: Open Datasets in Portuguese Valeria de Paiva OPENCOR 2021 Dec 2021 Valeria de Paiva OpenCor2021
  • 2. Thanks, Livy!
  • 3. Personal Stories: I’m an AI scientist, a mathematician, a computational semanticist and a category theorist. I work in Silicon Valley and have done so for the last 22 years, applying pure mathematics to computing in surprising ways.
  • 4. In the Valley
  • 5. Personal stories: Now
  • 6. How?
  • 7. Samsung Research America (2019-2020): Dialogue and Knowledge Representation Lab, SRA, Mountain View. Project: systems to make Bixby (a voice personal assistant) communicate well with home appliances via SmartHome. Samsung acquired Viv in Oct 2016 and had acquired SmartThings in 2014; it needed to integrate the stacks and grow Bixby. SmartThings: the leading open platform for the smart home and the consumer Internet of Things in 2014. Open-source project: develop an ontology of smart devices, based on WikiData and customer-facing functionality.
  • 8. Nuance Communications (2012-2018): AI and NLP Lab in Sunnyvale. Nuance had the best voice recognition software system in 2012 and needed to add AI to turn sounds into knowledge: a big effort across several labs (Montreal, Boston, Sunnyvale). Application areas: health systems, automotive, law, CRM, banks, insurance, etc. Our lab projects: a personal assistant for the living room (TV 2nd screen) and a PA for automotive companies; building small smartness into conventional search, e.g. ‘find allergy medication near me’. Open-source projects: voice interfaces to WikiData? Student internships?
  • 9. Rearden Commerce (2011-2012): AI/KR Lab in Foster City, a white-labelling shop for travel and expenses/procurement systems. Application areas: air travel tickets, hotels, shows & sports, restaurants, ground transportation, parking, etc. Our lab project: a Groupon-like app, as RC acquired HomeRun; using ontologies to discover what hotel reviewers really valued. Possible open-source projects: a projection of WikiData adapted to Brazil, for ‘skill sets’, for Brazilian culture, etc.
  • 10. Cuil (2008-2010): start-up search company in Menlo Park, a Google competitor created by ex-Googlers. All sorts of tasks, from babysitting servers to dealing with customers. Learning-to-rank algorithms. PARC Forum talk: Adventures in Searchland. Possible open-source project: timelines in Portuguese.
  • 11. PARC, XLE and Bridge
  • 12. How to think about Conversational Assistants?
  • 13. How to think about Conversational Assistants?
  • 14. Several Conversational Assistants: applications (AI Summit 2018, Luxembourg)
  • 15. Several Conversational Assistants (AI Summit 2018, Luxembourg)
  • 16. Natural Language Inference (NLI). Shock: the work of almost nine years at PARC was out of reach when I left in 2008. I gave a talk at SRI proposing to redo it all as open source (Bridges, ENTCS 2011). I am pleased to report that almost all of it is now available as open source, redone from scratch using new techniques. Katerina Kalouli, ex-PhD student at Konstanz, now assistant professor in Munich. GKR Demo: https://cis.lmu.de/~kalouli/resources.html
  • 17. What about Portuguese? Alexandre Rademaker and I started OpenWordNet-PT in 2012: there was no open-source WordNet of Portuguese then. Sources on GitHub. OWN-PT was originally obtained from Universal WordNet (Weikum and de Melo). RDF distribution from the beginning. openwordnet-pt.org
  • 18. OpenWordNet-PT Data
  • 19. OpenWordNet-PT Examples
  • 20. OpenWordNet-PT Basic Stats: 1. OWN-PT is big, around 50K synsets. 2. PWN (Princeton WordNet) is much bigger, 117K synsets. 3. We have more than 7K verb synsets: definitely not enough, but one can start to play. 4. More than twice as big as Russian WordNet, bigger than Spanish, only slightly smaller than French. 5. And issues, many issues.
  • 21. OpenWordNet-PT Papers: 1. Papers trying to clean up the database 2. Nominalizations and their issues (Livy Real) 3. Using corpora to extend our vocabulary (Claudia Freitas) 4. Interfaces for progress (Fabricio Chalub) 5. Two papers on the verb lexicon 6. Two papers on historical archives (DHBB) 7. WordNets themselves (Hugo & Alberto) 8. Gentilics, adverbs 9. Temporal expressions 10. Morpholinks, etc.
  • 22. OpenWordNet-PT: We were doing so well... Google Translate, Open Multilingual WordNet, BabelNet and FreeLing used our OWN-PT. https://translate.google.com/intl/en/about/license/ still says: But then Transformers arrived, and with them a series of new challenges.
  • 24. OpenWordNet-PT Questions: 1. One can try to carry on cleaning the data, using the lexicographer files (to the left). Is it worth doing? 2. Should we instead grow, not worrying much about precision? 3. I wish we had glosses in Portuguese. Alberto Simões produced them for us, but we never added them to the database, as the quality of the Portuguese text wasn’t great. 4. This data is open source: anyone can get it, make it better, and let us have it. Or not: our license is very broad. 5. In a long-term project, goals eventually diverge and people want to try other things. The beauty of GitHub is being able to keep the version you want. In any case, a high-quality Portuguese WordNet is simply one lexical resource we need. We need others, and we have been working on those in parallel.
  • 26. Beginning UD-PT: 1. Corpus Bosque: a traditional news corpus (EU-PT and BR-PT), mangled by several versions and conversions. 2. PALAVRAS (Bick): a rule-based Constraint Grammar (CG) system designed for Portuguese. It produces deep linguistic analyses, with tags at the morphological, syntactic (dependency) and semantic levels (not open source). 3. The first version of our data, UD 1.4 compliant, was included in UD release 1.4 as UD Portuguese-Bosque. Not too bad! 4. Then we “accepted the challenge” of updating UD-PT-Bosque to the UD 2.0 guidelines and replacing the previous UD Portuguese corpus. Phew!
  • 27. Issues of UD-PT: 1. Gender: underspecified gender, e.g. grande (big) or feliz (happy) 2. MWEs: changing from PALAVRAS to UD 1.x to UD 2.x was complicated; MWEs still are. 3. Participles: verbs or adjectives? 4. Ellipses: changes from UD1 to UD2, plus ellipses are difficult 5. Clitics, and all the things that “se” and “que” can be 6. Non-explicit subjects (sujeito oculto, "hidden subject", and others); see the excellent new work of Freitas and de Souza 7. Negation (UD changed its mind), and negation is hard 8. Appositives vs. nmod: PT had a different opinion 9. Auxiliary verbs 10. xcomp-that, ccomp-to. See http://medialab.di.unipi.it/depling/assets/docs/day2/02_demo2.pdf for the status in 2017. Now a meeting group.
  • 29. SICK-BR, see https://www.ime.usp.br/~bruna/SICK_PT.pdf 1. A big group from GLIC, USP. 2. An easy-to-obtain, state-of-the-art automated translation (Milos Stanojevic) 3. Lots of human work correcting the automated translation to get 4. SICK-BR, a Brazilian Portuguese corpus annotated with inference relations and semantic relatedness between pairs of sentences 5. SICK-BR is a translation and adaptation of the original SICK, a corpus of English sentences used in several semantic evaluations 6. SICK-BR has around 10k sentence pairs annotated for neutral/contradiction/entailment relations and for semantic relatedness
  • 30. SICK-BR https://www.ime.usp.br/~bruna/SICK_PT.pdf 1. Basic idea: logic is kind of universal; it works the same in different languages 2. NLI is very important in the new style of Natural Language Understanding, hence ASSIN, ASSIN2, SICK-BR. 3. But there are many difficulties of translation, even for simple sentences as in SICK 4. Difficult to decide if the difficulties of translation are simply that 5. The phenomena described in SICK seem universal enough, but language is structured differently 6. Much more work to do...
  • 31. Conclusions: Open-source datasets are important. Not only for English! The Bender Rule: always say which language you’re dealing with. Also, document your datasets properly! A video worth watching: “Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science” https://vimeo.com/359686057 (only 19 min). But indeed, we have our work cut out for us! Thanks!
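The Issues of UD-PT slide lists underspecified gender for adjectives such as grande and feliz. UD treebanks, including UD Portuguese-Bosque, are distributed in the CoNLL-U format: ten tab-separated columns per word, with morphological features in the sixth (FEATS) column. A minimal Python sketch of how one might scan for adjectives with no Gender feature follows. It assumes that underspecified gender shows up as a missing Gender feature; the sample sentence and its annotation are hand-written for illustration and are not taken from Bosque, and the helper names (parse_conllu, adjectives_without_gender) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One syntactic word from a CoNLL-U file (only the columns used here)."""
    id: str
    form: str
    lemma: str
    upos: str
    feats: dict = field(default_factory=dict)
    head: str = "_"
    deprel: str = "_"


def parse_feats(column: str) -> dict:
    """Turn 'Gender=Fem|Number=Sing' into {'Gender': 'Fem', 'Number': 'Sing'}."""
    if column == "_":
        return {}
    return dict(pair.split("=", 1) for pair in column.split("|"))


def parse_conllu(text: str) -> list[list[Token]]:
    """Split CoNLL-U text into sentences (blank-line separated) of Tokens."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments such as '# text = ...'
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}: {line!r}")
        # Range IDs (e.g. '3-4') mark multiword tokens such as the contraction
        # 'do' = 'de' + 'o'; decimal IDs mark empty nodes. Both are skipped.
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append(Token(cols[0], cols[1], cols[2], cols[3],
                             parse_feats(cols[5]), cols[6], cols[7]))
    if current:
        sentences.append(current)
    return sentences


def adjectives_without_gender(sentences: list[list[Token]]) -> list[str]:
    """Forms of ADJ tokens whose FEATS carry no Gender value."""
    return [t.form for sent in sentences for t in sent
            if t.upos == "ADJ" and "Gender" not in t.feats]


# Hand-written illustration, NOT copied from UD Portuguese-Bosque.
# Columns are written space-separated for readability and converted to tabs
# below; this works only because no field in this sample contains a space.
SAMPLE_ROWS = [
    "# text = A casa do João é grande.",
    "1 A o DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 2 det _ _",
    "2 casa casa NOUN _ Gender=Fem|Number=Sing 7 nsubj _ _",
    "3-4 do _ _ _ _ _ _ _ _",
    "3 de de ADP _ _ 5 case _ _",
    "4 o o DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 5 det _ _",
    "5 João João PROPN _ Gender=Masc|Number=Sing 2 nmod _ _",
    "6 é ser AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 7 cop _ _",
    "7 grande grande ADJ _ Number=Sing 0 root _ _",
    "8 . . PUNCT _ _ 7 punct _ _",
]
SAMPLE = "\n".join(
    row if row.startswith("#") else "\t".join(row.split(" "))
    for row in SAMPLE_ROWS
) + "\n"

if __name__ == "__main__":
    sentences = parse_conllu(SAMPLE)
    print(adjectives_without_gender(sentences))  # ['grande']
```

For this sample the script prints only grande. Skipping the range line for "do" keeps the list at the level of syntactic words (de, o), which is what the HEAD and DEPREL columns refer to in CoNLL-U.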