2. LEXICAL RESOURCES FOR PORTUGUESE?
• 6th most spoken language in the world (Ethnologue) or 7th (Wikipedia)
• Very few open source resources for PT, almost no connections between them
• Discuss:
• Initial developments
• NOMLEX-PT
• Applications
• Next steps
3. FOLK WISDOM
Linguistic resources are very easy to start working on, very
hard to improve on and extremely difficult to maintain, as
funding usually only works for new resources.
• Trying to buck the trend
• Review of work in the last 8 years…
4. OPENWORDNET-PT
• WordNet is the paradigmatic lexical resource for English NLP
• We want a Portuguese wordnet that is open access, downloadable and updatable, so that
the community can improve it
• Our team is especially interested in NLP for knowledge representation (KR) and automated deduction
• But also word sense disambiguation, information retrieval, automatic text classification,
automatic text summarization, question answering, etc.
5. A BIT OF HISTORY
• Initially a transformation and extension of data from Universal WordNet/MENTA
(UWN/MENTA)
• machine learning used to construct relationships between graphs from Wikipedia in several
languages, plus machine-readable dictionaries
• continuously improved through linguistically motivated additions and removals, either manual
or semi-automatic, making use of large Portuguese corpora (DHBB, Bosque, …)
• two-tiered methodology: high precision for popular words, high recall for long tail
• The method could be used for other languages; it works best for languages that are well
represented on the internet and have a reasonably large Wikipedia.
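The two-tiered idea (high precision for popular words, high recall for the long tail) can be illustrated with a frequency-dependent acceptance threshold. The sketch below is hypothetical and is not the OWN-PT pipeline: `Candidate`, `select_candidates`, the frequency cutoff of 1000 and the thresholds 0.9 and 0.5 are all assumptions chosen only to show the shape of the trade-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical automatically proposed (word, synset) link with a score."""
    word: str
    synset_id: str
    confidence: float  # assumed score in [0, 1] from some upstream aligner


def select_candidates(candidates, corpus_freq, freq_cutoff=1000,
                      high_precision=0.9, high_recall=0.5):
    """Keep candidates using a frequency-dependent confidence threshold.

    Frequent ("popular") words get a strict threshold, favouring precision;
    rare ("long tail") words get a lenient one, favouring recall.
    All default values are illustrative assumptions, not OWN-PT settings.
    """
    kept = []
    for c in candidates:
        popular = corpus_freq.get(c.word, 0) >= freq_cutoff
        threshold = high_precision if popular else high_recall
        if c.confidence >= threshold:
            kept.append(c)
    return kept


if __name__ == "__main__":
    # Toy frequencies and synset ids, for illustration only.
    freq = {"casa": 50000, "aviltamento": 3}
    cands = [
        Candidate("casa", "s1", 0.95),
        Candidate("casa", "s2", 0.60),         # popular word, low score: rejected
        Candidate("aviltamento", "s3", 0.55),  # long-tail word: accepted
    ]
    print([(c.word, c.synset_id) for c in select_candidates(cands, freq)])
    # → [('casa', 's1'), ('aviltamento', 's3')]
```

Making the threshold a function of corpus frequency keeps errors rare where users look most often, while still letting rare words enter the resource so they can later be corrected by hand.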
6. NOMLEX-PT
• useful for linguistic research as well as for information extraction; basic example:
destruction/destroy
• an extension of OWN-PT, with links connecting deverbal nouns with their
corresponding verbs.
• Bootstrapped manually: c. 2,000 entries created by translating the English NOMLEX
• Useful to check issues with the coherence and richness of OWN-PT, e.g.
aviltar/aviltamento
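A nominalization lexicon of this kind can be pictured as a list of noun/verb pairs, and one way it can expose coherence or coverage problems is by checking each pair against the wordnet's lemma index. The sketch below is a hypothetical illustration, not the NOMLEX-PT format or tooling: `Nominalization`, `find_gaps`, the lemma-to-POS dictionary layout and the toy data are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nominalization:
    """Hypothetical NOMLEX-PT-style entry: a deverbal noun and its source verb."""
    noun: str
    verb: str


def find_gaps(entries, wordnet_lemmas):
    """Return (entry, missing) pairs whose noun or verb is absent from the wordnet.

    `wordnet_lemmas` is an assumed layout mapping a lemma to the set of
    parts of speech it has ('n' for noun, 'v' for verb).
    """
    gaps = []
    for e in entries:
        missing = []
        if "n" not in wordnet_lemmas.get(e.noun, set()):
            missing.append(f"noun '{e.noun}'")
        if "v" not in wordnet_lemmas.get(e.verb, set()):
            missing.append(f"verb '{e.verb}'")
        if missing:
            gaps.append((e, missing))
    return gaps


if __name__ == "__main__":
    # Toy data for illustration only; not taken from OWN-PT or NOMLEX-PT.
    lexicon = [
        Nominalization("destruição", "destruir"),
        Nominalization("aviltamento", "aviltar"),
    ]
    toy_wordnet = {"destruição": {"n"}, "destruir": {"v"}, "aviltar": {"v"}}
    for entry, missing in find_gaps(lexicon, toy_wordnet):
        print(f"{entry.noun}/{entry.verb}: missing {', '.join(missing)}")
    # → aviltamento/aviltar: missing noun 'aviltamento'
```

Each reported pair points a curator at a possible gap: either the wordnet lacks a sense for one side of the pair, or the lexicon entry itself needs review.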
7. SOCIAL INTERFACE
• new social and collaborative interface, implemented and deployed in 2016: “Seeing is
Correcting”
• OWN-PT is part of the Open Multilingual Wordnet and of the Global WordNet Association
• Simple interface → perspicuous content
• Many experiments, described on the website, including:
• Verb lexicon improvements, gentilics, morpholinks (many not yet finished)
• OWN-PT is used in FreeLing, Google Translate, BabelNet and Onto.PT
8. APPLICATIONS
• FreeLing
• Tweets for football
• DHBB (recently open-sourced): its biographical data is very interesting, but requires good named-entity recognition (NER)
• Comparison of wordnet-like resources for PT
• Lexical resources do not thrive in a vacuum; they need other resources to interact with:
• Universal Dependencies
• Linked Open Data: how to exploit it?
9. CONCLUSION
• Despite:
• Very distributed team
• Different timelines and expectations
• No official project for all
• Quite a lot achieved:
• Used in major international projects such as BabelNet, Google Translate, FreeLing, …
• 18 main publications, and at least as many in various stages of preparation
• New development plan 2018-2020 from Oxford
10. SOME REFERENCES
• Valeria de Paiva, Alexandre Rademaker, and Gerard de Melo. OpenWordNet-PT: An open
Brazilian Wordnet for reasoning. In Proceedings of COLING 2012, Mumbai, India.
• Fabricio Chalub, Livy Real, Alexandre Rademaker, and Valeria de Paiva. Semantic links for
Portuguese. In Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), Portorož, Slovenia, May 2016.
• Valeria de Paiva, Livy Real, Alexandre Rademaker, and Gerard de Melo. NomLex-PT: A lexicon of
Portuguese nominalizations. In Proceedings of LREC 2014, Reykjavik, Iceland, May 2014.
• Pedro Delfino, Bruno Cuconato, Guilherme Paulino Passos, Gerson Zaverucha, and Alexandre
Rademaker. Using OpenWordNet-PT for question answering on legal domain. In Proceedings of the Global Wordnet
Conference 2018, Singapore, January 2018.
11. MORE REFERENCES
• Lluís Padró and Evgeny Stanilovsky. FreeLing 3.0: Towards Wider Multilinguality. In
Proceedings of the Language Resources and Evaluation Conference (LREC 2012).
• Valeria de Paiva, Dario Oliveira, Suemi Higuchi, Alexandre Rademaker, and Gerard de Melo.
Exploratory information extraction from a historical dictionary. In IEEE International Conference on e-Science (e-Science), volume
2, pages 11–18. IEEE, October 2014.
• Alexandre Rademaker, Fabricio Chalub, Livy Real, Claudia Freitas, Eckhard Bick, and Valeria
de Paiva. Universal Dependencies for Portuguese. In Proceedings of Depling 2017, pages 197–206, Pisa, Italy,
September 2017.
• Livy Real, Fabricio Chalub, Valeria de Paiva, Claudia Freitas, and Alexandre Rademaker.
Seeing is correcting: curating lexical resources using social interfaces. In Fourth Workshop on
Linked Data in Linguistic Resources and Applications (LDL 2015), Beijing, China.