Embedding NomLex-BR into OpenWN-PT


  1. Embedding NomLex-BR into OpenWN-PT. Valeria de Paiva (joint work with Alexandre Rademaker, Gerard de Melo and Livy Real)
  2. NomLex? http://nlp.cs.nyu.edu/nomlex/index.html
  3. NomLex
     - a dictionary of English nominalizations, developed under Catherine Macleod
     - relates the nominal complements to the arguments of the corresponding verb
     - 1025 entries covering several types of lexical nominalizations
     - first version released on January 15, 1999; latest version October 2001
     - developed into NomLex-Plus and NomBank
     - downloadable from http://nlp.cs.nyu.edu/nomlex/index.html
     - Example: "Alexander's destruction of the city happened in 330 BC."
  4. NomLex-BR?
     - a dictionary of Portuguese nominalizations
     - relates nominals to their corresponding verbs
     - over 1000 entries covering several types of lexical nominalizations
     - first version of NomLex-BR in 2011, much expanded in 2013
     - downloadable from https://github.com/arademaker/nomlex-br
     - Example: "Construção da rodovia Transamazônica, na década de 70, pelo governo Médici, uma das obras faraônicas da ditadura militar." (Construction of the Trans-Amazonian highway, in the 1970s, by the Médici government, one of the pharaonic works of the military dictatorship.)
  5. Nominalizations in Portuguese
     - Nominalizations are difficult to deal with in KR systems, because the arguments of the nominal predicate are harder to obtain.
     - The NOMLEX project (Macleod et al., 1998) provides a well-established, open-access baseline.
     - It covers nominalizations with the suffixes -ion, -ment and -er, which carry over well to Portuguese.
     - E.g. construction/construção, adjournment/adiamento and writer/escritor.
     - 90% of the original resource was easily translated by hand.
  6. Into OpenWordNet-PT? Why? We need a Portuguese wordnet for our work, as complete and accurate as we can get it. NomLex-BR helps improve the completeness and accuracy of OpenWN-PT.
  7. OpenWordNet-PT…
     - data is freely available
     - correspondence with Princeton WordNet
     - derived from Universal WordNet (de Melo and Weikum, 2009): high recall with high precision for the more salient words
     - Useful embedding: checking that the nominalizations from the Portuguese NOMLEX were related to the corresponding verbs exposed issues in OpenWN-PT.
     - https://github.com/arademaker/wordnet-br
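
The check mentioned on this slide can be pictured with a minimal sketch. It assumes the nominalizations are available as (verb, noun) pairs and the wordnet as sets of lemmas per part of speech; the data, the names `NOMLEX_PAIRS` and `WORDNET_LEMMAS`, and the function `find_missing` are all hypothetical and do not reflect the actual file formats or tooling of either resource.

```python
# Toy data only: illustrative entries, not extracted from the real resources.
NOMLEX_PAIRS = [
    ("construir", "construção"),
    ("adiar", "adiamento"),
    ("escrever", "escritor"),
]

# Hypothetical stand-in for the wordnet lexicon: part of speech -> known lemmas.
WORDNET_LEMMAS = {
    "v": {"construir", "escrever"},
    "n": {"construção", "escritor"},
}


def find_missing(pairs, lemmas):
    """Return (lemma, pos) items from the pairs that the lexicon lacks."""
    missing = []
    for verb, noun in pairs:
        if verb not in lemmas["v"]:
            missing.append((verb, "v"))
        if noun not in lemmas["n"]:
            missing.append((noun, "n"))
    return missing


missing_items = find_missing(NOMLEX_PAIRS, WORDNET_LEMMAS)
for lemma, pos in missing_items:
    print(f"not in wordnet: {lemma} ({pos})")
```

Each reported item is a candidate for manual review: it may be a genuine gap in the wordnet or a typo in the nominalization lexicon.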
  8. OpenWN-PT: what does it look like?
     - A typical good entry, with minor manual improvements.
     - An automatic process produces candidate Portuguese words for some of the WN 3.0 synsets.
     - We check the suggested words and add Portuguese glosses and examples.
  9. OpenWN-PT: what does it look like? Not very useful, but the sense exists. No single verb in Portuguese for this synset…
  10. OpenWN-PT: some issues… Capitalized items, plurals, duplicates, a few gender issues, missing items…
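
Two of the issues listed on this slide, capitalized items and duplicates, lend themselves to a mechanical first pass. The sketch below is only an illustration on a made-up word list; `lint_entries` is a hypothetical helper, not part of the project's tooling, and plural or gender issues would need morphological information that it does not attempt to model.

```python
from collections import Counter


def lint_entries(words):
    """Flag capitalized entries and exact duplicates in a list of word forms."""
    counts = Counter(words)
    capitalized = sorted({w for w in words if w[:1].isupper()})
    duplicates = sorted(w for w, c in counts.items() if c > 1)
    return {"capitalized": capitalized, "duplicates": duplicates}


# Made-up sample standing in for the words attached to one synset.
sample = ["casa", "Casa", "livro", "livro", "carro"]
report = lint_entries(sample)
print(report)
```

Flagged items are only candidates for review, since a capitalized form can be a legitimate proper noun.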
  11. OpenWN-PT: RDF Representation
     - OpenWN-PT is encoded and distributed in RDF/OWL.
     - Both the data model and the actual data are in the same format, and existing data processing tools apply, including databases ("triple stores") with SQL-like query interfaces (SPARQL).
     - There has been a standard W3C encoding of WordNet in RDF since 2006. OpenWN-PT is modelled after and fully interoperable with Princeton WordNet.
     - One can find Portuguese equivalents for specific English word senses and vice versa.
     - OpenWN-PT is part of a large ecosystem of compatible resources, including domain identifiers and mappings to Wikipedia.
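
To make the triple-store idea concrete, here is a minimal in-memory sketch of how a two-pattern query over such data behaves. The identifiers (`nm:1`, `ex:verb`, `ex:noun`, `ex:Nominalization`) are invented for illustration and are not the vocabulary actually used by OpenWN-PT or NomLex-BR; a real query would be written in SPARQL against the published data.

```python
# Hypothetical triples; identifiers are illustrative, not the real vocabulary.
TRIPLES = [
    ("nm:1", "rdf:type", "ex:Nominalization"),
    ("nm:1", "ex:verb", "word:construir"),
    ("nm:1", "ex:noun", "word:construção"),
    ("nm:2", "rdf:type", "ex:Nominalization"),
    ("nm:2", "ex:verb", "word:adiar"),
    ("nm:2", "ex:noun", "word:adiamento"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def nouns_for_verb(triples, verb):
    """Join two patterns: ?n ex:verb <verb> and ?n ex:noun ?noun."""
    result = []
    for subj, _, _ in match(triples, p="ex:verb", o=verb):
        result.extend(obj for _, _, obj in match(triples, s=subj, p="ex:noun"))
    return result


nouns = nouns_for_verb(TRIPLES, "word:construir")
print(nouns)
```

Because the data model and the data share one format, the same pattern-matching style of query can address schema-level and instance-level statements alike.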
  12. A small experiment…
     - Accuracy: since the lexicon was manually created, it is mostly accurate. Minor typos and bugs are caught when comparing it to OpenWN-PT.
     - Coverage: we used DHBB to complete NomLex-BR; this work was completed after submission.
     - A more systematic effort is needed, but the results were encouraging.
  13. Conclusions
     - We presented NomLex-BR, a lexicon of nominalizations in Brazilian Portuguese.
     - NomLex-BR is embedded into OpenWordNet-PT and shares its RDF representation.
     - Recent improvements include better coverage: newer suffixes and the incorporation of Nomage.
     - The data is freely available from http://github.com/arademaker/wordnet-br/ and through a SPARQL endpoint at logics.emap.fgv.br:10035.
     - Browsing via Open Multilingual Wordnet (www.casta-net.jp/~kuribayashi/cgi-bin/wnmulti.cgi) is fun.
  14. NomLex-BR: next steps?
     - Work with Claudia Freitas on leveraging Linguateca's PAPEL, ACDC and Floresta Sintá(c)tica.
     - Lists from Linguateca's resources complement NomLex-BR using corpora and make sure our resource is not simply a translation.
     - Classification of nominalizations?
     - Adding the Portuguese terms that satisfy different relations? OpenVerbNet-PT?
     - Glosses?
  15. Thanks!
  16. References
     - de Paiva, Valeria, and Alexandre Rademaker. 2012. Revisiting a Brazilian Wordnet. In Proceedings of Global Wordnet Conference, Global Wordnet Association, Matsue.
     - de Paiva, Valeria, Alexandre Rademaker, and Gerard de Melo. OpenWordNet-PT: An Open Brazilian WordNet For Reasoning. In Proceedings of the 24th International Conference On Computational Linguistics. http://hdl.handle.net/10438/10274.
     - Rademaker, Alexandre, Valeria de Paiva, Gerard de Melo, Livy Real, and Maira Gatti. 2014. OpenWordNet-PT: A Project Report. In Proceedings of the 7th Global Wordnet Conference, Tartu, Estonia. Global Wordnet Association.
     - Coelho, Livy Maria Real, Alexandre Rademaker, Valeria de Paiva, and Gerard de Melo. 2014. Embedding NomLex-BR Nominalizations Into OpenWordnet-PT. In Proceedings of the 7th Global WordNet Conference, Tartu, Estonia.
  17. OpenWN-PT: true lexical gaps?...
  18. Other stuff to add in?…
     - Onto.PT, ES wordnet?
     - Editing interfaces?
     - BabelNet?
     - NER issues?
     - Temporal issues?
     - Work with Claudia Freitas?…Leonel?
     - Work on implicatives/factives in Portuguese?
     - FOIS workshop
  19. References
     - de Melo, Gerard, and Gerhard Weikum. 2009. Towards a Universal Wordnet by Learning from Combined Evidence. 18th ACM Conference on Information and Knowledge Management (CIKM 2009), Hong Kong, China.
     - de Paiva, Valeria. 2010. Bridges from Language to Logic: Concepts, Contexts and Ontologies. Logical and Semantic Frameworks with Applications, LSFA'10, Natal, Brazil.
     - "A Basic Logic for Textual Inference". AAAI Workshop on Inference for Textual Question Answering, 2005.
     - "Textual Inference Logic: Take Two". CONTEXT 2007.
     - "Precision-focused Textual Inference". Workshop on Textual Entailment and Paraphrasing, 2007.
     - PARC's Bridge and Question Answering System. Proceedings of Grammar Engineering Across Frameworks, 2007.
  20. Simplifying PARC's Bridge Architecture
     [Architecture diagram. Labels, in extraction order: Text, Parsing, Inference Engines, KR Mapping, F-structure, semantics, KR, Sources, Assertions, Query, Question, Grammar, Stanford Parser, Term rewriting, OpenWN-PT, SUMO-PT, KR mapping rules, Textual Inference logics]
     Idea: simplify and reproduce the components in Portuguese.