Multilingual Fine-grained Entity Typing

Presented at LDK2017 Galway, 20 June 2017
http://ldk2017.org/
Accompanying paper: https://link.springer.com/chapter/10.1007/978-3-319-59888-8_23

  1. Multilingual Fine-grained Entity Typing. Marieke van Erp, Piek Vossen
  2. Take-home message
     • Fine-grained entity typing is valuable for downstream NLP tasks
     • Wikipedia text + DBpedia taxonomy + embeddings enable distantly supervised fine-grained entity typing
     • Experiments for Dutch and Spanish
     • Code and experiments available at: https://github.com/cltl/multilingual-finegrained-entity-typing
  3. Why fine-grained entity typing?
     • Traditional NERC approaches discern only a limited number of types:
       • CoNLL: Person, Organisation, Location, Miscellaneous
       • ACE: Person, Organisation, Location, Facility, Weapon, Vehicle and Geo-Political Entity
     • Downstream NLP tasks may benefit from more specific entity types, e.g. relation extraction, coreference resolution, entity linking
  4. Why fine-grained entity typing? Example: Paul Noonan (singer/songwriter) vs. Paul Noonan (failure analysis engineer)
  5. Approach (pipeline overview)
     • Labelled text from Wikipedia + entity types from DBpedia: "El rodaje tuvo lugar en varios lugares, entre ellos Londres y Cardiff." ("Filming took place in several locations, including London and Cardiff.")
     • Feature vectors: "place, londres, Xxxxxxx"; "place, cardiff, Xxxxxxx"; ...
     • Model
     • Test instance: "?, swansea, Xxxxxxx"
     • Predicted label: "place"
  6. Approach: labelled text from Wikipedia, entity types from DBpedia
     • Wikipedia links provide entity mentions
     • Surrounding text provides context for these entity mentions
     • DBpedia provides type information for entity mentions
     • FIGER (Ling & Weld, 2012) and GFT (Gillick et al., 2014) map types to Freebase via Wikipedia categories, which is error-prone
     • Example sentence: "El rodaje tuvo lugar en varios lugares, entre ellos Londres y Cardiff."
  7. Approach: feature vectors
     • Context + entity mention + type information are used to generate feature vectors
     • Features are based on previous work for English
     • Example: "place, londres, Xxxxxxx"; "place, cardiff, Xxxxxxx"; ...
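The feature-vector step on this slide can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names `word_shape` and `to_training_line` are hypothetical, the shape feature is inferred from the `Xxxxxxx` pattern shown on the slide, and the `__label__` prefix is the convention fastText uses for supervised training data. The context features mentioned on the slide are left out.

```python
def word_shape(token: str) -> str:
    """Map uppercase letters to 'X', lowercase to 'x', digits to 'd'; keep other characters."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def to_training_line(mention: str, entity_type: str) -> str:
    """Build one fastText-style training line: label first, then the features.

    Assumption: only two features (lowercased mention, word shape) are used here.
    """
    # Join multi-word mentions so that each feature stays a single token.
    normalized = mention.replace(" ", "_")
    features = [normalized.lower(), word_shape(normalized)]
    return f"__label__{entity_type} " + " ".join(features)


print(to_training_line("Londres", "place"))  # __label__place londres Xxxxxxx
```

Joining multi-word mentions with an underscore is a design choice of this sketch, made so that a mention such as "Hyde Park Corner" is not split into separate tokens.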
  8. Approach: model
     • A model is trained using Facebook's fastText algorithm
     • Inspired by the word2vec CBOW model
     • Incorporates character n-grams, which is useful for morphologically rich languages (such as Dutch and Spanish)
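To make the character n-gram idea concrete, the sketch below extracts subword units the way fastText describes them: the word is wrapped in `<` and `>` boundary markers and every substring of the chosen lengths is collected. The function name `char_ngrams` is hypothetical and the length range of 3 to 6 is chosen for illustration only; the settings used in the experiments are not stated on the slide.

```python
from typing import List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return all character n-grams of length min_n..max_n of '<word>'."""
    wrapped = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```

Because inflected forms of a word share most of their n-grams, a model built on these units can relate word forms it has rarely or never seen in full, which is the benefit for morphologically rich languages noted on the slide.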
  9. Approach: test instance and predicted label
     • The model is tested on a held-out dataset
     • 1/3 of all generated data
     • Example: "?, swansea, Xxxxxxx" → "place"
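A held-out split of this kind could look like the sketch below. The slide states only that one third of the generated data was held out; the random shuffle, the fixed seed, and the function name `split_held_out` are assumptions made for this illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_held_out(
    instances: Sequence[str], test_fraction: float = 1 / 3, seed: int = 13
) -> Tuple[List[str], List[str]]:
    """Shuffle the instances and hold out a fraction of them for testing."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


lines = [f"__label__place city{i} Xxxxx" for i in range(9)]
train, test = split_held_out(lines)
print(len(train), len(test))  # 6 3
```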
  10. Results: comparative results on the English GFT dataset, against Gillick et al. (2014) and Yogatama et al. (2015). [Results table not preserved in this transcript.]
  12. Fine-grained entity types
      • Ling & Weld, 2012: 113 types listed, 45 present in test data, including /livingthing/animal, /living_thing and /transportation/road
      • Gillick et al., 2014: 89 types listed, 39 present in test data
  16. Sample errors

      Mention                Gold Standard        Prediction
      Kaeso Fabius           Person               OfficeHolder
      Lebeuville             Municipality         Settlement
      Lodewijk Bruckman      Artist               Writer
      Hyde Park Corner       MetroStation         Park
      K.I.M.                 University           Organisation
      Francine Van Assche    Athlete              Actor
      Haackgreerius miopus   Reptile              Insect
      Wahab Akbar            Politician           Family
      Baureihe 211           Locomotive           Train
      Congrier               Municipality         Settlement
      SPA-Viberti AS.42      MilitaryVehicle      Automobile
      Moissey Kogan          Artist               FictionalCharacter
      aluminiumplaat         ChemicalElement      ChemicalCompound
      Jacob Black            FictionalCharacter   MusicalArtist
      Sotla                  River                Mountain
      Nowa Deba              PopulatedPlace       TelevisionShow
      Dean Woods             Cyclist              Actor
      Abdullah the Butcher   Wrestler             FictionalCharacter
      Ratislav Mores         SoccerPlayer         MusicalArtist
      Christophe Laborie     Cyclist              PoliticalParty
  17. DBpedia Type Coverage
      • Not all 685 DBpedia classes are present in the type file:
        • 269 in Dutch DBpedia
        • 143 in Spanish DBpedia
      • The type file only contains the most specific class:
        • http://nl.dbpedia.org/resource/Cheddar_(kaas) has type “dbo:Cheese” in the type file; “dbo:Food” needs to be inferred (work in progress)
      • Cultural differences:
        • College sports are almost entirely absent in the Netherlands, so mentions of type “dbo:NationalCollegiateAthleticAssociationAthlete” are unlikely to be found
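Inferring the more general classes from the most specific one amounts to walking up the class hierarchy. The sketch below shows one way this might be done. It is not the authors' implementation (the slide marks this step as work in progress); the `PARENT` table is a hand-written toy excerpt for illustration, not data loaded from the DBpedia ontology, and the function name `with_ancestors` is hypothetical.

```python
from typing import Dict, List

# Toy subclass table (child -> parent), hand-written for this example only.
PARENT: Dict[str, str] = {
    "dbo:Cheese": "dbo:Food",
    "dbo:Food": "owl:Thing",
}


def with_ancestors(most_specific: str, parent: Dict[str, str]) -> List[str]:
    """Return the most specific class followed by all of its ancestors."""
    chain = [most_specific]
    seen = {most_specific}
    while chain[-1] in parent:
        nxt = parent[chain[-1]]
        if nxt in seen:  # guard against accidental cycles in the table
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


print(with_ancestors("dbo:Cheese", PARENT))  # ['dbo:Cheese', 'dbo:Food', 'owl:Thing']
```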
  18. Types and Roles
      • The DBpedia ontology adheres to a single type per entity
        • dbpedia:Arnold_Schwarzenegger is dbo:OfficeHolder
        • YAGO types for the same entity: yago:Actor, yago:BodyBuilder, yago:Emigrant
      • Trade-off:
        • multiple types/roles can facilitate contextual typing
        • but may also introduce noise in the training data
  19. Conclusions and future work
      • Despite incomplete type coverage, Wikipedia + DBpedia form a good basis for fine-grained entity typing
      • Links between the English, Dutch and Spanish DBpedia versions may be leveraged to increase coverage
      • The DBpedia hierarchy is useful in a generic setting
        • But it still has coverage gaps such as ‘cuisine’ and ‘education’
        • Explore other hierarchies
  20. Code and experiments: https://github.com/cltl/multilingual-finegrained-entity-typing
  21. References
      • Gillick, D., Lazic, N., Ganchev, K., Kirchner, J., Huynh, D.: Context-dependent fine-grained entity type tagging. arXiv (2014)
      • Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI (2012)
      • Yogatama, D., Gillick, D., Lazic, N.: Embedding methods for fine grained entity type classification. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), Short papers, Beijing, China, 26–31 July 2015, pp. 291–296. Association for Computational Linguistics (2015)
  22. Image sources
      • Tree of Life: https://upload.wikimedia.org/wikipedia/commons/b/bc/Haeckel_arbol_bn.png
      • Paul Noonan (singer/songwriter): http://images.entertainment.ie/images_content/rectangle/620x372/paulnoonan.jpg
      • Paul Noonan (failure analysis engineer): http://www.iopireland.org/careers/life/page_49377.html
      • SPA-Viberti AS.42: https://upload.wikimedia.org/wikipedia/commons/thumb/2/23/AS42-1.gif/250px-AS42-1.gif
      • Abdullah the Butcher: http://i.ebayimg.com/thumbs/images/g/H1kAAOSwbopZPupi/s-l200.jpg
