Multilingual Fine-grained Entity Typing
Marieke van Erp
Piek Vossen
Take-home message
• Fine-grained entity typing is valuable for downstream NLP tasks
• Wikipedia text + DBpedia taxonomy + embeddings enable distantly supervised fine-grained entity typing
• Experiments for Dutch and Spanish
• Code and experiments available at: https://github.com/cltl/multilingual-finegrained-entity-typing
Why fine-grained entity typing?
• Traditional NERC approaches distinguish only a limited number of types:
  • CoNLL: Person, Organisation, Location, Miscellaneous
  • ACE: Person, Organisation, Location, Facility, Weapon, Vehicle and Geo-Political Entity
• Downstream NLP tasks may benefit from more specific entity types, e.g.:
  • relation extraction, coreference resolution, entity linking
Why fine-grained entity typing?
Paul Noonan (Singer/songwriter)
Paul Noonan (Failure Analysis Engineer)
Approach
[Pipeline diagram]
Labelled text from Wikipedia: "El rodaje tuvo lugar en varios lugares, entre ellos Londres y Cardiff." ("Filming took place in several locations, including London and Cardiff.")
+ Entity types from DBpedia
→ Feature vectors: place, londres, Xxxxxxx; place, cardiff, Xxxxxxx; ...
→ Model
Test instance: ?, swansea, Xxxxxxx
→ Predicted label: place
Approach
• Wikipedia links provide entity mentions
• Surrounding text provides context for these entity mentions
• DBpedia provides type information for entity mentions
• FIGER (Ling & Weld, 2012) and GFT (Gillick et al., 2014) map types to Freebase via Wikipedia categories, which is error prone
[Diagram step: labelled text from Wikipedia ("El rodaje tuvo lugar en varios lugares, entre ellos Londres y Cardiff.") + entity types from DBpedia]
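The distant-supervision step above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the `[[...]]` link syntax is standard wikitext, but the `TYPES` dictionary is a hypothetical stand-in for a DBpedia instance-types file, and `label_mentions` is a name invented for this example.

```python
import re

# Matches [[Target]] and [[Target|anchor text]] wiki links.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

# Hypothetical excerpt standing in for a DBpedia instance-types file.
TYPES = {"Londres": "place", "Cardiff": "place"}


def label_mentions(wikitext, types):
    """Return (mention, type, plain_sentence) for every linked entity with a known type."""
    # Context: the sentence with link markup replaced by the visible text.
    plain = WIKILINK.sub(lambda m: m.group(2) or m.group(1), wikitext)
    examples = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        mention = (match.group(2) or target).strip()
        if target in types:  # links without a type are skipped
            examples.append((mention, types[target], plain))
    return examples


sentence = "El rodaje tuvo lugar en varios lugares, entre ellos [[Londres]] y [[Cardiff]]."
examples = label_mentions(sentence, TYPES)
for mention, label, context in examples:
    print(label, mention, "|", context)
```

No manual annotation is involved: the link supplies the mention boundary, the linked page supplies the type, and the rest of the sentence supplies the context.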
Approach
• Context + entity mention + type information are used to generate feature vectors
• Features based on previous work for English
[Diagram step: feature vectors, e.g. place, londres, Xxxxxxx; place, cardiff, Xxxxxxx; ...]
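The example vectors on the slide appear to combine the type label, the lowercased mention and a word-shape pattern such as "Xxxxxxx". The sketch below reproduces just those three fields under that reading; the function names are invented for illustration, and the full feature set used in the experiments is not shown on the slides.

```python
def word_shape(token):
    """Map uppercase letters to X, lowercase to x, digits to d; keep other characters."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def featurize(mention, label="?"):
    """Build a (label, lowercased mention, shape) triple; '?' marks an unlabelled test instance."""
    return (label, mention.lower(), word_shape(mention))


print(featurize("Londres", "place"))  # ('place', 'londres', 'Xxxxxxx')
print(featurize("Swansea"))           # ('?', 'swansea', 'Xxxxxxx')
```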
Approach
• A model is trained using Facebook's fastText algorithm
  • Inspired by the word2vec CBOW model
  • Incorporates character n-grams: useful for morphologically rich languages (such as Dutch and Spanish)
[Diagram step: model]
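To see why character n-grams help with inflected forms, the sketch below extracts fastText-style n-grams, with `<` and `>` marking word boundaries. It is a plain-Python illustration rather than a call into the fastText library, and the n-gram lengths are assumptions: the slides do not state which settings were used.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return all character n-grams of the boundary-padded word, shortest first."""
    padded = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


# Dutch singular and plural share most of their trigrams, so a model that
# represents words through n-grams can relate the two surface forms.
singular = set(char_ngrams("fiets", 3, 3))
plural = set(char_ngrams("fietsen", 3, 3))
shared = singular & plural
print(sorted(shared))  # ['<fi', 'ets', 'fie', 'iet']
```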
Approach
• The model is tested on a held-out dataset
  • 1/3 of all generated data
[Diagram step: test instance (?, swansea, Xxxxxxx) → model → predicted label (place)]
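A held-out split of this kind might look like the sketch below. The 1/3 test fraction comes from the slide; the shuffling, the fixed seed and the function name are assumptions made for the example, since the slides do not say how the split was drawn.

```python
import random


def split_held_out(examples, test_fraction=1 / 3, seed=0):
    """Shuffle the examples and return (train, test), with test_fraction held out."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_held_out(range(9))
print(len(train), len(test))  # 6 3
```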
Results
Comparative results on English GFT dataset
← Gillick et al. 2014 & ↑Yogatama et al. 2015
Fine-grained entity types
• FIGER (Ling & Weld, 2012): 113 types listed, 45 present in test data, including /livingthing/animal, /living_thing and /transportation/road
• GFT (Gillick et al., 2014): 89 types listed, 39 present in test data
Sample errors
Mention               Gold Standard       Prediction
Kaeso Fabius          Person              OfficeHolder
Lebeuville            Municipality        Settlement
Lodewijk Bruckman     Artist              Writer
Hyde Park Corner      MetroStation        Park
K.I.M.                University          Organisation
Francine Van Assche   Athlete             Actor
Haackgreerius miopus  Reptile             Insect
Wahab Akbar           Politician          Family
Baureihe 211          Locomotive          Train
Congrier              Municipality        Settlement
SPA-Viberti AS.42     MilitaryVehicle     Automobile
Moissey Kogan         Artist              FictionalCharacter
aluminiumplaat        ChemicalElement     ChemicalCompound
Jacob Black           FictionalCharacter  MusicalArtist
Sotla                 River               Mountain
Nowa Deba             PopulatedPlace      TelevisionShow
Dean Woods            Cyclist             Actor
Abdullah the Butcher  Wrestler            FictionalCharacter
Ratislav Mores        SoccerPlayer        MusicalArtist
Christophe Laborie    Cyclist             PoliticalParty
DBpedia Type Coverage
• Not all 685 DBpedia classes are present in the type file:
  • 269 in Dutch DBpedia
  • 143 in Spanish DBpedia
• The type file only contains the most specific class:
  • http://nl.dbpedia.org/resource/Cheddar_(kaas) has type “dbo:Cheese” in the type file; “dbo:Food” needs to be inferred (work in progress)
• Cultural differences:
  • College sports are almost entirely absent in the Netherlands, so mentions of type “dbo:NationalCollegiateAthleticAssociationAthlete” are unlikely to be found
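Inferring the more general types from the most specific one amounts to walking up the class hierarchy. The sketch below shows the idea on a toy parent map; `PARENT` is a hand-written excerpt assumed for illustration (a real run would read the subclass links from the DBpedia ontology), and `type_path` is a hypothetical helper, not the approach described as work in progress.

```python
# Toy excerpt of subclass links, assumed for illustration only.
PARENT = {
    "dbo:Cheese": "dbo:Food",
    "dbo:Food": "owl:Thing",
}


def type_path(most_specific, parent):
    """Return the chain from the most specific class up to the root of the hierarchy."""
    path = [most_specific]
    seen = {most_specific}
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in seen:  # guard against cycles in malformed input
            break
        path.append(nxt)
        seen.add(nxt)
    return path


print(type_path("dbo:Cheese", PARENT))  # ['dbo:Cheese', 'dbo:Food', 'owl:Thing']
```

With the full chain available, a mention can be labelled at any level of granularity, which matters when the fine-grained class has too few training examples.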
Types and Roles
• The DBpedia ontology assigns a single type per entity
  • dbpedia:Arnold_Schwarzenegger is dbo:OfficeHolder
  • YAGO types for the same entity: yago:Actor, yago:BodyBuilder, yago:Emigrant
• Trade-off:
  • multiple types/roles can facilitate contextual typing
  • but may also introduce noise in the training data
Conclusions and future work
• Despite incomplete type coverage, Wikipedia + DBpedia form a good basis for fine-grained entity typing
• Links between the English, Dutch and Spanish DBpedia versions may be leveraged to increase coverage
• The DBpedia hierarchy is useful in a generic setting
  • But it still has coverage gaps such as ‘cuisine’ and ‘education’
  • Explore other hierarchies
https://github.com/cltl/multilingual-finegrained-entity-typing
References
• Gillick, D., Lazic, N., Ganchev, K., Kirchner, J., Huynh, D.: Context-dependent fine-grained entity type tagging. arXiv (2014)
• Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI (2012)
• Yogatama, D., Gillick, D., Lazic, N.: Embedding methods for fine grained entity type classification. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), Short papers, Beijing, China, 26–31 July 2015, pp. 291–296. Association for Computational Linguistics (2015)
Image sources
• Tree of Life: https://upload.wikimedia.org/wikipedia/commons/b/bc/Haeckel_arbol_bn.png
• Paul Noonan (singer/songwriter): http://images.entertainment.ie/images_content/rectangle/620x372/paulnoonan.jpg
• Paul Noonan (failure analysis engineer): http://www.iopireland.org/careers/life/page_49377.html
• SPA-Viberti AS.42: https://upload.wikimedia.org/wikipedia/commons/thumb/2/23/AS42-1.gif/250px-AS42-1.gif
• Abdullah the Butcher: http://i.ebayimg.com/thumbs/images/g/H1kAAOSwbopZPupi/s-l200.jpg
