Multilingual Fine-grained Entity Typing
Marieke van Erp
Piek Vossen
Take-home message
• Fine-grained entity typing is valuable for further
downstream NLP tasks
• Wikipedia text + DBpedia taxonomy + embeddings
enable distantly supervised fine-grained entity typing
• Experiments for Dutch and Spanish
• Code and experiments available at: https://github.com/cltl/multilingual-finegrained-entity-typing
Why fine-grained entity typing?
• Traditional NERC (named entity recognition and classification) approaches distinguish only a limited number of types:
• CoNLL: Person, Organisation, Location, Miscellaneous
• ACE: Person, Organisation, Location, Facility, Weapon,
Vehicle and Geo-Political Entity
• Downstream NLP tasks may benefit from more specific
entity types, e.g.:
• relation extraction, coreference resolution, entity linking
Why fine-grained entity typing?
[Figure: two different people named Paul Noonan, a singer/songwriter and a failure analysis engineer]
Approach
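[Figure: pipeline overview. Labelled text from Wikipedia, e.g. the Spanish sentence "El rodaje tuvo lugar en varios lugares, entre ellos Londres y Cardiff." ("The filming took place in several locations, including London and Cardiff."), is combined with entity types from DBpedia into feature vectors ("place, londres, Xxxxxxx", "place, cardiff, Xxxxxxx", ...). These train a model, which assigns a test instance ("?, swansea, Xxxxxxx") a predicted label ("place").]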
Approach
• Wikipedia links provide entity mentions
• Surrounding text provides context to
these entity mentions
• DBpedia provides type information for the entity mentions
• FIGER (Ling & Weld, 2012) and GFT (Gillick et al., 2014) map types to Freebase via Wikipedia categories, which is error-prone
[Figure: pipeline diagram with the first two stages highlighted: labelled text from Wikipedia and entity types from DBpedia]
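Pairing Wikipedia links with DBpedia types can be sketched as follows. In raw wikitext a link is written `[[Target]]` or `[[Target|anchor text]]`; the anchor (or the target, if there is no anchor) is the entity mention, and the target page's DBpedia type becomes its label. This is a minimal Python sketch under assumptions: the type lookup is a hypothetical in-memory dictionary rather than a parsed DBpedia type file, the helper name `labelled_mentions` is illustrative, and the repository's pipeline may extract links differently.

```python
import re

# Matches [[Target]] or [[Target|anchor text]] in wikitext.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def labelled_mentions(wikitext: str, types: dict[str, str]) -> list[tuple[str, str]]:
    """Return (mention, type) pairs for wiki links whose target has a known type.

    `types` maps a page title to its type; links to untyped pages are skipped.
    """
    pairs = []
    for match in WIKILINK.finditer(wikitext):
        target, anchor = match.group(1), match.group(2)
        mention = anchor if anchor else target
        if target in types:
            pairs.append((mention, types[target]))
    return pairs


if __name__ == "__main__":
    text = "El rodaje tuvo lugar en varios lugares, entre ellos [[Londres]] y [[Cardiff]]."
    # Hypothetical type lookup; in practice this would come from a DBpedia type file.
    types = {"Londres": "place", "Cardiff": "place"}
    print(labelled_mentions(text, types))  # [('Londres', 'place'), ('Cardiff', 'place')]
```

The surrounding sentence (minus the link markup) then serves as the context for each mention.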
Approach
• Context + entity mention + type information are used to
generate feature vectors
• Features based on previous work for English
[Figure: feature vectors, e.g. "place, londres, Xxxxxxx" and "place, cardiff, Xxxxxxx, ..."]
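The feature vectors on this slide show a type label, the lowercased mention and the mention's word shape (upper-case letters as `X`, lower-case as `x`). A minimal Python sketch of that triple follows. Mapping digits to `d` and the helper names `word_shape` and `feature_vector` are assumptions, and the context features taken from previous work on English are not reproduced here.

```python
def word_shape(token: str) -> str:
    """Map upper-case letters to 'X', lower-case letters to 'x' and digits to 'd'.

    Other characters (spaces, punctuation) are kept unchanged. The digit
    mapping is an assumption; the slide only shows letter shapes.
    """
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def feature_vector(label: str, mention: str) -> list[str]:
    """Return the (type label, lowercased mention, word shape) triple.

    Hypothetical helper: context-word features are omitted.
    """
    return [label, mention.lower(), word_shape(mention)]


if __name__ == "__main__":
    for mention in ["Londres", "Cardiff"]:
        print(", ".join(feature_vector("place", mention)))
    # place, londres, Xxxxxxx
    # place, cardiff, Xxxxxxx
```

The shape feature lets the model generalise to unseen mentions such as "Swansea", which shares the shape `Xxxxxxx` with the training mentions.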
Approach
• A model is trained using Facebook's fastText algorithm
• Inspired by the word2vec CBOW model
• Incorporates character n-grams, which are useful for morphologically rich languages such as Dutch and Spanish
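Why character n-grams help with morphology becomes visible when listing the n-grams of two related word forms: inflected forms share most of them, so a rare form still receives a useful representation. The Python sketch below follows the scheme described for fastText: the word is wrapped in `<` and `>` boundary markers and every substring of length `minn` to `maxn` is collected. It is a simplification, since fastText also hashes n-grams into buckets and keeps the whole word as its own token. fastText uses `minn=3` and `maxn=6` by default when training word vectors, whereas its supervised mode leaves character n-grams off unless `-minn` and `-maxn` are set. The Spanish word pair is an illustrative choice.

```python
def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Return the character n-grams of `word`, fastText style (simplified).

    The word is wrapped in '<' and '>' so that prefixes and suffixes are
    distinguishable from word-internal n-grams; every substring of length
    minn..maxn of the wrapped word is collected, shortest first.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


if __name__ == "__main__":
    print(char_ngrams("where", 3, 3))
    # ['<wh', 'whe', 'her', 'ere', 're>']

    # Singular and plural of a Spanish noun share most of their n-grams.
    singular = set(char_ngrams("ciudad"))
    plural = set(char_ngrams("ciudades"))
    print(sorted(singular & plural))
```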
Approach
• The model is tested on a held-out dataset consisting of one third of all generated data
[Figure: the trained model assigns the test instance "?, swansea, Xxxxxxx" the predicted label "place"]
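Holding out one third of the generated instances could look like the Python sketch below. It assumes a random, seeded split over individual instances; the slides do not say how the split was drawn (for example, whether instances from the same Wikipedia article were kept together), and the function name `split_held_out` is hypothetical.

```python
import random


def split_held_out(instances, test_fraction=1 / 3, seed=42):
    """Shuffle a copy of `instances` and hold out `test_fraction` of them.

    Returns (train, test). A fixed seed makes the split reproducible.
    """
    items = list(instances)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


if __name__ == "__main__":
    data = [("place", f"mention{i}", "Xxxxxxx") for i in range(9)]
    train, test = split_held_out(data)
    print(len(train), len(test))  # 6 3
```

Splitting at the article level instead would reduce leakage from near-identical contexts appearing in both sets, at the cost of a less even label distribution.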
Results
[Figure: comparative results on the English GFT dataset, against Gillick et al. (2014) and Yogatama et al. (2015)]
Fine-grained entity types
• Ling & Weld (2012): 113 types listed, 45 present in the test data, including /livingthing/animal, /living_thing and /transportation/road
• Gillick et al. (2014): 89 types listed, 39 present in the test data
Results
• Sample errors
Mention | Gold standard | Prediction
Kaeso Fabius | Person | OfficeHolder
Lebeuville | Municipality | Settlement
Lodewijk Bruckman | Artist | Writer
Hyde Park Corner | MetroStation | Park
K.I.M. | University | Organisation
Francine Van Assche | Athlete | Actor
Haackgreerius miopus | Reptile | Insect
Wahab Akbar | Politician | Family
Baureihe 211 | Locomotive | Train
Congrier | Municipality | Settlement
SPA-Viberti AS.42 | MilitaryVehicle | Automobile
Moissey Kogan | Artist | FictionalCharacter
aluminiumplaat (Dutch: 'aluminium plate') | ChemicalElement | ChemicalCompound
Jacob Black | FictionalCharacter | MusicalArtist
Sotla | River | Mountain
Nowa Deba | PopulatedPlace | TelevisionShow
Dean Woods | Cyclist | Actor
Abdullah the Butcher | Wrestler | FictionalCharacter
Ratislav Mores | SoccerPlayer | MusicalArtist
Christophe Laborie | Cyclist | PoliticalParty
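To separate systematic confusions from one-off errors, one can count (gold, predicted) type pairs, in effect a sparse confusion matrix. The Python sketch below does this for a few rows of the sample-error table; on a full test set the same tally would show which type pairs dominate. Using `collections.Counter` is an implementation choice for illustration, not the authors' evaluation code.

```python
from collections import Counter

# A subset of the (mention, gold type, predicted type) rows from the sample-error table.
SAMPLE_ERRORS = [
    ("Lebeuville", "Municipality", "Settlement"),
    ("Congrier", "Municipality", "Settlement"),
    ("Dean Woods", "Cyclist", "Actor"),
    ("Christophe Laborie", "Cyclist", "PoliticalParty"),
    ("Lodewijk Bruckman", "Artist", "Writer"),
    ("Moissey Kogan", "Artist", "FictionalCharacter"),
]


def confusion_counts(rows):
    """Count how often each (gold, predicted) type pair occurs among misclassified rows."""
    return Counter((gold, pred) for _mention, gold, pred in rows if gold != pred)


if __name__ == "__main__":
    for (gold, pred), count in confusion_counts(SAMPLE_ERRORS).most_common(3):
        print(f"{gold} -> {pred}: {count}")
```

Pairs such as Municipality/Settlement, where the prediction is a closely related and more general type, suggest near misses rather than outright failures.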
DBpedia Type Coverage
• Not all 685 DBpedia classes are present in the type files:
• 269 in the Dutch DBpedia
• 143 in the Spanish DBpedia
• The type files contain only the most specific class:
• http://nl.dbpedia.org/resource/Cheddar_(kaas) has type “dbo:Cheese” in the type file; “dbo:Food” needs to be inferred (work in progress)
• Cultural differences:
• College sports are almost entirely absent in the Netherlands, so mentions of type “dbo:NationalCollegiateAthleticAssociationAthlete” are unlikely to be found
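Inferring the general types that the type files omit amounts to following the subclass chain upward from the most specific class. The Python sketch below assumes the ontology has already been loaded into a child-to-parent mapping. The `dbo:Cheese` to `dbo:Food` link comes from the slide; the settlement chain is an illustrative fragment to be checked against the DBpedia ontology release in use. Treating the ontology as a tree with one parent per class is a simplifying assumption, and `with_ancestors` is a hypothetical helper.

```python
# Child -> parent (rdfs:subClassOf) links; an illustrative fragment, not the full ontology.
SUBCLASS_OF = {
    "dbo:Cheese": "dbo:Food",
    "dbo:Settlement": "dbo:PopulatedPlace",
    "dbo:PopulatedPlace": "dbo:Place",
}


def with_ancestors(most_specific: str, parent_of: dict[str, str]) -> list[str]:
    """Return the most specific type followed by all inferred superclasses."""
    types = [most_specific]
    seen = {most_specific}
    current = most_specific
    while current in parent_of:
        current = parent_of[current]
        if current in seen:  # guard against cycles in malformed input
            break
        types.append(current)
        seen.add(current)
    return types


if __name__ == "__main__":
    print(with_ancestors("dbo:Cheese", SUBCLASS_OF))  # ['dbo:Cheese', 'dbo:Food']
    print(with_ancestors("dbo:Settlement", SUBCLASS_OF))
```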
Types and Roles
• The DBpedia ontology adheres to a single type per entity
• dbpedia:Arnold_Schwarzenegger is dbo:OfficeHolder
• whereas his YAGO types include yago:Actor, yago:BodyBuilder and yago:Emigrant
• Trade-off:
• multiple types/roles can facilitate contextual typing
• but may also introduce noise in the training data
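Keeping multiple types or roles means each training instance carries several labels. fastText's supervised input format marks labels with a `__label__` prefix at the start of the line, and a line may carry more than one label; for multi-label training fastText offers a one-vs-all loss (`-loss ova`) as an alternative to softmax. The Python sketch below writes instances in that format. The helper name and the choice of text placed after the labels are assumptions for illustration.

```python
def to_fasttext_line(labels: list[str], text: str) -> str:
    """Format one training instance for fastText supervised mode.

    Each label gets the default '__label__' prefix; labels come first,
    followed by the whitespace-separated input text. Labels must not
    contain spaces.
    """
    if not labels:
        raise ValueError("at least one label is required")
    prefixed = " ".join(f"__label__{label}" for label in labels)
    return f"{prefixed} {text}"


if __name__ == "__main__":
    # Single DBpedia type versus several YAGO-style roles for the same mention.
    print(to_fasttext_line(["OfficeHolder"], "arnold schwarzenegger"))
    print(to_fasttext_line(["OfficeHolder", "Actor", "BodyBuilder"], "arnold schwarzenegger"))
```

Whether the extra labels help or hurt is exactly the trade-off above: they describe the entity more completely, but a given context usually supports only one role.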
Conclusions and future work
• Despite incomplete type coverage, Wikipedia + DBpedia
form a good basis for fine-grained entity typing
• Links between the English DBpedia and the Dutch and Spanish versions may be leveraged to increase coverage
• The DBpedia hierarchy is useful in a generic setting
• but it still has coverage gaps, such as ‘cuisine’ and ‘education’
• Explore other hierarchies
https://github.com/cltl/multilingual-finegrained-entity-typing
References
• Gillick, D., Lazic, N., Ganchev, K., Kirchner, J., Huynh, D.:
Context-dependent fine-grained entity type tagging. arXiv (2014)
• Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI
(2012)
• Yogatama, D., Gillick, D., Lazic, N.: Embedding methods for fine grained entity type classification. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), Short papers, Beijing, China, 26–31 July 2015, pp. 291–296. Association for Computational Linguistics (2015)
Image sources
• Tree of Life: https://upload.wikimedia.org/wikipedia/commons/b/bc/Haeckel_arbol_bn.png
• Paul Noonan (singer/songwriter): http://images.entertainment.ie/images_content/rectangle/620x372/paulnoonan.jpg
• Paul Noonan (failure analysis engineer): http://www.iopireland.org/careers/life/page_49377.html
• SPA-Viberti AS.42: https://upload.wikimedia.org/wikipedia/commons/thumb/2/23/AS42-1.gif/250px-AS42-1.gif
• Abdullah the Butcher: http://i.ebayimg.com/thumbs/images/g/H1kAAOSwbopZPupi/s-l200.jpg

(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 

Multilingual Fine-grained Entity Typing

  • 2. Take-home message
    • Fine-grained entity typing is valuable for downstream NLP tasks
    • Wikipedia text + DBpedia taxonomy + embeddings enable distantly supervised fine-grained entity typing
    • Experiments for Dutch and Spanish
    • Code and experiments available at: https://github.com/cltl/multilingual-finegrained-entity-typing
  • 3. Why fine-grained entity typing?
    • Traditional named entity recognition and classification (NERC) approaches distinguish only a limited number of types:
      • CoNLL: Person, Organisation, Location, Miscellaneous
      • ACE: Person, Organisation, Location, Facility, Weapon, Vehicle and Geo-Political Entity
    • Downstream NLP tasks may benefit from more specific entity types, e.g. relation extraction, coreference resolution and entity linking
  • 4. Why fine-grained entity typing?
    • Paul Noonan (singer/songwriter) vs. Paul Noonan (failure analysis engineer): both are simply Person in a coarse scheme, while fine-grained types tell them apart
  • 5. Approach (pipeline overview)
    • Labelled text from Wikipedia: "El rodaje tuvo lugar en varios lugares, entre ellos Londres y Cardiff." ("Filming took place in several locations, including London and Cardiff.")
    • Entity types from DBpedia
    • Feature vectors: place, londres, Xxxxxxx / place, cardiff, Xxxxxxx, ...
    • Model
    • Test instance: ?, swansea, Xxxxxxx → predicted label: place
  • 6. Approach
    • Wikipedia links provide entity mentions
    • The surrounding text provides context for these entity mentions
    • DBpedia provides type information for the entity mentions
    • FIGER (Ling & Weld, 2012) and GFT (Gillick et al., 2014) map types to Freebase via Wikipedia categories, which is error-prone
    • [Diagram: labelled text from Wikipedia; entity types from DBpedia]
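The distant supervision step can be illustrated in a few lines: a Wikipedia link gives the mention, the link target identifies the entity, and a type lookup labels it. The following is a minimal Python sketch, not the code from the project repository. It assumes raw wiki markup with `[[Target]]` / `[[Target|anchor]]` links and uses a hypothetical dictionary `TYPE_OF` in place of the DBpedia instance-type lookup.

```python
import re

# Matches [[Target]] or [[Target|anchor text]] in Wikipedia markup.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def extract_mentions(wikitext, type_of):
    """Return (mention, entity_type, plain_text) triples for typed links.

    `type_of` maps a link target to its type; links whose target has no
    known type are skipped, since they cannot be labelled.
    """
    # Replace every link by its visible text to recover the running sentence.
    plain = LINK_RE.sub(lambda m: (m.group(2) or m.group(1)).strip(), wikitext)
    triples = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        mention = (match.group(2) or target).strip()
        entity_type = type_of.get(target)
        if entity_type is not None:
            triples.append((mention, entity_type, plain))
    return triples


# Hypothetical type lookup standing in for the DBpedia instance types.
TYPE_OF = {"Londres": "place", "Cardiff": "place"}
text = "El rodaje tuvo lugar en varios lugares, entre ellos [[Londres]] y [[Cardiff]]."
for mention, entity_type, _ in extract_mentions(text, TYPE_OF):
    print(entity_type, mention)
```

Skipping untyped links keeps only instances that can be labelled automatically; the plain sentence is kept so that context features can be computed later.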
  • 7. Approach
    • Context + entity mention + type information are used to generate feature vectors
    • Features are based on previous work for English
    • [Diagram: feature vectors, e.g. place, londres, Xxxxxxx / place, cardiff, Xxxxxxx, ...]
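The example vectors on the slides start with the type label, the lower-cased mention and a word-shape string ("Londres" → "Xxxxxxx"). A minimal Python sketch of such a feature vector follows. The shape mapping reproduces the slide examples; the context window of two tokens and the `left=`/`right=` prefixes are hypothetical choices, since the slides only say the features follow previous work for English.

```python
def word_shape(token):
    """Map uppercase letters to 'X', lowercase to 'x' and digits to 'd'."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # keep punctuation such as '-' or '.'
    return "".join(shape)


def feature_vector(tokens, start, end, label, window=2):
    """Label, lower-cased mention, word shape, then left/right context words."""
    mention_tokens = tokens[start:end]
    feats = [
        label,
        "_".join(t.lower() for t in mention_tokens),
        "_".join(word_shape(t) for t in mention_tokens),
    ]
    # Hypothetical context features: up to `window` tokens on each side.
    feats += ["left=" + t.lower() for t in tokens[max(0, start - window):start]]
    feats += ["right=" + t.lower() for t in tokens[end:end + window]]
    return feats


tokens = ["El", "rodaje", "tuvo", "lugar", "en", "varios", "lugares", ",",
          "entre", "ellos", "Londres", "y", "Cardiff", "."]
print(", ".join(feature_vector(tokens, 10, 11, "place")))
```

Multi-word mentions are joined with underscores so each feature stays a single whitespace-free token.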
  • 8. Approach
    • A model is trained using Facebook's fastText algorithm
    • fastText is inspired by the word2vec CBOW model
    • It incorporates character n-grams, which is useful for morphologically rich languages such as Dutch and Spanish
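fastText's supervised mode reads one instance per line, with labels marked by the `__label__` prefix. The sketch below shows how feature vectors could be written in that format and passed to the `fasttext` Python package's `train_supervised`. The hyperparameter values are illustrative assumptions, not the settings used in the experiments.

```python
def to_fasttext_line(label, feats):
    """Format one instance in fastText's supervised-learning input format."""
    # fastText reads tokens prefixed with "__label__" as labels; features must
    # not contain spaces, so any spaces are replaced by underscores.
    return " ".join(["__label__" + label] + [f.replace(" ", "_") for f in feats])


def write_training_file(instances, path):
    """Write (label, features) pairs, one instance per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for label, feats in instances:
            fh.write(to_fasttext_line(label, feats) + "\n")


def train(path):
    """Train a supervised fastText classifier (requires `pip install fasttext`)."""
    import fasttext  # imported lazily so the formatting helpers work without it

    # Illustrative hyperparameters, not the authors' settings. minn/maxn set
    # the character n-gram lengths used for subword information.
    return fasttext.train_supervised(
        input=path, epoch=25, lr=0.5, wordNgrams=2, minn=3, maxn=6
    )


print(to_fasttext_line("place", ["londres", "Xxxxxxx", "left=entre"]))
```

In fastText's supervised mode, character n-grams are off by default (`minn` and `maxn` are 0), so they must be set explicitly to get the subword behaviour the slide highlights. A trained model's `predict("swansea Xxxxxxx")` returns the predicted labels together with their probabilities.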
  • 9. Approach
    • The model is tested on a held-out dataset consisting of 1/3 of all generated data
    • [Diagram: test instance ?, swansea, Xxxxxxx → model → predicted label: place]
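Holding out a third of the generated data can be sketched as a seeded shuffle followed by a cut. The slides do not say whether the split is random or which metric is reported, so the random split and the simple accuracy below are assumptions for illustration.

```python
import random


def split_held_out(instances, test_fraction=1 / 3, seed=0):
    """Shuffle and split instances into (train, test), holding out test_fraction."""
    items = list(instances)
    random.Random(seed).shuffle(items)  # seeded, so the split is reproducible
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def accuracy(gold, predicted):
    """Fraction of instances whose predicted type equals the gold type."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


train, test = split_held_out(range(9))
print(len(train), len(test))
print(accuracy(["place", "place", "Person"], ["place", "Person", "Person"]))
```

Fixing the seed makes the held-out set reproducible across runs, which matters when comparing models trained on the same generated data.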
  • 10. Results
    • Comparative results on the English GFT dataset, against Gillick et al. (2014) and Yogatama et al. (2015)
  • 12. Fine-grained entity types
    • Ling & Weld (2012): 113 types listed, 45 present in the test data, including /livingthing/animal, /living_thing and /transportation/road
    • Gillick et al. (2014): 89 types listed, 39 present in the test data
  • 16. Sample errors

      Mention               Gold standard       Prediction
      Kaeso Fabius          Person              OfficeHolder
      Lebeuville            Municipality        Settlement
      Lodewijk Bruckman     Artist              Writer
      Hyde Park Corner      MetroStation        Park
      K.I.M.                University          Organisation
      Francine Van Assche   Athlete             Actor
      Haackgreerius miopus  Reptile             Insect
      Wahab Akbar           Politician          Family
      Baureihe 211          Locomotive          Train
      Congrier              Municipality        Settlement
      SPA-Viberti AS.42     MilitaryVehicle     Automobile
      Moissey Kogan         Artist              FictionalCharacter
      aluminiumplaat        ChemicalElement     ChemicalCompound
      Jacob Black           FictionalCharacter  MusicalArtist
      Sotla                 River               Mountain
      Nowa Deba             PopulatedPlace      TelevisionShow
      Dean Woods            Cyclist             Actor
      Abdullah the Butcher  Wrestler            FictionalCharacter
      Ratislav Mores        SoccerPlayer        MusicalArtist
      Christophe Laborie    Cyclist             PoliticalParty
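A small tally over the sample errors shows which gold/prediction confusions recur. The Python sketch below copies the twenty rows of the sample-error slide and counts (gold, prediction) pairs. It is an illustration of error analysis on this sample only, not an analysis reported in the slides.

```python
from collections import Counter

# (mention, gold standard, prediction) rows from the sample-error slide.
SAMPLE_ERRORS = [
    ("Kaeso Fabius", "Person", "OfficeHolder"),
    ("Lebeuville", "Municipality", "Settlement"),
    ("Lodewijk Bruckman", "Artist", "Writer"),
    ("Hyde Park Corner", "MetroStation", "Park"),
    ("K.I.M.", "University", "Organisation"),
    ("Francine Van Assche", "Athlete", "Actor"),
    ("Haackgreerius miopus", "Reptile", "Insect"),
    ("Wahab Akbar", "Politician", "Family"),
    ("Baureihe 211", "Locomotive", "Train"),
    ("Congrier", "Municipality", "Settlement"),
    ("SPA-Viberti AS.42", "MilitaryVehicle", "Automobile"),
    ("Moissey Kogan", "Artist", "FictionalCharacter"),
    ("aluminiumplaat", "ChemicalElement", "ChemicalCompound"),
    ("Jacob Black", "FictionalCharacter", "MusicalArtist"),
    ("Sotla", "River", "Mountain"),
    ("Nowa Deba", "PopulatedPlace", "TelevisionShow"),
    ("Dean Woods", "Cyclist", "Actor"),
    ("Abdullah the Butcher", "Wrestler", "FictionalCharacter"),
    ("Ratislav Mores", "SoccerPlayer", "MusicalArtist"),
    ("Christophe Laborie", "Cyclist", "PoliticalParty"),
]


def confusion_counts(rows):
    """Count how often each (gold, predicted) type pair occurs."""
    return Counter((gold, pred) for _mention, gold, pred in rows)


for (gold, pred), n in confusion_counts(SAMPLE_ERRORS).most_common(3):
    print(f"{gold} -> {pred}: {n}")
```

In this sample only Municipality → Settlement occurs twice. Some other errors involve closely related types (Locomotive vs. Train), while others are unrelated (Cyclist vs. PoliticalParty).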
  • 17. DBpedia type coverage
    • Not all 685 DBpedia ontology classes are present in the type files:
      • 269 in the Dutch DBpedia
      • 143 in the Spanish DBpedia
    • The type file only contains the most specific class:
      • http://nl.dbpedia.org/resource/Cheddar_(kaas) has type "dbo:Cheese" in the type file; "dbo:Food" needs to be inferred (work in progress)
    • Cultural differences:
      • College sports are almost entirely absent in the Netherlands, so mentions of type "dbo:NationalCollegiateAthleticAssociationAthlete" are unlikely to be found
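Inferring "dbo:Food" from "dbo:Cheese" amounts to walking up the class hierarchy from the most specific type. A minimal Python sketch follows, assuming one parent per class. The `PARENT_OF` mapping is a hypothetical hard-coded excerpt; in practice it would be read from the ontology's `rdfs:subClassOf` statements.

```python
def with_superclasses(cls, parent_of):
    """Return `cls` followed by its ancestors, following one parent per class."""
    chain = [cls]
    while cls in parent_of:
        cls = parent_of[cls]
        if cls in chain:  # stop on a cycle in malformed input
            break
        chain.append(cls)
    return chain


# Hypothetical excerpt of subclass links; the real hierarchy would be read
# from the DBpedia ontology (rdfs:subClassOf) rather than hard-coded.
PARENT_OF = {"dbo:Cheese": "dbo:Food", "dbo:Food": "owl:Thing"}

types_file = {"http://nl.dbpedia.org/resource/Cheddar_(kaas)": "dbo:Cheese"}
for entity, most_specific in types_file.items():
    print(entity, with_superclasses(most_specific, PARENT_OF))
```

Expanding each entity to its full chain of types would let training instances be labelled at any level of the hierarchy, e.g. both dbo:Cheese and dbo:Food.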
  • 18. Types and roles
    • The DBpedia ontology adheres to a single type per entity:
      • dbpedia:Arnold_Schwarzenegger is dbo:OfficeHolder
      • whereas his YAGO types include yago:Actor, yago:BodyBuilder and yago:Emigrant
    • Trade-off:
      • multiple types/roles can facilitate contextual typing
      • but may also introduce noise in the training data
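The trade-off can be made concrete by building training lines in either regime. The Python sketch below keeps only the first type (the single-type DBpedia setting) or turns every type into a label. It assumes a fastText-style line format and a `types` list ordered with the primary type first; both are illustrative choices.

```python
def labelled_line(types, feats, multi_label=False):
    """Build a fastText-style training line from an entity's types.

    `types` is assumed to be ordered with the primary type first; with
    multi_label=False only that type is kept (one type per entity, as in
    the DBpedia ontology), otherwise every type becomes a label.
    """
    chosen = types if multi_label else types[:1]
    labels = ["__label__" + t for t in chosen]
    return " ".join(labels + list(feats))


# Types as listed on the slide for Arnold Schwarzenegger.
TYPES = ["OfficeHolder", "Actor", "BodyBuilder", "Emigrant"]
print(labelled_line(TYPES, ["schwarzenegger"]))
print(labelled_line(TYPES, ["schwarzenegger"], multi_label=True))
```

fastText accepts several `__label__` tokens on one line, and recent releases offer a one-vs-all loss (`loss="ova"`) intended for multi-label classification. Every mention of the entity would then carry all roles, whichever one its context actually supports. This is the noise the slide warns about.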
  • 19. Conclusions and future work
    • Despite incomplete type coverage, Wikipedia + DBpedia form a good basis for fine-grained entity typing
    • Links between the English DBpedia and the Dutch and Spanish DBpedia versions may be leveraged to increase coverage
    • The DBpedia hierarchy is useful in a generic setting
      • but it still has coverage gaps, for example for 'cuisine' and 'education'
    • Explore other hierarchies
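One way the cross-lingual links could raise coverage is a simple back-off. Use the entity's type in its own DBpedia if there is one, otherwise borrow the type of the linked English entity. The Python sketch below uses plain dictionaries and hypothetical identifiers (`nl:Entity_A` and so on). It illustrates the proposed future work and is not an implemented part of the system.

```python
def type_with_fallback(entity, local_types, same_as_en, english_types):
    """Return the entity's type in its own DBpedia, else via its English counterpart.

    local_types: entity -> type in the Dutch or Spanish DBpedia
    same_as_en: entity -> linked English DBpedia entity (interlanguage link)
    english_types: English entity -> type in the English DBpedia
    """
    if entity in local_types:
        return local_types[entity]
    english = same_as_en.get(entity)
    if english is None:
        return None
    return english_types.get(english)


# Hypothetical identifiers, for illustration only.
LOCAL = {"nl:Entity_A": "dbo:City"}
SAME_AS_EN = {"nl:Entity_B": "en:Entity_B"}
ENGLISH = {"en:Entity_B": "dbo:Cheese"}
for e in ["nl:Entity_A", "nl:Entity_B", "nl:Entity_C"]:
    print(e, type_with_fallback(e, LOCAL, SAME_AS_EN, ENGLISH))
```

Preferring the local type first keeps language-specific typing where it exists. The English type only fills gaps.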
  • 21. References
    • Gillick, D., Lazic, N., Ganchev, K., Kirchner, J., Huynh, D.: Context-dependent fine-grained entity type tagging. arXiv (2014)
    • Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI (2012)
    • Yogatama, D., Gillick, D., Lazic, N.: Embedding methods for fine grained entity type classification. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), Short Papers, Beijing, China, 26–31 July 2015, pp. 291–296. Association for Computational Linguistics (2015)
  • 22. Image sources
    • Tree of Life: https://upload.wikimedia.org/wikipedia/commons/b/bc/Haeckel_arbol_bn.png
    • Paul Noonan (singer/songwriter): http://images.entertainment.ie/images_content/rectangle/620x372/paulnoonan.jpg
    • Paul Noonan (failure analysis engineer): http://www.iopireland.org/careers/life/page_49377.html
    • SPA-Viberti AS.42: https://upload.wikimedia.org/wikipedia/commons/thumb/2/23/AS42-1.gif/250px-AS42-1.gif
    • Abdullah the Butcher: http://i.ebayimg.com/thumbs/images/g/H1kAAOSwbopZPupi/s-l200.jpg