Evaluating Entity Linking: An Analysis of Current
Benchmark Datasets and a Roadmap for Doing a
Better Job
Marieke van Erp, Pablo Mendes, Heiko Paulheim, Filip Ilievski, Julien Plu, Giuseppe
Rizzo and Joerg Waitelonis
https://github.com/dbpedia-spotlight/evaluation-datasets
Take home message
• Existing entity linking datasets:
  • are not interoperable
  • do not cover many different domains
  • skew towards popular and frequent entities
• We need to:
  • Document & Standardise
  • Diversify to cover different domains and the long tail
Why
• Named entity linking approaches achieve F1 scores of ~.80
on various benchmark datasets
• Are we really testing our approaches on all aspects of the
entity linking task?
It’s not just us: Maud Ehrmann, Damien Nouvel and Sophie Rosset. Named Entity Resources - Overview and Outlook. LREC 2016
This work
• Analysis of 7 entity linking benchmark datasets
• Dataset characteristics (document type, domain, license, etc.)
• Entity, surface form & mention characterisation (overlap between datasets, confusability, prominence, dominance, types, etc.)
• Annotation characteristics (nested entities, redundancy, inter-annotator agreement, offsets)
+ Roadmap: how can we do better
Datasets
Dataset         Type           Domain   Doc length  Format   Encoding  License
AIDA-YAGO2      news           general  medium      TSV      ASCII     Agreement
NEEL 2014/2015  tweets         general  short       TSV      ASCII     Open
OKE2015         encyclopaedia  general  long        NIF/RDF  UTF8      Open
RSS-500         news           general  medium      NIF/RDF  UTF8      Open
WES2015         blog           science  long        NIF/RDF  UTF8      Open
WikiNews        news           general  medium      XML      UTF8      Open
Entity Overlap
• Percentage of the entities in one dataset (rows; number of entities in parentheses) that are also present in each other dataset (columns)
                    AIDA-YAGO2  NEEL2014  NEEL2015  OKE2015  RSS500  WES2015  Wikinews
AIDA-YAGO2 (5,596)           -     5.87%     8.06%    0.00%   1.26%    4.80%     1.16%
NEEL2014 (2,380)        13.73%         -    68.49%    2.39%   2.56%   12.35%     2.82%
NEEL2015 (2,800)        16.11%    58.21%         -    2.00%   2.54%    7.93%     2.57%
OKE2015 (531)            0.00%    10.73%    10.55%        -   2.44%   28.06%     3.95%
RSS500 (849)             8.24%     7.18%     8.36%    1.53%       -    3.18%     1.88%
WES2015 (7,309)          3.68%     4.02%     3.04%    2.04%   0.16%        -     0.66%
Wikinews (279)          23.30%    24.01%    25.81%    7.53%   5.73%   17.20%         -
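A minimal sketch of how such an overlap table could be computed, assuming each dataset has already been reduced to a set of entity URIs. The function names and the toy data are illustrative, not taken from the repository scripts. Each percentage is relative to the size of the row dataset, which is why the table is not symmetric.

```python
def overlap_percentage(source, target):
    """Percentage of the entities in `source` that also occur in `target`."""
    if not source:
        return 0.0
    return 100.0 * len(source & target) / len(source)


def overlap_matrix(datasets):
    """Pairwise overlap for a dict mapping dataset name -> set of entity URIs.

    The diagonal (a dataset compared with itself) is left out.
    """
    return {
        row: {
            col: overlap_percentage(entities, datasets[col])
            for col in datasets
            if col != row
        }
        for row, entities in datasets.items()
    }


# Toy data (hypothetical, not drawn from any of the benchmark datasets).
toy = {
    "A": {"dbr:Berlin", "dbr:Paris", "dbr:Rome", "dbr:Oslo"},
    "B": {"dbr:Berlin", "dbr:Paris"},
}
matrix = overlap_matrix(toy)
print(matrix["A"]["B"])  # half of A's entities occur in B
print(matrix["B"]["A"])  # all of B's entities occur in A
```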
Confusability
• The number of meanings a surface form (mention) can have
Corpus      Average  Min  Max     σ
AIDA-YAGO2     1.08    1   13  0.37
NEEL2014       1.02    1    3  0.16
NEEL2015       1.05    1    4  0.25
OKE2015        1.11    1   25  1.22
RSS500         1.02    1    3  0.16
WES2015        1.06    1    6  0.30
Wikinews       1.09    1   29  1.03
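A minimal sketch of how per-corpus confusability figures could be derived, assuming the annotations are available as (surface form, entity URI) pairs and that meanings are counted within the corpus itself. The function name and toy data are illustrative. The sketch also reports a population standard deviation; whether the authors used the population or the sample variant is not stated in the slides.

```python
from collections import defaultdict
from statistics import mean, pstdev


def confusability_stats(annotations):
    """Summarise how many distinct entities each surface form links to."""
    meanings = defaultdict(set)
    for surface_form, entity in annotations:
        meanings[surface_form].add(entity)
    counts = [len(entities) for entities in meanings.values()]
    return {
        "average": mean(counts),
        "min": min(counts),
        "max": max(counts),
        "stdev": pstdev(counts),  # population standard deviation (assumption)
    }


# Toy annotations (hypothetical): "Paris" has two meanings, the others one.
toy = [
    ("Paris", "dbr:Paris"),
    ("Paris", "dbr:Paris_Hilton"),
    ("Paris", "dbr:Paris"),
    ("Berlin", "dbr:Berlin"),
    ("Oslo", "dbr:Oslo"),
]
stats = confusability_stats(toy)
print(stats)
```

Surface forms are compared as exact strings here; case folding or other normalisation would change the counts.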
Dominance
Corpus      Dominance  Min  Max     σ
AIDA-YAGO2       0.98    1  452  0.08
NEEL2014         0.99    1   47  0.06
NEEL2015         0.98    1   88  0.09
OKE2015          0.98    1    1  0.11
RSS500           0.99    1    1  0.07
WES2015          0.97    1    1  0.12
Wikinews         0.99    1   72  0.09
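The slides give no definition of dominance. The sketch below assumes a common reading: for each surface form, the share of its mentions that refer to its most frequent entity, estimated from the corpus annotations. The definition, the function name and the toy data are all assumptions for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict


def dominance_per_surface_form(annotations):
    """Share of each surface form's mentions taken by its most frequent entity."""
    counts = defaultdict(Counter)
    for surface_form, entity in annotations:
        counts[surface_form][entity] += 1
    return {
        surface_form: max(entity_counts.values()) / sum(entity_counts.values())
        for surface_form, entity_counts in counts.items()
    }


# Toy annotations (hypothetical): 3 of 4 "Paris" mentions refer to the city.
toy = (
    [("Paris", "dbr:Paris")] * 3
    + [("Paris", "dbr:Paris_Hilton")]
    + [("Berlin", "dbr:Berlin")] * 2
)
dominance = dominance_per_surface_form(toy)
print(dominance)
```

Under this reading, values close to 1 mean that a surface form almost always refers to the same entity within the corpus.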
Entity Types
Entity Prominence
DBpedia PageRank datasets: http://people.aifb.kit.edu/ath/
How can we do better?
• Document your dataset!
• Use a standardised format
• Diversify both in domains and in entity distribution
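One way to picture the interoperability point is a shared in-memory record that every dataset reader maps into, so that the same analysis code runs regardless of the source format. The sketch below is only an illustration: the `Mention` fields and the five-column TSV layout are hypothetical and do not correspond to the actual layout of any dataset listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    doc_id: str
    start: int          # character offset, inclusive
    end: int            # character offset, exclusive
    surface_form: str
    entity_uri: str


def from_tsv_row(row):
    """Parse one row of a hypothetical five-column TSV annotation file."""
    doc_id, start, end, surface_form, entity_uri = row.rstrip("\n").split("\t")
    return Mention(doc_id, int(start), int(end), surface_form, entity_uri)


def offsets_match(text, mention):
    """Check that the offsets really select the annotated surface form."""
    return text[mention.start:mention.end] == mention.surface_form


text = "Berlin is the capital of Germany."
mention = from_tsv_row("doc1\t0\t6\tBerlin\thttp://dbpedia.org/resource/Berlin")
print(mention, offsets_match(text, mention))
```

A check like `offsets_match` is also a cheap way to catch encoding or offset-convention mismatches when converting between formats.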
Work in Progress & Future work
• Analyse more datasets
• Evaluate the temporal dimension of datasets (current work
by Filip Ilievski & Marten Postma)
• Integrate and build better datasets
Want to help?
Scripts and data used here can be found at: https://github.com/dbpedia-spotlight/evaluation-datasets/
Contact marieke.van.erp@vu.nl if you want to collaborate
Shameless Advertising
NLP&DBpedia 2016
Workshop at ISWC2016
Submission deadline: 1 July
https://nlpdbpedia2016.wordpress.com/

More Related Content

Similar to Evaluating entity linking an analysis of current benchmark datasets and a roadmap for doing a better job (3)

tidycf: Turning cashflows on their sides to turn analysis on its head
tidycf: Turning cashflows on their sides to turn analysis on its headtidycf: Turning cashflows on their sides to turn analysis on its head
tidycf: Turning cashflows on their sides to turn analysis on its headEmily Riederer
 
Pay-for-Performance and Distributional Effects in Tanzania: A Supply-side Ass...
Pay-for-Performance and Distributional Effects in Tanzania: A Supply-side Ass...Pay-for-Performance and Distributional Effects in Tanzania: A Supply-side Ass...
Pay-for-Performance and Distributional Effects in Tanzania: A Supply-side Ass...resyst
 
Evaluasi Sebagai Dasar Perencanaan
Evaluasi Sebagai Dasar PerencanaanEvaluasi Sebagai Dasar Perencanaan
Evaluasi Sebagai Dasar PerencanaanSiti Sahati
 
柏瑞週報 20200214
柏瑞週報 20200214柏瑞週報 20200214
柏瑞週報 20200214Pinebridge
 
Berkeley Shambhala Financial Overview Q2 2015
Berkeley Shambhala Financial Overview Q2 2015Berkeley Shambhala Financial Overview Q2 2015
Berkeley Shambhala Financial Overview Q2 2015BuckDina
 
BSC Financial Overview Q2 2015
BSC Financial Overview Q2 2015BSC Financial Overview Q2 2015
BSC Financial Overview Q2 2015Gypsychick
 
Forecasting enterprenuership 2311
Forecasting enterprenuership 2311Forecasting enterprenuership 2311
Forecasting enterprenuership 2311sainath balasani
 
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...Christopher Brown
 
Frag Flow: Automated Fragment Detection in Scientific Workflows
Frag Flow: Automated Fragment Detection in Scientific WorkflowsFrag Flow: Automated Fragment Detection in Scientific Workflows
Frag Flow: Automated Fragment Detection in Scientific Workflowsdgarijo
 
柏瑞週報 20200605
柏瑞週報 20200605柏瑞週報 20200605
柏瑞週報 20200605Pinebridge
 
柏瑞週報 20200424
柏瑞週報 20200424柏瑞週報 20200424
柏瑞週報 20200424Pinebridge
 
Creating a Big data Strategy with Tactics for Quick Implementation
Creating a Big data Strategy with Tactics for Quick ImplementationCreating a Big data Strategy with Tactics for Quick Implementation
Creating a Big data Strategy with Tactics for Quick ImplementationLewandog, Inc,
 
Monitoring & evaluating the usage of your Open Access Journal
Monitoring & evaluating the usage of your Open Access JournalMonitoring & evaluating the usage of your Open Access Journal
Monitoring & evaluating the usage of your Open Access JournalIna Smith
 
柏瑞週報 20200508
柏瑞週報 20200508柏瑞週報 20200508
柏瑞週報 20200508Pinebridge
 
Smart Beta Strategies for Global REITs Presentation ARES 2015
Smart Beta Strategies for Global REITs  Presentation ARES 2015Smart Beta Strategies for Global REITs  Presentation ARES 2015
Smart Beta Strategies for Global REITs Presentation ARES 2015Consiliacapital
 
Portfolio mean and variance analysis
Portfolio mean and variance analysisPortfolio mean and variance analysis
Portfolio mean and variance analysisPrashant S. Keswani
 
柏瑞週報 20200227
柏瑞週報 20200227柏瑞週報 20200227
柏瑞週報 20200227Pinebridge
 

Similar to Evaluating entity linking an analysis of current benchmark datasets and a roadmap for doing a better job (3) (20)

tidycf: Turning cashflows on their sides to turn analysis on its head
tidycf: Turning cashflows on their sides to turn analysis on its headtidycf: Turning cashflows on their sides to turn analysis on its head
tidycf: Turning cashflows on their sides to turn analysis on its head
 
Pay-for-Performance and Distributional Effects in Tanzania: A Supply-side Ass...
Pay-for-Performance and Distributional Effects in Tanzania: A Supply-side Ass...Pay-for-Performance and Distributional Effects in Tanzania: A Supply-side Ass...
Pay-for-Performance and Distributional Effects in Tanzania: A Supply-side Ass...
 
Evaluasi Sebagai Dasar Perencanaan
Evaluasi Sebagai Dasar PerencanaanEvaluasi Sebagai Dasar Perencanaan
Evaluasi Sebagai Dasar Perencanaan
 
柏瑞週報 20200214
柏瑞週報 20200214柏瑞週報 20200214
柏瑞週報 20200214
 
Berkeley Shambhala Financial Overview Q2 2015
Berkeley Shambhala Financial Overview Q2 2015Berkeley Shambhala Financial Overview Q2 2015
Berkeley Shambhala Financial Overview Q2 2015
 
BSC Financial Overview Q2 2015
BSC Financial Overview Q2 2015BSC Financial Overview Q2 2015
BSC Financial Overview Q2 2015
 
Forecasting enterprenuership 2311
Forecasting enterprenuership 2311Forecasting enterprenuership 2311
Forecasting enterprenuership 2311
 
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
 
Frag Flow: Automated Fragment Detection in Scientific Workflows
Frag Flow: Automated Fragment Detection in Scientific WorkflowsFrag Flow: Automated Fragment Detection in Scientific Workflows
Frag Flow: Automated Fragment Detection in Scientific Workflows
 
Cassidy Sugimoto - Open Access Mandates: Compliance by funders
Cassidy Sugimoto - Open Access Mandates: Compliance by fundersCassidy Sugimoto - Open Access Mandates: Compliance by funders
Cassidy Sugimoto - Open Access Mandates: Compliance by funders
 
柏瑞週報 20200605
柏瑞週報 20200605柏瑞週報 20200605
柏瑞週報 20200605
 
柏瑞週報 20200424
柏瑞週報 20200424柏瑞週報 20200424
柏瑞週報 20200424
 
1Q07 Results
1Q07 Results1Q07 Results
1Q07 Results
 
Creating a Big data Strategy with Tactics for Quick Implementation
Creating a Big data Strategy with Tactics for Quick ImplementationCreating a Big data Strategy with Tactics for Quick Implementation
Creating a Big data Strategy with Tactics for Quick Implementation
 
Monitoring & evaluating the usage of your Open Access Journal
Monitoring & evaluating the usage of your Open Access JournalMonitoring & evaluating the usage of your Open Access Journal
Monitoring & evaluating the usage of your Open Access Journal
 
柏瑞週報 20200508
柏瑞週報 20200508柏瑞週報 20200508
柏瑞週報 20200508
 
SCM PROJECT
SCM PROJECTSCM PROJECT
SCM PROJECT
 
Smart Beta Strategies for Global REITs Presentation ARES 2015
Smart Beta Strategies for Global REITs  Presentation ARES 2015Smart Beta Strategies for Global REITs  Presentation ARES 2015
Smart Beta Strategies for Global REITs Presentation ARES 2015
 
Portfolio mean and variance analysis
Portfolio mean and variance analysisPortfolio mean and variance analysis
Portfolio mean and variance analysis
 
柏瑞週報 20200227
柏瑞週報 20200227柏瑞週報 20200227
柏瑞週報 20200227
 

More from Marieke van Erp

Towards Culturally Aware AI Systems - TSDH Symposium
Towards Culturally Aware AI Systems - TSDH SymposiumTowards Culturally Aware AI Systems - TSDH Symposium
Towards Culturally Aware AI Systems - TSDH SymposiumMarieke van Erp
 
A Polyvocal and Contextualised Semantic Web
A Polyvocal and Contextualised Semantic WebA Polyvocal and Contextualised Semantic Web
A Polyvocal and Contextualised Semantic WebMarieke van Erp
 
AI x Digital Humanities = > Inclusiviteit
AI x Digital Humanities = > Inclusiviteit AI x Digital Humanities = > Inclusiviteit
AI x Digital Humanities = > Inclusiviteit Marieke van Erp
 
Computationally Tracing Concepts Through Time and Space
Computationally Tracing Concepts Through Time and SpaceComputationally Tracing Concepts Through Time and Space
Computationally Tracing Concepts Through Time and SpaceMarieke van Erp
 
The Hitchhiker's Guide to the Future of Digital Humanities
The Hitchhiker's Guide to the Future of Digital HumanitiesThe Hitchhiker's Guide to the Future of Digital Humanities
The Hitchhiker's Guide to the Future of Digital HumanitiesMarieke van Erp
 
Why language technology can’t handle Game of Thrones (yet)
Why language technology can’t handle Game of Thrones (yet)Why language technology can’t handle Game of Thrones (yet)
Why language technology can’t handle Game of Thrones (yet)Marieke van Erp
 
(Beyond) Combining Text and Tables for qualitative and quantitative research
(Beyond) Combining Text and Tables for qualitative and quantitative research (Beyond) Combining Text and Tables for qualitative and quantitative research
(Beyond) Combining Text and Tables for qualitative and quantitative research Marieke van Erp
 
Finding common ground between text, maps, and tables for quantitative and qua...
Finding common ground between text, maps, and tables for quantitative and qua...Finding common ground between text, maps, and tables for quantitative and qua...
Finding common ground between text, maps, and tables for quantitative and qua...Marieke van Erp
 
Slicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology ResearchSlicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology ResearchMarieke van Erp
 
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...Marieke van Erp
 
Good Lynx, bad Lynx: Document enrichment for historical ecologists
Good Lynx, bad Lynx: Document enrichment for historical ecologistsGood Lynx, bad Lynx: Document enrichment for historical ecologists
Good Lynx, bad Lynx: Document enrichment for historical ecologistsMarieke van Erp
 
Towards Semantic Enrichment of Newspapers: a historical ecology use case
Towards Semantic Enrichment of Newspapers: a historical ecology use case Towards Semantic Enrichment of Newspapers: a historical ecology use case
Towards Semantic Enrichment of Newspapers: a historical ecology use case Marieke van Erp
 
Natural Language Processing en Named Entity Recognition
Natural Language Processing en Named Entity Recognition Natural Language Processing en Named Entity Recognition
Natural Language Processing en Named Entity Recognition Marieke van Erp
 
HuC lecture - Digital and Humanities: Continuing the Conversation
HuC lecture - Digital and Humanities: Continuing the ConversationHuC lecture - Digital and Humanities: Continuing the Conversation
HuC lecture - Digital and Humanities: Continuing the ConversationMarieke van Erp
 
Multilingual Fine-grained Entity Typing
Multilingual Fine-grained Entity Typing Multilingual Fine-grained Entity Typing
Multilingual Fine-grained Entity Typing Marieke van Erp
 
Entity Typing Using Distributional Semantics and DBpedia
Entity Typing Using Distributional Semantics and DBpedia Entity Typing Using Distributional Semantics and DBpedia
Entity Typing Using Distributional Semantics and DBpedia Marieke van Erp
 
Entity Typing and Event Extraction
Entity Typing and Event Extraction Entity Typing and Event Extraction
Entity Typing and Event Extraction Marieke van Erp
 
Finding Stories in 1,784,532 Events: Scaling up computational models of narr...
Finding Stories in 1,784,532 Events:  Scaling up computational models of narr...Finding Stories in 1,784,532 Events:  Scaling up computational models of narr...
Finding Stories in 1,784,532 Events: Scaling up computational models of narr...Marieke van Erp
 
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsEvaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsMarieke van Erp
 
Orientation EBC 2013: Digitising Natural History
Orientation EBC 2013: Digitising Natural HistoryOrientation EBC 2013: Digitising Natural History
Orientation EBC 2013: Digitising Natural HistoryMarieke van Erp
 

More from Marieke van Erp (20)

Towards Culturally Aware AI Systems - TSDH Symposium
Towards Culturally Aware AI Systems - TSDH SymposiumTowards Culturally Aware AI Systems - TSDH Symposium
Towards Culturally Aware AI Systems - TSDH Symposium
 
A Polyvocal and Contextualised Semantic Web
A Polyvocal and Contextualised Semantic WebA Polyvocal and Contextualised Semantic Web
A Polyvocal and Contextualised Semantic Web
 
AI x Digital Humanities = > Inclusiviteit
AI x Digital Humanities = > Inclusiviteit AI x Digital Humanities = > Inclusiviteit
AI x Digital Humanities = > Inclusiviteit
 
Computationally Tracing Concepts Through Time and Space
Computationally Tracing Concepts Through Time and SpaceComputationally Tracing Concepts Through Time and Space
Computationally Tracing Concepts Through Time and Space
 
The Hitchhiker's Guide to the Future of Digital Humanities
The Hitchhiker's Guide to the Future of Digital HumanitiesThe Hitchhiker's Guide to the Future of Digital Humanities
The Hitchhiker's Guide to the Future of Digital Humanities
 
Why language technology can’t handle Game of Thrones (yet)
Why language technology can’t handle Game of Thrones (yet)Why language technology can’t handle Game of Thrones (yet)
Why language technology can’t handle Game of Thrones (yet)
 
(Beyond) Combining Text and Tables for qualitative and quantitative research
(Beyond) Combining Text and Tables for qualitative and quantitative research (Beyond) Combining Text and Tables for qualitative and quantitative research
(Beyond) Combining Text and Tables for qualitative and quantitative research
 
Finding common ground between text, maps, and tables for quantitative and qua...
Finding common ground between text, maps, and tables for quantitative and qua...Finding common ground between text, maps, and tables for quantitative and qua...
Finding common ground between text, maps, and tables for quantitative and qua...
 
Slicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology ResearchSlicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology Research
 
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...
 
Good Lynx, bad Lynx: Document enrichment for historical ecologists
Good Lynx, bad Lynx: Document enrichment for historical ecologistsGood Lynx, bad Lynx: Document enrichment for historical ecologists
Good Lynx, bad Lynx: Document enrichment for historical ecologists
 
Towards Semantic Enrichment of Newspapers: a historical ecology use case
Towards Semantic Enrichment of Newspapers: a historical ecology use case Towards Semantic Enrichment of Newspapers: a historical ecology use case
Towards Semantic Enrichment of Newspapers: a historical ecology use case
 
Natural Language Processing en Named Entity Recognition
Natural Language Processing en Named Entity Recognition Natural Language Processing en Named Entity Recognition
Natural Language Processing en Named Entity Recognition
 
HuC lecture - Digital and Humanities: Continuing the Conversation
HuC lecture - Digital and Humanities: Continuing the ConversationHuC lecture - Digital and Humanities: Continuing the Conversation
HuC lecture - Digital and Humanities: Continuing the Conversation
 
Multilingual Fine-grained Entity Typing
Multilingual Fine-grained Entity Typing Multilingual Fine-grained Entity Typing
Multilingual Fine-grained Entity Typing
 
Entity Typing Using Distributional Semantics and DBpedia
Entity Typing Using Distributional Semantics and DBpedia Entity Typing Using Distributional Semantics and DBpedia
Entity Typing Using Distributional Semantics and DBpedia
 
Entity Typing and Event Extraction
Entity Typing and Event Extraction Entity Typing and Event Extraction
Entity Typing and Event Extraction
 
Finding Stories in 1,784,532 Events: Scaling up computational models of narr...
Finding Stories in 1,784,532 Events:  Scaling up computational models of narr...Finding Stories in 1,784,532 Events:  Scaling up computational models of narr...
Finding Stories in 1,784,532 Events: Scaling up computational models of narr...
 
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsEvaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
 
Orientation EBC 2013: Digitising Natural History
Orientation EBC 2013: Digitising Natural HistoryOrientation EBC 2013: Digitising Natural History
Orientation EBC 2013: Digitising Natural History
 

Recently uploaded

Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxSilpa
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxANSARKHAN96
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Silpa
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptxArvind Kumar
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.Silpa
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxSilpa
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Silpa
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.Silpa
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Silpa
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 

Recently uploaded (20)

Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptx
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptx
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 

Evaluating entity linking an analysis of current benchmark datasets and a roadmap for doing a better job (3)

  • 1. Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job Marieke van Erp, Pablo Mendes, Heiko Paulheim, Filip Ilievski, Julien Plu, Giuseppe Rizzo and Joerg Waitelonis https://github.com/dbpedia-spotlight/evaluation-datasets
  • 2. Take home message
      • Existing entity linking datasets:
          • are not interoperable
          • do not cover many different domains
          • skew towards popular and frequent entities
      • We need to:
          • Document & Standardise
          • Diversify to cover different domains and the long tail
  • 3. Why
      • Named entity linking approaches achieve F1 scores of ~.80 on various benchmark datasets
      • Are we really testing our approaches on all aspects of the entity linking task?
      It’s not just us: Maud Ehrmann, Damien Nouvel and Sophie Rosset. Named Entity Resources - Overview and Outlook. LREC 2016
  • 4. This work
      • Analysis of 7 entity linking benchmark datasets
      • Dataset characteristics (document type, domain, license, etc.)
      • Entity, surface form & mention characterisation (overlap between datasets, confusability, prominence, dominance, types, etc.)
      • Annotation characteristics (nested entities, redundancy, IAA, offsets)
      + Roadmap: how can we do better
  • 5. Entity Overlap
      • Number of entities present in one dataset that are also present in other datasets
          • AIDA-YAGO2 (5,596)
          • NEEL2014 (2,380)
          • NEEL2015 (2,800)
          • OKE2015 (531)
          • RSS500 (849)
          • WES2015 (7,309)
          • Wikinews (279)
  • 6. Datasets
      Dataset         Type           Domain   Doc length  Format   Encoding  License
      AIDA-YAGO2      news           general  medium      TSV      ASCII     Agreement
      NEEL 2014/2015  tweets         general  short       TSV      ASCII     Open
      OKE2015         encyclopaedia  general  long        NIF/RDF  UTF8      Open
      RSS-500         news           general  medium      NIF/RDF  UTF8      Open
      WES2015         blog           science  long        NIF/RDF  UTF8      Open
      WikiNews        news           general  medium      XML      UTF8      Open
  • 7. Entity Overlap
      • Number of entities present in one dataset that are also present in other datasets
      (rows: source dataset; columns: dataset compared against)
                          AIDA-YAGO2  NEEL2014    NEEL2015    OKE2015     RSS500      WES2015     Wikinews
      AIDA-YAGO2 (5,596)  -           5.87%       8.06%       0.00%       1.26%       4.80%       1.16%
      NEEL2014 (2,380)    13.73%      -           68.49%      2.39%       2.56%       12.35%      2.82%
      NEEL2015 (2,800)    16.11%      58.21%      -           2.00%       2.54%       7.93%       2.57%
      OKE2015 (531)       0.00%       10.73%      10.55%      -           2.44%       28.06%      3.95%
      RSS500 (849)        8.24%       7.18%       8.36%       1.53%       -           3.18%       1.88%
      WES2015 (7,309)     3.68%       4.02%       3.04%       2.04%       0.16%       -           0.66%
      Wikinews (279)      23.30%      24.01%      25.81%      7.53%       5.73%       17.20%      -
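The overlap percentages in the Entity Overlap table can be computed along the following lines. This is a minimal sketch in Python, assuming each dataset has already been reduced to a set of unique entity URIs. The function name `overlap_percentages`, the toy dataset names and the URIs are illustrative and are not taken from the authors' scripts.

```python
def overlap_percentages(datasets):
    """Return {(a, b): percentage of a's unique entities that also occur in b}.

    `datasets` maps a dataset name to a set of entity identifiers.
    The measure is asymmetric: the denominator is always the size of `a`.
    """
    result = {}
    for name_a, entities_a in datasets.items():
        for name_b, entities_b in datasets.items():
            if name_a == name_b or not entities_a:
                continue  # skip the diagonal and empty datasets
            shared = len(entities_a & entities_b)
            result[(name_a, name_b)] = 100.0 * shared / len(entities_a)
    return result


# Toy data (hypothetical, for illustration only)
toy = {
    "A": {"dbr:Paris", "dbr:Berlin", "dbr:Rome", "dbr:Madrid"},
    "B": {"dbr:Paris", "dbr:Berlin"},
}
overlaps = overlap_percentages(toy)
print(overlaps[("A", "B")])  # share of A's entities found in B
print(overlaps[("B", "A")])  # share of B's entities found in A
```

Because the denominator is the size of the row dataset, the two directions differ. This matches the table, where NEEL2014 against NEEL2015 gives 68.49% and the reverse gives 58.21%.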
  • 11. Confusability
      • The number of meanings a surface form (mention) can have
  • 12. Confusability
      Corpus      Average  Min  Max  σ
      AIDA-YAGO2  1.08     1    13   0.37
      NEEL2014    1.02     1    3    0.16
      NEEL2015    1.05     1    4    0.25
      OKE2015     1.11     1    25   1.22
      RSS500      1.02     1    3    0.16
      WES2015     1.06     1    6    0.30
      Wikinews    1.09     1    29   1.03
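The confusability statistics can be derived from a list of (surface form, entity) annotation pairs. The sketch below is in Python and rests on two assumptions. First, confusability is counted within a single corpus, as the number of distinct entities each surface form is linked to. Second, the last column of the table is a standard deviation, computed here as a population standard deviation. The function name and the toy annotations are illustrative.

```python
from collections import defaultdict
from statistics import mean, pstdev


def confusability_stats(annotations):
    """Summarise how many distinct entities each surface form links to.

    `annotations` is an iterable of (surface_form, entity) pairs.
    """
    meanings = defaultdict(set)
    for surface_form, entity in annotations:
        meanings[surface_form].add(entity)
    counts = [len(entities) for entities in meanings.values()]
    return {
        "average": mean(counts),
        "min": min(counts),
        "max": max(counts),
        "stdev": pstdev(counts),  # assumption: population standard deviation
    }


# Toy annotations (hypothetical): "Paris" has two meanings, the others one each
toy_annotations = [
    ("Paris", "dbr:Paris"),
    ("Paris", "dbr:Paris_Hilton"),
    ("Paris", "dbr:Paris"),
    ("Berlin", "dbr:Berlin"),
    ("Rome", "dbr:Rome"),
]
stats = confusability_stats(toy_annotations)
print(stats)
```

An average close to 1, as in every row of the table, means that almost all surface forms in a corpus point to a single entity, so a system is rarely forced to disambiguate.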
  • 13. Dominance
      Corpus      Dominance  Min  Max  σ
      AIDA-YAGO2  .98        1    452  0.08
      NEEL2014    .99        1    47   0.06
      NEEL2015    .98        1    88   0.09
      OKE2015     .98        1    1    0.11
      RSS500      .99        1    1    0.07
      WES2015     .97        1    1    0.12
      Wikinews    .99        1    72   0.09
  • 17. How can we do better?
      • Document your dataset!
      • Use a standardised format
      • Diversify both in domains and in entity distribution
  • 18. Work in Progress & Future work
      • Analyse more datasets
      • Evaluate the temporal dimension of datasets (current work by Filip Ilievski & Marten Postma)
      • Integrate and build better datasets
  • 19. Want to help?
      Scripts and data used here can be found at: https://github.com/dbpedia-spotlight/evaluation-datasets/
      Contact marieke.van.erp@vu.nl if you want to collaborate
  • 20. Shameless Advertising
      NLP&DBpedia 2016 Workshop at ISWC2016
      Submission deadline: 1 July
      https://nlpdbpedia2016.wordpress.com/