Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job
Marieke van Erp, Pablo Mendes, Heiko Paulheim, Filip Ilievski, Julien Plu, Giuseppe Rizzo and Joerg Waitelonis
Presented at LREC 2016: http://www.lrec-conf.org/proceedings/lrec2016/pdf/926_Paper.pdf
1. Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job
Marieke van Erp, Pablo Mendes, Heiko Paulheim, Filip Ilievski, Julien Plu, Giuseppe Rizzo and Joerg Waitelonis
https://github.com/dbpedia-spotlight/evaluation-datasets
2. Take home message
• Existing entity linking datasets:
  • are not interoperable
  • do not cover many different domains
  • skew towards popular and frequent entities
• We need to:
  • Document & Standardise
  • Diversify to cover different domains and the long tail
3. Why
• Named entity linking approaches achieve F1 scores of ~.80 on various benchmark datasets
• Are we really testing our approaches on all aspects of the entity linking task?
It's not just us: Maud Ehrmann, Damien Nouvel and Sophie Rosset. Named Entity Resources - Overview and Outlook. LREC 2016.
4. This work
• Analysis of 7 entity linking benchmark datasets
• Dataset characteristics (document type, domain, license, etc.)
• Entity, surface form & mention characterisation (overlap between datasets, confusability, prominence, dominance, types, etc.)
• Annotation characteristics (nested entities, redundancy, inter-annotator agreement (IAA), offsets)
+ Roadmap: how can we do better
5. Entity Overlap
• Number of entities present in one dataset that are also present in other datasets
• Distinct entities per dataset: AIDA-YAGO2 (5,596), NEEL2014 (2,380), NEEL2015 (2,800), OKE2015 (531), RSS500 (849), WES2015 (7,309), Wikinews (279)
6. Datasets

Dataset         Type           Domain   Doc length  Format   Encoding  License
AIDA-YAGO2      news           general  medium      TSV      ASCII     Agreement
2014/2015 NEEL  tweets         general  short       TSV      ASCII     Open
OKE2015         encyclopaedia  general  long        NIF/RDF  UTF-8     Open
RSS-500         news           general  medium      NIF/RDF  UTF-8     Open
WES2015         blog           science  long        NIF/RDF  UTF-8     Open
WikiNews        news           general  medium      XML      UTF-8     Open
7. Entity Overlap
• Number of entities present in one dataset that are also present in other datasets: each row shows the percentage of that dataset's entities that also occur in the column dataset

                    AIDA-YAGO2  NEEL2014  NEEL2015  OKE2015  RSS500  WES2015  Wikinews
AIDA-YAGO2 (5,596)      -         5.87%     8.06%    0.00%    1.26%    4.80%    1.16%
NEEL2014 (2,380)     13.73%       -        68.49%    2.39%    2.56%   12.35%    2.82%
NEEL2015 (2,800)     16.11%     58.21%      -        2.00%    2.54%    7.93%    2.57%
OKE2015 (531)         0.00%     10.73%    10.55%     -        2.44%   28.06%    3.95%
RSS500 (849)          8.24%      7.18%     8.36%    1.53%     -        3.18%    1.88%
WES2015 (7,309)       3.68%      4.02%     3.04%    2.04%    0.16%     -        0.66%
Wikinews (279)       23.30%     24.01%    25.81%    7.53%    5.73%   17.20%     -
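The matrix is asymmetric because the intersection is divided by the row dataset's size: AIDA-YAGO2 to NEEL2014 (5.87% of 5,596) and NEEL2014 to AIDA-YAGO2 (13.73% of 2,380) both correspond to roughly the same ~330 shared entities. A minimal sketch of the computation, using toy entity sets (in the real analysis the sets are read from the annotation files and normalised to the same knowledge base identifiers first):

```python
from itertools import permutations

# Toy example: each dataset reduced to its set of distinct gold entity URIs.
datasets = {
    "A": {"dbr:Berlin", "dbr:Paris", "dbr:IBM"},
    "B": {"dbr:Paris", "dbr:IBM", "dbr:Obama", "dbr:CERN"},
    "C": {"dbr:CERN"},
}

def overlap_pct(row_entities, col_entities):
    """Percentage of the row dataset's entities also present in the column dataset."""
    return 100.0 * len(row_entities & col_entities) / len(row_entities)

for a, b in permutations(datasets, 2):
    print(f"{a} -> {b}: {overlap_pct(datasets[a], datasets[b]):.2f}%")
```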
8.-10. Entity Overlap: same table as slide 7, repeated
11. Confusability
• The number of meanings a surface form (mention) can have; for example, the surface form "Paris" can refer to the city, the mythological figure, or Paris Hilton
12. Confusability

Corpus      Average  Min  Max  Std. dev.
AIDA-YAGO2   1.08     1    13    0.37
2014 NEEL    1.02     1     3    0.16
2015 NEEL    1.05     1     4    0.25
OKE2015      1.11     1    25    1.22
RSS500       1.02     1     3    0.16
WES2015      1.06     1     6    0.30
Wikinews     1.09     1    29    1.03
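Averages near 1 mean almost every surface form in these corpora maps to a single entity. Confusability can be recomputed from the gold annotations alone; a minimal sketch over toy (surface form, entity) pairs, reading the last column above as the standard deviation:

```python
from collections import defaultdict
from statistics import mean, pstdev

# Toy gold annotations as (surface_form, entity_uri) pairs.
annotations = [
    ("Paris", "dbr:Paris"), ("Paris", "dbr:Paris"),
    ("Paris", "dbr:Paris_Hilton"), ("Obama", "dbr:Barack_Obama"),
]

# Confusability of a surface form = number of distinct entities it is linked to.
senses = defaultdict(set)
for surface, entity in annotations:
    senses[surface].add(entity)

values = [len(entities) for entities in senses.values()]
print(f"avg={mean(values):.2f} min={min(values)} "
      f"max={max(values)} stdev={pstdev(values):.2f}")
```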
13. Dominance

Corpus      Dominance  Min  Max  Std. dev.
AIDA-YAGO2    .98       1   452    0.08
2014 NEEL     .99       1    47    0.06
2015 NEEL     .98       1    88    0.09
OKE2015       .98       1     1    0.11
RSS500        .99       1     1    0.07
WES2015       .97       1     1    0.12
Wikinews      .99       1    72    0.09
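The slides do not restate the definition; taking dominance as the share of a surface form's mentions that link to its most frequent entity (an assumption consistent with the near-1 averages above; see the paper for the precise definition), a sketch:

```python
from collections import Counter, defaultdict
from statistics import mean

# Toy gold annotations as (surface_form, entity_uri) pairs.
annotations = [
    ("Paris", "dbr:Paris"), ("Paris", "dbr:Paris"),
    ("Paris", "dbr:Paris_Hilton"), ("Obama", "dbr:Barack_Obama"),
]

counts_per_form = defaultdict(Counter)
for surface, entity in annotations:
    counts_per_form[surface][entity] += 1

# Dominance of a surface form: fraction of its mentions going to its top entity.
dominance = [counter.most_common(1)[0][1] / sum(counter.values())
             for counter in counts_per_form.values()]
print(f"avg dominance = {mean(dominance):.2f}")  # "Paris": 2/3, "Obama": 1/1
```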
14.-15. Entity Types (charts not included in the transcript)
16. Entity Prominence (chart not included in the transcript)
DBpedia PageRank datasets: http://people.aifb.kit.edu/ath/
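Prominence compares a dataset's entities against a global popularity signal such as the DBpedia PageRank scores linked above. A sketch, assuming the scores have been downloaded as a two-column TSV of entity URI and score (the actual dump layout on that page may differ):

```python
import csv

def load_pagerank(path):
    # Assumed layout: one "entity_uri<TAB>score" pair per line.
    with open(path, encoding="utf-8") as f:
        return {uri: float(score) for uri, score in csv.reader(f, delimiter="\t")}

pagerank = load_pagerank("dbpedia_pagerank.tsv")  # hypothetical local file name

# Prominence profile of a dataset: PageRank score of each distinct gold entity,
# defaulting to 0.0 for long-tail entities absent from the dump.
dataset_entities = {"http://dbpedia.org/resource/Berlin",
                    "http://dbpedia.org/resource/Paris"}
scores = sorted((pagerank.get(e, 0.0) for e in dataset_entities), reverse=True)
print(scores)  # a skew towards high scores = dataset favours prominent entities
```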
17. How can we do better?
• Document your dataset!
• Use a standardised format (see the NIF sketch below)
• Diversify both in domains and in entity distribution
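NIF/RDF, already used by OKE2015, RSS-500 and WES2015 (see the Datasets table), is one such standardised format: every mention carries explicit character offsets and a knowledge-base link. A sketch with rdflib of what a single annotation can look like; the property names follow NIF Core, but treat the exact modelling as illustrative:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

g = Graph()
g.bind("nif", NIF)
g.bind("itsrdf", ITSRDF)

text = "Berlin is the capital of Germany."
doc = URIRef("http://example.org/doc1#char=0,33")  # hypothetical document URI
g.add((doc, RDF.type, NIF.Context))
g.add((doc, NIF.isString, Literal(text)))

# One mention: "Berlin" at character offsets [0, 6), linked to DBpedia.
mention = URIRef("http://example.org/doc1#char=0,6")
g.add((mention, RDF.type, NIF.Phrase))
g.add((mention, NIF.referenceContext, doc))
g.add((mention, NIF.anchorOf, Literal("Berlin")))
g.add((mention, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
g.add((mention, NIF.endIndex, Literal(6, datatype=XSD.nonNegativeInteger)))
g.add((mention, ITSRDF.taIdentRef, URIRef("http://dbpedia.org/resource/Berlin")))

print(g.serialize(format="turtle"))
```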
18. Work in Progress & Future Work
• Analyse more datasets
• Evaluate the temporal dimension of datasets (current work by Filip Ilievski & Marten Postma)
• Integrate and build better datasets
19. Want to help?
Scripts and data used here can be found at: https://github.com/dbpedia-spotlight/evaluation-datasets/
Contact marieke.van.erp@vu.nl if you want to collaborate
20. Shameless Advertising
NLP&DBpedia 2016, workshop at ISWC2016
Submission deadline: 1 July
https://nlpdbpedia2016.wordpress.com/
21. Acknowledgements
https://github.com/dbpedia-spotlight/evaluation-datasets/
