Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Session 1.2 improving access to digital content by semantic enrichment

86 views

Published on

Talk at SEMANTiCS 2017
www.semantics.cc

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Session 1.2 improving access to digital content by semantic enrichment

  1. 1. Improving access to digital collections by semantic enrichment Theo van Veen and Juliette Lonij, Semantics 2017, 12-09-2017
  2. 2. Overview • Motivation and approach • Entity linking • Presentation • Semantic search • User feedback • Wikidata as thesaurus • Conclusions and next steps T. v. Veen and J. Lonij, Semantics 2017
  3. 3. Motivation and approach T. v. Veen and J. Lonij, Semantics 2017
  4. 4. Motivation • Improving discovery and usability requires intelligent connection of content to the outside world. • Content contains “knowledge” requiring intelligent preprocessing to be found. • Knowledge should be offered to the user more or less unsolicited • Our software should have read, analyzed and enriched our content completely prior to the user!! T. v. Veen and J. Lonij, Semantics 2017
  5. 5. Enrichment: purpose and approach • making content better findable and usable, especially newspaper articles • by enriching text and names in the text with links to related information • which is in most cases linked data (links to Wikidata, Polygoon news reels, images) • and which enables advanced queries and presentation of context information T. v. Veen and J. Lonij, Semantics 2017
  6. 6. How? • “Things” in text have to be uniquely identified • When identifiers link to resource descriptions it is possible to present context information about “things” • Relevant context information can be indexed as part of a “thing” and so it can be searched for • Using properties in external resource descriptions enable semantic search 1. Identification 2. Context 3. Indexing 4. Semantic search T. v. Veen and J. Lonij, Semantics 2017
  7. 7. Enrichment types • Newspaper articles and radio bulletins linked to Polygoon newsreels • Named entities linked to DBpedia (and Wikidata, VIAF etc.) • Place-street combinations in newspaper articles linked to latitude and longitude • Newspaper articles linked to images from the Memory of the Netherlands Named entities Geodata Links Extracted features User annotation Image enrichment DBpedia Street, place, latt., long. Web pages Classification Tags Face recognition Wikidata Place, latt., long. Video Sentiment Stories (oral history) Emotion detection VIAF Images Relevance Object detection Geonames Sound Interestingness Structure detection Etc. Now available T. v. Veen and J. Lonij, Semantics 2017
  8. 8. https://www.wikidata.org/wiki/Q44306 WIKIDATA http://viaf.org/viaf/29540187/ VIAF http://www.isni.org/isni/0000000122778487 ISNI http://data.kb.nl/thesaurus/068941056 KB thesaurus • For bibliographic data trusted links are available: from thesaurus to VIAF, from VIAF to ISNI and from ISNI to Wikidata • For persons, events and locations in text the links have to be created T. v. Veen and J. Lonij, Semantics 2017 We need:
  9. 9. How do we deal with names in text? • Recognize names (named entity recognition) • Identify names by searching them in DBpedia and link the names to the DBpedia descriptions • Those names are ambiguous: does Einstein link to Albert Einstein or Alfred Einstein? We need disambiguation algorithms. • The accuracy of the links will be improved by machine learning techniques; conventional “if then else” software isn’t fit for this job • We need user feedback to correct false links or add missing links and this will be used for additional training T. v. Veen and J. Lonij, Semantics 2017
  10. 10. Entity linking T. v. Veen and J. Lonij, Semantics 2017
  11. 11. Named Entity Linking DBpediaSolr Index DBpedia Search entity Named Entity recognition List with Einsteins Enrichment database Enrichment and training process article Get entities Store article id + resource ids Find the best candidate VIAF Wikidata Etc. T. v. Veen and J. Lonij, Semantics 2017
  12. 12. Enrichment process articles Enrichment database KB research environment SURFsara HPC environment DBpedia index Enrichment infrastructure using SURFsara HPC cloud disambiguation Named entity recognition search T. v. Veen and J. Lonij, Semantics 2017
  13. 13. Continuous improvement of enrichment algorithm article number / time 80 1 108 mlj • All DBpedia titles searched in news articles • Named Entities searched in DBpedia • Speedup by using HPC cloud SURFsara • Using context and machine learning Quality/confidence(%) 70 T. v. Veen and J. Lonij, Semantics 2017 90 At the end cycle to first article and overwrite earlier enrichments with newest algorithm
  14. 14. algorithm accuracy link recall link precision link F-measure Rule based .76 .76 .65 .70 Machine learning (SVM) .84 .76 .83 .79 Neural network .84 .73 .87 .79 Extra features e.g. word embedding .85 .81 .82 .82 Extra Wikidata data, more training data .87 .81 .86 .84 Entity embedding .88 .86 .85 .85 From conventional entity linking to deep learning and beyond T. v. Veen and J. Lonij, Semantics 2017
  15. 15. Development cycle Justification: Our aim is obtaining a higher quality than existing entity linking software (e.g. DBpedia Spotlight) Trust/quality Stored LNE’s Running algorithm Algorithm in development Enriched by users Target trust level T. v. Veen and J. Lonij, Semantics 2017 train replace improve Example of comparison of stored LNE’s, result of current algorithm, result of algorithm under development and existing software.
  16. 16. Presentation T. v. Veen and J. Lonij, Semantics 2017
  17. 17. Naam en/of datum
  18. 18. Naam en/of datum • Theo van Veen, 16-6-2016
  19. 19. Research portal Identified names (LNE) and other enrichments Configurable services Context information and extra navigation options for a name
  20. 20. Semantic search T. v. Veen and J. Lonij, Semantics 2017
  21. 21. Sematic search: index resource identifiers Newspaper index Text + Viaf id + Wikidata id etc. Enrichment database Indexing Get text for article X Get enrichments for article X search articles with wikidata id’s Wikidata Semantic search (SPARQL) providing wikidata id’s T. v. Veen and J. Lonij, Semantics 2017
  22. 22. Articles mentioning members of parliament not born in the Netherlands SELECT ?p WHERE { ?p wdt:P39 wd:Q18887908 . ?p wdt:P19 ?place . ?place wdt:P17 ?country . FILTER NOT EXISTS { ?place wdt:P17 wd:Q55 . } }
  23. 23. For the same query in the catalogue the Wikidata identifier is converted to the local thesaurus identifier
  24. 24. • Semantic query between [ ], in this case expand to all Roman Emperors • Select “newspaper+” collection • Select a result • Click on a linked named entity for more information • Click on “More info” for properties of this entity • Click on a property for searching more articles about resources with that property • And see the result: all articles mentioning persons that have been married to Elizabeth Taylor Navigation example Using square brackets the software tries a few Wikidata SPARQL queries and replaces this string by the Wikidata results.
  25. 25. • Semantic query between [ ], in this case expand to all Roman Emperors • Select “newspaper+” collection • Select a result • Click on a linked named entity for more information • Click on “More info” for properties of this entity • Click on a property for searching more articles about resources with that property • And see the result: all articles mentioning persons that have been married to Elizabeth Taylor Navigation example
  26. 26. • Semantic query between [ ], in this case expand to all Roman Emperors • Select “newspaper+” collection • Select a result • Click on a linked named entity for more information • Click on “More info” for properties of this entity • Click on a property for searching more articles about resources with that property • And see the result: all articles mentioning persons that have been married to Elizabeth Taylor Navigation example
  27. 27. • Semantic query between [ ], in this case expand to all Roman Emperors • Select “newspaper+” collection • Select a result • Click on a linked named entity for more information • Click on “More info” for properties of this entity • Click on a property for searching more articles about resources with that property • And see the result: all articles mentioning persons that have been married to Elizabeth Taylor Navigation example
  28. 28. • Semantic query between [ ], in this case expand to all Roman Emperors • Select “newspaper+” collection • Select a result • Click on a linked named entity for more information • Click on “More info” for properties of this entity • Click on a property for searching more articles about resources with that property • And see the result: all articles mentioning persons that have been married to Elizabeth Taylor Navigation example
  29. 29. spouse=Elizabeth Taylor • Semantic query between [ ], in this case expand to all Roman Emperors • Select “newspaper+” collection • Select a result • Click on a linked named entity for more information • Click on “More info” for properties of this entity • Click on a property for searching more articles about resources with that property • And see the result: all articles mentioning persons that have been married to Elizabeth Taylor Navigation example
  30. 30. User feedback T. v. Veen and J. Lonij, Semantics 2017
  31. 31. User feedback is needed for correcting false links and adding new links!
  32. 32. This feedback serves as additional training data for the disambiguation software
  33. 33. Wikidata as thesaurus T. v. Veen and J. Lonij, Semantics 2017
  34. 34. Wikidata as central hub? W W “Everything links to everything” “Everything links to Wikidata”
  35. 35. Current situation: many to many links (many identifiers for single resource) Proposed: everything links to Wikidata (same identifier for single resource) Wikidata as universal thesaurus for libraries T. v. Veen and J. Lonij, Semantics 2017
  36. 36. Conclusions and next steps T. v. Veen and J. Lonij, Semantics 2017
  37. 37. Conclusions and next steps • Entity linking combining machine learning and domain knowledge is promising and we still have ideas for improvements • We have shown the added value of linking named entities to Wikidata/DBpedia: it improves findability and usability of content as demonstrated with the research portal • Our aim is to increase the confidence of links so users can trust them “enough” for using them for searching and research • User feedback provides additional training data and needs to be deployed on a larger scale T. v. Veen and J. Lonij, Semantics 2017
  38. 38. Any questions? theo.vanveen@kb.nl juliette.lonij@kb.nl http://www.kbresearch.nl/xportal

×