Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Session 1.4 a distributed network of heritage information

95 views

Published on

Talk at SEMANTiCS 2017
www.semantics.cc

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Session 1.4 a distributed network of heritage information

  1. 1. A distributed network of digital heritage information Enno Meijers Semantics Conference – Amsterdam – 12 September 2017
  2. 2. Contents • Introduction to Dutch Digital Heritage Network • Problems with the current infrastructure • Strategies for improvement • Building the distributed network
  3. 3. Introduction to Dutch Digital Heritage Network
  4. 4. Digital Heritage Network (NDE) aims at increasing the social value of the heritage information maintained by libraries, archives, museums and other cultural heritage institutions. Long term cooperation between the government and the institutions on national, regional and local level. It’s about information and people!
  5. 5. Three-layered approach for improving the sustainability, the usability and the visibility of digital heritage information. sustainable usable visible
  6. 6. In general - discovery of the “deep web” • Institutional repositories, collection management systems • Millions of ‘invisible’ datasets: publications, research data, heritage collections • Poor coverage by regular search engines • Metadata is key, describing physical materials or (licensed) digital content • Dutch cultural heritage sector: 1500 institutions, >>1500 collections • Demand for cross-institutional, cross-domain discovery • Many specialized portals giving access to different views
  7. 7. General setup of these portals
  8. 8. And even networks of aggregators...
  9. 9. Evaluating current infrastructure
  10. 10. Evaluating current approach Positive results so far: • many data sources available through OAI-PMH protocol • powerful and smart protocol for metadata synchronization • opened up data silos • created the need for aligning data models • made cross-collection and cross-domain discovery possible (e.g. Europeana) But there are two problems areas: • semantic alignment • data integration
  11. 11. Problem #1: Poor semantic alignment Not enough semantic alignment in the data sources: • lack sustainable URIs and shared identifiers • no shared terminology sources available • no provisions for linking between sources • implementations lack support for multiple data models • data is ‘flattened’ to a common data model (EDM, Dublin Core) • loss of meaning due to transformation  poor capabilities for cross-collection, cross-domain discovery  cleaning, aligning and enriching is needed after harvesting
  12. 12. Problem #2: Inefficient data integration Physical data integration based on OAI–PMH (= copying) • synchronizing with the sources is hard work • ownership, licensing, provenance, control over access are difficult topics • no feedback loop to the data source (usage, cleaning, enrichments) • data source owner and end user are disconnected • centralized model leads to scalability problems • OAI-PMH is not a web-centric protocol See also: Miel Vander Sande et al. , Towards sustainable publishing and querying of distributed Linked Data archives - Journal of Documentation (2017) Herbert Van de Sompel - Reminiscing About 15 Years of Interoperability Efforts - D-lib Magazine - December (2015)
  13. 13. Strategies for improvement
  14. 14. • build portals as views based on a common data layer • minimize the intermediate layers • refer to the source instead of copying • support decentralized discovery • maximize the usability of data at the source • develop a sustainable, ‘webcentric’ solution • use HTTP, RDF and RESTful APIs as building blocks => implement the Linked Data principals Inspired by the work of Ruben Verborgh, Herbert Van de Sompel and colleagues: See for example: Miel Vander Sande et al. , Towards sustainable publishing and querying of distributed Linked Data archives - Journal of Documentation (2017) Design principles for a discovery infrastructure
  15. 15. At the data source level: • use sustainable URIs to identify the resources • use formal definitions for persons, places, concepts, events (API) • use domain vocabularies / data models to describe the data • add support for cross-domain discovery (EDM, Schema.org,...) At the network level: • create a ‘network of terms’ for shared entities • provide tools for aligning and linking • create alignments and links between different terminology sources • provide easy access for collection management systems (API) Implementing Linked Data principles
  16. 16. Building on previous work
  17. 17. But how will our Linked Data be found?
  18. 18. The Semantic Web is still a dream… #1  So discovery of Linked Data requires registering datasets?!
  19. 19. A tiny example...suppose a resource is defined as: museum_X:object1 a nde:painting ; dcterms:subject aat:windmill . For ‘browsable Linked Data’ you should(!) add the inverse relation [1],[2]: aat:windmill a skos:Concept ; skos:prefLabel “Windmolen“@nl ; dcterms:isSubjectOf museum_X:object1 . # “backlinks” => a Linked Data integration problem… The Semantic Web is still a dream… #2 [1]: Tim Berner’s Lee on ‘browsable linked data’ (2006) [2]: Tom Heath and Christian Bizer on ‘Incoming Links’ (2011)
  20. 20. 1. Only semantic integration • just implement schema.org, let search engines ‘infer’ the relations • is the data interesting enough for Google? • what about special thematic or regional views? how about reuse? • can we reuse the results of the integration? (NO!) 2. Physical integration: • aggregate all the related Linked Data sources • build large triplestore and infer the relations • but like OAI-PMH, based on copying data Special case: LOD Laundromat Comparing Linked Data integration approaches…
  21. 21. ‘Traditional’ solutions to federated querying not feasible: - publishing Linked Data in triplestores is hard for small data providers - service is vulnerable because of rich functionality - federated querying over SPARQL endpoints performs poorly Follow the Linked Data Fragments approach : - Linked Data available through Triple Pattern Fragments interface - easier to implement for data providers - federated querying is supported, even SPARQL - more complexity at network level is acceptable - even support for time-based versions (Memento) 3. Virtual integration by federated approach See also: Miel Vander Sande et al. , (2017) Towards sustainable publishing and querying of distributed Linked Data archives - Journal of Documentation
  22. 22. Use the backlinks to support the discovery process: See also: Miel Vander Sande et al. (2016) Hypermedia-Based Discovery for Source Selection Using Low-Cost Linked Data Interfaces (IJSWIS) 12(3) 79–110 More advanced: data source profiling or dataset summaries Federated querying needs source selection…
  23. 23. To make discovery of Linked Data work: 1. Register organizations and datasets 2. Build a knowledge graph with backlinks for resource discovery Implementations will depend on capabilities of cultural heritage institutions
  24. 24. Building the distributed network of heritage information
  25. 25. Strategy for our distributed network 1. build a knowledge graph for Dutch digital heritage entities 2. improve the usability of the data source: - align object descriptions with shared entities - publish data as Linked Data 3. build a discovery infrastructure: - register organizations and datasets in a registry - build knowledge graph to support discovery (backlinks) 4. implement virtual data integration technology : - use registry and knowledge graph for selecting the resources - support federated querying (or selective aggregation) semantic alignment data integration
  26. 26. https://github.com/netwerk-digitaal-erfgoed/high-level-design High-level design of our discovery infrastructure
  27. 27. Roadmap • Start with the existing (OAI-PMH) based infrastructure • Build registry for organizations and datasets • Build network of terms to provide shared entities for discovery • Upgrade object descriptions with URIs • Make aggregators Linked Data compliant • Build knowledge graph with backlinks for discovery • Support federated querying (or selective harvesting) • Make collection management systems Linked Data compliant • Transform aggregators to service portals for discovery
  28. 28. Thank you for your attention! please share your thoughts with us... email: enno.meijers at kb.nl twitter, slideshare: ennomeijers https://github.com/netwerk-digitaal-erfgoed

×