EuropeanaTech 2018: A distributed network of digital heritage information
1. A distributed network of digital heritage information
Enno Meijers / 16 May 2018 / Rotterdam
2. Contents
1. Introduction to the Dutch Digital Heritage Network (NDE)
2. Design principles
3. Challenges in the semantic web
4. Discovery of Linked Data / pilot with Europeana
5. Global design of our distributed network infrastructure
4. The Digital Heritage Network (NDE) aims to increase the social value of the heritage information maintained by libraries, archives, museums and other cultural heritage institutions.
This strategy offers a perspective on developing a national, cross-sector infrastructure of digital heritage facilities.
It focuses on long-term cooperation between the government and the institutions at the national, regional and local levels. It is about organizing the network of people and information!
National Digital Heritage strategic plan (2015)
5. The Digital Heritage Network is developing a three-layered approach for improving the sustainability, the usability and the visibility of digital heritage information.
Layers: sustainable, usable, visible
7. Challenges in the current approach
Two main problem areas:
• poor semantic alignment
• inefficient data integration
See also:
Miel Vander Sande et al. (2017), Towards sustainable publishing and querying of distributed Linked Data archives, Journal of Documentation
Herbert Van de Sompel (2015), Reminiscing About 15 Years of Interoperability Efforts, D-Lib Magazine (December 2015)
9. Design principles for the discovery infrastructure
Rethink the network:
• maximize the usability of data at the source
• refer to data instead of copying it
• build service portals as views on a common data layer
• minimize the intermediate layers
In general:
• an open and sustainable network based on Linked Data (and FAIR?) principles
• build on ‘web-centric’ technologies (HTTP, RDF, Web APIs)
• apply decentralized/distributed technologies where possible
Inspired by the work of Ruben Verborgh, Herbert Van de Sompel and colleagues.
See for example: Miel Vander Sande et al. (2017), Towards sustainable publishing and querying of distributed Linked Data archives, Journal of Documentation
10. Implementing Linked Data principles (1)
“You can only be free if you follow rules…”
At the data source level:
• use sustainable URIs to identify the resources
• use formal definitions for persons, places, concepts, events
• use domain data models to describe the data
• add support for cross-domain discovery (Europeana Data Model, Schema.org, ...)
• publish the collection information as Linked Data
=> IT vendors are strategic partners for the implementation!
11. Implementing Linked Data principles (2)
At the network level:
• build a ‘network of terms’ for shared terminology
• provide tools for alignment and linking
• work on alignments between different terminology sources
• provide access to shared terminology for collection management systems (API)
• guided by the Digital Heritage Reference Architecture framework (DERA)
=> Align corporate, domain and network strategies!
13. The Semantic Web is still a dream… #2
A tiny example... suppose a resource is defined as:
museum_X:object1
  a nde:painting ;
  dct:subject aat:windmill .
For ‘browsable Linked Data’ you should(!) add the inverse relation [1],[2]:
aat:windmill
  a skos:Concept ;
  skos:prefLabel "Windmill"@en ;
  dct:isSubjectOf museum_X:object1 .
=> the missing “backlinks”: a Linked Data integration problem
[1]: Tim Berners-Lee on ‘browsable Linked Data’ (2006)
[2]: Tom Heath and Christian Bizer on ‘Incoming Links’ (2011)
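The missing backlinks can be illustrated with a minimal Python sketch that derives the inverse statements from a set of forward triples. It assumes triples are plain (subject, predicate, object) string tuples using the prefixed names from the example above, and it takes the inverse property name `dct:isSubjectOf` from the slide; the `INVERSE_OF` table and `derive_backlinks` function are hypothetical, and a real setup would use an RDF library and full URIs.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from a forward property to the inverse property used
# for backlinks; the pair mirrors the slide's example.
INVERSE_OF: Dict[str, str] = {"dct:subject": "dct:isSubjectOf"}


def derive_backlinks(triples: Iterable[Triple]) -> Set[Triple]:
    """Return the inverse triple for every forward triple with a known inverse."""
    backlinks: Set[Triple] = set()
    for subject, predicate, obj in triples:
        inverse = INVERSE_OF.get(predicate)
        if inverse is not None:
            # Swap subject and object so the term points back to the object.
            backlinks.add((obj, inverse, subject))
    return backlinks


# The museum's own data ("a" in Turtle is shorthand for rdf:type).
museum_data = [
    ("museum_X:object1", "rdf:type", "nde:painting"),
    ("museum_X:object1", "dct:subject", "aat:windmill"),
]

for triple in sorted(derive_backlinks(museum_data)):
    print(triple)
# prints ('aat:windmill', 'dct:isSubjectOf', 'museum_X:object1')
```

Deriving the triple is trivial; the hard part the slide points at is where to store it, since `aat:windmill` is published by the Getty AAT and not by the museum.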
14. Possible approaches for discovery of Linked Data...
A. semantic integration only:
1. publish the data as Linked Data (e.g. Schema.org)
B. semantic and physical integration:
1. aggregate linked datasets into a regular aggregator (using OAI-PMH)
2. aggregate linked datasets into a central triplestore
3. sync Linked Data in distributed triplestores (using ResourceSync, ...)
C. semantic and virtual integration:
1. federated querying over distributed triplestores (federated SPARQL)
2. federated querying over distributed linked datasets (published as HDT)
=> the NDE network will support a mix of these approaches...
15. Europeana pilot with Schema.org harvesting
Pilot with Europeana R&D, KB and Digital Heritage Network (NDE):
● part of the Rise of Literacy project (digitized books, images)
● metadata published by KB as Linked Data in Schema.org
● defined as datasets with VoID description and Schema.org/Dataset
● experimentally harvested and translated to EDM by Europeana R&D
● fed into the regular ingestion process
● evaluation of data quality and processes
● implementation plan for NDE infrastructure
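To illustrate what a dataset description in Schema.org can look like, here is a minimal Python sketch that builds a Schema.org `Dataset` as JSON-LD. `Dataset`, `DataDownload`, `publisher`, `distribution`, `contentUrl` and `encodingFormat` are standard Schema.org terms; the function name, the example.org URLs and the dataset name are hypothetical placeholders, not the metadata actually published by the KB in the pilot, and the VoID description is left out.

```python
import json
from typing import Any, Dict


def dataset_description(dataset_url: str, name: str, publisher: str,
                        dump_url: str, media_type: str) -> Dict[str, Any]:
    """Build a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": dataset_url,
        "name": name,
        "publisher": {"@type": "Organization", "name": publisher},
        # A downloadable dump a harvester can fetch; format given as a media type.
        "distribution": [{
            "@type": "DataDownload",
            "contentUrl": dump_url,
            "encodingFormat": media_type,
        }],
    }


# Hypothetical example values, for illustration only.
description = dataset_description(
    dataset_url="https://example.org/datasets/rise-of-literacy",
    name="Rise of Literacy (example)",
    publisher="KB",
    dump_url="https://example.org/dumps/rise-of-literacy.nt",
    media_type="application/n-triples",
)
print(json.dumps(description, indent=2))
```

In this sketch a harvester would discover the dataset through its description and then follow the `distribution` entries to retrieve the data itself.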
16. Virtual integration using Linked Data Fragments (LDF)
Approach:
• publish Linked Data using HDT technology
• use Linked Data Fragments technology for integration
Pros:
• easy implementation, even for small data providers
• no duplication of data
• possible support for time-based versions (Memento)
Cons:
• more difficult to process the result
• dynamic source selection is necessary
See also: Miel Vander Sande et al. (2017), Towards sustainable publishing and querying of distributed Linked Data archives, Journal of Documentation
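Why processing the result is harder can be shown with a minimal in-memory Python sketch: a Triple Pattern Fragments style server answers only single triple patterns, so the client has to join the fragments itself. The `fragment` function stands in for an HTTP request to a fragments server over an HDT file, and the data and names are hypothetical; paging, hypermedia controls and HDT loading are left out.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables

# Hypothetical data standing in for one source published as HDT.
SOURCE: List[Triple] = [
    ("museum_X:object1", "dct:subject", "aat:windmill"),
    ("museum_X:object1", "rdf:type", "nde:painting"),
    ("museum_X:object2", "dct:subject", "aat:windmill"),
    ("museum_X:object2", "rdf:type", "nde:drawing"),
]


def fragment(s: Optional[str], p: Optional[str], o: Optional[str]) -> List[Triple]:
    """Server side: return all triples matching one pattern (None = wildcard)."""
    return [t for t in SOURCE
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def bind(term: str, binding: Dict[str, str]) -> Optional[str]:
    """Constants stay, bound variables are substituted, free variables become None."""
    if term.startswith("?"):
        return binding.get(term)
    return term


def evaluate(patterns: List[Pattern],
             binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Client side: nested-loop join, one fragment request per pattern and binding."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in fragment(*(bind(term, binding) for term in first)):
        extended = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            # Reject matches where a repeated variable would get two values.
            if term.startswith("?") and extended.setdefault(term, value) != value:
                consistent = False
        if consistent:
            yield from evaluate(rest, extended)


# Which objects depict a windmill and are paintings?
query = [("?obj", "dct:subject", "aat:windmill"), ("?obj", "rdf:type", "nde:painting")]
print(list(evaluate(query)))
# prints [{'?obj': 'museum_X:object1'}]
```

The server stays cheap because it only matches single patterns; the cost of joining moves to the client, which also issues one request per intermediate binding.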
17. But federation needs selection of sources…
Source selection problem:
• querying many data sources at the same time is not realistic…
Solution:
• build a Knowledge Graph with backlinks to support the discovery process
• select relevant sources for querying based on the Knowledge Graph
*More advanced: data source profiling or dataset summaries
See also: Miel Vander Sande et al. (2016), Hypermedia-Based Discovery for Source Selection Using Low-Cost Linked Data Interfaces, International Journal on Semantic Web and Information Systems (IJSWIS) 12(3), 79–110
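The selection step can be sketched in Python as a lookup in a knowledge graph of backlinks: for each shared term, the graph records which datasets refer to it, so a query about that term is only sent to those datasets. The dictionaries, dataset names and endpoint URLs below are hypothetical simplifications; the hypermedia-based approach in the cited paper is not reproduced here.

```python
from typing import Dict, List, Set

# Hypothetical knowledge graph of backlinks: term -> datasets that reference it.
BACKLINKS: Dict[str, Set[str]] = {
    "aat:windmill": {"museum_X", "archive_Y"},
    "aat:canal": {"archive_Y"},
}

# Hypothetical registry of all known datasets and their query endpoints.
REGISTRY: Dict[str, str] = {
    "museum_X": "https://example.org/museum_X/fragments",
    "archive_Y": "https://example.org/archive_Y/fragments",
    "library_Z": "https://example.org/library_Z/fragments",
}


def select_sources(terms: List[str]) -> List[str]:
    """Return endpoints of registered datasets that reference any query term."""
    datasets: Set[str] = set()
    for term in terms:
        datasets |= BACKLINKS.get(term, set())
    return sorted(REGISTRY[name] for name in datasets if name in REGISTRY)


print(select_sources(["aat:windmill"]))
# prints ['https://example.org/archive_Y/fragments', 'https://example.org/museum_X/fragments']
```

Here `library_Z` is never queried for windmills, which is the saving the slide is after; the trade-off is that the knowledge graph has to be kept in sync with the sources.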
18. Strategy for our distributed network
1. build a service for shared terminology for Dutch digital heritage
2. improve the usability of the data sources:
- align object descriptions with shared terminology
- publish data as Linked Data
3. build a discovery infrastructure:
- register organizations and datasets in an (automated) registry
- build a knowledge graph to support discovery (“backlinks”)
4. support a mix of Linked Data integration technologies:
- use the registry and knowledge graph to select the sources
- support both selective aggregation and federated querying
Steps 1 and 2 address semantic alignment; steps 3 and 4 address data integration.
20. Aiming for a long term “network effect”
• Building the network: working with many institutions in multiple projects
• Prototyping infrastructure components
• Design cross-domain services: network of terms, registry
• Research on Linked Data integration technologies
• Developing showcases for adoption of the Linked Data principles
• AdamNet (adamlink.nl), Zuiderzeecollectie.nl, Netwerk Oorlogsbronnen, …
• Aligning with (international) developments in Digital Humanities and the Semantic Web
• Guided by the Digital Heritage Reference Architecture framework (DERA)
21. Thank you for your attention!
please share your thoughts with us...
email: enno.meijers at kb.nl
twitter/slideshare: ennomeijers
for more information:
EuropeanaTech Insight article March 2018
http://www.netwerkdigitaalerfgoed.nl/
https://github.com/netwerk-digitaal-erfgoed