Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library

Nuno Freire
Chief data officer
Pacific Neighbourhood Consortium
2014 Annual Conference
Taipei, October 2014

Outline
 Introduction and Context
• The European Library
• Europeana
• The data model for metadata exchange in the
Europeana network
 Linked Data at The European Library
• Managing and linking person names
• Managing and linking place names
• Managing and linking concepts

Introduction and context
www.theeuropeanlibrary.org

What is The European Library?
 Project started 1996, full operational service
from 2005
 European hub of metadata, collections and
increasing amount of full text
 Membership of national and research libraries of
47 Council of Europe states
 Non-profit, owned and managed by member
libraries

What does The European Library offer?
 Experienced European project partner
 Large-scale aggregation infrastructure
 Data and digital content of Europe’s libraries
 Data distribution
 Data enrichment
 Linked open data

EUROPEANA - Europe’s cultural heritage portal
 32.6m records from
2,300 European
galleries, museums,
archives and libraries
 Books, newspapers,
journals, letters, diaries,
archival papers
 Paintings, maps,
drawings, photographs
 Music, spoken word,
radio broadcasts
 Film, newsreels,
television
 Curated exhibitions
 31 languages

The European Library as libraries aggregator to Europeana
 Domain aggregators
• Audiovisual collections, e.g. EUScreen, European Film Gateway
• Archives, e.g. APEX
• Thematic collections, e.g. Judaica Europeana, Europeana Fashion
• Libraries, e.g. The European Library
 National initiatives
• National aggregators, e.g. Culture Grid, Culture.fr
• Regional aggregators, e.g. Musées Lausannois

Metadata in the Europeana Context
 Europeana provides a portal for users to access the aggregated data
• Metadata, previews and links to source
 Makes the metadata freely available for anyone to re-use
• Under the Creative Commons Zero (CC0) public domain dedication
 Makes metadata available via an API
 Makes metadata available as Linked Open Data
• http://data.europeana.eu/

Europeana Data Model: a Collaborative Effort
Cross-community development
Involving library, archive and museum experts
Ca. 60 participants
http://pro.europeana.eu/edm-documentation

Europeana Data Model: general principles
• A cross domain approach
• Supporting the common semantics of cultural domains
• Addressing the requirements of the Europeana portal
• Adheres to the modeling principles of the Web of Data
• Available as an OWL ontology and XML schema
• Allows finer-grained models of the different domains to be at least
partly interoperable at the semantic level
• Allows metadata to retain their original expressivity and richness

Linked Data at The European Library
Managing and linking person names

Which data from VIAF is used at The European Library
 Name variants
Various forms of the name of the person or organization.
May include the complete name, abbreviated names,
acronyms, etc.
 Date of birth/death
The dates of birth and death of the person
 Nationalities
The nationalities of a person or organization.

How data from VIAF is used in The European Library
 Name variants
• For matching of names across records and data sources
• Improves the identification of all publications of a work, the
identification of publications in books-in-print databases, and the
identification of the contributor in the rights-holders databases.
 Date of birth/death
• Used for determining the public domain status.
• Used for matching confirmation and disambiguation of
homonyms across data sources
 Nationalities
• Used, in some countries, for determining the public domain
status of the work.

The matching process
 VIAF data used for matching,
disambiguation, and match probability

Matching work contributors with VIAF
 Names are matched by similarity
 Confirmation of the correctness of a name
match is taken from other matching data
• The dates of birth and death
• The title of the work is compared against the list
of titles available in VIAF
• All the contributors of the work are matched
against the list of known co-authors in VIAF
• The publisher(s) of the work are matched against
the list of known publishers in VIAF
 A match is only chosen if enough supporting
evidence is found
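
The matching steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not The European Library's implementation: the `Authority` class, the record dictionary keys, the `viaf_id` values, the use of `difflib` for name similarity and the two thresholds are all hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Authority:
    """Hypothetical, simplified stand-in for one VIAF entry."""
    viaf_id: str
    name_variants: list
    birth_year: Optional[int] = None
    titles: set = field(default_factory=set)
    co_authors: set = field(default_factory=set)
    publishers: set = field(default_factory=set)


def name_similarity(a, b):
    """String similarity in [0, 1]; difflib is only an illustrative choice."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def supporting_evidence(record, authority):
    """Count how many kinds of evidence confirm a name match."""
    evidence = 0
    if record.get("birth_year") and record["birth_year"] == authority.birth_year:
        evidence += 1
    if record.get("title", "").lower() in {t.lower() for t in authority.titles}:
        evidence += 1
    if set(record.get("other_contributors", [])) & authority.co_authors:
        evidence += 1
    if set(record.get("publishers", [])) & authority.publishers:
        evidence += 1
    return evidence


def match_contributor(record, authorities, min_similarity=0.85, min_evidence=1):
    """Return the best supported authority, or None if evidence is insufficient."""
    best = None
    for authority in authorities:
        similarity = max(name_similarity(record["name"], v)
                         for v in authority.name_variants)
        if similarity < min_similarity:
            continue  # the name itself is not similar enough
        evidence = supporting_evidence(record, authority)
        if evidence < min_evidence:
            continue  # a similar name alone is never accepted
        if best is None or (evidence, similarity) > best[:2]:
            best = (evidence, similarity, authority)
    return best[2] if best else None
```

With two homonymous authorities, the one whose titles or publishers agree with the record is selected, and a record with a matching name but no supporting evidence is left unlinked.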

Contributor names in statements of responsibility
“French Canadian freely arranged by Katherine K. Davis”.
“ed. by Peter Noever ; with a forew. by Frank O. Gehry; and contrib. by
Coop Himmelblau.”
“W. Lange, A.C. Zeven and N.G. Hogenboom, editors”
“by Pamela and Neal Priestland”
“Vicente Aleixandre ; estudio previo, selección y notas de Leopoldo de
Luis”

The approach
 The problem is approached as a Named Entity Recognition
task on text that may not be grammatically well formed and
therefore lacks lexical evidence
 Some requirements from the ARROW context
• Easily applicable to several languages
• The outcomes of the recognition task must be explainable
 Design decisions
• Exploit the structured data within national bibliographies
• Analyse the frequency of word occurrences in names of
persons and in other textual data
• Using word occurrence frequency makes it possible to
• bypass the need for building training sets
• provide simpler explanations of the name recognition
results
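
As a minimal sketch of the word-frequency idea, the snippet below counts how often a word occurs in person names versus other text taken from structured fields. The function names and the simple ratio used as a score are assumptions made for illustration; the published method may weight the counts differently.

```python
from collections import Counter


def build_counts(person_names, other_texts):
    """Count word occurrences in person names and in other textual data."""
    name_counts = Counter(w.lower() for n in person_names for w in n.split())
    other_counts = Counter(w.lower() for t in other_texts for w in t.split())
    return name_counts, other_counts


def name_likelihood(word, name_counts, other_counts):
    """Share of a word's occurrences that fall inside person names."""
    w = word.lower()
    in_names = name_counts[w]
    total = in_names + other_counts[w]
    return in_names / total if total else 0.0


def explain(word, name_counts, other_counts):
    """Human-readable justification built directly from the counts."""
    w = word.lower()
    return (f"'{word}' occurs {name_counts[w]} times in person names "
            f"and {other_counts[w]} times in other text")
```

Because the score is just a ratio of counts, no annotated training set is needed, and the explanation for a decision is the pair of counts itself.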

The process – bibliographic record processing
 The named entity recognition is performed for a
record as follows:
• Statement of responsibility is tokenized
• The person names are recognized by comparing the
tokens with the dictionaries
• The recognized names are compared against the
names of the contributors present in the structured
fields of the record.
• If no similar name exists in the record, the contributor
is added to the record in a structured data field
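
The four steps above could look roughly like this. It is a sketch under assumptions: the record layout (`statement_of_responsibility`, `contributors`), the plain set of known name words used as the dictionary, the rule that single capital letters count as initials, and the 0.8 similarity cut-off are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(statement):
    """Split into words plus the separators ',' and ';' (other punctuation is dropped)."""
    return re.findall(r"[^\W\d_]+|[,;]", statement)


def recognize_names(tokens, name_words):
    """Group consecutive capitalized tokens that are known name words or initials."""
    names, current = [], []
    for tok in tokens + [";"]:  # trailing separator flushes the last name
        if tok[0].isupper() and (tok.lower() in name_words or len(tok) == 1):
            current.append(tok)
        else:
            # keep a group only if it contains at least one dictionary word
            if any(t.lower() in name_words for t in current):
                names.append(" ".join(current))
            current = []
    return names


def similar(a, b, threshold=0.8):
    """Order-insensitive comparison, so 'Noever, Peter' equals 'Peter Noever'."""
    def key(s):
        return " ".join(sorted(w.lower() for w in re.findall(r"[^\W\d_]+", s)))
    return SequenceMatcher(None, key(a), key(b)).ratio() >= threshold


def enrich_record(record, name_words):
    """Add recognized contributors that are missing from the structured fields."""
    added = []
    tokens = tokenize(record["statement_of_responsibility"])
    for name in recognize_names(tokens, name_words):
        if not any(similar(name, c) for c in record["contributors"]):
            record["contributors"].append(name)
            added.append(name)
    return added
```

Note that this tokenizer discards the full stop after an initial, so "Frank O. Gehry" is returned as "Frank O Gehry".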

Evaluation data set
(size of bibliographies and evaluation samples)
National bibliography                                           Total records  Main language     Sample statements of responsibility  Sample referred persons
British Library                                                 13.4 million   English           205                                  328
German National Library                                         9.4 million    German            200                                  378
National Library of the Netherlands                             3.2 million    Dutch             200                                  335
National Library of Greece                                      0.4 million    Greek             297                                  379
Central Institute for the Union Catalogue of Italian Libraries  12.4 million   Italian           224                                  297
Royal Library of Belgium                                        1 million      French and Dutch  203                                  387
Total:                                                                                           1329                                 2104

Evaluation results
Dataset                                                         Exact match precision  Exact match recall  Partial match precision  Partial match recall
British Library                                                 0.981                  0.979               0.991                    0.991
German National Library                                         0.975                  0.934               0.992                    0.992
National Library of the Netherlands                             0.973                  0.875               0.977                    0.979
National Library of Greece                                      0.656                  0.414               0.758                    0.868
Central Institute for the Union Catalogue of Italian Libraries  0.97                   0.896               0.971                    0.973
Royal Library of Belgium                                        0.981                  0.959               0.981                    0.982
Overall:                                                        0.948                  0.837               0.958                    0.963
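
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute precision and recall under an exact and a partial matching criterion. The slides do not define the partial match criterion; treating two names as a partial match when they share at least one word is an assumption made here purely for illustration.

```python
def precision_recall(predicted, gold, matches):
    """Precision and recall of predicted names against the reference names."""
    hit_pred = sum(any(matches(p, g) for g in gold) for p in predicted)
    hit_gold = sum(any(matches(p, g) for p in predicted) for g in gold)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    return precision, recall


def exact(p, g):
    """Exact match: the two name strings are identical."""
    return p == g


def partial(p, g):
    """Assumed partial match: the names share at least one word."""
    return bool(set(p.lower().split()) & set(g.lower().split()))
```

Under these definitions a partial match score can never be lower than the exact match score for the same output, which is consistent with the pattern in the table above.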

Linked Data at The European Library
Managing and linking place names

The approach for place name linking
• We process the complete metadata elements
• The alignment is performed with Geonames
• Using the RDF dump of Geonames
• A generic approach not using any language
specific information
• The words themselves are not used as evidence
• We use only characteristics of the words (capitalization, size,
etc.)
• Wordnets, part-of-speech analysis, morphological
analysis, etc., are not used.
• … in order to allow the use of this approach in a
language-independent manner
Resolution of the place names
• This task aims to find a single entity in the geographic
ontology for aligning with the place name
• The first step of this task is to find all possible
candidates for the resolution in the geographic
ontology
• Uses a heuristic-based predictive model:
• Assigns each resolution candidate a probability for the
classes match and non_match
• An alignment is established if a minimum probability
threshold for the class match is achieved.
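
The resolution step can be illustrated with a toy scoring function and a threshold. The weights, the candidate dictionary keys and the 0.7 threshold below are invented for this sketch and are not the model described in the referenced publications.

```python
def match_probability(candidate):
    """Toy heuristic score in [0, 1] for the class 'match' (weights are assumptions)."""
    score = 0.4 if candidate["exact_name_match"] else 0.1
    score += 0.4 * candidate["relative_population"]
    score += 0.2 if candidate["related_places_found"] > 0 else 0.0
    return score


def resolve(candidates, threshold=0.7):
    """Return the most probable candidate, or None if it stays below the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=match_probability)
    return best if match_probability(best) >= threshold else None
```

Returning `None` when no candidate reaches the threshold corresponds to leaving the place name unaligned rather than forcing a link.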

Which information supports the place name resolution
Feature: Description
• Number of words: The number of words in the place name.
• Name match: If the recognized place name matched: the main name of the
place, an alternate name, etc.
• Exact name match: If the recognized place name matched exactly the place
name.
• Relative population: Relative population of the candidate in comparison with other
candidates.
• Geographic feature type: The type of geographic feature: continent, country, city, etc.
• Related places found: The number of other place names found in the
administrative hierarchy.
• Relative related places: The relative number of administrative divisions found in the
subject heading.
• In source country: If it is located in one of the source countries of the subject
heading system.
Linked Data at The European Library
Managing and linking concepts

Linking Subject Indexing and Classification Data
 The context
• The centralization of bibliographic metadata enables
resource access under a unified knowledge organization
system
 The challenges
• Diversity of languages
• Diversity of knowledge organization systems in use across
European libraries
• Heterogeneous levels of detail in subject information
 Current status at The European Library
• Use of alignments between ontologies:
• Alignments were created manually or semi-automatically
• Alignments in use include: CERIF, MACS (LCSH,
RAMEAU, SWD), UDC and DDC
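
One way alignments between knowledge organization systems can be put to use is to expand a concept into its equivalents in other systems. The sketch below assumes that alignments are stored as pairs of (scheme, identifier) tuples and may be followed transitively; both the storage format and the identifiers in any example are hypothetical.

```python
from collections import defaultdict, deque


def build_index(alignments):
    """Index alignment pairs in both directions."""
    index = defaultdict(set)
    for a, b in alignments:
        index[a].add(b)
        index[b].add(a)
    return index


def equivalent_concepts(concept, index):
    """Follow alignments breadth-first to collect all linked concepts."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for neighbour in index.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {concept}
```

A search on a heading from one system could then also retrieve records indexed with the aligned headings from the others, regardless of the language of the original subject term.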

References
Further details may be consulted in the following publications:
• Freire, N, 2014, 'Word Occurrence Based Extraction of Work Contributors from Statements
of Responsibility'. International Journal on Digital Libraries: Volume 14, Issue 3 (2014),
Page 141-148. DOI: 10.1007/s00799-014-0113-3.
• Charles, V., Freire, N, Antoine, I., 2014, 'Links, languages and semantics: linked data
approaches in The European Library and Europeana', in 'Linked Data in Libraries: Let's
make it happen!' IFLA 2014 Satellite Meeting on Linked Data in Libraries.
• Freire, N, Muhr, M, 2013, 'Use of Authorities Open Data in the ARROW Rights
Infrastructure' in proceedings of the DC-2013 Linking to the Future Conference, 2013.
• Freire, N, 2013, 'Visualization and navigation of knowledge in pan-European resources: the
case of The European Library' in proceedings of International UDC Seminar on
Classification & Visualization: interfaces to knowledge.
• N. Freire, et al., "Author Consolidation across European National Bibliographies and
Academic Digital Repositories", 11th International Conference on Current Research
Information Systems, 2012.
• N. Freire, J. Borbinha, P. Calado, "A Language Independent Approach for Aligning Subject
Heading Systems with Geographic Ontologies", International Conference on Dublin Core
and Metadata Applications 2011, 2011.
• N. Freire, J. Borbinha, P. Calado, B. Martins, "A Metadata Geoparsing System for Place
Name Recognition and Resolution in Metadata Records", ACM/IEEE Joint Conference on
Digital Libraries, 2011.

Thank you
Nuno Freire
Chief data officer
nuno.freire@theeuropeanlibrary.org
