Europeana provides access to digital resources from a wide range of cultural heritage institutions across Europe. To support Europeana, a wide network of organizations collaborates in data integration activities. The European Library acts as the library-domain aggregator for Europeana. It also serves as a gateway to the collections and data of Europe’s national and research libraries, operating on the principle of open data for re-use.
The Europeana Network addresses its data integration challenges by building on Linked Data and the Semantic Web. Its approach to data integration is based on a single data model, the Europeana Data Model, which embraces Semantic Web principles to integrate the various data models and ontologies used in cultural heritage data.
The Linked Data paradigm brings many new challenges to libraries. The generic nature of data representation in Linked Data allows any community to manipulate the data, but it also opens many paths for implementation, with no clear optimal choice for libraries. The European Library uses its operational infrastructure to make library data available. It maintains The European Library Open Dataset, which is derived from the data aggregated from member libraries and made available under the Creative Commons CC0 1.0 Universal license to promote and facilitate its reuse by any community.
Extensive linking is performed in the preparation of The European Library Open Dataset. It relies on Information Extraction and Data Mining to establish links to external open datasets, covering the most prominent entity types present in library data: persons, corporate bodies, places, concepts, intellectual works and manifestations.
The European Library also applies a linked data approach to intellectual property rights clearance processes, in support of mass digitization projects. This approach is applied within the European ARROW rights infrastructure.
Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library
2. Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library
Nuno Freire
Chief data officer
The European Library
Pacific Neighbourhood Consortium
2014 Annual Conference
Taipei, October 2014
3. Outline
Introduction and Context
• The European Library
• Europeana
• The data model for metadata exchange in the Europeana network
Linked Data at The European Library
• Managing and linking person names
• Managing and linking place names
• Managing and linking concepts
5. What is The European Library?
Project started 1996, full operational service from 2005
European hub of metadata, collections and increasing amount of full text
Membership of national and research libraries of 47 Council of Europe states
Non-profit, owned and managed by member libraries
7. What does The European Library offer?
• Experienced European project partner
• Large-scale aggregation infrastructure
• Data and digital content of Europe’s libraries
• Data distribution
• Data enrichment
• Linked open data
9. EUROPEANA - Europe’s cultural heritage portal
32.6m records from 2,300 European galleries, museums, archives and libraries
Books, newspapers, journals, letters, diaries, archival papers
Paintings, maps, drawings, photographs
Music, spoken word, radio broadcasts
Film, newsreels, television
Curated exhibitions
31 languages
10. The European Library as libraries aggregator to Europeana
Domain Aggregators
• Audiovisual collections, e.g. EUScreen, European Film Gateway
• Archives, e.g. APEX
• Thematic collections, e.g. Judaica Europeana, Europeana Fashion
• Libraries, e.g. The European Library
National initiatives
• National Aggregators, e.g. Culture Grid, Culture.fr
• Regional Aggregators, e.g. Musées Lausannois
11. Metadata in the Europeana Context
Provides a portal for users to access that data
• Metadata, previews and links to source
Makes the metadata freely available for anyone to re-use
• Under the Creative Commons Zero (CC0) public domain dedication
Makes metadata available via an API
Makes metadata available as Linked Open Data
• http://data.europeana.eu/
12. Europeana Data Model: a Collaborative Effort
Cross-community development
Involving library, archive and museum experts
Ca. 60 participants
http://pro.europeana.eu/edm-documentation
13. Europeana Data Model: general principles
• A cross domain approach
• Supporting the common semantics of cultural domains
• Addressing the requirements of the Europeana portal
• Adheres to the modeling principles of the Web of Data
• Available as an OWL ontology and XML schema
• Allows finer-grained models of the different domains to be at least partly interoperable at the semantic level
• Allows metadata to retain their original expressivity and richness
14. Linked Data at The European Library
Managing and linking person names
15. Which data from VIAF is used at The European Library
Name variants
Various forms of the name of the person or organization. May include the complete name, abbreviated names, acronyms, etc.
Date of birth/death
The dates of birth and death of the person.
Nationalities
The nationalities of a person or organization.
16. How data from VIAF is used in The European Library
Name variants
• For matching of names across records and data sources
• Improves the identification of all publications of a work, the identification of publications in books-in-print databases, and the identification of the contributor in the rights-holders databases.
Date of birth/death
• Used for determining the public domain status.
• Used for matching confirmation and disambiguation of homonyms across data sources
Nationalities
• Used, in some countries, for determining the public domain status of the work.
17. The matching process
VIAF data used for matching, disambiguation, and match probability
18. Matching work contributors with VIAF
Names are matched by similarity
Confirmation of the correctness of a name match is taken from other matching data
• The dates of birth and death
• The title of the work is compared against the list of titles available in VIAF
• All the contributors of the work are matched against the list of known co-authors in VIAF
• The publisher(s) of the work are matched against the list of known publishers in VIAF
A match is only chosen if enough supporting evidence is found
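The matching steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the production matcher: the record and candidate dictionaries, their field names, the use of `difflib` for name similarity, and both thresholds are hypothetical, since the slides do not specify how similarity or "enough evidence" is measured.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def count_evidence(record: dict, candidate: dict) -> int:
    """Count independent pieces of evidence supporting a name match."""
    evidence = 0
    # Dates of birth and death agree (only counted when the record has them).
    if record.get("dates") and record["dates"] == candidate.get("dates"):
        evidence += 1
    # The work's title appears in the candidate's list of known titles.
    known_titles = {t.lower() for t in candidate.get("titles", [])}
    if record["title"].lower() in known_titles:
        evidence += 1
    # At least one other contributor of the work is a known co-author.
    if set(record.get("other_contributors", [])) & set(candidate.get("coauthors", [])):
        evidence += 1
    # At least one publisher of the work is a known publisher.
    if set(record.get("publishers", [])) & set(candidate.get("publishers", [])):
        evidence += 1
    return evidence


def choose_match(record, candidates, min_similarity=0.85, min_evidence=1):
    """Return the best supported candidate, or None if evidence is insufficient."""
    best, best_key = None, None
    for cand in candidates:
        # Compare the record name against every name variant of the candidate.
        sim = max(
            (name_similarity(record["name"], v) for v in cand["name_variants"]),
            default=0.0,
        )
        if sim < min_similarity:
            continue
        ev = count_evidence(record, cand)
        if ev < min_evidence:
            continue
        key = (ev, sim)  # prefer more evidence, then higher similarity
        if best_key is None or key > best_key:
            best, best_key = cand, key
    return best
```

Ranking by evidence first and similarity second is one possible design choice: it lets homonyms with identical name strings be separated by dates, titles, co-authors or publishers.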
19. Contributor names in statements of responsibility
“French Canadian freely arranged by Katherine K. Davis”.
“ed. by Peter Noever ; with a forew. by Frank O. Gehry; and contrib. by Coop Himmelblau.”
“W. Lange, A.C. Zeven and N.G. Hogenboom, editors”
“by Pamela and Neal Priestland”
“Vicente Aleixandre ; estudio previo, selección y notas de Leopoldo de Luis”
20. The approach
The problem is approached as a Named Entity Recognition task in text that may not be grammatically correct and therefore lacks lexical evidence
Some requirements from the ARROW context
• Easily applicable to several languages
• The outcomes of the recognition task must be explainable
Design decisions
• Exploring the structured data within national bibliographies
• Analysing the frequency of word occurrences in names of persons, and in other textual data
• Using word occurrence frequency avoids the need to build training sets and allows simpler explanations of the name recognition results
21. The process – bibliographic record processing
The named entity recognition is performed for a record as follows:
• Statement of responsibility is tokenized
• The person names are recognized by comparing the tokens with the dictionaries
• The recognized names are compared against the names of the contributors present in the structured fields of the record.
• If no similar name exists in the record, the contributor is added to the record in a structured data field
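The record processing above can be illustrated with a short Python sketch. It is a simplification under assumptions: the dictionaries are plain word counts built from person names and from other text, a word counts as name-like when at least half of its occurrences are in names, and the field names `statement_of_responsibility` and `contributors`, the thresholds, and the `difflib` similarity are all hypothetical. The published method is described in the first reference at the end of the deck.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def words(text):
    """Lowercased word tokens of a text."""
    return re.findall(r"\w+", text.lower())


def build_dictionaries(person_names, other_texts):
    """Count word occurrences in person names and in other textual fields."""
    name_counts = Counter(w for n in person_names for w in words(n))
    other_counts = Counter(w for t in other_texts for w in words(t))
    return name_counts, other_counts


def is_name_word(token, name_counts, other_counts, min_ratio=0.5):
    """A word is name-like if it occurs mostly inside person names."""
    n, o = name_counts[token.lower()], other_counts[token.lower()]
    return n > 0 and n / (n + o) >= min_ratio


def recognize_names(statement, name_counts, other_counts):
    """Group consecutive name-like tokens into candidate person names."""
    names, current = [], []
    for token in re.findall(r"\w+|[^\w\s]", statement):
        if token == ".":
            continue  # a period after an initial does not end a name
        if token[0].isalnum() and is_name_word(token, name_counts, other_counts):
            current.append(token)
        else:
            if current:
                names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


def similar(a, b):
    """Order-insensitive similarity, so 'Noever, Peter' equals 'Peter Noever'."""
    norm = lambda s: " ".join(sorted(words(s)))
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def add_missing_contributors(record, name_counts, other_counts, min_similarity=0.8):
    """Add recognized names that match no contributor in the structured fields."""
    known = record["contributors"]
    statement = record["statement_of_responsibility"]
    for name in recognize_names(statement, name_counts, other_counts):
        if not any(similar(name, k) >= min_similarity for k in known):
            known.append(name)
    return record
```

Because each decision rests on two counts per word, a recognition result can be explained by showing how often the word occurs in names and how often elsewhere.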
22. Evaluation data set (size of bibliographies and evaluation samples)
National Bibliography | Total records | Main language | Evaluation sample: statements of responsibility | Evaluation sample: referred persons
British Library | 13.4 million | English | 205 | 328
German National Library | 9.4 million | German | 200 | 378
National Library of the Netherlands | 3.2 million | Dutch | 200 | 335
National Library of Greece | 0.4 million | Greek | 297 | 379
Central Institute for the Union Catalogue of Italian Libraries | 12.4 million | Italian | 224 | 297
Royal Library of Belgium | 1 million | French and Dutch | 203 | 387
Total | | | 1329 | 2104
23. Evaluation results
Dataset | Exact match precision | Exact match recall | Partial match precision | Partial match recall
British Library | 0.981 | 0.979 | 0.991 | 0.991
German National Library | 0.975 | 0.934 | 0.992 | 0.992
National Library of the Netherlands | 0.973 | 0.875 | 0.977 | 0.979
National Library of Greece | 0.656 | 0.414 | 0.758 | 0.868
Central Institute for the Union Catalogue of Italian Libraries | 0.97 | 0.896 | 0.971 | 0.973
Royal Library of Belgium | 0.981 | 0.959 | 0.981 | 0.982
Overall | 0.948 | 0.837 | 0.958 | 0.963
24. Linked Data at The European Library
Managing and linking place names
25. The approach for place name linking
• We process the complete metadata elements
• The alignment is performed with Geonames, using the RDF dump of Geonames
• A generic approach that uses no language-specific information
• The words themselves are not used as evidence
• We use only characteristics of the words (capitalization, size, etc.)
• Wordnets, part-of-speech analysis, morphological analysis, etc., are not used
• This allows the approach to be used in a language-independent manner
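As a small illustration of using only word characteristics, the Python sketch below derives features from a token's shape without looking the word up anywhere. The feature names are hypothetical; the slides mention capitalization and size but do not list the full feature set.

```python
def word_shape_features(token: str) -> dict:
    """Describe a token by its shape only, never by its meaning."""
    return {
        "is_capitalized": token[:1].isupper(),  # first letter is upper case
        "is_all_caps": token.isupper(),         # e.g. acronyms
        "length": len(token),                   # size of the word
        "has_digit": any(c.isdigit() for c in token),
    }
```

Because none of these features depends on a vocabulary, the same function applies to metadata in any language whose script distinguishes upper and lower case.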
26. Resolution of the place names
• This task aims to find a single entity in the geographic ontology for aligning with the place name
• The first step of this task is to find all possible candidates for the resolution in the geographic ontology
• A heuristic-based predictive model assigns each resolution candidate a probability for the classes match and non_match
• An alignment is established if a minimum probability threshold for the class match is achieved
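The thresholded resolution step can be sketched as follows. This is an assumption-laden illustration: the slides do not describe the actual predictive model, so a logistic function over hand-set weights stands in for it, and the weights, bias, feature keys, candidate identifiers and threshold are all hypothetical.

```python
import math

# Hypothetical hand-set weights; the slides do not give the actual model.
WEIGHTS = {
    "exact_name_match": 2.0,
    "relative_population": 3.0,
    "related_places_found": 1.0,
}
BIAS = -3.0


def match_probability(features: dict) -> float:
    """Probability of the class 'match' for one resolution candidate."""
    score = BIAS + sum(w * float(features.get(k, 0.0)) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-score))


def resolve(candidates, min_probability=0.7):
    """Return the id of the most probable candidate, or None below the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: match_probability(c["features"]))
    if match_probability(best["features"]) >= min_probability:
        return best["id"]
    return None
```

Returning None when no candidate reaches the threshold means an ambiguous place name is left unaligned rather than linked to the wrong entity.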
27. Which information supports the place name resolution
Feature | Description
Number of words | The number of words in the place name.
Name match | If the recognized place name matched: the main name of the place, an alternate name, etc.
Exact name match | If the recognized place name matched exactly the place name.
Relative population | Relative population of the candidate in comparison with other candidates.
Geographic feature type | The type of geographic feature: continent, country, city, etc.
Related places found | The number of other place names found in the administrative hierarchy.
Relative related places | The relative number of administrative divisions found in the subject heading.
In source country | If it is located in one of the source countries of the subject heading system.
28. Linked Data at The European Library
Managing and linking concepts
29. Linking Subject Indexing and Classification Data
The context
• The centralization of bibliographic metadata enables resource access under a unified knowledge organization system
The challenges
• Diversity of languages
• Diversity of knowledge organization systems in use across European libraries
• Heterogeneous levels of detail in subject information
Current status at The European Library
• Use of alignments between ontologies
• Alignments were created manually or semi-automatically
• Alignments in use include: CERIF, MACS (LCSH, RAMEAU, SWD), UDC and DDC
30. References
Further details may be consulted in the following publications:
• Freire, N., 2014, 'Word Occurrence Based Extraction of Work Contributors from Statements of Responsibility', International Journal on Digital Libraries, Volume 14, Issue 3, pages 141-148. DOI: 10.1007/s00799-014-0113-3.
• Charles, V., Freire, N., Antoine, I., 2014, 'Links, languages and semantics: linked data approaches in The European Library and Europeana', in 'Linked Data in Libraries: Let's make it happen!', IFLA 2014 Satellite Meeting on Linked Data in Libraries.
• Freire, N., Muhr, M., 2013, 'Use of Authorities Open Data in the ARROW Rights Infrastructure', in proceedings of the DC-2013 Linking to the Future Conference.
• Freire, N., 2013, 'Visualization and navigation of knowledge in pan-European resources: the case of The European Library', in proceedings of the International UDC Seminar on Classification & Visualization: interfaces to knowledge.
• Freire, N., et al., 2012, 'Author Consolidation across European National Bibliographies and Academic Digital Repositories', 11th International Conference on Current Research Information Systems.
• Freire, N., Borbinha, J., Calado, P., 2011, 'A Language Independent Approach for Aligning Subject Heading Systems with Geographic Ontologies', International Conference on Dublin Core and Metadata Applications 2011.
• Freire, N., Borbinha, J., Calado, P., Martins, B., 2011, 'A Metadata Geoparsing System for Place Name Recognition and Resolution in Metadata Records', ACM/IEEE Joint Conference on Digital Libraries.
31. Thank you
Nuno Freire
Chief data officer
nuno.freire@theeuropeanlibrary.org
Editor's Notes
The European Library has a long history as an online service, starting as a project in 1996 and launching as a full operational service in 2005. Its membership base comprises all the national libraries of Europe, based on the Council of Europe’s 47 states, and a significant number of research libraries. The service is owned and run by its members, with representatives from the pan-European library organisations, including LIBER, on the board. The European Library is a single online gateway to the resources of Europe’s libraries.
The European Library has a very large dataset of some 200 million bibliographic records, representing Europe’s bibliography; 26 million digitised objects; and millions of pages of digitised text. The European Library aggregates all types of data and content provided by libraries, including repository text and data. It has a large-scale aggregation infrastructure, and ingests, indexes, enriches and clusters significant amounts of data and content. Metadata is open, distributed as data dumps and through APIs, and placed in the workflows and systems used by researchers. The data is also provided as linked open datasets. The European Library is an experienced partner in European projects, including project co-ordination. Let’s see some examples.
At a working level, we operate in a network of aggregators. We can’t work directly with 2,200 organisations, so we rely on aggregators to collect data, harmonise it, and deliver it to Europeana.
Aggregators are important because they share a background with the organisations whose content they bring together, so there is close understanding. The aggregation model enables Europeana to collect huge quantities of data from thousands of providers, through only a handful of channels.