Linked Open Data Projects for Cultural Heritage:
Evolution of an Information Technology
Julia Marsden – Carolyn Li-Madeo
Jeff Edelstein – Noreen Whysel
Lola Galla – Alison Rhonemus
Cultural Heritage: Description & Access
Pratt SILS LIS 670 – Spring 2013
Prof. Cristina Pattuelli
WHAT IS LINKED OPEN DATA?
Linked Data provides a mechanism
for representing databases (RDF)
and a mechanism for querying
those databases (SPARQL)*
Linked Open Data uses W3C
Semantic Web standards to create
relationships between previously
isolated data silos
Behind almost every website is a
database and although these sites
are linkable the information in their
databases is left unconnected
*From the New York Times’ OPEN blog
REVIEW OF TERMINOLOGIES
RDF Triple
Subject – Predicate – Object,
each of which can be identified by a URI
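As a minimal sketch of the triple structure named in this review, the Python below models one subject, predicate, object statement and prints it in an N-Triples-like form. The `example.org` URIs and the `Triple` and `to_ntriples` names are illustrative assumptions, not taken from any project in this survey.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, used for illustration only.
EX = "http://example.org/"

triple = Triple(
    subject=EX + "collection/letters-1",
    predicate=EX + "vocab/creator",
    obj=EX + "person/person-x",
)


def to_ntriples(t: Triple) -> str:
    """Serialize a triple whose three parts are all URIs as: <s> <p> <o> ."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


print(to_ntriples(triple))
```

Only the all-URI case is covered here; a real RDF object may also be a literal value such as a date or a title.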
API
An Application Programming Interface
allows software programs to interact
with one another
URI
Uniform Resource Identifier
URLs and URNs are both kinds of URI
SPARQL Query
• SPARQL Protocol and RDF Query Language
• Query language for RDF / Databases
• Allows users to write unambiguous queries
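To make the idea of an unambiguous query concrete, here is a rough sketch: a SPARQL query held as text, followed by a tiny pure-Python matcher that mimics how a single triple pattern binds variables. The data, the `example.org` URIs, and the `match` helper are assumptions for illustration; a real SPARQL engine does far more.

```python
EX = "http://example.org/"  # hypothetical namespace

# The SPARQL text this sketch imitates: find every ?letter created by Person X.
QUERY = """
SELECT ?letter WHERE {
  ?letter <http://example.org/vocab/creator> <http://example.org/person/person-x> .
}
"""

# Hypothetical data: three letters and their creators.
triples = [
    (EX + "letter/1", EX + "vocab/creator", EX + "person/person-x"),
    (EX + "letter/2", EX + "vocab/creator", EX + "person/person-a"),
    (EX + "letter/3", EX + "vocab/creator", EX + "person/person-x"),
]


def match(pattern, data):
    """Return one binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


letters = match(("?letter", EX + "vocab/creator", EX + "person/person-x"), triples)
print([b["?letter"] for b in letters])
```

Because every term is either a full URI or a named variable, the pattern cannot be misread the way a keyword search can.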
METHODOLOGY
•Affiliation / Mission / Intended Audience
•Knowledge Organization / Data Models & Vocabulary
•Technology Platform
•Usability/Interface Design
•Discovery (search & navigation)
•Data Shareability (i.e., availability of an API)
•Sustainability (i.e., digital preservation, documentation or available code)
•Project Leaders
•Funding Sources
•Level of Collaboration
•Analysis
•Star-Rating (i.e., Tim Berners-Lee's coffee cup)
Developing Datasets
Release one or more datasets in linked
open format, expressed as RDF triples,
that others may use.
Projects: Library of Congress;
Pan-Canadian Documentary Heritage Network
Linking Data
Cultural heritage institutions link their datasets
to others (e.g., DBpedia, VIAF, GeoNames) to
enhance discovery and reuse of
their collections.
Projects: Hungarian National Library;
Civil War 150; Linking Lives;
Bibliothèque nationale de France
Documenting Processes for
Reuse
Explain linked open data and ways
that cultural heritage professionals
can use datasets.
Projects: New York Times;
Deutsche Nationalbibliothek
Developing User Interfaces
Institutional or collaborative projects use
the datasets to develop applications, including
interfaces, visualizations, and augmented reality.
Projects: Agora; Pan-Canadian Documentary Heritage
Network; Amsterdam Mobile City App; Linked Jazz
Promoting Reuse
Institutions go beyond the creation
of their own test projects, encouraging
users to develop innovative applications.
Projects: Open Cultuur Data, EUScreen
Expanding the Definition
of Cultural Heritage
Efforts from outside the cultural
heritage framework, such as
government agencies and
international aid organizations,
can serve to strengthen societies
and their cultural institutions.
Project: Open Data for Resilience
Initiative
LINKED DATA LIFE CYCLES
Stage 1. Developing Datasets
Pan-Canadian Documentary Heritage Network
• Formed in 2010; highly collaborative effort across a broad spectrum
of LAMs.
• Pilot project results published July 2012:
• RDF metadata
• Detailed project report
• Demonstration video, “Out of the Trenches”
• Project content submitted in various formats:
• War songs (MARC records; BAnQ)
• War posters (spreadsheets; McGill)
• Newspaper articles, postcards, and wartime records (MODS XML; University of Alberta)
• Portrait archives of CEF soldiers; WWI documents (spreadsheets; University of Calgary)
• Archival material from Saskatchewan War Experience Project (DC RDF; University of
Saskatchewan)
• Use of external LOD datasets:
• GeoNames, VIAF, LCSH, TGM, RAMEAU, LACSH
• Metadata then mapped to ontologies (e.g., events, places,
persons)
• Principal findings:
• Good approach for resource integration and discovery
• Considered “reuse” in terms of using element sets in multiple
contexts (e.g., “role” as predicate or as object) and repurposing vocabularies
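The step of turning tabular contributions into triples can be sketched roughly as follows. The spreadsheet columns, the sample row, the `example.org` vocabulary terms, and the `rows_to_triples` helper are all hypothetical; the project's actual mappings are described in its report, not reproduced here.

```python
import csv
import io

EX = "http://example.org/"  # hypothetical namespace

# Hypothetical spreadsheet export; the column names and row are assumptions.
SHEET = "id,title,place\nposter-1,Sample war poster,Montreal\n"

# Hypothetical mapping from spreadsheet column to predicate URI.
COLUMN_TO_PREDICATE = {
    "title": EX + "vocab/title",
    "place": EX + "vocab/place",
}


def rows_to_triples(text):
    """Emit one (subject, predicate, object) triple per mapped, non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = EX + "item/" + row["id"]
        for column, predicate in COLUMN_TO_PREDICATE.items():
            if row.get(column):
                triples.append((subject, predicate, row[column]))
    return triples


for t in rows_to_triples(SHEET):
    print(t)
```

In practice, a value such as a place name would then be linked to an external identifier (for example in GeoNames) rather than left as a plain string.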
developing datasets – linking data – documenting processes – developing user interfaces – promoting reuse – expanding definitions
LIBRARY OF CONGRESS
Dereferenceable URI
Name Variants
Related Terms
Promotes existing
Library of Congress
resources to Linked
Open Data web
resources, uncovers
and connects
related names and
terms
LIBRARY OF CONGRESS
Multiple formats are
available for wider use
LC Classification Numbers
are related to each entry
Connects with and
acknowledges other
schemes
Stage 2. Linking Data
CIVIL WAR DATA 150
Project was designed to
encourage the contribution of a
wide variety of data sources:
from institutions to individuals
Partnership between The
Archives of Michigan, The
Internet Archive and Freebase
Celebrating the
sesquicentennial of the
American Civil War
CIVIL WAR DATA 150
Project Goals:
Create web apps to
enable users to add to or
modify shared metadata
with strong identifiers
Engage the public in the process of
interacting with and adding value to the data
Identify sources and map
metadata into Freebase
LOCAH and Linking Lives
• Projects of Archives Hub UK (http://archiveshub.ac.uk), which represents more than 220
institutions
• LOCAH (Linked Open Copac & Archives Hub; 2010-2011):
• Published data from Archives Hub finding aids and Copac, a union catalog of more than 70
major UK libraries
• Created LOD resources:
1. SPARQL endpoint
2. Query box for trying out SPARQL queries
3. RDF dump of the dataset
4. Archives Hub EAD to RDF XSLT stylesheet
• Linking Lives (2011-2012) expanded on LOCAH
• Test project focusing on biography
• Brought in more external datasets (DBpedia, VIAF,
Freebase, OpenLibrary, BBC Programmes, Linked Open
British National Bibliography)
• Developed interface model (wireframe)
• Principal findings:
• Even when expressed in triples, data may lack uniformity, requiring time-consuming clean-up
• Difficulty of firmly establishing identity when there are variant forms of names or identifying
roles (e.g., “author” vs. “writer”) and when different people have the same name
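The identity problem in these findings can be illustrated with a crude sketch that groups records by a normalized name key. The records, the `example.org` URIs, and the `name_key` and `group_variants` helpers are assumptions for illustration and do not describe how Linking Lives matched names.

```python
import unicodedata
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace


def name_key(name):
    """Crude matching key: drop accents and punctuation, lowercase, sort the parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(sorted(cleaned.lower().split()))


def group_variants(records):
    """Group (uri, name) records that share a key; groups are candidates, not proof."""
    groups = defaultdict(list)
    for uri, name in records:
        groups[name_key(name)].append(uri)
    return dict(groups)


# Hypothetical records from three datasets.
records = [
    (EX + "hub/1", "Smith, John"),
    (EX + "viaf/7", "John Smith"),
    (EX + "dbp/3", "John Smith"),
]
print(group_variants(records))
```

The key merges variant forms such as "Smith, John" and "John Smith", but it also merges two different people who share a name, which is why candidate matches still need dates, roles, or human review.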
Stage 3. Documenting Processes for Reuse
DEUTSCHE NATIONALBIBLIOTHEK
• Linked Data Service
• Library scientist led
• Authority names and
bibliographic data
• Downloadable dataset
• SRU and OAI-PMH interfaces
• Extensive documentation
THE NEW YORK TIMES
The OPEN Blog
Documents and contextualizes the APIs
Platform for sharing Open Source Code
Forum for troubleshooting and ideas
Downloadable SKOS Files
The entire dataset is downloadable
Developers can also choose by topic
Users are invited to utilize the datasets
and APIs through downloads,
documentation, support and explanation
of LOD terminology, code and uses
THE NEW YORK TIMES
Available APIs
Developer Network
The API Request Tool lets developers
search the extensive list of APIs and
set parameters for their search using
a widget. The tool then formats the
URL and requests the results
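What such a request tool does when it formats the URL can be sketched as below. The endpoint, the parameter names, and the `build_request_url` helper are hypothetical placeholders, not the actual New York Times API.

```python
from urllib.parse import urlencode


def build_request_url(base_url, params):
    """Append URL-encoded query parameters to an API endpoint."""
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and parameters, for illustration only.
url = build_request_url(
    "https://api.example.org/v1/search.json",
    {"q": "linked open data", "api-key": "YOUR_KEY"},
)
print(url)
```

Sending the request is then a single HTTP GET on the resulting URL; encoding the parameters first avoids broken requests when a search term contains spaces or punctuation.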
Stage 4. Developing User Interfaces
AUSTRALIAN WAR MEMORIAL
• Proof of concept
• Developer led
• Embedded RDF tags
• Page-based API
• No documentation or
downloadable dataset
THE AMSTERDAM MUSEUM
• Mobile app parses data
from the Amsterdam Museum
and linked ontologies
• Proposal for visual
interface that enables
user to become tour guide
• Current problem: search
and download speed
Out of the Trenches Demonstration Video
Subjects can be explored across a range of dimensions
Source: http://www.canadiana.ca/sites/pub.canadiana.ca/files/LOD-Demo-ENG_0.mp4
Stage 5. Promoting Reuse
OPEN CULTUUR DATA INITIATIVE
• Offered workshops on how cultural heritage orgs could open their
data
• Hosted hackathons to encourage developers to turn datasets into
apps
• Three award-winners:
• VISTORY (using LOD Open Images dataset)
• Rijksmonumenten.info
• Connected Collection
OPEN CULTUUR DATA INITIATIVE
Screenshot from
http://www.glimworm.com/vistory.shtml
EUSCREEN
• Linked Data Pilot
• International collaboration
• Open, international standards
• Downloadable datasets
• Fully documented
• Showcase of projects in blog
• Active in promoting reuse
Stage 6. Expanding the Definition of
Cultural Heritage
CONCLUSIONS
• (Most) LOD projects:
• Proof of concept
• No access to a dataset
• Not highly documented
• Highly curated
• Experimental
• Promising
• The number of LOD datasets continues to increase
• Actual use by cultural heritage institutions appears to remain limited
• Trust remains an obstacle
• Compare: “A guppy is_a_Kind_of fish” (TRUE)
“A pony is_a_Kind_of fish” (UNTRUE)
Computers see these as equally valid.
• Verifying or identifying the source of a statement may become a best practice
• Information added to triples?
“A guppy is_a_Kind_of fish [source] DBpedia”
• Published datasets hold great potential for making the content of an archive's collections
known
• Researcher studying Person A finds that a collection of Person X's letters includes letters
to or from Person A
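The suggestion of adding a source to each triple can be sketched as a four-part statement that consumers filter by provenance. The `Statement` type, the "unknown" source label, and the choice of trusted sources are assumptions made for this illustration only.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """A triple plus a fourth field recording where the claim came from."""
    subject: str
    predicate: str
    obj: str
    source: str


statements = [
    Statement("guppy", "is_a_Kind_of", "fish", "DBpedia"),
    Statement("pony", "is_a_Kind_of", "fish", "unknown"),  # hypothetical unsourced claim
]


def from_trusted_sources(stmts, trusted):
    """Keep only the statements whose source the consumer has chosen to trust."""
    return [s for s in stmts if s.source in trusted]


# Which sources count as trusted is the consumer's decision, assumed here.
kept = from_trusted_sources(statements, trusted={"DBpedia"})
print(kept)
```

Recording a source does not make a statement true; it only gives the person or program using the data a basis for deciding which statements to rely on.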
