Tutorial on Linked Data for Digital Libraries, given by me, Uldis Bojars, and Nuno Lopes in Valletta, Malta at TPDL2013 on 2013-10-22.
http://tpdl2013.upatras.gr/tut-lddl.php
This half-day tutorial is aimed at academics and practitioners interested in creating and using Library Linked Data. Linked Data has been embraced as the way to bring complex information onto the Web, enabling discoverability while maintaining the richness of the original data. This tutorial will offer participants an overview of how digital libraries are already using Linked Data, followed by a more detailed exploration of how to publish, discover and consume Linked Data. The practical part of the tutorial will include hands-on exercises in working with Linked Data and will be based on two main case studies: (1) linked authority data and VIAF; (2) place name information as Linked Data.
For practitioners, this tutorial provides a greater understanding of what Linked Data is, and how to prepare digital library materials for conversion to Linked Data. For researchers, this tutorial updates the state of the art in digital libraries, while remaining accessible to those learning Linked Data principles for the first time. For library and iSchool instructors, the tutorial provides a valuable introduction to an area of growing interest for information organization curricula. For digital library project managers, this tutorial provides a deeper understanding of the principles of Linked Data, which is needed for bespoke projects that involve data mapping and the reuse of existing metadata models.
3. Schedule for the day
9:00 - Introduction of presenters, tutorial schedule, and learning outcomes
9:10 - Motivation and concepts of Linked Data
9:30 - Discuss: How would you envision using Linked Data in your institution?
9:45 - Lifecycle of Linked Data & Exploring Linked Data
10:10 - Case Study 1: Authority Data
10:30 - Coffee break (until 11:00)
11:00 - Recap
11:10 - Modelling data as Linked Data
11:30 - Case Study 2: Geographical Linked Data
11:50 - Choice of Hands-on Activities
12:25 - Conclusions
4. Hands-on Activities
11:50 – 12:25
Choice of Activities….
• Data Modelling
• Data Cleaning & Structuring
• Querying (SPARQL)
5. Please share your expertise!
• In the room
• On paper
• Online - shared folder:
http://tinyurl.com/tpdl2013-ld-notes
– PDF of the programme
– Shared notes
– More materials later
6. Objectives for Today
• What is Linked Data? Why use it?
• What are some examples of
Linked Data in Digital Libraries?
• What are the best practices for
exploring & creating Linked Data?
22. How Linked Data works
Reuses the existing Web infrastructure to publish your
data along with your documents:
– Using URI identifiers
– and HTTP for accessing the information
23. Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up
those names.
3. When someone looks up a URI, provide
useful information, using the standards
(RDF, SPARQL)
4. Include links to other URIs, so that they can
discover more things.
http://www.w3.org/wiki/LinkedData
http://www.w3.org/DesignIssues/LinkedData
24. Data on the Web is not enough…
• We need a proper infrastructure for a real
Web of Data
– data is available on the Web
• accessible via standard Web technologies
– data is interlinked over the Web
– i.e., data can be integrated over the Web
• We need Linked Data
Slide credit: Ivan Herman
25. In groups of 2-3: Discuss
• How would you envision using Linked
Data? What are the opportunities?
• Is your institution already using Linked Data?
Planning a Linked Data project?
27. Lifecycle of Linked Data
• Find
• Explore
• Transform
• Model
• Store
• Query
• Interlink
• Publish
28. Semantic Web for Digital Libraries
Exploring Linked Data
(Practical Tools and Approaches)
Uldis Bojars, Nuno Lopes, & Jodi Schneider
29. Objectives
• Learn about Linked Data (LD) by looking
at existing data sources
• Discover tools and approaches for
exploring Linked Data
30. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
http://lod-cloud.net/
31. Exploring Linked Data
• Discovering Linked Data
• Accessing RDF data
• Making sense of the data
– Validating RDF data
– Converting between formats
– Browsing Linked Data
• Querying RDF data
33. What RDF looks like
• RDF can be expressed in a number of formats:
– some are good for machines;
– some are understandable to people
• Common formats:
– RDF/XML – common, but difficult to read
– N-Triples – a simple list of RDF triples
– Turtle – human-readable, easier to understand
• Can be represented visually
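As a rough illustration of how the formats above differ, the following Python sketch writes the same two triples first as N-Triples and then as Turtle. The helper names (`term`, `to_ntriples`, `to_turtle`) are hypothetical, the data is borrowed from the FOAF example in this deck, and a real project would normally use an RDF library instead of string formatting.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each triple is (subject URI, predicate URI, object). The object is either
# a URI string or a ("literal", text) pair.
triples = [
    ("http://twitter.com/ivan_herman", RDF_TYPE, FOAF + "OnlineAccount"),
    ("http://twitter.com/ivan_herman", FOAF + "accountName", ("literal", "ivan_herman")),
]


def term(obj):
    """Write a URI as <...> and a literal as "..." (no escaping: sketch only)."""
    if isinstance(obj, tuple):
        return '"%s"' % obj[1]
    return "<%s>" % obj


def to_ntriples(triples):
    """One full triple per line, every URI spelled out."""
    return "\n".join("%s %s %s ." % (term(s), term(p), term(o)) for s, p, o in triples)


def to_turtle(triples, prefix="foaf", namespace=FOAF):
    """Turtle with one prefix; assumes all triples share the first subject."""
    def short(obj):
        if obj == RDF_TYPE:
            return "a"
        if isinstance(obj, str) and obj.startswith(namespace):
            return prefix + ":" + obj[len(namespace):]
        return term(obj)

    subject = triples[0][0]
    body = " ;\n    ".join("%s %s" % (short(p), short(o)) for _, p, o in triples)
    return "@prefix %s: <%s> .\n\n<%s>\n    %s ." % (prefix, namespace, subject, body)


print(to_ntriples(triples))
print(to_turtle(triples))
```

The N-Triples output repeats every full URI on every line, which is simple for machines to process; the Turtle output groups statements about one subject and abbreviates URIs with a prefix, which is why it tends to be easier for people to read.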
34. Accessing RDF data
RDF data on the Web can be found as:
• Linked Data
– follow links, request data by URI
– returned data can be in various RDF formats
• Data dumps
– download the data
• SPARQL endpoints
– query Linked Data (more on that later)
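Requesting data by URI usually means sending an ordinary HTTP GET with an `Accept` header that names the RDF format you would like. The sketch below uses only the Python standard library; the function names are hypothetical, the VIAF URI is the one used later in this deck, and whether a given server actually returns the requested format is not guaranteed.

```python
import urllib.request


def build_rdf_request(uri, accept="text/turtle"):
    """Prepare (but do not send) an HTTP GET request that asks for RDF."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch_rdf(uri, accept="text/turtle", timeout=10):
    """Send the request and return (content type, body text).

    Needs network access; the server may answer with a different format.
    """
    request = build_rdf_request(uri, accept)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", "replace")
        return response.headers.get("Content-Type"), body


# Building the request does not touch the network.
request = build_rdf_request("http://viaf.org/viaf/44427367/")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `fetch_rdf` on the same URI with `accept="application/rdf+xml"` would ask for RDF/XML instead; checking the returned content type shows what the server actually sent.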
36. Discovering Linked Data
a) find a link in a Web page
b) have some tools alert you that Linked Data is there
– Tabulator
– Semantic Radar
c) explore a project you heard about
– and know LOD should be there
d) use a registry of sources
http://datahub.io/group/lodcloud
e) just ask someone
37. RDF discovery example
• data at Ivan Herman’s page can be found via:
– finding the RDF icon (with the link to FOAF file)
– letting browser tools alert you that RDF is present
• RDF auto-discovery
– extracting RDFa data embedded in the page
• for other data sources RDF content
negotiation might work
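RDF auto-discovery, as used by browser tools, can be approximated by scanning a page for `<link>` elements whose `type` is an RDF media type. The sketch below is a minimal version using the standard-library HTML parser; the class and function names are hypothetical, and the sample markup is an invented page head that only borrows the FOAF file URL mentioned in this deck.

```python
from html.parser import HTMLParser

# Media types treated as "RDF is here" (an assumption; extend as needed).
RDF_TYPES = {"application/rdf+xml", "text/turtle"}


class RDFLinkFinder(HTMLParser):
    """Collects href values of <link> elements that advertise RDF."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        if attributes.get("type") in RDF_TYPES and attributes.get("href"):
            self.links.append(attributes["href"])


def find_rdf_links(html):
    finder = RDFLinkFinder()
    finder.feed(html)
    return finder.links


# Hypothetical page head, for illustration only.
sample = """<html><head>
<link rel="stylesheet" type="text/css" href="style.css">
<link rel="meta" type="application/rdf+xml" title="FOAF"
      href="http://www.ivan-herman.net/foaf.rdf">
</head><body></body></html>"""

print(find_rdf_links(sample))
```

This only covers linked RDF files. RDFa embedded in the page itself needs an RDFa-aware extractor such as the RDFa distiller.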
38. Making sense of the data
• Validating RDF data
– Ensures that data representation is correct
• Converting between formats
– Convert to a [more] human-readable RDF format
• Browsing Linked Data
– Browse the data without worrying about
“reading” RDF
39. Validating and Converting RDF
• W3C RDF validator
http://www.w3.org/RDF/Validator/
• URI debugger – “Swiss Army knife” of Linked Data
http://linkeddata.informatik.hu-berlin.de/uridbg/
• RDFa distiller – extracts RDF embedded in web
pages
http://www.w3.org/2012/pyRdfa/
• Command-line tools (we’ll return to that)
40. <http://www.ivan-herman.net/> a foaf:PersonalProfileDocument;
    dc:creator "Ivan Herman";
    dc:date "2009-06-17"^^xsd:date;
    dc:title "Ivan Herman’s home page";
    xhv:stylesheet <http://www.ivan-herman.net/Style/gray.css>;
    foaf:primaryTopic <http://www.ivan-herman.net/foaf#me> .
<http://twitter.com/ivan_herman> a foaf:OnlineAccount;
    foaf:accountName "ivan_herman";
    foaf:accountServiceHomepage <http://twitter.com/> .
<http://www.ivan-herman.net/cgi-bin/rss2to1.py> a rss:channel .
<http://www.ivan-herman.net/foaf#me> a dc:Agent,
        foaf:Person;
    rdfs:seeAlso <http://www.ivan-herman.net/AboutMe>,
        <http://www.ivan-herman.net/cgi-bin/rss2to1.py>,
        <http://www.ivan-herman.net/foaf.rdf>;
    ...
Extracted from http://www.ivan-herman.net/ using RDFa Distiller
49. Tool catalogues: many more tools
• Collection of tools from other projects
– http://www.w3.org/2001/sw/wiki/LLDtools
– http://www.w3.org/2001/sw/wiki/Tools
– http://semanticweb.org/wiki/Tools
– http://dbpedia.org/Applications
50. Interesting Projects
• LOCAH
A stylesheet that transforms UK Archives Hub EAD to RDF/XML, with
examples of the process using XSLT
http://data.archiveshub.ac.uk/ead2rdf/
• AliCAT (Archival Linked-data Cataloguing)
Tool for editing collection level records
http://data.aim25.ac.uk/step-change/
• Axiell CALM
Solution for LAM that includes Linked Data functionality, allowing
archivists to tag their collections with URIs from any chosen Linked
Dataset.
http://www.axiell.com/calm
51. Tools for Converting MARC records
• MariMba
Tool to translate MARC to RDF and Linked Data
http://mayor2.dia.fi.upm.es/oegupm/index.php/en/downloads/228-marimba
• marcauth-2-madsrdf
XQuery utility to convert MARC/XML Authority
records to MADS/RDF and SKOS resources
https://github.com/kefo/marcauth-2-madsrdf
52. Tools for museum curators
• Karma (http://isi.edu/integration/karma/)
was used to map the records of the Smithsonian
American Art Museum to RDF and link them to the
Web and the Linked Open Data Cloud.
Demo:
http://www.youtube.com/watch?v=kUIqTI56oeQ
55. • Use a single, distinct name for
each person, organization, …
• Name is consistently used
throughout library systems
• Issues:
– “Strings” not “things”
– in Linked Data world we’d just use
URIs
57. VIAF
• Virtual International Authority File (viaf.org)
• Integrating authority information from
a number of national libraries
– Linked data + links to related information
• Matching authority data from multiple sources
– using related bibliographic records to help matching
59. Wikipedia + VIAF
• How can people discover useful information
in VIAF and via VIAF?
• Linked Data eco-system – let’s explore (!)
– Wikipedia -> VIAF -> National Library LD
• Example (Andrejs Pumpurs):
– http://en.wikipedia.org/wiki/Andrejs_Pumpurs
– http://viaf.org/viaf/44427367/
64. How did VIAF get into Wikipedia?
• VIAFbot
– algorithmically matched by name, important
dates, and selected works
• “The principal benefit of VIAFbot is the
interconnected structure.” -
68. VIAF – summary:
– an efficient way of putting library authority data
online as Linked Data
– if the organization also publishes Linked Data
itself, links can be added to VIAF that point back to
the organization’s LD records (which may contain
richer / additional information)
70. Publishing Data
• Naïve Transform
– Direct Mapping of Relational Data to RDF
See RDB2RDF
OR
• Model & Transform
– Figure out how to represent data
– Then transform according to the model
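To make the "naïve transform" option concrete, here is a minimal Python sketch of a direct mapping: every row becomes a subject URI, and every column becomes a predicate. It is only loosely in the spirit of the W3C Direct Mapping and does not follow its exact URI rules; the function name, the `http://example.org/` base, and the sample row are all hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(table, rows, key, base="http://example.org/"):
    """Turn relational rows into (subject, predicate, object) triples.

    table: table name, reused as the class and in the URIs
    rows:  list of dicts, one per row
    key:   name of the primary-key column
    base:  hypothetical base URI for everything that is minted
    """
    triples = []
    for row in rows:
        subject = "%s%s/%s" % (base, table, row[key])
        triples.append((subject, RDF_TYPE, base + table))
        for column, value in row.items():
            if column == key or value in (None, ""):
                continue  # the key is already in the URI; skip empty cells
            predicate = "%s%s#%s" % (base, table, column)
            triples.append((subject, predicate, str(value)))
    return triples


# Invented sample row, for illustration only.
rows = [{"id": 1, "title": "Sample map", "year": 1800}]
triples = direct_map("maps", rows, key="id")
for triple in triples:
    print(triple)
```

The result mirrors the database schema exactly, which is the weakness of the naïve approach: the predicates are local inventions rather than terms from shared vocabularies. The "Model & Transform" route replaces them with vocabulary terms chosen during modelling.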
71. Model
• Describe the domain
– What are the important concepts?
– What are their properties?
– What are their relations?
• Choose vocabularies
73. Deciding on URI patterns
• Use a domain that you control
• Use consistent patterns
• Manage change: transparent isn’t always best
• Consider what concepts are worth
distinguishing
74. Example URI patterns
• Designing URI Sets for the UK Public Sector
• Defines patterns for
– Identifier URI
– Document URI
– Representation URI
• Identifier example:
http://{domain}/id/{concept}/{reference}
http://data.archiveshub.ac.uk/id/person/ncarules/skinnerbeverley1938-1999artist
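A consistent URI pattern is easiest to keep consistent when it is generated by one small function. The sketch below fills in the identifier pattern shown above; the function name is hypothetical, and treating "person" as the concept and "ncarules/..." as the reference is an assumption about how the Archives Hub example is split.

```python
from urllib.parse import quote


def identifier_uri(domain, concept, reference):
    """Fill in http://{domain}/id/{concept}/{reference}.

    Path segments are percent-encoded; "/" is kept so that a reference
    may itself contain sub-paths (an assumption made for this sketch).
    """
    return "http://%s/id/%s/%s" % (
        domain,
        quote(concept, safe="/"),
        quote(reference, safe="/"),
    )


uri = identifier_uri(
    "data.archiveshub.ac.uk",
    "person",
    "ncarules/skinnerbeverley1938-1999artist",
)
print(uri)
```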
75. Choosing Vocabularies
• Audience & Purpose
– e.g. search engine vs. bibliographic exchange
• Domain
– Biomedical, geographical, …
• Granularity
• Popularity: potential for interlinking & reuse
80. Look at Publications & Lists
http://www.w3.org/2005/Incubator/lld/XGR-lldvocabdataset-20111025/
81. Ask the community
• Mailing lists
– LOD-LAM
– Code4Lib
– OKFN Open-Bibliography Working Group
– W3C Schema.org BibEx Community Group
• Domain-specific Linked Data groups & lists
84. Modeling spectrum:
lightweight to heavyweight
An ontology “spectrum” (in order of complexity).
Source: [Lassila and McGuinness, 2001]. Image from Bojars 2009
90. British Library Data Model - Book
@prefix blt: <http://www.bl.uk/schemas/bibliographic/blterms#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix isbd: <http://iflastandards.info/ns/isbd/elements/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix rda: <http://rdvocab.info/ElementsGr2/> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix event: <http://purl.org/NET/c4dm/event.owl#> .
@prefix org: <http://www.w3.org/ns/org#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
[Diagram: British Library Data Model for books. Group labels recoverable from the diagram: Resource, Title, Identifiers, Subject, Author, Agent, Place, Series, Publication Events, Miscellaneous literals. Key: A Class, An Instance, A Literal, External Link.]
Notes on the diagram:
– All properties with a range of blt:PublicationEvent can be used with blt:PublicationStartEvent and blt:PublicationEndEvent. Arrows omitted for clarity.
– Assume that most instance data will have an rdfs:label. These properties have been omitted for clarity.
V.1.4 August 2012
Tim Hodson - tim.hodson@talis.com
Corine Deliot - Corine.Deliot@bl.uk
Alan Danskin - Alan.Danskin@bl.uk
Heather Rosie - Heather.Rosie@bl.uk
Jan Ashton - Jan.Ashton@bl.uk
British Library Data Model
http://www.bl.uk/bibliographic/pdfs/bldatamodelbook.pdf
91. Semantic Web for Digital Libraries
Geographical LD case study
Uldis Bojars, Nuno Lopes, & Jodi Schneider
92. The NLI Longfield Map Collection
• Collections refer to geographical data in many forms…
• The Longfield Maps are a set of 1,570 surveys carried out in
Ireland between 1770 and 1840.
• Currently catalogued in MARCXML, using data from
Logainm, GeoNames and DBpedia.
94. Geographic Data Providers
DBpedia
– Includes latitude and longitude for geographic entities
LinkedGeoData
– Export of data from OpenStreetMap
– Beyond lat/lon (areas as polygons)
GeoNames
– Access data as RDF (download requires subscription)
GeoLinkedData
– Spain
Ordnance Survey
– UK
95. Logainm.ie
• The authority list of Irish place
names, validated by the Place Names
Branch.
• Provides a more detailed level of coverage than
DBpedia or GeoNames.
• Unique source of Irish-language place
names.
• NLI is looking to integrate Logainm data into
its workflow, allowing searches for
place names in Irish.
96. Geo-Vocabularies
• W3C Geo (very basic)
– SpatialThing, latitude and longitude
• Most providers have defined their own
• NeoGeo (http://geovocab.org/doc/neogeo/)
– Feature vs Geometry
– Spatial Relations (is_part_of)
97. NeoGeo Overview
• Classes
– Feature (spatial:Feature)
• A geographical feature, capable of holding spatial
relations.
– Geometry (geom:Geometry)
• Super-class of all geometrical representations
(RDF, KML, GML, WKT...).
• Connected by the geometry property (geom:geometry)
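The Feature/Geometry split can be sketched as a handful of triples: the place is a Feature, its shape or position is a separate Geometry resource, and geom:geometry connects the two. In the Python sketch below the two NeoGeo namespace URIs are given from memory and should be checked against the vocabulary documentation; the function name, the `#geometry` URI pattern, the example.org place and the coordinates are all hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"   # W3C Geo
SPATIAL = "http://geovocab.org/spatial#"           # NeoGeo spatial (assumed URI)
GEOM = "http://geovocab.org/geometry#"             # NeoGeo geometry (assumed URI)


def describe_place(feature_uri, lat, lon):
    """Describe a place as a Feature with a separate point Geometry."""
    geometry_uri = feature_uri + "#geometry"  # hypothetical URI pattern
    return [
        (feature_uri, RDF_TYPE, SPATIAL + "Feature"),
        (feature_uri, GEOM + "geometry", geometry_uri),
        (geometry_uri, RDF_TYPE, GEOM + "Geometry"),
        (geometry_uri, GEO + "lat", str(lat)),
        (geometry_uri, GEO + "long", str(lon)),
    ]


# Placeholder place and coordinates, for illustration only.
triples = describe_place("http://example.org/place/1", 53.0, -6.0)
for triple in triples:
    print(triple)
```

Keeping the geometry as its own resource means a place can later gain further representations (for example a polygon) without changing the statements made about the feature itself.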
99. Creating a LD Dataset
Steps:
1. Data transformation / access
• Vocabulary assessment
2. Link Discovery
• Evaluation of generated links
3. Deployment
• Virtuoso OpenSource
100. Converting Logainm to RDF
~100,000 place names
~1.3M triples
[Diagram example: Dublin, with http://data.logainm.ie/1375542 and http://sws.geonames.org/2964574/]
101. Link Discovery
• Silk
– http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
• LIMES
– http://aksw.org/Projects/LIMES.html
• Based on specifying rules that compare pairs
of entities
102. Rules to discover links to other datasets
• Rules based on:
– Place names
– Geographical coordinates
– Name of the county / parent place name
– Hierarchy of places
• # entities matched:
– DBpedia: 1,552
– LinkedGeoData: 6,611
– GeoNames: 8,229
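A linkage rule of the kind listed above can be sketched in plain Python: two records match when their names are similar enough and their coordinates are close enough. This is not the rule syntax of Silk or LIMES, which have their own specification languages; the thresholds, function names and sample records (with rough, invented coordinates) are assumptions for illustration.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (haversine formula), in kilometres."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def is_match(a, b, min_similarity=0.9, max_km=5.0):
    """Rule: similar place name AND nearby coordinates (assumed thresholds)."""
    return (
        name_similarity(a["name"], b["name"]) >= min_similarity
        and distance_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    )


# Hypothetical records from two datasets.
source = {"name": "Dublin", "lat": 53.35, "lon": -6.26}
near = {"name": "dublin", "lat": 53.34, "lon": -6.27}
far = {"name": "Dublin", "lat": 40.10, "lon": -83.11}

print(is_match(source, near), is_match(source, far))
```

The second comparison shows why the name alone is not enough: identical place names on different continents are rejected by the coordinate condition. Rules based on the county or the place hierarchy serve the same purpose when coordinates are missing.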
105. Hands-on Activities
11:50 – 12:25
Choice of Activities….
• Data Modelling
• Data Cleaning & Structuring
• Querying (SPARQL)
106. Semantic Web for Digital Libraries
Open Refine Exercise
Uldis Bojars, Nuno Lopes, & Jodi Schneider
107. Open Refine
• Useful for batch transformation of large amounts
of data
– data cleanup (misspellings, splitting multiple-valued
columns, …)
• Linking to other databases
– Freebase
– Any SPARQL enabled LD
• Website: http://openrefine.org/
• RDF extension: http://refine.deri.ie/
108. Exercise
• Examples from: http://freeyourmetadata.org/
• Sample Data (collection metadata from the
Sydney Powerhouse Museum):
http://data.freeyourmetadata.org/powerhouse-museum/phm-collection.zip
• Screencast:
http://www.youtube.com/watch?v=NnCA1dnCT-c
109. Task 1 - Data Cleanup
1. Import the collection into OpenRefine
2. Get to know your data
3. Remove blank rows
4. Remove duplicate rows
5. Split cells with multiple values
6. Remove blank cells
7. Cluster values
8. Remove double category values
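For readers who want to see what several of these cleanup steps amount to outside OpenRefine, the Python sketch below removes blank rows, removes duplicate rows, splits a multi-valued cell, and drops blank and repeated values. It does not cover clustering. The column name `Categories`, the `|` separator and the sample rows are assumptions for illustration, not the actual layout of the Powerhouse Museum file.

```python
def clean(rows, multi_valued="Categories", separator="|"):
    """Apply a few of the cleanup steps to a list of row dicts."""
    seen = set()
    cleaned = []
    for row in rows:
        # Remove blank rows: every cell is empty or whitespace.
        if not any((value or "").strip() for value in row.values()):
            continue
        # Remove duplicate rows: identical column/value pairs seen before.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Split the multi-valued cell, then drop blank and repeated values.
        values = []
        for value in (row.get(multi_valued) or "").split(separator):
            value = value.strip()
            if value and value not in values:
                values.append(value)
        cleaned.append({**row, multi_valued: values})
    return cleaned


# Invented sample rows, for illustration only.
rows = [
    {"Title": "Teapot", "Categories": "Ceramics|Teapots||Ceramics"},
    {"Title": "Teapot", "Categories": "Ceramics|Teapots||Ceramics"},
    {"Title": "", "Categories": ""},
]

print(clean(rows))
```

OpenRefine performs the same operations interactively and records them as an undoable history, which is why it suits exploratory cleanup better than a one-off script.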
110. Task 2 - Data Reconciliation & RDF Export
1. Pick a column to reconcile
2. Pick a vocabulary to reconcile with
3. Tell OpenRefine about the vocabulary
4. Start the reconciliation process
5. Understanding the reconciliation results
6. Interpreting the new reconciliation results
7. Exporting RDF
111. Semantic Web for Digital Libraries
SPARQL Hands-on Session
Uldis Bojars, Nuno Lopes, & Jodi Schneider
112. SPARQL
• Query Language for RDF data
• W3C Standard
• Components of a SPARQL Query:
– Prefix Declarations
– Result type (SELECT, CONSTRUCT, DESCRIBE, ASK)
– Dataset
– Query pattern
– Solution modifiers
114. SPARQL by example – Europeana Endpoint
Endpoint: http://europeana.ontotext.com/sparql
1. SPARQL Select template
2. List of data providers having contributed
content to Europeana
3. List of provided objects with their aggregators
4. 18th century Europeana objects from France
5. Write your own
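The components of a SPARQL query are easier to see in a small example. The Python sketch below holds a generic SELECT query, with comments marking the prefix declarations, result type, query pattern and solution modifier, and builds the GET URL defined by the SPARQL protocol (a `query` parameter). The query is a generic illustration rather than one of the tutorial's exercise queries, there is no dataset (FROM) clause, the function name is hypothetical, and the Europeana endpoint listed above may no longer be online.

```python
from urllib.parse import urlencode

ENDPOINT = "http://europeana.ontotext.com/sparql"  # from the slide; may be offline

QUERY = """\
# Prefix declarations
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Result type: SELECT returns a table of variable bindings
SELECT DISTINCT ?type
# Query pattern: triple patterns to match against the data
WHERE { ?resource rdf:type ?type }
# Solution modifier: return at most 10 rows
LIMIT 10
"""


def query_url(endpoint, query):
    """Build the GET URL for a SPARQL protocol request (nothing is sent)."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, QUERY)
print(url)
```

Listing the distinct types used in a dataset is a common first query against an unfamiliar endpoint, since it shows which kinds of things can be asked about before writing anything more specific.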
Editor's Notes
USB stick OR online / Google Doc: data for the hands-on activities
http://www.europeana.eu/
“The Evergreen and Koha integrated library systems now express their record details in the schema.org vocabulary out of the box using RDFa.”
http://www.coffeecode.net/archives/271-RDFa-and-schema.org-all-the-library-things.html
http://koha-community.org/
http://evergreen-ils.org
http://aims.fao.org/aos/agrovoc/
http://projecthydra.org/
Rob Styles at Code4Lib 2008
http://code4lib.org/conference/2008/styles
Semantic Marcup
http://dynamicorange.com/uploads/Semantic%20Marcup.pdf
Using identifiers: to enable access, to add structure, to link to other stuff
http://en.wikipedia.org/wiki/SPARQL
we’re running a bit ahead when mentioning
http://en.wikipedia.org/wiki/SPARQL
Flow of presentation:
- show this
- then show RDF Validator slides (saying we discovered the URI from (a) webpage or (b) extracted RDFa data)
show SPARQL query assistants (or the link Jodi found where there are a number of example queries)
was it Europeana data?
provide links to further info re. SPARQL
or just refer to the break-out session
Many, many tools – best to ask other people what they can recommend
MARC2SKOS
XQuery utility to convert MARC/XML Authority records to MADS/RDF and SKOS resources: https://github.com/kefo/marcauth-2-madsrdf
Dublin Core to RDF crosswalk: http://dublincore.org/documents/dc-rdf/
OAI-PMH RDFizer: converts the metadata from an OAI-PMH-capable repository to RDF. http://simile.mit.edu/wiki/OAI-PMH_RDFizer
Library Data in Wikipedia & Wikidata
Maximilian Klein, Wikipedian in Residence at OCLC
http://www.slideshare.net/oclcr/viaf-data-in-wikipedia-and-wikidata
Quote from http://hangingtogether.org/?p=2306