TPDL2013 tutorial linked data for digital libraries 2013-10-22

Tutorial on Linked Data for Digital Libraries, given by me, Uldis Bojars, and Nuno Lopes in Valletta, Malta at TPDL2013 on 2013-10-22.

This half-day tutorial is aimed at academics and practitioners interested in creating and using Library Linked Data. Linked Data has been embraced as the way to bring complex information onto the Web, enabling discoverability while maintaining the richness of the original data. This tutorial will offer participants an overview of how digital libraries are already using Linked Data, followed by a more detailed exploration of how to publish, discover and consume Linked Data. The practical part of the tutorial will include hands-on exercises in working with Linked Data and will be based on two main case studies: (1) linked authority data and VIAF; (2) place name information as Linked Data.
For practitioners, this tutorial provides a greater understanding of what Linked Data is, and how to prepare digital library materials for conversion to Linked Data. For researchers, this tutorial brings them up to date on the state of the art in digital libraries, while remaining accessible to those learning Linked Data principles for the first time. For library and iSchool instructors, the tutorial provides a valuable introduction to an area of growing interest for information organization curricula. For digital library project managers, this tutorial provides a deeper understanding of the principles of Linked Data, which is needed for bespoke projects that involve data mapping and the reuse of existing metadata models.

  • USB stick OR online Google Doc: data for the hands-on activities
  • “The Evergreen and Koha integrated library systems now express their record details in the vocabulary out of the box using RDFa.”
  • Rob Styles at Code4Lib 2008
  • Using identifiers to enable access, to add structure, to link to other stuff
  •’re running a bit ahead when mentioning
  • Flow of presentation: - show this - then show RDF Validator slides (saying we discovered the URI from (a) webpage or (b) extracted RDFa data)
  • RDF validator
  • querying too?
  • show SPARQL query assistants (or the link Jodi found where there are a number of example queries). Was it Europeana data? Provide links to further info re. SPARQL, or just refer to the break-out session
  • Many, many tools – best to ask other people what they can recommend
  • MARC2SKOS • XQuery utility to convert MARC/XML Authority records to MADS/RDF and SKOS resources • Dublin Core to RDF crosswalk • RDFizer converts the metadata from an OAI-PMH-capable repository to RDF.
  • Alphabets, diacritics
  • Library Data in Wikipedia & Wikidata: Maximilian Klein, Wikipedian in Residence at OCLC, from
  • via and copying link patterns
  • Linked Data is lightweight: RDF vocabularies. Less focus on constraints (e.g. OWL ontologies)
  • via and copying link patterns
  • Transcript

    • 1. Linked Data for Digital Libraries – Uldis Bojars, Nuno Lopes, & Jodi Schneider – TPDL 2013, September 22, 2013, Valletta, Malta
    • 2. Nuno: Digital Repository of Ireland & DERI • Uldis: National Library of Latvia • Jodi: DERI
    • 3. Schedule for the day 9:00 - Introduction of presenters, tutorial schedule, and learning outcomes 9:10 - Motivation and concepts of Linked Data 9:30 - Discuss: How would you envision using Linked Data in your institution? 9:45 - Lifecycle of Linked Data & Exploring Linked Data 10:10 - Case Study 1: Authority Data 10:30 – 11 COFFEE BREAK 11:00 - Recap 11:10 - Modelling data as Linked Data 11:30 - Case Study 2: Geographical Linked Data 11:50 - Choice of Hands-on Activities 12:25 - Conclusions
    • 4. Hands-on Activities 11:50 – 12:25 Choice of Activities…. • Data Modelling • Data Cleaning & Structuring • Querying (SPARQL)
    • 5. Please share your expertise! • In the room • On paper • Online - shared folder: – PDF of the programme – Shared notes – More materials later
    • 6. Objectives for Today • What is Linked Data? Why use it? • What are some examples of Linked Data in Digital Libraries? • What are the best practices for exploring & creating Linked Data?
    • 7. Motivation and concepts of Linked Data
    • 8. What is Linked Data? • Using identifiers • to enable access • to add structure • to link to other stuff
    • 9. Why use Linked Data?
    • 10. Key technology for library data! Representing Publishing Exchanging
    • 11. • Powerful querying • Ability to mix/match vocabularies • Same technology stack as everybody else – Findability – Interoperability
    • 12. Who is using Linked Data?
    • 13. Aggregators
    • 14. Integrated Library Systems & OPACs
    • 15. Thesauri
    • 16. Repositories
    • 17. What is Linked Data (redux)?
    • 18. Rob Styles
    • 19. Towards RDF Subject Predicate Object
    • 20. RDF triple Subject Predicate Object
    • 21. RDF graph
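To make the subject–predicate–object idea concrete, here is a minimal plain-Python sketch (no RDF library assumed; a real project would more likely use one such as rdflib). A triple is a tuple, a graph is a set of triples, and following a link means using one triple's object as the next triple's subject. The example.org URIs and the book node are hypothetical; the FOAF and DC Terms namespaces are the real vocabularies.

```python
# Minimal sketch: an RDF graph as a set of (subject, predicate, object) tuples.
# example.org URIs are hypothetical; FOAF and DC Terms namespaces are real.
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"

PERSON = "http://example.org/id/person/1"
BOOK = "http://example.org/id/book/7"

graph = {
    (PERSON, FOAF + "name", "Andrejs Pumpurs"),
    (BOOK, DCT + "creator", PERSON),      # object is a URI: an edge to another node
    (BOOK, DCT + "title", "Lāčplēsis"),   # object is a literal: a value, not a node
}

def objects(g, subject, predicate):
    """All objects reachable from `subject` along `predicate`."""
    return {o for (s, p, o) in g if s == subject and p == predicate}

def edges(g, subject):
    """The (predicate, object) pairs leaving one node of the graph."""
    return {(p, o) for (s, p, o) in g if s == subject}

# Follow a link: book -> creator URI -> creator's name
for creator in objects(graph, BOOK, DCT + "creator"):
    print(objects(graph, creator, FOAF + "name"))  # {'Andrejs Pumpurs'}
```

Because nodes are identified by URIs rather than by position in a record, merging two such sets from different sources is just a set union.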
    • 22. How Linked Data works Reuses the existing Web infrastructure to publish your data along with your documents: – Using URI identifiers – and HTTP for accessing the information
    • 23. Linked Data Principles 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). 4. Include links to other URIs, so that they can discover more things.
    • 24. Data on the Web is not enough… • We need a proper infrastructure for a real Web of Data – data is available on the Web • accessible via standard Web technologies – data is interlinked over the Web – i.e., data can be integrated over the Web • We need Linked Data Slide credit: Ivan Herman
    • 25. In groups of 2-3: Discuss • How would you envision using Linked Data? What are the opportunities? • Is your institution already using Linked Data? Planning a Linked Data project?
    • 26. Lifecycle of Linked Data
    • 27. Lifecycle of Linked Data • Find • Explore • Transform • Model • Store • Query • Interlink • Publish
    • 28. Semantic Web for Digital Libraries Exploring Linked Data (Practical Tools and Approaches) Uldis Bojars, Nuno Lopes, & Jodi Schneider
    • 29. Objectives • Learn about Linked Data (LD) by looking at existing data sources • Discover tools and approaches for exploring Linked Data
    • 30. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
    • 31. Exploring Linked Data • Discovering Linked Data • Accessing RDF data • Making sense of the data – Validating RDF data – Converting between formats – Browsing Linked Data • Querying RDF data
    • 32. RDF graph
    • 33. What RDF looks like • RDF can be expressed in a number of formats: – some are good for machines, some are understandable to people • Common formats: – RDF/XML – common, but difficult to read – N-Triples – a simple list of RDF triples – Turtle – human-readable, easier to understand • Can be represented visually
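N-Triples is simple enough that a serializer fits in a few lines, which shows why it is convenient for machines: every line is one complete triple with full IRIs and a terminating " .". This is a sketch under one loudly stated assumption: any value starting with http:// or https:// is treated as an IRI and everything else as a plain string literal (no language tags or datatypes). The example URI is hypothetical.

```python
def term(value: str) -> str:
    """Serialise one RDF term. Assumption: values starting with http(s):// are IRIs,
    everything else is a plain string literal (no language tag, no datatype)."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    # Escape backslash first, then quotes and line breaks, as N-Triples literals require
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples) -> str:
    """N-Triples: one triple per line, full IRIs, each line ending in ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in sorted(triples))

print(to_ntriples([
    ("http://example.org/id/person/1", "http://xmlns.com/foaf/0.1/name", "Ivan Herman"),
]))
# <http://example.org/id/person/1> <http://xmlns.com/foaf/0.1/name> "Ivan Herman" .
```

Turtle would express the same triple more readably with prefixes (e.g. `foaf:name`), which is why it is the usual choice for people.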
    • 34. Accessing RDF data RDF data on the Web can be found as: • Linked Data – follow links, request data by URI – returned data can be in various RDF formats • Data dumps – download the data • SPARQL endpoints – query Linked Data (more on that later)
    • 35.
    • 36. Discovering Linked Data a) find a link on a Web page b) have tools alert you that Linked Data is there – e.g. Tabulator, Semantic Radar c) explore a project you heard about, where you know LOD should be d) use a registry of sources e) just ask someone
    • 37. RDF discovery example • data at Ivan Herman’s page can be found via: – finding the RDF icon (with the link to FOAF file) – letting browser tools alert you that RDF is present • RDF auto-discovery – extracting RDFa data embedded in the page • for other data sources RDF content negotiation might work
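RDF auto-discovery relies on the HTML page advertising its RDF version in a `<link>` element in the page head, which is what browser tools look for. The sketch below uses only Python's standard `html.parser`. Which `rel` and `type` values count as "RDF" is an assumption here (`alternate` or `meta`, with an RDF media type); the page and base URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: these rel/type combinations signal an RDF alternative of the page.
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}
RDF_RELS = {"alternate", "meta"}

class RDFLinkFinder(HTMLParser):
    """Collect <link> elements that advertise an RDF version of the page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = set((a.get("rel") or "").lower().split())
        media_type = (a.get("type") or "").lower()
        if rels & RDF_RELS and media_type in RDF_TYPES and a.get("href"):
            # Resolve relative hrefs against the page URL
            self.found.append(urljoin(self.base_url, a["href"]))

def discover_rdf(html, base_url):
    finder = RDFLinkFinder(base_url)
    finder.feed(html)
    return finder.found

page = """<html><head>
<link rel="stylesheet" type="text/css" href="style.css"/>
<link rel="meta" type="application/rdf+xml" title="FOAF" href="foaf.rdf"/>
</head><body>...</body></html>"""
print(discover_rdf(page, "http://example.org/people/ivan/"))
# ['http://example.org/people/ivan/foaf.rdf']
```

Pages that embed RDFa instead need an RDFa extractor (such as the distiller mentioned later), since nothing is advertised in `<link>`.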
    • 38. Making sense of the data • Validating RDF data – Ensures that data representation is correct • Converting between formats – Convert to a [more] human-readable RDF format • Browsing Linked Data – Browse the data without worrying about “reading” RDF
    • 39. Validating and Converting RDF • W3C RDF validator • URI debugger – “Swiss knife” of Linked Data • RDFa distiller – extracts RDF embedded in web pages • Command-line tools (we’ll return to that)
    • 40. <> a foaf:PersonalProfileDocument; dc:creator "Ivan Herman"; dc:date "2009-06-17"^^xsd:date; dc:title "Ivan Herman’s home page"; xhv:stylesheet <>; foaf:primaryTopic <> . <> a foaf:OnlineAccount; foaf:accountName "ivan_herman"; foaf:accountServiceHomepage <> . <> a rss:channel . <> a dc:Agent, foaf:Person; rdfs:seeAlso <>, <>, <>; ... Extracted from using RDFa Distiller
    • 41. Browsing Linked Data (DBPedia):
    • 42. Command Line Tools • wget – command line network downloader $ wget • curl – specify HTTP headers $ curl -L -H "Accept: text/rdf+n3" • Redland rapper – RDF parsing and serialisation $ rapper -o turtle
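The curl example works by content negotiation: the client sends an `Accept` header naming the RDF format it wants, and `-L` follows the server's redirect to that representation. A rough Python equivalent with the standard `urllib.request` module is sketched below. The resource URI is hypothetical and `text/turtle` is just the default chosen here; pass `"text/rdf+n3"` or `"application/rdf+xml"` to mirror other requests.

```python
import urllib.request

def rdf_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build a GET request asking the server for an RDF serialisation of `uri`.
    The default opener follows redirects such as 303 See Other, much like curl -L."""
    return urllib.request.Request(uri, headers={"Accept": media_type})

def fetch(request: urllib.request.Request) -> tuple:
    """Perform the request (needs network access) and return (content type, body)."""
    with urllib.request.urlopen(request) as resp:
        return resp.headers.get("Content-Type"), resp.read().decode("utf-8")

# Hypothetical resource URI; swap in the URI you want to dereference.
req = rdf_request("http://data.example.org/id/person/1")
# content_type, body = fetch(req)
```

Checking the returned `Content-Type` matters: a server that does not support negotiation may simply answer with HTML.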
    • 43. Querying Linked Data • SPARQL Protocol and RDF Query Language • Graph Matching • Components of a SPARQL Query: – Prefix Declarations – Result type (SELECT, CONSTRUCT, DESCRIBE, ASK) – Dataset – Query pattern – Solution modifiers
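"Graph matching" means the query pattern is itself a small graph containing variables, and the engine finds every way to bind those variables to nodes in the data. The toy matcher below illustrates that idea for a basic graph pattern only; it is not how any particular SPARQL engine is implemented. Assumptions: terms starting with "?" are variables, prefixed names are kept as plain strings, and the data is made up.

```python
# Minimal sketch of SPARQL-style graph matching over a set of triples.
# Assumption: terms starting with "?" are variables; everything else must match exactly.

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if p in new and new[p] != t:
                return None        # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None            # constant term does not match
    return new

def query(graph, patterns, binding=None):
    """Return every binding that satisfies all triple patterns (a basic graph pattern)."""
    binding = binding if binding is not None else {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in graph:
        extended = match(first, triple, binding)
        if extended is not None:
            results.extend(query(graph, rest, extended))
    return results

# Toy data with hypothetical identifiers
graph = {
    ("ex:book7", "dct:creator", "ex:person1"),
    ("ex:book8", "dct:creator", "ex:person1"),
    ("ex:person1", "foaf:name", "Andrejs Pumpurs"),
}

# Roughly: SELECT ?book ?name WHERE { ?book dct:creator ?who . ?who foaf:name ?name }
rows = query(graph, [("?book", "dct:creator", "?who"), ("?who", "foaf:name", "?name")])
for row in sorted(rows, key=lambda r: r["?book"]):
    print(row["?book"], row["?name"])
```

The shared variable `?who` is what joins the two patterns; solution modifiers such as ORDER BY correspond to the `sorted` call applied afterwards.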
    • 44. Europeana SPARQL endpoint
    • 45. Sample queries provided:
    • 46.
    • 47. Tool catalogues: many more tools • Collection of tools from other projects
    • 48. Interesting Projects • LOCAH: a stylesheet to transform UK Archives Hub EAD to RDF/XML, with examples of the process using XSLT • AliCAT (Archival Linked-data Cataloguing): tool for editing collection-level records • Axiell CALM: solution for LAM that includes Linked Data functionality, allowing archivists to tag their collections with URIs from any chosen Linked Dataset.
    • 49. Tools for Converting MARC records • MariMba Tool to translate MARC to RDF and Linked Data • marcauth-2-madsrdf XQuery utility to convert MARC/XML Authority records to MADS/RDF and SKOS resources
    • 50. Tools for museum curators • Karma was used to map the records of the Smithsonian American Art Museum to RDF and link them to the Web and the Linked Open Data Cloud. Demo:
    • 51. Authority Linked Data VIAF and Wikipedia case study
    • 52. library links Slide credit: Jindřich Mynarz
    • 53. • Use a single, distinct name for each person, organization, … • Name is consistently used throughout library systems • Issues: – “Strings” not “things” – in the Linked Data world we’d just use URIs
    • 54.
    • 55. VIAF • Virtual International Authority File • Integrating authority information from a number of national libraries – Linked Data + links to related information • Matching authority data from multiple sources – using related bibliographic records to help matching
    • 56. Wikipedia + VIAF • How can people discover useful information in VIAF and via VIAF? • Linked Data eco-system – let’s explore (!) – Wikipedia -> VIAF -> National Library LD • Example (Andrejs Pumpurs)
    • 57.
    • 58.
    • 59. VIAF • Ontologies used: – FOAF, SKOS, RDA (FRBR entities and elements), Dublin Core, VIAF, UMBEL • Related datasets: – National authority data: • Germany, Sweden (LIBRIS), France (idref.fr) – DBpedia
    • 60.
    • 61. How did VIAF get into Wikipedia? • VIAFbot – algorithmically matched by name, important dates, and selected works • “The principal benefit of VIAFbot is the interconnected structure.” -
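The slide says VIAFbot matched Wikipedia articles to VIAF by name, important dates, and selected works. The sketch below shows one way such evidence could be combined into a score; it is not VIAFbot's actual code, and the weights, the threshold, and the name normalization (sorting name tokens so "Surname, Forename" equals "Forename Surname") are all hypothetical choices.

```python
import unicodedata

def normalise(text: str) -> str:
    """Fold accents, lower-case, drop punctuation and sort the tokens, so that
    'Pumpurs, Andrejs' and 'Andrejs Pumpurs' compare equal (a crude heuristic)."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(sorted(cleaned.split()))

def match_score(article: dict, authority: dict) -> int:
    """Hypothetical evidence score: name (+2), life dates (+2), shared works (+1 each, max 2)."""
    score = 0
    if normalise(article["name"]) == normalise(authority["name"]):
        score += 2
    if article.get("dates") and article.get("dates") == authority.get("dates"):
        score += 2
    works_a = {normalise(w) for w in article.get("works", [])}
    works_b = {normalise(w) for w in authority.get("works", [])}
    score += min(len(works_a & works_b), 2)
    return score

ACCEPT_THRESHOLD = 4  # hypothetical cut-off: a name match alone is not enough

article = {"name": "Andrejs Pumpurs", "dates": (1841, 1902), "works": ["Lāčplēsis"]}
cluster = {"name": "Pumpurs, Andrejs", "dates": (1841, 1902), "works": ["Lacplesis"]}
score = match_score(article, cluster)
print(score, score >= ACCEPT_THRESHOLD)  # 5 True
```

Requiring more than a name match is the point of the design: common names collide, while name plus dates plus a shared work rarely does.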
    • 62. One Direction VIAF Slide credit: Maximilian Klein, Wikipedian in Residence at OCLC English Wiki
    • 63. Enter VIAFBot: Wikipedia Robot VIAF Slide credit: Maximilian Klein, Wikipedian in Residence at OCLC English Wiki
    • 64. Idea: Reciprocate VIAF Slide credit: Maximilian Klein, Wikipedian in Residence at OCLC English Wiki
    • 65. VIAF – summary: – an efficient way of putting library authority data online as Linked Data – if the organization also provides Linked Data itself, it can add links to VIAF that lead back to the organization’s LD records (which may contain richer / additional information)
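Once local authority records have been matched to VIAF clusters, publishing the link is a matter of emitting one triple per record. This sketch writes N-Triples lines from a mapping of local URI to VIAF cluster ID, using the `http://viaf.org/viaf/{id}` URI pattern. The local URI and the ID below are hypothetical, and `owl:sameAs` is one possible choice of property (a weaker one such as `skos:exactMatch` may suit if the records describe the person differently).

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def viaf_links(local_to_viaf: dict) -> list:
    """Turn {local authority URI: VIAF cluster ID} into N-Triples lines.
    VIAF cluster URIs follow the pattern http://viaf.org/viaf/{id}."""
    return [
        f"<{local}> <{OWL_SAME_AS}> <http://viaf.org/viaf/{viaf_id}> ."
        for local, viaf_id in sorted(local_to_viaf.items())
    ]

# Hypothetical local URI and VIAF ID, for illustration only
for line in viaf_links({"http://data.example.org/id/person/1": "12345"}):
    print(line)
# <http://data.example.org/id/person/1> <http://www.w3.org/2002/07/owl#sameAs> <http://viaf.org/viaf/12345> .
```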
    • 66. Data Modelling
    • 67. Publishing Data • Naïve Transform – Direct Mapping of Relational Data to RDF (see RDB2RDF), OR • Model & Transform – Figure out how to represent data – Then transform according to the model
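The "naïve transform" turns each table row into a resource and each column into a property, without choosing any shared vocabulary. The sketch below is loosely modelled on the W3C Direct Mapping (row IRI from table, key column and value; property IRI from table and column; class IRI from the table name) but is simplified: single-column keys only, no foreign keys, all values as plain strings. The base URI and the `author` table are hypothetical.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def direct_map(base: str, table: str, pk: str, rows: list) -> list:
    """Simplified sketch in the spirit of the W3C Direct Mapping:
    row IRI  = base + table + '/' + pk + '=' + value
    property = base + table + '#' + column
    class    = base + table (via rdf:type)
    NULLs are skipped; foreign keys and datatypes are ignored for brevity."""
    triples = []
    for row in rows:
        subject = f"{base}{quote(table)}/{quote(pk)}={quote(str(row[pk]))}"
        triples.append((subject, RDF_TYPE, base + quote(table)))
        for column, value in row.items():
            if value is not None:
                triples.append((subject, f"{base}{quote(table)}#{quote(column)}", str(value)))
    return triples

rows = [{"id": 7, "name": "Andrejs Pumpurs", "born": 1841, "note": None}]
for t in direct_map("http://example.org/db/", "author", "id", rows):
    print(t)
```

The output uses predicates such as `http://example.org/db/author#name` that nobody else uses, which is exactly why the "Model & Transform" route maps to shared vocabularies (e.g. `foaf:name`) instead.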
    • 68. Model • Describe the domain – What are the important concepts? – What are their properties? – What are their relations? • Choose vocabularies
    • 69. DC TERMS RDF Vocabulary
    • 70. Deciding on URI patterns • Use a domain that you control • Use consistent patterns • Manage change: transparent isn’t always best • Consider what concepts are worth distinguishing
    • 71. Example URI patterns • Designing URI Sets for the UK Public Sector • Defines patterns for – Identifier URI – Document URI – Representation URI • Identifier example: http://{domain}/id/{concept}/{reference} kinnerbeverley1938-1999artist
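A consistent pattern is easiest to keep when URIs are minted by one small function rather than by hand. This sketch implements the identifier pattern `http://{domain}/id/{concept}/{reference}` from the slide; the `slug` normalization rule (ASCII, lower-case, hyphens) and the example domain are hypothetical choices, not part of the UK guidance.

```python
import re
import unicodedata

def slug(text: str) -> str:
    """Hypothetical rule for the {reference} part: ASCII only, lower-case, hyphen-separated."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

def identifier_uri(domain: str, concept: str, reference: str) -> str:
    """Mint http://{domain}/id/{concept}/{reference}; the same input always yields the same URI."""
    return f"http://{domain}/id/{slug(concept)}/{slug(reference)}"

print(identifier_uri("data.example.org", "person", "Pumpurs, Andrejs 1841-1902"))
# http://data.example.org/id/person/pumpurs-andrejs-1841-1902
```

Note the trade-off flagged under "Manage change": a reference derived from a name changes if the heading is corrected, so a stable record number is often the safer `{reference}`.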
    • 72. Choosing Vocabularies • Audience & Purpose – e.g. search engine vs. bibliographic exchange • Domain – Biomedical, geographical, … • Granularity • Popularity: potential for interlinking & reuse
    • 73. Finding vocabularies & ontologies
    • 74. Look at examples
    • 75. Look at examples
    • 76. Find examples: Linked Open Data Cloud
    • 77. Look at Publications & Lists
    • 78. Ask the community • Mailing lists – LOD-LAM – Code4Lib – OKFN Open-Bibliography Working Group – W3C BibEx Community Group • Domain-specific Linked Data groups & lists
    • 79. Popularity
    • 80. Popularity: Semantic search engines
    • 81. Modeling spectrum: lightweight to heavyweight An ontology “spectrum” (in the order of complexity). Source: [Lassila and McGuinness, 2001]. Image from Bojars 2009
    • 82. Some popular vocabularies • DC • BIBO • FOAF • LODE (LinkedEvents) • OAI-ORE • SKOS
    • 83. Be aware of & connect to • Authority data – e.g. VIAF • Thesauri – e.g. Agrovoc • Linked Data is about Linking!
    • 84. Modeling examples • BIBFRAME • British Library Data Model • EDM • LIBRIS • VIAF
    • 85. VIAF • Ontologies used: – FOAF, SKOS, RDA (FRBR entities and elements), Dublin Core, VIAF, UMBEL • Related datasets: – National authority data: • Germany, Sweden (LIBRIS), France (idref.fr) – DBpedia
    • 86. LIBRIS Modeling
    • 87. British Library Data Model - Book [Diagram of the classes and properties used for books (publication events, series, authors, subjects, places, identifiers), using the prefixes blt, rdf, rdfs, owl, xsd, dct, isbd, skos, bibo, rda, bio, foaf, event, org and geo. Notes on the diagram: all properties with a range of blt:PublicationEvent can be used with blt:PublicationStartEvent and blt:PublicationEndEvent; assume that most instance data will have an rdfs:label (omitted for clarity). V.1.4, August 2012. Tim Hodson, Corine Deliot, Alan Danskin, Heather Rosie, Jan Ashton, British Library.]
    • 88. Semantic Web for Digital Libraries Geographical LD case study Uldis Bojars, Nuno Lopes, & Jodi Schneider
    • 89. The NLI Longfield Map Collection • Collections refer to Geographical Data in many forms… • The Longfield Maps are a set of 1,570 surveys carried out in Ireland between 1770 and 1840. • Currently catalogued in MARCXML, using data from Logainm, GeoNames and DBpedia.
    • 90. Longfield Map example <marc:datafield tag="650" ind1="" ind2=""> <marc:subfield code="a">Land tenure</marc:subfield> <marc:subfield code="z">Ireland</marc:subfield> <marc:subfield code="z">Rathdown (Barony)</marc:subfield> </marc:datafield> <marc:datafield tag="650" ind1="" ind2=""> <marc:subfield code="a">Land use surveys</marc:subfield> <marc:subfield code="z">Ireland</marc:subfield> <marc:subfield code="z">Wicklow (County)</marc:subfield> </marc:datafield>
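In these 650 fields the places are only strings in subfield $z, which is the "strings not things" problem again. A first step towards linking them is pulling them out of MARCXML, sketched here with Python's standard `xml.etree.ElementTree`. The record wrapper and namespace declaration (the standard MARC21 slim namespace) are added so the snippet parses; blank indicators are written as spaces, as MARCXML expects.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

RECORD = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="650" ind1=" " ind2=" ">
    <marc:subfield code="a">Land tenure</marc:subfield>
    <marc:subfield code="z">Ireland</marc:subfield>
    <marc:subfield code="z">Rathdown (Barony)</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2=" ">
    <marc:subfield code="a">Land use surveys</marc:subfield>
    <marc:subfield code="z">Ireland</marc:subfield>
    <marc:subfield code="z">Wicklow (County)</marc:subfield>
  </marc:datafield>
</marc:record>"""

def subject_headings(xml_text: str) -> list:
    """Return (topic, [geographic subdivisions]) for each 650 field:
    $a = topical term, $z = geographic subdivision."""
    root = ET.fromstring(xml_text)
    headings = []
    for field in root.findall("marc:datafield[@tag='650']", MARC_NS):
        topic = field.findtext("marc:subfield[@code='a']", default="", namespaces=MARC_NS)
        places = [sf.text for sf in field.findall("marc:subfield[@code='z']", MARC_NS)]
        headings.append((topic, places))
    return headings

for topic, places in subject_headings(RECORD):
    print(topic, "->", places)
```

Strings such as "Rathdown (Barony)" are then the input to the link discovery step against Logainm, GeoNames or DBpedia.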
    • 91. Geographic Data Providers • DBpedia – Includes latitude and longitude for geographic entities • LinkedGeoData – Export of data from OpenStreetMap – Beyond lat/lon (areas as polygons) • GeoNames – Access data as RDF (download requires subscription) • GeoLinkedData – Spain • Ordnance Survey – UK
    • 92. • The authority list of Irish place names, validated by the Place Names Branch. • Delivers a more detailed level than DBpedia or GeoNames. • Unique source of Irish-language place names. • NLI is looking to integrate Logainm data into its workflow, allowing users to search for place names in Irish.
    • 93. Geo-Vocabularies • W3C Geo (very basic) – SpatialThing, latitude and longitude • Most providers have defined their own • NeoGeo – Feature vs Geometry – Spatial Relations (is_part_of)
    • 94. NeoGeo Overview • Classes – Feature (spatial:Feature) • A geographical feature, capable of holding spatial relations. – Geometry (geom:Geometry) • Super-class of all geometrical representations (RDF, KML, GML, WKT...). • Connected by the geometry (geom:geometry)
    • 95. Relations between geometries Properties • connects with (spatial:C) • overlaps (spatial:O) • is part of (spatial:P) • contains (spatial:Pi) • …
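Relations such as "is part of" and "contains" can be derived from geometries rather than typed by hand. As a rough illustration, the sketch below classifies two axis-aligned bounding boxes; this is a deliberate simplification (real NeoGeo relations are defined on actual geometries, and bounding boxes only approximate them). It returns the most specific label; in region-connection vocabularies, "connects with" also holds for overlapping and nested regions. All coordinates are hypothetical.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float

def relation(a: BBox, b: BBox) -> str:
    """Most specific relation of box `a` to box `b` (a rough stand-in for
    spatial:P / spatial:Pi / spatial:O / spatial:C on real geometries)."""
    if a.max_x < b.min_x or b.max_x < a.min_x or a.max_y < b.min_y or b.max_y < a.min_y:
        return "disjoint"
    if b.min_x <= a.min_x and a.max_x <= b.max_x and b.min_y <= a.min_y and a.max_y <= b.max_y:
        return "is part of"      # a lies inside b
    if a.min_x <= b.min_x and b.max_x <= a.max_x and a.min_y <= b.min_y and b.max_y <= a.max_y:
        return "contains"        # b lies inside a
    if a.max_x == b.min_x or b.max_x == a.min_x or a.max_y == b.min_y or b.max_y == a.min_y:
        return "connects with"   # boundaries touch, interiors do not overlap
    return "overlaps"

county = BBox(0, 0, 10, 10)      # hypothetical coordinates
barony = BBox(2, 2, 4, 4)
print(relation(barony, county))               # is part of
print(relation(county, barony))               # contains
print(relation(BBox(10, 0, 12, 5), county))   # connects with
print(relation(BBox(8, 8, 12, 12), county))   # overlaps
```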
    • 96. Creating a LD Dataset Steps: 1. Data transformation / access • Vocabulary assessment 2. Link Discovery • Evaluation of generated links 3. Deployment • Virtuoso OpenSource
    • 97. Converting Logainm to RDF ~100,000 place names ~1.3M triples 375542 Dublin http://sws.geona
    • 98. Link Discovery • Silk • LIMES • Based on specifying rules that compare pairs of entities
    • 99. Rules to discover links to other datasets • Rules based on: – Place names – Geographical coordinates – Name of the county / parent place name – Hierarchy of places • # entities matched: – DBpedia: 1,552 – LinkedGeoData: 6,611 – GeoNames: 8,229
    • 100. Longfield Map example <marc:datafield tag="650" ind1="" ind2=""> <marc:subfield code="a">Land tenure</marc:subfield> <marc:subfield code="z">Ireland</marc:subfield> <marc:subfield code="z">Rathdown (Barony)</marc:subfield> </marc:datafield> <marc:datafield tag="650" ind1="" ind2=""> <marc:subfield code="a">Land use surveys</marc:subfield> <marc:subfield code="z">Ireland</marc:subfield> <marc:subfield code="z">Wicklow (County)</marc:subfield> </marc:datafield> <marc:datafield tag="651" ind2="7" ind1=""> <marc:subfield code="2"></marc:subfield> <marc:subfield code="a">Rathdown</marc:subfield> <marc:subfield code="0"></marc:subfield> </marc:datafield>
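A link rule of this kind combines several comparisons of a candidate pair. The sketch shows the logic in plain Python (name similarity, great-circle distance, and agreement on county); it is not Silk or LIMES rule syntax, and the thresholds and the example records are hypothetical.

```python
import math
from difflib import SequenceMatcher

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(h))

def is_link(a: dict, b: dict, min_name_sim=0.9, max_km=5.0) -> bool:
    """Hypothetical rule: names near-identical, points close together, and the
    county must agree whenever both records state one."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    close = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    same_county = a.get("county") is None or b.get("county") is None or a["county"] == b["county"]
    return name_sim >= min_name_sim and close and same_county

# Hypothetical records: same name, one nearby and one far away
local = {"name": "Exampletown", "lat": 53.30, "lon": -6.20, "county": "Dublin"}
near = {"name": "Exampletown", "lat": 53.31, "lon": -6.21}
far = {"name": "Exampletown", "lat": 52.00, "lon": -8.00}
print(is_link(local, near), is_link(local, far))  # True False
```

Checking the county or parent place is what stops identically named townlands in different counties from being linked to each other.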
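The added 651 field is what turns the place from a string into a thing: $2 names the source of the heading and $0 carries the identifier of the linked place. The slide's actual $2 and $0 values are not in this transcript, so the sketch below uses hypothetical ones and shows how such a field could become a `dct:spatial` triple for the map record.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}
DCT_SPATIAL = "http://purl.org/dc/terms/spatial"

# Hypothetical $2 source code and $0 URI, for illustration only
FIELD = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="651" ind1=" " ind2="7">
    <marc:subfield code="2">example-gazetteer</marc:subfield>
    <marc:subfield code="a">Rathdown</marc:subfield>
    <marc:subfield code="0">http://data.example.org/place/rathdown</marc:subfield>
  </marc:datafield>
</marc:record>"""

def spatial_triples(record_uri: str, xml_text: str) -> list:
    """For each 651 whose $0 is an HTTP URI, link the record to that place with
    dct:spatial; 651s without such a URI are skipped (they are still just strings)."""
    root = ET.fromstring(xml_text)
    triples = []
    for field in root.findall("marc:datafield[@tag='651']", NS):
        uri = field.findtext("marc:subfield[@code='0']", default="", namespaces=NS).strip()
        if uri.startswith(("http://", "https://")):
            triples.append((record_uri, DCT_SPATIAL, uri))
    return triples

print(spatial_triples("http://data.example.org/id/map/1", FIELD))
```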
    • 101. Demo: Location LODer
    • 102. Hands-on Activities 11:50 – 12:25 Choice of Activities…. • Data Modelling • Data Cleaning & Structuring • Querying (SPARQL)
    • 103. Semantic Web for Digital Libraries Open Refine Exercise Uldis Bojars, Nuno Lopes, & Jodi Schneider
    • 104. Open Refine • Useful for batch transformation of large amounts of data – data cleanup (misspellings, splitting multiple-valued columns, …) • Linking to other databases – Freebase – Any SPARQL enabled LD • Website: • RDF extension:
    • 105. Exercise • Examples from: • Sample Data (collection metadata from the Sydney Powerhouse Museum): e-museum/ • Screencast: CT-c
    • 106. Task 1 - Data Cleanup 1. Import the collection into OpenRefine 2. Get to know your data 3. Remove blank rows 4. Remove duplicate rows 5. Split cells with multiple values 6. Remove blank cells 7. Cluster values 8. Remove double category values
    • 107. Task 2 - Data Reconciliation & RDF Export 1. Pick a column to reconcile 2. Pick a vocabulary to reconcile with 3. Tell OpenRefine about the vocabulary 4. Start the reconciliation process 5. Understanding the reconciliation results 6. Interpreting the new reconciliation results 7. Exporting RDF
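To see what the cleanup steps do to the data, here is a small Python sketch of steps 3 to 8: dropping blank and duplicate rows, splitting a multi-valued cell, removing blank and repeated values, and clustering by key collision. The `fingerprint` function follows the steps of OpenRefine's "fingerprint" keying method (trim, lower-case, strip punctuation, fold accents, sort unique tokens) but is a re-implementation, not OpenRefine code. The "|" separator and the sample rows are assumptions for illustration.

```python
import string
import unicodedata
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Key modelled on OpenRefine's 'fingerprint' method: trim, lower-case,
    drop punctuation, fold accents to ASCII, then de-duplicate and sort tokens."""
    v = value.strip().lower()
    v = v.translate(str.maketrans("", "", string.punctuation))
    v = unicodedata.normalize("NFKD", v).encode("ascii", "ignore").decode("ascii")
    return " ".join(sorted(set(v.split())))

def clean(rows, column, sep="|"):
    """Drop blank and duplicate rows; split `column` on `sep`, removing blank and repeated values."""
    seen, out = set(), []
    for row in rows:
        if all(not str(v).strip() for v in row.values()):
            continue                                   # blank row
        key = tuple(sorted(row.items()))
        if key in seen:
            continue                                   # duplicate row
        seen.add(key)
        values = [v.strip() for v in row[column].split(sep) if v.strip()]
        out.append({**row, column: list(dict.fromkeys(values))})  # keeps first-seen order
    return out

def clusters(values):
    """Group values whose fingerprints collide; return only groups with 2+ variants."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

# Hypothetical sample rows
rows = [
    {"id": "1", "categories": "Photographs|Photographs||Glass plate negatives"},
    {"id": "1", "categories": "Photographs|Photographs||Glass plate negatives"},
    {"id": "", "categories": ""},
]
print(clean(rows, "categories"))
print(clusters(["Glass plate negatives", "glass plate negatives ", "Negatives, glass plate"]))
```

Key collision only proposes clusters; as in OpenRefine, a person should still decide which variant to keep before merging.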
    • 108. Semantic Web for Digital Libraries SPARQL Hands-on Session Uldis Bojars, Nuno Lopes, & Jodi Schneider
    • 109. SPARQL • Query Language for RDF data • W3C Standard • Components of a SPARQL Query: – Prefix Declarations – Result type (SELECT, CONSTRUCT, DESCRIBE, ASK) – Dataset – Query pattern – Solution modifiers
    • 110. Further information • In-Depth SPARQL tutorials – – Tutorial.pptx – • SPARQL: – (Jena) –
    • 111. SPARQL by example – Europeana Endpoint Endpoint: 1. SPARQL Select template 2. List of data providers having contributed content to Europeana 3. List of provided objects with their aggregators 4. 18th century Europeana objects from France 5. Write your own
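Under the SPARQL 1.1 Protocol, a query can be sent to an endpoint as an HTTP GET with the URL-encoded query in the `query` parameter. The sketch builds such a URL for a query in the spirit of item 2 (data providers), using the real EDM and ORE namespaces. It is not the sample query supplied by Europeana, and the endpoint URL is a hypothetical placeholder because the transcript does not preserve it.

```python
from urllib.parse import urlencode

QUERY = """PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
SELECT DISTINCT ?provider
WHERE {
  ?aggregation a ore:Aggregation ;
               edm:dataProvider ?provider .
}
ORDER BY ?provider
LIMIT 20"""

def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query for an HTTP GET request as described by the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urlencode({"query": query})

# Hypothetical endpoint URL; replace with the real endpoint address.
url = sparql_get_url("http://example.org/sparql", QUERY)
print(url)
# Request it with a header such as Accept: application/sparql-results+json
# to receive the results as JSON.
```

The query itself shows the components listed earlier: prefix declarations, a result type (SELECT DISTINCT), a query pattern, and solution modifiers (ORDER BY, LIMIT).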