Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data


Presentation given at the Dev8d Developer Days event at the University of London Students Union, London, UK on 15th February 2011.

The talk was aimed primarily at developers, on the assumption that they already knew a bit about RDF and Linked Data, so it doesn’t discuss these except in passing. I was mainly trying to give some specifics on the technicalities involved, and on the platforms and tools we’re using, so that people can follow the same path if they want to.

More info at and

Published in: Technology
  • Copac is a union catalogue. Both are successful JISC services that have been running for many years now. LOCAH is a research project; we will have to see whether it goes into service with a Linked Data interface.
  • Encoded Archival Description (EAD) is an XML standard for encoding archival finding aids. The Metadata Object Description Schema (MODS) is an XML-based schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. “Things” include concepts and abstractions as well as material objects. We want location: archives are physical things, so location is important. We also wanted event data, partly steered by the visualisation prototype, and ‘extent’ data, e.g. the number of boxes.
  • 303 redirects and content negotiation, from ‘Cool URIs for the Semantic Web’
  • Open Data Commons Public Domain Dedication; Creative Commons CC0 license

    1. 1. Do the LOCAH-Motion: How to Make Archival and Bibliographic Linked Data
       16th February 2011, Dev8D, University of London, UK
       Adrian Stevenson, LOCAH Project Manager
    2. 2. What is the LOCAH Project?
       • Linked Open Copac and Archives Hub
       • Funded by #JiscEXPO 2/10 ‘Expose’ call
       • 1 year project. Started August 2010
       • Partners & Consultants:
         • UKOLN – Adrian Stevenson, Julian Cheal
         • Mimas – Jane Stevenson, Bethan Ruddock, Yogesh Patel
         • Eduserv – Pete Johnston
         • Talis – Leigh Dodds, Tim Hodson
         • OCLC – Ralph LeVan, Thom Hickey
         • Ed Summers
       • tag: #locah
    3. 3. What are the Archives Hub and Copac?
       • The Archives Hub is an aggregation of archival descriptions from archive repositories across the UK
       • Copac provides access to the merged library catalogues of libraries throughout the UK, including all national libraries
    4. 4. What is Linked Data?
       • URIs
       • LD Design Issues
       • Triples
    5. 5. What does Linked Data Offer?
       • Haven’t we been putting linked data on the web for years?
         • In CSV, relational databases, XML etc?
       • Well yes, but these approaches are not easy to integrate
       • Web 2.0 mashups work against a fixed set of data sources
       • Linked Data applications operate on top of an unbound, global data space.
    6. 6. What is LOCAH Doing?
       • Part 1: Exposing the Linked Data
       • Part 2: Creating a prototype visualisation
       • Part 3: Reporting on opportunities and barriers
    7. 7. How are we Exposing the LOCAH Linked Data?
       • Model our ‘things’ into RDF
       • Transform the existing data into RDF/XML
       • Enhance the data
       • Load the RDF/XML into a triple store
       • Create Linked Data Views
       • Document the process, opportunities and barriers on LOCAH Blog
    8. 8. 1. Modelling ‘things’ into RDF
       • Archives Hub data in ‘Encoded Archival Description’ (EAD) XML form
       • Copac data in ‘Metadata Object Description Schema’ (MODS) XML form
       • Take a step back from the data format
         • Think about your ‘things’
         • What is the EAD document “saying” about “things in the world”?
         • What questions do we want to answer about those “things”?
       • Can help make data more user-centric
    9. 9. Triples
       • Thinking falls naturally into ‘triple’ statements
         • ‘Things’ have ‘properties’ with ‘values’
         • Subject – Predicate – Object
       • Triples are basis of RDF
       • More on all this at
       [Diagram: Repository, Provides Access To, Archival Resource]
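As a minimal sketch of the subject, predicate, object idea, the statement suggested by the slide's diagram labels can be held as a plain Python tuple and written out as one N-Triples line. The `example.org` URIs, the `Triple` class and the `to_ntriples` helper are hypothetical and are not the URI patterns or code used by LOCAH.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: a 'thing', a 'property' and a 'value'."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialise a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# Hypothetical URIs, for illustration only.
BASE = "http://example.org/"
statement = Triple(
    subject=BASE + "id/repository/r1",
    predicate=BASE + "def/providesAccessTo",
    obj=BASE + "id/archivalresource/a1",
)
line = to_ntriples(statement)
print(line)
```

The sketch only covers triples whose object is a URI; literal values (titles, dates) would need quoting and escaping that is left out here.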
    10. 10. Data Modelling Challenges
        • Archival description is hierarchical and multi-level
        • Information is provided about aggregation of records, and then about component parts
        • Multi-level approach gives a strong sense of “context”
          • “Lower level” units interpreted in context of the higher levels of description
          • Arguably “incomplete” without the contextual data
        • Linked Data involves ‘bounded’ descriptions
          • Relations are asserted, e.g. member-of/component-of
          • But there is no requirement or expectation that data consumers will follow the links describing the relations
    11. 11. Data Modelling Challenges
        • Hub: inconsistencies in data and lack of standardisation
          • there's actually no content standard in the UK
        • Copac: not a standard library catalogue
          • merged catalogues with de-duplication to an extent but cannot be done entirely
    12. 12. 1. Modelling ‘things’ into RDF
        • Decide on patterns for URIs we generate
        • Following guidance from W3C ‘Cool URIs for the Semantic Web’ and UK Cabinet Office ‘Designing URI Sets for the UK Public Sector’
          • E.g. ‘thing’ URI
          • Use HTTP 303 ‘See Other’ to redirect to …
          • E.g. doc URI
          • Content negotiates to …
          • …/doc.rdf, …/doc.html for documents about things
          • More info at
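The 303 redirect and content negotiation pattern from this slide can be sketched as a small routing function. This is an illustration under stated assumptions: the `/id/` and `/doc/` path prefixes, the `resolve` function and the extension table are hypothetical, and the negotiation is deliberately simplified (it ignores q-values in the `Accept` header).

```python
from typing import Optional, Tuple

# Hypothetical layout: /id/... names a 'thing', /doc/... names a document about it.
EXTENSIONS = {
    "application/rdf+xml": ".rdf",
    "text/html": ".html",
}


def resolve(path: str, accept: str = "text/html") -> Tuple[int, Optional[str]]:
    """Return (HTTP status, target path) for a request path and Accept header."""
    if path.startswith("/id/"):
        # A 'thing' cannot be sent over HTTP, so answer 303 See Other
        # and point at a document that describes it.
        return 303, "/doc/" + path[len("/id/"):]
    if path.startswith("/doc/"):
        # Simplified negotiation: the first listed media type we support wins.
        for part in accept.split(","):
            media_type = part.split(";")[0].strip()
            if media_type in EXTENSIONS:
                return 200, path + EXTENSIONS[media_type]
        return 406, None
    return 404, None


print(resolve("/id/archivalresource/a1"))
print(resolve("/doc/archivalresource/a1", "application/rdf+xml"))
```

Keeping the ‘thing’ URI and the document URIs distinct is what lets statements about an archive be told apart from statements about the web page that describes it.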
    13. 13. 1. Modelling ‘things’ into RDF
        • Using existing RDF vocabularies:
          • DC, SKOS, FOAF, BIBO, WGS84 Geo, Lexvo, ORE, LODE, Event and Time Ontologies
        • Define additional RDF terms where required
          • FindingAid
          • ArchivalResource
          • maintenanceAgency
        • It can be hard to know where to look for vocabs and ontologies
        • Decide on license – CC0, ODC PDD
    14. 14. [Diagram: data model. Entities: Archival Resource, Finding Aid, Agent, Family, Person, Place, Concept, Genre, Function, Organisation, Repository (Agent), Book, Concept Scheme. Relationships: maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, topic/page, hasPart/partOf, foaf:focus, is-a, inScheme, administeredBy/administers]
    15. 15. Archives Hub Model (as at 14/2/2011)
        [Diagram. Entities: Archival Resource, Finding Aid, EAD Document, Biographical History, Agent, Family, Person, Place, Concept, Genre, Function, Organisation, Repository (Agent), Book, Language, Level, Concept Scheme, Object, Postcode Unit, Extent, Creation, Birth, Death, Temporal Entity. Relationships: maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, topic/page, hasPart/partOf, encodedAs/encodes, administeredBy/administers, hasBiogHist/isBiogHistFor, foaf:focus, is-a, level, language, inScheme, representedBy, extent, participates in, at time, product of, in]
    16. 16. Copac Model (as at November 2010 – work in progress)
    17. 17. Feedback Requested!
        • We would like feedback on the model
        • Appreciate this will be easier when the data is available
        • Via blog
        • Via email, twitter, in person at Dev8d
    18. 18. 2. Transforming into RDF/XML
        • Need to transform data in EAD and MODS to RDF/XML, based on our models
        • For Hub data, created an XSLT stylesheet and used the Saxon XSLT processor
          • Saxon runs the XSLT against a set of EAD files and creates a set of RDF/XML files
        • For Copac data, created an in-house Java transformation program
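To give a feel for the transformation step, here is a small sketch that turns a tiny EAD-like record into RDF/XML. It uses Python's standard `xml.etree.ElementTree` rather than XSLT and Saxon, so it is a stand-in for the technique and not the project's stylesheet. The sample record, the base URI, the `ead_to_rdfxml` function and the choice of Dublin Core terms are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT_NS = "http://purl.org/dc/terms/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dcterms", DCT_NS)

# Hypothetical, heavily trimmed EAD record.
SAMPLE_EAD = """<ead>
  <archdesc level="fonds">
    <did>
      <unitid>GB 0000 EX1</unitid>
      <unittitle>Example Family Papers</unittitle>
    </did>
  </archdesc>
</ead>"""


def ead_to_rdfxml(ead_xml: str, base_uri: str) -> str:
    """Map the top-level <did> of an EAD record to one RDF/XML description."""
    did = ET.fromstring(ead_xml).find("./archdesc/did")
    unit_id = did.findtext("unitid").strip()
    title = did.findtext("unittitle").strip()
    # Assumed URI pattern: base URI plus the lower-cased identifier without spaces.
    slug = unit_id.lower().replace(" ", "")
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": base_uri + slug}
    )
    ET.SubElement(desc, f"{{{DCT_NS}}}identifier").text = unit_id
    ET.SubElement(desc, f"{{{DCT_NS}}}title").text = title
    return ET.tostring(root, encoding="unicode")


rdfxml = ead_to_rdfxml(SAMPLE_EAD, "http://example.org/id/archivalresource/")
print(rdfxml)
```

A real finding aid is multi-level, so a full transform would also walk the component hierarchy and emit hasPart/partOf links between levels.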
    19. 19. 3. Enhancing our data
        • Already have some links:
          • URIs for languages of archival materials
          • URIs for time periods
          • URIs for postcodes, using both UK Postcodes URIs and Ordnance Survey URIs
        • Currently also looking at:
          • Virtual International Authority File
            • Matches and links widely-used authority files
          • Library of Congress Subject Headings
          • DBpedia
    20. 20. 4. Load the RDF/XML into a triple store
        • Using the Talis Platform triple store
        • RDF/XML is HTTP POSTed
        • We’re using Pynappl
          • Python client for the Talis Platform
        • Store provides us with a SPARQL query interface
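The "RDF/XML is HTTP POSTed" step can be sketched with Python's standard library alone. This does not use Pynappl and does not reproduce the Talis Platform API; the store URL and the `build_upload_request` helper are hypothetical, and authentication is left out. The request is only built here, not sent.

```python
import urllib.request

# Minimal, valid RDF/XML payload used as sample data.
SAMPLE_RDFXML = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
)


def build_upload_request(store_url: str, rdfxml: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying an RDF/XML document."""
    return urllib.request.Request(
        store_url,
        data=rdfxml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


# Hypothetical store URL, for illustration only.
request = build_upload_request("http://example.org/stores/my-store/data", SAMPLE_RDFXML)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the upload; a real store would normally also require credentials.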
    21. 21. 5. Create Linked Data Views
        • Expose ‘bounded’ descriptions from the triple store over the Web
        • Make available as documents in both human-readable HTML and RDF formats (also JSON, Turtle, CSV)
        • Using Paget ‘Linked Data Publishing Framework’
          • PHP scripts query SPARQL endpoint
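One way a view script might ask the store for a ‘bounded’ description is a SPARQL `DESCRIBE` query sent over the standard SPARQL protocol, where the query travels in a `query` URL parameter. The sketch below is in Python rather than PHP and is not Paget's code; the endpoint URL, the thing URI and the `describe_url` helper are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def describe_url(endpoint: str, thing_uri: str) -> str:
    """Build a SPARQL protocol GET URL asking for a description of one thing."""
    query = f"DESCRIBE <{thing_uri}>"
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint and thing URI, for illustration only.
url = describe_url(
    "http://example.org/sparql",
    "http://example.org/id/archivalresource/a1",
)
print(url)
```

Exactly which triples a `DESCRIBE` returns is left to the store, so a view that needs a precisely bounded description may prefer an explicit `CONSTRUCT` query.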
    22. 22.
        • ‘Out-of-the-box’ Paget view
        • domain just given as example
    23. 23. Other Stuff We Might Try
        • Linked Data API
          • APIs, data formats and supporting tools to aid the adoption of linked data
        • Entity extraction from free text
          • Open Calais
            • “creates rich semantic metadata for the content you submit”
          • DBpedia Spotlight (announced yesterday)
            • “solution for linking unstructured information sources to the Linked Open Data”
    24. 25. Can I Access the LOCAH Linked Data?
        • Not quite yet …
        • Hoping to release the Hub data by end February 2011
        • Copac data end March 2011
        • Release will include Linked Data views, SPARQL endpoint details, example queries and supporting documentation
    25. 26. How are we creating the Visualisation Prototype?
        • Based on researcher use cases
        • Data queried from SPARQL endpoint
        • Use tools such as Simile, Many Eyes, Google Charts
    26. 27. Visualisation Prototype
        • Using Timemap
          • Google Maps and Simile
        • Early stages with this
        • Will give location and ‘extent’ of archive.
        • Will link through to Archives Hub
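Before data from a SPARQL endpoint can be plotted, the query results have to be reshaped into something a map or timeline tool can take. The sketch below parses the standard SPARQL JSON results format into simple records carrying location and ‘extent’. The variable names (`title`, `lat`, `long`, `extent`), the sample values and the output shape are assumptions; this is not Timemap's actual input format.

```python
import json

# Hypothetical response in the SPARQL JSON results format.
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["title", "lat", "long", "extent"]},
    "results": {"bindings": [
        {
            "title": {"type": "literal", "value": "Example Family Papers"},
            "lat": {"type": "literal", "value": "53.47"},
            "long": {"type": "literal", "value": "-2.23"},
            "extent": {"type": "literal", "value": "12 boxes"},
        },
    ]},
})


def to_map_items(results_json: str) -> list:
    """Turn SPARQL JSON result rows into plain dicts for a map or timeline."""
    bindings = json.loads(results_json)["results"]["bindings"]
    items = []
    for row in bindings:
        items.append({
            "title": row["title"]["value"],
            "point": {
                "lat": float(row["lat"]["value"]),
                "lon": float(row["long"]["value"]),
            },
            # 'extent' is treated as optional: not every description records it.
            "extent": row.get("extent", {}).get("value"),
        })
    return items


items = to_map_items(SAMPLE_RESULTS)
print(items)
```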
    27. 28. How are we reporting on opportunities and barriers?
        • Recording these as we go along on the blog (tags: ‘opportunities’, ‘barriers’)
        • Feed into #JiscEXPO synthesis work
        • No time to go into these today
        • More at:
    28. 29. Questions?
        • Contacts:
          • Ade Stevenson @adrianstevenson
          • Jane Stevenson @janestevenson
          • Pete Johnston @ppetej
          • Bethan Ruddock @bethanar
          • Julian Cheal @juliancheal
          • Yogesh Patel
    29. 30. Attribution and CC License
        • Sections of this presentation adapted from materials created by other members of the LOCAH Project
        • This presentation is available under a Creative Commons Non Commercial-Share Alike licence