Do the LOCAH-Motion: How to Make Archival and Bibliographic Linked Data
  • 16th February 2011, Dev8D, University of London, UK
  • Adrian Stevenson, LOCAH Project Manager
What is the LOCAH Project?
  • Linked Open Copac and Archives Hub
  • Funded by #JiscEXPO 2/10 ‘Expose’ call
  • 1 year project, started August 2010
  • Partners & Consultants:
      • UKOLN – Adrian Stevenson, Julian Cheal
      • Mimas – Jane Stevenson, Bethan Ruddock, Yogesh Patel
      • Eduserv – Pete Johnston
      • Talis – Leigh Dodds, Tim Hodson
      • OCLC – Ralph LeVan, Thom Hickey
      • Ed Summers
  • Blog: http://blogs.ukoln.ac.uk/locah/ (tag: #locah)
What are the Archives Hub and Copac?
  • The Archives Hub is an aggregation of archival descriptions from archive repositories across the UK: http://archiveshub.ac.uk
  • Copac provides access to the merged library catalogues of libraries throughout the UK, including all national libraries: http://copac.ac.uk
What is Linked Data?
  • URIs
  • Triples
  • Linked Data Design Issues: http://www.w3.org/DesignIssues/LinkedData.html
What does Linked Data Offer?
  • Haven’t we been putting linked data on the web for years, in CSV, relational databases, XML etc.?
  • Well yes, but these approaches are not easy to integrate
  • Web 2.0 mashups work against a fixed set of data sources
  • Linked Data applications operate on top of an unbound, global data space
What is LOCAH Doing?
  • Part 1: Exposing the Linked Data
  • Part 2: Creating a prototype visualisation
  • Part 3: Reporting on opportunities and barriers
How are we Exposing the LOCAH Linked Data?
  • Model our ‘things’ into RDF
  • Transform the existing data into RDF/XML
  • Enhance the data
  • Load the RDF/XML into a triple store
  • Create Linked Data Views
  • Document the process, opportunities and barriers on the LOCAH Blog
1. Modelling ‘things’ into RDF
  • Archives Hub data is in ‘Encoded Archival Description’ (EAD) XML form
  • Copac data is in ‘Metadata Object Description Schema’ (MODS) XML form
  • Take a step back from the data format
  • Think about your ‘things’
  • What is the EAD document “saying” about “things in the world”?
  • What questions do we want to answer about those “things”?
  • Can help make data more user-centric
  • EAD: http://www.loc.gov/ead/
  • MODS: http://www.loc.gov/standards/mods/
Triples
  • Thinking falls naturally into ‘triple’ statements
  • ‘Things’ have ‘properties’ with ‘values’
  • Subject – Predicate – Object
  • Triples are the basis of RDF
  • Example from the slide diagram: Repository – Provides Access To – Archival Resource
  • More on all this at http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/
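To make the subject – predicate – object idea concrete, here is a minimal sketch in plain Python (no RDF library). The three URIs are made up for illustration; only the ‘Repository provides access to Archival Resource’ statement comes from the slide.

```python
from collections import namedtuple

# A triple is just three parts: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical URIs, for illustration only (not the project's real identifiers).
REPOSITORY = "http://example.ac.uk/id/repository/gb1086"
RESOURCE = "http://example.ac.uk/id/archivalresource/gb1086skinner"
PROVIDES_ACCESS_TO = "http://example.ac.uk/def/providesAccessTo"


def to_ntriples(triple):
    """Serialise a triple whose three parts are all URIs as one N-Triples line."""
    return "<{}> <{}> <{}> .".format(*triple)


statement = Triple(REPOSITORY, PROVIDES_ACCESS_TO, RESOURCE)
print(to_ntriples(statement))
```

Literal values (titles, dates) would need quoting and escaping, which this sketch leaves out.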
Data Modelling Challenges
  • Archival description is hierarchical and multi-level
  • Information is provided about an aggregation of records, and then about its component parts
  • The multi-level approach gives a strong sense of “context”
  • “Lower level” units are interpreted in the context of the higher levels of description
  • Arguably “incomplete” without the contextual data
  • Linked Data involves ‘bounded’ descriptions
  • Relations are asserted, e.g. member-of/component-of
  • But there is no requirement or expectation that data consumers will follow the links describing the relations
Data Modelling Challenges
  • Hub: inconsistencies in the data and a lack of standardisation; there's actually no content standard in the UK
  • Copac: not a standard library catalogue; it is a set of merged catalogues, and de-duplication is done to an extent but cannot be done entirely
1. Modelling ‘things’ into RDF
  • Decide on patterns for the URIs we generate
  • Following guidance from W3C ‘Cool URIs for the Semantic Web’ and UK Cabinet Office ‘Designing URI Sets for the UK Public Sector’
  • ‘Thing’ URI, e.g. http://example.ac.uk/id/findingaid/gb1086skinner
  • Use HTTP 303 ‘See Other’ to redirect to the doc URI, e.g. http://example.ac.uk/doc/id/findingaid/gb1086skinner
  • Content negotiates to http://example.ac.uk/doc/…/doc.rdf , …/doc.html for documents about things
  • More info at http://blogs.ukoln.ac.uk/locah/2010/11/16/identifying-the-things-uri-patterns-for-the-hub-linked-data/
  • http://www.w3.org/TR/cooluris/
  • http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
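The 303 pattern above can be sketched as two small functions. This is an illustration only, not the project's server code: the prefixes follow the example.ac.uk URIs on the slide, while the ".rdf"/".html" suffix scheme is a hypothetical stand-in (the slide elides the real document paths) and the Accept handling ignores q-values.

```python
THING_PREFIX = "http://example.ac.uk/id/"
DOC_PREFIX = "http://example.ac.uk/doc/id/"


def see_other(thing_uri):
    """Answer a request for a 'thing' URI with a 303 and the matching doc URI."""
    if not thing_uri.startswith(THING_PREFIX):
        raise ValueError("not a 'thing' URI: " + thing_uri)
    return 303, DOC_PREFIX + thing_uri[len(THING_PREFIX):]


def negotiate(doc_uri, accept_header):
    """Pick a representation of the document from the Accept header.

    Simplified: a substring check only; the suffixes are hypothetical.
    """
    if "application/rdf+xml" in accept_header:
        return doc_uri + ".rdf"
    return doc_uri + ".html"


status, location = see_other("http://example.ac.uk/id/findingaid/gb1086skinner")
print(status, location)
print(negotiate(location, "application/rdf+xml"))
```

The point of the redirect is that the ‘thing’ (a finding aid) and the documents describing it keep separate URIs, as ‘Cool URIs for the Semantic Web’ recommends.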
1. Modelling ‘things’ into RDF
  • Using existing RDF vocabularies: DC, SKOS, FOAF, BIBO, WGS84 Geo, Lexvo, ORE, LODE, Event and Time Ontologies
  • Define additional RDF terms where required, e.g. FindingAid, ArchivalResource, maintenanceAgency
  • It can be hard to know where to look for vocabs and ontologies
  • Decide on license: CC0, ODC PDD
[Model diagram. Entities: Archival Resource, Finding Aid, Repository (Agent), Agent, Family, Person, Organisation, Place, Concept, Concept Scheme, Genre, Function, Book. Relationship labels: maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, topic/page, hasPart/partOf, administeredBy/administers, inScheme, foaf:focus, is-a.]
Archives Hub Model (as at 14/2/2011)
[Model diagram. Entities: Archival Resource, Finding Aid, EAD Document, Biographical History, Repository (Agent), Agent, Family, Person, Organisation, Place, Postcode Unit, Concept, Concept Scheme, Genre, Function, Book, Object, Language, Level, Extent, Creation, Birth, Death, Temporal Entity. Relationship labels: maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, topic/page, hasPart/partOf, encodedAs/encodes, administeredBy/administers, hasBiogHist/isBiogHistFor, inScheme, representedBy, foaf:focus, is-a, level, language, extent, participates in, product of, at time, in.]
Copac Model (as at November 2010 – work in progress)
Feedback Requested!
  • We would like feedback on the model
  • Appreciate this will be easier when the data is available
  • Via the blog:
      • http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/
      • http://blogs.ukoln.ac.uk/locah/2010/11/08/some-more-things-some-extensions-to-the-hub-model/
      • http://blogs.ukoln.ac.uk/locah/2010/10/07/modelling-copac-data/
  • Via email, Twitter, or in person at Dev8D
2. Transforming into RDF/XML
  • Need to transform the data in EAD and MODS to RDF/XML, based on our models
  • For the Hub data we created an XSLT stylesheet and used the Saxon XSLT processor: http://saxon.sourceforge.net/
  • Saxon runs the XSLT against a set of EAD files and creates a set of RDF/XML files
  • For the Copac data we created an in-house Java transformation program
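The project's transformation is an XSLT stylesheet run by Saxon, which is not reproduced here. As a rough idea of the kind of mapping involved, the sketch below uses Python's standard ElementTree instead of XSLT to turn a tiny EAD-like fragment into RDF/XML. The sample record, the base URI and the choice of dcterms:title are assumptions for illustration, not the LOCAH model.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"

# Made-up, minimal EAD-like input (real EAD files carry far more detail).
EAD_SAMPLE = """<ead>
  <archdesc level="fonds">
    <did>
      <unitid>gb1086skinner</unitid>
      <unittitle>Example collection title</unittitle>
    </did>
  </archdesc>
</ead>"""


def ead_to_rdfxml(ead_xml, base="http://example.ac.uk/id/archivalresource/"):
    """Map unitid to a resource URI and unittitle to dcterms:title (hypothetical mapping)."""
    did = ET.fromstring(ead_xml).find("./archdesc/did")
    unit_id = did.findtext("unitid").strip()
    title = did.findtext("unittitle").strip()

    ET.register_namespace("rdf", RDF)
    ET.register_namespace("dcterms", DCT)
    root = ET.Element("{%s}RDF" % RDF)
    description = ET.SubElement(
        root, "{%s}Description" % RDF, {"{%s}about" % RDF: base + unit_id}
    )
    ET.SubElement(description, "{%s}title" % DCT).text = title
    return ET.tostring(root, encoding="unicode")


print(ead_to_rdfxml(EAD_SAMPLE))
```

An XSLT stylesheet expresses the same idea declaratively, as templates matching EAD elements, which suits batch runs over many files.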
3. Enhancing our data
  • Already have some links:
      • lexvo.org URIs for languages of archival materials
      • reference.data.gov.uk URIs for time periods
      • URIs for postcodes, using both UK Postcodes URIs and Ordnance Survey URIs
  • Currently also looking at:
      • Virtual International Authority File (VIAF), which matches and links widely-used authority files: http://viaf.org/
      • Library of Congress Subject Headings
      • DBpedia
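As a small illustration of this kind of enhancement, the sketch below turns a three-letter language code into a lexvo.org URI. It assumes the code is already ISO 639-3; codes found in archival data may follow ISO 639-2 instead (for example "fre" rather than "fra" for French), so real data would need a lookup table first. The function name is made up.

```python
import re

LEXVO_BASE = "http://lexvo.org/id/iso639-3/"


def lexvo_language_uri(code):
    """Build a lexvo.org URI from a three-letter ISO 639-3 code (assumed valid)."""
    code = code.strip().lower()
    if not re.fullmatch(r"[a-z]{3}", code):
        raise ValueError("expected a three-letter language code, got %r" % code)
    return LEXVO_BASE + code


print(lexvo_language_uri("eng"))
```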
4. Load the RDF/XML into a triple store
  • Using the Talis Platform triple store
  • RDF/XML is HTTP POSTed
  • We’re using Pynappl, a Python client for the Talis Platform: http://code.google.com/p/pynappl/
  • The store provides us with a SPARQL query interface
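‘RDF/XML is HTTP POSTed’ boils down to a request like the one sketched below. This uses Python's standard urllib rather than Pynappl, and the store URL is a made-up placeholder, not a real Talis Platform address; the request is built but deliberately not sent.

```python
import urllib.request

STORE_URL = "http://example.ac.uk/store/meta"  # hypothetical endpoint


def build_post(rdfxml, store_url=STORE_URL):
    """Build (but do not send) an HTTP POST carrying an RDF/XML document."""
    return urllib.request.Request(
        store_url,
        data=rdfxml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


request = build_post(
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted because the URL is made up.
```

A real store would also require authentication, which is one of the things a client library such as Pynappl takes care of.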
5. Create Linked Data Views
  • Expose ‘bounded’ descriptions from the triple store over the Web
  • Make available as documents in both human-readable HTML and RDF formats (also JSON, Turtle, CSV)
  • Using Paget ‘Linked Data Publishing Framework’: http://code.google.com/p/paget/
  • PHP scripts query the SPARQL endpoint
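The views are built by Paget's PHP scripts; the Python sketch below only shows the general shape of asking a SPARQL endpoint for a description of one ‘thing’. The endpoint URL is a placeholder, and what a DESCRIBE query returns is up to the store, so treat it as an approximation of a ‘bounded’ description.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "http://example.ac.uk/sparql"  # hypothetical endpoint


def describe_url(thing_uri, endpoint=SPARQL_ENDPOINT):
    """Build the GET URL for a SPARQL DESCRIBE query about one resource."""
    query = "DESCRIBE <{}>".format(thing_uri)
    # The SPARQL protocol carries the query text in the 'query' parameter.
    return endpoint + "?" + urlencode({"query": query})


url = describe_url("http://example.ac.uk/id/findingaid/gb1086skinner")
print(url)
```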
‘Out-of-the-box’ Paget view
  • Linkedhub.ac.uk domain just given as an example
Other Stuff We Might Try
  • Linked Data API: APIs, data formats and supporting tools to aid the adoption of linked data: http://code.google.com/p/linked-data-api/
  • Entity extraction from free text
      • Open Calais: “creates rich semantic metadata for the content you submit”: http://www.opencalais.com/
      • DBpedia Spotlight (announced yesterday): “solution for linking unstructured information sources to the Linked Open Data”: http://dbpedia.org/spotlight
 
Can I Access the LOCAH Linked Data?
  • Not quite yet …
  • Hoping to release the Hub data by end February 2011
  • Copac data by end March 2011
  • Release will include Linked Data views, SPARQL endpoint details, example queries and supporting documentation
How are we creating the Visualisation Prototype?
  • Based on researcher use cases
  • Data queried from the SPARQL endpoint
  • Use tools such as Simile, Many Eyes, Google Charts
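Data queried from a SPARQL endpoint commonly comes back in the standard SPARQL JSON results format. A minimal sketch of flattening such a response into points that a map or chart tool could plot follows; the sample response, variable names and coordinates are invented for illustration.

```python
import json

# Invented sample in the SPARQL JSON results layout (head.vars, results.bindings).
SAMPLE = """{
  "head": {"vars": ["resource", "lat", "long"]},
  "results": {"bindings": [
    {"resource": {"type": "uri",
                  "value": "http://example.ac.uk/id/archivalresource/gb1086skinner"},
     "lat": {"type": "literal", "value": "51.5"},
     "long": {"type": "literal", "value": "-0.1"}}
  ]}
}"""


def rows(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in doc["results"]["bindings"]
    ]


points = [(r["resource"], float(r["lat"]), float(r["long"])) for r in rows(SAMPLE)]
print(points)
```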
Visualisation Prototype
  • Using Timemap (Google Maps and Simile): http://code.google.com/p/timemap/
  • Early stages with this
  • Will give the location and ‘extent’ of an archive
  • Will link through to the Archives Hub
How are we reporting on opportunities and barriers?
  • Recording these as we go along on the blog (tags: ‘opportunities’, ‘barriers’)
  • Feed into #JiscEXPO synthesis work
  • No time to go into these today
  • More at:
      • http://blogs.ukoln.ac.uk/locah/2010/09/22/creating-linked-data-more-reflections-from-the-coal-face/
      • http://blogs.ukoln.ac.uk/locah/2010/12/01/assessing-linked-data
Questions?
  • Contacts:
      • Ade Stevenson @adrianstevenson
      • Jane Stevenson @janestevenson
      • Pete Johnston @ppetej
      • Bethan Ruddock @bethanar
      • Julian Cheal @juliancheal
      • Yogesh Patel http://mimas.ac.uk/staff/
Attribution and CC License
  • Sections of this presentation are adapted from materials created by other members of the LOCAH Project
  • This presentation is available under Creative Commons Non Commercial-Share Alike: http://creativecommons.org/licenses/by-nc/2.0/uk/

Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data


Editor's Notes

  • #4 Copac is a union catalogue. Both are successful JISC services that have been running for many years now. LOCAH is a research project; we will have to see whether it goes into service with a Linked Data interface.
  • #5 http://www.w3.org/DesignIssues/LinkedData.html
  • #9 Encoded Archival Description (EAD) is an XML standard for encoding archival finding aids. The Metadata Object Description Schema (MODS) is an XML-based schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. “Things” include concepts and abstractions as well as material objects. We want location: archives are physical things, so location is important. We also wanted event data, partly steered by the visualisation prototype, and ‘extent’ data, e.g. number of boxes.
  • #13 303 redirects and content negotiation follow ‘Cool URIs for the Semantic Web’
  • #14 License options: Open Data Commons Public Domain Dedication; Creative Commons CC0 license