LOCAH Project and Considerations of Linked Data Approaches



Presentation given at JISC 'Managing Research Data International Workshop', Birmingham, UK. 29th March 2011


Speaker notes:
  • Has been described as a ‘data commons’, or more usually a Web of Data.
  • Machines have trouble extracting meaning from web pages. At present, the raw data is not really available.
  • Data.gov.uk Newspaper
  • Principles underpinning the technology
  • Step back a bit to HTML. The HTML web of documents doesn’t encourage re-use or reduce redundancy. There are network effects, but they could be much better.
  • Note that this is a considerable simplification and risks being misleading. Linked Data exploits semantically meaningful tagging to encourage re-use, reduce redundancy, etc.
  • http://www.w3.org/DesignIssues/LinkedData.html
  • Uses predicate logic. Goes back to Aristotle. Conceptualises things and the relationships between things.
  • Copac is a union catalogue. Both are successful JISC services that have been running for many years. LOCAH is a research project, so we will have to see whether they go into service with a Linked Data interface.
  • On hypertext websites it is generally considered rather bad etiquette not to link to related external material. The value of your own information is very much a function of what it links to, as well as of the inherent value of the information within the web page. So it is with the Semantic Web. Remember, this is about machines linking: machines need identifiers, whereas humans generally know when something is a place or a person. BBC + DBpedia + GeoNames + Archives Hub + Copac + VIAF = the Web as an exploratory space.
  • You can imagine the research benefits if these principles were applied to datasets in your own area of interest. British Library example. Improves Google visibility. We are already seeing traffic via data.archiveshub.ac.uk.
  • The aggregation and merging of the Hub data sources is enabled by the use of Linked Data. Linked Data can also provide enrichment by linking to other data sources such as DBpedia.
  • “Lower-level” units are interpreted in the context of the higher levels of description, and are arguably “incomplete” without that contextual data. Relations are asserted (e.g. member-of/component-of), but there is no requirement or expectation that data consumers will follow the links describing those relations.

    1. LOCAH Project and Considerations of Linked Data Approaches. Adrian Stevenson, LOCAH Project Manager. JISC Managing Research Data International Workshop, Birmingham, UK, 29th March 2011.
    2. “The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web.” “The Semantic Web is the goal or end result… Linked Data provides the means to reach that goal.” From ‘Linked Data: The Story So Far’, Heath, Bizer and Berners-Lee, 2009.
    3. The goal of Linked Data is to enable people to share structured data on the Web as easily as they can share documents today. (Bizer, Cyganiak and Heath, Linked Data Tutorial, linkeddata.org)
    4. In essence, it marks a shift in thinking from publishing data in human-readable HTML documents to publishing it in machine-readable documents. That means that machines can do a little more of the thinking work for us. (http://www.linkeddatatools.com/semantic-web-basics)
    5. But haven’t we been putting linked data on the web for years?
        • In CSV, relational databases, XML, etc.?
        • Well, yes, but these approaches are not so easy to integrate
        • Web 2.0 mashups work against a fixed set of data sources
        • Linked Data applications operate on top of an unbound, global data space
    6. So what’s been happening?
    7. Data.gov.uk: officially launched 21st January 2010
    8. BBC Music
    9. A little bit of the techy stuff
    10. Linked Data is …
        • A way of publishing data on the web that:
            ◦ Encourages reuse
            ◦ Reduces redundancy
            ◦ Maximises inter-connectedness
            ◦ Enables network effects
        • So how is this achieved?
    11. Presentational tagging – HTML
            <h1>Manchester Physiotherapy Centre</h1>
            <p>Welcome to the Manchester Physiotherapy Centre home page. Do you feel pain? Have you had an injury? Let our staff take care of your body and soul.</p>
            <h2>Consultation hours</h2>
            Mon 11am - 7pm<br/>
            Tue 11am - 7pm<br/>
            Wed 3pm - 7pm<br/>
            Thu 11am - 7pm<br/>
            Fri 11am - 3pm
            <p>Please note that we will not be offering consultation during the weeks of the <a href=". . .">Olympic</a> games.</p>
    12. Semantic tagging
            <company>
              <treatmentOffered>Physiotherapy</treatmentOffered>
              <companyName>Manchester Physiotherapy Centre</companyName>
              <staff>
                <therapist>Lisa Davenport</therapist>
                <therapist>Steve Matthews</therapist>
                <secretary>Kelly Townsend</secretary>
              </staff>
            </company>
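A minimal Python sketch (standard library only) of what the semantic tags buy a machine: it parses the XML fragment above and asks for the therapists by element name, something the presentational HTML version does not allow. The element names come from the slide; the helper functions `therapists` and `company_name` are hypothetical and purely illustrative.

```python
import xml.etree.ElementTree as ET

# The semantically tagged fragment from the slide, held as a string.
COMPANY_XML = """
<company>
  <treatmentOffered>Physiotherapy</treatmentOffered>
  <companyName>Manchester Physiotherapy Centre</companyName>
  <staff>
    <therapist>Lisa Davenport</therapist>
    <therapist>Steve Matthews</therapist>
    <secretary>Kelly Townsend</secretary>
  </staff>
</company>
"""


def therapists(xml_text: str) -> list[str]:
    """Return the names of all therapists listed under <staff>."""
    root = ET.fromstring(xml_text.strip())
    # The tag names carry the meaning, so we can ask for "therapist"
    # directly instead of guessing from headings and line breaks.
    return [t.text for t in root.findall("staff/therapist")]


def company_name(xml_text: str) -> str:
    """Return the text of the <companyName> element."""
    return ET.fromstring(xml_text.strip()).findtext("companyName")


print(company_name(COMPANY_XML))  # Manchester Physiotherapy Centre
print(therapists(COMPANY_XML))    # ['Lisa Davenport', 'Steve Matthews']
```

Note that the tags are still local to one publisher: another site might say `<physio>` instead of `<therapist>`. Linked Data addresses this by using shared URIs for the terms themselves.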
    13. Linked Data Design Issues
        • URIs
        • LD Design Issues
        • Triples
        • http://www.w3.org/DesignIssues/LinkedData.html
    14. URIs and HTTP
        • A ‘Uniform Resource Identifier’ (URI) provides a simple and extensible means for identifying a resource (RFC 3986)
            ◦ HTTP URIs can be ‘de-referenced’
            ◦ A URL is a type of URI
        • HTTP URIs are used for “real world” things
            ◦ http://adrianstevenson.com/id/me
            ◦ http://dbpedia.org/resource/Tim_Berners-Lee
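Dereferencing an HTTP URI simply means making an HTTP GET request for it. In common Linked Data practice (an assumption here; the slide does not say how it is done) the client also sends an `Accept` header asking for an RDF serialisation rather than HTML. This sketch builds such a request with Python's standard library without sending it, so it runs offline; the media types and q-values are illustrative choices, and `rdf_request` is a hypothetical helper.

```python
import urllib.request


def rdf_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for RDF instead of HTML."""
    return urllib.request.Request(
        uri,
        method="GET",
        # Prefer Turtle, then RDF/XML; these q-values are one reasonable choice.
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"},
    )


req = rdf_request("http://dbpedia.org/resource/Tim_Berners-Lee")
print(req.get_method())          # GET
print(req.get_header("Accept"))  # text/turtle, application/rdf+xml;q=0.9

# Actually dereferencing it needs network access, e.g.:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, resp.headers.get("Content-Type"))
```

`urlopen` follows redirects automatically, which matters because many Linked Data servers answer a request for a "thing" URI with a 303 redirect to a document that describes it.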
    15. RDF
        • Resource Description Framework
            ◦ A language for representing information about resources on the Web
            ◦ RDF can be used to represent things identified on the Web, even when they cannot be directly retrieved on the Web
        • Describes relations using ‘triples’
        • http://www.w3.org/TR/REC-rdf-syntax/
    16. Triples
        • Triple statements
            ◦ ‘Things’ have ‘properties’ with ‘values’
            ◦ Subject – Predicate – Object
        • Triples are the basis of RDF
        (Diagram: Repository – provides access to – Archival Resource; Keith Richards – is member of – The Rolling Stones)
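The two statements in the triples diagram can be written down directly as (subject, predicate, object) tuples. The sketch below uses hypothetical `example.org` URIs in place of real identifiers, serialises the triples in the W3C N-Triples line format (`<s> <p> <o> .`), and shows a simple pattern match in which `None` acts as a wildcard, loosely the idea behind a SPARQL triple pattern.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # named "obj" so it does not shadow the builtin "object"


# Hypothetical URIs: example.org stands in for real identifiers.
EX = "http://example.org/"
triples = [
    Triple(EX + "person/keith-richards", EX + "vocab/isMemberOf", EX + "band/the-rolling-stones"),
    Triple(EX + "repository/1", EX + "vocab/providesAccessTo", EX + "archivalresource/1"),
]


def to_ntriples(ts: list[Triple]) -> str:
    """Serialise URI-only triples as N-Triples: one '<s> <p> <o> .' per line."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in ts)


def match(ts: list[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples that fit a pattern; None matches anything."""
    return [t for t in ts
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


print(to_ntriples(triples))
print(match(triples, p=EX + "vocab/isMemberOf"))
```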
    17. BBC Music
    18. LOCAH Project
    19. What is the LOCAH Project?
        • Linked Open Copac and Archives Hub
        • Funded by #JiscEXPO 2/10 ‘Expose’ call
        • 1-year project, started August 2010
        • http://blogs.ukoln.ac.uk/locah/ (tag: #locah)
    20. What are the Archives Hub and Copac?
        • National data services
        • The Archives Hub is an aggregation of archival descriptions from archive repositories across the UK
            ◦ http://archiveshub.ac.uk
        • Copac provides access to the merged library catalogues of libraries throughout the UK, including all national libraries
            ◦ http://copac.ac.uk
    21. What is LOCAH doing?
        • Part 1: Exposing Archives Hub & Copac data as Linked Data
        • Part 2: Creating a prototype visualisation
        • Part 3: Reporting on opportunities and barriers
    22. LOCAH Linked Data
        • If something is identified, it can be linked to
        • We can then take items from one dataset and link them to items from other datasets
        (Diagram: BBC, VIAF, DBpedia, Archives Hub, Copac and GeoNames as linked datasets)
    23. The linking benefits of Linked Data
        (Diagram: links between BBC:Cranford, VIAF:Dickens, DBpedia:Gaskell, Hub:Gaskell, Copac:Cranford, GeoNames:Manchester, DBpedia:Dickens and Hub:Dickens)
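The idea behind the linking diagram can be sketched in a few lines of Python. Datasets that identify things with URIs can be merged by simply taking the union of their triples, so a single `owl:sameAs` link between an Archives Hub identifier and a DBpedia identifier is enough to pull facts from several sources together. All identifiers and the non-OWL properties below are hypothetical prefixed names, not the real LOCAH or DBpedia data.

```python
SAME_AS = "owl:sameAs"

# Hypothetical prefixed names standing in for real URIs in each dataset.
hub = {
    ("hub:Gaskell", SAME_AS, "dbpedia:Elizabeth_Gaskell"),
    ("hub:Gaskell", "hub:associatedWith", "geonames:Manchester"),
}
copac = {
    ("copac:Cranford", "dc:creator", "hub:Gaskell"),
}
dbpedia = {
    ("dbpedia:Elizabeth_Gaskell", "dbo:notableWork", "dbpedia:Cranford"),
}


def merge(*graphs: set) -> set:
    """Merging graphs that share identifiers is just the union of their triples."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def aliases(graph: set, node: str) -> set:
    """All nodes reachable from `node` via owl:sameAs links, in either direction."""
    seen, todo = {node}, [node]
    while todo:
        n = todo.pop()
        for s, p, o in graph:
            if p != SAME_AS:
                continue
            if s == n:
                other = o
            elif o == n:
                other = s
            else:
                continue
            if other not in seen:
                seen.add(other)
                todo.append(other)
    return seen


def about(graph: set, node: str) -> set:
    """Every triple whose subject or object is `node` or one of its aliases."""
    names = aliases(graph, node)
    return {t for t in graph if t[0] in names or t[2] in names}


merged = merge(hub, copac, dbpedia)
for triple in sorted(about(merged, "hub:Gaskell")):
    print(triple)
```

Starting from the Hub record alone, the merged graph now reaches the Copac book and the DBpedia fact, which is the "Web as an exploratory space" effect in miniature.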
    24. Archives Hub Model (as at 14/2/2011)
        (Diagram of the data model. Classes shown include Archival Resource, Finding Aid, EAD Document, Biographical History, Repository (Agent), Agent, Person, Family, Organisation, Place, Concept, Concept Scheme, Genre, Function, Language, Level, Extent, Book, Object, Postcode Unit, Creation, Birth, Death and Temporal Entity. Relationships shown include maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, topic/page, hasPart/partOf, encodedAs/encodes, administeredBy/administers, hasBiogHist/isBiogHistFor, foaf:focus, is-a, inScheme, representedBy, participates in, at time, product of and in.)
    25. Enhancing our data
        • Already have some links:
            ◦ lexvo.org URIs for languages of archival materials
            ◦ reference.data.gov.uk URIs for time periods
            ◦ Postcodes, using both UK Postcodes URIs and Ordnance Survey URIs
            ◦ Virtual International Authority File: matches and links widely-used authority files (http://viaf.org/)
            ◦ DBpedia
        • Also looking at:
            ◦ Library of Congress Subject Headings
    26. http://data.archiveshub.ac.uk/id/archivalresource/gb1086skinner
    27. http://data.archiveshub.ac.uk/doc/person/ncarules/chamberlainarthurneville1869-1940statesman
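The two URIs above follow a familiar Linked Data pattern: an `/id/` URI names the thing itself (an archival resource, a person), while a `/doc/` URI names the web document that describes it. In the "303 See Other" approach described in the W3C note "Cool URIs for the Semantic Web", a request for the thing's URI is redirected to the document's URI. The mapping below is a hypothetical sketch that assumes the only difference is the leading `/id/` versus `/doc/` path segment; it is not taken from the Archives Hub implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def id_to_doc(uri: str) -> str:
    """Hypothetical rule: swap a leading '/id/' path segment for '/doc/'."""
    parts = urlsplit(uri)
    if not parts.path.startswith("/id/"):
        raise ValueError(f"not an /id/ URI: {uri}")
    doc_path = "/doc/" + parts.path[len("/id/"):]
    return urlunsplit(parts._replace(path=doc_path))


def respond(uri: str) -> tuple:
    """Simulate a server: things get a 303 to their document, documents get 200."""
    path = urlsplit(uri).path
    if path.startswith("/id/"):
        return 303, id_to_doc(uri)  # "See Other": go and read the description
    if path.startswith("/doc/"):
        return 200, uri             # the document itself can be served
    return 404, None


thing = "http://data.archiveshub.ac.uk/id/archivalresource/gb1086skinner"
print(respond(thing))
```

Keeping the two URIs distinct lets statements about the archive (its extent, its creator) stay separate from statements about the web page (its format, when it was generated).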
    28. How are we creating the visualisation prototype?
        • Based on researcher use cases
        • Data queried from a SPARQL endpoint
        • Using tools such as Simile, Many Eyes and Google Charts
        • Also looking at a custom-built prototype
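Querying a SPARQL endpoint over HTTP can be as simple as a GET request with the query text in a `query` parameter, as defined by the W3C SPARQL Protocol. This sketch only builds the URL; the endpoint address is a placeholder, and the query (which uses the real Dublin Core `dcterms:title` property) is a hypothetical example rather than one of the project's own queries.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint address; substitute a real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# Hypothetical query: up to ten resources together with their titles.
QUERY = """PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?resource ?title
WHERE { ?resource dcterms:title ?title }
LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """'Query via GET': the URL-encoded query text goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the parameter gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)  # True

# Sending it would typically also ask for JSON results, e.g.:
#   urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
```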
    29. Use case slide: http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_LOCAH
    30. Visualisation prototype
        • Using Timemap
            ◦ Google Maps and Simile
            ◦ http://code.google.com/p/timemap/
        • Early stages with this
        • Will give the location and ‘extent’ of the archive
        • Will link through to the Archives Hub
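A map-based visualisation such as the Timemap prototype needs query results as simple points with coordinates and a link back to the record. The sketch below flattens results in the W3C SPARQL JSON results format into plain Python dictionaries; the variable names (`lat`, `long`, `extent` and so on) and the sample values are hypothetical, and converting the points into Timemap's own data format is not shown because that format is not described here.

```python
import json

# Hand-written sample in the SPARQL JSON results format; names and values are hypothetical.
RESULTS = json.loads("""
{
  "head": {"vars": ["resource", "title", "lat", "long", "extent"]},
  "results": {"bindings": [
    {"resource": {"type": "uri", "value": "http://example.org/id/archivalresource/1"},
     "title":    {"type": "literal", "value": "Example papers"},
     "lat":      {"type": "literal", "value": "53.48"},
     "long":     {"type": "literal", "value": "-2.24"},
     "extent":   {"type": "literal", "value": "3 boxes"}},
    {"resource": {"type": "uri", "value": "http://example.org/id/archivalresource/2"},
     "title":    {"type": "literal", "value": "Papers without a place"}}
  ]}
}
""")


def to_map_points(results: dict) -> list[dict]:
    """Flatten SPARQL JSON bindings into map points; rows without coordinates are skipped."""
    points = []
    for row in results["results"]["bindings"]:
        # Unbound (e.g. OPTIONAL) variables are simply absent from a row.
        if "lat" not in row or "long" not in row:
            continue
        points.append({
            "title": row["title"]["value"] if "title" in row else "(untitled)",
            "lat": float(row["lat"]["value"]),
            "lon": float(row["long"]["value"]),
            "extent": row.get("extent", {}).get("value"),
            "link": row["resource"]["value"],  # link through to the record
        })
    return points


for point in to_map_points(RESULTS):
    print(point)
```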
    31. Some issues
        • Data modelling
        • Sustainability
        • Provenance
        • Licensing
    32. Data modelling challenges
        • Archival description is hierarchical and multi-level
        • Archives Hub: inconsistencies in data and lack of standardisation
            ◦ There’s no content standard in the UK
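Why multi-level description is awkward can be made concrete: a lower-level unit is best read together with the levels above it, which a consumer can only recover by following partOf links upward. This sketch uses a hypothetical four-level description held in a Python dictionary; the identifiers and titles are invented for illustration, and `context` is a hypothetical helper.

```python
# Hypothetical multi-level description: each unit points to its parent via partOf.
PART_OF = {
    "item-3": "file-2",
    "file-2": "series-1",
    "series-1": "fonds",
}
TITLES = {
    "fonds": "Papers of an example family",
    "series-1": "Correspondence",
    "file-2": "Letters, 1890-1895",
    "item-3": "Letter to a publisher",
}


def context(unit: str) -> list[str]:
    """Follow partOf links upward; return the chain with the top level first."""
    chain = [unit]
    seen = {unit}
    while chain[-1] in PART_OF:
        parent = PART_OF[chain[-1]]
        if parent in seen:  # guard against cyclic (inconsistent) data
            raise ValueError(f"partOf cycle at {parent}")
        chain.append(parent)
        seen.add(parent)
    return list(reversed(chain))


print(" > ".join(TITLES[u] for u in context("item-3")))
```

Read on its own, "Letter to a publisher" says very little; the chain above it supplies the missing context. Nothing obliges a data consumer to follow those links, which is the modelling problem in a nutshell.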
    33. Sustainability
        • Can you rely on data sources long-term?
        • Ed Summers at the Library of Congress created http://lcsh.info
        • A Linked Data interface for LOC subject headings
        • People started using it
    34. Library of Congress Subject Headings
    35. Provenance
        • Triples create individual statements
        • OK if the data is ‘watermarked’
        • But this can often be a problem
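One common way to "watermark" statements, offered here as an illustration rather than as the approach LOCAH took, is to store each triple with a fourth element naming the graph (source) it came from, as named graphs and the N-Quads format do. The sketch below uses hypothetical prefixed names and shows how the source of any statement, and therefore its attribution or licence, stays recoverable after datasets are merged.

```python
from collections import defaultdict

# Each statement carries a fourth element naming its source graph (hypothetical names).
quads = [
    ("ex:hub/Gaskell", "owl:sameAs", "ex:dbpedia/Gaskell", "ex:graph/archiveshub"),
    ("ex:dbpedia/Gaskell", "ex:birthPlace", "ex:place/London", "ex:graph/dbpedia"),
    ("ex:copac/Cranford", "ex:creator", "ex:hub/Gaskell", "ex:graph/copac"),
]


def sources_for(quads: list, subject: str) -> list[str]:
    """Which graphs make statements about `subject`?"""
    return sorted({g for s, _, _, g in quads if s == subject})


def by_source(quads: list) -> dict:
    """Group plain triples by graph, so each source can be attributed separately."""
    grouped = defaultdict(list)
    for s, p, o, g in quads:
        grouped[g].append((s, p, o))
    return dict(grouped)


print(sources_for(quads, "ex:hub/Gaskell"))
for graph, triples in by_source(quads).items():
    print(graph, len(triples))
```

The same grouping bears on the licensing issue: if each graph carries its own licence, the licence of any individual triple can still be looked up after merging.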
    36. Licensing
        • Nature of Linked Data: each triple is a piece of data
        • ‘Ownership’ of data
        • Hard to track attribution
        • We’re using CC BY-NC 2.0 for now
    37. Questions?
    38. Attribution and CC License
        • Sections of this presentation are adapted from materials created by other members of the LOCAH Project
        • This presentation is available under a Creative Commons Non Commercial-Share Alike licence:
        • http://creativecommons.org/licenses/by-nc/2.0/uk/