Linked Data and Locah, UKSG2011
An introduction to Linked Data and to the Linked Open Copac and Archives Hub project.

Published in: Education

  • Has been described as a ‘data commons’, or more usually a Web of Data.
  • Problem for machines to extract meaning. At present, the raw data is not really available.
  • Persistent URIs for names of things – HTTP URIs are names, not addresses. Provide information – properties and classes for a URI. More links.
  • Things are resources because someone created a URI to identify them, not because they have some particular properties in and of themselves. HTTP URIs provide a simple way to create globally unique names without centralized management; and URIs work not just as a name but also as a means of accessing information about a resource over the Web.
  • In a data graph, there is no concept of roots (or a hierarchy). A graph consists of resources related to other resources, with no single resource having any particular intrinsic importance over another.
  • This subject – the archive itself – has a page (foaf:page being the property) with name ‘finding aid’. The ‘finding aid’ is the object of this statement, but is also itself a subject. A subject in an RDF document may also be referenced as an object of a property in another RDF statement.
  • We have four ‘things’ here: unit of description; repository; finding aid; EAD document. We have given the unit of description a number of properties. Other things can also have properties (this is simplified). These properties are indicated in the green boxes. They are also called predicates.
  • In hypertext web sites it is considered generally rather bad etiquette not to link to related external material. The value of your own information is very much a function of what it links to, as well as the inherent value of the information within the web page. So it is also in the Semantic Web. Remember, this is about machines linking – machines need identifiers; humans generally know when something is a place or when it is a person. BBC + DBPedia + GeoNames + Archives Hub + Copac + VIAF = the Web as an exploratory space.
  • Once you say that they are the same, the implication is that they share the same classes and properties.
  • Ontology defines a ‘knowledge domain’
  • Encoded Archival Description (EAD) is an XML standard for encoding archival finding aids. The Metadata Object Description Schema (MODS) is an XML-based schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. ‘Things’ include concepts and abstractions as well as material objects. We want location – archives are physical things, so location is important. We also wanted event data, partly steered by the visualisation prototype, and ‘extent’ data – number of boxes.
  • 303 redirects and content negotiation, from ‘Cool URIs for the Semantic Web’
  • Open Data Commons Public Domain Dedication; Creative Commons CC0 licence
  • e.g. index terms may not always apply down the hierarchy of the description. We are pulling <repository> down into lower-level descriptions.
  • Transcript

    • 1. How to Become a First Class Citizen of the Web
      Linked Data and the LOCAH project
      Jane Stevenson & Adrian Stevenson
    • 2. Remit
      This session will give a brief overview of the concepts behind Linked Data and will explain how we are applying these ideas to archival and bibliographic data.
      Archives Hub: merged catalogue of archival descriptions from 200 institutions across the UK
      Copac: merged catalogue of bibliographic records from libraries across the UK
    • 3. Introduction
    • 4. The goal of Linked Data is to enable people to share structured data on the Web as easily as they can share documents today.
      [The creation of] a space where people and organizations can post and consume data about anything.
      Bizer/Cyganiak/Heath Linked Data Tutorial, linkeddata.org
    • 5. In essence, it marks a shift in thinking from publishing data in human readable HTML documents to machine readable documents. That means that machines can do a little more of the thinking work for us.
      http://www.linkeddatatools.com/semantic-web-basics
    • 6. Linked Data encourages open data, open licences and reuse.
      …but Linked Data does not have to be open.
    • 7. Core questions
      Is it achievable?
      Will it bring substantial benefits?
      “It is the unexpected re-use of information which is the value added by the web”
    • 8. What is Linked Data?
      4 ‘rules’ for the web of data:
      Use URIs as names for things
      Use HTTP URIs so that people can look up those names.
      When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
      Include links to other URIs, so that they can discover more things.
      http://www.w3.org/DesignIssues/LinkedData.html
    • 9. Giving Things identifiers
      We can make statements about things and establish relationships by assigning identifiers to them.
      Jane Stevenson = http://archiveshub.ac.uk/janefoaf.rdf
      Manchester = http://dbpedia.org/resource/manchester
      English = http://lexvo.org/id/iso639-3/eng
    • 10. URIs
      Uniform Resource Identifiers (URIs) are identifiers for entities (people, places, subjects, records, institutions).
      They identify resources, and ideally allow you to access representations of those resources.
      Think not of locations, but of identifiers!
      For Linked Data you use HTTP URIs
      Jane Stevenson = http://archiveshub.ac.uk/janefoaf.rdf
      Manchester = http://dbpedia.org/resource/manchester
      English = http://lexvo.org/id/iso639-3/eng
    • 11. Entities and Relationships
    • 12. Triple statement
      AccessProvidedBy
      Archival Resource
      Repository
      ProvidesAccessTo
      Subject: Archival Resource
      Predicate: AccessProvidedBy
      Object: Repository
      Subject > Predicate > Object
    • 13. HTTP URIs
      ArchivalResource: http://data.archiveshub.ac.uk/id/findingaid/gb-106-7esp
      <accessProvidedBy>
      Repository: http://data.archiveshub.ac.uk/id/repository/gb106
      Archival Resource
      Repository
      accessProvidedBy
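The triple on this slide can be sketched in code. The following is a minimal illustration, not the project's implementation: the subject and object URIs come from the slide, while the predicate namespace (`http://example.org/hub-vocab#`) is a hypothetical placeholder, because the slide gives only the local name `accessProvidedBy`.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject > predicate > object statement, all three as HTTP URIs."""
    subject: str
    predicate: str
    obj: str

    def to_ntriples(self) -> str:
        # N-Triples writes each URI in angle brackets and ends with " ."
        return f"<{self.subject}> <{self.predicate}> <{self.obj}> ."


# Hypothetical namespace for illustration; the real vocabulary URI is not on the slide.
HUB_VOCAB = "http://example.org/hub-vocab#"

statement = Triple(
    subject="http://data.archiveshub.ac.uk/id/findingaid/gb-106-7esp",
    predicate=HUB_VOCAB + "accessProvidedBy",
    obj="http://data.archiveshub.ac.uk/id/repository/gb106",
)
print(statement.to_ntriples())
```

Because every part of the statement is a URI, another dataset can refer to the same repository without any prior agreement on local identifiers.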
    • 14. An RDF Graph
      Title
      has
      Archival Resource
      Repository
      heldAt
      describedBy
      encodedAs
      Finding Aid
      EAD document
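As a rough sketch of the graph on this slide, the statements can be held as a flat list of triples and indexed by resource. The arrangement of the labels (which arrow joins which box) is an assumption read from the flattened slide text, and the short names stand in for full URIs.

```python
from collections import defaultdict

# Assumed reading of the slide diagram; short names stand in for HTTP URIs.
triples = [
    ("ArchivalResource", "hasTitle", "Title"),
    ("ArchivalResource", "heldAt", "Repository"),
    ("ArchivalResource", "describedBy", "FindingAid"),
    ("FindingAid", "encodedAs", "EADDocument"),
]


def neighbours(statements):
    """Index every resource by the statements it takes part in, in either role."""
    index = defaultdict(set)
    for s, p, o in statements:
        index[s].add((p, o))
        index[o].add(("^" + p, s))  # "^" marks the statement read in reverse
    return index


index = neighbours(triples)
# FindingAid is the object of one statement and the subject of another.
print(sorted(index["FindingAid"]))
```

No resource is privileged as a root: the index can be entered at any node and followed in either direction.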
    • 15. So...?
      If something is identified, it can be linked to
      We can then take items from one dataset and link them to items from other datasets
      BBC
      Copac
      VIAF
      DBPedia
      GeoNames
      Archives Hub
    • 16. The Linking benefits of Linked Data
      BBC:Cranford
      Copac:Cranford
      VIAF:Dickens
      DBPedia: Gaskell
      Hub:Gaskell
      Geonames:Manchester
      DBPedia: Dickens
      Hub:Dickens
    • 17. The Web of ‘Documents’
      Global information space (for humans)
      Document paradigm
      Hyperlinks
      Search engines indexing and inferring relevance
      Implicit relationships between documents
      Lack of semantics
    • 18. The Web of Linked Data
      Global data space (for humans and machines)
      Making connections between entities across domains (people, books, films, music, genes, medicines, health, statistics...)
      LD is not about searching for specific documents or visiting particular websites, it is about things - identifying and connecting them.
      Closely aligned to the general architecture of the Web
    • 19. From one thing…to the same thing
      <sameAs>
      http://dbpedia.org/resource/manchester
      http://sws.geonames.org/2643123
      http://data.archiveshub.ac.uk/id/concept/ncarules/manchester
      Are they the same?
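One way to see what a sameAs assertion implies is to group URIs into clusters of things declared identical. The sketch below uses a small union-find and treats the link as symmetric and transitive; whether the three Manchester URIs really should be asserted as the same is the modelling question the slide asks, and the code does not answer it.

```python
def same_as_clusters(links):
    """Group URIs into sets connected by sameAs links (union-find)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps chains short
            uri = parent[uri]
        return uri

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


DBPEDIA = "http://dbpedia.org/resource/manchester"
GEONAMES = "http://sws.geonames.org/2643123"
HUB = "http://data.archiveshub.ac.uk/id/concept/ncarules/manchester"

clusters = same_as_clusters([(DBPEDIA, GEONAMES), (GEONAMES, HUB)])
print(clusters)
```

Two links are enough to put all three URIs in one cluster, which is why a careless sameAs can merge properties that were never meant to be shared.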
    • 20. Vocabularies & Ontologies
    • 21. Vocabularies & Ontologies
      Vocabulary: set of terms
      Ontology: organisation of terms – hierarchy, relationships
    • 22. Shared vocabularies
      Problems of data integration: information exchange across independently designed systems
      Two different databases: one for films one for actors
      To collaborate using their current databases, the owners of the two sites would have to agree on a common data format for sharing information, built on a film and actor unique ID scheme of their own invention.
    • 23. Need ‘film title’; ‘actor name’; ‘actor birthdate’, etc. to mean the same thing to each
      Use the same vocabulary
      Query both databases.
      No need for transformations, mappings, contracts
    • 24. Vocabularies in Linked Data
      Common vocabulary to describe the data, e.g. ‘film-title’ means the same thing
      Adopt the same ontologies for expressing meaning
      Use semantics to link data
      Want to avoid transformation, mapping, contracts between data providers
    • 25. Shared use of vocabularies
      DC
      DC
      Copac
      Hub
      Hub RDF
      Copac RDF
      foaf
      bibo
      foaf
      skos
      skos
      dcterms:title
      dcterms:identifier
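The benefit of a shared vocabulary can be sketched as follows: when both datasets use the same property URI for titles, one function can query both without a mapping step. The Dublin Core terms URI is real; the record URIs and title strings are invented sample data for illustration only.

```python
DCTERMS_TITLE = "http://purl.org/dc/terms/title"

# Invented sample data; subjects use a placeholder example.org namespace.
hub_triples = [
    ("http://example.org/hub/1", DCTERMS_TITLE, "Example archive collection"),
]
copac_triples = [
    ("http://example.org/copac/1", DCTERMS_TITLE, "Example book"),
]


def titles(*datasets):
    """Collect every dcterms:title value across any number of datasets."""
    return sorted(
        obj
        for dataset in datasets
        for _subject, predicate, obj in dataset
        if predicate == DCTERMS_TITLE
    )


print(titles(hub_triples, copac_triples))
```

Had one dataset used a private `film-title` style key, the function would need a per-source mapping, which is the transformation work the slides want to avoid.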
    • 26. Ontologies
      Many widely used ontologies
      Use others as far as possible
      Use your own where necessary
      Dublin Core
      Friend of a Friend (FOAF)
      Simple Knowledge Organisation System (SKOS)
      Bibo
      Open Cyc
    • 27. Linked Data on the Hub & Copac
      Linked Open Copac and Archives Hub: Locah
      JISC funded project
      August 2010 – July 2011
      Mimas
      UKOLN
      Eduserv
    • 28. What is LOCAH doing?
      Part 1: Exposing the Linked Data
      Part 2: Creating a prototype visualisation
      Part 3: Reporting on opportunities and barriers
    • 29. How are we exposing the Data?
      Model our ‘things’ into RDF
      Transform the existing data into RDF/XML
      Enhance the data
      Load the RDF/XML into a triple store
      Create Linked Data Views
      Document the process, opportunities and barriers on LOCAH Blog
    • 30. 1. Modelling ‘things’ into RDF
      Hub data in ‘Encoded Archival Description’ EAD XML form
      Copac data in ‘Metadata Object Description Schema’ MODS XML form
      Take a step back from the data format
      Think about your ‘things’
      What is EAD document “saying” about “things in the world”?
      What questions do we want to answer about those “things”?
      http://www.loc.gov/ead/ http://www.loc.gov/standards/mods/
    • 31. 1. Modelling ‘things’ into RDF
      Need to decide on patterns for URIs we generate
      Following guidance from W3C ‘Cool URIs for the Semantic Web’ and UK Cabinet Office ‘Designing URI Sets for the UK Public Sector’
      http://data.archiveshub.ac.uk/id/findingaid/gb1086skinner ‘thing’ URI
      … is HTTP 303 ‘See Other’ redirected to …
      http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner document URI
      … which is then content negotiated to …
      http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.html
      http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.rdf
      http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.turtle
      http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.json
      http://www.w3.org/TR/cooluris/
      http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
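The URI pattern on this slide can be sketched as two small functions: one gives the target of the 303 redirect, the other picks a representation from an Accept header. This is an illustration under assumptions, not the project's server code: the pairing of media types with the four extensions is assumed, and q-values are ignored for brevity.

```python
# Assumed pairing of media types with the extensions listed on the slide.
EXTENSIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".turtle",
    "application/json": ".json",
}


def document_uri(thing_uri: str) -> str:
    """Target of the 303 'See Other' redirect: swap /id/ for /doc/."""
    if "/id/" not in thing_uri:
        raise ValueError("not a 'thing' URI: " + thing_uri)
    return thing_uri.replace("/id/", "/doc/", 1)


def negotiate(doc_uri: str, accept: str) -> str:
    """Pick the first supported media type in the Accept header (q-values ignored)."""
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in EXTENSIONS:
            return doc_uri + EXTENSIONS[media_type]
    return doc_uri + ".html"  # fall back to the human-readable page


thing = "http://data.archiveshub.ac.uk/id/findingaid/gb1086skinner"
doc = document_uri(thing)
print(negotiate(doc, "text/turtle, text/html;q=0.5"))
```

Keeping the ‘thing’ URI separate from the document URI lets statements about the archive be distinguished from statements about the web page that describes it.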
    • 32. 1. Modelling ‘things’ into RDF
      Using existing RDF vocabularies:
      DC, SKOS, FOAF, BIBO, WGS84 Geo, Lexvo, ORE, LODE, Event and Time Ontologies
      Define additional RDF terms where required,
      hub:ArchivalResource
      copac:BibliographicResource
      hub:maintenanceAgency
      copac:Creator
      It can be hard to know where to look for vocabs and ontologies
      Decide on licence – CC BY-NC 2.0, CC0, ODC PDD
    • 33. Archives Hub Model (as at 14/2/2011)
      in
      Finding Aid
      Place
      PostcodeUnit
      Repository(Agent)
      administeredBy/administers
      maintainedBy/maintains
      encodedAs/encodes
      hasPart/partOf
      EAD Document
      accessProvidedBy/providesAccessTo
      Level
      Biographical History
      topic/page
      hasBiogHist/isBiogHistFor
      Language
      level
      ArchivalResource
      language
      at time
      topic/page
      origination
      hasPart/partOf
      TemporalEntity
      Creation
      product of
      associatedWith
      extent
      inScheme
      Extent
      Concept
      ConceptScheme
      Agent
      representedBy
      Object
      foaf:focus
      Is-a
      Is-a
      associatedWith
      Family
      Person
      Organisation
      Place
      Book
      participates in
      Genre
      Function
      Birth
      Death
      TemporalEntity
      at time
    • 34. Copac Model (as at November 2010)
    • 35. Feedback Requested!
      We would like feedback on the model
      Appreciate this will be easier when the data is available
      Via blog
      http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/
      http://blogs.ukoln.ac.uk/locah/2010/11/08/some-more-things-some-extensions-to-the-hub-model/
      http://blogs.ukoln.ac.uk/locah/2010/10/07/modelling-copac-data/
      Via email, twitter, in person
    • 36. 2. Transforming into RDF/XML
      Transform EAD and MODS to RDF/XML based on our models
      Hub: created an XSLT stylesheet and used the Saxon XSLT processor
      http://saxon.sourceforge.net/
      Saxon runs the XSLT against a set of EAD files and creates a set of RDF/XML files
      Copac: created in-house Java transformation program
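The shape of the transformation step can be sketched in a few lines. Note that this is a Python stand-in for illustration: the Hub used XSLT with Saxon and Copac used a Java program. The `<did>`, `<unitid>` and `<unittitle>` elements are real EAD elements, but the sample record, the base URI and the choice of Dublin Core properties are assumptions.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified EAD fragment for illustration.
EAD_SAMPLE = """
<archdesc level="fonds">
  <did>
    <unitid>gb0000example</unitid>
    <unittitle>Example Papers</unittitle>
  </did>
</archdesc>
"""

BASE = "http://example.org/id/archivalresource/"  # hypothetical URI pattern
DCTERMS = "http://purl.org/dc/terms/"


def ead_to_triples(xml_text):
    """Turn the identifying fields of one description into triples."""
    did = ET.fromstring(xml_text).find("did")
    unit_id = did.findtext("unitid").strip()
    title = did.findtext("unittitle").strip()
    subject = BASE + unit_id
    return [
        (subject, DCTERMS + "identifier", unit_id),
        (subject, DCTERMS + "title", title),
    ]


for triple in ead_to_triples(EAD_SAMPLE):
    print(triple)
```

The point of the sketch is the direction of travel: the XML hierarchy is read once, and what comes out is a set of statements about a ‘thing’ with its own URI.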
    • 37. 3. Enhancing our data
      Language - lexvo.org
      Time periods - reference.data.gov.uk
      Geolocation - UK Postcodes URIs and Ordnance Survey URIs
      Names - Virtual International Authority File
      Matches and links widely-used authority files - http://viaf.org/
      Names (and subjects) - DBPedia
      Subjects - Library of Congress Subject Headings
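Enhancement of the language field can be sketched as a lookup that replaces a code with a lexvo.org URI. The URI pattern is the one shown earlier in the slides for English; the assumption here is that the source record already carries a three-letter ISO 639-3 code.

```python
LEXVO_BASE = "http://lexvo.org/id/iso639-3/"


def language_uri(code: str) -> str:
    """Map a three-letter language code to a lexvo.org URI."""
    code = code.strip().lower()
    if len(code) != 3 or not code.isalpha():
        raise ValueError("expected a three-letter language code, got " + repr(code))
    return LEXVO_BASE + code


print(language_uri("ENG "))
```

Replacing the literal code with a URI is what turns a local field value into a link another dataset can follow.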
    • 38. 4. Load RDF/XML into triple store
      Using the Talis Platform triple store
      RDF/XML is HTTP POSTed
      We’re using Pynappl
      Python client for the Talis Platform
      http://code.google.com/p/pynappl/
      Store provides us with a SPARQL query interface
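A query against a SPARQL interface can be sketched by building the request URL that the SPARQL Protocol defines for GET requests, with the query text in a `query` parameter. The endpoint address below is a placeholder, since the slides do not give the real one, and no network call is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; the real address is not given on the slides.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?resource ?title
WHERE { ?resource <http://purl.org/dc/terms/title> ?title }
LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET request URL; nothing is sent."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

The query asks for any resource with a dcterms:title, which is the kind of cross-dataset question a shared vocabulary makes possible.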
    • 39. 5. Create Linked Data Views
      Expose ‘bounded’ descriptions from the triple store over the Web
      Make available as documents in both human-readable HTML and RDF formats (also JSON, Turtle, CSV)
      Using Paget ‘Linked Data Publishing Framework’
      http://code.google.com/p/paget/
      PHP scripts query the SPARQL endpoint
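A ‘bounded’ description can be sketched as the subset of statements about one resource, grouped by property so that it can be rendered as a page. This uses the simplest possible boundary (triples whose subject is the resource), which is an assumption; the slides do not say how the project draws the boundary. All URIs and values below are invented.

```python
RESOURCE = "http://example.org/id/archivalresource/example1"  # hypothetical

TRIPLES = [
    (RESOURCE, "http://purl.org/dc/terms/title", "Example Papers"),
    (RESOURCE, "http://purl.org/dc/terms/identifier", "example1"),
    ("http://example.org/id/repository/example",
     "http://www.w3.org/2000/01/rdf-schema#label", "Example Repository"),
]


def bounded_description(triples, resource):
    """Group the statements whose subject is `resource` by predicate."""
    description = {}
    for subject, predicate, obj in triples:
        if subject == resource:
            description.setdefault(predicate, []).append(obj)
    return description


for predicate, values in bounded_description(TRIPLES, RESOURCE).items():
    print(predicate, values)
```

The same grouped structure can feed an HTML template or be serialised as RDF, JSON or Turtle, which matches the several output formats listed on the slide.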
    • 40. http://data.archiveshub.ac.uk/id/archivalresource/gb1086skinner
    • 41. http://data.archiveshub.ac.uk/
    • 42. Can I access the Locah Linked Data?
      Will be releasing the Hub data very soon!
      Copac data will follow approx 1 month later
      Release will include Linked Data views, SPARQL endpoint details, example queries and supporting documentation
    • 43. Reporting on opportunities and barriers
      Locah Blog (tags: ‘opportunities’ ‘barriers’)
      Feed into #JiscEXPO programme evidence gathering
      More at:
      http://blogs.ukoln.ac.uk/locah/2010/09/22/creating-linked-data-more-reflections-from-the-coal-face/
      http://blogs.ukoln.ac.uk/locah/2010/12/01/assessing-linked-data
    • 44. Creating the Visualisation Prototype
      Based on researcher use cases
      Data queried from SPARQL endpoint
      Use tools such as Simile, Many Eyes, Google Charts
      For first Hub visualisation using Timemap –
      Googlemaps and Simile
      http://code.google.com/p/timemap/
    • 45. Visualisation Prototype
      Using Timemap –
      Googlemaps and Simile
      http://code.google.com/p/timemap/
      Early stages with this
      Will give location and ‘extent’ of archive.
      Will link through to Archives Hub
    • 46. Sir Ernest Henry Shackleton
      http://archiveshub.ac.uk/data/gb15sirernesthenryshackleton
      Archives related to Shackleton:
      VIAF URL: http://viaf.org/viaf/12338195/
      Books related to Shackleton:
      Biographical History:
      Ernest Henry Shackleton was born on 15 February 1874 in Kilkea, Ireland, one of six children of Anglo-Irish parents. The family moved from their farm to Dublin, where his father, Henry studied medicine. On qualifying in 1884, Henry took up a practice in south London, and between 1887 and 1890, Ernest was educated at Dulwich College. On leaving school, he entered the merchant service, serving in the square-rigged ship Hoghton Tower until 1894 when he transferred to tramp steamers. In 1896, he qualified as first mate, and two years later, was certified as master, joining the Union Castle line in 1899. [more]
    • 47. The challenges
    • 48. The learning process
      Model the data, not the description
      The description is one of the entities
      Understand the importance of URIs
      Think about your world before others
      …but external links are important
      Try to get to grips with terminology
    • 49. Names
      6947115KNAPPF
      F Knapp associated with record 6947115
      /id/agent/6947115KNAPPF
      <copac:isCreatorOf rdf:resource="http://data.copac.ac.uk/id/mods/6947115"/>
      6947115KNAPPF
      6947115
      <isCreatorOf>
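The agent identifier on this slide appears to combine the record number with the name in upper case. The sketch below reproduces that pattern; the rule is inferred from the single example (`6947115KNAPPF` for F Knapp on record 6947115) and is an assumption, not the project's documented algorithm.

```python
def agent_id(record_id: str, surname: str, initials: str = "") -> str:
    """Mint a record-scoped agent identifier: record number + SURNAME + INITIALS."""
    token = "".join(ch for ch in (surname + initials).upper() if ch.isalnum())
    return record_id + token


def agent_path(record_id: str, surname: str, initials: str = "") -> str:
    """Relative URI path for the agent, following the /id/agent/ pattern on the slide."""
    return "/id/agent/" + agent_id(record_id, surname, initials)


print(agent_path("6947115", "Knapp", "F"))
```

Scoping the identifier to the record avoids claiming that two people named F Knapp on different records are the same person, at the cost of creating many URIs that later need linking.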
    • 50. Index terms (names, subjects, places)
      ‘AssociatedWith’ as the relationship
      Benefits of structured index terms
      Use /person/ and /organisation/ in the URI
      Distinguish /person/pilkington (the person) and /organisation/pilkington
      Distinguish place/reading/ and subject/reading/
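Putting the type of thing into the URI path can be sketched as below. The set of kinds and the slug rule (lower case, alphanumerics only) are assumptions for illustration; the slide shows only the resulting path segments.

```python
import re

# Kinds named on the slide; the exact set used by the project is assumed.
KINDS = {"person", "organisation", "place", "subject"}


def slug(label: str) -> str:
    """Reduce a label to lower-case letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", label.lower())


def term_path(kind: str, label: str) -> str:
    """Build a typed path segment such as /person/pilkington."""
    if kind not in KINDS:
        raise ValueError("unknown kind: " + kind)
    return "/" + kind + "/" + slug(label)


print(term_path("person", "Pilkington"))
print(term_path("organisation", "Pilkington"))
```

The same label therefore yields different identifiers for the person and the organisation, and for the place Reading and the subject reading.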
    • 51. Problems with source data
      EAD very permissive: whole range of finding aids
      Copac more consistent but still wide variety
      Hub EAD: We limited the tags we worked with
      Large files (around 5Mb) tend to need splitting up
    • 52. Duplication of data
      “So statements which relate things in the two documents must be repeated in each. This clearly is against the first rule of data storage: don't store the same data in two different places: you will have problems keeping it consistent.” (T B-L www.w3.org/designissues/linkeddata.html)
    • 53. Archival Inheritance
      “Do not repeat information at a lower level of description that has already been given at a higher level.” ISAD(G)
      Many elements do not apply to ‘child’ descriptions
      Simple rule of inheritance not always appropriate
      LD does assert hierarchical relationships but no requirement to follow these links
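Selective inheritance can be sketched as copying only chosen fields from a parent description to its children. The field names and the nested-dictionary structure are invented for illustration; the only detail taken from the source is that repository information is pulled down while index terms are not assumed to apply at lower levels.

```python
# Fields assumed safe to copy down the hierarchy; index terms are deliberately absent.
INHERITED_FIELDS = ("repository",)


def inherit(description, parent=None):
    """Return a copy of the description tree with selected fields pulled down."""
    resolved = {k: v for k, v in description.items() if k != "children"}
    for field in INHERITED_FIELDS:
        if field not in resolved and parent and field in parent:
            resolved[field] = parent[field]
    resolved["children"] = [
        inherit(child, resolved) for child in description.get("children", [])
    ]
    return resolved


# Invented sample hierarchy: fonds > series > item.
fonds = {
    "title": "Example fonds",
    "repository": "Example Repository",
    "index_terms": ["Example subject"],
    "children": [
        {"title": "Example series", "children": [{"title": "Example item"}]},
    ],
}

resolved = inherit(fonds)
series = resolved["children"][0]
item = series["children"][0]
print(series["repository"], item["repository"])
```

Each lower-level description then stands on its own as a resource, which matters because a Linked Data consumer is not required to follow the hierarchical links upward.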
    • 54. Copac
      Larger community: more potential vocabularies/documentation/support/confusion/inconsistencies
      Merged catalogues: a unique scenario
      ‘Creator’ and ‘Others’ (editor, authors, illustrator)
      Learning from Hub / Doing what is appropriate
      Usually not right or wrong answers
    • 55. Copac model
      Groundwork done with Archives Hub. Then had to decide what we wanted to say about the data
      Challenges over what a ‘record’ is – ‘Bleak House’ from each contributor? or one merged record?
      In many ways simpler than archival data; but also can decide to create a simpler model
    • 56. Copac Model
    • 57. Copac specification
      Hard to start but proved to be crucial
      Very iterative process between spec and RDF output
      Important to establish the structure of the spec (we used tabs for each ‘entity’)
    • 58. Copac specification
    • 59. Copac decisions
      Where to create Copac URIs –
      copac:creator
      copac:contributor
      copac:heldBy
      When to create URIs
      Title = literal
      Publication place = URI
      How to deal with problematic/ambiguous data
      Date? = productionDate
    • 60. Issues
    • 61. Risks
      Can you rely on data sources long-term?
      Persistence of persistent URIs?
      New technologies
      Investment of time – unsure of benefits
      Licensing issues
    • 62. Provenance
      Track which data comes from our sources: URIs identify your entities
      Linked Data tends towards disassembling
      Copac/Hub as trusted sources…is DBPedia (for example) as reliable?
      Contributors may want data to be identified
      Issues around administrative/biographical history
      Benefits of trust?
      Users may want to know where data is from
    • 63. Licensing
      Nature of Linked Data: each triple as a piece of data
      ‘Ownership’ of data?
      Data often already freely available (M2M interfaces)
    • 64. Licensing
      Public Domain Licences: simple, explicit, and permit widest possible reuse. Waive all rights to the data
      BL, British National Bibliography uses public domain licence
      Limit commercial uses?
      Build in community norms: attribution, share alike - to reinforce desire for acknowledgement
      Legal situation?
    • 65. Thank You
    • 66. Attribution and CC licence
      Sections of this presentation adapted from materials created by other members of the LOCAH Project
      This presentation is available under Creative Commons Non Commercial-Share Alike: http://creativecommons.org/licenses/by-nc/2.0/uk/