Lifting the Lid on Linked Data

Presentation at ELAG 2011, European Library Automation Group Conference, Prague, Czech Republic. 25th May 2011


http://elag2011.techlib.cz/en/815-lifting-the-lid-on-linked-data/

  • Has been described as a ‘data commons’, or more usually a Web of Data.
  • Persistent URIs for names of things: HTTP URIs are names, not addresses. Provide information (properties and classes) for a URI. Include more links.
  • In a data graph, there is no concept of roots (or a hierarchy). A graph consists of resources related to other resources, with no single resource having any particular intrinsic importance over another.
  • We have four ‘things’ here: unit of description; repository; finding aid; EAD document. We have given the unit of description a number of properties. Other things can also have properties (this is simplified). These properties are indicated in the green boxes. They are also called predicates.
  • In hypertext web sites it is considered generally rather bad etiquette not to link to related external material.  The value of your own information is very much a function of what it links to, as well as the inherent value of the information within the web page.  So it is also in the Semantic Web. Remember, this is about machines linking – machines need identifiers; humans generally know when something is a place or when it is a person. BBC + DBPedia + GeoNames + Archives Hub + Copac + VIAF = the Web as an exploratory space
  • Encoded Archival Description (EAD) is an XML standard for encoding archival finding aids. The Metadata Object Description Schema (MODS) is an XML-based schema for a bibliographic element set that may be used for a variety of purposes, particularly library applications. ‘Things’ include concepts and abstractions as well as material objects. We want location: archives are physical things, so location is important. We also wanted event data, partly steered by the visualisation prototype, and ‘extent’ data (the number of boxes).
  • 303 redirects and content negotiation follow ‘Cool URIs for the Semantic Web’.
  • Open Data Commons Public Domain Dedication; Creative Commons CC0 licence.
  • Once you say that they are the same, the implication is that they share the same classes and properties.

Lifting the Lid on Linked Data Presentation Transcript

  • 1. Linked Data and the LOCAH project Jane Stevenson & Adrian Stevenson
  • 2. Linked Data on the Hub & Copac. Linked Open Copac and Archives Hub (Locah): JISC-funded project, August 2010 – July 2011. Mimas, UKOLN, Eduserv
  • 3. The goal of Linked Data is to enable people to share structured data on the Web as easily as they can share documents today. It is a space where people and organizations can post and consume data about anything. (Bizer/Cyganiak/Heath, Linked Data Tutorial, linkeddata.org)
  • 4. Core questions
    • Is it achievable?
    • Will it bring substantial benefits?
    • “It is the unexpected re-use of information which is the value added by the web”
  • 5. What is Linked Data?
    • 4 ‘rules’ for the web of data:
    • Use URIs as names for things
    • Use HTTP URIs so that people can look up those names.
    • When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
    • Include links to other URIs, so that they can discover more things.
    http://www.w3.org/DesignIssues/LinkedData.html
  • 6. Use URIs as Names
    • We can make statements about things and establish relationships by assigning identifiers to them.
    • Uniform Resource Identifiers (URIs) are identifiers for entities (people, places, subjects, records, institutions).
    • They identify resources, and ideally allow you to access representations of those resources.
    author = http://archiveshub.ac.uk/janefoaf.rdf book = http://dbpedia.org/resource/manchester subject = English = http://lexvo.org/id/iso639-3/eng
  • 7. Entities and Relationships
  • 8. Triple statement: Subject > Predicate > Object. Subject: Bibliographic Resource. Predicate: AccessProvidedBy. Object: Library. (The diagram also labels the relationship ProvidesAccessTo.)
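The triple on this slide can be sketched in a few lines of code. This is a minimal illustration using a plain Python named tuple rather than an RDF library; the example.org URIs are placeholders invented for the example and are not LOCAH identifiers.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Placeholder URIs (assumptions for illustration only).
BOOK = "http://example.org/id/bibliographicresource/1"
LIBRARY = "http://example.org/id/library/1"
ACCESS_PROVIDED_BY = "http://example.org/def/accessProvidedBy"

statement = Triple(BOOK, ACCESS_PROVIDED_BY, LIBRARY)


def to_ntriples(t: Triple) -> str:
    """Serialise a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


print(to_ntriples(statement))
```

Because every part of the statement is a URI, any other dataset can refer to the same book or library simply by reusing the identifier.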
  • 9. An RDF Graph [diagram: nodes Bibliographic Resource, Library, Bibliographic Record, MODS document and Title, joined by the relationships describedBy, heldAt, encodedAs and has]
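A graph like this one is just a set of triples, with no root node. The sketch below uses the node and relationship names from the slide; which pair of nodes each relationship joins is an assumption, since the transcript does not preserve the arrows.

```python
# Each edge of the diagram is one (subject, predicate, object) triple.
# Names follow the slide; the pairing of nodes and edges is assumed.
graph = {
    ("BibliographicResource", "has", "Title"),
    ("BibliographicResource", "heldAt", "Library"),
    ("BibliographicResource", "describedBy", "BibliographicRecord"),
    ("BibliographicRecord", "encodedAs", "MODSDocument"),
}


def objects(graph, subject, predicate):
    """Values reachable from `subject` by following `predicate`."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def neighbours(graph, node):
    """All nodes directly linked to `node`, in either direction."""
    outgoing = {o for s, _, o in graph if s == node}
    incoming = {s for s, _, o in graph if o == node}
    return sorted(outgoing | incoming)


print(objects(graph, "BibliographicResource", "heldAt"))
print(neighbours(graph, "BibliographicRecord"))
```

Any node can serve as the starting point for a query, which is what "no concept of roots" means in practice.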
  • 10. So...?
    • If something is identified, it can be linked to
    • We can then take items from one dataset and link them to items from other datasets
    BBC VIAF DBPedia Archives Hub Copac GeoNames
  • 11. The Linking benefits of Linked Data [diagram linking BBC:Cranford, VIAF:Gaskell, DBPedia:Gaskell, Hub:Gaskell, Copac:Cranford, Geonames:Manchester, DBPedia:Dickens and Hub:Dickens]
  • 12. The Web of ‘Documents’
    • Global information space (for humans)
    • Document paradigm
    • Hyperlinks
    • Search engines index documents and infer relevance
    • Implicit relationships between documents
    • Lack of semantics
  • 13. The Web of Linked Data
    • Global data space (for humans and machines)
    • Making connections between entities across domains (people, books, films, music, genes, medicines, health, statistics...)
    • LD is not about searching for specific documents or visiting particular websites; it is about things: identifying and connecting them.
  • 14. Copac model
    • Groundwork done with Archives Hub. Then had to decide what we wanted to say about the data
    • Challenges over what a ‘record’ is – ‘Bleak House’ from each contributor? or one merged record?
    • In many ways simpler than archival data, and we can also decide to create a simpler model
  • 15. Copac Model (as at November 2010)
  • 16. Copac specification
    • Model = entities and relationships
    • Specification = means to specify these more exactly – programmer can create transform script
    • Iterative process – model – spec – RDF output
  • 17. Node name: BibliographicResource. MODS field: <modscollection>. Ontology: bibo
    • Columns: cardinality (min, max; m = many), property, URI/literal
    • 1, 1: dct:title, literal
    • 0, 1: dct:extent, literal
    • 0, m: bibo:isbn, literal
    • 0, m: bibo:issn, literal
    • 0, m: bibo:note, literal
    • 0, m: dct:alternative, literal
    • 0, m: copac:uniformtitle, literal
  • 18. Node name: BibliographicResource. MODS field: <modscollection>. Ontology: bibo
    • Columns: cardinality (min, max; m = many), property, object node, URI/literal, ontology
    • 0, 1: copac:creator, Creator, URI, dc
    • 0, m: copac:contributor, Contributor, URI, copac
    • 0, 1: event:producedIn, Production Date, URI, event
    • 0, 1: dct:issued, Production Date, URI, dc
    • 0, m: pode:publicationPlace, Place, URI, pode
    • 0, m: isbd:P1016, Place, URI, isbd
    • 0, m: dct:publisher, Publisher, URI, dc
    • 0, 1: dct:isPartOf, Series, URI, dc
    • 1, m: copac:HeldBy, Institution, URI (with Institution as subject)
    • 1, 1: bibo:type, Type, URI, bibo
    • 0, m: dct:subject, Subject, URI, dc
    • 0, m: skos:subject, subject, URI, skos
    • 0, m: dct:language, Language, URI, dc
    • 1, 1: hub:encodedAs, mods, URI, hub
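The cardinality columns on slides 17 and 18 lend themselves to a simple automated check. The sketch below is an illustration only, not the project's transform code: it copies a handful of the (min, max) pairs from the slides and reports properties that appear too few or too many times. The dictionary-of-lists record shape and the function name are assumptions.

```python
# (min, max) per property; a max of None stands for "m" (many).
# Values copied from slides 17 and 18; only a subset is shown.
CARDINALITY = {
    "dct:title": (1, 1),
    "dct:extent": (0, 1),
    "bibo:isbn": (0, None),
    "copac:creator": (0, 1),
    "copac:HeldBy": (1, None),
    "bibo:type": (1, 1),
}


def check_cardinality(resource: dict) -> list:
    """Return human-readable violations for one resource.

    `resource` maps a property name to the list of its values.
    """
    problems = []
    for prop, (low, high) in CARDINALITY.items():
        count = len(resource.get(prop, []))
        if count < low:
            problems.append(f"{prop}: expected at least {low}, found {count}")
        elif high is not None and count > high:
            problems.append(f"{prop}: expected at most {high}, found {count}")
    return problems


# Hypothetical record: it has a title but no holding institution or type.
record = {"dct:title": ["Bleak House"], "bibo:isbn": []}
for problem in check_cardinality(record):
    print(problem)
```

A check of this kind could run after each pass of the model, spec and RDF output cycle to catch records that the specification does not yet cover.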
  • 19. Node name: Creator. MODS field: <name> <namePart></namePart> where <roleTerm>creator</roleTerm>. URI namespace: copac. URI pattern: root/id/agent/{BibID}{namePart}
    • Columns: cardinality (min, max), property, URI/literal, URI or value
    • 1, 1: rdf:type, URIs, http://purl.org/dc/terms/Agent and http://xmlns.com/foaf/0.1/Agent
    • 1, 1: rdfs:label, literal, {namePart}
    • 1, 1: skos:prefLabel, literal, {namePart}
    • 1, 1: isCreatorOf, Bibliographic Resource URI, root/id/bibliographicresource/{recordIdentifer}
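The URI patterns in this specification can be read as simple string templates. The sketch below shows one way they might be filled in. The root value is a placeholder, and the name normalisation (strip punctuation, upper-case) is an assumption inferred from the identifier 6957115KNAPPF that appears on the ‘Creator’ slide; the sample name is invented.

```python
import re

ROOT = "http://example.org"  # placeholder for the "root" in the slide (assumption)


def normalise_name(name_part: str) -> str:
    """Keep only ASCII letters and digits, upper-cased (assumed normalisation)."""
    return re.sub(r"[^A-Za-z0-9]", "", name_part).upper()


def agent_uri(bib_id: str, name_part: str) -> str:
    """Fill the pattern root/id/agent/{BibID}{namePart}."""
    return f"{ROOT}/id/agent/{bib_id}{normalise_name(name_part)}"


def bibliographic_resource_uri(record_identifier: str) -> str:
    """Fill the pattern root/id/bibliographicresource/{record identifier}."""
    return f"{ROOT}/id/bibliographicresource/{record_identifier}"


# Hypothetical input values.
print(agent_uri("6957115", "Knapp, F."))
print(bibliographic_resource_uri("6957115"))
```

Prefixing the name with the record identifier keeps two creators with the same name in different records distinct, at the cost of minting more than one URI for the same person.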
  • 20. Aggregated Data
  • 21. Aggregated data
    • Copac MODS record = an aggregated book record
    • e.g. ‘Bleak House’ held at 10 different libraries
    • Copac ‘merges’ the descriptions from 8 of them
    • 2 are not consistent with the rest, so they remain as stand-alone descriptions
    • End result: have 3 records for ‘Bleak House’ (one merged record plus two stand-alone descriptions)
    • Not talking about ‘a book’
  • 22. Copac decisions
    • Vocabularies:
      • dcterms:creator
      • dcterms:contributor
      • copac:heldBy
    • When to create URIs
      • Title = literal
      • Publication place = URI
    • How to deal with problematic/ambiguous data
      • Date? = productionDate
  • 23. ‘Creator’
      • Copac ‘creator’ = author or editor
    • <copac:creator> <dcterms:creator> <biblioResource>
    • Alternative name = dct:alternative
    • Uniform name = copac:uniform
    6957115KNAPPF 6947115 <isCreatorOf>
  • 24. ‘Contributor’
    • Contributor = editor, illustrator, translator
    • Cannot specify role – has to be general
    • <dcterms:contributor>
  • 25. RDF Process
  • 26. What is LOCAH doing?
    • Part 1: Exposing the Linked Data
    • Part 2: Creating a prototype visualisation
    • Part 3: Reporting on opportunities and barriers
  • 27. How are we exposing the Data?
    • Model our ‘things’ into RDF
    • Transform the existing data into RDF/XML
    • Enhance the data
    • Load the RDF/XML into a triple store
    • Create Linked Data Views
    • Document the process, opportunities and barriers on LOCAH Blog
  • 28. 1. Modelling ‘things’ into RDF
    • Hub data in ‘Encoded Archival Description’ EAD XML form
    • Copac data in ‘Metadata Object Description Schema’ MODS XML form
    • Take a step back from the data format
      • Think about your ‘things’
      • What is EAD document “saying” about “things in the world”?
      • What questions do we want to answer about those “things”?
    http://www.loc.gov/ead/ http://www.loc.gov/standards/mods/
  • 29. 1. Modelling ‘things’ into RDF
    • Need to decide on patterns for URIs we generate
    • Following guidance from W3C ‘Cool URIs for the Semantic Web’ and UK Cabinet Office ‘Designing URI Sets for the UK Public Sector’
      • http://data.archiveshub.ac.uk/id/findingaid/gb1086skinner ‘thing’ URI
      • … is HTTP 303 ‘See Other’ redirected to …
      • http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner document URI
      • … which is then content negotiated to …
      • http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.html
      • http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.rdf
      • http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.turtle
      • http://data.archiveshub.ac.uk/doc/findingaid/gb1086skinner.json
      • http://www.w3.org/TR/cooluris/ http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
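The two-step resolution on this slide (303 redirect, then content negotiation) can be sketched as two small functions. This is an illustration of the pattern, not the Archives Hub server code. The mapping from media types to file extensions is an assumption, and the Accept header handling is simplified: q-values are ignored and the first recognised type wins.

```python
# Assumed mapping from media type to the extensions shown on the slide.
EXTENSIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".turtle",
    "application/json": ".json",
}


def see_other(thing_uri: str) -> str:
    """Map an /id/ 'thing' URI to the /doc/ URI a 303 response would point at."""
    return thing_uri.replace("/id/", "/doc/", 1)


def negotiate(doc_uri: str, accept: str) -> str:
    """Pick the first media type in the Accept header that can be served.

    Simplification: q-values are ignored; unknown types fall back to HTML.
    """
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in EXTENSIONS:
            return doc_uri + EXTENSIONS[media_type]
    return doc_uri + ".html"


thing = "http://data.archiveshub.ac.uk/id/findingaid/gb1086skinner"
doc = see_other(thing)
print(doc)
print(negotiate(doc, "text/turtle, application/rdf+xml;q=0.8"))
```

Keeping the ‘thing’ URI separate from the document URI lets statements about the finding aid itself be distinguished from statements about the web page that describes it.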
  • 30. 1. Modelling ‘things’ into RDF
    • Using existing RDF vocabularies:
      • DC, SKOS, FOAF, BIBO, WGS84 Geo, Lexvo, ORE, LODE, Event and Time Ontologies
    • Define additional RDF terms where required:
      • copac:BibliographicResource
      • copac:Creator
    • It can be hard to know where to look for vocabs and ontologies
    • Decide on licence – CC BY-NC 2.0, CC0, ODC PDD
  • 31. Vocabularies in Linked Data
    • Common vocabularies to describe the data, e.g. ‘title’, ‘author’, ‘contributor’ mean the same thing
    • Adopt the same vocabularies for expressing meaning
    • Use semantics to link data
    • Want to avoid transformation, mapping, contracts between data providers
  • 32. Commonly used vocabularies (ones we’ve used in bold)
    • Friend-of-a-Friend (FOAF), vocabulary for describing people.
    • Dublin Core (DC) defines general metadata attributes. See also their new domains and ranges draft.
    • Semantically-Interlinked Online Communities (SIOC), vocabulary for representing online communities.
    • Description of a Project (DOAP), vocabulary for describing projects.
    • Simple Knowledge Organization System (SKOS), vocabulary for representing taxonomies and loosely structured knowledge.
    • Music Ontology provides terms for describing artists, albums and tracks.
    • Review Vocabulary, vocabulary for representing reviews.
    • Creative Commons (CC), vocabulary for describing license terms.
    • Bibo, vocabulary for bibliographic data.
  • 33. Shared use of vocabularies [diagram: Copac RDF draws on DC, foaf, skos, Copac and bibo; Hub RDF draws on DC, foaf, skos and Hub; shared terms include dcterms:title and dcterms:identifier]
  • 34. 2. Transforming into RDF/XML
    • Transform EAD and MODS to RDF/XML based on our models
    • Hub: created XSLT stylesheet and used the Saxon XSLT processor
      • http://saxon.sourceforge.net/
      • Saxon runs the XSLT against a set of EAD files and creates a set of RDF/XML files
    • Copac: created in-house Java transformation program
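The shape of such a transform can be shown on a very small scale. The sketch below is not the project's XSLT or Java code: it uses Python's standard-library ElementTree instead, reads a minimal hand-written MODS record, and emits a single dct:title statement as N-Triples. The sample record, the root URL and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
DCT_TITLE = "http://purl.org/dc/terms/title"

# Minimal, hand-written sample record (not real Copac data).
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Bleak House</title></titleInfo>
  <recordInfo><recordIdentifier>12345</recordIdentifier></recordInfo>
</mods>"""


def mods_to_ntriples(xml_text: str, root: str = "http://example.org") -> list:
    """Turn each MODS title into one N-Triples line about the record's resource."""
    record = ET.fromstring(xml_text)
    rec_id = record.findtext(
        "mods:recordInfo/mods:recordIdentifier", namespaces=MODS_NS
    )
    subject = f"{root}/id/bibliographicresource/{rec_id}"
    lines = []
    for title in record.findall("mods:titleInfo/mods:title", MODS_NS):
        # Escape backslashes and double quotes for the N-Triples literal.
        literal = (title.text or "").replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{DCT_TITLE}> "{literal}" .')
    return lines


for line in mods_to_ntriples(SAMPLE):
    print(line)
```

A full transform would cover every property in the specification and would typically write RDF/XML, as the slide describes, rather than N-Triples.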
  • 35. 3. Enhancing our data
      • Language - lexvo.org
      • Time periods - reference.data.gov.uk
      • Geolocation - UK Postcodes URIs and Ordnance Survey URIs
      • Names - Virtual International Authority File
        • Matches and links widely-used authority files - http://viaf.org/
      • Names (and subjects) - DBPedia
      • Subjects - Library of Congress Subject Headings
  • 36. Use of ‘SameAs’
    • Pattern: Hub URI <sameAs> VIAF URI
    • Estelle Sylvia Pankhurst, 1882-1960: http://archiveshub.ac.uk/data/gb-106-7esp <sameAs> http://viaf.org/viaf/51731588/
    • John William Bradley, fl. 1874: http://archiveshub.ac.uk/data/gb0096ms415 <sameAs> http://viaf.org/viaf/61047183/
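The two links on this slide can be written out as RDF statements. The sketch below serialises them as N-Triples, assuming that the slide's <sameAs> refers to the standard owl:sameAs property; the function name is illustrative.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed meaning of <sameAs>

# URI pairs copied from the slide: (Archives Hub URI, VIAF URI).
LINKS = [
    ("http://archiveshub.ac.uk/data/gb-106-7esp", "http://viaf.org/viaf/51731588/"),
    ("http://archiveshub.ac.uk/data/gb0096ms415", "http://viaf.org/viaf/61047183/"),
]


def same_as_triples(links):
    """One N-Triples line per (local URI, external URI) pair."""
    return [f"<{local}> <{OWL_SAME_AS}> <{external}> ." for local, external in links]


for line in same_as_triples(LINKS):
    print(line)
```

Asserting sameAs is a strong claim: everything said about one URI is then taken to hold for the other, so links of this kind are worth checking before publication.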
  • 37.  
  • 38.  
  • 39.  
  • 40. 4. Load RDF/XML into triple store
    • Using the Talis Platform triple store
    • RDF/XML is HTTP POSTed
    • We’re using Pynappl
      • Python client for the Talis Platform
      • http://code.google.com/p/pynappl/
    • Store provides us with a SPARQL query interface
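Loading by HTTP POST can be illustrated without any store-specific client. The sketch below builds, but does not send, a POST request carrying RDF/XML using Python's standard library. It does not use Pynappl, and the store URL is a hypothetical placeholder rather than a real Talis Platform address.

```python
import urllib.request

STORE_URL = "http://example.org/stores/demo/meta"  # hypothetical endpoint

# Tiny empty RDF/XML document used as a stand-in payload.
EMPTY_RDF = "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'/>"


def build_upload_request(rdf_xml: str, url: str = STORE_URL) -> urllib.request.Request:
    """Prepare an HTTP POST whose body is an RDF/XML document."""
    return urllib.request.Request(
        url,
        data=rdf_xml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


req = build_upload_request(EMPTY_RDF)
# urllib stores header names capitalised as "Content-type".
print(req.get_method(), req.full_url, req.get_header("Content-type"))
# Sending would be: urllib.request.urlopen(req), typically with authentication.
```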
  • 41. 5. Create Linked Data Views
    • Expose ‘bounded’ descriptions from the triple store over the Web
    • Make available as documents in both human-readable HTML and RDF formats (also JSON, Turtle, CSV)
    • Using Paget ‘Linked Data Publishing Framework’
      • http://code.google.com/p/paget/
      • PHP scripts query Sparql endpoint
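One way to obtain a description of a single resource from a SPARQL endpoint is a DESCRIBE query sent as an HTTP GET with a `query` parameter, as defined by the SPARQL protocol. The sketch below only builds the request URL. Whether Paget issues DESCRIBE queries is not stated in the slides, and the endpoint address is a hypothetical placeholder.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint


def describe_url(resource_uri: str, endpoint: str = ENDPOINT) -> str:
    """Build a SPARQL protocol GET URL asking the store to describe one resource."""
    query = f"DESCRIBE <{resource_uri}>"
    return endpoint + "?" + urlencode({"query": query})


url = describe_url("http://data.archiveshub.ac.uk/id/findingaid/gb1086skinner")
print(url)

# Round trip: decoding the query string recovers the original SPARQL text.
print(parse_qs(urlsplit(url).query)["query"][0])
```

What a store returns for DESCRIBE is implementation-defined, so a publishing framework may instead use CONSTRUCT queries to control exactly where the ‘bounded’ description stops.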
  • 42. http://data.archiveshub.ac.uk/id/archivalresource/gb1086skinner
  • 43. http://data.archiveshub.ac.uk/
  • 44. Accessing the Locah Linked Data
    • Hub data released
    • Copac data release imminent
    • Include Linked Data views, Sparql endpoint details, example queries and supporting documentation
  • 45. Reporting on opportunities and barriers
    • Locah Blog (tags: ‘opportunities’ ‘barriers’)
    • Feed into #JiscEXPO programme evidence gathering
    • More at:
      • http://blogs.ukoln.ac.uk/locah/2010/09/22/creating-linked-data-more-reflections-from-the-coal-face/
      • http://blogs.ukoln.ac.uk/locah/2010/12/01/assessing-linked-data
  • 46. Feedback Requested!
    • We would like feedback on the project
    • Via blog
      • http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/
      • http://blogs.ukoln.ac.uk/locah/2010/11/08/some-more-things-some-extensions-to-the-hub-model/
      • http://blogs.ukoln.ac.uk/locah/2010/10/07/modelling-copac-data/
    • Via email, twitter, in person
  • 47. Creating a Visualisation Prototype
    • Currently working on Hub visualisation
    • Data queried from Sparql endpoint
    • Use tools such as Simile, Many Eyes, Google Charts
    • Timemap visualisation
      • Googlemaps and Simile
      • http://code.google.com/p/timemap/
  • 48. Visualisation Prototype
    • Using Timemap:
      • Googlemaps and Simile
      • http://code.google.com/p/timemap/
    • Early stages with this
    • Will give location and ‘extent’ of archive.
    • Will link through to Archives Hub
  • 49.  
  • 50. http://socialarchive.iath.virginia.edu/prototype.html
  • 51. The learning process
    • Model the data, not the description
    • The description is one of the entities
    • Understand the importance of URIs
    • Think about your world before others
    • … but external links are important
    • Try to get to grips with terminology
    • Be prepared for unexpected surprises!
  • 52. Risks
    • Can you rely on data sources long-term?
    • Persistence of persistent URIs?
    • New technologies
    • Investment of time – unsure of benefits
    • Licensing issues
  • 53. Licensing
    • Nature of Linked Data: each triple as a piece of data
    • ‘Ownership’ of data?
    • Data often already freely available (M2M interfaces)
  • 54. Licensing
    • Public Domain Licences: simple, explicit, and permit widest possible reuse. Waive all rights to the data
    • BL, British National Bibliography uses public domain licence
    • Limit commercial uses?
    • Build in community norms: attribution, share alike - to reinforce desire for acknowledgement
    • Legal situation?
  • 55. Thank You
  • 56. Attribution and CC licence: Sections of this presentation adapted from materials created by other members of the LOCAH Project. This presentation is available under Creative Commons Non Commercial-Share Alike: http://creativecommons.org/licenses/by-nc/2.0/uk/