Linked data for librarians


Published in: Education

  1. 1. Linked Data Fundamentals Trevor Thornton Senior Applications Developer, NYPL Labs The New York Public Library
  2. 2. Linked Data Data published on the Web in accordance with principles designed to facilitate linkages between resources. The potential for linked data in libraries: • Eliminates data silos - makes data accessible on the Web and promotes sharing and re-use • Promotes discovery of related resources through links (to common people, subjects, etc.) • Supports cooperative description ('open world assumption')
  3. 3. Key aspects of linked data • Based on the core Web technologies (HTTP, URIs) • Uses a simple data structure based on atomic statements about resources (RDF) • Can be interpreted by machines (semantic data) • Focus on connecting resources, rather than simply describing them (though it can do both)
  4. 4. HTTP (Hypertext Transfer Protocol) The foundation of data communication for the Web HTTP request HTTP response Client/User agent (e.g. web browser) Web Server
  5. 5. URI (Uniform Resource Identifier) Globally unique identifier for a resource on a computer or a network. HTTP URIs identify resources on the Web.
  6. 6. URI vs. URL URLs (Uniform Resource Locators) are a subset of URIs that, in addition to identifying a resource, provide a means of locating it. A URI does not necessarily point to a document; a URL does. A URI can identify a real-world object.
  7. 7. The Semantic Web Proposed by Tim Berners-Lee in a 2001 article in Scientific American: “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation… In the near future, these developments will usher in significant new functionality as machines become much better able to process and ‘understand’ the data that they merely display at present.”
  8. 8. The Linked Data Principles Tim Berners-Lee, 2006 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). 4. Include links to other URIs so that they can discover more things.
  9. 9. RDF (Resource Description Framework) A framework for describing Web resources. A Web resource is anything that can be retrieved or identified on the Web via a URI. RDF descriptions are based on simple subject-predicate-object expressions called "triples".
  10. 10. The RDF Triple Subject - the resource being described Predicate - a property of that resource Object - the value of the property Subject and predicate are defined using URIs. Object can either be a URI or a literal value (text, number, date, etc.) subject predicate object
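The subject-predicate-object structure can be sketched in a few lines of Python. This is only an illustration, not an RDF library: the `URI` and `Triple` types and the example.org identifiers are assumptions made for the sketch.

```python
from typing import NamedTuple, Union


class URI(str):
    """A URI reference. A plain str is used for literal values instead."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, str]  # the object may be a URI or a literal


# Hypothetical example.org URIs standing in for real identifiers.
papers = URI("http://example.org/mss/2071")
extent = URI("http://example.org/terms/extent")
creator = URI("http://example.org/terms/creator")
moses = URI("http://example.org/names/moses")

literal_triple = Triple(papers, extent, "142 linear feet")
uri_triple = Triple(papers, creator, moses)


def object_is_literal(triple: Triple) -> bool:
    """True when the object is a literal value rather than a URI."""
    return not isinstance(triple.obj, URI)
```

Keeping URIs and literals as distinct types mirrors the rule on the slide: only the object position may hold a literal.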
  11. 11. Here is some metadata… Robert Moses Papers CREATOR: Moses, Robert, 1888-1981 EXTENT: 142 linear feet REPOSITORY: The New York Public Library. Manuscripts and Archives Division.
  12. 12. Here are some triples [Diagram: three triples with the subject Robert Moses Papers (mss/2071): creator (ms/creator) → Moses, Robert, 1888-1981 (196); extent (ms/extent) → '142 linear feet'; repository (al/vocab/arch#heldBy) → NYPL Manuscripts & Archives (units/mss)]
  13. 13. A set of related triples = a graph [Diagram: the same three triples drawn as one graph, with mss/2071 linked by creator, extent and heldBy edges to its three values]
  14. 14. This is another graph [Diagram: oclc/834874 linked by a creator edge to 399 and by a subject edge to 196]
  15. 15. Put the graphs together to make a new graph [Diagram: the two graphs joined at the shared node 196, which is both the creator of Robert Moses Papers (mss/2071) and the subject of The Power Broker (oclc/834874)]
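Because a graph is just a set of triples, putting two graphs together can be sketched as a set union. The sketch below assumes no blank nodes are involved, and every example.org URI in it is a hypothetical stand-in for the identifiers on the slides.

```python
EX = "http://example.org/"  # hypothetical namespace for this sketch

papers_graph = {
    (EX + "mss/2071", EX + "creator", EX + "person/moses"),
    (EX + "mss/2071", EX + "extent", "142 linear feet"),
    (EX + "mss/2071", EX + "heldBy", EX + "unit/mss"),
}

book_graph = {
    (EX + "book/power-broker", EX + "creator", EX + "person/caro"),
    (EX + "book/power-broker", EX + "subject", EX + "person/moses"),
}

# Merging is a set union: identical triples collapse, and shared URIs
# (here the node for Robert Moses) connect the two graphs.
merged = papers_graph | book_graph


def statements_about(graph, node):
    """Return every triple in which node is the subject or the object."""
    return {t for t in graph if node in (t[0], t[2])}


moses_links = statements_about(merged, EX + "person/moses")
```

After the merge, the Robert Moses node is reachable from both the papers and the book, which is the linkage the combined diagram illustrates.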
  16. 16. RDF serialization formats 'Serialization' = to record one or more RDF graphs in a machine-readable file. There are 2 basic options: RDF in a standalone text file: • RDF/XML • N3 (Notation 3) • Turtle (Terse RDF Triple Language) • N-Triples RDF embedded in HTML: • RDFa (RDF in attributes)
  17. 17. Basic triples in N-Triples N-Triples is the most basic expression of RDF. <> <> <> . <> <> "142 linear feet" . <> <> <> .
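A minimal sketch of how a single triple could be written out as an N-Triples line: URIs go in angle brackets, literals in double quotes, and each statement ends with a period. The helper names `nt_term` and `to_ntriples` and the example.org URIs are assumptions for illustration; language tags and datatypes are left out.

```python
def nt_term(term: str, is_uri: bool) -> str:
    """Format one term: <...> for a URI, "..." for a plain literal."""
    if is_uri:
        return "<" + term + ">"
    escaped = (
        term.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def to_ntriples(subject: str, predicate: str, obj: str, obj_is_uri: bool) -> str:
    """Serialize one triple as a single N-Triples line."""
    return " ".join(
        [nt_term(subject, True), nt_term(predicate, True), nt_term(obj, obj_is_uri), "."]
    )


# Hypothetical URIs; the literal is taken from the metadata example.
line = to_ntriples(
    "http://example.org/mss/2071",
    "http://example.org/terms/extent",
    "142 linear feet",
    obj_is_uri=False,
)
```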
  18. 18. Basic triples in N3/Turtle @prefix dcterms: <> . @prefix arch: <> . <> dcterms:creator <> ; dcterms:extent "142 linear feet" ; arch:heldBy <> . Statements about the same resource are grouped together. Property URIs are shortened using prefixes ('q-names').
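Prefix shortening amounts to a lookup table from prefix to namespace URI. The sketch below expands a prefixed name back into a full URI. The `dcterms` namespace is the published Dublin Core terms namespace; the `ex` prefix and the `expand` helper are hypothetical.

```python
prefixes = {
    "dcterms": "http://purl.org/dc/terms/",
    "ex": "http://example.org/vocab#",  # hypothetical namespace
}


def expand(prefixed_name: str, prefixes: dict) -> str:
    """Expand a name such as 'dcterms:creator' into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError("unknown prefix: " + prefix)
    return prefixes[prefix] + local


creator_uri = expand("dcterms:creator", prefixes)
```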
  19. 19. Basic triples in RDF/XML <?xml version="1.0" encoding="UTF-8"?> <rdf:RDF xmlns:rdf="" xmlns:dcterms="" xmlns:arch=""> <rdf:Description rdf:about=""> <dcterms:creator rdf:resource="" /> <dcterms:extent>142 linear feet</dcterms:extent> <arch:heldBy rdf:resource="" /> </rdf:Description> </rdf:RDF>
  20. 20. RDFa (RDF in Attributes) RDFa allows RDF data to be embedded within HTML. Rendered HTML: The Power Broker, by Robert Caro, is a biography of Robert Moses. HTML code: <div about="" prefix="dcterms: "> The Power Broker, by <span property="dcterms:creator" resource="">Robert Caro</span>, is a biography of <span property="dcterms:subject" resource="">Robert Moses</span>. </div>
  21. 21. RDF Ontologies/vocabularies • Define categories of things and the relationships that they can have to each other • Provide the semantics that allow data to be interpreted by machines • Establish rules of inference – what can be assumed to be true based on what is asserted by a triple
  22. 22. RDFS (RDF Schema) A basic vocabulary for ontology development. RDFS defines RDF classes and properties. Class: a category of resources; a resource in such a category is said to be an instance of the class Property: a relation between a subject and object in a triple
  23. 23. Classes and subClasses The subClassOf property (used in defining a class) allows a broad class to serve as the basis of a more specific class. Defining a class (A) as a subClassOf another class (B) means that any instance of A can be inferred to also be an instance of B. Class B Class A
  24. 24. A simple Class/subClass example Based on these class definitions: 'Dog' is a Class; 'Poodle' is a Class; 'Poodle' is a subClassOf 'Dog'. And the statement: Fido is a Poodle. It can be inferred that: Fido is a Dog.
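The Fido inference can be sketched as a small fixed-point loop over a set of triples. This is an illustration of the subClassOf rule only, not a full RDFS reasoner; the `ex:` names and the `infer_types` function are hypothetical, and prefixed strings are used in place of full URIs for brevity.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Add (x, rdf:type, B) whenever x is an instance of A and A subClassOf B."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple is produced
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for cls, p2, parent in list(inferred):
                if p2 == SUBCLASS_OF and cls == o:
                    new_triple = (s, RDF_TYPE, parent)
                    if new_triple not in inferred:
                        inferred.add(new_triple)
                        changed = True
    return inferred


asserted = {
    ("ex:Poodle", SUBCLASS_OF, "ex:Dog"),
    ("ex:Fido", RDF_TYPE, "ex:Poodle"),
}
result = infer_types(asserted)
```

The loop runs until nothing new is added, so longer subclass chains are followed as well.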
  25. 25. RDFS Properties The predicates in RDF triples are properties. Properties themselves have two important properties: domain: asserts that the subject of the triple is an instance of specific class range: asserts that the object of the triple is an instance of specific class
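Domain and range can be read as inference rules in the same way. In the sketch below, a hypothetical property `ex:hasOwner` is declared with domain `ex:Dog` and range `ex:Person`, so a single triple using it yields a class for both its subject and its object. All names are assumptions for illustration.

```python
RDF_TYPE = "rdf:type"


def infer_from_domain_range(triples, domains, ranges):
    """Derive rdf:type statements from property domain and range declarations."""
    inferred = set()
    for s, p, o in triples:
        if p in domains:  # the subject is an instance of the domain class
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:  # the object is an instance of the range class
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred


# Hypothetical property declarations and data.
domains = {"ex:hasOwner": "ex:Dog"}
ranges = {"ex:hasOwner": "ex:Person"}
data = [("ex:Fido", "ex:hasOwner", "ex:Alice")]

types = infer_from_domain_range(data, domains, ranges)
```

Note that domain and range do not reject data that looks wrong; they only license additional conclusions about the resources involved.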
  26. 26. OWL (Web Ontology Language) Provides an extended set of properties used in ontology/vocabulary definitions (used in conjunction with RDFS) • Equivalence/disjunction • Advanced property definitions • Restrictions and cardinality owl:sameAs: A property that asserts that two resources are the same (i.e. two URIs refer to the same thing)
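One practical consequence of owl:sameAs is that several URIs may need to be treated as a single resource. A rough sketch, assuming the sameAs statements are available as pairs of URIs, groups them with a small union-find and picks one representative per group. The `canonical_map` helper and the `ex:` names are hypothetical.

```python
def canonical_map(same_as_pairs):
    """Map every URI to one representative of its owl:sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the alphabetically smaller URI as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {x: find(x) for x in parent}


pairs = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:x", "ex:y")]
canonical = canonical_map(pairs)
```

Because sameAs is symmetric and transitive, `ex:a`, `ex:b` and `ex:c` end up in one group even though `ex:a` and `ex:c` were never linked directly.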
  27. 27. SKOS (Simple Knowledge Organization System) Defines classes and properties to support the use of thesauri, classification schemes, subject heading systems and taxonomies in RDF • Classes: skos:ConceptScheme, skos:Concept • Properties: skos:broader, skos:narrower, skos:related, skos:prefLabel, skos:altLabel
  28. 28. Library of Congress Linked Data Service • Provides URIs for LC controlled vocabularies, thesauri, language codes, classification schemes • Most terms defined using SKOS + RDF representation of MADS (where applicable) • Complete vocabularies available as free downloads
  29. 29. FOAF (Friend of a Friend) • Provides a vocabulary for describing people and their relationships to each other and to the things they make and do • Originally intended for web-based social networks, FOAF has gained wider acceptance in describing historical figures and their relationships • Classes: Agent, Person, Organization, Group • Properties: knows, name, based_near
  30. 30. VIAF (Virtual International Authority File) • Clusters names in authority files from numerous national libraries and other agencies • Named entities vs. just names • OCLC is actively establishing links between VIAF and Wikipedia, building an invaluable resource for libraries/archives/museums to provide context for their collections
  31. 31. Dublin Core Metadata Initiative • Terms for general use in describing resources • Properties relating to simple and qualified Dublin Core elements • Classes for general material types (Text, Image, PhysicalObject, etc.) • Classes for other resources referenced by DCMI properties (FileFormat, RightsStatement, ProvenanceStatement, etc.)
  32. 32. Schema.org • Cooperative project between Bing, Google and Yahoo to provide a mechanism for describing web content via standardized vocabularies • Structured data is included in HTML content via microdata (similar to RDFa) • Basis of Google Knowledge Graph • OCLC now provides Schema.org linked data for all records in WorldCat
  33. 33. DBpedia • Crowd-sourced community effort to extract structured information from Wikipedia • Enables sophisticated queries against Wikipedia • Makes Wikipedia data freely available for re-use
  34. 34. Other useful/notable linked data sources Vocabularies/ontologies: • Bibliographic ontology • Archival ontology • Relationship ontology Data sources: • GeoNames, Europeana, MusicBrainz, BBC, Project Gutenberg…
  35. 35. The obligatory linked data cloud slide Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
  36. 36. Technical things to know a little about • Triplestore – a database for storing RDF data • SPARQL (SPARQL Protocol and RDF Query Language) – the primary query language for RDF data (analogous to SQL for relational databases) • SPARQL endpoint – a Web service that provides direct access to RDF data stores via SPARQL queries • HTTP content negotiation – a process for delivering content (data) in different formats (e.g. RDF vs. HTML) based on the HTTP request
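The core idea behind a SPARQL query, matching triple patterns that contain variables against a graph, can be sketched in plain Python. This is not SPARQL and not a triplestore: the `match` and `query` functions, the `?name` variable convention borrowed from SPARQL, and the `ex:` identifiers are all assumptions for illustration. Requires Python 3.8 or later.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable: bind it or check the binding
            if result.setdefault(term, value) != value:
                return None
        elif term != value:  # a constant: must match exactly
            return None
    return result


def query(patterns, graph):
    """Return all variable bindings that satisfy every pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


graph = {
    ("ex:papers", "ex:creator", "ex:moses"),
    ("ex:papers", "ex:extent", "142 linear feet"),
    ("ex:book", "ex:creator", "ex:caro"),
    ("ex:book", "ex:subject", "ex:moses"),
}

# Which person is the subject of one work and the creator of another?
results = query(
    [("?work", "ex:subject", "?person"), ("?other", "ex:creator", "?person")],
    graph,
)
```

A real SPARQL endpoint evaluates the same kind of pattern over an indexed store and returns the bindings over HTTP.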
  37. 37. Linked data attribution A growing concern in the linked data community is the need to include attribution with data in order to determine whether or not it can/should be trusted. • RDF reification – allows source attribution to be associated with an RDF triple • Named graphs – Extension of RDF that allows attribution and other metadata to be associated with RDF descriptions • Quad stores – Similar to triplestores but with an additional element that connects the triple with its source
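A quad can be pictured as a triple with a fourth element naming the graph (and so the source) it came from. The sketch below groups quads by graph name and looks up which sources assert a given triple. The graph names, the `ex:` identifiers and both helper functions are hypothetical.

```python
from collections import defaultdict

# (subject, predicate, object, graph name); the graph name identifies the source.
quads = [
    ("ex:papers", "ex:creator", "ex:moses", "ex:graph/finding-aid"),
    ("ex:papers", "ex:extent", "142 linear feet", "ex:graph/finding-aid"),
    ("ex:book", "ex:subject", "ex:moses", "ex:graph/catalog"),
    ("ex:papers", "ex:creator", "ex:moses", "ex:graph/catalog"),
]


def group_by_graph(quads):
    """Split a list of quads into named graphs (sets of triples)."""
    graphs = defaultdict(set)
    for s, p, o, g in quads:
        graphs[g].add((s, p, o))
    return dict(graphs)


def sources_for(quads, triple):
    """Return the names of every graph that asserts the given triple."""
    return {g for *t, g in quads if tuple(t) == triple}


named_graphs = group_by_graph(quads)
who_says = sources_for(quads, ("ex:papers", "ex:creator", "ex:moses"))
```

Attribution metadata (publisher, date, license) can then be attached to the graph name itself, as statements about that graph.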
  38. 38. Linked Open Data Linked data that is freely usable, reusable, and redistributable, subject, at most, to attribution and 'share alike' requirements
  39. 39. Open data licensing Creative Commons (CC): a nonprofit organization that enables the sharing and use of creativity and knowledge through free legal tools. CC provides alternatives to "all rights reserved" copyright.
  40. 40. Creative Commons Licenses Open data: • Attribution (CC BY): allows distribution and reuse in any way as long as you get credit • Attribution-ShareAlike (CC BY-SA): allows distribution and reuse in any way as long as you get credit and derivative works are released under the same license Not open data: • Attribution-NoDerivs (CC BY-ND): requires that the original is used unchanged and in whole, with credit to you • Attribution-NonCommercial (CC BY-NC): allows distribution and reuse in any way, for non-commercial purposes only, as long as you get credit • Attribution-NonCommercial-ShareAlike (CC BY-NC-SA): allows distribution and reuse in any way, for non-commercial purposes only, as long as you get credit and derivative works are released under the same license • Attribution-NonCommercial-NoDerivs (CC BY-NC-ND): only permits use as-is, for non-commercial purposes, and with credit to you; the most restrictive CC license available
  41. 41. CC0 (‘CC Zero’) • Allows creators to waive all rights to work and to place it as completely as possible into the public domain. • Designed to make it as clear as is legally possible that any use of your content is allowed • Quickly becoming the preferred license for open data
  42. 42. LC Bibliographic Framework Initiative • Developing a new bibliographic framework (to replace MARC) based on linked data principles • First draft of the Bibliographic Framework (BIBFRAME) model published in November 2012
  43. 43. LC Bibliographic Framework Initiative