Linked data for librarians

  1. Linked Data Fundamentals
     Trevor Thornton
     Senior Applications Developer, NYPL Labs
     The New York Public Library
  2. Linked Data
     Data published on the Web in accordance with principles designed to facilitate linkages between resources.
     The potential for linked data in libraries:
     • Eliminates data silos: makes data accessible on the Web and promotes sharing and re-use
     • Promotes discovery of related resources through links (to common people, subjects, etc.)
     • Supports cooperative description (the 'open world assumption')
  3. Key aspects of linked data
     • Based on the core Web technologies (HTTP, URIs)
     • Uses a simple data structure based on atomic statements about resources (RDF)
     • Can be interpreted by machines (semantic data)
     • Focuses on connecting resources rather than simply describing them (though it can do both)
  4. HTTP (Hypertext Transfer Protocol)
     The foundation of data communication for the Web.
     [Diagram: a client/user agent (e.g. a web browser) sends an HTTP request to a web server; the server sends back an HTTP response.]
  5. URI (Uniform Resource Identifier)
     A globally unique identifier for a resource on a computer or a network.
     HTTP URIs identify resources on the Web, e.g. http://www.yourdomain.org/something
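A minimal Python sketch of how an HTTP URI breaks down into parts, using only the standard library's `urllib.parse.urlsplit`. The helper names `describe_uri` and `is_http_uri` are hypothetical, and `urn:example:something` is a made-up non-HTTP URI included only for contrast; nothing here actually dereferences the URIs.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Split a URI into the parts that matter for linked data."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # 'http' means the name can be looked up over HTTP
        "authority": parts.netloc,   # the domain that mints and controls the name
        "path": parts.path,          # identifies the resource within that domain
        "fragment": parts.fragment,  # the '#...' part used by "hash URIs"
    }


def is_http_uri(uri: str) -> bool:
    """True if the URI can be dereferenced with HTTP(S), as the linked data principles ask."""
    return urlsplit(uri).scheme in ("http", "https")


if __name__ == "__main__":
    for uri in ("http://www.yourdomain.org/something",
                "http://purl.org/archival/vocab/arch#heldBy",
                "urn:example:something"):
        print(uri, describe_uri(uri), is_http_uri(uri))
```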
  6. URI vs. URL
     URLs (Uniform Resource Locators) are a subset of URIs that, in addition to identifying a resource, provide a means of locating it.
     A URI does not necessarily point to a document; a URL locates something that can be retrieved. A URI can also identify a real-world object.
  7. The Semantic Web
     Proposed by Tim Berners-Lee (with James Hendler and Ora Lassila) in a 2001 article in Scientific American:
     "The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation… In the near future, these developments will usher in significant new functionality as machines become much better able to process and 'understand' the data that they merely display at present."
  8. The Linked Data Principles
     Tim Berners-Lee, 2006
     1. Use URIs as names for things.
     2. Use HTTP URIs so that people can look up those names.
     3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
     4. Include links to other URIs so that they can discover more things.
  9. RDF (Resource Description Framework)
     A framework for describing Web resources.
     A Web resource is anything that can be retrieved or identified on the Web via a URI.
     RDF descriptions are based on simple subject-predicate-object expressions called "triples".
  10. The RDF Triple
      Subject: the resource being described
      Predicate: a property of that resource
      Object: the value of the property
      The subject and predicate are identified by URIs. The object can be either a URI or a literal value (text, number, date, etc.).
      [Diagram: subject → predicate → object]
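The triple structure can be sketched as a few Python data classes. This is an illustrative model under simplifying assumptions, not an RDF library: the `URI`, `Literal` and `Triple` classes are hypothetical, literals are plain strings without datatypes or language tags, and blank nodes (which RDF also allows as subjects and objects) are left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str  # simplification: plain text only, no datatypes or language tags


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]

    def __post_init__(self):
        # Subject and predicate must be URIs; the object may be a URI or a literal.
        # (Blank nodes are deliberately not modeled in this sketch.)
        if not isinstance(self.subject, URI) or not isinstance(self.predicate, URI):
            raise TypeError("subject and predicate must be URIs")
        if not isinstance(self.obj, (URI, Literal)):
            raise TypeError("object must be a URI or a literal")


# The 'extent' statement about the Robert Moses Papers as one triple.
extent = Triple(
    URI("http://archives.nypl.org/mss/2071"),
    URI("http://purl.org/dc/terms/extent"),
    Literal("142 linear feet"),
)
```

Trying to use a literal as a subject, for example `Triple(Literal("Robert Moses Papers"), ...)`, raises `TypeError`.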
  11. Here is some metadata…
      Robert Moses Papers
      CREATOR: Moses, Robert, 1888-1981
      EXTENT: 142 linear feet
      REPOSITORY: The New York Public Library. Manuscripts and Archives Division.
  12. Here are some triples
      Robert Moses Papers (http://archives.nypl.org/mss/2071) → creator (http://purl.org/dc/terms/creator) → Moses, Robert, 1888-1981 (http://viaf.org/viaf/52866196)
      Robert Moses Papers (http://archives.nypl.org/mss/2071) → extent (http://purl.org/dc/terms/extent) → '142 linear feet'
      Robert Moses Papers (http://archives.nypl.org/mss/2071) → repository (http://purl.org/archival/vocab/arch#heldBy) → NYPL Manuscripts & Archives (http://data.nypl.org/org_units/mss)
  13. A set of related triples = a graph
      [Diagram: the triples about http://archives.nypl.org/mss/2071 drawn as a single graph, with edges http://purl.org/dc/terms/creator → http://viaf.org/viaf/52866196, http://purl.org/dc/terms/extent → '142 linear feet', and http://purl.org/archival/vocab/arch#heldBy → the holding repository.]
  14. This is another graph
      [Diagram: http://www.worldcat.org/oclc/834874 links via http://purl.org/dc/terms/creator to http://viaf.org/viaf/44312399, and via http://purl.org/dc/terms/subject to http://viaf.org/viaf/52866196.]
  15. Put the graphs together to make a new graph
      [Diagram: the two graphs merged. Robert Moses Papers (http://archives.nypl.org/mss/2071) keeps its creator, extent and heldBy edges; The Power Broker (http://www.worldcat.org/oclc/834874) keeps its creator edge to http://viaf.org/viaf/44312399 and its subject edge to http://viaf.org/viaf/52866196. Because http://viaf.org/viaf/52866196 is both the creator of the papers and the subject of the book, the two resources are now linked through it.]
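Merging graphs can be sketched in Python by treating each graph as a set of (subject, predicate, object) tuples. Because these graphs contain no blank nodes, the merge is simply the set union. The short constant names and helper functions are hypothetical, the URIs come from the slides, and literals are stored as plain strings next to URIs, a simplification a real RDF library would not make.

```python
# Short names for URIs used on the slides (the names themselves are hypothetical).
MSS_2071 = "http://archives.nypl.org/mss/2071"
MOSES = "http://viaf.org/viaf/52866196"
CARO = "http://viaf.org/viaf/44312399"
POWER_BROKER = "http://www.worldcat.org/oclc/834874"
MSS_UNIT = "http://data.nypl.org/org_units/mss"
DCT = "http://purl.org/dc/terms/"
ARCH = "http://purl.org/archival/vocab/arch#"

# Graph 1: the archival description (the literal is stored as a plain string).
archives_graph = {
    (MSS_2071, DCT + "creator", MOSES),
    (MSS_2071, DCT + "extent", "142 linear feet"),
    (MSS_2071, ARCH + "heldBy", MSS_UNIT),
}

# Graph 2: the bibliographic description of The Power Broker.
worldcat_graph = {
    (POWER_BROKER, DCT + "creator", CARO),
    (POWER_BROKER, DCT + "subject", MOSES),
}


def nodes(graph):
    """Every subject and object that appears in a graph."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def pointing_to(graph, node):
    """Subjects of all triples whose object is `node`."""
    return {s for s, _, o in graph if o == node}


# With no blank nodes, merging RDF graphs is just the union of their triples.
merged = archives_graph | worldcat_graph

# Nodes the two graphs have in common are where they connect.
shared = nodes(archives_graph) & nodes(worldcat_graph)

if __name__ == "__main__":
    print(len(merged), "triples; shared nodes:", shared)
    print("Resources linked to Robert Moses:", pointing_to(merged, MOSES))
```

If blank nodes were involved, they would first have to be renamed apart so that two unrelated blank nodes are not accidentally merged.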
  16. RDF serialization formats
      'Serialization' = recording one or more RDF graphs in a machine-readable file. There are two basic options:
      RDF in a standalone text file:
      • RDF/XML
      • N3 (Notation 3)
      • Turtle (Terse RDF Triple Language)
      • N-Triples
      RDF embedded in HTML:
      • RDFa (RDF in Attributes)
  17. Basic triples in N-Triples
      <http://archives.nypl.org/mss/2071> <http://purl.org/dc/terms/creator> <http://viaf.org/viaf/52866196> .
      <http://archives.nypl.org/mss/2071> <http://purl.org/dc/terms/extent> "142 linear feet" .
      <http://archives.nypl.org/mss/2071> <http://purl.org/archival/vocab/arch#heldBy> <http://data.nypl.org/org_units/mss> .
      N-Triples is the most basic serialization of RDF: one complete triple per line, with full URIs in angle brackets and literals in double quotes.
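A minimal sketch of an N-Triples writer in Python, assuming each triple is given as a (subject, predicate, object, object_is_literal) tuple; the function names are hypothetical. It escapes the characters that N-Triples does not allow raw inside a string literal, and it does not handle datatypes, language tags or blank nodes.

```python
def nt_term(term: str, is_literal: bool = False) -> str:
    """Format one RDF term for N-Triples: <URI> or "escaped literal"."""
    if is_literal:
        # N-Triples string literals may not contain a raw ", \, line feed or carriage return.
        escaped = (term.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n")
                       .replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object, object_is_literal) tuples, one triple per line."""
    return "".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o, lit)} .\n" for s, p, o, lit in triples
    )


MOSES_PAPERS = [
    ("http://archives.nypl.org/mss/2071", "http://purl.org/dc/terms/creator",
     "http://viaf.org/viaf/52866196", False),
    ("http://archives.nypl.org/mss/2071", "http://purl.org/dc/terms/extent",
     "142 linear feet", True),
    ("http://archives.nypl.org/mss/2071", "http://purl.org/archival/vocab/arch#heldBy",
     "http://data.nypl.org/org_units/mss", False),
]

if __name__ == "__main__":
    print(to_ntriples(MOSES_PAPERS), end="")
```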
  18. Basic triples in N3/Turtle
      @prefix dcterms: <http://purl.org/dc/terms/> .
      @prefix arch: <http://purl.org/archival/vocab/arch#> .
      <http://archives.nypl.org/mss/2071> dcterms:creator <http://viaf.org/viaf/52866196> ;
          dcterms:extent "142 linear feet" ;
          arch:heldBy <http://data.nypl.org/org_units/mss> .
      Statements about the same resource are grouped together. Property URIs are shortened using prefixes (prefixed names, often called 'QNames').
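The Turtle abbreviations (prefixes, and `;` to reuse the same subject) can be sketched in Python as a small writer. This assumes triples given as (subject, predicate, object, object_is_literal) tuples, hypothetical helper names, and a deliberately conservative rule for when a URI may be shortened to a prefixed name; it is not a complete Turtle serializer.

```python
import re
from collections import defaultdict

PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "arch": "http://purl.org/archival/vocab/arch#",
}

# Conservative check for a local name that is safe to write unescaped after a prefix.
SAFE_LOCAL = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def term(uri: str) -> str:
    """Write a URI as prefix:local when a declared namespace matches, else as <uri>."""
    for prefix, ns in PREFIXES.items():
        local = uri[len(ns):]
        if uri.startswith(ns) and SAFE_LOCAL.fullmatch(local):
            return f"{prefix}:{local}"
    return f"<{uri}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping the characters Turtle forbids raw."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_turtle(triples) -> str:
    """Group triples by subject, joining each subject's predicate-object pairs with ';'."""
    by_subject = defaultdict(list)
    for s, p, o, o_is_literal in triples:
        by_subject[s].append(f"{term(p)} {literal(o) if o_is_literal else term(o)}")
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    for s, pairs in by_subject.items():
        lines.append("")
        lines.append(f"{term(s)} " + " ;\n    ".join(pairs) + " .")
    return "\n".join(lines) + "\n"


MOSES_PAPERS = [
    ("http://archives.nypl.org/mss/2071", "http://purl.org/dc/terms/creator",
     "http://viaf.org/viaf/52866196", False),
    ("http://archives.nypl.org/mss/2071", "http://purl.org/dc/terms/extent",
     "142 linear feet", True),
    ("http://archives.nypl.org/mss/2071", "http://purl.org/archival/vocab/arch#heldBy",
     "http://data.nypl.org/org_units/mss", False),
]

if __name__ == "__main__":
    print(to_turtle(MOSES_PAPERS), end="")
```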
  19. Basic triples in RDF/XML
      <?xml version="1.0" encoding="UTF-8"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:dcterms="http://purl.org/dc/terms/"
               xmlns:arch="http://purl.org/archival/vocab/arch#">
        <rdf:Description rdf:about="http://archives.nypl.org/mss/2071">
          <dcterms:creator rdf:resource="http://viaf.org/viaf/52866196"/>
          <dcterms:extent>142 linear feet</dcterms:extent>
          <arch:heldBy rdf:resource="http://data.nypl.org/org_units/mss"/>
        </rdf:Description>
      </rdf:RDF>
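To show that RDF/XML is just another encoding of the same triples, this Python sketch reads the example with the standard library's `xml.etree.ElementTree`. It handles only the pattern used here (`rdf:Description` with `rdf:about`, and property elements carrying either `rdf:resource` or text); real RDF/XML allows many more forms, so a dedicated RDF parser is the safer choice in practice. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:arch="http://purl.org/archival/vocab/arch#">
  <rdf:Description rdf:about="http://archives.nypl.org/mss/2071">
    <dcterms:creator rdf:resource="http://viaf.org/viaf/52866196"/>
    <dcterms:extent>142 linear feet</dcterms:extent>
    <arch:heldBy rdf:resource="http://data.nypl.org/org_units/mss"/>
  </rdf:Description>
</rdf:RDF>"""


def expand(tag: str) -> str:
    """ElementTree writes names as '{namespace}local'; RDF wants namespace + local."""
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


def rdfxml_triples(xml_text: str):
    """Extract (subject, predicate, object) tuples from rdf:Description elements only."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # rdf:resource gives a URI object; otherwise the element text is a literal.
            obj = resource if resource is not None else (prop.text or "")
            triples.append((subject, expand(prop.tag), obj))
    return triples


if __name__ == "__main__":
    for triple in rdfxml_triples(RDF_XML):
        print(triple)
```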
  20. RDFa (RDF in Attributes)
      RDFa allows RDF data to be embedded within HTML.
      Rendered HTML:
      The Power Broker, by Robert Caro, is a biography of Robert Moses.
      HTML code:
      <div about="http://www.worldcat.org/oclc/834874" prefix="dcterms: http://purl.org/dc/terms/">
        The Power Broker, by
        <span property="dcterms:creator" resource="http://viaf.org/viaf/44312399">Robert Caro</span>,
        is a biography of
        <span property="dcterms:subject" resource="http://viaf.org/viaf/52866196">Robert Moses</span>.
      </div>
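A rough Python sketch of pulling triples out of the RDFa example with the standard library's `html.parser`. It implements only `about`, `prefix`, and `property` combined with `resource`, and assumes every non-void element is explicitly closed; the full RDFa 1.1 processing rules are considerably more involved, and the `MiniRDFaParser` class is hypothetical.

```python
from html.parser import HTMLParser

HTML = """<div about="http://www.worldcat.org/oclc/834874" prefix="dcterms: http://purl.org/dc/terms/">
  The Power Broker, by
  <span property="dcterms:creator" resource="http://viaf.org/viaf/44312399">Robert Caro</span>,
  is a biography of
  <span property="dcterms:subject" resource="http://viaf.org/viaf/52866196">Robert Moses</span>.
</div>"""

# HTML elements that never have an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class MiniRDFaParser(HTMLParser):
    """Handles only @about, @prefix and @property+@resource (a small subset of RDFa 1.1)."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}
        self.subjects = []  # subject in effect for each currently open element
        self.triples = []

    def expand(self, curie: str) -> str:
        """Expand 'dcterms:creator' using declared prefixes; leave anything else as-is."""
        prefix, sep, local = curie.partition(":")
        return self.prefixes[prefix] + local if sep and prefix in self.prefixes else curie

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("prefix"):
            # "dcterms: http://purl.org/dc/terms/" -> alternating name / namespace tokens
            tokens = a["prefix"].split()
            for name, ns in zip(tokens[0::2], tokens[1::2]):
                self.prefixes[name.rstrip(":")] = ns
        # @about sets a new subject; otherwise inherit the enclosing element's subject.
        subject = a.get("about") or (self.subjects[-1] if self.subjects else None)
        if subject and a.get("property") and a.get("resource"):
            self.triples.append((subject, self.expand(a["property"]), a["resource"]))
        if tag not in VOID:
            self.subjects.append(subject)

    def handle_endtag(self, tag):
        if tag not in VOID and self.subjects:
            self.subjects.pop()


if __name__ == "__main__":
    parser = MiniRDFaParser()
    parser.feed(HTML)
    parser.close()
    for triple in parser.triples:
        print(triple)
```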
  21. RDF Ontologies/vocabularies
      • Define categories of things and the relationships that they can have to each other
      • Provide the semantics that allow data to be interpreted by machines
      • Establish rules of inference: what can be assumed to be true based on what is asserted by a triple
  22. RDFS (RDF Schema)
      A basic vocabulary for ontology development. RDFS defines RDF classes and properties.
      Class: a category of resources; a resource in such a category is said to be an instance of the class
      Property: a relation between a subject and an object in a triple
  23. Classes and subClasses
      The subClassOf property (used in defining a class) allows a broad class to serve as the basis of a more specific class. Defining a class (A) as a subClassOf another class (B) means that any instance of A can be inferred to also be an instance of B.
      [Diagram: Class A drawn inside Class B]
  24. A simple Class/subClass example
      Based on these class definitions:
      'Dog' is a Class
      'Poodle' is a Class
      'Poodle' is a subClassOf 'Dog'
      And the statement:
      Fido is a Poodle.
      It can be inferred that:
      Fido is a Dog.
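The Fido inference can be sketched in Python by following `subClassOf` links transitively. The dictionary layout and function names are assumptions for illustration, and `Mammal` is a hypothetical extra superclass (not on the slide) added to show that the inference chains through several levels.

```python
def superclasses(cls, subclass_of):
    """Every class reachable from `cls` via rdfs:subClassOf (which is transitive)."""
    found, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in found:  # guards against cycles in the class hierarchy
                found.add(parent)
                stack.append(parent)
    return found


def infer_types(asserted, subclass_of):
    """For each instance, return its asserted classes plus every implied superclass."""
    return {
        instance: set(classes).union(*(superclasses(c, subclass_of) for c in classes))
        for instance, classes in asserted.items()
    }


# 'Poodle' subClassOf 'Dog' comes from the slide; 'Mammal' is a hypothetical extra level.
SUBCLASS_OF = {"Poodle": {"Dog"}, "Dog": {"Mammal"}}
ASSERTED = {"Fido": {"Poodle"}}  # "Fido is a Poodle."

inferred = infer_types(ASSERTED, SUBCLASS_OF)

if __name__ == "__main__":
    print(inferred["Fido"])
```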
  25. RDFS Properties
      The predicates in RDF triples are properties. Properties themselves have two important properties:
      domain: asserts that the subject of any triple using the property is an instance of a specific class
      range: asserts that the object of any triple using the property is an instance of a specific class
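Domain and range produce new statements rather than reject data: using a property lets a reasoner infer the types of its subject and object. This Python sketch applies those two rules to a set of triples; the `ex:` vocabulary (`ex:author`, `ex:Book`, `ex:Person`) and the function name are hypothetical.

```python
RDF_TYPE = "rdf:type"


def infer_from_domain_range(triples, domain, range_):
    """
    Apply the RDFS domain and range rules:
      (s p o) and (p rdfs:domain C)  =>  (s rdf:type C)
      (s p o) and (p rdfs:range C)   =>  (o rdf:type C)
    `domain` and `range_` map a property to a class.
    """
    inferred = set()
    for s, p, o in triples:
        if p in domain:
            inferred.add((s, RDF_TYPE, domain[p]))
        if p in range_:
            inferred.add((o, RDF_TYPE, range_[p]))
    return inferred


# Hypothetical vocabulary: ex:author links a book to a person.
DOMAIN = {"ex:author": "ex:Book"}
RANGE = {"ex:author": "ex:Person"}
FACTS = {("ex:ThePowerBroker", "ex:author", "ex:RobertCaro")}

inferred = infer_from_domain_range(FACTS, DOMAIN, RANGE)
```

If `ex:RobertCaro` had already been typed as something else, RDFS would not report an error; it would simply add `ex:Person` as another type.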
  26. OWL (Web Ontology Language)
      Provides an extended set of properties used in ontology/vocabulary definitions (used in conjunction with RDFS):
      • Equivalence/disjointness
      • Advanced property definitions
      • Restrictions and cardinality
      owl:sameAs: a property that asserts that two resources are the same (i.e. two URIs refer to the same thing)
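One practical consequence of `owl:sameAs` is that data about either URI can be gathered under a single identifier. This Python sketch groups `sameAs` pairs with a union-find structure and rewrites subjects and objects to one representative per group. The DBpedia URI and the `ex:birthYear` property are illustrative assumptions rather than links or terms from the slides, and choosing the alphabetically smallest URI as representative is an arbitrary choice.

```python
def canonicalize(triples, same_as_pairs):
    """Rewrite every subject and object to one representative URI per owl:sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the alphabetically smallest URI as the group's representative.
            parent[max(ra, rb)] = min(ra, rb)

    return {(find(s), p, find(o)) for s, p, o in triples}


VIAF_MOSES = "http://viaf.org/viaf/52866196"
DBPEDIA_MOSES = "http://dbpedia.org/resource/Robert_Moses"  # illustrative sameAs target

TRIPLES = {
    ("http://archives.nypl.org/mss/2071", "http://purl.org/dc/terms/creator", VIAF_MOSES),
    (DBPEDIA_MOSES, "ex:birthYear", "1888"),  # hypothetical property
}

merged = canonicalize(TRIPLES, [(VIAF_MOSES, DBPEDIA_MOSES)])
```

Predicates are left untouched here; `sameAs` statements between properties would need separate handling.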
  27. SKOS (Simple Knowledge Organization System)
      Defines classes and properties to support the use of thesauri, classification schemes, subject heading systems and taxonomies in RDF.
      • Classes: skos:ConceptScheme, skos:Concept
      • Properties: skos:broader, skos:narrower, skos:related, skos:prefLabel, skos:altLabel
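A small Python sketch of two things a SKOS-aware application commonly does: filling in `skos:narrower` from `skos:broader` (SKOS declares them inverses of each other) and following `skos:broader` upward, which is what `skos:broaderTransitive` captures. The `ex:` concepts and the function names are hypothetical.

```python
BROADER, NARROWER = "skos:broader", "skos:narrower"

# Hypothetical mini-thesaurus; the ex: concepts are invented for illustration.
THESAURUS = {
    ("ex:Poodles", BROADER, "ex:Dogs"),
    ("ex:Dogs", BROADER, "ex:Animals"),
    ("ex:Poodles", "skos:prefLabel", "Poodles"),
}


def with_inverses(triples):
    """Add the missing direction for every skos:broader / skos:narrower link."""
    out = set(triples)
    for s, p, o in triples:
        if p == BROADER:
            out.add((o, NARROWER, s))
        elif p == NARROWER:
            out.add((o, BROADER, s))
    return out


def broader_transitive(concept, triples):
    """All concepts reachable by repeatedly following skos:broader upward."""
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == BROADER and o not in found:
                found.add(o)
                stack.append(o)
    return found
```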
  28. Library of Congress Linked Data Service (id.loc.gov)
      • Provides URIs for LC controlled vocabularies, thesauri, language codes and classification schemes
      • Most terms are defined using SKOS + an RDF representation of MADS (where applicable)
      • Complete vocabularies are available as free downloads
  29. FOAF (Friend of a Friend)
      • Provides a vocabulary for describing people and their relationships to each other and to the things they make and do
      • Originally intended for web-based social networks, FOAF has gained wider acceptance in describing historical figures and their relationships
      • Classes: Agent, Person, Organization, Group
      • Properties: knows, name, based_near
  30. VIAF (Virtual International Authority File)
      • Clusters names in authority files from numerous national libraries and other agencies
      • Named entities vs. just names
      • OCLC is actively establishing links between VIAF and Wikipedia, building an invaluable resource for libraries/archives/museums to provide context for their collections
  31. Dublin Core Metadata Initiative
      • Terms for general use in describing resources
      • Properties relating to simple and qualified Dublin Core elements
      • Classes for general material types (Text, Image, PhysicalObject, etc.)
      • Classes for other resources referenced by DCMI properties (FileFormat, RightsStatement, ProvenanceStatement, etc.)
  32. Schema.org
      • Cooperative project between Bing, Google and Yahoo to provide a mechanism for describing web content via standardized vocabularies
      • Structured data is included in HTML content via microdata (similar to RDFa)
      • Basis of Google Knowledge Graph
      • OCLC now provides Schema.org linked data for all records in WorldCat
  33. DBpedia
      • Crowd-sourced community effort to extract structured information from Wikipedia
      • Enables sophisticated queries against Wikipedia
      • Makes Wikipedia data freely available for re-use
  34. Other useful/notable linked data sources
      Vocabularies/ontologies:
      • Bibliographic ontology
      • Archival ontology
      • Relationship ontology
      Data sources:
      • GeoNames, Europeana, MusicBrainz, data.gov, nytimes.com, BBC, Project Gutenberg…
  35. The obligatory linked data cloud slide
      [Image: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]
  36. Technical things to know a little about
      • Triplestore: a database for storing RDF data
      • SPARQL (SPARQL Protocol and RDF Query Language): the primary query language for RDF data (analogous to SQL for relational databases)
      • SPARQL endpoint: a Web service that provides direct access to RDF data stores via SPARQL queries
      • HTTP content negotiation: the process of delivering content (data) in different formats (e.g. RDF vs. HTML) based on the HTTP request, typically its Accept header
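Server-side content negotiation can be sketched as choosing the best match between the client's `Accept` header and the formats the server can produce. This Python version is a simplified sketch under stated assumptions: the list of supported media types and the function names are hypothetical, and it ignores the HTTP rule that more specific media ranges take precedence over wildcards.

```python
# Media types this hypothetical server can produce, in order of preference.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml", "application/n-triples"]


def parse_accept(header: str):
    """Turn 'text/turtle, text/html;q=0.5' into [('text/turtle', 1.0), ('text/html', 0.5)]."""
    preferences = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        preferences.append((media_type, q))
    return preferences


def matches(media_range: str, offer: str) -> bool:
    """Exact match, '*/*', or a 'type/*' wildcard such as 'text/*'."""
    if media_range in (offer, "*/*"):
        return True
    return media_range.endswith("/*") and offer.startswith(media_range[:-1])


def negotiate(accept_header: str, supported=SUPPORTED, default="text/html"):
    """Pick the supported type with the highest q-value; None means 406 Not Acceptable."""
    if not accept_header:
        return default
    best, best_q = None, 0.0
    for media_range, q in parse_accept(accept_header):
        for offer in supported:
            if matches(media_range, offer) and q > best_q:
                best, best_q = offer, q
    return best
```

A linked data client would ask for RDF by sending, for example, `Accept: text/turtle`, while a browser's usual header would get HTML from the same URI.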
  37. Linked data attribution
      A growing concern in the linked data community is the need to include attribution with data in order to determine whether or not it can/should be trusted.
      • RDF reification: allows source attribution to be associated with an RDF triple
      • Named graphs: an extension of RDF that allows attribution and other metadata to be associated with RDF descriptions
      • Quad stores: similar to triplestores, but with an additional element that connects each triple with its source
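A quad store can be sketched in Python as a set of (subject, predicate, object, graph) tuples, where the fourth element names the source; this makes it possible to ask who asserted a triple, or to query only trusted sources. The `QuadStore` class, the graph names under `example.org`, and the repeated assertion from a second source are hypothetical.

```python
from collections import namedtuple

Quad = namedtuple("Quad", ["subject", "predicate", "obj", "graph"])


class QuadStore:
    """In-memory set of quads: each triple carries the name of the graph (source) it came from."""

    def __init__(self):
        self.quads = set()

    def add(self, s, p, o, graph):
        self.quads.add(Quad(s, p, o, graph))

    def sources(self, s, p, o):
        """Names of every graph that asserts the triple (s, p, o)."""
        return {q.graph for q in self.quads if (q.subject, q.predicate, q.obj) == (s, p, o)}

    def triples(self, trusted=None):
        """All triples, optionally restricted to graphs the caller chooses to trust."""
        return {(q.subject, q.predicate, q.obj)
                for q in self.quads if trusted is None or q.graph in trusted}


# Hypothetical graph names for two sources.
ARCHIVES = "http://example.org/graphs/nypl-archives"
WORLDCAT = "http://example.org/graphs/worldcat"
MOSES = "http://viaf.org/viaf/52866196"
POWER_BROKER = "http://www.worldcat.org/oclc/834874"

store = QuadStore()
store.add("http://archives.nypl.org/mss/2071", "http://purl.org/dc/terms/creator", MOSES, ARCHIVES)
store.add(POWER_BROKER, "http://purl.org/dc/terms/subject", MOSES, WORLDCAT)
# The same triple asserted by a second (hypothetical) source.
store.add(POWER_BROKER, "http://purl.org/dc/terms/subject", MOSES, ARCHIVES)
```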
  38. Linked Open Data
      Linked data that is freely usable, reusable, and redistributable, subject at most to attribution and 'share alike' requirements.
  39. Open data licensing
      Creative Commons (CC): a nonprofit organization that enables the sharing and use of creativity and knowledge through free legal tools.
      CC provides alternatives to "all rights reserved" copyright.
  40. Creative Commons Licenses
      Open data:
      • Attribution (CC BY): allows distribution and reuse in any way as long as you get credit
      • Attribution-ShareAlike (CC BY-SA): allows distribution and reuse in any way as long as you get credit and derivative works are released under the same license
      Not open data:
      • Attribution-NoDerivs (CC BY-ND): requires that the original is used unchanged and in whole, with credit to you
      • Attribution-NonCommercial (CC BY-NC): allows distribution and reuse in any way, for non-commercial purposes only, as long as you get credit
      • Attribution-NonCommercial-ShareAlike (CC BY-NC-SA): allows distribution and reuse for non-commercial purposes only, as long as you get credit and derivative works are released under the same license
      • Attribution-NonCommercial-NoDerivs (CC BY-NC-ND): only permits use as-is, for non-commercial purposes, and with credit to you; the most restrictive CC license available
  41. CC0 ('CC Zero')
      • Allows creators to waive all rights to a work and to place it as completely as possible in the public domain
      • Designed to make it as clear as is legally possible that any use of your content is allowed
      • Quickly becoming the preferred license for open data
  42. LC Bibliographic Framework Initiative
      • Developing a new bibliographic framework (to replace MARC) based on linked data principles
      • First draft of the Bibliographic Framework (BIBFRAME) model published in November 2012
  43. LC Bibliographic Framework Initiative