Linked Open Data Fundamentals for Libraries, Archives and Museums

Published in: Technology
  • LC, BNF, DNB; 20 agencies from 16 countries
  • Linked Open Data Fundamentals for Libraries, Archives and Museums

    1. 1. Linked Open Data Fundamentals For Libraries, Archives & Museums Trevor Thornton Senior Applications Developer, NYPL Labs New York Public Library
    2. 2. Workshop Topics
        • What Linked Open Data is
        • Potential benefits of Linked Open Data for libraries, archives and museums
        • Overview of technical concepts
        • Licenses for open data (legal issues)
        • Tour of relevant Linked Open Data sources (element sets, controlled vocabularies, published data sets)
        • General considerations for implementation
    3. 3. Linked Open Data (LOD)
        Data: For libraries, archives and museums, this includes any type of digital information that describes resources or aids in their discovery (metadata). It also includes data produced through original research (scientific/statistical data, geospatial data, etc.).
        Linked Data: Data published on the Web in accordance with principles designed to facilitate linkages between resources.
        Linked Open Data: Linked data that is freely usable, reusable, and redistributable, subject at most to attribution and ‘share alike’ requirements.
    4. 4. The value of our data
        • Our data is a crucial tool in serving our missions to collect, preserve and provide access to resources
        • We are dedicated to standards of quality and accuracy in the data we create
        • The creation and management of data represents a significant investment on the part of cultural heritage institutions
    5. 5. Benefits of Linked Open Data
        • Puts information on the web, where people are looking for it
        • People can use your data in new ways, opening opportunities for scholarship and innovation
        • Expands discoverability of your collections
        • Allows for continuous improvement of your data by linking it to a growing pool of other data
    6. 6. The emerging data commons Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
    7. 7. A very brief history of linked data, starring Tim Berners-Lee (Photo: Paul Clarke)
    8. 8. 1990 (more or less)
        Tim Berners-Lee invents the World Wide Web to publish hypertext documents on the Internet. It includes 3 essential technologies:
        • URI (Uniform Resource Identifier)
        • HTTP (Hypertext Transfer Protocol)
        • HTML (Hypertext Markup Language)
    9. 9. 2001
        Tim Berners-Lee proposes ‘The Semantic Web’ in an article in Scientific American:
        “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation… In the near future, these developments will usher in significant new functionality as machines become much better able to process and ‘understand’ the data that they merely display at present.”
    10. 10. 2006
        In a document discussing design issues for the Semantic Web, Berners-Lee introduces linked data as a crucial component:
        “The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.”
        He outlines 4 basic principles…
    11. 11. The Linked Data Principles
        1. Use URIs as names for things.
        2. Use HTTP URIs so that people can look up those names.
        3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
        4. Include links to other URIs so that they can discover more things.
    12. 12. THE TECHNICAL PART STARTS NOW
    13. 13. URI (Uniform Resource Identifier)
        Globally unique identifier for a resource on a computer or a network. HTTP URIs identify resources on the Web.
        Example: http://www.yourdomain.org/something
    14. 14. URI vs. URL
        URLs (Uniform Resource Locators) are a subset of URIs that, in addition to identifying a resource, provide a means of locating it.
        A URI does not necessarily point to a document. A URL does. A URI can identify a real-world object.
    15. 15. HTTP (Hypertext Transfer Protocol)
        The foundation of data communication for the Web.
        [Diagram: a client/user agent (e.g. web browser) sends an HTTP request to a web server, which returns an HTTP response.]
    16. 16. RDF (Resource Description Framework)
        A framework for describing Web resources. A Web resource is anything that can be retrieved or identified on the WWW via a URI.
        RDF descriptions are based on simple subject-predicate-object expressions called “triples”.
    17. 17. The RDF Triple
        [Diagram: subject, predicate, object]
        • Subject: the resource being described
        • Predicate: a property of that resource
        • Object: the value of the property
        Subject and predicate are defined using URIs. The object can be either a URI or a ‘literal’ (text, number, date, etc.).
    18. 18. A basic triple creator James Joyce
    19. 19. A basic triple
        Subject: http://www.worldcat.org/oclc/746309573
        Predicate (creator): http://purl.org/dc/terms/creator
        Object (James Joyce): http://viaf.org/viaf/44300643
    20. 20. Another basic triple
        Subject: http://www.worldcat.org/oclc/746309573
        Predicate (subject): http://purl.org/dc/terms/subject
        Object (Dublin, Ireland): http://dbpedia.org/resource/Dublin
    21. 21. One more basic triple
        Subject: http://www.worldcat.org/oclc/746309573
        Predicate (date created): http://purl.org/dc/terms/created
        Object (literal): 1918/1922
    22. 22. RDF data as a graph
        [Diagram: one subject node, http://www.worldcat.org/oclc/746309573, with three outgoing edges]
        • creator (http://purl.org/dc/terms/creator): James Joyce (http://viaf.org/viaf/44300643)
        • subject (http://purl.org/dc/terms/subject): Dublin, Ireland (http://dbpedia.org/resource/Dublin)
        • date created (http://purl.org/dc/terms/created): 1918/1922
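The graph on this slide can be sketched in plain Python as a set of (subject, predicate, object) tuples. This is only an illustration: `Literal` and `describe` are hypothetical helpers invented here, not part of any RDF library, and the URIs are the ones used in the slides.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain literal value (hypothetical helper, not from an RDF library)."""
    value: str


WORK = "http://www.worldcat.org/oclc/746309573"
DCTERMS = "http://purl.org/dc/terms/"

# A graph is just a set of triples; plain strings stand for URIs here.
graph = {
    (WORK, DCTERMS + "creator", "http://viaf.org/viaf/44300643"),
    (WORK, DCTERMS + "subject", "http://dbpedia.org/resource/Dublin"),
    (WORK, DCTERMS + "created", Literal("1918/1922")),
}


def describe(subject, triples):
    """Return {predicate: [objects]} for every triple about `subject`."""
    result = {}
    for s, p, o in sorted(triples, key=repr):
        if s == subject:
            result.setdefault(p, []).append(o)
    return result


description = describe(WORK, graph)
```

Because every statement shares the same subject URI, the three triples form a single node with three outgoing edges, which is the shape of the diagram.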
    23. 23. RDF serialization formats
        ‘Serialization’ = to record one or more RDF graphs in a machine-readable file. There are 2 basic options:
        RDF in a standalone text file:
        • RDF/XML
        • N3 (Notation 3)
        • Turtle (Terse RDF Triple Language)
        • N-Triples
        RDF embedded in HTML:
        • RDFa (RDF in attributes)
    24. 24. Basic triples in N-Triples
        <http://www.worldcat.org/oclc/746309573> <http://purl.org/dc/terms/creator> <http://viaf.org/viaf/44300643> .
        <http://www.worldcat.org/oclc/746309573> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Dublin> .
        <http://www.worldcat.org/oclc/746309573> <http://purl.org/dc/terms/created> "1918/1922" .
        N-Triples is the most basic expression of RDF.
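As a rough illustration of how simple the N-Triples format is, the sketch below writes one line per triple. `to_ntriples` is a hypothetical helper written for this example; it assumes each object is flagged as either a URI or a plain string literal, and it only escapes backslashes, double quotes and newlines, which is less than a full serializer would handle.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object, object_is_uri) tuples as N-Triples."""
    lines = []
    for s, p, o, o_is_uri in triples:
        if o_is_uri:
            obj = f"<{o}>"
        else:
            # Literals are double-quoted; escape the characters that would break the line.
            escaped = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


WORK = "http://www.worldcat.org/oclc/746309573"
output = to_ntriples([
    (WORK, "http://purl.org/dc/terms/creator", "http://viaf.org/viaf/44300643", True),
    (WORK, "http://purl.org/dc/terms/created", "1918/1922", False),
])
```

URIs go in angle brackets, literals in double quotes, and every statement ends with a full stop.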
    25. 25. Basic triples in N3/Turtle
        @prefix dcterms: <http://purl.org/dc/terms/> .
        <http://www.worldcat.org/oclc/746309573>
            dcterms:creator <http://viaf.org/viaf/44300643> ;
            dcterms:subject <http://dbpedia.org/resource/Dublin> ;
            dcterms:created "1918/1922" .
        Statements about the same resource are grouped together. Property URIs are shortened using prefixes.
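Prefix shortening is plain string substitution, which a few lines of Python can show. The `PREFIXES` table and the `expand` and `shorten` helpers are hypothetical names invented for this sketch; real Turtle parsers also handle base URIs and character escaping.

```python
PREFIXES = {"dcterms": "http://purl.org/dc/terms/"}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn 'dcterms:creator' into the full property URI."""
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


def shorten(uri, prefixes=PREFIXES):
    """Turn a full URI into a prefixed name, or <uri> if no prefix applies."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"
```

For example, `expand("dcterms:creator")` gives back the full Dublin Core property URI, while a URI outside any declared namespace is written in angle brackets.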
    26. 26. Basic triples in RDF/XML
        <?xml version="1.0" encoding="UTF-8"?>
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:dcterms="http://purl.org/dc/terms/">
          <rdf:Description rdf:about="http://www.worldcat.org/oclc/746309573">
            <dcterms:creator rdf:resource="http://viaf.org/viaf/44300643"/>
            <dcterms:subject rdf:resource="http://dbpedia.org/resource/Dublin"/>
            <dcterms:created>1918/1922</dcterms:created>
          </rdf:Description>
        </rdf:RDF>
    27. 27. RDFa (RDF in Attributes)
        RDFa allows RDF data to be embedded within HTML content.
        Rendered HTML:
        Ulysses is a novel by the Irish author James Joyce.
        HTML code:
        <div about="http://www.worldcat.org/oclc/746309573" prefix="dcterms: http://purl.org/dc/terms/">
          Ulysses is a novel by the Irish author
          <span property="dcterms:creator" resource="http://viaf.org/viaf/44300643">James Joyce</span>
        </div>
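To show that the markup on this slide really does carry a triple, here is a deliberately tiny extractor built on Python's standard `html.parser`. `SimpleRDFaExtractor` is a hypothetical class written for this example; it only understands the `about`, `prefix`, `property` and `resource` attributes and ignores the scoping and chaining rules that a real RDFa processor implements.

```python
from html.parser import HTMLParser


class SimpleRDFaExtractor(HTMLParser):
    """Collect triples from about/prefix/property/resource attributes only."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}
        self.subject = None
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "prefix" in attrs:
            # The prefix attribute holds "name: uri" pairs separated by whitespace.
            tokens = attrs["prefix"].split()
            for i in range(0, len(tokens) - 1, 2):
                self.prefixes[tokens[i].rstrip(":")] = tokens[i + 1]
        if "about" in attrs:
            self.subject = attrs["about"]
        if self.subject and "property" in attrs and "resource" in attrs:
            prefix, _, local = attrs["property"].partition(":")
            namespace = self.prefixes.get(prefix)
            if namespace is not None:
                self.triples.append((self.subject, namespace + local, attrs["resource"]))


snippet = (
    '<div about="http://www.worldcat.org/oclc/746309573" '
    'prefix="dcterms: http://purl.org/dc/terms/">'
    "Ulysses is a novel by the Irish author "
    '<span property="dcterms:creator" '
    'resource="http://viaf.org/viaf/44300643">James Joyce</span></div>'
)

extractor = SimpleRDFaExtractor()
extractor.feed(snippet)
```

After feeding the snippet, `extractor.triples` holds the same creator statement that the earlier slides express in N-Triples and Turtle.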
    28. 28. RDF Ontologies
        Ontologies/vocabularies define categories of things and the relationships that they can have to each other. Ontologies provide the semantics that allow data to be interpreted by machines.
        Rules of inference: what can be assumed to be true based on what is asserted by a triple.
    29. 29. RDFS (RDF Schema)
        A basic vocabulary for ontology development. RDFS defines RDF classes and properties.
        • Class: a category of resources; a resource in such a category is said to be an instance of the class
        • Property: a relation between a subject resource and an object resource in a triple
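One concrete rule of inference comes from RDFS class hierarchies: if a resource is an instance of a class, it is also an instance of every superclass. The sketch below applies that single rule until nothing new can be derived. The `example.org` class URIs are made up for illustration, and `infer_types` is a hypothetical helper, not a real inference engine.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def infer_types(triples):
    """Add (x, rdf:type, D) whenever (x, rdf:type, C) and (C, rdfs:subClassOf, D) hold."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        subclass_pairs = {(s, o) for s, p, o in inferred if p == RDFS_SUBCLASS_OF}
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                new_triple = (s, RDF_TYPE, sup)
                if sub == o and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


# Hypothetical classes, invented for this example.
NOVEL = "http://example.org/Novel"
BOOK = "http://example.org/Book"
WORK_CLASS = "http://example.org/CreativeWork"
ULYSSES = "http://www.worldcat.org/oclc/746309573"

asserted = {
    (ULYSSES, RDF_TYPE, NOVEL),
    (NOVEL, RDFS_SUBCLASS_OF, BOOK),
    (BOOK, RDFS_SUBCLASS_OF, WORK_CLASS),
}
result = infer_types(asserted)
```

Only one type statement was asserted, but the result also says the resource is a Book and a CreativeWork, because the ontology says so.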
    30. 30. OWL (Web Ontology Language)
        Provides an extended set of properties used in ontology/vocabulary definitions (used in conjunction with RDFS)
        • Equivalence/disjunction
        • Advanced property definitions
        • Restrictions and cardinality
    31. 31. SKOS (Simple Knowledge Organization System)
        Set of vocabularies created to support the use of thesauri, classification schemes, subject heading systems and taxonomies in RDF
        • Concept schemes (names, topics, geographic terms, etc.)
        • Preferred/alternate labels
        • Broader/narrower concepts
    32. 32. Triplestore
        A database for storing RDF data. Often a triplestore is part of a suite of applications that might include:
        • Triplestore
        • Inference engine: provides the ‘intelligence’ required to interpret data based on RDFS/OWL ontologies
        • Query engine: supports access to data based on user-supplied queries
    33. 33. SPARQL (SPARQL Protocol and RDF Query Language)
        • The primary query language for RDF data (analogous to SQL for relational databases)
        • SPARQL endpoint: Web service that provides direct access to RDF datastores via SPARQL queries
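At its core a SPARQL query is a triple pattern with variables in place of some terms. The sketch below shows a query that asks for the creator of the example resource, followed by a toy matcher that does the same thing over an in-memory list of triples. `match` is a hypothetical helper in which `None` plays the role of a variable; it is not how a real query engine is built.

```python
# The SPARQL query being imitated (shown as text only; it is not executed here).
QUERY = """
SELECT ?creator WHERE {
  <http://www.worldcat.org/oclc/746309573> <http://purl.org/dc/terms/creator> ?creator .
}
"""

WORK = "http://www.worldcat.org/oclc/746309573"
DCTERMS = "http://purl.org/dc/terms/"

triples = [
    (WORK, DCTERMS + "creator", "http://viaf.org/viaf/44300643"),
    (WORK, DCTERMS + "subject", "http://dbpedia.org/resource/Dublin"),
]


def match(data, s=None, p=None, o=None):
    """Return the triples matching a pattern; None means 'any value'."""
    return [
        t for t in data
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


creators = [o for _, _, o in match(triples, s=WORK, p=DCTERMS + "creator")]
```

A SPARQL endpoint accepts queries like `QUERY` over HTTP and returns the matching variable bindings.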
    34. 34. Publishing Linked Data
        Establish URIs for your resources
        • Within a domain that you control (yourlibrary.org)
        • Consult with your IT staff on strategies for formulating URIs, for example:
            • Subdomain (data.yourlibrary.org/something)
            • Reserved path within your domain (yourdomain.org/data/something)
    35. 35. Publishing Linked Data
        Decide what happens when users (human or machine) try to access your URIs via the Web:
        1. Nothing (not recommended)
        2. Something: the user is provided with information about the resource
            • URI directs to an RDF file: good for machines, not for humans
            • URI directs to an HTML representation of the resource: good for humans, useless for machines (not recommended)
            • URI directs to an HTML representation of the resource with RDFa embedded: good for humans, OK for machines
            • URI directs to either an RDF file or an HTML representation based on what the user prefers (content negotiation)
    36. 36. HTTP Content Negotiation
        [Diagram: a client/user agent (e.g. web browser) sends an HTTP request to a web server, which returns an HTTP response.]
        HTTP Request:
        • Resource URI (+ method)
        • Headers (information about the requestor)
        • Message body (optional)
        HTTP Response:
        • Status code
        • Headers (information about the response)
        • Message body (optional)
    37. 37. HTTP ‘Accept’ Header
        Part of the HTTP request that specifies what types of data the client can accept
        • Web browsers: HTML, JPEG, GIF, text, or other formats that the browser can display; unsupported formats are either displayed as text or prompt the user to download the file
        • Semantic web applications: RDF/XML, N3, Turtle, or other RDF serialization
    38. 38. HTTP Status Codes
        Part of the HTTP response that classifies the nature of the response
        • 1xx: Informational
        • 2xx: Success. Example: 200 OK
        • 3xx: Redirection. Examples: 301 Moved Permanently, 303 See Other. The response will include a ‘Location’ header with the URI for the new resource
        • 4xx: Error. Example: 404 Not Found
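In practice the Accept header is a comma-separated list of media types, each with an optional preference weight (`q`, between 0 and 1, defaulting to 1). The sketch below orders the types by preference. `parse_accept` is a hypothetical helper written for this example; production web frameworks handle wildcards and other parameters more carefully.

```python
def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    choices = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Sort by descending weight, then by original position in the header.
        choices.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(choices)]


preferred = parse_accept("text/html;q=0.8, text/turtle, application/rdf+xml;q=0.9")
```

A semantic web client would typically list an RDF type such as `text/turtle` or `application/rdf+xml` first, while a browser leads with `text/html`.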
    39. 39. HTTP Content Negotiation via 303 Redirect
        Between a web browser and a web server (running some kind of content negotiation service):
        1. HTTP request. URI: http://example.org/something. Accept: HTML, JPEG, GIF, etc.
    40. 40. HTTP Content Negotiation via 303 Redirect
        1. HTTP request. URI: http://example.org/something. Accept: HTML, JPEG, GIF, etc.
        2. HTTP response. Status: 303 See Other. Location: http://example.org/something.html
    41. 41. HTTP Content Negotiation via 303 Redirect
        1. HTTP request. URI: http://example.org/something. Accept: HTML, JPEG, GIF, etc.
        2. HTTP response. Status: 303 See Other. Location: http://example.org/something.html
        3. HTTP request. URI: http://example.org/something.html. Accept: HTML, JPEG, GIF, etc.
    42. 42. HTTP Content Negotiation via 303 Redirect
        1. HTTP request. URI: http://example.org/something. Accept: HTML, JPEG, GIF, etc.
        2. HTTP response. Status: 303 See Other. Location: http://example.org/something.html
        3. HTTP request. URI: http://example.org/something.html. Accept: HTML, JPEG, GIF, etc.
        4. HTTP response. Status: 200 OK
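The server-side decision in this exchange can be sketched as a single function that looks at the Accept header and answers with a 303 redirect to a matching document. This is a simplified illustration, not a production service: `negotiate` is a hypothetical helper, the `.ttl` and `.rdf` file extensions are assumptions (the slides only show `.html`), and preference weights (`q` values) are ignored in favour of header order.

```python
# Assumed mapping from RDF media types to document extensions (hypothetical).
RDF_EXTENSIONS = {
    "text/turtle": ".ttl",
    "application/rdf+xml": ".rdf",
}


def negotiate(resource_uri, accept_header):
    """Return (status_code, headers) for a request to a resource URI."""
    accepted = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    for media_type in accepted:
        if media_type in RDF_EXTENSIONS:
            # Machine client: redirect to the RDF document.
            return 303, {"Location": resource_uri + RDF_EXTENSIONS[media_type]}
        if media_type in ("text/html", "*/*"):
            # Browser (or a client with no preference): redirect to the HTML page.
            return 303, {"Location": resource_uri + ".html"}
    # Nothing the server can provide matches what the client accepts.
    return 406, {}


browser = negotiate("http://example.org/something", "text/html, image/jpeg, image/gif")
machine = negotiate("http://example.org/something", "text/turtle")
```

The client then issues a second request to the URI in the `Location` header and receives `200 OK` with the document itself, as in steps 3 and 4 of the slide.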
    43. 43. Trust
        The rapid growth of the Web is attributable in large part to the fact that it allows anyone to say anything about anything (provable facts, subjective opinions, blatant lies and everything in between). This is also true of the linked data web.
        Libraries, archives and museums are expected to provide ‘factual’, objective data and depend on trusted sources.
    44. 44. Linked data attribution
        A growing concern in the linked data community is the need to include attribution with data in order to determine whether or not it can/should be trusted.
        • RDF reification: allows source attribution to be associated with an RDF triple
        • Named graphs: extension of RDF that allows attribution and other metadata to be associated with RDF descriptions
        • Quad stores: similar to triplestores but with an additional element that connects the triple with its source
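The quad idea can be illustrated by adding a fourth element to each statement that names the graph (the source) it came from, and then filtering on that element. Everything under `example.org` below is made up for this sketch, including the deliberately dubious third statement, and `triples_from` and `group_by_graph` are hypothetical helpers rather than a real quad store API.

```python
from collections import defaultdict

WORK = "http://www.worldcat.org/oclc/746309573"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical graph names identifying where each statement came from.
CATALOG = "http://example.org/graphs/library-catalog"
UNVERIFIED = "http://example.org/graphs/unverified"

quads = [
    (WORK, DCTERMS + "creator", "http://viaf.org/viaf/44300643", CATALOG),
    (WORK, DCTERMS + "subject", "http://dbpedia.org/resource/Dublin", CATALOG),
    (WORK, DCTERMS + "subject", "http://dbpedia.org/resource/Paris", UNVERIFIED),
]


def triples_from(data, trusted_graphs):
    """Keep only the triples whose source graph is in the trusted set."""
    return [(s, p, o) for s, p, o, g in data if g in trusted_graphs]


def group_by_graph(data):
    """Group triples by the graph (source) they were asserted in."""
    grouped = defaultdict(list)
    for s, p, o, g in data:
        grouped[g].append((s, p, o))
    return dict(grouped)


trusted = triples_from(quads, {CATALOG})
```

Keeping the source alongside each statement lets an application decide for itself which assertions to rely on.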
    45. 45. THE TECHNICAL PART IS NOW OVER
    46. 46. Linked Open Data
        Data: For libraries, archives and museums, this includes any type of digital information that describes resources or aids in their discovery (metadata). Also includes data produced through original research (scientific/statistical data, geospatial data, etc.).
        Linked Data: Data published on the Web in accordance with principles designed to facilitate linkages between resources.
        Linked Open Data: Linked data that is freely usable, reusable, and redistributable, subject at most to attribution and ‘share alike’ requirements.
    47. 47. Open data licensing
        Licensing your data is not the same as licensing your assets. Typically, permitted uses of data are much less restrictive. You can often provide free, open use of your data even if use of your assets is completely restricted.
        TALK TO YOUR LEGAL DEPARTMENT FIRST.
    48. 48. Open data licensing
        Creative Commons (CC): a nonprofit organization that enables the sharing and use of creativity and knowledge through free legal tools. CC provides an alternative to standard “all rights reserved” copyright.
    49. 49. Creative Commons Licenses
        Three-Layer Design:
        • LEGAL CODE: the actual license as a legal document (accessible on the Web)
        • COMMONS DEED: the human-readable version of the license
        • MACHINE-READABLE CODE: allows license info to be expressed in RDF
    50. 50. Creative Commons Licenses
        CC licenses allow creators to specify a combination of 4 restrictions on use:
        • Attribution: any use must give credit to the creator
        • Non-Commercial: only non-commercial uses are permitted
        • Share Alike: any use must be made available under the same terms as the original
        • No Derivative Works: the original may only be used in whole and unchanged
        Licenses specify that any restrictions may be waived with permission of the rights holder.
    51. 51. Creative Commons Licenses
        OPEN DATA (:
        • Attribution (CC BY): allows distribution and reuse in any way as long as you get credit
        • Attribution-ShareAlike (CC BY-SA): allows distribution and reuse in any way as long as you get credit and derivative works are released under the same license
        • Attribution-NoDerivs (CC BY-ND): requires that the original is used unchanged and in whole, with credit to you
        NOT OPEN DATA ):
        • Attribution-NonCommercial (CC BY-NC): allows distribution and reuse in any way, for non-commercial purposes only, as long as you get credit
        • Attribution-NonCommercial-ShareAlike (CC BY-NC-SA): allows distribution and reuse in any way, for non-commercial purposes only, as long as you get credit and derivative works are released under the same license
        • Attribution-NonCommercial-NoDerivs (CC BY-NC-ND): only permits use as-is, for non-commercial purposes, and with credit to you; the most restrictive CC license available
    52. 52. CC0 (‘CC Zero’)
        Allows creators to waive all rights to work and to place it as completely as possible into the public domain.
        • Laws vary from jurisdiction to jurisdiction as to what rights are automatically granted and how and when they expire or may be voluntarily relinquished
        • Ambiguity with regard to rights can limit creative re-use
        • CC0 is designed to make it as clear as is legally possible that any use of your content is allowed
        • Quickly becoming the preferred license for open data
        AGAIN, TALK TO YOUR LEGAL DEPARTMENT FIRST!
    53. 53. LINKED DATA SOURCES
    54. 54. DCMI Terms (dublincore.org/documents/dcmi-terms/): General purpose metadata terms maintained by the Dublin Core Metadata Initiative
    55. 55. Bibliographic Ontology (bibliontology.com): An extensive vocabulary of terms for describing bibliographic resources
    56. 56. FOAF (Friend of a Friend) (foaf-project.org): Provides a vocabulary for describing people and their relationships to each other and the things they create
    57. 57. LC Linked Data Service (id.loc.gov): Library of Congress authorities as linked data (Name Authority File, Subject Headings, Thesaurus of Graphic Materials, etc.)
    58. 58. Virtual International Authority File viaf.org Links names from multiple authority files to create cluster records representing the entities identified
    59. 59. GeoNames (geonames.org): Aggregates geographic data from a wide variety of sources and makes it available as LOD
    60. 60. New York Times (data.nytimes.com): 150 years of subjects from New York Times articles; data source for Times Topics pages
    61. 61. Data.gov: Open access to datasets held or generated by the US Federal Government
    62. 62. DBpedia (dbpedia.org): Crowd-sourced community effort to extract structured information from Wikipedia and to make it available on the Web
    63. 63. Freebase (freebase.com): A large collaborative knowledge base consisting of metadata composed mainly by its community members (owned by Google)
    64. 64. Google Knowledge Graph: Google uses data from Freebase and other sources to provide related information based on search queries
    65. 65. Schema.org: A set of vocabularies developed by Google, Bing (Microsoft) and Yahoo! for adding semantic data to web pages
    66. 66. OCLC WorldCat (oclc.org/worldcat): Earlier this year, OCLC added linked data to records in WorldCat, using Schema.org vocabularies and proposed extensions for library data
    67. 67. SOME CONSIDERATIONS
    68. 68. Start small
        • Linked Open Data is not an ‘all or nothing’ proposition
        • Start by publishing data about specific collections or items of special interest
        • Consider incorporating Linked Open Data into online exhibitions or special projects
    69. 69. Engage the linked data community
        Let people know what you’re up to, and ask for feedback. You will get it.
    70. 70. Be creative
        • In addition to publishing data about your own collections, think about how you can incorporate data from other sources into your projects
        • Consider collaborations with other institutions
    71. 71. Utilize your internal resources
        • Cataloging/Metadata
        • Curators/Subject Matter Experts
        • IT Staff
        • Legal Department
    72. 72. me: trevorthornton@nypl.org
        nypl labs: www.nypl.org/labs
