Linked Data in ScholarlyCommunicationAAHEP5 Information Provider Summit, Sept. 22nd 2011Bernhard Haslhofer | Cornell Unive...
Overview                  •      Linked Data Vision and Goals                  •      Enabling Technologies (by example)  ...
Web Architecture                                                   HTML                     HTML                HTML      ...
Web Architecture                                       HTML                  HTML    HTML   HTML   HTMLAAHEP5 Information ...
Web Architecture              •      A set of simple standards                        •      Uniform document encoding (HT...
But sometimes we need to access to the                     underlying (meta)data.AAHEP5 Information Provider Summit | Itha...
Web Services and Web APIs                                      API                  API       API   API   APIAAHEP5 Inform...
Web Services and Web APIs                  •      Each Web API has a proprietary interface                  •      Datasou...
Linked Data Vision                  •      Create a single global data space based on                         the Architec...
Linked Data Principles                  •      Use URIs as names for things                  •      Use HTTP URIs so that ...
Web of Linked Data              •      A set of simple standards                        •      Uniform data model (RDF)   ...
Web of Linked Data                  •      Data publishers                        •      assign URIs to their information ...
Linking Open Data Project                  •      Open Data: a philosophy, practice, or policy that                       ...
Linked Data in the IndustryAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
Linked Data in Libraries                  •      Strong uptake in recent years                        •      Library of Co...
Overview                  •      Linked Data Vision and Goals                  •      Enabling Technologies (by example)  ...
Uniform Resource Identifiers (URIs)                  •      Name and identify things (resources)                  •      De...
Resource Description Framework (RDF)                  •      A data model for representing data on the Web                ...
RDF Serialization (RDF/XML, Turtle, N-              Triples, RDF/JSON)                  •      Data formats for serializin...
RDF Vocabulary Description Language (RDFS)                  •      A language for describing the syntax and               ...
OWL Web Ontology Language                  •      A more expressive (formal) language for                         defining ...
OWL Web Ontology Language                  •      Defines properties for linking resources                                 ...
Simple Knowledge Organization System (SKOS)                  •      A language for describing controlled                  ...
SPARQL                  •      A query language and protocol for                         accessing RDF data on the Web    ...
Overview                  •      Linked Data Vision and Goals                  •      Enabling Technologies (by example)  ...
Information vs. Non-Information Resource                  •      Information Resource                        •      web pa...
Publishing Data                  •      Distinguish between non-information and                         information resour...
Publishing Data                                                 GET http://arxiv.org/resource/1106.5187                   ...
Publishing Data                                                 GET http://arxiv.org/resource/1106.5187                   ...
RDFa / Microformats / Microdata                  •      Mechanisms for embedding structured data                         i...
RDFa Example<div typeof=”bibo:Article” xmlns:bibo=”http://purl.org/ontology/bibo”>  <div xmlns:dc=”http://purl.org/dc/elem...
Microformats Example              <head profile=”http://www.w3.org/2006/03/hcard”>              ...              </head>  ...
RDFa vs. Microformats                         Microformats                                             RDFa      flat names...
Microdata                  •      A very young HTML proposition that                         extends Microformats and addr...
Web Data Publishing Approaches                                                                     Pro            Con     ...
Consuming Linked Data        curl -H “Accept: application/rdf+xml” http://arxiv.org/data/1105.5178        rapper http://ar...
Overview                  •      Linked Data Vision and Goals                  •      Enabling Technologies (by example)  ...
Motivation                  •      Current research articles                        •      are mostly PDFs (a kind of BLOB...
Motivation                  •      But a publication is more:                        •      data sets, conference presenta...
Overall Goal             Publication Document (PDF)                                                    Aggregation for the...
Methodology                  •      Aggregation/Data Extraction: create aggregations from                         availabl...
Aggregation / Data Extraction                  •      scilink-extract library                        •      takes PDFs and...
Analysis                  •      How did HTTP linkage evolve in the past?                  •      Where in their papers do...
Status                  •      Extraction for arxiv.org finished (~ 700K ) PDFs                  •      We need access to f...
Linked Data in Scholarly Communication
Linked Data in Scholarly Communication
Linked Data in Scholarly Communication
Linked Data in Scholarly Communication
Linked Data in Scholarly Communication
Linked Data in Scholarly Communication
Linked Data in Scholarly Communication
Upcoming SlideShare
Loading in …5
×

Linked Data in Scholarly Communication

1,411 views

Published on

Published in: Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,411
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
19
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Linked Data in Scholarly Communication

  1. 1. Linked Data in ScholarlyCommunicationAAHEP5 Information Provider Summit, Sept. 22nd 2011Bernhard Haslhofer | Cornell University, Information Science
  2. 2. Overview • Linked Data Vision and Goals • Enabling Technologies (by example) • Publishing and Consuming Linked Data • The SciLink ProjectAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  3. 3. Web Architecture HTML HTML HTML hyperlinks hyperlinksAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  4. 4. Web Architecture HTML HTML HTML HTML HTMLAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  5. 5. Web Architecture • A set of simple standards • Uniform document encoding (HTML) • Uniform global (!) addressing (URI) • Uniform transportation (HTTP) • Hyperlinks connecting documents • Works pretty well for accessing and exchanging documentsAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  6. 6. But sometimes we need to access to the underlying (meta)data.AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  7. 7. Web Services and Web APIs API API API API APIAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  8. 8. Web Services and Web APIs • Each Web API has a proprietary interface • Datasources must be known in advance • Information entities (papers, authors, subjects, etc.) are not linkedAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  9. 9. Linked Data Vision • Create a single global data space based on the Architecture of the World Wide Web RDF RDF RDF RDF links RDF linksAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  10. 10. Linked Data Principles • Use URIs as names for things • Use HTTP URIs so that people can look up those names • When someone looks up a URI, provide useful information, using standards (RDF, SPARQL) • Include links to other URIs, so that they can discover more thingsAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  11. 11. Web of Linked Data • A set of simple standards • Uniform data model (RDF) • Uniform global (!) addressing (URI) • Uniform transportation (HTTP) • RDF links connecting entities • Forms a global dataspace and facilitates accessing and exchanging dataAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  12. 12. Web of Linked Data • Data publishers • assign URIs to their information entities • publish RDF instance data about these entities • publish schemas / vocabularies on the Web • link their entities with other entities on the Data Web • can provide query access to their dataAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  13. 13. Linking Open Data Project • Open Data: a philosophy, practice, or policy that data are freely available to everyone without restrictions from copyright, patents, a.s.o • Linked Data: method / best practices for exposing, sharing, and connecting data using URI and RDF • Linking Open Data: a W3C community project with the goal to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting links between data items from different sourcesAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  14. 14. Linked Data in the IndustryAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  15. 15. Linked Data in Libraries • Strong uptake in recent years • Library of Congress (vocabularies) • German / Swedish / Hungarian National Library (authorities and/or catalogue data) • Europeana (metadata about ~4M items) • W3C Library Linked Data Incubator GroupAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  16. 16. Overview • Linked Data Vision and Goals • Enabling Technologies (by example) • Publishing and Consuming Linked Data • The SciLink ProjectAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  17. 17. Uniform Resource Identifiers (URIs) • Name and identify things (resources) • Dereferencable HTTP URIs http://springerlink.com/ http://arxiv.org/authors/ author/xyzafbea Haslhofer_B http://eprints.cs.univie.ac.at/ creator/xyzafbea http://arxiv.org/abs/ 1106.5178 http://eprints.cs.univie.ac.at/ 2871AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  18. 18. Resource Description Framework (RDF) • A data model for representing data on the Web • Sets of statements (triples) form a graph http://xmlns.com/foaf/ spec/#term_Person rdf:type http://arxiv.org/authors/ Haslhofer_B Bernhard Haslhofer http://purl.org/ontology/ bibo/Article dcterms:creator foaf:name http://arxiv.org/abs/ dcterms:created rdf:type 1106.5178 2011-06-25T22:55:17 dc:title The Open Annotation Collaboration (OAC) ModelAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  19. 19. RDF Serialization (RDF/XML, Turtle, N- Triples, RDF/JSON) • Data formats for serializing RDF graphs • Used to transfer RDF data between apps <http://arxiv.org/abs/1106.5178> dc:title “The Open Annotation Collaboration Model” . ...AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  20. 20. RDF Vocabulary Description Language (RDFS) • A language for describing the syntax and semantics of vocabularies in a machine understandable way http://xmlns.com/foaf/ rdf:type rdfs:Class spec/#term_Agent rdfs:subclassOf rdf:type rdf:type http://xmlns.com/foaf/ http://purl.org/ontology/ spec/#term_Person bibo/ArticleAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  21. 21. OWL Web Ontology Language • A more expressive (formal) language for defining the syntax and semantics of vocabularies owl:DatatypeProperty owl:ObjectProperty owl:FunctionalProperty rdf:type rdf:type rdf:type http://xmlns.com/foaf/0.1/ http://xmlns.com/foaf/0.1/ http://xmlns.com/foaf/0.1/ name homepage genderAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  22. 22. OWL Web Ontology Language • Defines properties for linking resources owl:sameAs http://arxiv.org/authors/ http://springerlink.com/ Haslhofer_B author/xyzafbea owl:sameAs owl:sameAs http://arxiv.org/abs/ 1106.5178 http://eprints.cs.univie.ac.at/ creator/xyzafbea owl:sameAs http://eprints.cs.univie.ac.at/ 2871NOTE: owl:sameAs is NOT the only linking property.AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  23. 23. Simple Knowledge Organization System (SKOS) • A language for describing controlled vocabularies (taxonomies, thesauri, etc) skos:closeMatch http://springerlink.com/ http://arxiv.org/subjects/ vocab/computer-science skos.rdf#cs skos:broadMatch skos:broadMatch skos:broader http://arxiv.org/subjects/ http://eprints.cs.univie.ac.at/ skos.rdf#cs.DL subjects/multimedia dcterms:subject dcterms:subject http://arxiv.org/abs/ http://eprints.cs.univie.ac.at/ 1106.5178 2871AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  24. 24. SPARQL • A query language and protocol for accessing RDF data on the Web SELECT DISTINCT ?x WHERE { ?x skos:subject <http://arxiv.org/subjects/skos.rdf#cs> } LIMIT 10AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  25. 25. Overview • Linked Data Vision and Goals • Enabling Technologies (by example) • Publishing and Consuming Linked Data • The SciLink ProjectAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  26. 26. Information vs. Non-Information Resource • Information Resource • web pages, images, pdf documents, etc. • all their essential characteristics can be conveyed in a message • e.g., http://arxiv.org/pdf/1106.5178v1 • Non-Information Resource • other things such as people, this meeting, concepts • their essence is not information • e.g., http://arxiv.org/authors/Haslhofer_BAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  27. 27. Publishing Data • Distinguish between non-information and information resource • Sample NIR • http://arxiv.org/resource/1106.5178 • Sample IRs • http://arxiv.org/abs/1106.5178 (HTML) • http://arxiv.org/data/1106.5178 (RDF)AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  28. 28. Publishing Data GET http://arxiv.org/resource/1106.5187 Accept: application/rdf+xml 303 See Other Location: http://arxiv.org/data/1106.5187 GET http://arxiv.org/data/1106.5187 Accept: application/rdf+xml 200 OK ... <?xml version="1.0" encoding="utf-8"?> <rdf:RDF ...AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  29. 29. Publishing Data GET http://arxiv.org/resource/1106.5187 Accept: application/rdf+xml 303 See Other Location: http://arxiv.org/data/1106.5187 Oh dear! GET http://arxiv.org/data/1106.5187 Accept: application/rdf+xml 200 OK ... <?xml version="1.0" encoding="utf-8"?> <rdf:RDF ...AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  30. 30. RDFa / Microformats / Microdata • Mechanisms for embedding structured data in Web documents • Define or use a set of attributes to augmented presentation-oriented (HTML) documents with structured data • User agents can extract data from Web documentsAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  31. 31. RDFa Example<div typeof=”bibo:Article” xmlns:bibo=”http://purl.org/ontology/bibo”> <div xmlns:dc=”http://purl.org/dc/elements/1.1/”> <h1 property=”dc:title”>The Open Annotation Collaboration (OAC) Model</h1> .... </div></div>AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  32. 32. Microformats Example <head profile=”http://www.w3.org/2006/03/hcard”> ... </head> <div class=”vcard”> <div class=”fn”>Bernhard Haslhofer</div> </div>AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  33. 33. RDFa vs. Microformats Microformats RDFa flat namespace XML namespaces support HTML4, XHTML 1.1, and support for XHTML 1.1, HTML 5 HTML 5 use latent HTML attributes introduces new metadata attributes vocabulary defined by one open to any RDF-based vocabulary organization/communityAlso see: http://evan.prodromou.name/RDFa_vs_microformatsAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  34. 34. Microdata • A very young HTML proposition that extends Microformats and addresses its shortcomings <div itemscope itemtype="http://schema.org/Article">   <span itemprop="name">The Open Annotation Collaboration (OAC) Model</span>   by <span itemprop="author">Bernhard Haslhofer</span>, <span itemprop=”author”>Rainer Simon</span>, <span itemprop=”author”>Robert Sanderson</span> and <span itemprop=”author”>Herbert Van De Sompel</span>   This article has been tweeted 1 times and contains 0 user comments.   <meta itemprop="interactionCount" content="UserTweets:1"/>   <meta itemprop="interactionCount" content="UserComments:0"/> </div>AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  35. 35. Web Data Publishing Approaches Pro Con Separates data / Technical complexity Linked Data presentation markup Conneg IR / NIR Communication Web Architecture overhead Mixes data / RDFa / Easy to implement presentation markup Microformats / Microdata Search Engines Becomes messyAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  36. 36. Consuming Linked Data curl -H “Accept: application/rdf+xml” http://arxiv.org/data/1105.5178 rapper http://arxiv.org/resource/1105.5178 ... any RDF libraryAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  37. 37. Overview • Linked Data Vision and Goals • Enabling Technologies (by example) • Publishing and Consuming Linked Data • The SciLink ProjectAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  38. 38. Motivation • Current research articles • are mostly PDFs (a kind of BLOB) • point to supplemental or related information by textual references, DOIs, or embedded hypertext links HTML HTML HTML HTML HTMLAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  39. 39. Motivation • But a publication is more: • data sets, conference presentations • blog / wiki entries, software libraries, etc • There is related information on the Web (e.g., Wikipedia)AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  40. 40. Overall Goal Publication Document (PDF) Aggregation for the Document foaf:firstname Jennifer dbpedia: dbpedia: Drosphila Category: dblp: Genome-wide analysis of Notch jmummery Mummery- Signal_transduction signaling in Drosphila by Widmer transgenic RNAi rdfs:seeAlso foaf:lastname skos:subject dcterms: creator Jennifer Mummery-Widmer, .... Outer context Here we report the use of a library of Inner context Drosphila strains...we identified 6 new genes involved in asymmetric cell ore: division. Aggregation Genome-wide dc:title analysis... rdf:type A total of 20,262 transgenic RNAi lines were ex:agg1 screened...Ten flies were recorded in an online database... ore: aggregates ore: aggregates ex: ex: figure4.jpg ore: aggregates ore: aggregates nature07936.pdf ex: ex: movie1.avi database.zip Figure 4. An interaction network for Notch dctype: signalling Dataset Asymmmetric dc:title rdf:type cell division http://www.nature.com/nature/journal/ vaop/ncurrent/pdf/nature07936.pdfAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  41. 41. Methodology • Aggregation/Data Extraction: create aggregations from available information (PDFs, metadata) • Analysis: combine aggregations and learn about established research and publication practices in different areas (focus on linking) • Tool design and Implementation • link discovery • resource synchronization / link maintenance • demonstrator applications operating on data graphAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  42. 42. Aggregation / Data Extraction • scilink-extract library • takes PDFs and metadata as input • creates ORE Aggregations / CSV files • currently implemented for arXiv.org • https://github.com/behas/scilink-extractAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  43. 43. Analysis • How did HTTP linkage evolve in the past? • Where in their papers do scholars make use of HTTP links? • What are the linking to? • Are the links still operational? • ....AAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011
  44. 44. Status • Extraction for arxiv.org finished (~ 700K ) PDFs • We need access to further corpora covering different research areas !!! Work in ProgressAAHEP5 Information Provider Summit | Ithaca, NY | Sept. 22nd, 2011

×