Nanopublications and
Decentralized Publishing
Tobias Kuhn
http://www.tkuhn.org
@txkuhn
Department of Computer Science, VU University Amsterdam
DANS Seminar on
Linked Data in Research and Cultural Heritage
Den Haag
1 May 2017
Problem:
Replication and Re-Use of Research Results
Exemplary Situation: Sue publishes a script that should allow
everybody to replicate her Linked Data based analysis:
# Download data:
wget http://collection.britishmuseum.org/dumps/
....BMCollection-2013-09-25.tar.gz
# Analyze data:
...
Problems:
• What if the resource becomes unavailable at this location?
• What if the third party silently changes the dataset?
• What if the web site gets hacked and the data manipulated?
• What if I am only interested in a small subset of the dataset?
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 2 / 20
Data Publishing, Archiving, and Re-Use
Published data should therefore be:
• Verifiable (Is this really the data I am looking for?)
• Immutable (Can I be sure that it hasn’t been modified?)
• Permanent (Will it be available in 1, 5, 20 years from now?)
• Reliable (Can it be efficiently retrieved whenever needed?)
• Granular (Can I refer to individual data entries?)
• Semantic (Can it be automatically interpreted?)
• Linked (Does it use established identifiers and ontologies?)
• Trustworthy (Can I trust the source?)
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 3 / 20
Nanopublications:
Provenance-Aware Linked Data Publishing
assertion
provenance
publication info
nanopublication
http://nanopub.org / @nanopub org
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 4 / 20
Nanopublication Example
:assertion {
:p occursIn: mesh:D004730 .
:p geneProductOf: hgnc:3763 .
}
:provenance {
:assertion prov:hadPrimarySource
pubmed:12891700 .
}
:pubinfo { :np
dct:created 2014-07-03 ;
pav:createdBy
orcid:0000-0001-6818-334X . }
complete version: https://goo.gl/f7iPKK
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 5 / 20
Identifiability Problem of URIs (Web Links)
http://collection.britishmuseum.org/dumps/
BMCollection-2013-09-25.tar.gz
?
Given a URI for a digital artifact, there is no reliable standard
procedure of checking whether a retrieved file really represents the
correct and original state of that artifact.
Solution: Identifiers that include (iterative) cryptographic hash
values (as applied, for example, by Git and Bitcoin)
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 6 / 20
Trusty URIs: Cryptographic Hash Values for
Verifiable and Immutable Web Identifiers
Basic idea: Use of cryptographic hash values together with URIs as
identifiers for digital artifacts such as nanopublications.
Example of a Trusty URI for RDF content:
.trighttp://example.org/r1. RA 5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70
Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 7 / 20
Verifiable — Immutable — Permanent
Whether or not a given resource is the one a given trusty URI is
supposed to represent can be verified with perfect confidence.
http://trustyuri.net
Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 8 / 20
Extended Range of Verifiability Through
Iterative Hashing
http://...RAcbjcRI...
http://...RAQozo2w...
http://...RABMq4Wc...
http://...RAcbjcRI...
http://...RAQozo2w...
http://.../resource23
http://.../resource23
...
http://...RAUx3Pqu...
http://.../resource55
http://...RABMq4Wc...
http://.../resource55
http://...RARz0AX-...
...
http://...RAUx3Pqu...
...
http://...RARz0AX...
...
range of
verifiability
Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 9 / 20
Verifiable — Immutable — Permanent
Trusty URI artifacts are immutable, as any change in the content
also changes its URI, thereby making it a new artifact.
http://trustyuri.net
Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 10 / 20
Verifiable — Immutable — Permanent
Trusty URI artifacts are permanent, as they can be retrieved from
the cache of third-party websites if otherwise no longer available.
http://trustyuri.net
Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 11 / 20
Decentralized and Reliable Publishing with a
Nanopublication Server Network
Nanopublications
with Trusty URIs
Publication
Retrieval
Propagation /
Archiving
http://purl.org/nanopub/monitor
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 12 / 20
Decentralized — Open — Real-time
• No a central authority: Everybody can set up a server and join
the network
• No restrictions on publication: Everybody can upload
nanopublications
• No delay between submission and publication: Nanopublications
are made public immediately
• No updates: If a nanopublication is modified, that makes it a
new nanopublication (enforced by trusty URIs)
• No queries: Only simple identifier-based lookup
Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of
Data. ISWC 2015.
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 13 / 20
Defining Datasets with Nanopublication Indexes
(which are themselves Nanopublications)
appends
has sub-index
has
element
(a) (b)
(c) (f)
(d) (e)
Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of
Data. ISWC 2015.
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 14 / 20
Existing Nanopublication Datasets
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 15 / 20
Using Nanopublication Datasets
Once published in the network, researchers can fetch and reuse the
data in a reliable and prefectly reproducible manner:
# Download data:
np get -c RAR5dwELYLKGSfrOclnWhjOj-2nGZN 8BW1JjxwFZINHw
# Analyze data:
...
Existing data can be recombined in new indexes; and researchers can
unambiguously refer to the used datasets for new results:
this: prov:wasDerivedFrom nps:RAR5dwELYLKGSfrOclnWhjOj-2nGZN 8BW1Jjx
Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of
Data. arXiv:1411.2749.
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 16 / 20
Could these techniques and infrastructures allow us to
make a step forward in terms of automation in science?
S C I E N C E B O T S
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 17 / 20
S C I E N C E B O T S
Science bots could cover a wide variety of applications, for example:
nanopub
-lications
PubMed
abstracts
nanopub
-lications
sensor
data
nanopub
-lications
nanopub
-lications
text mining bot
inference bot
sensor bot
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 18 / 20
Panel Questions
• Should we treat software and data like papers when it comes to
giving credit and measuring impact, or do we need something
else than citations?
• Can and should we automate to a certain extent scientific
analyses and their replications?
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 19 / 20
Thank you for your attention!
Questions?
Further information:
• My personal website: http://www.tkuhn.org
• Trusty URIs: http://trustyuri.net
• Nanopublications: http://nanopub.org
Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 20 / 20

Nanopublications and Decentralized Publishing

  • 1.
    Nanopublications and Decentralized Publishing TobiasKuhn http://www.tkuhn.org @txkuhn Department of Computer Science, VU University Amsterdam DANS Seminar on Linked Data in Research and Cultural Heritage Den Haag 1 May 2017
  • 2.
    Problem: Replication and Re-Useof Research Results Exemplary Situation: Sue publishes a script that should allow everybody to replicate her Linked Data based analysis: # Download data: wget http://collection.britishmuseum.org/dumps/ ....BMCollection-2013-09-25.tar.gz # Analyze data: ... Problems: • What if the resource becomes unavailable at this location? • What if the third party silently changes the dataset? • What if the web site gets hacked and the data manipulated? • What if I am only interested in a small subset of the dataset? Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 2 / 20
  • 3.
    Data Publishing, Archiving,and Re-Use Published data should therefore be: • Verifiable (Is this really the data I am looking for?) • Immutable (Can I be sure that it hasn’t been modified?) • Permanent (Will it be available in 1, 5, 20 years from now?) • Reliable (Can it be efficiently retrieved whenever needed?) • Granular (Can I refer to individual data entries?) • Semantic (Can it be automatically interpreted?) • Linked (Does it use established identifiers and ontologies?) • Trustworthy (Can I trust the source?) Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 3 / 20
  • 4.
    Nanopublications: Provenance-Aware Linked DataPublishing assertion provenance publication info nanopublication http://nanopub.org / @nanopub org Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 4 / 20
  • 5.
    Nanopublication Example :assertion { :poccursIn: mesh:D004730 . :p geneProductOf: hgnc:3763 . } :provenance { :assertion prov:hadPrimarySource pubmed:12891700 . } :pubinfo { :np dct:created 2014-07-03 ; pav:createdBy orcid:0000-0001-6818-334X . } complete version: https://goo.gl/f7iPKK Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 5 / 20
  • 6.
    Identifiability Problem ofURIs (Web Links) http://collection.britishmuseum.org/dumps/ BMCollection-2013-09-25.tar.gz ? Given a URI for a digital artifact, there is no reliable standard procedure of checking whether a retrieved file really represents the correct and original state of that artifact. Solution: Identifiers that include (iterative) cryptographic hash values (as applied, for example, by Git and Bitcoin) Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 6 / 20
  • 7.
    Trusty URIs: CryptographicHash Values for Verifiable and Immutable Web Identifiers Basic idea: Use of cryptographic hash values together with URIs as identifiers for digital artifacts such as nanopublications. Example of a Trusty URI for RDF content: .trighttp://example.org/r1. RA 5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70 Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014. Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 7 / 20
  • 8.
    Verifiable — Immutable— Permanent Whether or not a given resource is the one a given trusty URI is supposed to represent can be verified with perfect confidence. http://trustyuri.net Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014. Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 8 / 20
  • 9.
    Extended Range ofVerifiability Through Iterative Hashing http://...RAcbjcRI... http://...RAQozo2w... http://...RABMq4Wc... http://...RAcbjcRI... http://...RAQozo2w... http://.../resource23 http://.../resource23 ... http://...RAUx3Pqu... http://.../resource55 http://...RABMq4Wc... http://.../resource55 http://...RARz0AX-... ... http://...RAUx3Pqu... ... http://...RARz0AX... ... range of verifiability Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014. Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 9 / 20
  • 10.
    Verifiable — Immutable— Permanent Trusty URI artifacts are immutable, as any change in the content also changes its URI, thereby making it a new artifact. http://trustyuri.net Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014. Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 10 / 20
  • 11.
    Verifiable — Immutable— Permanent Trusty URI artifacts are permanent, as they can be retrieved from the cache of third-party websites if otherwise no longer available. http://trustyuri.net Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014. Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 11 / 20
  • 12.
    Decentralized and ReliablePublishing with a Nanopublication Server Network Nanopublications with Trusty URIs Publication Retrieval Propagation / Archiving http://purl.org/nanopub/monitor Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 12 / 20
  • 13.
    Decentralized — Open— Real-time • No a central authority: Everybody can set up a server and join the network • No restrictions on publication: Everybody can upload nanopublications • No delay between submission and publication: Nanopublications are made public immediately • No updates: If a nanopublication is modified, that makes it a new nanopublication (enforced by trusty URIs) • No queries: Only simple identifier-based lookup Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data. ISWC 2015. Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 13 / 20
  • 14.
    Defining Datasets withNanopublication Indexes (which are themselves Nanopublications) appends has sub-index has element (a) (b) (c) (f) (d) (e) Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data. ISWC 2015. Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 14 / 20
  • 15.
    Existing Nanopublication Datasets TobiasKuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 15 / 20
  • 16.
    Using Nanopublication Datasets Oncepublished in the network, researchers can fetch and reuse the data in a reliable and prefectly reproducible manner: # Download data: np get -c RAR5dwELYLKGSfrOclnWhjOj-2nGZN 8BW1JjxwFZINHw # Analyze data: ... Existing data can be recombined in new indexes; and researchers can unambiguously refer to the used datasets for new results: this: prov:wasDerivedFrom nps:RAR5dwELYLKGSfrOclnWhjOj-2nGZN 8BW1Jjx Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data. arXiv:1411.2749. Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 16 / 20
  • 17.
    Could these techniquesand infrastructures allow us to make a step forward in terms of automation in science? S C I E N C E B O T S Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 17 / 20
  • 18.
    S C IE N C E B O T S Science bots could cover a wide variety of applications, for example: nanopub -lications PubMed abstracts nanopub -lications sensor data nanopub -lications nanopub -lications text mining bot inference bot sensor bot Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 18 / 20
  • 19.
    Panel Questions • Shouldwe treat software and data like papers when it comes to giving credit and measuring impact, or do we need something else than citations? • Can and should we automate to a certain extent scientific analyses and their replications? Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 19 / 20
  • 20.
    Thank you foryour attention! Questions? Further information: • My personal website: http://www.tkuhn.org • Trusty URIs: http://trustyuri.net • Nanopublications: http://nanopub.org Tobias Kuhn, VU University Amsterdam Nanopublications and Decentralized Publishing 20 / 20