Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
A Decentralized Approach to Dissemination,
Retrieval, and Archiving of Data
Tobias Kuhn
http://www.tkuhn.org
@txkuhn
Depar...
Increasing Importance of Scientific Data
https://www.google.com/trends/explore#q=%22data%20science%22
Tobias Kuhn, Departme...
Scientific Data as Supplemental Material
...
http://www.nature.com/ni/journal/v16/n10/full/ni.3267.html#supplementary-infor...
Scientific Data in Open Repositories
Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Dat...
We Need Better Data Publishing!
Published data should be:
• Verifiable (Is this really the data I am looking for?)
• Immuta...
Requirement: Automated Low-Level Operations
We need automated low-level operations to publish and retrieve data
entries an...
Nanopublications: Linked Data Containers for
Provenance-Aware Semantic Publishing
assertion
provenance
publication info
na...
Trusty URIs: Cryptographic Hash Values for
Verifiable and Immutable Web Identifiers
Nanopublications with Trusty URIs are .....
Decentralized and Reliable Publishing with a
Nanopublication Server Network
Nanopublications
with Trusty URIs
Publication
...
Defining Datasets with Nanopublication Indexes
(which are themselves Nanopublications)
appends
has sub-index
has
element
(a...
Nanopublication Server Network is
Efficient and Scalable
Our servers can deliver nanopublications about 100 times faster tha...
Nanopublication Datasets
Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishin...
Reliable Low-Level Publish/Retrieve Operations!
Operation to publish data:
$ np publish nanopubs.trig
156026 nanopubs publ...
Future Work
• Improve server protocol
• Develop services on top of the server network
• Establish best practices for versi...
Thank you for your attention!
Questions?
Further information:
• Paper on the approach:
https://peerj.com/articles/cs-78/
•...
Multi-Layer Architecture
applications (analyze/use data)
advanced services (query/analyze data)
core services (find data)
d...
Upcoming SlideShare
Loading in …5
×

A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

323 views

Published on

How can we make scientific publishing more reliable and efficient?

Published in: Science
  • Be the first to comment

A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

  1. 1. A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data Tobias Kuhn http://www.tkuhn.org @txkuhn Department of Computer Science, VU University Amsterdam Open Science for an Open Society Workshop 2016 Conference on Complex Systems Amsterdam, Netherlands 20 September 2016
  2. 2. Increasing Importance of Scientific Data https://www.google.com/trends/explore#q=%22data%20science%22 Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 2 / 15
  3. 3. Scientific Data as Supplemental Material ... http://www.nature.com/ni/journal/v16/n10/full/ni.3267.html#supplementary-information Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 3 / 15
  4. 4. Scientific Data in Open Repositories Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 4 / 15
  5. 5. We Need Better Data Publishing! Published data should be: • Verifiable (Is this really the data I am looking for?) • Immutable (Can I be sure that it hasn’t been modified?) • Permanent (Will it be available in 1, 5, 20 years from now?) • Reliable (Can it be efficiently retrieved whenever needed?) • Granular (Can I refer to individual data entries?) • Semantic (Can it be automatically interpreted?) • Linked (Does it use established identifiers and ontologies?) • Trustworthy (Can I trust the source?) Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 5 / 15
  6. 6. Requirement: Automated Low-Level Operations We need automated low-level operations to publish and retrieve data entries and datasets: publish <dataset-identifier> get <dataset-identifier> (like HTTP POST/GET but verifiable, immutable, permanent, reliable, ...) Approach: Linked Data + Cryptography + Decentralization Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 6 / 15
  7. 7. Nanopublications: Linked Data Containers for Provenance-Aware Semantic Publishing assertion provenance publication info nanopublication http://nanopub.org / @nanopub org Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 7 / 15
  8. 8. Trusty URIs: Cryptographic Hash Values for Verifiable and Immutable Web Identifiers Nanopublications with Trusty URIs are ... Verifiable + Immutable + Permanent .trighttp://example.org/r1. RA 5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70 http://trustyuri.net/ Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014. Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 8 / 15
  9. 9. Decentralized and Reliable Publishing with a Nanopublication Server Network Nanopublications with Trusty URIs Publication Retrieval Propagation / Archiving http://npmonitor.inn.ac Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 9 / 15
  10. 10. Defining Datasets with Nanopublication Indexes (which are themselves Nanopublications) appends has sub-index has element (a) (b) (c) (f) (d) (e) Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 10 / 15
  11. 11. Nanopublication Server Network is Efficient and Scalable Our servers can deliver nanopublications about 100 times faster than when a triple store is used (and need much less resources): time from start of test in seconds responsetimeinseconds 0 50 100 150 200 250 3000 50 100 150 200 250 300 0.1 1 10 100 0 20 40 60 80 100 number of clients accessing the service in parallel Virtuoso triple store with SPARQL endpoint nanopublication server Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 11 / 15
  12. 12. Nanopublication Datasets Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 12 / 15
  13. 13. Reliable Low-Level Publish/Retrieve Operations! Operation to publish data: $ np publish nanopubs.trig 156026 nanopubs published at http://np.inn.ac/ which can also be used to publish dataset definitions (indexes): $ np publish index.trig 157 nanopubs published at http://np.inn.ac/ Operation to retrieve data entries: $ np get http://np.inn.ac/RA7Kmmugi8OuCirfe5WKchnJhC3FuhQD and to retrieve entire datasets: $ np get -c http://np.inn.ac/RAY lQruuagCYtAcKAPptkY7EpITw https://github.com/Nanopublication/nanopub-java Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 13 / 15
  14. 14. Future Work • Improve server protocol • Develop services on top of the server network • Establish best practices for versioning, retractions, reviews, etc. • Connect it all to the scientific publishing workflow Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 14 / 15
  15. 15. Thank you for your attention! Questions? Further information: • Paper on the approach: https://peerj.com/articles/cs-78/ • Nanopublications: http://nanopub.org • Trusty URIs: http://trustyuri.net • Nanopublication Server Network: http://npmonitor.inn.ac Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 15 / 15
  16. 16. Multi-Layer Architecture applications (analyze/use data) advanced services (query/analyze data) core services (find data) decentralized server network (provide data) 1 Tobias Kuhn, Department of Computer Science, VU University Amsterdam Decentralized Data Publishing 16 / 15

×