SlideShare a Scribd company logo
Scientific Data Publishing
Tobias Kuhn
http://www.tkuhn.org
@txkuhn
ETH Zurich
COSS Team Retreat
Santorini, Greece
27 June 2015
Increasing Importance of Scientific Data
London Underground staff sorting 4M used tickets to analyse line use in 1939
Image from: http://www.telegraph.co.uk/travel/picturegalleries/9791007/
The-history-of-the-Tube-in-pictures-150-years-of-London-Underground.html?frame=2447159
Tobias Kuhn, ETH Zurich Scientific Data Publishing 2 / 24
Problem:
Replication and Re-Use of Research Results
Exemplary Situation: Sue publishes a script that should allow
everybody to replicate her scientific analysis:
# Download data:
wget http://some-third-party.org/dataset/1.4
# Analyze data:
...
Problems:
• What if the resource becomes unavailable at this location?
• What if the third party silently changes that version of the
dataset?
• What if the web site gets hacked and the data manipulated?
Tobias Kuhn, ETH Zurich Scientific Data Publishing 3 / 24
Data Publishing, Archiving, and Re-Use
Scientific datasets become increasingly important, and these data are
increasingly produced and consumed directly by software.
Published data should therefore be:
• Verifiable (Is this really the data I am looking for?)
• Immutable (Can I be sure that it hasn’t been modified?)
• Permanent (Will it be available in 1, 5, 20 years from now?)
• Reliable (Can it be efficiently retrieved whenever needed?)
• Granular (Can I refer to individual data entries?)
• Semantic (Can it be automatically interpreted?)
• Linked (Does it use established identifiers and ontologies?)
• Trustworthy (Can I trust the source?)
Tobias Kuhn, ETH Zurich Scientific Data Publishing 4 / 24
Nanopublications:
Provenance-Aware Semantic Publishing
(based on RDF)
assertion
provenance
publication info
nanopublication
http://nanopub.org / @nanopub org
Tobias Kuhn, ETH Zurich Scientific Data Publishing 5 / 24
Identifiability Problem of URLs (Web Links)
http://some-third-party.org/dataset/1.4
?
Given a URL for a digital artifact, there is no reliable standard
procedure of checking whether a retrieved file really represents the
correct and original state of that artifact.
Solution: Identifiers that include (iterative) cryptographic hash
values (as applied, for example, by Git and Bitcoin)
Tobias Kuhn, ETH Zurich Scientific Data Publishing 6 / 24
Cryptographic Hash Values
A cryptographic hash value is a short random-looking sequence of
bytes calculated on a given input:
This is some text. ⇒ hRUv0M
The same input always leads to exactly the same value:
This is some text. ⇒ hRUv0M
Even a minimally modified input leads to a completely different value:
This is xome text. ⇒ sCtYbf
The input is not reconstructible from the hash value (in practice):
This is some text. hRUv0M
Given an input and a matching hash value, we can therefore be
perfectly sure that it was exactly that input that led to the hash.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 7 / 24
Iterative Hashing
Hash values can be used as identifiers in an iterative fashion:
This is some text. ⇒ hRUv0M
This text is based on hRUv0M. ⇒ LwGqwX
This depends on LwGqwX. ⇒ civRbq
From a single identifier (such as civRbq), the entire reference tree
can be verified:
This is some text. ⇒ hRUv0M
This text is based on hRUv0M. ⇒ LwGqwX
This depends on LwGqwX. ⇒ civRbq
And any modification can be noticed:
This is xome text. ⇒  hRUv0M
This text is based on hRUv0M. ⇒ LwGqwX
This depends on LwGqwX. ⇒ civRbq
Tobias Kuhn, ETH Zurich Scientific Data Publishing 8 / 24
Trusty URIs: Cryptographic Hash Values for
Verifiable and Immutable Web Identifiers
Basic idea: Use of cryptographic hash values together with URIs
(˜= URLs) as identifiers for digital artifacts such as nanopublications.
Requirements:
• To allow for the verification of entire reference trees, the hash
should be part of the reference (i.e. the URI)
• Format-independent hash for different kinds of content
• To allow for meta-data, digital artifacts should be allowed to
contain self-references (i.e. their own URI)
.trighttp://example.org/r1. RA 5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70
Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 9 / 24
Identifiers with Cryptographic Hash Values
Nanopublications with Trusty URIs are ...
Verifiable
+
Immutable
+
Permanent
.trighttp://example.org/r1. RA 5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70
http://trustyuri.net/
Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 10 / 24
Extended Range of Verifiability Through
Iterative Hashing
http://...RAcbjcRI...
http://...RAQozo2w...
http://...RABMq4Wc...
http://...RAcbjcRI...
http://...RAQozo2w...
http://.../resource23
http://.../resource23
...
http://...RAUx3Pqu...
http://.../resource55
http://...RABMq4Wc...
http://.../resource55
http://...RARz0AX-...
...
http://...RAUx3Pqu...
...
http://...RARz0AX...
...
range of
verifiability
http://trustyuri.net
Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 11 / 24
Defining Datasets with Nanopublication Indexes
(which are themselves Nanopublications)
appends
has sub-index
has
element
(a) (b)
(c) (f)
(d) (e)
Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of
Data. ISWC 2015.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 12 / 24
A Multi-Layer Architecture for
Reliable Scientific Data Publishing?
User Interfaces
Services:
Finding, querying, filtering,
and aggregating data
Tobias Kuhn, ETH Zurich Scientific Data Publishing 13 / 24
A Multi-Layer Architecture for
Reliable Scientific Data Publishing?
User Interfaces
Decentralized
Data Publishing
Network
hypotheses
Services:
Finding, querying, filtering,
and aggregating data
facts
measurements
opinions
meta-data
assessments
annotations
...
Tobias Kuhn, ETH Zurich Scientific Data Publishing 14 / 24
A Multi-Layer Architecture for
Reliable Scientific Data Publishing?
User Interfaces
Decentralized
Data Publishing
Network
hypotheses
Services:
Finding, querying, filtering,
and aggregating data
Science Bots:
Automated Tasks
facts
measurements
opinions
meta-data
assessments
annotations
...
Tobias Kuhn, ETH Zurich Scientific Data Publishing 15 / 24
Decentralized and Reliable Publishing with a
Nanopublication Server Network
Nanopublications
with Trusty URIs
Publication
Retrieval
Propagation /
Archiving
http://npmonitor.inn.ac
Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of
Data. ISWC 2015.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 16 / 24
Decentralized — Open — Real-time
• No a central authority: Everybody can set up a server and join
the network
• No restrictions on publication: Everybody can upload
nanopublications
• No delay between submission and publication: Nanopublications
are made public immediately
• No updates: If a nanopublication is modified, that makes it a
new nanopublication (enforced by trusty URIs)
• No queries: Only simple identifier-based lookup
Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of
Data. ISWC 2015.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 17 / 24
Using Nanopublication Datasets
Once published in the network, nanopublication indexes can be cited:
[7] Nanopubs converted from OpenBEL’s Small and Large Corpus 20131211.
Nanopublication index, 4 March 2014,
http://np.inn.ac/RAR5dwELYLKGSfrOclnWhjOj-2nGZN_8BW1JjxwFZINHw
Researchers can then fetch and reuse the data in a reliable and
prefectly reproducible manner:
# Download data:
np get -c RAR5dwELYLKGSfrOclnWhjOj-2nGZN 8BW1JjxwFZINHw
# Analyze data:
...
Existing data can be recombined in new indexes; and researchers can
unambiguously refer to the used datasets for new results:
this: prov:wasDerivedFrom nps:RAR5dwELYLKGSfrOclnWhjOj-2nGZN 8BW1Jjx
Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of
Data. ISWC 2015.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 18 / 24
Could these techniques and infrastructures allow us to
make a step forward in terms of automation in science?
S C I E N C E B O T S
Tobias Kuhn, ETH Zurich Scientific Data Publishing 19 / 24
Science Bots —
Scientists’ Little Helpers in the Future?
“Science bots” that autonomously publish results in their own name
could cover a wide variety of applications, for example:
nanopub
-lications
PubMed
abstracts
nanopub
-lications
sensor
data
nanopub
-lications
nanopub
-lications
text mining bot
inference bot
sensor bot
Kuhn. Science Bots: A Model for the Future of Scientific Computation? SAVE-SD, WWW 2015 Companion
Proceedings.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 20 / 24
Quality Control with Reputation
Mechanisms and Network Metrics?
Robust automatic calculation of reputation metrics in a decentralized
and open system:
is contributed by
Kuhn. Science Bots: A Model for the Future of Scientific Computation? SAVE-SD, WWW 2015 Companion
Proceedings.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 21 / 24
Quality Control with Reputation
Mechanisms and Network Metrics?
Robust automatic calculation of reputation metrics in a decentralized
and open system:
gives positive
assessment for
is contributed by
Kuhn. Science Bots: A Model for the Future of Scientific Computation? SAVE-SD, WWW 2015 Companion
Proceedings.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 22 / 24
Quality Control with Reputation
Mechanisms and Network Metrics?
Robust automatic calculation of reputation metrics in a decentralized
and open system:
gives positive
assessment for
is contributed by
Eigenvector
centrality (0-100)
77
100
71 1
0
85
4
0.0
0 0
50
50 0
0.4 0.4
0
Kuhn. Science Bots: A Model for the Future of Scientific Computation? SAVE-SD, WWW 2015 Companion
Proceedings.
Tobias Kuhn, ETH Zurich Scientific Data Publishing 23 / 24
Thank you for your attention!
Questions?
Further information:
• Trusty URIs: http://trustyuri.net
• Nanopublications: http://nanopub.org
• Nanopublication Server Network:
http://arxiv.org/abs/1411.2749
• Science Bots: http://arxiv.org/abs/1503.04374
Tobias Kuhn, ETH Zurich Scientific Data Publishing 24 / 24

More Related Content

Similar to Scientific Data Publishing

A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
Tobias Kuhn
 
Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?
Tobias Kuhn
 
Nanopublications and Decentralized Publishing
Nanopublications and Decentralized PublishingNanopublications and Decentralized Publishing
Nanopublications and Decentralized Publishing
Tobias Kuhn
 
Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications
Tobias Kuhn
 
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Tobias Kuhn
 
A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of DataA Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
Tobias Kuhn
 
Data Publishing and Post-Publication Reviews
Data Publishing and Post-Publication ReviewsData Publishing and Post-Publication Reviews
Data Publishing and Post-Publication Reviews
Tobias Kuhn
 
Nanopubs
NanopubsNanopubs
Nanopubs
Tobias Kuhn
 
HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure Commons
Beth Plale
 
Semantic Publishing and Nanopublications
Semantic Publishing and NanopublicationsSemantic Publishing and Nanopublications
Semantic Publishing and Nanopublications
Tobias Kuhn
 
Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014
Beth Plale
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Enrico Motta
 
APIS. Digitale biographische Blütenlese
APIS. Digitale biographische BlütenleseAPIS. Digitale biographische Blütenlese
APIS. Digitale biographische Blütenlese
eveline wandl-vogt
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research Graph
OpenAIRE
 
Enabling Data-Intensive Science Through Data Infrastructures
Enabling Data-Intensive Science Through Data InfrastructuresEnabling Data-Intensive Science Through Data Infrastructures
Enabling Data-Intensive Science Through Data Infrastructures
LIBER Europe
 
HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14
Robert H. McDonald
 
Data Mining on Web URL Using Base 64 Encoding to Generate Secure URN
Data Mining on Web URL Using Base 64 Encoding to Generate Secure URNData Mining on Web URL Using Base 64 Encoding to Generate Secure URN
Data Mining on Web URL Using Base 64 Encoding to Generate Secure URN
IJMTST Journal
 
AI Sensors and Dashboards: Gauging and Monitoring the Inferences Capabilities...
AI Sensors and Dashboards: Gauging and Monitoring the Inferences Capabilities...AI Sensors and Dashboards: Gauging and Monitoring the Inferences Capabilities...
AI Sensors and Dashboards: Gauging and Monitoring the Inferences Capabilities...
Huber Flores
 
How to improve the acceptance of AltMetrics
How to improve the acceptance of AltMetricsHow to improve the acceptance of AltMetrics
How to improve the acceptance of AltMetrics
uherb
 
HINC – Harmonizing Diverse Resource Information Across IoT, Network Functions...
HINC – Harmonizing Diverse Resource Information Across IoT, Network Functions...HINC – Harmonizing Diverse Resource Information Across IoT, Network Functions...
HINC – Harmonizing Diverse Resource Information Across IoT, Network Functions...
Hong-Linh Truong
 

Similar to Scientific Data Publishing (20)

A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
A Decentralized Network for Publishing Linked Data — Nanopublications, Trusty...
 
Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?Science Bots: A Model for the Future of Scientific Computation?
Science Bots: A Model for the Future of Scientific Computation?
 
Nanopublications and Decentralized Publishing
Nanopublications and Decentralized PublishingNanopublications and Decentralized Publishing
Nanopublications and Decentralized Publishing
 
Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications Semantic Publishing with Nanopublications
Semantic Publishing with Nanopublications
 
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linke...
 
A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of DataA Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
 
Data Publishing and Post-Publication Reviews
Data Publishing and Post-Publication ReviewsData Publishing and Post-Publication Reviews
Data Publishing and Post-Publication Reviews
 
Nanopubs
NanopubsNanopubs
Nanopubs
 
HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure Commons
 
Semantic Publishing and Nanopublications
Semantic Publishing and NanopublicationsSemantic Publishing and Nanopublications
Semantic Publishing and Nanopublications
 
Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014
 
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
 
APIS. Digitale biographische Blütenlese
APIS. Digitale biographische BlütenleseAPIS. Digitale biographische Blütenlese
APIS. Digitale biographische Blütenlese
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research Graph
 
Enabling Data-Intensive Science Through Data Infrastructures
Enabling Data-Intensive Science Through Data InfrastructuresEnabling Data-Intensive Science Through Data Infrastructures
Enabling Data-Intensive Science Through Data Infrastructures
 
HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14
 
Data Mining on Web URL Using Base 64 Encoding to Generate Secure URN
Data Mining on Web URL Using Base 64 Encoding to Generate Secure URNData Mining on Web URL Using Base 64 Encoding to Generate Secure URN
Data Mining on Web URL Using Base 64 Encoding to Generate Secure URN
 
AI Sensors and Dashboards: Gauging and Monitoring the Inferences Capabilities...
AI Sensors and Dashboards: Gauging and Monitoring the Inferences Capabilities...AI Sensors and Dashboards: Gauging and Monitoring the Inferences Capabilities...
AI Sensors and Dashboards: Gauging and Monitoring the Inferences Capabilities...
 
How to improve the acceptance of AltMetrics
How to improve the acceptance of AltMetricsHow to improve the acceptance of AltMetrics
How to improve the acceptance of AltMetrics
 
HINC – Harmonizing Diverse Resource Information Across IoT, Network Functions...
HINC – Harmonizing Diverse Resource Information Across IoT, Network Functions...HINC – Harmonizing Diverse Resource Information Across IoT, Network Functions...
HINC – Harmonizing Diverse Resource Information Across IoT, Network Functions...
 

More from Tobias Kuhn

Linked Data Publishing with Nanopublications
Linked Data Publishing with NanopublicationsLinked Data Publishing with Nanopublications
Linked Data Publishing with Nanopublications
Tobias Kuhn
 
Genuine semantic publishing
Genuine semantic publishingGenuine semantic publishing
Genuine semantic publishing
Tobias Kuhn
 
The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer
Tobias Kuhn
 
nanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublicationsnanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublications
Tobias Kuhn
 
Meme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation NetworksMeme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation Networks
Tobias Kuhn
 
A Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural LanguageA Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural Language
Tobias Kuhn
 
Citation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureCitation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific Literature
Tobias Kuhn
 
Automatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen WikiAutomatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen Wiki
Tobias Kuhn
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
Tobias Kuhn
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
Tobias Kuhn
 
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Tobias Kuhn
 
AceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic WikiAceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic Wiki
Tobias Kuhn
 
AceWiki: Controlled English in a Semantic Wiki
AceWiki: Controlled English in a Semantic WikiAceWiki: Controlled English in a Semantic Wiki
AceWiki: Controlled English in a Semantic Wiki
Tobias Kuhn
 
How to Evaluate Controlled Natural Languages
How to Evaluate Controlled Natural LanguagesHow to Evaluate Controlled Natural Languages
How to Evaluate Controlled Natural Languages
Tobias Kuhn
 
Wissensrepräsentation in kontrolliertem Englisch
Wissensrepräsentation in kontrolliertem EnglischWissensrepräsentation in kontrolliertem Englisch
Wissensrepräsentation in kontrolliertem Englisch
Tobias Kuhn
 
An Introduction to AceWiki
An Introduction to AceWikiAn Introduction to AceWiki
An Introduction to AceWiki
Tobias Kuhn
 
Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors
Codeco: A Grammar Notation for Controlled Natural Language in Predictive EditorsCodeco: A Grammar Notation for Controlled Natural Language in Predictive Editors
Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors
Tobias Kuhn
 

More from Tobias Kuhn (17)

Linked Data Publishing with Nanopublications
Linked Data Publishing with NanopublicationsLinked Data Publishing with Nanopublications
Linked Data Publishing with Nanopublications
 
Genuine semantic publishing
Genuine semantic publishingGenuine semantic publishing
Genuine semantic publishing
 
The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer The Controlled Natural Language of Randall Munroe’s Thing Explainer
The Controlled Natural Language of Randall Munroe’s Thing Explainer
 
nanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublicationsnanopub-java: A Java Library for Nanopublications
nanopub-java: A Java Library for Nanopublications
 
Meme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation NetworksMeme Extraction from Corpora of Scientific Literature using Citation Networks
Meme Extraction from Corpora of Scientific Literature using Citation Networks
 
A Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural LanguageA Multilingual Semantic Wiki Based on Controlled Natural Language
A Multilingual Semantic Wiki Based on Controlled Natural Language
 
Citation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific LiteratureCitation Graph Analysis to Identify Memes in Scientific Literature
Citation Graph Analysis to Identify Memes in Scientific Literature
 
Automatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen WikiAutomatische Übersetzung in einem multilingualen, semantischen Wiki
Automatische Übersetzung in einem multilingualen, semantischen Wiki
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
 
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
A Multilingual Semantic Wiki based on Attempto Controlled English and Grammat...
 
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
Improving Text Mining with Controlled Natural Language: A Case Study for Prot...
 
AceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic WikiAceWiki: A Natural and Expressive Semantic Wiki
AceWiki: A Natural and Expressive Semantic Wiki
 
AceWiki: Controlled English in a Semantic Wiki
AceWiki: Controlled English in a Semantic WikiAceWiki: Controlled English in a Semantic Wiki
AceWiki: Controlled English in a Semantic Wiki
 
How to Evaluate Controlled Natural Languages
How to Evaluate Controlled Natural LanguagesHow to Evaluate Controlled Natural Languages
How to Evaluate Controlled Natural Languages
 
Wissensrepräsentation in kontrolliertem Englisch
Wissensrepräsentation in kontrolliertem EnglischWissensrepräsentation in kontrolliertem Englisch
Wissensrepräsentation in kontrolliertem Englisch
 
An Introduction to AceWiki
An Introduction to AceWikiAn Introduction to AceWiki
An Introduction to AceWiki
 
Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors
Codeco: A Grammar Notation for Controlled Natural Language in Predictive EditorsCodeco: A Grammar Notation for Controlled Natural Language in Predictive Editors
Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors
 

Recently uploaded

Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
bellared2
 
Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...
Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...
Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...
Sérgio Sacani
 
ellipticytescausesprognosistreatment-240622051139-23d50b05.pptx
ellipticytescausesprognosistreatment-240622051139-23d50b05.pptxellipticytescausesprognosistreatment-240622051139-23d50b05.pptx
ellipticytescausesprognosistreatment-240622051139-23d50b05.pptx
muralinath2
 
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
PANDURANGLAWATE1
 
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary TrackThe Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
Sérgio Sacani
 
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Sérgio Sacani
 
Analytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina BujorAnalytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina Bujor
Faculty of Applied Chemistry and Materials Science
 
Rapid pulse drying of marine biomasses - Sigurd Sannan
Rapid pulse drying of marine biomasses - Sigurd SannanRapid pulse drying of marine biomasses - Sigurd Sannan
Rapid pulse drying of marine biomasses - Sigurd Sannan
Faculty of Applied Chemistry and Materials Science
 
Potential of Marine Renewable and Non renewable energy.pptx
Potential of Marine Renewable and Non renewable energy.pptxPotential of Marine Renewable and Non renewable energy.pptx
Potential of Marine Renewable and Non renewable energy.pptx
J. Bovas Joel BFSc
 
Biochar impregnation as slow release fertilizer - Violeta Alexandra Ion
Biochar impregnation as slow release fertilizer - Violeta Alexandra IonBiochar impregnation as slow release fertilizer - Violeta Alexandra Ion
Biochar impregnation as slow release fertilizer - Violeta Alexandra Ion
Faculty of Applied Chemistry and Materials Science
 
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptxSCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
WALTONMARBRUCAL
 
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Dr. sreeremya S
 
THE ESSENCE OF CHANGE CHAPTER ,energy,conversion,life is easy,laws of physics
THE ESSENCE OF CHANGE CHAPTER ,energy,conversion,life is easy,laws of physicsTHE ESSENCE OF CHANGE CHAPTER ,energy,conversion,life is easy,laws of physics
THE ESSENCE OF CHANGE CHAPTER ,energy,conversion,life is easy,laws of physics
Dr. sreeremya S
 
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
Faculty of Applied Chemistry and Materials Science
 
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdfHow Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
Task Train
 
Fish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Fish in the Loop: Exploring RAS - Julie Hansen BergstedtFish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Fish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Faculty of Applied Chemistry and Materials Science
 
Plant Kingdom BioHack class 11 neet ....
Plant Kingdom BioHack class 11 neet ....Plant Kingdom BioHack class 11 neet ....
Plant Kingdom BioHack class 11 neet ....
anushkakharat13
 
AN EMPIRE ACROSS THE THREE CONTINENTS.pptx
AN EMPIRE ACROSS THE THREE CONTINENTS.pptxAN EMPIRE ACROSS THE THREE CONTINENTS.pptx
AN EMPIRE ACROSS THE THREE CONTINENTS.pptx
kalpnayadav03021986
 
Simulations of pulsed overpressure jets: formation of bellows and ripples in ...
Simulations of pulsed overpressure jets: formation of bellows and ripples in ...Simulations of pulsed overpressure jets: formation of bellows and ripples in ...
Simulations of pulsed overpressure jets: formation of bellows and ripples in ...
Sérgio Sacani
 

Recently uploaded (20)

Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
Celebrity Girls Call Navi Mumbai 🎈🔥9920725232 🔥💋🎈 Provide Best And Top Girl S...
 
Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...
Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...
Possible Anthropogenic Contributions to the LAMP-observed Surficial Icy Regol...
 
ellipticytescausesprognosistreatment-240622051139-23d50b05.pptx
ellipticytescausesprognosistreatment-240622051139-23d50b05.pptxellipticytescausesprognosistreatment-240622051139-23d50b05.pptx
ellipticytescausesprognosistreatment-240622051139-23d50b05.pptx
 
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
[1] Data Mining - Concepts and Techniques (3rd Ed).pdf
 
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary TrackThe Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
The Dynamical Origins of the Dark Comets and a Proposed Evolutionary Track
 
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
Detection of the elusive dangling OH ice features at ~2.7 μm in Chamaeleon I ...
 
Analytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina BujorAnalytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina Bujor
 
Rapid pulse drying of marine biomasses - Sigurd Sannan
Rapid pulse drying of marine biomasses - Sigurd SannanRapid pulse drying of marine biomasses - Sigurd Sannan
Rapid pulse drying of marine biomasses - Sigurd Sannan
 
Potential of Marine Renewable and Non renewable energy.pptx
Potential of Marine Renewable and Non renewable energy.pptxPotential of Marine Renewable and Non renewable energy.pptx
Potential of Marine Renewable and Non renewable energy.pptx
 
Biochar impregnation as slow release fertilizer - Violeta Alexandra Ion
Biochar impregnation as slow release fertilizer - Violeta Alexandra IonBiochar impregnation as slow release fertilizer - Violeta Alexandra Ion
Biochar impregnation as slow release fertilizer - Violeta Alexandra Ion
 
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptxSCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
SCIENCEgfvhvhvkjkbbjjbbjvhvhvhvjkvjvjvjj.pptx
 
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
 
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
Direct instructions, towards hundred fold yield,layering,budding,grafting,pla...
 
THE ESSENCE OF CHANGE CHAPTER ,energy,conversion,life is easy,laws of physics
THE ESSENCE OF CHANGE CHAPTER ,energy,conversion,life is easy,laws of physicsTHE ESSENCE OF CHANGE CHAPTER ,energy,conversion,life is easy,laws of physics
THE ESSENCE OF CHANGE CHAPTER ,energy,conversion,life is easy,laws of physics
 
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
 
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdfHow Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
How Does TaskTrain Integrate Workflow and Project Management Efficiently.pdf
 
Fish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Fish in the Loop: Exploring RAS - Julie Hansen BergstedtFish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Fish in the Loop: Exploring RAS - Julie Hansen Bergstedt
 
Plant Kingdom BioHack class 11 neet ....
Plant Kingdom BioHack class 11 neet ....Plant Kingdom BioHack class 11 neet ....
Plant Kingdom BioHack class 11 neet ....
 
AN EMPIRE ACROSS THE THREE CONTINENTS.pptx
AN EMPIRE ACROSS THE THREE CONTINENTS.pptxAN EMPIRE ACROSS THE THREE CONTINENTS.pptx
AN EMPIRE ACROSS THE THREE CONTINENTS.pptx
 
Simulations of pulsed overpressure jets: formation of bellows and ripples in ...
Simulations of pulsed overpressure jets: formation of bellows and ripples in ...Simulations of pulsed overpressure jets: formation of bellows and ripples in ...
Simulations of pulsed overpressure jets: formation of bellows and ripples in ...
 

Scientific Data Publishing

  • 1. Scientific Data Publishing Tobias Kuhn http://www.tkuhn.org @txkuhn ETH Zurich COSS Team Retreat Santorini, Greece 27 June 2015
  • 2. Increasing Importance of Scientific Data London Underground staff sorting 4M used tickets to analyse line use in 1939 Image from: http://www.telegraph.co.uk/travel/picturegalleries/9791007/ The-history-of-the-Tube-in-pictures-150-years-of-London-Underground.html?frame=2447159 Tobias Kuhn, ETH Zurich Scientific Data Publishing 2 / 24
  • 3. Problem: Replication and Re-Use of Research Results Exemplary Situation: Sue publishes a script that should allow everybody to replicate her scientific analysis: # Download data: wget http://some-third-party.org/dataset/1.4 # Analyze data: ... Problems: • What if the resource becomes unavailable at this location? • What if the third party silently changes that version of the dataset? • What if the web site gets hacked and the data manipulated? Tobias Kuhn, ETH Zurich Scientific Data Publishing 3 / 24
  • 4. Data Publishing, Archiving, and Re-Use Scientific datasets become increasingly important, and these data are increasingly produced and consumed directly by software. Published data should therefore be: • Verifiable (Is this really the data I am looking for?) • Immutable (Can I be sure that it hasn’t been modified?) • Permanent (Will it be available in 1, 5, 20 years from now?) • Reliable (Can it be efficiently retrieved whenever needed?) • Granular (Can I refer to individual data entries?) • Semantic (Can it be automatically interpreted?) • Linked (Does it use established identifiers and ontologies?) • Trustworthy (Can I trust the source?) Tobias Kuhn, ETH Zurich Scientific Data Publishing 4 / 24
  • 5. Nanopublications: Provenance-Aware Semantic Publishing (based on RDF) assertion provenance publication info nanopublication http://nanopub.org / @nanopub org Tobias Kuhn, ETH Zurich Scientific Data Publishing 5 / 24
  • 6. Identifiability Problem of URLs (Web Links) http://some-third-party.org/dataset/1.4 ? Given a URL for a digital artifact, there is no reliable standard procedure of checking whether a retrieved file really represents the correct and original state of that artifact. Solution: Identifiers that include (iterative) cryptographic hash values (as applied, for example, by Git and Bitcoin) Tobias Kuhn, ETH Zurich Scientific Data Publishing 6 / 24
  • 7. Cryptographic Hash Values A cryptographic hash value is a short random-looking sequence of bytes calculated on a given input: This is some text. ⇒ hRUv0M The same input always leads to exactly the same value: This is some text. ⇒ hRUv0M Even a minimally modified input leads to a completely different value: This is xome text. ⇒ sCtYbf The input is not reconstructible from the hash value (in practice): This is some text. hRUv0M Given an input and a matching hash value, we can therefore be perfectly sure that it was exactly that input that led to the hash. Tobias Kuhn, ETH Zurich Scientific Data Publishing 7 / 24
  • 8. Iterative Hashing Hash values can be used as identifiers in an iterative fashion: This is some text. ⇒ hRUv0M This text is based on hRUv0M. ⇒ LwGqwX This depends on LwGqwX. ⇒ civRbq From a single identifier (such as civRbq), the entire reference tree can be verified: This is some text. ⇒ hRUv0M This text is based on hRUv0M. ⇒ LwGqwX This depends on LwGqwX. ⇒ civRbq And any modification can be noticed: This is xome text. ⇒ hRUv0M This text is based on hRUv0M. ⇒ LwGqwX This depends on LwGqwX. ⇒ civRbq Tobias Kuhn, ETH Zurich Scientific Data Publishing 8 / 24
  • 9. Trusty URIs: Cryptographic Hash Values for Verifiable and Immutable Web Identifiers Basic idea: Use of cryptographic hash values together with URIs (˜= URLs) as identifiers for digital artifacts such as nanopublications. Requirements: • To allow for the verification of entire reference trees, the hash should be part of the reference (i.e. the URI) • Format-independent hash for different kinds of content • To allow for meta-data, digital artifacts should be allowed to contain self-references (i.e. their own URI) .trighttp://example.org/r1. RA 5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70 Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014. Tobias Kuhn, ETH Zurich Scientific Data Publishing 9 / 24
  • 10. Identifiers with Cryptographic Hash Values Nanopublications with Trusty URIs are ... Verifiable + Immutable + Permanent .trighttp://example.org/r1. RA 5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70 http://trustyuri.net/ Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014. Tobias Kuhn, ETH Zurich Scientific Data Publishing 10 / 24
  • 11. Extended Range of Verifiability Through Iterative Hashing http://...RAcbjcRI... http://...RAQozo2w... http://...RABMq4Wc... http://...RAcbjcRI... http://...RAQozo2w... http://.../resource23 http://.../resource23 ... http://...RAUx3Pqu... http://.../resource55 http://...RABMq4Wc... http://.../resource55 http://...RARz0AX-... ... http://...RAUx3Pqu... ... http://...RARz0AX... ... range of verifiability http://trustyuri.net Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014. Tobias Kuhn, ETH Zurich Scientific Data Publishing 11 / 24
  • 12. Defining Datasets with Nanopublication Indexes (which are themselves Nanopublications) appends has sub-index has element (a) (b) (c) (f) (d) (e) Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data. ISWC 2015. Tobias Kuhn, ETH Zurich Scientific Data Publishing 12 / 24
  • 13. A Multi-Layer Architecture for Reliable Scientific Data Publishing? User Interfaces Services: Finding, querying, filtering, and aggregating data Tobias Kuhn, ETH Zurich Scientific Data Publishing 13 / 24
  • 14. A Multi-Layer Architecture for Reliable Scientific Data Publishing? User Interfaces Decentralized Data Publishing Network hypotheses Services: Finding, querying, filtering, and aggregating data facts measurements opinions meta-data assessments annotations ... Tobias Kuhn, ETH Zurich Scientific Data Publishing 14 / 24
  • 15. A Multi-Layer Architecture for Reliable Scientific Data Publishing? User Interfaces Decentralized Data Publishing Network hypotheses Services: Finding, querying, filtering, and aggregating data Science Bots: Automated Tasks facts measurements opinions meta-data assessments annotations ... Tobias Kuhn, ETH Zurich Scientific Data Publishing 15 / 24
  • 16. Decentralized and Reliable Publishing with a Nanopublication Server Network Nanopublications with Trusty URIs Publication Retrieval Propagation / Archiving http://npmonitor.inn.ac Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data. ISWC 2015. Tobias Kuhn, ETH Zurich Scientific Data Publishing 16 / 24
  • 17. Decentralized — Open — Real-time • No a central authority: Everybody can set up a server and join the network • No restrictions on publication: Everybody can upload nanopublications • No delay between submission and publication: Nanopublications are made public immediately • No updates: If a nanopublication is modified, that makes it a new nanopublication (enforced by trusty URIs) • No queries: Only simple identifier-based lookup Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data. ISWC 2015. Tobias Kuhn, ETH Zurich Scientific Data Publishing 17 / 24
  • 18. Using Nanopublication Datasets Once published in the network, nanopublication indexes can be cited: [7] Nanopubs converted from OpenBEL’s Small and Large Corpus 20131211. Nanopublication index, 4 March 2014, http://np.inn.ac/RAR5dwELYLKGSfrOclnWhjOj-2nGZN_8BW1JjxwFZINHw Researchers can then fetch and reuse the data in a reliable and prefectly reproducible manner: # Download data: np get -c RAR5dwELYLKGSfrOclnWhjOj-2nGZN 8BW1JjxwFZINHw # Analyze data: ... Existing data can be recombined in new indexes; and researchers can unambiguously refer to the used datasets for new results: this: prov:wasDerivedFrom nps:RAR5dwELYLKGSfrOclnWhjOj-2nGZN 8BW1Jjx Kuhn et al. Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data. ISWC 2015. Tobias Kuhn, ETH Zurich Scientific Data Publishing 18 / 24
  • 19. Could these techniques and infrastructures allow us to make a step forward in terms of automation in science? S C I E N C E B O T S Tobias Kuhn, ETH Zurich Scientific Data Publishing 19 / 24
  • 20. Science Bots — Scientists’ Little Helpers in the Future? “Science bots” that autonomously publish results in their own name could cover a wide variety of applications, for example: nanopub -lications PubMed abstracts nanopub -lications sensor data nanopub -lications nanopub -lications text mining bot inference bot sensor bot Kuhn. Science Bots: A Model for the Future of Scientific Computation? SAVE-SD, WWW 2015 Companion Proceedings. Tobias Kuhn, ETH Zurich Scientific Data Publishing 20 / 24
  • 21. Quality Control with Reputation Mechanisms and Network Metrics? Robust automatic calculation of reputation metrics in a decentralized and open system: is contributed by Kuhn. Science Bots: A Model for the Future of Scientific Computation? SAVE-SD, WWW 2015 Companion Proceedings. Tobias Kuhn, ETH Zurich Scientific Data Publishing 21 / 24
  • 22. Quality Control with Reputation Mechanisms and Network Metrics? Robust automatic calculation of reputation metrics in a decentralized and open system: gives positive assessment for is contributed by Kuhn. Science Bots: A Model for the Future of Scientific Computation? SAVE-SD, WWW 2015 Companion Proceedings. Tobias Kuhn, ETH Zurich Scientific Data Publishing 22 / 24
  • 23. Quality Control with Reputation Mechanisms and Network Metrics? Robust automatic calculation of reputation metrics in a decentralized and open system: gives positive assessment for is contributed by Eigenvector centrality (0-100) 77 100 71 1 0 85 4 0.0 0 0 50 50 0 0.4 0.4 0 Kuhn. Science Bots: A Model for the Future of Scientific Computation? SAVE-SD, WWW 2015 Companion Proceedings. Tobias Kuhn, ETH Zurich Scientific Data Publishing 23 / 24
  • 24. Thank you for your attention! Questions? Further information: • Trusty URIs: http://trustyuri.net • Nanopublications: http://nanopub.org • Nanopublication Server Network: http://arxiv.org/abs/1411.2749 • Science Bots: http://arxiv.org/abs/1503.04374 Tobias Kuhn, ETH Zurich Scientific Data Publishing 24 / 24