Transcript of "Data sharing: a look at the issues - Trieste"
a look at the issues
program manager, science commons
trieste, italy - ICTP - 16 oct 2009
This presentation is licensed under the Creative Commons Attribution 3.0 license.
before jumping into data ...
(where we left off)
make sharing easy, legal and scalable
building part of the infrastructure for
scientific revolutions occur when a
sufficient body of data accumulates to
overthrow the dominant theories
we use to frame reality
a so-called paradigm shift
- from thomas kuhn
content needs to be legally and
“ By open access to the literature, we mean its free
availability on the public internet, permitting users
to read, download, copy, distribute, print, search, or link
to the full texts of the articles, crawl them for
indexing, pass them as data to software, or use them for
any other lawful purpose, without financial, legal or
technical barriers other than those inseparable from
gaining access to the internet itself.”
Image from the Public Library of Science, licensed to the public, under
“The only constraint on reproduction and distribution,
and the only role for copyright in this domain, should
be to give authors control over the integrity of their
work and the right to be properly acknowledged and cited.”
(1) KEGG - Kyoto Encyclopedia of Genes and Genomes
“Non-academic users and Academic users intending to use KEGG for
commercial purposes are requested to obtain a license agreement
through KEGG's exclusive licensing agent, Pathway Solutions, for installation
of KEGG at their sites, for distribution or reselling of KEGG data, for
software development or any other commercial activities that make use of
KEGG, or as end users of any third-party application that requires
downloading of KEGG data or access to KEGG data via the KEGG API.”
(2) HapMap - human genetic variation data
“The click-wrap license was designed as a temporary tool to continue the
practice of providing rapid access to human genome data [...]. One
consequence of the license requirement was that the [...] license
prevented HapMap data from being integrated into major public
databases, which require that data deposited carry no conditions on
use ...” - Wellcome Trust, Sanger, Dec 2004
database protections based on jurisdiction
“sweat of the brow”
the list goes on ....
attribution = license
citation = norms
which one applies? which is best fit?
“credit where credit is due”
“triggered by making of a copy”
does it apply to facts?
how to attribute? (papers, ontologies, data)
“in a manner specified by ...”
credit where credit is due
entrenched scientific norm
we shouldn’t use the law to make it
hard to do the wrong thing ...
is it true? can i trust it?
to what does it connect?
need for a legally accurate and
reducing or eliminating the need to make the
distinction of what’s protected
requires modular, standards based approach
calls for data providers to waive all rights
necessary for data extraction and re-use
requires provider place no additional
obligations (like share-alike) to limit
request behavior (like attribution) through
at best, we’re partially right.
at worst, we’re really wrong.
infrastructure for a data web
the digital commons
law + content + technology +
data without structure and annotation is a
data should flow in an open, public, and
support recombination and reconfiguration
into computer models, queryable by search
treated as public good
resist the temptation to treat
embrace the potential to treat instead
as a network resource