John Kratz
PhD in Biology from Columbia University
CLIR/DLF Postdoctoral Fellow (started 12 months ago)
Data publication and its importance for data sharing, reuse, and preservation
Open Data
Certain data should be freely available to everyone to use & republish as they wish, without restrictions from copyright, patents, or other mechanisms of control.
From Flickr by Ninja M.
Available | Citable | Trustworthy
• Publish means to “make public”.
• You should not have to email the author.
• The data doesn’t have to be open access.
Spectrum: from “Email me!” to CC0 on the web
Best practice: data in a trusted community repository with a machine-readable license/waiver
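What “machine-readable” buys you: a license or waiver stated as a standard URI can be checked by software, whereas free text such as “Email me!” cannot. The minimal Python sketch below assumes a hypothetical metadata record with a `rights` field; the field names are illustrative only and are not any particular repository's schema. The CC0 URI is the real Creative Commons address for the CC0 1.0 waiver.

```python
import json
from typing import Optional

# Canonical URI of the Creative Commons CC0 1.0 public-domain waiver.
CC0_URI = "https://creativecommons.org/publicdomain/zero/1.0/"

# Hypothetical metadata record; the "rights"/"uri" field names are assumptions.
record_json = """
{
  "title": "Example dataset",
  "rights": {"name": "CC0 1.0 Universal",
             "uri": "https://creativecommons.org/publicdomain/zero/1.0/"}
}
"""


def rights_uri(record: dict) -> Optional[str]:
    """Return the machine-readable rights URI, or None if the record has none."""
    rights = record.get("rights")
    if isinstance(rights, dict):
        return rights.get("uri")
    return None  # missing, or only free text that software cannot interpret


def is_cc0(record: dict) -> bool:
    """True if the record declares the CC0 waiver by URI (trailing slash ignored)."""
    uri = rights_uri(record)
    return uri is not None and uri.rstrip("/") == CC0_URI.rstrip("/")


record = json.loads(record_json)
print(is_cc0(record))                           # True
print(is_cc0({"rights": "Email me for data"}))  # False: free text, not a URI
```

A repository that stores the URI rather than prose lets harvesters and search tools filter for reusable data automatically.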
Repositories for data
• General content
• Non-institutional
• Publishers/for-profits
• Other
• Institutional
• Discipline-specific
Repository choices…
Institutional:
• All data associated with a paper
• Tells a story
• Clearinghouse for researcher’s works
Discipline-specific:
• Some of the data for a given paper
• Discoverable
• Integrated systems
• Collection policies
Which should a researcher use? Both
Which is more important? Depends
Five-element citation: author, year, title, publisher, identifier
Boettiger C, Dushoff J, Weitz JS (2009). Data from: Fluctuation domains in adaptive evolution. Theoretical Population Biology. Published in Dryad. doi:10.5061/dryad.j8n0p7vc
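The five elements can be treated as a small record that refuses to be incomplete. This Python sketch is an illustration, not a citation standard: the class name `DataCitation`, the `render` method, and the output style (modeled loosely on the Boettiger example) are assumptions, and the identifier is assumed to be a bare DOI.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataCitation:
    """Hypothetical record of the five citation elements."""
    authors: List[str]
    year: int
    title: str
    publisher: str
    identifier: str  # assumed to be a bare DOI, e.g. "10.5061/dryad.20"

    def __post_init__(self) -> None:
        # A citation missing any of the five elements is rejected.
        missing = [f.name for f in fields(self) if not getattr(self, f.name)]
        if missing:
            raise ValueError(f"missing citation elements: {missing}")

    def render(self) -> str:
        """Format as: authors (year). title. publisher. doi:identifier"""
        authors = ", ".join(self.authors)
        return (f"{authors} ({self.year}). {self.title}. "
                f"{self.publisher}. doi:{self.identifier}")


citation = DataCitation(
    authors=["Sidlauskas B"],
    year=2007,
    title=("Data from: Testing for unequal rates of morphological diversification "
           "in the absence of a detailed phylogeny: a case study from characiform fishes"),
    publisher="Dryad Digital Repository",
    identifier="10.5061/dryad.20",
)
print(citation.render())
```

Validating at construction time means an incomplete citation fails loudly instead of being published with a gap.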
More later
From Flickr by Percival Lowell
For articles: peer review
For data: peer review? validation?
Peer review of data: who?
• Experts
• Users
• Community
• Use = validation
What does a data publication look like?
From Flickr by subsetsum
1. Data as supplemental material
Data published alongside a traditional journal article.
Available + citable. Review varies.
Potential issues with long-term availability.
2. Data paper
Data + a descriptive “data paper”
Standalone data journals: Scientific Data (Nature Publishing Group), Geoscience Data Journal, Ecological Archives
OR
Journals that also publish data papers: GigaScience, F1000Research, Internet Archaeology
3. Standalone data
Data published without a related journal article.
Rich metadata (structured or unstructured)
• Institutional repository
• Open Context
• NASA PDS Peer Review Data
• figshare (but no validation)
…“publication” insinuates that we are
beholden to the current broken system of
journal publication. The word itself has
too much baggage.
…bureaucrats, funders, and institutions
have a familiarity with the word and it will
ensure the success of the data
publication goals, regardless of whether
we break the mold in the process.
http://datapub.cdlib.org/2012/03/06/data-publication-an-introduction/
Identifiers & Data Citation
• Allow readers to find data products
• Give credit for data and publications
• Promote reproducibility
Example:
Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Identifiers
• String of characters
• Unique
• Linked to a digital object
DOI: Digital Object Identifier
From Flickr by Plbmak
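A DOI shows all three properties: it is a character string, made of a prefix beginning with “10.” and a registrant-chosen suffix separated by “/”, and the doi.org resolver links it to the object. The Python sketch below is minimal and assumes only that prefix/suffix split; the function names are hypothetical, and it leaves the suffix unescaped rather than URL-encoding it.

```python
from typing import Tuple

# Common ways a DOI is written; stripped before parsing (case-insensitive).
_DOI_FORMS = ("doi:", "https://doi.org/", "http://doi.org/",
              "https://dx.doi.org/", "http://dx.doi.org/")


def parse_doi(text: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix), e.g. ("10.5061", "dryad.20")."""
    s = text.strip()
    for form in _DOI_FORMS:
        if s.lower().startswith(form):
            s = s[len(form):]
            break
    # Split at the first "/" only; suffixes may themselves contain "/".
    prefix, sep, suffix = s.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {text!r}")
    return prefix, suffix


def resolver_url(text: str) -> str:
    """Build a link through the doi.org resolver (suffix not URL-encoded)."""
    prefix, suffix = parse_doi(text)
    return f"https://doi.org/{prefix}/{suffix}"


print(parse_doi("doi:10.5061/dryad.20"))
print(resolver_url("doi:10.5061/dryad.j8n0p7vc"))
```

Because the resolver, not the repository's own URL, is what gets cited, the link keeps working if the data moves.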
Comparing two…
DOIs | ARKs
Strict metadata requirements | Flexible metadata guidelines
From the scholarly communication community | From the archives and museums community
Established “brand name” | Option-rich, open source
$$$ | $
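The two schemes can be told apart by syntax alone: DOIs begin with a “10.” prefix, while ARKs begin with “ark:” followed by a Name Assigning Authority Number (NAAN). The Python sketch below assumes simplified patterns (a purely numeric NAAN, no DOI prefix subdivisions), so real identifiers may not all match. The function name is hypothetical and the ARK in the usage lines is a made-up example.

```python
import re
from typing import Optional

# Simplified patterns (assumptions, not the full specifications):
#   DOI: optional "doi:", then "10.<digits>/<suffix>"
#   ARK: "ark:" or "ark:/", then "<numeric NAAN>/<name>"
DOI_RE = re.compile(r"^(?:doi:)?10\.\d+/\S+$", re.IGNORECASE)
ARK_RE = re.compile(r"^ark:/?\d+/\S+$", re.IGNORECASE)


def identifier_scheme(identifier: str) -> Optional[str]:
    """Return "DOI", "ARK", or None for an identifier string."""
    s = identifier.strip()
    if DOI_RE.match(s):
        return "DOI"
    if ARK_RE.match(s):
        return "ARK"
    return None


print(identifier_scheme("doi:10.5061/dryad.20"))  # DOI
print(identifier_scheme("ark:/12345/x9abc"))      # ARK (hypothetical identifier)
print(identifier_scheme("Email me!"))             # None
```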