2. Current Scientific Communication Via Publication
• Scholarly articles are merely advertisement of scholarship.
The actual scholarly artefacts, i.e. the data and
computational methods which support the
scholarship, remain largely inaccessible
(Jon B. Buckheit and David L. Donoho, WaveLab and Reproducible
Research, 1995)
• Core scientific statements or assertions are intertwined and
hidden in the conventional scholarly narratives
• Lack of transparency, lack of credit for anything other than
“regular” dead tree publication
3. GigaSolution: deconstructing the paper
Publishing all the pieces:
• Data/software available
• Metadata/curation
• Interoperability
• Availability of workflows
• Transparent analyses
Pieces of the paper: Data, Metadata, Methods, Analyses
4. How We Envision Research Publication (Communicating Science)
• Paper in GigaScience (open-access journal)
• Data sets in GigaDB (data publishing platform)
• Analyses in GigaGalaxy (data analysis platform)
6. Example in Neuroscience
1. Neuroscience data are not typically shared
2. For most papers, data AND tools are not typically made available to the
reviewers
3. Journal editors think reviewers will not want to review data
GigaScience 2014, 3:3 doi:10.1186/2047-217X-3-3
7. Example in Neuroscience
• Neuroscience data are not typically shared
– Author Dr. Stephen Eglen said: “One way of encouraging neuroscientists to
share their data is to provide some form of academic credit.”
– We hosted 366 recordings from 12 electrophysiology datasets, citable via a DOI
– GigaDB is included in the Thomson Reuters Data Citation Index
• Data AND tools are not typically made available to the reviewers
– We made the manuscript, data and tools all available to the reviewers
– We make sure to include reviewers who are able to properly assess the data
itself and rerun the tools
– To reduce the burden, we sometimes select a reviewer who ONLY looks at the
data
• Journal editors think reviewers will not want to review data
– Reviewer Dr. Thomas Wachtler said: “The paper by Eglen and
colleagues is a shining example of openness in that it enables replicating the
results almost as easily as by pressing a button.”
– Reviewer Dr. Christophe Pouzat said: “In addition to making the
presented research trustworthy, the reproducible research paradigm
definitely makes the reviewer’s job more fun!”
8. Data Citation Really is a Major Incentive
On Wednesday this week, we released the genome sequences
of 3,000 rice strains (13.4 TB of data)
• These data were also deposited in the NIH SRA repository
• So why did we do it too?
1. The data are linked directly to the Data Paper that provides
details of data production, quality, and basic analysis
2. Authors were hesitant to release these data (a HUGE
community resource) prior to publication of the analysis
paper (which, for 3,000 strains, would take years).
The opportunity to have these data citable
(and trackable) encouraged the authors to
release them, and to do so in
collaboration with GigaScience’s Biocurator
The 3,000 Rice Genomes Project. (2014) GigaScience 3:7 http://dx.doi.org/10.1186/2047-217X-3-7;
The 3000 Rice Genomes Project (2014) GigaScience Database. http://dx.doi.org/10.5524/200001
9. Consider Cross Journal Support
Competition is good…
….but sometimes we should collaborate
for the community good
• PLOS’s recent data deposition policies have led to
community concerns about feasibility.
• We support (and applaud) these policies; we have an even stricter
data deposition policy
• But PLOS ONE received a submission that was a
comparative study of earthworm morphology and
anatomy using a 3D non-invasive imaging technique
called micro-computed tomography (microCT), and
there was no good place to put the data
• These data are extremely complex: videos and multiple files,
with several folders of ~10 GB
10. Consider Cross Journal Support
• GigaScience and PLOS ONE collaborated. PLOS ONE published
the main article; we published a Data Note describing the
data itself and hosted all the data on GigaDB under a
separate citation.
• With our Aspera connection, reviewers could download
even the ~10 GB folders in about half an hour
• Reviewer Dr. Sarah Faulwetter noted the usefulness of
having these data available, saying: “Instead of having to
go through the lengthy process of obtaining the physical
specimen from a museum, I can now download a fairly
accurate representation from the web.”
Lenihan et al. (2014) GigaScience 3:6 http://dx.doi.org/10.1186/2047-217X-3-6;
Lenihan et al. (2014) GigaScience Database. http://dx.doi.org/10.5524/100092;
Fernández et al. (2014) PLOS ONE 9(5): e96617 http://dx.doi.org/10.1371/journal.pone.0096617
11. Think about what you do… and what you can do…
• Promote, rather than inhibit, prepublication data sharing
• Promote Data Citation in the reference section
– Incentivizes data release
– Makes the data easier for readers to find
• Promote Data Sharing upon publication
– Consider your data release policies
• Form collaborations with repositories to aid authors in depositing
their work
– Identify community organizations with metadata standards
• Make data available for reviewers (author website, community
repositories, Dryad and similar, your publisher?)
– At least do a sanity check
– Use “data reviewers”
No, this isn’t easy, but do what you can now
and work toward the rest
Evolve
12. It’s Time to Move Beyond Dead Trees
13. Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Rob Davidson, Data Scientist
Xiao (Jesse) Si Zhe, Database Developer
Amye Kenall, Journal Development Manager
Contact us:
editorial@gigasciencejournal.com
database@gigasciencejournal.com
Follow us:
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog
www.gigasciencejournal.com
www.gigadb.org