
Laurie Goodman at #SSPBoston: Article+Data+Tools: Reproducibility, Reuse, & Rapid Release


Laurie Goodman's talk at the Society for Scholarly Publishing meeting, Boston, 29 May 2014: Article+Data+Tools: Reproducibility, Reuse, & Rapid Release


  1. Article+Data+Tools: Reproducibility, Reuse, & Rapid Release
     Laurie Goodman, PhD, Editor-in-Chief, GigaScience
  2. Current Scientific Communication via Publication
     • Scholarly articles are merely advertisement of scholarship. The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible. (Jon B. Buckheit and David L. Donoho, "WaveLab and reproducible research", 1995)
     • Core scientific statements or assertions are intertwined with, and hidden in, conventional scholarly narratives.
     • There is a lack of transparency, and a lack of credit for anything other than "regular" dead-tree publication.
  3. GigaSolution: Deconstructing the Paper
     Publishing all the pieces:
     • Data/software available
     • Metadata/curation
     • Interoperability
     • Availability of workflows
     • Transparent analyses
     [Diagram: Data, Metadata, Methods, Analyses]
  4. How We Envision Research Publication (Communicating Science)
     • Data sets in GigaDB (data publishing platform)
     • Analyses in GigaGalaxy (data analysis platform)
     • Paper in GigaScience (open-access journal)
  5. It’s Not Just for ’Omics Anymore
  6. Example in Neuroscience
     1. Neuroscience data are not typically shared.
     2. For most papers, data AND tools are not typically made available to the reviewers.
     3. Journal editors think reviewers will not want to review data.
     GigaScience 2014, 3:3. doi:10.1186/2047-217X-3-3
  7. Example in Neuroscience
     • Neuroscience data are not typically shared.
       – Author Dr. Stephen Eglen said: “One way of encouraging neuroscientists to share their data is to provide some form of academic credit.”
       – We hosted 366 recordings from 12 electrophysiology datasets, with a DOI.
       – GigaDB is included in the Thomson Reuters Data Citation Index.
     • Data AND tools are not typically made available to the reviewers.
       – We made the manuscript, data, and tools all available to the reviewers.
       – We make sure to include reviewers who are able to properly assess the data itself and rerun the tools.
       – To reduce the burden, we sometimes select a reviewer who ONLY looks at the data.
     • Journal editors think reviewers will not want to review data.
       – What reviewer Dr. Thomas Wachtler said: “The paper by Eglen and colleagues is a shining example of openness in that it enables replicating the results almost as easily as by pressing a button.”
       – What reviewer Dr. Christophe Pouzat said: “In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewers job more fun!”
  8. Data Citation Really Is a Major Incentive
     This week (on Wednesday), we released the genome sequences of 3,000 rice strains (13.4 TB of data).
     • These data were also deposited in the NIH SRA repository.
     • So why did we release them too?
       1. The data are linked directly to the Data Paper, which provides details of data production, quality, and basic analysis.
       2. The authors were hesitant to release these data (a HUGE community resource) before the analysis paper was published, which, for 3,000 strains, would take years. The opportunity to make the data citable (and trackable) encouraged the authors to release them, which they did in collaboration with GigaScience’s biocurator.
     References: The 3,000 Rice Genomes Project (2014). GigaScience 3:7. http://dx.doi.org/10.1186/2047-217X-3-7; The 3000 Rice Genomes Project (2014). GigaScience Database. http://dx.doi.org/10.5524/200001
  9. Consider Cross-Journal Support
     Competition is good… but sometimes we should collaborate for the community good.
     • PLOS’s recent data deposition policies have raised community concerns about feasibility.
     • We support (and applaud) these policies; we have an even stricter data deposition policy.
     • But PLOS ONE received a submission that was a comparative study of earthworm morphology and anatomy using a 3D non-invasive imaging technique called micro-computed tomography (microCT), and there was no good place to put these data.
     • These data are extremely complex (videos, multiple files), with several folders of ~10 GB each.
  10. Consider Cross-Journal Support
     • GigaScience and PLOS ONE collaborated: PLOS ONE published the main article, and we published a Data Note describing the data itself and hosted all the data in GigaDB under a separate citation.
     • With our Aspera connection, reviewers could download even the ~10 GB folders in about half an hour.
     • Reviewer Dr. Sarah Faulwetter noted the usefulness of having these data available, saying: “Instead of having to go through the lengthy process of obtaining the physical specimen from a museum, I can now download a fairly accurate representation from the web.”
     References: Lenihan et al. (2014). GigaScience 3:6. http://dx.doi.org/10.1186/2047-217X-3-6; Lenihan et al. (2014). GigaScience Database. http://dx.doi.org/10.5524/100092; Fernández et al. (2014). PLOS ONE 9(5): e96617. http://dx.doi.org/10.1371/journal.pone.0096617
  11. Think About What You Do… and What You Can Do…
     • Promote, rather than inhibit, prepublication data sharing.
     • Promote data citation in the reference section.
       – It incentivizes data release.
       – It makes the data easier for readers to find.
     • Promote data sharing upon publication.
       – Consider your data release policies.
     • Form collaborations with repositories to help authors deposit their work.
       – Identify community organizations with metadata standards.
     • Make data available to reviewers (author website, community repositories, Dryad and similar; your publisher?).
       – At least do a sanity check.
       – Use “data reviewers”.
     No, this isn’t easy. But do what you can now, and work toward the rest. Evolve.
  12. It’s Time to Move Beyond Dead Trees
  13. Thanks to:
     • Scott Edmunds, Executive Editor
     • Nicole Nogoy, Commissioning Editor
     • Peter Li, Lead Data Manager
     • Chris Hunter, Lead BioCurator
     • Rob Davidson, Data Scientist
     • Xiao (Jesse) Si Zhe, Database Developer
     • Amye Kenall, Journal Development Manager
     Contact us: editorial@gigasciencejournal.com; database@gigasciencejournal.com
     Follow us: @GigaScience; facebook.com/GigaScience; blogs.openaccesscentral.com/blogs/gigablog
     www.gigasciencejournal.com; www.gigadb.org
